Replace line breaks in PRE tags using RegEx in ASP/VBScript

This article discusses how to replace unnecessary line breaks (<br>, <br />) in HTML pre-formatted ‘PRE‘ tags using RegEx and ASP/VBScript.

The Problem

When I added a [code] tag to the HPC:Factor Community Forums markup some years ago. There was an obvious, but low-priority problem that had niggled at me. The BBS tag renders a pre-formatted ‘PRE‘ tag into the post HTML and styles it using a console font and fixed-size characters for improved legibility. The way that MegaBBS works however is that new lines (vbcrlf, vblf, \r\n, \n, chr(13) & chr(10), chr(10)) are replaced with ‘<br />‘ as a batch replace at the beginning of the forum sanitisation and rendering process.

This means that when the browser renders the <pre></pre> tag, it renders both the vblf and the <br />, leading to double line breaks.

For example

Dim i
i = 0
while (i < 1000)
  Response.Write i
  i = (i + 1)
loop

becomes

<pre><br />
Dim i<br />
i = 0<br />
while (i < 1000)<br />
  Response.Write i<br />
  i = (i + 1)<br />
loop<br />
</pre><br />

leading the browser to render

Dim i

i = 0

while (i < 1000)

Response.Write i

i = (i + 1)

loop

This wastes screen space and reduces legibility.

RegEx Fix

The solution is very simple, use RegEx to re-parse the [code] block after it has globally replaced the line breaks.

I added the following code at the bottom of the MBBS Code loop in the MBBSDecode function  in include.asp

if (vBBSDecodeArray(0, index) = "\[code\]") then	' This looks for the PRE tag for the code and then removes the <br />'s from it to return it to pure pre-formatted
	mBBSRegEx.pattern = "\<pre[^>]*\>((.|\n)*)\<\/pre\>"
	for each sNewText in mBBSRegEx.execute(MBBSDecode)
		MBBSDecode = Replace(MBBSDecode, sNewText.Value, (Replace(sNewText.Value, "<br />", "")))
	next
end if

To evaluate what this means line-by-line

  1. When it is parsing the [code] tag from the list of all BBS markup statements
  2. Set a RegEx pattern to search for the opening PRE tag with zero or more attributes e.g. <pre attribute1="one" attribute2="two"> ending with </pre>. The “((.|\n)*)” ensures that the search looks for all characters, including over new lines, for as many characters and new lines as is necessary to encounter the closing </pre> tag.
  3. For every positive match i.e. for every <pre>*</pre> match
  4.  In the matches string, replace <br /> with "", then replace the match in the original source string (MBBSDecode) with the fixed string
  5. Move to the next match until there are no more matches

 

To genericise the example

Dim strHtml
Dim match
Dim matches
Dim regEx

strHtml = "<body><p>hello</p><pre class="">line one" & vbcrlf & "<br />line two" & vbcrlf & "<br />line three" & vbcrlf & "<br /></pre><p>hello</p></body>"

set regEx = New RegExp
    regEx.Pattern = "\<pre[^>]*\>((.|\n)*)\<\/pre\>"
    regEx.IgnoreCase = true
    regEx.Global = true
set matches = regEx.Execute(strHtml)

For Each match in matches
    strHtml =  = Replace(strHtml, match.Value, (Replace(match.Value, "<br />", "")))
Next

' strHtml will now effectively be:
' "<body><p>hello</p><pre class="">line one" & vbcrlf & "line two" & vbcrlf & "line three" & vbcrlf & "</pre><p>hello</p></body>"

Fairly simple, but as with most things RegEx, a headache for most of us – unless you are using it all the time.