This article discusses how to remove unnecessary line breaks (<br>, <br />) inside HTML pre-formatted <pre> tags using RegEx and ASP/VBScript.
The Problem
When I added a [code] tag to the HPC:Factor Community Forums markup some years ago, there was an obvious but low-priority problem that niggled at me. The BBS tag renders a pre-formatted <pre> tag into the post HTML and styles it using a console font and fixed-size characters for improved legibility. However, MegaBBS replaces new lines (vbcrlf, vblf, \r\n, \n, chr(13) & chr(10), chr(10)) with <br /> as a batch replace at the beginning of the forum sanitisation and rendering process.
This means that when the browser renders the <pre></pre> tag, it renders both the vblf and the <br />, leading to double line breaks.
For example:

    Dim i
    i = 0
    Do While (i < 1000)
        Response.Write i
        i = (i + 1)
    Loop

becomes:

    <pre><br />
    Dim i<br />
    i = 0<br />
    Do While (i < 1000)<br />
        Response.Write i<br />
        i = (i + 1)<br />
    Loop<br />
    </pre><br />

leading the browser to render:

    Dim i

    i = 0

    Do While (i < 1000)

        Response.Write i

        i = (i + 1)

    Loop
This wastes screen space and reduces legibility.
RegEx Fix
The solution is very simple: use RegEx to re-parse the [code] block after the line breaks have been globally replaced.
I added the following code at the bottom of the MBBS Code loop in the MBBSDecode function in include.asp:

    if (vBBSDecodeArray(0, index) = "\[code\]") then
        ' This looks for the PRE tag for the code and then removes the <br />'s from it to return it to pure pre-formatted
        mBBSRegEx.pattern = "\<pre[^>]*\>((.|\n)*)\<\/pre\>"
        for each sNewText in mBBSRegEx.execute(MBBSDecode)
            MBBSDecode = Replace(MBBSDecode, sNewText.Value, (Replace(sNewText.Value, "<br />", "")))
        next
    end if
To evaluate what this means line-by-line:
- When it is parsing the [code] tag from the list of all BBS markup statements
- Set a RegEx pattern to search for the opening PRE tag with zero or more attributes, e.g. <pre attribute1="one" attribute2="two">, ending with </pre>. The "((.|\n)*)" ensures that the search looks for all characters, including over new lines, for as many characters and new lines as is necessary to encounter the closing </pre> tag.
- For every positive match, i.e. for every <pre>*</pre> match
- In the matched string, replace <br /> with "", then replace the match in the original source string (MBBSDecode) with the fixed string
- Move to the next match until there are no more matches
To genericise the example:

    Dim strHtml
    Dim match
    Dim matches
    Dim regEx

    ' Embedded double quotes are doubled inside a VBScript string literal
    strHtml = "<body><p>hello</p><pre class=""code"">line one" & vbcrlf & "<br />line two" & vbcrlf & "<br />line three" & vbcrlf & "<br /></pre><p>hello</p></body>"

    set regEx = New RegExp
    regEx.Pattern = "\<pre[^>]*\>((.|\n)*)\<\/pre\>"
    regEx.IgnoreCase = true
    regEx.Global = true

    set matches = regEx.Execute(strHtml)
    For Each match in matches
        ' Strip the <br /> tags from the matched block, then swap the cleaned block back into the source string
        strHtml = Replace(strHtml, match.Value, (Replace(match.Value, "<br />", "")))
    Next

    ' strHtml will now effectively be:
    ' "<body><p>hello</p><pre class=""code"">line one" & vbcrlf & "line two" & vbcrlf & "line three" & vbcrlf & "</pre><p>hello</p></body>"
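One thing to watch with the pattern above: the `*` quantifier is greedy, so if a post contains more than one [code] block, a single match will run from the first <pre> to the last </pre> and the <br /> tags in the text between the blocks will be stripped too. The following is a minimal sketch of one way around that, assuming the VBScript 5.5 RegExp engine (which supports the lazy `*?` quantifier). The function name StripBrInsidePre and the sample string are hypothetical and are not part of MegaBBS.

```vbscript
' Hypothetical helper (not part of MegaBBS): removes <br /> tags that
' sit inside each <pre>...</pre> block of the supplied HTML string.
Function StripBrInsidePre(ByVal strHtml)
    Dim regEx, match, strResult

    Set regEx = New RegExp
    ' (.|\n)*? is the lazy form of the original pattern: each match stops
    ' at the first </pre> that follows its opening <pre> tag.
    regEx.Pattern = "<pre[^>]*>(.|\n)*?</pre>"
    regEx.IgnoreCase = True
    regEx.Global = True

    strResult = strHtml
    For Each match In regEx.Execute(strHtml)
        ' Clean the matched block, then swap it back into the result string
        strResult = Replace(strResult, match.Value, Replace(match.Value, "<br />", ""))
    Next

    StripBrInsidePre = strResult
End Function

' Usage: two separate blocks with an ordinary paragraph between them
Dim sample
sample = "<pre>a<br />" & vbLf & "b</pre><p>x<br />y</p><pre>c<br />" & vbLf & "d</pre>"
Response.Write Server.HTMLEncode(StripBrInsidePre(sample))
```

With the lazy quantifier, the <br /> inside the <p> element is left alone while both <pre> blocks are cleaned. Note that VBScript's Replace is case-sensitive by default, so this sketch, like the original, only removes the exact string "<br />".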
Fairly simple, but as with most things RegEx, a headache for most of us – unless you are using it all the time.