Clean Truncated HTML Tags

When a website shows a list of articles, each one usually comes with a short description. As a programmer, I sometimes have to take a fixed number of words or characters from the article body and show that as the short description.

One way to get that number of characters from the body is CAST in SQL, for example:

SELECT TOP 10 ID, Title, CAST (Body AS NVARCHAR(100)) AS shortDescription FROM Articles

Assume one of the records has this original value in the Body field:

While comparing two entities, we tend to see both of them as competitors and <a href=http://www.somesite.com target="blank">consequently comparing</a> them to find a winner.

After running the SQL I will get:

While comparing two entities, we tend to see both of them as competitors and <a href=http://www.so

As you can see, the result contains invalid, cut-off HTML, which will definitely affect the page display if I output it as it is. Say I decide to strip the HTML tags from the description before outputting it, using this code:

<cfset text = REReplace(text, "<[^>]*>", "", "ALL")>

It will not do anything here, because the string contains no complete tag: the pattern requires a closing ">", and the truncated tag has none. What is needed is to remove any unclosed tag at the end of the string, which could be '<blockquote class="something"' or even '<blockquo'. Hence, I use this line of code to clean any truncated HTML tag:

<cfset text = REReplace(text, "<[^>]*$", "", "ALL")>

Most of the time I use both of the replacements above.
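For readers who want to try the two passes outside ColdFusion, here is a minimal sketch of the same idea in Python. It is only an illustration, not the code from this post: the function name clean_excerpt is made up for the example, and the regular expressions are the same two patterns used in the REReplace calls above.

```python
import re


def clean_excerpt(text: str) -> str:
    """Strip complete HTML tags, then any tag cut off at the end of the text.

    clean_excerpt is a hypothetical helper name used only for this example.
    """
    # Pass 1: remove every complete tag, i.e. "<" up to the next ">".
    text = re.sub(r"<[^>]*>", "", text)
    # Pass 2: remove a truncated tag, i.e. a "<" with no ">" before the end.
    text = re.sub(r"<[^>]*$", "", text)
    return text


if __name__ == "__main__":
    truncated = (
        "While comparing two entities, we tend to see both of them "
        "as competitors and <a href=http://www.so"
    )
    print(clean_excerpt(truncated))
```

The order matters little here, but running the complete-tag pass first means the second pass only ever has to deal with a single leftover fragment at the end of the string.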

 
