Stripping html tags
Hershel Fisch
hershf at rgllc.us
Mon Nov 12 19:12:27 EST 2007
On 11/3/07 3:48 AM, "FlexibleLearning at aol.com" <FlexibleLearning at aol.com>
wrote:
Hi all, I only just noticed this thread and thought my 2 cents might
help, so I'll share it with you.
This is what I use to strip HTML:
function fStripHtml tParam
   -- The list archive decoded the entity names in this listing;
   -- &nbsp;, &#39;, &quot; and &amp; are restored here on that assumption.
   replace "&nbsp;" with space in tParam
   -- start every paragraph and table row on its own line
   replace "<p" with return & "<p " in tParam
   replace "<tr>" with return & "<tr>" in tParam
   replace "…" with "," in tParam
   replace "&#39;" with "'" in tParam
   replace "&quot;" with quote in tParam
   replace "’" with "'" in tParam
   replace "“" with quote in tParam
   replace "”" with quote in tParam
   replace "&amp;" with "&" in tParam
   replace "<br>" with return in tParam
   replace tab with "" in tParam
   put 0 into d -- d is 1 while we are inside a tag
   repeat for each line tL in tParam
      put tL into tNl
      put 1 into x
      repeat for (the number of chars in tNl) times
         -- a ">" closes the current tag
         if char x of tNl = ">" then
            put "" into char x of tNl
            put 0 into d
         end if
         -- a "<" opens a new tag
         if char x of tNl = "<" then
            put 1 into d
         end if
         -- inside a tag: delete the char, otherwise move on
         if d = 1 then
            put "" into char x of tNl
         else
            add 1 to x
         end if
      end repeat
      --replace space & space & space & space with space in tNl
      replace space & space & space with space in tNl
      --replace space & space with space in tNl
      if char 1 of tNl = " " then put "" into char 1 of tNl
      if char -1 of tNl = " " then put "" into char -1 of tNl
      -- keep only the non-empty lines
      if tNl <> "" then put tNl & return after tNf
   end repeat
   return tNf
end fStripHtml
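A rough sketch of how the function might be called follows; the mouseUp handler, the sample string and the names tHtml and tText are illustrative assumptions only:

```livecode
on mouseUp
   -- hypothetical sample input, not taken from a real page
   put "<p>Hello <b>world</b></p><br>Second line" into tHtml
   -- fStripHtml must be in this script or further along the message path
   put fStripHtml(tHtml) into tText
   -- every kept line is followed by a return, so drop the last one
   if char -1 of tText is return then delete char -1 of tText
   put tText into msg
end mouseUp
```

Because the function appends a return after every non-empty line, its result ends with a trailing return, which the sketch trims before showing the text in the message box.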
Hope it helps.
Hershel
>
> This is a seriously detailed stripper, Jim!
>
> Small error in syntax:
>
> replace "<td" with numtochar(160)&"<td" in pHtml
> should be...
> replace "<td" with numtochar(160)&"td>" in pHtml
>
> Also, a couple of lines were posted html2Txt-mangled. Could you clarify:
> -----
> replace " " with space in pHtml
> replace "
> " with return in pHtml
> replace "
>
> " with return in pHtml
> -----
>
> If you post the handler as plain text, any html formatted text should be
> correctly handled by the emailer.
>
>
> /H
>
> -------------------------------
> -------------------------------------------------
> function StripTags pHtml
> local tRegex,tPrevText
> get ("é,à,ç")
> get it & (",>,<,ê")
> get it & (",è,©,")
> get it & (",',·,&")
> -- add more chars if you wish, then...
> constant kHtml = it
> constant kConvertedHtml = "é,à,ç,>,<,ê,è,©"
> --using constants means you cannot accidentally
> -- modify these vars and damage the results
> -----
> replace numtochar(13) with empty in pHtml
> replace tab with empty in pHtml
> replace "<td" with numtochar(160)&"<td" in pHtml
> -----
> put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
> put replacetext(pHtml,"(?Usi)<STYLE>.*</STYLE>","") into pHtml
> put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml
> -----
> replace " " with space in pHtml
> replace "
> " with return in pHtml
> replace "
>
> " with return in pHtml
> -----
> put "<[^><]*>" into tRegex
> put replacetext(pHtml,tRegex,"") into pHtml
> put replacetext(pHtml,tRegex,"") into pHtml
>
> ----- repeat replacements until there are no changes
> repeat until tPrevText is pHtml
> put pHtml into tPrevText
> put replacetext(pHtml," +",space) into pHtml
> put replacetext(pHtml,"^ ","") into pHtml
> end repeat
> -----
> replace (space & return) with return in pHtml
> replace (return & space) with return in pHtml
> filter pHtml without empty
> replace numtochar(160) with empty in pHtml
> -----
> replace """ with quote in pHtml
> repeat with i = 1 to the number of items of kHtml
> replace item i of kHtml with item i of kConvertedHtml in pHtml
> end repeat
> -----
> --put pHtml into msg --lets you see the result in the msg box
> return pHtml
> end StripTags
>
>
> Jim Ault
> Las Vegas
>
> ------------------------------------------------
> --------------------------------