Anyone got one of these?
Trevor DeVore
lists at mangomultimedia.com
Fri Jan 26 15:13:34 EST 2007
On Jan 26, 2007, at 10:56 AM, Chipp Walters wrote:
> function stripAllTagsBut pHtml,pTagsList
> --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
> --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..
>
>
> It's used to strip all tags from HTML but those in the pTagsList
> parameter.
>
> IOW, it can be used to grab the HTML of a page, and strip everything
> but the img tags.
>
> I'm starting to write it, but thought I'd ask-- just in case.
Well, I have one I've been working on that takes a list of things to
strip. You could modify it to fit your needs maybe. The first
version was much more compact and used matchText. Then I stress
tested it and it was slooooow, and I had to call it quite often and
with large amounts of text. So I came up with the attached version.
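For the stripAllTagsBut case Chipp describes, the same offset() scanning idea can be turned around so that every tag is removed unless its name is on the keep list. What follows is only a rough, untested sketch under a few assumptions: pTagsList holds one tag name per line (as in Chipp's comment), names are compared case-insensitively (the engine default), and HTML comments or ">" characters inside attribute values are not handled. The local variable names are made up for illustration.

```livecode
function stripAllTagsBut pHtml, pTagsList
  -- pTagsList: one tag name per line, e.g. "img" & cr & "b"
  local tSkip, tOpen, tClose, tTag, tName
  put 0 into tSkip
  repeat forever
    put offset("<", pHtml, tSkip) into tOpen
    if tOpen = 0 then exit repeat
    put offset(">", pHtml, tSkip + tOpen) into tClose
    if tClose = 0 then exit repeat -- unterminated tag: leave the rest alone
    -- offset() results are relative to the skipped chars,
    -- so convert both to absolute positions
    add tSkip to tOpen
    add tOpen to tClose
    -- pull out the tag name, ignoring a leading or trailing "/"
    put char tOpen + 1 to tClose - 1 of pHtml into tTag
    if char 1 of tTag is "/" then delete char 1 of tTag
    put word 1 of tTag into tName
    if char -1 of tName is "/" then delete char -1 of tName
    if tName is among the lines of pTagsList then
      put tClose into tSkip -- keep this tag, continue after it
    else
      delete char tOpen to tClose of pHtml
      put tOpen - 1 into tSkip -- continue where the tag used to be
    end if
  end repeat
  return pHtml
end stripAllTagsBut

-- Example: keep only <b> tags
-- put stripAllTagsBut("<p>Hello <b>world</b><br></p>", "b")
-- --> Hello <b>world</b>
```

Because it makes a single pass with offset() and never rescans text it has already handled, it should avoid the slowdown I saw with the matchText version, though I have not timed it.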
--
Trevor DeVore
Blue Mango Learning Systems - www.bluemangolearning.com
trevor at bluemangolearning.com
/**
* Cleanses a string of the specified Revolution HTML tags.
*
* @param pHTML HTML to act on.
* @param pStripFilter List of tags to strip:
*   p,size,face,lang,color,bgcolor,b,i,u,strike,sub,sup,box,threedbox,
*   expanded,condensed,img,a.
* @param pStripTrailingCR Pass true to strip any trailing CR from the
*   end of pHTML.
*
* @return The cleaned HTML.
*/
FUNCTION str_stripHTML pHTML, pStripFilter, pStripTrailingCR
  local tProp,tFontFilter,tInlineFilter,tAttributeFilter,tStart,tEnd
  local tSkip,tOffset1,tOffset2,tDeleteChars,i
  set the wholematches to true
  --> PROCESS pStripFilter
  REPEAT for each item tProp in pStripFilter
    IF tProp is among the items of \
        "p,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed" THEN
      put tProp & comma after tAttributeFilter
    ELSE IF tProp is among the items of "face,size,color,bgcolor,lang" THEN
      put tProp & comma after tFontFilter
    ELSE IF tProp is among the items of "img,a" THEN
      put tProp & comma after tInlineFilter
    END IF
  END REPEAT
  --> PROCESS
  --> OK, I TRIED USING MATCHCHUNK WITH THIS BUT IT WAS A GAZILLION TIMES SLOWER
  REPEAT forever
    put offset("<font", pHTML, tSkip) into tOffset1
    IF tOffset1 > 0 THEN
      put offset(">", pHTML, tSkip + tOffset1) into tOffset2 --> GET CLOSING TAG
      --> LOOP THROUGH PROPS AND ERASE
      REPEAT for each item tProp in tFontFilter
        put offset(space & tProp & "=" & quote, pHTML, tSkip + tOffset1) \
            into tStart
        --> ONLY LOOK FOR PROPS IN CURRENT FONT TAG
        IF tStart > 0 AND \
            tSkip + tOffset1 + tStart < tSkip + tOffset1 + tOffset2 THEN
          get tSkip + tOffset1 + tStart + length(tProp) + 2
          put offset(quote, pHTML, it) into tEnd
          IF tEnd > 0 THEN
            put tSkip + tStart + tOffset1 & comma & it + tEnd & cr \
                after tDeleteChars
          END IF
        END IF
      END REPEAT
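      --> SORT BY START POSITION, THEN MOVE BACKWARDS THROUGH LIST AND DELETE
      --> (THE LIST IS BUILT IN tFontFilter ORDER, SO WITHOUT THE SORT AN
      --> EARLIER DELETE COULD SHIFT A RANGE THAT IS STILL WAITING)
      sort lines of tDeleteChars ascending numeric by item 1 of each
      REPEAT with i = the number of lines of tDeleteChars down to 1
        delete char (item 1 of line i of tDeleteChars) to \
            (item 2 of line i of tDeleteChars) of pHTML
      END REPEAT
      put empty into tDeleteChars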
    ELSE
      exit REPEAT
    END IF
    add tOffset1 + 4 to tSkip
  END REPEAT
  REPEAT for each item tProp in tAttributeFilter
    replace "<" & tProp & ">" with empty in pHTML
    replace "</" & tProp & ">" with empty in pHTML
  END REPEAT
  REPEAT for each item tProp in tInlineFilter
    REPEAT forever
      put offset("<" & tProp, pHTML) into tStart
      IF tStart > 0 THEN
        put offset(">", pHTML, tStart) into tEnd
        IF tEnd > 0 THEN
          delete char tStart to (tStart + tEnd) of pHTML
        ELSE
          exit REPEAT
        END IF
      ELSE
        exit REPEAT
      END IF
    END REPEAT
  END REPEAT
  IF "a" is among the items of tInlineFilter THEN
    replace "</a>" with empty in pHTML
  END IF
  --> REMOVE ANY LONELY <FONT> TAGS
  REPEAT forever
    put offset("<font>", pHTML) into tStart
    IF tStart > 0 THEN
      put offset("</font>", pHTML, tStart) into tEnd
      IF tEnd > 0 THEN
        delete char tStart + tEnd to tStart + tEnd + 6 of pHTML
        delete char tStart to tStart + 5 of pHTML
      ELSE
        exit REPEAT
      END IF
    ELSE
      exit REPEAT
    END IF
  END REPEAT
  --> REMOVE TRAILING RETURNS
  IF pStripTrailingCR THEN
    REPEAT forever
      IF char -8 to -1 of pHTML is cr & "<p></p>" THEN
        delete char -8 to -1 of pHTML
      ELSE
        exit REPEAT
      END IF
    END REPEAT
  END IF
  return pHTML
END str_stripHTML
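
A quick usage sketch for str_stripHTML. The field name "Notes" and the mouseUp handler are hypothetical; substitute whatever object holds the htmlText in your stack. The filter is a comma-delimited list, since the function walks it with "for each item".

```livecode
on mouseUp
  local tClean
  -- strip bold, italic, font color and links from a (hypothetical) field,
  -- and drop any trailing empty paragraphs
  put str_stripHTML(the htmlText of field "Notes", "b,i,color,a", true) \
      into tClean
  set the htmlText of field "Notes" to tClean
end mouseUp

-- A literal example, stripping only bold:
-- put str_stripHTML("<p><b>Hi</b> there</p>", "b", false)
-- --> <p>Hi there</p>
```

Tags that are not named in the filter are left alone, which is why the paragraph tags survive in the second example.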