What is the best/fastest way to extract strings of text?

Jim Ault jimaultwins at yahoo.com
Tue Aug 2 11:33:40 EDT 2011


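In HTML, the returns have no meaning beyond ordinary white space, so
the first step I would take is to
    replace cr with empty in textBlock
and make the text a single line...
but this may not be good, depending on the original textBlock: words
separated only by a return get run together, and returns inside
<pre> blocks are significant.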
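
A minimal sketch of that flattening step, assuming it is safer to swap
each line break for a space than to delete it. The handler name
flattenHtml and the parameter pTextBlock are made up for illustration,
and this does not try to protect <pre> blocks.

```livecode
function flattenHtml pTextBlock
   -- hypothetical helper: put the whole block on a single line
   -- handle Windows-style line endings first, then stray carriage returns
   replace crlf with space in pTextBlock
   replace numToChar(13) with space in pTextBlock
   -- cr is LiveCode's line delimiter (linefeed)
   replace cr with space in pTextBlock
   return pTextBlock
end function
```

Using space rather than empty costs some extra blanks in the result,
which HTML ignores anyway.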



On Aug 2, 2011, at 1:01 AM, Keith Clarke wrote:

> The recipe I (learned here and) use for extracting specific HTML /  
> XML elements is to get the specific target elements onto their own  
> line, remove the unwanted lines and then move the target strings  
> in the remaining lines out into a separate variable -  
> something like...
>
> 1. Get the target elements onto their own line by prefixing the  
> opening tag with return, using: replace "<#B>" with return & "<#B>"  
> in theSource
> 2. Get the closing tag onto its own line by adding a return before  
> and after it, using: replace "<#E>" with return & "<#E>" & return  
> in theSource
> 3. Remove the unwanted lines (those that lack the specific opening  
> tag), using: filter theSource with "<#B>*"
> 4. Delimit each line into items at the '>' character, using: set the  
> itemDelimiter to numToChar(62)
> 5. Iterate through the lines to extract the string, using:
> 	repeat for each line l in theSource
> 		put item 2 of l & return after theExtract
> 	end repeat
> 6. Clean up the extract of any extra returns, using: filter  
> theExtract without empty
>
> If my pre-coffee brain worked, theExtract should contain the tagged  
> strings from theSource.
>
>
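
The recipe above can be sketched as a single function. This is only a
sketch under some assumptions: the handler name extractTagged and its
parameter names are invented here, the opening tag is matched literally
(so tags carrying attributes, such as <a href="...">, will not be
found), the tag contains no filter wildcard characters (* ? [ ]), and
nested tags of the same kind are not handled. It also puts a return on
both sides of the closing tag so that the closing tag is not left
attached to the extracted string.

```livecode
function extractTagged pSource, pOpenTag, pCloseTag
   local tExtract
   -- flatten first, since returns carry no meaning in HTML
   replace cr with space in pSource
   -- each opening tag starts a new line
   replace pOpenTag with cr & pOpenTag in pSource
   -- the closing tag, and whatever follows it, move to lines of their own
   replace pCloseTag with cr & pCloseTag & cr in pSource
   -- keep only the lines that begin with the opening tag
   filter pSource with (pOpenTag & "*")
   -- everything after the first ">" on each line is the tagged string
   set the itemDelimiter to ">"
   repeat for each line tLine in pSource
      put item 2 to -1 of tLine & cr after tExtract
   end repeat
   -- drop the lines left empty by tags with no content
   filter tExtract without empty
   return tExtract
end function
```

For example:

```livecode
put extractTagged("<p>One <b>bold</b> and <b>brave</b></p>", "<b>", "</b>") into tResult
-- tResult should now hold "bold" and "brave", one per line
```

Taking item 2 to -1 rather than just item 2 keeps the string intact if
it happens to contain a ">" of its own.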

Jim Ault
Las Vegas





