HTML Tags and multiline regular expressions.

Jim Ault JimAultWins at yahoo.com
Wed Aug 9 22:11:27 EDT 2006


David, 
It really depends on what you want to end up with: every kind of tag, or
some other goal.
A six-line quick trick I would use is:

replace cr with empty in htmlBlock
replace "</" with (cr&"</") in htmlBlock  --optional depending on result
replace "<" with (cr&"<") in htmlBlock
replace ">" with (">"&cr) in htmlBlock

Now everything is isolated.   All of the tags are on their own line and the
text run is on a line in between.
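
If you want the isolating step as something reusable, here is a minimal
sketch. The handler name isolateTags and the parameter pHtml are made up for
illustration, and the final filter line is an addition (not part of the steps
above) that drops the empty lines left behind by back-to-back tags.

```livecode
-- Hypothetical helper: isolateTags and pHtml are illustrative names only.
-- Returns the HTML with each tag and each text run on its own line.
function isolateTags pHtml
   -- flatten the page to a single line first
   replace cr with empty in pHtml
   -- start a new line before every tag ...
   replace "<" with (cr & "<") in pHtml
   -- ... and another one after every tag
   replace ">" with (">" & cr) in pHtml
   -- addition: adjacent tags such as "<b><i>" leave empty lines; remove them
   filter pHtml without empty
   return pHtml
end isolateTags
```

Something like `put isolateTags(htmlBlock) into htmlBlock` should then leave
one tag or one text run per line. One thing to watch: removing every cr glues
together words that were split across lines in the source, so replacing cr
with space may suit some pages better.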

Your question specifically -----------------------------------
> OK - I want to extract an HTML tag and all its contents up to the closing
> tag.
Add the following two lines to the above:

filter htmlBlock without "</*"
replace (">"&cr) with ">" in htmlBlock

----you're done and should have  .....
<openingTag>Line of text you want to keep
<openingTag2>And the next line of text you have to have

Then you could be more specific:
put htmlBlock into imgTags
filter imgTags with "<img*"

put htmlBlock into fontTags
filter fontTags with "<font*"

etc.
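
The per-tag filtering can be wrapped up the same way. This is only a sketch:
the handler name linesForTag and its parameters are hypothetical, and it
assumes htmlBlock has already been prepared so that every line starts with
its opening tag.

```livecode
-- Hypothetical helper: linesForTag, pHtmlBlock and pTagName are illustrative.
-- Keeps only the lines that begin with the given opening tag.
function linesForTag pHtmlBlock, pTagName
   -- builds a wildcard pattern such as "<img*" or "<font*"
   filter pHtmlBlock with ("<" & pTagName & "*")
   return pHtmlBlock
end linesForTag
```

For example, `put linesForTag(htmlBlock, "img") into imgTags`. Because the
pattern is a plain prefix wildcard, a short name such as "i" would also keep
lines beginning with "<img", so pass as much of the tag as you need to tell
them apart.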

Hope this gives you a quick alternative that Rev should handle a little
faster than regEx, especially on a large page or one with many, many tags.
I really like regEx, and use it, but only when I need to.
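
For the times when regEx is the right tool, here is a rough sketch of a
multi-line match with matchText. It assumes the PCRE inline options are
honoured: (?s) lets the dot match line breaks and (?i) ignores case. The
handler name tagContents and its parameters are hypothetical, and it only
returns the first, non-nested occurrence of the tag.

```livecode
-- Hypothetical helper: tagContents, pHtml and pTagName are illustrative.
-- Returns the text between the first <tag ...> and the next </tag>.
function tagContents pHtml, pTagName
   local tPattern, tInner
   -- (?si): dot matches line breaks, match is case-insensitive
   -- (.*?) is non-greedy, so it stops at the first closing tag
   put "(?si)<" & pTagName & "\b[^>]*>(.*?)</" & pTagName & "\s*>" into tPattern
   if matchText(pHtml, tPattern, tInner) then
      return tInner
   end if
   -- no match found
   return empty
end tagContents
```

For example, `put tagContents(htmlBlock, "textarea") into tText`. Nested tags
of the same name will trip this up, which is one more reason to reach for the
replace and filter approach first.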

Jim Ault
Las Vegas


On 8/9/06 10:59 AM, "David Bovill" <david at openpartnership.net> wrote:

> OK - I want to extract an HTML tag and all its contents up to the closing
> tag.
> 
> I have so many bits of code floating around for doing this going back years
> and I would really like to do it properly and reliably once - so here is the
> question: how do I use a regular expression to do this?
> 
> First I have to admit that I can't remember how to do matchText / matchChunk
> stuff that covers more than one line - seems OK when everything is on one
> line... just a caveat :)
> 
> I took a look in some detail at this site:
> 
>      http://regexlib.com/DisplayPatterns.aspx?cattabindex=7&categoryId=8
> 
> but have not got very far. When I look at things like this:
> 
> 
>    - <[^>]*name[\s]*=[\s]*"?[^\w_]*"?[^>]*>
>    - <(\/{0,1})textarea(.*?)(\/{0,1})\>
>    - (?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)
> 
>    - <a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>
>    - \s(type|name|value)=(?:(\w+)|(?:"(.*?)")|(?:\'(.*)\'))
> 
> I go weak at the knees :) Anyone have any good regular expressions for this
> sort of html processing in rev?