HTML Tags and multiline regular expressions.
Jim Ault
JimAultWins at yahoo.com
Wed Aug 9 22:11:27 EDT 2006
David,
It really depends on what you want to end up with: whether you want to
find every kind of tag, or have some other goal.
A six-liner quick trick I would use is to:
replace cr with empty in htmlBlock
replace "</" with (cr&"</") in htmlBlock --optional depending on result
replace "<" with (cr&"<") in htmlBlock
replace ">" with (">"&cr) in htmlBlock
Now everything is isolated. Each tag is on its own line and each run of
text is on a line in between.
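As a rough sketch, the replaces above could be wrapped in a function so they can be reused on any page. The handler name isolateTags and the parameter name pHtml are made up for illustration, and the optional "</" replace is left out.

```livecode
-- Hypothetical helper: the name and parameter are illustrative only.
function isolateTags pHtml
   -- flatten the page so existing line breaks do not get in the way
   replace cr with empty in pHtml
   -- start a new line before every tag
   replace "<" with (cr & "<") in pHtml
   -- end the line after every tag
   replace ">" with (">" & cr) in pHtml
   return pHtml
end isolateTags
```

Calling isolateTags("<p>Hello<b>world</b></p>") should put <p>, Hello, <b>, world, </b> and </p> each on its own line, with an empty line wherever two tags were adjacent (here between </b> and </p>) and one at the very start.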
Your question specifically -----------------------------------
> OK - I want to extract an HTML tag and all its contents up to the closing
> tag.
Add the following two lines to the above:
filter htmlBlock without "</*"
replace (">"&cr) with ">" in htmlBlock
----you're done and should have .....
<openingTag>Line of text you want to keep
<openingTag2>And the next line of text you have to have
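Put together, the whole six-liner might look like the sketch below. The handler name openingTagsWithText is hypothetical, the optional "</" replace is skipped, and the last filter, which removes leftover empty lines, is an extra step that is not part of the lines above.

```livecode
-- Hypothetical helper: the name and parameter are illustrative only.
function openingTagsWithText pHtml
   -- isolate every tag on its own line
   replace cr with empty in pHtml
   replace "<" with (cr & "<") in pHtml
   replace ">" with (">" & cr) in pHtml
   -- throw away the closing tags
   filter pHtml without "</*"
   -- join each opening tag to the text that follows it
   replace (">" & cr) with ">" in pHtml
   -- extra step (an addition): drop the empty lines left behind
   filter pHtml without empty
   return pHtml
end openingTagsWithText
```

For example, openingTagsWithText("<p>Hello<b>world</b></p>") should return two lines, <p>Hello and <b>world. Note that text which follows a closing tag ends up on a line of its own, with no tag in front of it.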
Then you could be more specific:
put htmlBlock into imgTags
filter imgTags with "<img*"
put htmlBlock into fontTags
filter fontTags with "<font*"
etc.
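The same idea can be sketched as one small function that takes the tag name as a parameter. The names linesForTag, pBlock and pTagName are hypothetical, and pBlock is assumed to be htmlBlock after the steps above have been applied.

```livecode
-- Hypothetical helper: the names are illustrative only.
-- pBlock is assumed to hold one opening tag (plus its text) per line.
function linesForTag pBlock, pTagName
   -- keep only the lines that begin with "<" followed by the tag name
   filter pBlock with ("<" & pTagName & "*")
   return pBlock
end linesForTag
```

A call such as put linesForTag(htmlBlock, "img") into imgTags would then stand in for the put/filter pair. Because the pattern is a simple wildcard, a short name such as "b" would also keep lines starting with <br or <body, so a tighter pattern may be needed for those.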
Hope this gives you a quick alternative that Rev should run a little faster
than regEx, especially on a large page or one with many, many tags. I really
like regEx, and use it, but only when I need to.
Jim Ault
Las Vegas
On 8/9/06 10:59 AM, "David Bovill" <david at openpartnership.net> wrote:
> OK - I want to extract an HTML tag and all its contents up to the closing
> tag.
>
> I have so many bits of code floating around for doing this, going back years,
> and I would really like to do it properly and reliably once - so here is the
> question: how do I use a regular expression to do this?
>
> First I have to admit that I can't remember how to do matchText / matchChunk
> stuff that covers more than one line - it seems OK when everything is on one
> line... just a caveat :)
>
> I took a look in some detail at this site:
>
> http://regexlib.com/DisplayPatterns.aspx?cattabindex=7&categoryId=8
>
> but have not got very far. When I look at things like this:
>
>
> - <[^>]*name[\s]*=[\s]*"?[^\w_]*"?[^>]*>
> - <(\/{0,1})textarea(.*?)(\/{0,1})\>
> - (?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)
>
> - <a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>
> - \s(type|name|value)=(?:(\w+)|(?:"(.*?)")|(?:\'(.*)\'))
>
> I go weak at the knees :) Anyone have any good regular expressions for this
> sort of html processing in rev?