HTML Tags and multiline regular expressions

David Bovill david at openpartnership.net
Wed Aug 9 13:59:08 EDT 2006


OK - I want to extract an HTML tag and all its contents up to the closing
tag.

I have so many bits of code floating around for doing this, going back years,
and I would really like to do it properly and reliably once - so here is the
question: how do I use a regular expression to do this?

First I have to admit that I can't remember how to do matchText / matchChunk
stuff that covers more than one line - it seems OK when everything is on one
line... just a caveat :)
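
A common reason a pattern works on one line but not across several is that "." does not match a newline by default. Here is a minimal sketch of that behaviour in Python rather than Rev; the sample text and variable names are made up for illustration. Rev's matchText is, as far as I know, built on PCRE, so the same inline (?s) flag should apply there, but treat that as an assumption to verify.

```python
import re

# Hypothetical sample: a tag whose contents span two lines.
sample = "<p>first line\nsecond line</p>"

# By default "." stops at a newline, so this search finds nothing.
single_line = re.search(r"<p>(.*?)</p>", sample)

# With the inline (?s) flag, "." also matches newlines.
multi_line = re.search(r"(?s)<p>(.*?)</p>", sample)
```

Here single_line ends up as None, while multi_line captures both lines of content.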

I took a look in some detail at this site:

     http://regexlib.com/DisplayPatterns.aspx?cattabindex=7&categoryId=8

but have not got very far. When I look at things like this:


   - <[^>]*name[\s]*=[\s]*"?[^\w_]*"?[^>]*>
   - <(\/{0,1})textarea(.*?)(\/{0,1})\>
   - (?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>)
   - <a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>
   - \s(type|name|value)=(?:(\w+)|(?:"(.*?)")|(?:\'(.*)\'))

I go weak at the knees :) Does anyone have any good regular expressions for
this sort of HTML processing in rev?
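
For reference, one way the "tag plus everything up to its closing tag" extraction could look is sketched below in Python, not Rev. The helper name extract_tag and the sample page are hypothetical, and the pattern is only an illustration: it is case-insensitive, lets "." span newlines, and stops at the first closing tag.

```python
import re

def extract_tag(html, tag):
    """Return the first <tag ...>...</tag> block, or None (hypothetical helper)."""
    name = re.escape(tag)
    # (?i) ignores case; (?s) lets "." cross line breaks.
    # \b keeps "a" from also matching "abbr"; .*? stops at the first closing tag.
    pattern = r"(?is)<%s\b[^>]*>.*?</%s\s*>" % (name, name)
    match = re.search(pattern, html)
    return match.group(0) if match else None

# Hypothetical sample page with a tag spread over several lines.
page = '<html>\n<div class="a">\nsome\ntext\n</div>\n<p>after</p></html>'
block = extract_tag(page, "div")
```

Because the match is non-greedy, a tag nested inside another tag of the same name would be cut off at the inner closing tag. A single regular expression cannot balance nested tags in general, so this sketch only suits markup without that kind of nesting.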


