using the SHELL function to GREP a body of text

jameshale james at thehales.id.au
Fri Mar 18 12:23:33 EDT 2016


Peter TB Brett wrote
> Unfortunately XHTML and HTML are not regular languages, which means that 
> they cannot be processed correctly with regular expressions.
> 
> Indeed, "Implement an HTML parser using regular expressions" is a 
> well-known prank project to suggest for inexperienced developers to 
> waste their time on...
> 
> So, your approach is sadly not workable.
> 
> If you're processing XHTML, I recommend using revXML.
> 
> If you need to process arbitrary HTML, then unfortunately the only 
> sensible option is to use a browser...

Bummer.
Not only are XHTML and HTML not regular languages, but their use in ePubs is
even more irregular (if that is possible).
I have some texts which include both forms: 
Others where every tag ('h', 'p', etc.) has an id attribute.

A browser is not an option, as I will need to use LC's chunking and text
selection features.

I am using the htmlText of a field, and given that the htmlText property
ignores most of what I was trying to remove, it probably doesn't matter in
the end. Just a bit untidy.
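
For anyone who does want to tidy the markup first, here is a minimal sketch of the parser-based route Peter describes, as opposed to grep. It is written in Python rather than LiveCode, and it assumes the goal is to drop the id attributes and that the ePub content is well-formed XHTML. The function name strip_id_attributes and the sample string are made up for illustration. Real ePub files declare an XHTML namespace, which this sample leaves out, so treat it as a starting point only.

```python
import xml.etree.ElementTree as ET


def strip_id_attributes(xhtml: str) -> str:
    """Remove every id attribute from a well-formed XHTML fragment.

    Hypothetical helper: it relies on an XML parser, so it will raise
    ET.ParseError on tag-soup HTML instead of guessing.
    """
    root = ET.fromstring(xhtml)
    # iter() walks the root element and all of its descendants.
    for element in root.iter():
        # pop() with a default does nothing when there is no id.
        element.attrib.pop("id", None)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


# Illustrative input only, not taken from a real ePub.
sample = '<div id="ch1"><h1 id="t1">Title</h1><p id="p1">Some <i>text</i>.</p></div>'
cleaned = strip_id_attributes(sample)
print(cleaned)
```

Because the parser tracks nesting itself, the tags can be nested to any depth, which is exactly the case a regular expression cannot handle.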

Thanks Peter.



--
View this message in context: http://runtime-revolution.278305.n4.nabble.com/using-the-SHELL-function-to-GREP-a-body-of-text-tp4702346p4702348.html
Sent from the Revolution - User mailing list archive at Nabble.com.



