using the SHELL function to GREP a body of text
jameshale
james at thehales.id.au
Fri Mar 18 12:23:33 EDT 2016
Peter TB Brett wrote:
> Unfortunately XHTML and HTML are not regular languages, which means that
> they cannot be processed correctly with regular expressions.
>
> Indeed, "Implement an HTML parser using regular expressions" is a
> well-known prank project to suggest for inexperienced developers to
> waste their time on...
>
> So, your approach is sadly not workable.
>
> If you're processing XHTML, I recommend using revXML.
>
> If you need to process arbitrary HTML, then unfortunately the only
> sensible option is to use a browser...
Bummer.
Not only are XHTML and HTML not regular languages, but their use in ePubs is
even more irregular (if that is possible).
I have some texts which include both forms:
There are others where every tag ('h', 'p', etc.) has an id attribute.
A browser is not an option, as I will need to use LiveCode's chunking and text
selection features.
I am using the htmlText of a field, and given that the htmlText property ignores
most of what I was trying to remove, it probably doesn't matter in the end.
Just a bit untidy.
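For the texts where every tag carries an id attribute, the tidy-up could be done with a tokenizer instead of a regular expression, in line with Peter's point. What follows is only a rough sketch, and it makes several assumptions: it is written in Python rather than LiveCode, it relies on Python's standard html.parser module, and the names IdStripper and strip_ids are made up for illustration. It drops comments and doctype declarations and lowercases tag names, so it is not a faithful round trip of arbitrary ePub markup.

```python
from html import escape
from html.parser import HTMLParser


class IdStripper(HTMLParser):
    """Re-emit markup, leaving out every id attribute (hypothetical helper)."""

    def __init__(self):
        # Keep character references out of handle_data so they can be
        # written back exactly as they appeared in the text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, close=""):
        kept = []
        for name, value in attrs:
            if name == "id":
                continue  # the one attribute we want to discard
            if value is None:
                kept.append(" " + name)  # bare attribute with no value
            else:
                # The parser hands back unescaped values, so re-escape them.
                kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s%s>" % (tag, "".join(kept), close))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def strip_ids(markup):
    """Return markup with all id attributes removed."""
    parser = IdStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<h1 id="c1">Title</h1><p id="p1" class="x">a &amp; b</p>'
    print(strip_ids(sample))
```

Because the attributes arrive already split into name and value pairs, nothing here depends on quoting style or attribute order, which is where a GREP pattern tends to go wrong.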
Thanks Peter.
--
View this message in context: http://runtime-revolution.278305.n4.nabble.com/using-the-SHELL-function-to-GREP-a-body-of-text-tp4702346p4702348.html
Sent from the Revolution - User mailing list archive at Nabble.com.