using the SHELL function to GREP a body of text

Peter TB Brett peter.brett at livecode.com
Fri Mar 18 11:57:50 EDT 2016


On 18/03/2016 15:20, jameshale wrote:
> I have large bodies of xhtml/html text stored in an array which I would like
> to clean up using GREP.
> I have been using the 'replacetext' function to great effect but I have hit
> an impasse.
>
> There are some situations where I think I really need to use back references
> but LC's GREP does not allow them.

Hi James,

Unfortunately XHTML and HTML are not regular languages: elements can nest
to arbitrary depth, so regular expressions cannot process them correctly
in the general case.

Indeed, "Implement an HTML parser using regular expressions" is a
well-known prank project to suggest to inexperienced developers so that
they waste their time on it...

So, your approach is sadly not workable.
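
To make the nesting problem concrete, here is a minimal sketch. It is
written in Python rather than LiveCode purely for illustration, and the
sample strings and variable names are made up. Neither a non-greedy nor a
greedy pattern pairs opening and closing tags correctly:

```python
import re

# Hypothetical sample: a <div> nested inside another <div>.
nested = "<div>outer <div>inner</div> tail</div>"

# A non-greedy pattern stops at the first closing tag it meets, so it
# pairs the OUTER opening tag with the INNER closing tag.
non_greedy = re.search(r"<div>(.*?)</div>", nested).group(1)

# Hypothetical sample: two sibling elements, no nesting at all.
siblings = "<div>a</div><div>b</div>"

# A greedy pattern runs on to the last closing tag, so it swallows
# both siblings as if they were a single element.
greedy = re.search(r"<div>(.*)</div>", siblings).group(1)

print(repr(non_greedy))  # 'outer <div>inner'
print(repr(greedy))      # 'a</div><div>b'
```

Matching the pairs properly means counting how deep the nesting is, and
that is what a parser does.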

If you're processing XHTML, I recommend using revXML.
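
As a rough sketch of what the parser-based approach looks like, the
following uses Python's standard xml.etree.ElementTree as a stand-in for
revXML (it is not revXML's API, and the sample markup and the clean-up
rule are assumptions chosen for illustration). The document is parsed
into a tree, then cleaned by walking that tree rather than by matching
text:

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; it must be well-formed XML to parse.
xhtml = (
    '<div class="a">'
    '<p style="color:red">Hello <b>world</b></p>'
    '<p>Bye</p>'
    '</div>'
)

root = ET.fromstring(xhtml)

# Example clean-up: drop every style attribute at any nesting depth.
for element in root.iter():
    element.attrib.pop("style", None)

# Serialise the cleaned tree back to text.
cleaned = ET.tostring(root, encoding="unicode")

# Collect the plain text of each paragraph, including nested elements.
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]

print(cleaned)
print(paragraphs)
```

Because the parser tracks nesting itself, the same loop keeps working
however deeply the elements are nested.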

If you need to process arbitrary HTML, then unfortunately the only 
sensible option is to use a browser...

                                Peter

-- 
Dr Peter Brett <peter.brett at livecode.com>
LiveCode Open Source Team

LiveCode 2016 Conference: https://livecode.com/edinburgh-2016/



