Help Wrapping HTMLTidy in LCB
lists at mangomultimedia.com
Fri Nov 22 17:56:34 EST 2019
On Fri, Nov 22, 2019 at 2:25 PM Richard Gaskin via use-livecode <
use-livecode at lists.runrev.com> wrote:
> Trevor DeVore wrote:
> > While looking at solutions for converting HTML into XHTML that can be
> > parsed by revXML I decided to test HTMLTidy which has an option to
> > output the input as XHTML. While I could bundle up the tidy command
> > line tool and include it with my app, I prefer to wrap things up in
> > LCB if possible.
> Is conversion to XHTML the way to go?
> I've tried using the XML external to parse even RSS files -- ostensibly
> pure XML -- only to find it choke on some of them. I've gone back to
> hand-crafted pull-parsers.
There are definitely other ways to approach the problem I'm trying to
solve. In fact, in other areas of my app I extract parts of HTML
without relying on revXML.
In this particular case I already have some LC code that parses HTML placed
on the clipboard and converts it into a data structure used by the
application. This was originally implemented using the revXML callback
feature (no tree is created in memory) and that API has worked well for the
conversions I need to make. HTML may be placed on the clipboard when
copying text and images from web browsers or by our good friend Microsoft
Word. Microsoft Word places some very "interesting" HTML on the clipboard
that needs to be massaged quite a bit before running it through revXML.
Some of the regex patterns I run on the Word HTML, which strip out markup
and do things such as add quotes around attributes, come with a noticeable
speed hit.
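
To make the "massage, then parse with callbacks" flow concrete, here is a minimal sketch in Python rather than LiveCode. Everything in it is an assumption for illustration: `quote_attributes` and `EventCollector` are hypothetical names, the regex is not the one used in the app, and Python's `xml.sax` only stands in for the revXML callback feature (it also reports elements through callbacks without building a tree). The regex is deliberately naive and would misfire on tags that contain `>` inside a quoted value or on unquoted values that end in `/`.

```python
import re
import xml.sax

# A tag is approximated as "<...>" with no nested angle brackets (assumption).
TAG = re.compile(r"<[^<>]+>")
# name=value, where value is double-quoted, single-quoted, or bare.
ATTR = re.compile(r"([\w:-]+)\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s\"'>]+)")


def _quote(match):
    name, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return match.group(0)  # already quoted, leave untouched
    return '%s="%s"' % (name, value)


def quote_attributes(html):
    """Add double quotes around bare attribute values inside tags."""
    return TAG.sub(lambda tag: ATTR.sub(_quote, tag.group(0)), html)


class EventCollector(xml.sax.ContentHandler):
    """Records parser callbacks; no document tree is kept in memory."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def parse_events(markup):
    handler = EventCollector()
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return handler.events


cleaned = quote_attributes('<p class=MsoNormal align="left">Hi</p>')
events = parse_events(cleaned)
```

Without the quoting pass the parser rejects the input outright, which is why the clean-up has to happen first. Each extra regex pass over a large clipboard payload adds to the cost, which is the speed hit mentioned above.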
Given the code that I have in place already, I would prefer to leverage
HTMLTidy rather than fix every potential "gotcha" or spend time trying to
optimize the code. I'm betting that HTMLTidy can do it better and faster
given how mature it is.