How to extract an entire element from an HTML file?

Tom Glod tom at makeshyft.com
Mon Nov 26 14:46:08 EST 2018


I've been thinking about a simple HTML parser as well, to extract email
addresses or URLs from a page.

Tools that might help:

1. Regular expressions
2. The itemDelimiter and chunk expressions (set the itemDelimiter to the tag
you are trying to extract)
3. The replace command
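
To make the regular-expression idea concrete, here is a minimal sketch. It is written in Python rather than LiveCode purely for illustration; the patterns and the function names (extract_emails, extract_urls) are my own assumptions, deliberately simple rather than standards-complete, and the same patterns should carry over to whatever regex facility you use.

```python
import re

# Deliberately simple patterns (assumptions for illustration): good enough
# for a first pass over a page, not a complete grammar for emails or URLs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_emails(html):
    """Return the unique email-like strings in html, in order of appearance."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(EMAIL_RE.findall(html)))


def extract_urls(html):
    """Return the unique http(s) URLs in html, in order of appearance."""
    return list(dict.fromkeys(URL_RE.findall(html)))


if __name__ == "__main__":
    page = (
        '<p>Mail <a href="mailto:jo@example.com">jo@example.com</a> or see '
        '<a href="https://example.com/docs">the docs</a>.</p>'
    )
    print(extract_emails(page))
    print(extract_urls(page))
```

This works on the raw text of the page, so it does not care whether the HTML is well formed, but it also cannot tell you which element a match came from.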

Good luck.


On Mon, Nov 26, 2018 at 10:18 AM Keith Clarke via use-livecode <
use-livecode at lists.runrev.com> wrote:

> Thanks for the warning and the link to the parsers, Trevor.
>
> I get the point regarding unclean HTML - as I won’t be in control of the
> source. Following a cursory glance through the dictionary, I’m also a tad
> concerned about the variability in HTML tag content, e.g.
>
> <div class="red box">content & elements</div>
> vs.
> <div class="box red">content & elements</div>
>
> ...and hence how much wrangling might be needed to identify all the nodes
> in the tree with a specific class, where jQuery’s $j('.red').html();
> saves a lot of the heavy lifting involved.
>
> I’ll have a look at those parsers, too - though I doubt my coding chops
> are up to creating a library wrapper - indeed, I’ll have to Google what one
> is! :-)
> Best,
> Keith
>
> > On 26 Nov 2018, at 13:42, Trevor DeVore via use-livecode <
> > use-livecode at lists.runrev.com> wrote:
> >
> > On Mon, Nov 26, 2018 at 3:30 AM Keith Clarke via use-livecode <
> > use-livecode at lists.runrev.com> wrote:
> >
> >> Thanks for the steer, Paul - I’ve not worked with XML in LiveCode so
> >> hadn’t made the connection between the HTML markup structure & XML.
> >
> >
> > Keith,
> >
> > I’ve used revXML for parsing HTML in somewhat controlled conditions.
> > While revXML can work for HTML, your results will vary based on how well
> > structured the HTML is. If there are tags that are not closed or are out
> > of balance then revXML won’t give you the results you expect. If you are
> > generating the HTML then it shouldn’t be a problem. If it is third party
> > HTML then you may have to massage the HTML input to get it to work.
> >
> > It would be great if there were a library wrapper around one of the
> > dedicated HTML parsers listed on this page:
> >
> > https://en.m.wikipedia.org/wiki/Comparison_of_HTML_parsers
> >
> > --
> > Trevor DeVore
> > ScreenSteps
> > _______________________________________________
> > use-livecode mailing list
> > use-livecode at lists.runrev.com
> > Please visit this url to subscribe, unsubscribe and manage your
> > subscription preferences:
> > http://lists.runrev.com/mailman/listinfo/use-livecode
>
>


