How to structure HTML text (tags and attributes) for processing in LiveCode?
keith.clarke at clarkeandclarke.co.uk
Sun Jun 12 05:56:48 CDT 2011
Thanks for the steer, Stephen. I have Remo but hadn't discovered Jerry's tutorials before; there's much to study there.
The screen-scraping lessons start from the premise that the HTML source is already reasonably structured into lines (for filtering, etc.), so they don't help with my real challenge: getting the page source into a state where I can apply those techniques.
However, they did prompt me to experiment with the replace command (replace ">" with ">" & return in tHTML) to soft-wrap the HTML so that each tag ends a line.
So, I think I'm on the right track now.
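A minimal LiveCode sketch of that soft-wrapping idea follows. The function name tagsAsLines is hypothetical (it is not part of LiveCode or libURL), and it assumes no attribute value, comment or script block contains a literal "<" or ">". It breaks the text before every "<" and after every ">", so each tag lands on its own line and any text between tags gets a line of its own.

```livecode
-- Hypothetical helper (not part of LiveCode or libURL): returns the
-- HTML with every tag on a line of its own. Assumes no attribute
-- value, comment or script contains a literal "<" or ">".
function tagsAsLines pHTML
   local tResult, tLine
   -- remove the page's own line breaks so only our breaks remain
   replace numToChar(13) with empty in pHTML
   replace return with space in pHTML
   -- break the text before every "<" and after every ">"
   replace "<" with return & "<" in pHTML
   replace ">" with ">" & return in pHTML
   -- keep only lines that contain something other than whitespace
   repeat for each line tLine in pHTML
      if word 1 of tLine is not empty then
         put tLine & return after tResult
      end if
   end repeat
   -- drop the trailing return added by the loop
   if tResult is not empty then delete the last char of tResult
   return tResult
end tagsAsLines
```

For example, passing <p class="x">hi</p> through tagsAsLines gives three lines: the opening tag, hi, and the closing tag. Breaking before "<" as well as after ">" is what keeps text runs off the tag lines.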
On 12 Jun 2011, at 10:57, stephen barncard wrote:
> Jerry Daniels has an excellent series on screen scraping. Several video
> On 12 June 2011 02:27, Keith Clarke <keith.clarke at clarkeandclarke.co.uk> wrote:
>> Hi folks,
>> Heavy local broadband traffic on a rainy Saturday night stopped me from
>> seeing the whole of Colin Holgate's fascinating LiveCode Live presentation
>> on working with web page source HTML, so I can't wait for the recording!
>> Meanwhile, I'm trying to extract various HTML tags and specific attributes
>> from a page's source code, e.g. pulling out "this" where the markup reads
>> <tag stuff="this">.
>> I'm trying to set things up so that I can iterate through the text using
>> something like 'repeat for each tag' and, within that loop, 'repeat for
>> each attribute'. The question is how to get the source HTML text
>> structured and delimited so that 'HTML tag = line' and 'HTML tag attribute =
>> item'.
>> Given there are no obvious single-character itemDelimiters in HTML, and
>> given how inefficient it would be to build up an algorithm from scratch with
>> chunk functions, are there any specialised resources, techniques or tricks
>> available? Or maybe I missed something in the libURL feature set?
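LiveCode has no tag chunk type, so the nested 'repeat for each tag' / 'repeat for each attribute' loop from the question can only be approximated. The sketch below lets lines stand in for tags, assuming the source has already been wrapped one tag per line (for instance with the replace approach above) and that attribute values are enclosed in double quotes. The function tagAttributes and its tab-delimited output are illustrative assumptions, not an existing library API. Instead of a single-character itemDelimiter, it walks each tag with offset(), taking the last word before each =" as the attribute name and the text up to the next quote as its value. Single-quoted or unquoted values, comments and script blocks are not handled.

```livecode
-- Hypothetical helper: for each opening tag in pTagLines (one tag per
-- line), returns one "tagName<tab>attrName<tab>attrValue" line per
-- attribute. Assumes attribute values are wrapped in double quotes;
-- single-quoted and unquoted values are not extracted.
function tagAttributes pTagLines
   local tResult, tLine, tInner, tTag, tRest, tEq, tPrefix, tName, tClose, tValue
   repeat for each line tLine in pTagLines   -- "repeat for each tag"
      -- skip text lines and closing tags such as </p>
      if (char 1 of tLine is not "<") or (char 2 of tLine is "/") then next repeat
      put char 2 to -2 of tLine into tInner   -- strip the < and >
      put word 1 of tInner into tTag
      put char (length(tTag) + 1) to -1 of tInner into tRest
      repeat   -- "repeat for each attribute"
         put offset("=" & quote, tRest) into tEq
         if tEq is 0 then exit repeat
         -- the attribute name is the last word before ="
         put char 1 to (tEq - 1) of tRest into tPrefix
         put the last word of tPrefix into tName
         delete char 1 to (tEq + 1) of tRest
         -- the value runs up to the next double quote
         put offset(quote, tRest) into tClose
         if tClose is 0 then exit repeat
         put char 1 to (tClose - 1) of tRest into tValue
         delete char 1 to tClose of tRest
         put tTag & tab & tName & tab & tValue & return after tResult
      end repeat
   end repeat
   if tResult is not empty then delete the last char of tResult
   return tResult
end tagAttributes
```

Because each attribute comes back as one tab-delimited line, the result can then be walked with an ordinary repeat for each line loop after set the itemDelimiter to tab, so that item 1 is the tag name, item 2 the attribute name and item 3 its value.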
More information about the use-livecode