How to get the text of web framed pages?
Eric Chatonet
eric.chatonet at sosmartsoftware.com
Tue May 24 16:50:14 EDT 2005
Hi Stephen,
Thanks for your input.
At the moment, I have two efficient functions that take different
approaches to extracting all URLs from the text of a framed page.
The first, from Ken Ray, uses a regex with matchText; the other, which
I wrote, uses items with quote as the item delimiter.
Ken's solution is a bit slow but more reliable than mine, which is much
faster but a little bit silly ;-)
I will keep digging and will share the solutions on this list once
they are solid enough.
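
For illustration only, the two approaches described above can be sketched in Python. These are not the functions mentioned in this message (those are not shown here); the names urls_by_regex and urls_by_quote_split are hypothetical, and the sketch assumes the URLs of interest sit in quoted src or href attributes.

```python
import re

# Hypothetical pattern: a src or href attribute whose value is wrapped in
# matching single or double quotes.
URL_ATTR = re.compile(r'''\b(?:src|href)\s*=\s*(["'])(.*?)\1''',
                      re.IGNORECASE | re.DOTALL)


def urls_by_regex(html):
    """Regex approach: return quoted src/href values in document order."""
    return [m.group(2) for m in URL_ATTR.finditer(html)]


def urls_by_quote_split(html):
    """Delimiter approach: split on double quotes and keep each chunk
    that directly follows 'src=' or 'href='.

    Simpler than the regex version, but it assumes balanced double quotes
    and no space before the equals sign, and it ignores single-quoted values.
    """
    chunks = html.split('"')
    urls = []
    # With balanced quotes, odd-indexed chunks are the quoted values.
    for i in range(1, len(chunks), 2):
        before = chunks[i - 1].rstrip().lower()
        if before.endswith(("src=", "href=")):
            urls.append(chunks[i])
    return urls
```

The trade-off in the sketch is one of robustness: the split version does less work per character but relies on tidy markup, while the regex version tolerates both quote styles and extra whitespace.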
On 24 May 2005, at 20:02, Stephen Barncard wrote:
> I would look for the word frameset in a tag inside a page, then get
> all the valid URLs inside the frame. Then I would check each URL
> for size, and pick the largest file, or the number of lines. That
> will be where the main content is.
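
The heuristic quoted above could be sketched as follows, in Python for illustration. The function names, the regex, and the use of character count as the "size" are assumptions of this sketch, not anything given in the thread; the get_text parameter is a hypothetical hook so the logic can be tried without network access.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical pattern: the quoted src attribute of a <frame> tag.
# The \b after "frame" keeps it from matching "<frameset".
FRAME_SRC = re.compile(r'''<frame\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1''',
                       re.IGNORECASE | re.DOTALL)


def fetch(url):
    """Download a page and return its text (assumes UTF-8, lossy on errors)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main_frame_url(page_url, get_text=fetch):
    """Return the URL of the largest frame, or page_url if the page
    has no frameset. Returns None if a frameset has no readable frames."""
    html = get_text(page_url)
    if "<frameset" not in html.lower():
        return page_url  # not a framed page
    best_url, best_size = None, -1
    for m in FRAME_SRC.finditer(html):
        url = urljoin(page_url, m.group(2))  # resolve relative frame URLs
        try:
            size = len(get_text(url))  # size measured in characters
        except OSError:
            continue  # skip frames that cannot be fetched
        if size > best_size:
            best_url, best_size = url, size
    return best_url
```

Counting lines instead of characters would only change the size expression, for example to len(get_text(url).splitlines()).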
Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
So Smart Software
For institutions, companies and associations
Built-to-order applications: management, multimedia, internet, etc.
Windows, Mac OS and Linux... With the French touch
Plugins, tutorials and more on our website
----------------------------------------------------------------
Web site http://www.sosmartsoftware.com/
Email eric.chatonet at sosmartsoftware.com
Phone 33 (0)1 43 31 77 62
Mobile 33 (0)6 20 74 50 86
----------------------------------------------------------------