How to get the text of web framed pages?

Eric Chatonet eric.chatonet at sosmartsoftware.com
Tue May 24 16:50:14 EDT 2005


Hi Stephen,

Thanks for your input.
At the moment, I have two efficient functions that take different
approaches to extracting all URLs from the text of a framed page:
the first, from Ken Ray, uses a regex with matchText; the other, which I
wrote, uses items with quote as the item delimiter.
Ken's solution is a bit slow but more reliable than mine, which is much
faster but a little bit silly ;-)
I'll keep digging and will share the solutions on this list once they
are solid enough.
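
To make the contrast between the two approaches concrete, here is a rough sketch in Python rather than Transcript. It is not Ken's function or mine: the function names and the sample page are invented for illustration, and it assumes URLs sit in double-quoted src or href attributes.

```python
import re

# Hypothetical sample of a framed page; not taken from the thread.
SAMPLE = '''<html><frameset cols="20%,80%">
<frame src="menu.html" name="menu">
<frame src="http://www.example.com/main.html" name="main">
</frameset></html>'''


def urls_by_regex(html):
    """Regex approach: capture the double-quoted value of every
    src or href attribute, tolerating spaces around the '='."""
    pattern = re.compile(r'(?:src|href)\s*=\s*"([^"]*)"', re.IGNORECASE)
    return pattern.findall(html)


def urls_by_quote_items(html):
    """Item approach: split the text on the quote character and keep
    the item that follows any chunk ending in src= or href=.
    Cheap, but it misses forms such as src = "..." or single quotes."""
    items = html.split('"')
    urls = []
    for i in range(len(items) - 1):
        if items[i].lower().endswith(("src=", "href=")):
            urls.append(items[i + 1])
    return urls
```

On a tidy page like SAMPLE both functions return the same list; they diverge as soon as the markup gets sloppy, which is where the regex version earns its extra cost.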

On 24 May 05 at 20:02, Stephen Barncard wrote:

> I would look for the word frameset in a tag inside a page, then get
> all the valid URLs inside the frame. Then I would check each URL
> for size (or number of lines) and pick the largest file. That
> will be where the main content is.
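
The heuristic quoted above could be sketched roughly as follows, again in Python rather than Transcript. This is only one possible reading of it: frame_urls, main_frame and the fetch parameter are hypothetical names, size is measured as the length of the decoded text, and pages are assumed to decode as UTF-8.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen


def frame_urls(html, base_url):
    """Return the absolute URLs of all <frame> tags, or an empty
    list when the page contains no <frameset> tag at all."""
    if not re.search(r'<frameset\b', html, re.IGNORECASE):
        return []
    srcs = re.findall(
        r'<frame\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\']',
        html,
        re.IGNORECASE,
    )
    # Resolve relative frame sources against the framing page's URL.
    return [urljoin(base_url, src) for src in srcs]


def fetch(url):
    """Download a page and return its text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def main_frame(html, base_url, fetch=fetch):
    """Guess which frame holds the main content by picking the URL
    whose fetched text is the longest; None if there are no frames."""
    candidates = frame_urls(html, base_url)
    if not candidates:
        return None
    return max(candidates, key=lambda url: len(fetch(url)))
```

Passing the fetch function as a parameter keeps the selection logic separate from the network access, so the heuristic can be tried on saved pages before pointing it at live sites.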

Best regards from Paris,

Eric Chatonet.
----------------------------------------------------------------
So Smart Software

For institutions, companies and associations
Built-to-order applications: management, multimedia, internet, etc.
Windows, Mac OS and Linux... With the French touch

Plugins, tutorials and more on our website
----------------------------------------------------------------
Web site        http://www.sosmartsoftware.com/
Email        eric.chatonet at sosmartsoftware.com
Phone        33 (0)1 43 31 77 62
Mobile        33 (0)6 20 74 50 86
----------------------------------------------------------------



More information about the use-livecode mailing list