How to get the text of web framed pages?
Ken Ray
kray at sonsothunder.com
Tue May 24 09:34:31 EDT 2005
On 5/24/05 7:10 AM, "Eric Chatonet" <eric.chatonet at sosmartsoftware.com>
wrote:
> You are right that some frames can be downloaded by parsing the
> <frame src tag.
> But you can't be sure of getting the whole contents :-(
> I'll give you an example.
> The following url: http://www.major-k.de/revstart.html (BTW great
> stuff :-) will give you:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> <html>
> <head>
> <meta http-equiv="content-type" content="text/html;charset=ISO-8859-1">
> <meta name="generator" content="Adobe GoLive 6">
> <title>Willkommen bei major-k</title>
> </head>
> <frameset cols="160,*">
> <frame name="menu" noresize scrolling="no" src="menu1.html">
> <frame name="main" noresize src="xtalk.html">
> <noframes>
> <body bgcolor="#ffffff">
> <p></p>
> </body>
> </noframes>
> </frameset>
> </html>
>
> How can we get from this the actual text contents of this page
> on our friend's website?
Eric, you just check the "src" attribute of each frame. If it doesn't
start with "http://", it is a relative path, and you can substitute the
value of that "src" attribute for the last "/"-delimited item of the
original path. That is, if your original path
is:
http://www.major-k.de/revstart.html
You read that page and see if it has a <frameset> tag. If it does, you look
at each <frame> tag and identify its source:
<frame name="menu" noresize scrolling="no" src="menu1.html">
In the above line, that's "menu1.html". You now put that in place of the
last "/"-delimited item of your original path to get:
http://www.major-k.de/menu1.html
And then you read *that* page to get its contents.
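The steps above can be sketched in Python as follows. This is only an illustration (the thread itself is about LiveCode), and the names `FrameSrcCollector`, `resolve_frame_src` and `frame_urls` are hypothetical. It parses a page with the standard-library `html.parser`, checks for a `<frameset>`, and applies the substitution rule to each frame's `src`. Treating `https://` as absolute is an added assumption for modern sites, and fetching the pages (e.g. with `urllib.request.urlopen`) is left out so the sketch runs offline.

```python
from html.parser import HTMLParser


class FrameSrcCollector(HTMLParser):
    """Hypothetical helper: records whether a page has a <frameset>
    and collects the src attribute of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.has_frameset = False
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "frameset":
            self.has_frameset = True
        elif tag == "frame":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def resolve_frame_src(page_url, src):
    """Substitution rule: an src that is not an absolute http:// URL
    replaces the last "/"-delimited item of the page URL.
    (Also accepting https:// is an assumption, not part of the rule.)"""
    if src.startswith(("http://", "https://")):
        return src
    base, _, _last_item = page_url.rpartition("/")
    return base + "/" + src


def frame_urls(page_url, html):
    """Return the resolved URL of every frame, or [] if not a frameset page."""
    collector = FrameSrcCollector()
    collector.feed(html)
    collector.close()
    if not collector.has_frameset:
        return []
    return [resolve_frame_src(page_url, src) for src in collector.sources]


SAMPLE = """<frameset cols="160,*">
<frame name="menu" noresize scrolling="no" src="menu1.html">
<frame name="main" noresize src="xtalk.html">
</frameset>"""

print(frame_urls("http://www.major-k.de/revstart.html", SAMPLE))
# ['http://www.major-k.de/menu1.html', 'http://www.major-k.de/xtalk.html']
```

The plain string substitution only handles simple relative names like "menu1.html"; a src such as "../x.html" or "/x.html" would need `urllib.parse.urljoin`, which applies the full relative-URL resolution rules.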
HTH,
Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: kray at sonsothunder.com