Help with web page download
dcragg at lacscentre.co.uk
Mon Jul 15 04:28:00 EDT 2002
At 10:36 am -0700 13/7/02, Bruce Wilson wrote:
>The following script has worked on about 80% of web sites to
>download pages. However, some sites require me to post info, want
>to set cookies, or need me to be a big browser, etc., which goes
>beyond my scripting abilities. Any help would be appreciated. Bruce Wilson
> put word 1 of fld "symbol" into sym --Stock symbol
> put char 1 of sym into b
> put "http://user:firstname.lastname@example.org/n/"&b&"/"&sym&".html" into myurl
> put url myurl into dataVar
> put HTMLtoTXT (dataVar) into dataVar --HTMLtoTXT is external Func.
> put cleanUpTXT (dataVar) into fld "data" --clean up & put into fld
I can't offer much help. I think you'll have to deal with those 20%
of sites on a case-by-case basis.
For sites that want you to post data, you'll need to know the format
of the data to be posted. Examining a web page that posts to the site
should help you find out what is required. If a form is used to post
data from a web page, the pattern of the data is typically of the
form:
field_1=value_1&field_2=value_2
where field_1, etc. is the name of the form field.
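As a rough sketch of posting form data, something like the following might work. The URL and the field names "symbol" and "range" are made up for illustration; take the real ones from the form's HTML. It assumes the form expects ordinary URL-encoded name=value pairs.

```livecode
on mouseUp
  -- Hypothetical URL and form field names; replace with the site's own.
  put "http://www.example.com/cgi-bin/quote" into tUrl
  -- Build the name=value pairs, URL-encoding each value.
  put "symbol=" & urlEncode(word 1 of fld "symbol") into tFormData
  put "&range=" & urlEncode("1 year") after tFormData
  -- The server's reply is placed in the variable "it".
  post tFormData to url tUrl
  if the result is not empty then
    answer "Post failed:" && the result
  else
    put it into fld "data"
  end if
end mouseUp
```

Checking the result after the post gives you the error text if the request fails, which helps when working out what a particular site expects.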
If you have to be a big browser, setting the "User-Agent" field in
the http headers should help. Something like the following before
getting the url:
put "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)" into tAgentString
set the httpHeaders to tAgentString
You can probably find suitable strings on the web. This example was taken from:
Note that the httpHeaders gets reset after each url request.
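Because the headers are reset after each request, it may be convenient to wrap the two steps in one function so the header is always set just before the fetch. This is only a sketch; the function name fetchAsBrowser is my own invention, not part of libUrl.

```livecode
function fetchAsBrowser pUrl
  -- Set the header immediately before the request, because the
  -- httpHeaders property is reset after each url request.
  set the httpHeaders to "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)"
  put url pUrl into tData
  -- A non-empty result means the download failed.
  if the result is not empty then
    return empty
  end if
  return tData
end fetchAsBrowser
```

In the original script you would then use "put fetchAsBrowser(myurl) into dataVar" in place of "put url myurl into dataVar".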
For cookies, this is from an older mail:
>I've not done it, and don't know too much about the mechanism, but
>it should be possible using the httpHeaders property and the
>libUrlLastRHHeaders() function. libUrlLastRHHeaders() returns the
>headers of the response to the most recent http request. You should
>be able to parse out the cookie header from this, and store it or
>whatever you need to do.
>When sending a cookie from client to server, you can set the
>httpHeaders to something like:
>Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
>(example from rfc 2965. <http://www.faqs.org/rfcs/rfc2965.html>)
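An untested sketch of that idea follows. It assumes the server sends simple "Set-Cookie: name=value; ..." headers and is happy to get back plain name=value pairs, which is simpler than the RFC 2965 style shown above. The function name cookiesFromLastResponse and the example.com URLs are hypothetical.

```livecode
function cookiesFromLastResponse
  -- Collect the name=value part of each Set-Cookie header
  -- from the response to the most recent http request.
  put libUrlLastRHHeaders() into tHeaders
  replace numToChar(13) with empty in tHeaders -- drop stray carriage returns
  set the itemDelimiter to ";"
  put empty into tCookies
  repeat for each line tLine in tHeaders
    if char 1 to 11 of tLine is "Set-Cookie:" then
      -- Item 1 is everything before the first ";" (path, expiry, etc. are dropped).
      put item 1 of tLine into tPair
      delete char 1 to 11 of tPair -- remove the "Set-Cookie:" prefix
      put word 1 to -1 of tPair into tPair -- trim surrounding spaces
      if tCookies is not empty then put "; " after tCookies
      put tPair after tCookies
    end if
  end repeat
  return tCookies
end cookiesFromLastResponse

on mouseUp
  -- The first request lets the site set its cookies.
  put url "http://www.example.com/login" into tPage
  put cookiesFromLastResponse() into tCookies
  -- Send them back on the next request.
  if tCookies is not empty then
    set the httpHeaders to "Cookie:" && tCookies
  end if
  put url "http://www.example.com/data" into fld "data"
end mouseUp
```

Since the httpHeaders is reset after each request, the Cookie header has to be set again before every later request to the same site.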