Help with web page download

Dave Cragg dcragg at lacscentre.co.uk
Mon Jul 15 04:28:00 EDT 2002


At 10:36 am -0700 13/7/02, Bruce Wilson wrote:
>The following script has worked on about 80% of web sites to
>download pages. However, some sites require me to post info, want
>to set cookies, or expect a big browser, etc., which goes beyond my
>scripting abilities. Any help would be appreciated.   Bruce Wilson
>
>
>on mouseup
>   put word 1 of fld "symbol" into sym     --Stock symbol
>   put char 1 of sym into b
>   put "http://user:psswrd@biz.yahoo.com/n/"&b&"/"&sym&".html" into myurl
>   put url myurl into dataVar
>   put HTMLtoTXT (dataVar) into dataVar     --HTMLtoTXT is external Func.
>   put cleanUpTXT (dataVar) into fld "data" --clean up & put into fld
>end mouseup

I can't offer much help. I think you'll have to deal with those 20% 
of sites on a case-by-case basis.

For sites that want you to post data, you'll need to know the format 
of the data to be posted. Examining a web page that posts to the site 
should help you find out what is required. If a form is used to post 
data from a web page, the pattern of the data is typically of the 
style:

field_1=value_1&field_2=value_2&field_3=value_3

where field_1, etc. are the names of the form fields, and each value 
is URL-encoded.
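
As a rough sketch of posting such a string from a script, something 
like the following might work. The URL and the field names ("symbol", 
"period") are made up for illustration; you would substitute whatever 
the real form uses. urlEncode takes care of spaces and other special 
characters in the values.

```livecode
on mouseUp
   -- Hypothetical form fields; take the real names from the page's form
   put "symbol=" & urlEncode("IBM") into tPostData
   put "&period=" & urlEncode("1 year") after tPostData
   -- Hypothetical address; use the form's "action" URL
   post tPostData to url "http://www.example.com/cgi-bin/quote"
   if the result is not empty then
      answer "Post failed:" && the result
   else
      put it into fld "data"   -- the server's reply is left in "it"
   end if
end mouseUp
```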

If you have to be a big browser, setting the "User-Agent" field in 
the http headers should help. Something like the following before 
getting the url:

put "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)" \
      into tAgentString
set the httpHeaders to tAgentString

You can probably find suitable strings on the web. This example was taken from:
<http://www-106.ibm.com/developerworks/xml/library/client/client.html?dwzone=xml>

Note that the httpHeaders gets reset after each url request.
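
Because of that reset, one way to handle it is to wrap the request in 
a small function that sets the header every time. This is only a 
sketch; the function name fetchAsBrowser is my own invention, not 
part of any library.

```livecode
-- Hypothetical helper: sets the User-Agent header, then gets the url
function fetchAsBrowser pUrl
   set the httpHeaders to \
         "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)"
   return url pUrl
end fetchAsBrowser
```

In the original script you would then write 
"put fetchAsBrowser(myurl) into dataVar" in place of 
"put url myurl into dataVar".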

For cookies, this is from an older mail:

>I've not done it, and don't know too much about the mechanism, but
>it should be possible using the httpHeaders property and the
>libUrlLastRHHeaders() function.
>
>libUrlLastRHHeaders() returns the headers of the response to the most
>recent http request. You should be able to parse out the cookie
>header from this, and store it or whatever you need to do.
>
>When sending a cookie from client to server, you can set the
>httpHeaders to something like:
>
>Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
>
>(example from rfc 2965.  <http://www.faqs.org/rfcs/rfc2965.html>)
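
An untested sketch of that idea follows. It assumes the server sends 
a "Set-Cookie:" header and that returning only the first name=value 
pair is enough, which won't be true for every site. The function name 
cookieFromLastResponse and the example.com addresses are hypothetical.

```livecode
-- Hypothetical helper: pull the first cookie out of the last response
function cookieFromLastResponse
   put libUrlLastRHHeaders() into tHeaders
   repeat for each line tLine in tHeaders
      if tLine begins with "Set-Cookie:" then
         put tLine into tCookie
         delete char 1 to 11 of tCookie   -- strip "Set-Cookie:"
         set the itemDelimiter to ";"
         -- keep only name=value, dropping path, expiry, etc.
         return word 1 to -1 of item 1 of tCookie
      end if
   end repeat
   return empty
end cookieFromLastResponse

on mouseUp
   get url "http://www.example.com/login"   -- first request sets the cookie
   put cookieFromLastResponse() into tCookie
   if tCookie is not empty then
      set the httpHeaders to "Cookie:" && tCookie
   end if
   put url "http://www.example.com/data" into fld "data"
end mouseUp
```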


Cheers
Dave Cragg


