Help with web page download
Bruce Wilson
skycap at earthlink.net
Mon Jul 15 11:27:01 EDT 2002
Thanks for your help, Dave.
Bruce
Dave Cragg wrote:
> At 10:36 am -0700 13/7/02, Bruce Wilson wrote:
>
>> The following script has worked on about 80% of web sites to
>> download pages. However, some sites require me to post info, want
>> to set cookies, or expect a big browser, etc., which goes beyond my
>> scripting abilities. Any help would be appreciated. Bruce Wilson
>>
>>
>> on mouseup
>> put word 1 of fld "symbol" into sym --Stock symbol
>> put char 1 of sym into b
>> put "http://user:psswrd@biz.yahoo.com/n/"&b&"/"&sym&".html" into myurl
>> put url myurl into dataVar
>> put HTMLtoTXT (dataVar) into dataVar --HTMLtoTXT is external Func.
>> put cleanUpTXT (dataVar) into fld "data" --clean up & put into fld
>> end mouseup
>
>
> I can't offer much help. I think you'll have to deal with those 20% of
> sites on a case-by-case basis.
>
> For sites that want you to post data, you'll need to know the format of
> the data to be posted. Examining a web page that posts to the site
> should help you find out what is required. If a form is used to post
> data from a web page, the pattern of the data is typically of the style:
>
> field_1="value_1"&field_2="value_2"&field_3="value_3"
>
> where field_1, etc. are the names of the form fields.
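>
> A minimal sketch of posting data in that pattern might look like the
> following. The form URL and the field names "symbol" and "period" are
> made up for illustration; the real ones have to come from the <form>
> tag of the page that posts to the site. Values are passed through
> urlEncode() on the assumption that the server expects ordinary
> url-encoded form data.
>
> ```livecode
> on mouseUp
>   -- hypothetical form URL; use the "action" of the real form
>   put "http://www.example.com/cgi-bin/lookup" into tFormUrl
>   -- hypothetical field names; build field=value pairs joined by "&"
>   put "symbol=" & urlEncode(word 1 of fld "symbol") into tPostData
>   put "&period=" & urlEncode("1 year") after tPostData
>   post tPostData to url tFormUrl
>   if the result is not empty then
>     answer "Post failed:" && the result
>   else
>     put it into fld "data" -- the server's reply comes back in "it"
>   end if
> end mouseUp
> ```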
>
> If you have to be a big browser, setting the "User-Agent" field in the
> http headers should help. Something like the following before getting
> the url:
>
> put "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)" into tAgentString
> set the httpHeaders to tAgentString
>
> You can probably find suitable strings on the web. This example was
> taken from:
> <http://www-106.ibm.com/developerworks/xml/library/client/client.html?dwzone=xml>
>
> Note that the httpHeaders property is reset after each URL request.
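>
> Because of that reset, one way to avoid forgetting the header is to
> wrap the request in a small function that sets it every time. This is
> only a sketch, and the function name fetchAsBrowser is made up here:
>
> ```livecode
> function fetchAsBrowser pUrl
>   -- set the header immediately before every request, since it
>   -- does not carry over to the next one
>   set the httpHeaders to "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)"
>   return url pUrl
> end fetchAsBrowser
> ```
>
> In the original script this would replace the plain url fetch, e.g.
> put fetchAsBrowser(myurl) into dataVar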
>
> For cookies, this is from an older mail:
>
>> I've not done it, and don't know too much about the mechanism, but
>> it should be possible using the httpHeaders property and the
>> libUrlLastRHHeaders() function.
>>
>> libUrlLastRHHeaders() returns the headers of the response to the most
>> recent http request. You should be able to parse out the cookie
>> header from this, and store it or whatever you need to do.
>>
>> When sending a cookie from client to server, you can set the
>> httpHeaders to something like:
>>
>> Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
>>
>> (example from RFC 2965: <http://www.faqs.org/rfcs/rfc2965.html>)
>
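> An untested sketch of that idea follows. It assumes the server sends
> plain "Set-Cookie:" header lines and that returning just the
> name=value pairs is enough; it ignores cookie attributes such as path
> and expiry, and the Set-Cookie2 form from rfc 2965. The function name
> cookiesFromLastResponse and the two example URLs are made up.
>
> ```livecode
> function cookiesFromLastResponse
>   put libUrlLastRHHeaders() into tHeaders
>   -- drop any carriage returns left over from CRLF line endings
>   replace numToChar(13) with empty in tHeaders
>   set the itemDelimiter to ";"
>   put empty into tCookies
>   repeat for each line tLine in tHeaders
>     if char 1 to 11 of tLine is "Set-Cookie:" then
>       -- item 1 is the name=value pair; later items are attributes
>       put item 1 of (char 12 to -1 of tLine) into tPair
>       put word 1 to -1 of tPair into tPair -- trim surrounding spaces
>       if tCookies is not empty then put "; " after tCookies
>       put tPair after tCookies
>     end if
>   end repeat
>   return tCookies
> end cookiesFromLastResponse
>
> on mouseUp
>   -- first request: the server may answer with cookies
>   put url "http://www.example.com/login.html" into tPage1
>   put cookiesFromLastResponse() into tCookies
>   -- second request: send the cookies back
>   if tCookies is not empty then
>     set the httpHeaders to "Cookie:" && tCookies
>   end if
>   put url "http://www.example.com/data.html" into fld "data"
> end mouseUp
> ```
>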
>
>
> Cheers
> Dave Cragg
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> http://lists.runrev.com/mailman/listinfo/use-revolution
>