Help with web page download

Bruce Wilson skycap at earthlink.net
Mon Jul 15 11:27:01 EDT 2002


Thanks for your help, Dave.
Bruce

Dave Cragg wrote:

> At 10:36 am -0700 13/7/02, Bruce Wilson wrote:
> 
>> The following script has worked on about 80% of web sites to
>> download pages. However, some sites require me to post info, or want
>> to set cookies, or to be a big browser, etc., which goes beyond my
>> scripting abilities. Any help would be appreciated.   Bruce Wilson
>>
>>
>> on mouseup
>>   put word 1 of fld "symbol" into sym     --Stock symbol
>>   put char 1 of sym into b
>>   put "http://user:psswrd@biz.yahoo.com/n/"&b&"/"&sym&".html" into myurl
>>   put url myurl into dataVar
>>   put HTMLtoTXT (dataVar) into dataVar     --HTMLtoTXT is external Func.
>>   put cleanUpTXT (dataVar) into fld "data" --clean up & put into fld
>> end mouseup
> 
> 
> I can't offer much help. I think you'll have to deal with those 20% of 
> sites on a case-by-case basis.
> 
> For sites that want you to post data, you'll need to know the format of 
> the data to be posted. Examining a web page that posts to the site 
> should help you find out what is required. If a form is used to post 
> data from a web page, the pattern of the data is typically of the style:
> 
> field_1=value_1&field_2=value_2&field_3=value_3
> 
> where field_1, etc. is the name of the form field.
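> 
> A rough, untested sketch of how such a post might look. The address and
> field names here are made up; substitute the ones from the page's form.
> urlEncode takes care of spaces and punctuation in the values:
> 
>   put "field_1=" & urlEncode("value 1") into tData
>   put "&field_2=" & urlEncode("value 2") after tData
>   -- hypothetical address; use the form's real "action" url
>   post tData to url "http://www.example.com/cgi-bin/form.cgi"
>   if the result is not empty then
>     answer "Post failed:" && the result
>   else
>     put it into dataVar   -- the page the server sends back
>   end if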
> 
> If you have to be a big browser, setting the "User-Agent" field in the 
> http headers should help. Something like the following before getting 
> the url:
> 
> put "User-Agent:  Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)" \
>   into tAgentString
> set the httpHeaders to tAgentString
> 
> You can probably find suitable strings on the web. This example was 
> taken from:
> <http://www-106.ibm.com/developerworks/xml/library/client/client.html?dwzone=xml>
> 
> Note that the httpHeaders gets reset after each url request.
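> 
> Because of that reset, the header has to be set again before every
> request. Worked into the original handler, it might look something like
> this (an untested sketch; the text clean-up steps are left out):
> 
>   on mouseUp
>     put word 1 of fld "symbol" into sym     --Stock symbol
>     put char 1 of sym into b
>     put "http://user:psswrd@biz.yahoo.com/n/"&b&"/"&sym&".html" into myurl
>     -- set the header immediately before the request that needs it
>     set the httpHeaders to \
>       "User-Agent: Mozilla/4.0 (compatible; MSIE 5.0b2; Windows NT)"
>     put url myurl into dataVar
>     put dataVar into fld "data"
>   end mouseUp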
> 
> For cookies, this is from an older mail:
> 
>> I've not done it, and don't know too much about the mechanism, but
>> it should be possible using the httpHeaders property and the
>> libUrlLastRHHeaders() function.
>>
>> libUrlLastRHHeaders() returns the headers of the response to the most
>> recent http request. You should be able to parse out the cookie
>> header from this, and store it or whatever you need to do.
>>
>> When sending a cookie from client to server, you can set the
>> httpHeaders to something like:
>>
>> Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
>>
>> (example from rfc 2965.  <http://www.faqs.org/rfcs/rfc2965.html>)
> 
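> A rough, untested sketch of that round trip. It assumes the server
> sends a single "Set-Cookie:" header and that only the name=value part
> needs to go back; the example.com addresses are placeholders:
> 
>   put url "http://www.example.com/first.html" into tPage
>   put libUrlLastRHHeaders() into tHeaders
>   put empty into tCookie
>   set the itemDelimiter to ";"
>   repeat for each line tLine in tHeaders
>     if char 1 to 11 of tLine is "Set-Cookie:" then
>       -- keep the part before the first ";" and trim the spaces
>       put word 1 to -1 of item 1 of (char 12 to -1 of tLine) into tCookie
>       exit repeat
>     end if
>   end repeat
>   if tCookie is not empty then
>     set the httpHeaders to "Cookie: " & tCookie
>   end if
>   put url "http://www.example.com/second.html" into tPage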
> 
> 
> Cheers
> Dave Cragg
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> http://lists.runrev.com/mailman/listinfo/use-revolution
> 




