Put URL failing -- HTTP header problems?
Sivakatirswami
katir at hindu.org
Tue May 9 23:28:43 EDT 2006
On May 09, 2006, at 12:57 PM, Dave Cragg wrote:
> In that case it would be useful to see the http request that Rev
> (libUrl) is sending, using
>
> liburlSetLogField the long id of field <field>
Done, see the log below.
But I think I found the problem. I'm parsing an index page, and in the
message box each URL appears to be a single unbroken string, like this:
http://www.pacsoa.org.au/palms/Acanthophoenix/index.html
That is what the variable "tOneGenus" seems to contain. But when I
paste my test list of ten URLs from the message box into this email,
I get this:
tURLs (of course these are not working)
http://www.pacsoa.org.au/palms/Acoelorrhaphe
/index.html
http://www.pacsoa.org.au/palms/Acrocomia
/index.html
http://www.pacsoa.org.au/palms/Actinokentia
/index.html
http://www.pacsoa.org.au/palms/Actinorhytis
/index.html
http://www.pacsoa.org.au/palms/Adonidia
/index.html
http://www.pacsoa.org.au/palms/Aiphanes
/index.html
http://www.pacsoa.org.au/palms/Allagoptera
/index.html
http://www.pacsoa.org.au/palms/Alloschmidia
/index.html
http://www.pacsoa.org.au/palms/Alsmithia
/index.html
It looks like I have some kind of CRLF in the data right after the
folder name, so my GET request fails in Revolution, although pasting
the same string into Firefox works. It appears my script is possibly
generating this, but I can't see it.
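One way to confirm a suspicion like this is to make the invisible characters visible before trusting what the message box displays. The following is a minimal sketch in Python rather than Revolution script; the function name `show_hidden` is hypothetical, and the sample string assumes the stray character is a carriage return (ASCII 13), which the post does not confirm.

```python
def show_hidden(text):
    """Return text with each control character replaced by a visible <NN> code."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            # Control characters are invisible in most displays, so spell them out.
            out.append("<%d>" % code)
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical sample: assumes a carriage return sits after the folder name.
suspect = "http://www.pacsoa.org.au/palms/Acoelorrhaphe\r/index.html"
print(show_hidden(suspect))
```

If the printed string shows a code such as `<13>` or `<10>` between the folder name and `/index.html`, the extra character is in the data itself and not a side effect of pasting into the email.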
Now, if I chop the end off to give URLs like this:
http://www.pacsoa.org.au/palms/Acoelorrhaphe
these work just fine. Here is my script:
on mouseup
   set the cursor to busy
   getPalms
   # previous crawlers
   --getGaneshas
end mouseup
ON getPalms
   --> site is: http://www.pacsoa.org.au/palms/index.html
   # we need to dig every */palms/*.html file on this page
   # so first is to extract all the URL's
   -- put fld "MainURL" into tStartURL
   put "http://www.pacsoa.org.au/palms/index.html" into tStartURL
   put URL tStartURL into tMainListing # this works..
   REPEAT for each line x in tMainListing
      IF x contains "/palms/" THEN # we got one for sure
         put x & cr after tPalmList
      END IF
   END REPEAT
   delete line 1 to 2 of tPalmList
   delete line -1 of tPalmList
   # clean out tags:
   put "<[^><]*>" into tRex
   put replacetext(tPalmList, tRex, "") into tPalmsList
   replace " " with "" in tPalmsList
   REPEAT for each line x in tPalmslist
      put "http://www.pacsoa.org.au/palms/" before x
      # i think I am getting an extra CR introduced here...I don't know why
      put "/index.html" after x
      put x & cr after tGenusListing
   END REPEAT
   --> Step through Genus listing
   put line 1 to 10 of tGenusListing into tTestList
   liburlSetLogField the long id of field "logField"
   --repeat for each line tOneGenus in tGenusListing
   REPEAT for each line tOneGenus in tTestList
      --> extract the genus name first
      set the itemdel to "/"
      put item 5 of tOneGenus into tGenus
      -- delete item -1 of tOneGenus
      put tOneGenus & cr after tURLs
      put url (tOneGenus) into tOneGenusPage
      wait 5 ticks
      put tOneGenusPage into fld "previewer"
      --> from each page we have to extract the species URLs
      --repeat for each line x in tOneGenusPage
      --if x contains ("/" & tGenus & "/") then put x & cr after tSpeciesPages
      --end repeat
   END REPEAT
   --put tSpeciesPages
   --> Load the Species URL's and then save and .jpg file therein
   put tURLs
END getPalms
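A possible explanation, offered only as an assumption, is that the fetched index page uses CRLF line endings, so that splitting it on line feeds leaves a carriage return at the end of every line, which then ends up in front of "/index.html". The sketch below reproduces that effect in Python rather than Revolution script; `build_urls` and the sample `page` string are hypothetical stand-ins for the second repeat loop and the tag-stripped listing.

```python
BASE = "http://www.pacsoa.org.au/palms/"


def build_urls(listing, normalize):
    """Build genus URLs from a list of genus names, one name per line."""
    if normalize:
        # Turn CRLF pairs and bare CRs into plain LF before splitting.
        listing = listing.replace("\r\n", "\n").replace("\r", "\n")
    urls = []
    for line in listing.split("\n"):
        if line == "":
            continue  # skip the empty piece after the final line ending
        urls.append(BASE + line + "/index.html")
    return urls


# Hypothetical input: assumes the server sent CRLF line endings.
page = "Acanthophoenix\r\nAcoelorrhaphe\r\n"
broken = build_urls(page, normalize=False)
fixed = build_urls(page, normalize=True)
print(repr(broken[0]))  # the carriage return survives inside the URL
print(repr(fixed[0]))   # a single clean URL
```

If this assumption holds, the equivalent step in the script above would presumably be something like `replace numToChar(13) with empty in tMainListing` right after the page is fetched, before any line is appended to tPalmList.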
tURLs (of course these are not working )
http://www.pacsoa.org.au/palms/Acoelorrhaphe
/index.html
http://www.pacsoa.org.au/palms/Acrocomia
/index.html
http://www.pacsoa.org.au/palms/Actinokentia
/index.html
http://www.pacsoa.org.au/palms/Actinorhytis
/index.html
http://www.pacsoa.org.au/palms/Adonidia
/index.html
http://www.pacsoa.org.au/palms/Aiphanes
/index.html
http://www.pacsoa.org.au/palms/Allagoptera
/index.html
http://www.pacsoa.org.au/palms/Alloschmidia
/index.html
http://www.pacsoa.org.au/palms/Alsmithia
/index.html
You can see the extra CR coming in from somewhere, though I don't
see these extra lines in the message box. These URLs, with the end
chopped off, work:
http://www.pacsoa.org.au/palms/Acanthophoenix
http://www.pacsoa.org.au/palms/Acoelorrhaphe
http://www.pacsoa.org.au/palms/Acrocomia
http://www.pacsoa.org.au/palms/Actinokentia
http://www.pacsoa.org.au/palms/Actinorhytis
http://www.pacsoa.org.au/palms/Adonidia
http://www.pacsoa.org.au/palms/Aiphanes
http://www.pacsoa.org.au/palms/Allagoptera
http://www.pacsoa.org.au/palms/Alloschmidia
http://www.pacsoa.org.au/palms/Alsmithia
socket selected: 209.15.79.148:80|6956
GET /palms/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 200 OK
Date: Wed, 10 May 2006 03:00:57 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Last-Modified: Sat, 25 Mar 2006 08:39:13 GMT
ETag: "88fc5f-6154-442501b1"
Accept-Ranges: bytes
Content-Length: 24916
Content-Type: text/html
socket selected: 209.15.79.148:80|6956
GET /palms/Acanthophoenix
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:00:58 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6956
socket selected: 209.15.79.148:80|6957
GET /palms/Acoelorrhaphe
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:00:58 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6957
socket selected: 209.15.79.148:80|6958
GET /palms/Acrocomia
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:00:59 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6958
socket selected: 209.15.79.148:80|6959
GET /palms/Actinokentia
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:00:59 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6959
socket selected: 209.15.79.148:80|6960
GET /palms/Actinorhytis
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:01:00 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6960
socket selected: 209.15.79.148:80|6961
GET /palms/Adonidia
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:01:00 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6961
socket selected: 209.15.79.148:80|6962
GET /palms/Aiphanes
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:01:00 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6962
socket selected: 209.15.79.148:80|6963
GET /palms/Allagoptera
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:01:01 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6963
socket selected: 209.15.79.148:80|6964
GET /palms/Alloschmidia
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:01:01 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6964
socket selected: 209.15.79.148:80|6965
GET /palms/Alsmithia
/index.html HTTP/1.1
Host: www.pacsoa.org.au
User-Agent: Revolution (MacOS)
HTTP/1.1 400 Bad Request
Date: Wed, 10 May 2006 03:01:02 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3
Connection: close
Content-Type: text/html; charset=iso-8859-1
CLOSED 209.15.79.148:80|6965
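The log shows the request line itself being split in two, which fits the 400 responses: an HTTP request line is a single line of method, target, and version, and the target may not contain whitespace or control characters. As a minimal sketch in Python (the helper name `request_line` is hypothetical and is not part of libUrl), a check like this would catch the bad path before anything is sent:

```python
def request_line(method, path):
    """Build an HTTP/1.1 request line, rejecting paths that would break it."""
    for ch in path:
        # Spaces, control characters (CR, LF, tab, ...) and DEL are not
        # allowed inside the request target.
        if ord(ch) <= 32 or ord(ch) == 127:
            raise ValueError("illegal character %r in request path" % ch)
    return "%s %s HTTP/1.1" % (method, path)


print(request_line("GET", "/palms/index.html"))

try:
    # Path as it appears in the log, assuming the break is a carriage return.
    request_line("GET", "/palms/Acanthophoenix\r/index.html")
except ValueError as err:
    print("rejected:", err)
```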
>
> Cheers
> Dave