Put URL failing -- HTTP header problems?
Sivakatirswami
katir at hindu.org
Wed May 10 00:43:27 EDT 2006
I think I am close to the source of the problem, which at this point I
take to be the spurious introduction of char(13) into what should be
line-delimited lists. Where this char(13) is present in a GET request
URL, libURL fails (of course).
REPEAT for each line tOneGenus in tTestList
--> extract the genus name first
set the itemdel to "/"
put item 5 of tOneGenus into tGenus
put tGenus after tGeni
delete item -1 of tOneGenus
put tOneGenus & cr after tURLs
put url (tOneGenus) into tOneGenusPage
--> from each page we have to extract the species URLs
REPEAT for each line x in tOneGenusPage
IF x contains ("/" & tGenus & "/") THEN put x & cr after tSpeciesPages
END REPEAT
END REPEAT
set the clipboarddata["text"] to tGeni
This code generates a string like the following on Mac OS X. It appears
in the message box as one long line with no spaces and no breaks:
AcanthophoenixAcoelorrhapheAcrocomiaActinokentiaAdonidiaAiphanesAllagopteraAlloschmidiaAlsmithia
If I paste it here, we see:
Acanthophoenix
Acoelorrhaphe
Acrocomia
Actinokentia
Actinorhytis
Adonidia
Aiphanes
Allagoptera
Alloschmidia
Alsmithia
If I do a byte-by-byte examination, I find something interesting: char(13)
is present after each name:
65,99,97,110,116,104,111,112,104,111,101,110,105,120,13,
65,99,111,101,108,111,114,114,104,97,112,104,101,13,
65,99,114,111,99,111,109,105,97,13,
65,99,116,105,110,111,107,101,110,116,105,97,13,
65,99,116,105,110,111,114,104,121,116,105,115,13,
65,100,111,110,105,100,105,97,13,
65,105,112,104,97,110,101,115,13,
65,108,108,97,103,111,112,116,101,114,97,13,
65,108,108,111,115,99,104,109,105,100,105,97,13,
65,108,115,109,105,116,104,105,97,13,
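A dump like the one above can be produced with a small function along
these lines. This is a minimal sketch; charCodeList is a hypothetical
name, and like the dump above it leaves a trailing comma.

```livecode
-- Hypothetical helper: return the character codes of pText as a
-- comma-separated list, one code per character.
function charCodeList pText
   local tCodes
   repeat for each char tChar in pText
      put charToNum(tChar) & comma after tCodes
   end repeat
   return tCodes
end charCodeList
```

Usage: put charCodeList(tGeni) into msg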
So I'm not sure where or how this is being introduced.
But where the variable watcher shows me
http://www.pacsoa.org.au/palms/Areca/index.html
the string is in fact:
http://www.pacsoa.org.au/palms/Areca(char[13])/index.html
and this is what causes the URL GET requests to break. If you paste it
into the address field in Firefox, the char(13) is not passed (my
assumption).
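One way to guard the GET itself is to strip CR and LF from each URL just
before fetching it. A CR inside the path presumably breaks the HTTP
request line that libURL sends, which would fit the failures described
above. This is a minimal sketch; cleanURL is a hypothetical helper, and it
assumes the only stray characters are CR (13) and LF (10).

```livecode
-- Hypothetical helper: remove CR and LF characters from a URL string.
-- Parameters are passed by value, so the caller's variable is unchanged.
function cleanURL pURL
   replace numToChar(13) with empty in pURL
   replace numToChar(10) with empty in pURL
   return pURL
end cleanURL
```

In the inner loop this would become:
put URL (cleanURL(tOneGenus)) into tOneGenusPage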
I have an odd feeling that Rev is introducing this, but I could be
wrong.
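A likely explanation, offered as an assumption rather than a confirmed
diagnosis: many web servers send HTML with CRLF line endings, put URL does
not appear to convert line endings for http: URLs, and the line chunk
splits on LF only (the cr/return constant is numToChar(10)). Each line of
the downloaded page would then keep a trailing char(13), which ends up in
tGenus and in every URL built from it. The sketch below shows the effect on
made-up data; demoTrailingCR is a hypothetical name.

```livecode
-- Hypothetical demo: length of line 1 before and after converting
-- CRLF line endings to LiveCode's LF line ending.
function demoTrailingCR
   local tPage, tLenBefore, tLenAfter
   -- simulate two lines of HTML delivered with CRLF line endings
   put "Areca" & crlf & "Arenga" into tPage
   -- line 1 is "Areca" plus the CR, because line splits on LF only
   put the length of line 1 of tPage into tLenBefore
   -- normalize CRLF to LF
   replace crlf with cr in tPage
   put the length of line 1 of tPage into tLenAfter
   return tLenBefore && tLenAfter
end demoTrailingCR
```

Running put demoTrailingCR() into msg should show 6 5.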
Here again is my script. This is easy to reproduce for anyone who is
interested: make a new stack, create two fields, "Previewer" and
"Logfield", and one button with the following script:
--> all handlers
ON mouseup
set the cursor to busy
getPalms
# previous crawlers
--getGaneshas
END mouseup
ON getPalms
--> site is: http://www.pacsoa.org.au/palms/index.html
# we need to dig every */palms/*.html file on this page
# so first is to extract all the URL's
-- put fld "MainURL" into tStartURL
put "http://www.pacsoa.org.au/palms/index.html" into tStartURL
put URL tStartURL into tMainListing
REPEAT for each line x in tMainListing
IF x contains "/palms/" THEN # we got one for sure
put x & cr after tPalmList
END IF
END REPEAT
--check it out
delete line 1 to 2 of tPalmList
delete line -1 of tPalmList
put "<[^><]*>" into tRex
put replacetext(tPalmList, tRex, "") into tPalmsList
replace " " with "" in tPalmsList
REPEAT for each line x in tPalmslist
put "http://www.pacsoa.org.au/palms/" before x
put "/index.html" after x
put x & cr after tGenusListing
END REPEAT
--> Step through Genus listing
put line 1 to 10 of tGenusListing into tTestList
liburlSetLogField the long id of field "logField"
--repeat for each line tOneGenus in tGenusListing
REPEAT for each line tOneGenus in tTestList
--> extract the genus name first
set the itemdel to "/"
put item 5 of tOneGenus into tGenus
put tGenus after tGeni
delete item -1 of tOneGenus
put tOneGenus & cr after tURLs
put url (tOneGenus) into tOneGenusPage
--> from each page we have to extract the species URLs
REPEAT for each line x in tOneGenusPage
IF x contains ("/" & tGenus & "/") THEN put x & cr after tSpeciesPages
END REPEAT
END REPEAT
set the clipboarddata["text"] to tGeni
--> Load the Species URL's and then save and .jpg file therein
END getPalms
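If the CRs do come from the downloaded HTML, the simplest fix is probably
to normalize line endings as soon as each page is fetched, before any
line-by-line parsing. This is a minimal sketch; normalizeLineEndings and
fetchText are hypothetical helper names.

```livecode
-- Hypothetical helper: convert CRLF and any lone CR to LF, the line
-- ending that LiveCode's line chunk expects.
function normalizeLineEndings pText
   replace crlf with cr in pText
   -- any CR left over is a lone CR (old Mac style) line ending
   replace numToChar(13) with cr in pText
   return pText
end normalizeLineEndings

-- Hypothetical helper: fetch a URL and normalize its line endings.
function fetchText pURL
   local tText
   put URL pURL into tText
   return normalizeLineEndings(tText)
end fetchText
```

In getPalms, put URL tStartURL into tMainListing would become
put fetchText(tStartURL) into tMainListing, and
put url (tOneGenus) into tOneGenusPage would become
put fetchText(tOneGenus) into tOneGenusPage. Separately,
put tGenus after tGeni adds no separator of its own, which is why the
message box shows one unbroken line; put tGenus & cr after tGeni would give
one genus per line once the CRs are gone.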