Put URL failing -- HTTP header problems?

Sivakatirswami katir at hindu.org
Wed May 10 00:43:27 EDT 2006


I think I am close to the source of the problem, which at this point I
take to be the spurious introduction of char(13) into what should be
line-delimited lists. Where this char(13) is present in a GET request
URL, libURL fails (of course).

REPEAT for each line tOneGenus in tTestList
         --> extract the genus name first
         set the itemdel to "/"
         put item 5 of tOneGenus into tGenus
         put tGenus after tGeni
         delete item -1 of tOneGenus
         put tOneGenus & cr after tURLs
         put url (tOneGenus) into tOneGenusPage
         --> from each page we have to extract the species URLs
         REPEAT for each line x in tOneGenusPage
             IF x contains ("/" & tGenus & "/") THEN put x & cr after tSpeciesPages
         END REPEAT

     END REPEAT

set the clipboarddata["text"] to tGeni

This generates a string on Mac OS X like the one below. It appears in
the msg box as one long line with no spaces and no breaks:


AcanthophoenixAcoelorrhapheAcrocomiaActinokentiaAdonidiaAiphanesAllagopteraAlloschmidiaAlsmithiaI

If I paste this here we see:

Acanthophoenix
Acoelorrhaphe
Acrocomia
Actinokentia
Actinorhytis
Adonidia
Aiphanes
Allagoptera
Alloschmidia
Alsmithia

If I do a byte-by-byte examination I get something interesting:
char(13) is present after each one:

65,99,97,110,116,104,111,112,104,111,101,110,105,120,13,
65,99,111,101,108,111,114,114,104,97,112,104,101,13,
65,99,114,111,99,111,109,105,97,13,
65,99,116,105,110,111,107,101,110,116,105,97,13,
65,99,116,105,110,111,114,104,121,116,105,115,13,
65,100,111,110,105,100,105,97,13,
65,105,112,104,97,110,101,115,13,
65,108,108,97,103,111,112,116,101,114,97,13,
65,108,108,111,115,99,104,109,105,100,105,97,13,
65,108,115,109,105,116,104,105,97,13,
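
For anyone who wants to repeat the byte-by-byte check, a minimal
sketch of how such a listing can be produced follows. The function
name charCodes is hypothetical (it is not part of the script in this
post); it only uses the built-in charToNum function.

```livecode
-- hypothetical helper: return the character codes of pText
-- as a comma-separated list (with a trailing comma)
function charCodes pText
    put empty into tCodes
    repeat for each char c in pText
        put charToNum(c) & comma after tCodes
    end repeat
    return tCodes
end charCodes
```

Called from inside getPalms, for example as
put charCodes(tGeni) into field "Previewer", any 13 in the result
marks a carriage return that the msg box does not show as a line
break.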

So I'm not sure where or how this is being introduced.

But where the variable watcher is showing me

http://www.pacsoa.org.au/palms/Areca/index.html

  in fact that string is:

http://www.pacsoa.org.au/palms/Areca(char[13])/index.html

and this is what is causing the URL GET requests to break. If you
paste it into a URL field in Firefox, the char(13) is not passed (my
assumption).

I have an odd feeling that Rev is introducing this... I could be  
wrong...
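
One way to test that hunch is sketched below. It assumes (this is not
confirmed anywhere in this post) that the char(13) arrives with the
fetched page itself, for example as CRLF line endings sent by the
server, rather than being added afterwards. The handler name
checkForCR is hypothetical.

```livecode
-- hypothetical handler: look for char(13) in the raw download,
-- then strip it before the text is split into lines
on checkForCR
    put URL "http://www.pacsoa.org.au/palms/index.html" into tMainListing
    -- offset returns 0 when the character is not found
    put offset(numToChar(13), tMainListing) into tFirstCR
    put "first char(13) at char:" && tFirstCR
    -- assumption: lines end with char(13) followed by char(10),
    -- so removing char(13) leaves ordinary line breaks
    replace numToChar(13) with empty in tMainListing
end checkForCR
```

If the offset is greater than 0, the character is already in the
downloaded data before any list is built. If the page used a bare
char(13) as its only line ending, replacing it with cr instead of
empty would be the safer choice, since removing it would join the
lines together.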

Here again is my script. This is easy to simulate for those who may
be interested: make a new stack, create two fields, "Previewer" and
"Logfield", and one button with the following script:

--> all handlers

ON mouseup

     set the cursor to busy
     getPalms

     # previous crawlers

     --getGaneshas

END mouseup

ON getPalms
     --> site is: http://www.pacsoa.org.au/palms/index.html

     # we need to dig every */palms/*.html  file on this page
     # so first is to extract all the URL's

    -- put fld "MainURL" into tStartURL

     put "http://www.pacsoa.org.au/palms/index.html" into tStartURL

     put URL tStartURL into tMainListing
     REPEAT for each line x in tMainListing
         IF x contains "/palms/"  THEN # we got one for sure
             put x & cr after tPalmList
         END IF

     END REPEAT

     --check it out
     delete line 1 to 2 of tPalmList
     delete line -1 of tPalmList

     put  "<[^><]*>" into tRex
     put replacetext(tPalmList, tRex, "") into tPalmsList
     replace " " with "" in tPalmsList

     REPEAT for each line x in tPalmslist
         put "http://www.pacsoa.org.au/palms/" before x
         put "/index.html" after x
         put x & cr after tGenusListing
     END REPEAT


     --> Step through Genus listing

     put line 1 to 10 of tGenusListing into tTestList

     liburlSetLogField the long id of field "logField"
     --repeat for each line tOneGenus in tGenusListing
     REPEAT for each line tOneGenus in tTestList
         --> extract the genus name first
         set the itemdel to "/"
         put item 5 of tOneGenus into tGenus
         put tGenus after tGeni
         delete item -1 of tOneGenus
         put tOneGenus & cr after tURLs


         put url (tOneGenus) into tOneGenusPage

         --> from each page we have to extract the species URLs
         REPEAT for each line x in tOneGenusPage
             IF x contains ("/" & tGenus & "/") THEN put x & cr after tSpeciesPages
         END REPEAT

     END REPEAT
     set the clipboarddata["text"] to tGeni


     --> Load the Species URL's and then save any .jpg file therein


END getPalms
