Put URL failing -- HTTP header problems?

Sivakatirswami katir at hindu.org
Tue May 9 23:28:43 EDT 2006


On May 09, 2006, at 12:57 PM, Dave Cragg wrote:

> In that case it would be useful to see the http request that Rev  
> (libUrl) is sending, using
>
>  liburlSetLogField the long id of field <field>

Done; the log is below.

But I think I found the problem. I'm parsing an index page where each URL appears to be a single long string, like this:

http://www.pacsoa.org.au/palms/Acanthophoenix/index.html

But if I paste these from the message box into this email, my test list of ten URLs looks different. The variable "tOneGenus" seems to contain:

http://www.pacsoa.org.au/palms/Acanthophoenix/index.html

but if I paste the collected list, tURLs, into an email I get this (of course, these are not working):

http://www.pacsoa.org.au/palms/Acoelorrhaphe
/index.html
http://www.pacsoa.org.au/palms/Acrocomia
/index.html
http://www.pacsoa.org.au/palms/Actinokentia
/index.html
http://www.pacsoa.org.au/palms/Actinorhytis
/index.html
http://www.pacsoa.org.au/palms/Adonidia
/index.html
http://www.pacsoa.org.au/palms/Aiphanes
/index.html
http://www.pacsoa.org.au/palms/Allagoptera
/index.html
http://www.pacsoa.org.au/palms/Alloschmidia
/index.html
http://www.pacsoa.org.au/palms/Alsmithia
/index.html




It looks like I have some kind of CR/LF in the data following the folder name, so my GET request fails in Revolution, while pasting the same string into Firefox works. It also appears my script is generating this, but I can't see it....
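One way to test the CR/LF theory is to make the invisible characters visible. The function below is a hypothetical helper (not part of Revolution or libUrl): it swaps each carriage return (ASCII 13) for the marker "<CR>" and tags each line feed (ASCII 10) with "<LF>". Keep in mind that in Revolution the `cr`/`return` constant is the line feed, ASCII 10, not ASCII 13, so a stray ASCII 13 is not treated as a line break by the `line` chunk.

```livecode
-- Hypothetical diagnostic helper: return a copy of pText in which
-- carriage returns (ASCII 13) become the visible marker "<CR>" and
-- each line feed (ASCII 10) is preceded by the marker "<LF>".
function showLineEndings pText
   replace numToChar(13) with "<CR>" in pText
   replace numToChar(10) with ("<LF>" & numToChar(10)) in pText
   return pText
end showLineEndings
```

For example, `put showLineEndings(line 1 of tGenusListing)` in the message box; if the theory is right, a "<CR>" should show up between the genus name and "/index.html".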

Now, if I chop the end off to give URLs like this:

http://www.pacsoa.org.au/palms/Acoelorrhaphe

these work just fine. Here is my script:

on mouseup
   set the cursor to busy
   getPalms

# previous crawlers
   --getGaneshas
end mouseup


ON getPalms
     --> site is: http://www.pacsoa.org.au/palms/index.html

     # we need to dig into every */palms/*.html file on this page,
     # so the first step is to extract all the URLs

     -- put fld "MainURL" into tStartURL
     put "http://www.pacsoa.org.au/palms/index.html" into tStartURL

     put URL tStartURL into tMainListing # this works..
     REPEAT for each line x in tMainListing
         IF x contains "/palms/"  THEN # we got one for sure
             put x & cr after tPalmList
         END IF

     END REPEAT

     delete line 1 to 2 of tPalmList
     delete line -1 of tPalmList

# clean out tags:
     put  "<[^><]*>" into tRex
     put replacetext(tPalmList, tRex, "") into tPalmsList
     replace " " with "" in tPalmsList

     REPEAT for each line x in tPalmslist
         put "http://www.pacsoa.org.au/palms/" before x
         # I think I am getting an extra CR introduced here... I don't know why
         put "/index.html" after x
         put x & cr after tGenusListing
     END REPEAT


     --> Step through Genus listing

     put line 1 to 10 of tGenusListing into tTestList

     liburlSetLogField the long id of field "logField"
     --repeat for each line tOneGenus in tGenusListing
     REPEAT for each line tOneGenus in tTestList
         --> extract the genus name first
         set the itemdel to "/"
         put item 5 of tOneGenus into tGenus
        -- delete item -1 of tOneGenus
         put tOneGenus & cr after tURLs


         put url (tOneGenus) into tOneGenusPage
         wait 5 ticks
         put tOneGenusPage into fld "previewer"



         --> from each page we have to extract the species URLs
         --repeat for each line x in tOneGenusPage
         --if x contains ("/" & tGenus & "/") then put x & cr after tSpeciesPages
         --end repeat

     END REPEAT
     --put tSpeciesPages

     --> Load the species URLs and then save the .jpg file therein

      put tURLs

END getPalms
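A minimal fix, under two assumptions: that the index page is served with Windows-style CRLF line endings (plausible for a site maintained with FrontPage, which the Server header in the log below mentions), and that `put URL` returns http: data without converting line endings. Revolution's `line` chunk splits only on ASCII 10, so each line of tMainListing would keep a trailing ASCII 13, and that character would end up between the genus name and "/index.html". Normalizing the text once, right after fetching it, removes the stray character before any parsing. `normalizeLineEndings` is a hypothetical helper name.

```livecode
-- Hypothetical helper: convert CRLF (Windows) and lone CR (classic Mac)
-- line endings to LF. In Revolution the constant cr is ASCII 10 (LF),
-- which is the character the "line" chunk splits on.
function normalizeLineEndings pText
   replace crlf with cr in pText            -- CR+LF pairs become a single LF
   replace numToChar(13) with cr in pText   -- any remaining lone CRs become LF
   return pText
end normalizeLineEndings

-- Usage in getPalms, right after the index page is fetched:
--   put URL tStartURL into tMainListing
--   put normalizeLineEndings(tMainListing) into tMainListing
```

Trimming a trailing ASCII 13 inside the repeat loop would also work, but cleaning the data at the source also protects the tag stripping and any later parsing of the per-genus pages.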


These work, but you can see the extra CR coming in from somewhere. I don't see these extra lines in the message box, though:

http://www.pacsoa.org.au/palms/Acanthophoenix

http://www.pacsoa.org.au/palms/Acoelorrhaphe

http://www.pacsoa.org.au/palms/Acrocomia

http://www.pacsoa.org.au/palms/Actinokentia

http://www.pacsoa.org.au/palms/Actinorhytis

http://www.pacsoa.org.au/palms/Adonidia

http://www.pacsoa.org.au/palms/Aiphanes

http://www.pacsoa.org.au/palms/Allagoptera

http://www.pacsoa.org.au/palms/Alloschmidia

http://www.pacsoa.org.au/palms/Alsmithia



socket selected: 209.15.79.148:80|6956
GET /palms/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 200 OK

Date: Wed, 10 May 2006 03:00:57 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Last-Modified: Sat, 25 Mar 2006 08:39:13 GMT

ETag: "88fc5f-6154-442501b1"

Accept-Ranges: bytes

Content-Length: 24916

Content-Type: text/html


socket selected: 209.15.79.148:80|6956
GET /palms/Acanthophoenix
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:00:58 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6956
socket selected: 209.15.79.148:80|6957
GET /palms/Acoelorrhaphe
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:00:58 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6957
socket selected: 209.15.79.148:80|6958
GET /palms/Acrocomia
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:00:59 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6958
socket selected: 209.15.79.148:80|6959
GET /palms/Actinokentia
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:00:59 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6959
socket selected: 209.15.79.148:80|6960
GET /palms/Actinorhytis
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:01:00 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6960
socket selected: 209.15.79.148:80|6961
GET /palms/Adonidia
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:01:00 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6961
socket selected: 209.15.79.148:80|6962
GET /palms/Aiphanes
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:01:00 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6962
socket selected: 209.15.79.148:80|6963
GET /palms/Allagoptera
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:01:01 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6963
socket selected: 209.15.79.148:80|6964
GET /palms/Alloschmidia
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:01:01 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6964
socket selected: 209.15.79.148:80|6965
GET /palms/Alsmithia
/index.html HTTP/1.1

Host: www.pacsoa.org.au

User-Agent: Revolution (MacOS)


HTTP/1.1 400 Bad Request

Date: Wed, 10 May 2006 03:01:02 GMT

Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510 PHP/4.2.3

Connection: close

Content-Type: text/html; charset=iso-8859-1


CLOSED 209.15.79.148:80|6965
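The log suggests why the server answers 400 Bad Request: an HTTP/1.1 request line has the form `Method SP Request-URI SP HTTP-Version CRLF` (RFC 2616, section 5.1), so a line-break character inside the URL breaks up the request line, as in "GET /palms/Acanthophoenix" followed by "/index.html HTTP/1.1" above. As a defensive sketch, a hypothetical check such as `isCleanURL` below could reject such URLs before `put URL` sends them. Treating a space as invalid is an extra assumption here, since a literal space would likewise have to be percent-encoded.

```livecode
-- Hypothetical guard: return true if pURL contains no CR, LF or space,
-- i.e. nothing that would break the HTTP request line built from it.
function isCleanURL pURL
   if pURL contains numToChar(13) then return false
   if pURL contains numToChar(10) then return false
   if pURL contains space then return false
   return true
end isCleanURL

-- Usage inside the genus loop, before fetching (tBadURLs is hypothetical):
--   if not isCleanURL(tOneGenus) then
--      put "Bad URL: " & tOneGenus & cr after tBadURLs
--      next repeat
--   end if
```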

>
> Cheers
> Dave




More information about the use-livecode mailing list