URL exists?

Dave Cragg dcragg at lacscentre.co.uk
Fri Nov 7 11:48:24 EST 2003


At 1:23 am -0500 7/11/03, Shari wrote:
>What's the fastest way to check if an URL exists?
>
>I tried:
>
>if URL "http://www.whatever.com/something/else.html" is empty then
>   blah blah blah
>end if
>
>Very very slow.  Is there a faster way?

You'll get different results, depending on whether the host server 
(www.whatever.com) or the resource on the server 
(/something/else.html) doesn't exist. If it's the latter, then 
typically you'll get a 404 response (the result will return "error 
404" plus a descriptive string, often "file not found"). But data is 
usually returned from such a url call, consisting of the "missing 
page html message" that http servers usually return. So checking for 
empty won't work in this case. However, the whole process shouldn't 
be slow.

The "missing host" shouldn't be slow either if the DNS lookup returns 
quickly. (libUrl does a lookup with hostNameToAddress before making 
any requests.) "invalid host address" is returned in the result.

A delay is most likely to occur because of connection problems, or 
when using numerical IP addresses (192.168.1.1, etc.) that don't 
exist on the local network.
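
So rather than testing for empty, it's more reliable to look at the 
result after the url call. A rough sketch, assuming the result strings 
are worded as described above (urlStatus is only an illustrative name, 
and any other error is passed back unchanged):

------------------------------------
function urlStatus pUrl
   put URL pUrl into tData
   put the result into tResult
   ## an empty result means the request completed without error
   if tResult is empty then return "ok"
   ## "error 404 ..." means the host answered but the resource is missing
   if tResult contains "404" then return "missing resource"
   ## the DNS lookup failed
   if tResult contains "invalid host address" then return "missing host"
   ## anything else (connection problems, other http errors)
   return tResult
end urlStatus
------------------------------------

For example, urlStatus("http://www.whatever.com/something/else.html") 
would be expected to return "missing resource" when the server sends a 
404 page, even though the data returned is not empty.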

At 2:33 am -0500 7/11/03, Brian Yennie wrote:
>I think the only way that will be technically faster than this would
>be to use sockets, and make a HEAD request, rather than a GET for
>the url in question.
>
>You'd have to look up http protocol, but basically what it does is
>let you get just the http headers for a page rather than the
>contents. You'd also have to parse the return code...

In theory, it should be possible to make a HEAD request using the 
libUrlSetCustomHttpHeaders routine. But I just checked, and it's 
flawed. libUrl tries to read for data beyond the returned headers, 
and you have to wait for a timeout before it returns. I'll try and 
fix this in a future revision, probably with a specific routine for 
using HEAD. Meanwhile, here's a *quickly* concocted routine for 
sending HEAD requests. (WARNING: it really needs socketError and 
socketTimeout handlers and a way to jump out of the waits in the main 
routine if something goes wrong.)

## this in a button somewhere
## substitute your values for tHost, tResource, tPort
on mouseUp

   put "www.whatever.com" into tHost
   put "80" into tPort
   put "/something/else.html" into tResource
   put empty into tData
   doHEAD tHost, tPort, tResource, tData
   if the result <> empty then
     answer the result
   else
     answer tData
   end if
end mouseUp

------------------------------------
## this part in the message path (card, stack, library)
local lvWritten, lvReadData, lvReaded

on doHEAD pHost, pPort, pResource, @pResponse

   put "HEAD" && pResource && "HTTP/1.1" into tHeaders
   put crlf & "Host:" && pHost after tHeaders
   put crlf & "User-Agent: Metacard" after tHeaders ##optional
   put crlf & crlf after tHeaders

   put pHost & ":" & pPort into tSocket
   open socket to tSocket
   if the result is not empty then return the result

   put empty into lvWritten
   write tHeaders to socket tSocket with message "written"
   if the result <> empty then
     put the result into tRes
     close socket tSocket
     return tRes
   end if

   wait until lvWritten <> empty with messages

   put empty into lvReaded
   ## the headers end with a blank line, so read up to the first empty line
   read from socket tSocket until (crlf & crlf) with message "readed"
   if the result <> empty then
     put the result into tRes
     close socket tSocket
     return tRes
   end if

   wait until lvReaded <> empty with messages
   put lvReadData into pResponse
   close socket tSocket
   return empty

end doHEAD

on written x, y
   put true into lvWritten
end written

on readed x,y
   put y into lvReadData
   put true into lvReaded
end readed
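
## SKETCH ONLY, not part of the routine as first written: minimal
## socketError and socketTimeout handlers of the kind the warning
## above asks for. They release both waits in doHEAD so it can't
## hang forever. Putting the error text into lvReadData is an
## assumption made for this sketch; a fuller version would probably
## use a separate script local and check it after each wait.
on socketError pSocket, pError
   put "error:" && pError into lvReadData
   put true into lvWritten
   put true into lvReaded
end socketError

## sent when a read or write has been pending longer than the
## socketTimeoutInterval
on socketTimeout pSocket
   put "error: timeout on" && pSocket into lvReadData
   put true into lvWritten
   put true into lvReaded
end socketTimeout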
-----------------------------------
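
The headers come back as plain text, so the return code Brian mentions 
can be picked out of the first line. A minimal sketch, assuming the 
response starts with a status line such as "HTTP/1.1 200 OK" 
(httpStatusCode is just an illustrative name):

-----------------------------------
function httpStatusCode pResponse
   ## the status line has the form: HTTP/1.1 <code> <reason phrase>
   put word 2 of line 1 of pResponse into tCode
   if tCode is a number then return tCode
   ## not an http response (or an error string was passed in)
   return empty
end httpStatusCode
-----------------------------------

In the mouseUp handler, httpStatusCode(tData) returning 200 would then 
mean the resource exists, and 404 that it doesn't. Some servers answer 
with a redirect (301 or 302), so those codes may need handling too.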

Cheers
Dave

