Naive question about interacting with web pages

Sarah Reichelt sarah.reichelt at gmail.com
Tue Aug 25 19:10:53 EDT 2009


> Unlike more or less everyone else on this list (I suspect) I am not used to
> mixing Rev and the Internet. I've been looking at the RunRev tutorial
> material about this sort of thing, and some of it I understand pretty easily
> - for example the stack for retrieving weather data from the BBC web site.
> This essentially calculates image names on the basis of a convention used by
> the BBC for the time, region or whatever and then downloads them. Fine. Now
> what about the kind of site that invites you to put in some reference (say
> it's a catalogue reference or a post code or some key or other), after which
> you activate some kind of server-side code execution by clicking a button,
> pressing return or whatever; this then generates a result, let's say an
> image of a particular product in the catalogue, which is visible via the
> user's browser; this resultant image is not available via a URL until
> explicitly retrieved or constructed for the user's benefit, for example it
> might be extracted from a database, so the BBC Weather example doesn't
> apply.
>
> Now, what if I want to use a Rev program to simulate the user interaction to
> a site like that - so that my Rev program inserts the product code, presses
> return, waits for the image to be generated, and then downloads it (not a
> screen grab, since the image might have a higher resolution than the browser
> can display)? What I need is a way of interacting with the web page(s)
> involved (or really the underlying html), almost by simulating key strokes.
> I suppose I want to treat the html as a kind of API for the facilities of
> the site.
>
> Is this possible, and is it possible without using an external browser? I
> feel it ought to be, but I just don't know where to start. Can anyone
> explain what kind of route I should follow?


This may be possible, depending on how the site handles requests.
Some sites put the input data into the URL of the results page, as a
query string.

e.g. if I want to do a Google search for chocolate, I can go to Google
and type in chocolate, or I could go directly to
<http://www.google.com/search?q=chocolate>
where you can see that "chocolate" is already part of the URL.

In Rev, I could use:
  put URL "http://www.google.com/search?q=chocolate" into tWebPage
and tWebPage would then hold the text of the returned web page.
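
If the search term comes from the user rather than being hard-coded,
the URL has to be built first. Here is a rough sketch of that; the
handler name fetchSearchPage and its variable names are made up for
illustration, and the "q=" parameter is just the Google example above,
so another site will need whatever parameter names it actually uses:

```livecode
-- Hypothetical helper: build a query URL from a search term and fetch the page.
function fetchSearchPage pTerm
   local tURL, tWebPage
   -- urlEncode escapes spaces and punctuation so the term is safe inside a URL
   put "http://www.google.com/search?q=" & urlEncode(pTerm) into tURL
   put URL tURL into tWebPage
   -- if the download fails, the result should hold an error message
   if the result is not empty then
      return empty
   end if
   return tWebPage
end function
```

You could then call it with something like:
  put fetchSearchPage("dark chocolate") into tWebPage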

Once you have the text of the page, then you can manipulate it using
the standard text chunking, filtering & searching tools.
If you are looking for an image, then you could try:
  get offset("<img src=", tWebPage)
which puts the character position of the first image tag into "it"
(or 0 if there isn't one).

If you then end up with something like:
<img src="images/pic12345.jpg">
you can pull out the "images/pic12345.jpg" part, add the root address
of the web page, to get
  http://www.website.com/images/pic12345.jpg
and then set the filename of an image in your stack to this address.
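
The extraction step above might look something like this. It is only
a sketch: the handler name firstImageURL is invented, and it assumes
the tag is written exactly as <img src="..."> with a path relative to
the root address. Real pages vary a lot (extra attributes, single
quotes, absolute paths), so expect to adjust it for the site you are
working with:

```livecode
-- Hypothetical helper: return the full URL of the first <img> tag in a page.
-- Assumes the tag has the exact form <img src="relative/path">.
function firstImageURL pWebPage, pRootURL
   local tStart, tRest, tPath
   put offset("<img src=" & quote, pWebPage) into tStart
   -- offset returns 0 when the text is not found
   if tStart is 0 then return empty
   -- keep everything from the start of the tag onwards
   put char tStart to -1 of pWebPage into tRest
   -- splitting on the quote character makes item 2 the contents of src="..."
   set the itemDelimiter to quote
   put item 2 of tRest into tPath
   return pRootURL & tPath
end function
```

With an image object in the stack (here called "Product", which is
just an example name), that could be used as:
  set the filename of image "Product" to \
        firstImageURL(tWebPage, "http://www.website.com/")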

So first, go to the web page you are trying to access and do a
search manually.
Check the URL of the results page and see if it contains the data you entered.
If it does, then use Rev to construct that URL and get the text of the web page.
Then you can examine that text and work out how to extract what you
need from it.

Hope this helps,
Sarah


