Website scraping - How can I load a 'partial' page?

Mike Bonner bonnmike at gmail.com
Wed Dec 13 10:39:41 EST 2017


I suppose one could use sockets and partial GET requests (using a Range:
header), but I suspect it would be easier to just use an intermediary
server to handle things. To test, I set up an extremely simple page with
the following:

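<?lc
-- a GET request TO my page, of the form ?page=http://url.goes.here
put $_GET["page"] into tPage
-- request the page to be scraped and return the first 6000 chars
-- (the server still downloads the whole page over its own connection;
-- only these 6000 chars are sent back to the client)
put char 1 to 6000 of url tPage
?>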
To use it, it's as simple as:

get URL "http://path.to.my.page.com/scrape.lc?page=http://server.to.scrape.com/pagetoscrape.html"
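For comparison, the socket-plus-Range idea mentioned at the start could look roughly like the sketch below, run directly from the client with no intermediary. It is a minimal sketch under stated assumptions: the handler names fetchWithRange and fetchPartialPage are hypothetical, it assumes plain HTTP on port 80 (no HTTPS, redirects or chunked decoding), and a web server is free to ignore the Range header and send the whole page anyway. The socket version is the one that actually stops the transfer early, because it hangs up after reading the part it needs.

```livecode
-- Option 1 (hypothetical helper): ask the server for only the first 6000 bytes.
-- If the server ignores Range, the whole page is still downloaded.
function fetchWithRange pUrl
   local tPage
   set the httpHeaders to "Range: bytes=0-5999"
   put url pUrl into tPage
   set the httpHeaders to empty -- don't leak the header into later requests
   return tPage
end fetchWithRange

-- Option 2 (hypothetical helper): raw socket, read the headers plus at most
-- pMaxChars of body, then close the socket so the transfer stops there.
-- Assumes plain HTTP on port 80: no HTTPS, no redirects, no chunked decoding.
function fetchPartialPage pHost, pPath, pMaxChars
   local tSocket, tRequest, tEndOfHeaders, tHeaders, tBody
   put pHost & ":80" into tSocket
   open socket to tSocket -- blocks until connected or failed
   if the result is not empty then return "error: " & the result
   put "GET" && pPath && "HTTP/1.1" & crlf & \
         "Host:" && pHost & crlf & \
         "Range: bytes=0-" & (pMaxChars - 1) & crlf & \
         "Connection: close" & crlf & crlf into tRequest
   write tRequest to socket tSocket
   -- the status line and headers end at the first blank line
   put crlf & crlf into tEndOfHeaders
   read from socket tSocket until tEndOfHeaders
   put it into tHeaders -- "206 Partial Content" here means Range was honoured
   -- read only the part of the body we care about; if the page is shorter
   -- than pMaxChars this read may have to wait for the socket timeout
   read from socket tSocket for pMaxChars chars
   put it into tBody
   -- closing the socket ends the transfer even if the server had more to send
   close socket tSocket
   return tBody
end fetchPartialPage
```

Usage would be something like put fetchPartialPage("server.to.scrape.com", "/pagetoscrape.html", 6000) into tPart.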

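If the page to be scraped is itself requested with GET parameters, it might
be better to send its URL to the intermediary with POST instead, so its
query string doesn't collide with the ?page= parameter.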
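A minimal client-side sketch of the POST variant, assuming the default form encoding of the post command; the target URL and the field name "Results" are hypothetical placeholders:

```livecode
-- Client side (e.g. in a button script): send the target URL as POST data
-- so its own ?query=string can't be confused with the intermediary's.
on mouseUp
   local tTarget, tData
   -- hypothetical target page whose URL carries its own GET parameters
   put "http://server.to.scrape.com/pagetoscrape.html?id=42&view=full" into tTarget
   -- urlEncode protects the ? and & characters inside the target URL
   put "page=" & urlEncode(tTarget) into tData
   post tData to URL "http://path.to.my.page.com/scrape.lc"
   put it into field "Results" -- the server's reply ends up in it
end mouseUp
```

On the server, the only change to scrape.lc would be reading $_POST["page"] instead of $_GET["page"].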

This way a server on a fast connection does the heavy lifting and just
sends the results back down. In fact, you could probably have the server
itself do the scraping and return only the final results (or put them into
a database, or whatever). And if you have enough control of the server and
need to scrape the same page repeatedly for changes, you could most likely
set up a cron job to do the work and a front end to pull the results. (I
don't know what your final objective is, so it's hard to say what's best.)
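As a sketch of having the server return only the final results, the scrape.lc page could cut out just the slice of the page that is needed before sending anything back. The two marker strings below are hypothetical placeholders; on a real page you would pick text that reliably brackets the data you keep.

```livecode
<?lc
-- scrape.lc variant: the server downloads the whole page but returns
-- only the text between two (hypothetical) marker strings.
put $_GET["page"] into tPage
put url tPage into tHtml
put "<!-- begin data -->" into tStartMarker
put "<!-- end data -->" into tEndMarker
put offset(tStartMarker, tHtml) into tStart
if tStart is 0 then
   put "start marker not found"
else
   -- with a skip count, offset() returns a position relative to tStart
   put offset(tEndMarker, tHtml, tStart) into tEnd
   if tEnd is 0 then
      put "end marker not found"
   else
      -- from the start marker through the last char of the end marker
      put char tStart to (tStart + tEnd + length(tEndMarker) - 1) of tHtml
   end if
end if
?>
```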



On Wed, Dec 13, 2017 at 6:39 AM, Roger Eller via use-livecode <
use-livecode at lists.runrev.com> wrote:

> I have a webpage that I grab with LiveCode, then parse out what I need.
> The data I keep is within the first 1/4th of the page.
>
> Rather than loading the entire page into a variable or a browser object,
> how can I load just the portion that I need and then stop the transmission
> instead of wasting the time and bandwidth to load the entire page?
>
> ~Roger