Getting the text content of an HTML page

Sarah Reichelt sarah.reichelt at gmail.com
Sun Aug 3 04:56:31 EDT 2008


On Sun, Aug 3, 2008 at 12:31 AM, H Baric <hbaric at gmail.com> wrote:
> Hi again *blush*
>
> Okay, this is no doubt something very simple, but I've searched through the docs and can't find exactly how to do this seemingly straightforward task:
>
> * Get the text only from a web page - no html tags, no formatting etc.
>
> I can get the html doc to appear in my field by using:
>
>   put url "http://www.thePage.com" into thePage
>
>   put thePage into field "The Page"
>
>
> (is that correct?) If so, now what? :D

Hi Heather,

Welcome to the Revolution, and please don't feel bad about asking
questions. It's great when people ask beginner-level questions, as I
think a lot of beginners don't like to ask and so get discouraged.

Your script for getting the contents of a web page is perfect.

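For transforming that to plain text, there is a trick that may work
if the web page is not too complex: load the HTML into the htmlText
of the templateField, then read back its text, which leaves the tags
behind. Try this: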

put url "http://www.thePage.com" into thePage
set the htmlText of the templateField to thePage
put the text of the templateField into field "The Page"
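
A slightly fuller sketch of the same idea, wrapped in a button
handler. The mouseUp handler, the error check on the result, and the
final reset are assumptions added for illustration; the URL and field
name are the placeholders from the question:

```livecode
on mouseUp
   -- fetch the raw HTML (placeholder URL from the question)
   put url "http://www.thePage.com" into thePage
   -- assumption: after a failed download, the result holds an error message
   if the result is not empty then
      answer "Could not load the page:" && the result
      exit mouseUp
   end if
   -- let the template field interpret the HTML, then read back plain text
   set the htmlText of the templateField to thePage
   put the text of the templateField into field "The Page"
   -- restore the templateField defaults so fields created later start clean
   reset the templateField
end mouseUp
```

The reset matters because the templateField supplies the starting
properties for any field created afterwards, so leftover text would
otherwise carry over.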

Cheers,
Sarah


