Getting the text content of a HTML page

H Baric hbaric at gmail.com
Sun Aug 3 12:52:18 EDT 2008


Ahhhh, thanks Mark. A bit scattered am I at times... okay most times... but 
your script makes perfect sense now and I'm sure it will work. Haven't tried 
it again yet though - next on Experiments-To-Do!

I've been having fun using bits and pieces from everyone's offerings the 
last couple of days, and I really have learned so much more today, with some 
previously headscratcher concepts *starting* to click, so thanks again 
everyone :)

Another question though before I close this bloomin HTML thing and do 
something different (like my abandoned Notepad app).

How do I delete lines (in the resulting text extracted from the html page) 
that contain a url in them? They appear by themselves on separate lines...

I'm currently trying this but really, as you can see, it doesn't work, and 
I am still confuzzled by how some things do/are supposed to work:

-- delete lines with a url displayed

-- or is it better/more efficient to do this while still in a variable
-- prior to putting into field?
repeat for each line tLine in field "thePage"
  if "http://" is in tLine then
    put replacetext(tLine,"http://*","") -- HALP!
  end if
end repeat

MEHHHH. LMAO!! Or should it be done with a different script altogether?

--- Side Question: When you declare "tLine" in the repeat line, what [does | 
will] this variable contain?! Every line in the field? Or just lines as per 
rest of script, or, some, thing!? (one of those headscratchers)

Cheers,
Heather
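
For anyone following the thread, here is a minimal sketch of one way to drop the URL lines, assuming every such line contains "http://". The handler names stripUrlLines and stripUrlLinesLoop are made up for illustration and are not from the thread. Two things about the attempt above seem worth noting. First, replaceText takes a regular expression rather than a filter-style wildcard, so "http://*" matches "http:/" followed by any number of slashes, not the rest of the line, and a put with no destination sends the result to the message box. Second, on the side question: in a repeat for each loop, tLine holds a copy of one line per pass (line 1, then line 2, and so on), so changing tLine does not change the field.

```livecode
-- Hypothetical helper: remove every line that contains "http://".
-- The filter command uses wildcards, where * matches any run of characters.
function stripUrlLines pText
   filter pText without "*http://*"
   return pText
end stripUrlLines

-- The same idea with an explicit loop, building a new variable
-- instead of trying to change tLine in place.
function stripUrlLinesLoop pText
   local tResult
   repeat for each line tLine in pText
      -- keep only the lines that do not mention a URL
      if "http://" is not in tLine then
         put tLine & return after tResult
      end if
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end stripUrlLinesLoop

-- Example use from a button: clean the text as a value,
-- then write the result back to the field once.
on mouseUp
   put stripUrlLines(field "thePage") into field "thePage"
end mouseUp
```

Doing the work on a variable and writing to the field once at the end is usually the quicker route, since each change to a field also has to update the display.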

-----

Heather,

My little example filters out all lines that have a < followed by a >.

I thought you were putting returns before the <'s and after the >'s,
which should make the example work, since all tags would be on their
own separate line, if the site doesn't contain any mistakes.
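
The approach described above could be sketched roughly as follows. This assumes the HTML is already in a variable; the handler name stripTagLines is hypothetical and not part of Mark's original example.

```livecode
-- Hypothetical helper: put each tag on its own line, then filter the tag lines out.
function stripTagLines pHtml
   -- add a return before every "<" and after every ">"
   replace "<" with return & "<" in pHtml
   replace ">" with ">" & return in pHtml
   -- remove lines holding a "<" followed somewhere later by a ">"
   filter pHtml without "*<*>*"
   -- remove the empty lines the replacements left behind
   filter pHtml without empty
   return pHtml
end stripTagLines
```

Without the two replace steps the whole page can end up as one long line containing both a < and a >, so the filter removes all of it. With them, the table example quoted below should come back as just the line "This is a table. Can you see this text?".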

--
Best regards,

Mark Schonewille

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http://www.salery.biz

Benefit from our inexpensive hosting services. See 
http://economy-x-talk.com/server.html
  for more info.

On 3 aug 2008, at 12:29, H Baric wrote:

> Thanks for helping me make sense of that Mark :)
>
> I tested it out on my project, and it filters out everything in the
> entire field!
>
> I thought I knew why, as the field displays:
> <table><tr><td>This is a table. Can you see this text?</td></tr></table>
>
> Everything IS in between the < and >.
>
> But then I tried adding text outside the table tags, and it still
> filtered that out too. Resulting in an empty field!
>
> :D
>
> *scratches head*
>
> Cheers,
> Heather
>
> -----
>
> Hi Heather,
>
> This ought to be
>
> filter myLines without "*<*>*"
>
> --
> Best regards,
>
> Mark Schonewille
>
> On 3 aug 2008, at 10:26, H Baric wrote:
>
>> (Re my last post), I guess what I want to know is, how to use the
>> info in the FILTER and REGULAR EXPRESSIONS etc for Processing Text /
>> Data, when there are other characters ***besides A-Z and 0-9***
>>
>> I can't find any examples that use other characters, they all use
>> letters and numbers! Everything I try doesn't work :( So I'm
>> concluding it's not possible? Which can't be...
>>
>> Halp?!
>>
>> Can someone write a line for me using the filter:
>> filter container {with | without} wildcardExpression
>>
>> but using another character(s) not just letters and numbers?
>>
>> Thanks again,
>> Heather
>
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution