Getting the text content of a HTML page

H Baric hbaric at gmail.com
Mon Aug 4 07:01:17 EDT 2008


Thanks Viktoras,

Can you (or anyone else reading who has a moment) please help me understand 
more about what each part of "</?[A-Za-z]+>" does? Actually, mostly, what 
the "/?" bit does? (I was reading the manual on this yesterday, and I'm 
confused because I thought "/" before a character means the exact literal 
character that follows it. Or is that the backslash? I haven't got it open 
to check.)
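A quick way to check the "/?" question is to try the pattern on a couple of tags. This sketch uses Python's `re` module only as a convenient test bench. The assumption is that these particular pieces (a plain "/", "?", "[A-Za-z]", "+" and the backslash escape) behave the same way in LiveCode's replaceText, which is documented as using PCRE-compatible regular expressions.

```python
import re

# Assumption: Python's re stands in for LiveCode's regex engine here;
# the constructs used below behave the same way in both.

# "/" has no special meaning in a regular expression, so it simply matches
# a slash. The "?" right after it makes that slash optional (zero or one).
TAG = re.compile(r"</?[A-Za-z]+>")

print(bool(TAG.fullmatch("<b>")))   # True: slash absent
print(bool(TAG.fullmatch("</b>")))  # True: slash present

# The BACKSLASH is the escape character: "\?" means a literal question mark,
# while a bare "?" is the "optional" operator applied to the item before it.
print(bool(re.fullmatch(r"a\?", "a?")))  # True
print(bool(re.fullmatch(r"a?", "a?")))   # False: "a?" matches only "" or "a"
```

So "/?" reads as "an optional slash", which is what lets one pattern catch both opening and closing tags.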

And I thought "?" is for just one character, unlike "*", which is for 
multiple/any?
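The "one character vs. anything" reading of "?" and "*" is how file-name wildcards (glob patterns) work. In regular expressions they are quantifiers on the preceding item instead. The sketch below contrasts the two, again assuming Python's `re` and `fnmatch` behave like the regex and wildcard matching discussed in the thread for these simple cases.

```python
import fnmatch
import re

samples = ["", "a", "aa", "aaa"]

# Regex quantifiers say how many times the item just before them may repeat:
#   a?  zero or one "a"
#   a*  zero or more "a"
#   a+  one or more "a"
results = {p: [s for s in samples if re.fullmatch(p, s)] for p in ("a?", "a*", "a+")}
for pattern, matched in results.items():
    print(pattern, matched)
# a? ['', 'a']
# a* ['', 'a', 'aa', 'aaa']
# a+ ['a', 'aa', 'aaa']

# In glob (wildcard) patterns, "?" is any single character and "*" any run of them.
print(fnmatch.fnmatchcase("cat", "c?t"))  # True
print(fnmatch.fnmatchcase("cat", "c*"))   # True

# The regex equivalents use "." for "any single character".
print(bool(re.fullmatch("c.t", "cat")))   # True
print(bool(re.fullmatch("c.*", "cat")))   # True
```

In "</?[A-Za-z]+>", the "?" applies to the "/" and the "+" applies to the whole bracketed letter set.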

And lastly, does everything between "[" and "]" mean any combination of 
the letters in that range? If so, what else could be used within the square 
brackets in other circumstances (examples?), e.g. 1-9? 3-7? A-M? p-w?! What 
about anything else, like another expression? Or is there a set of, ummm... 
"arguments" (is that the term?) that can only be used there?
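Bracket expressions can be tried out the same way. This is a sketch using Python's `re` as a stand-in (assumption: ranges, listed characters, "^" negation and literal punctuation inside brackets work the same in PCRE). The sample strings are made up for illustration.

```python
import re

# Assumption: Python's re behaves like PCRE for these bracket expressions.
text = "Room 42, level 7, Block C"

# [...] matches exactly ONE character taken from the set.
# A range such as 0-9 or A-M is shorthand for listing every character in it.
digits = re.findall(r"[0-9]", text)        # ['4', '2', '7']
some_digits = re.findall(r"[3-7]", text)   # ['4', '7']
early_caps = re.findall(r"[A-M]", text)    # ['B', 'C'] (case matters: lowercase not included)
vowels = re.findall(r"[aeiou]", text)      # ['o', 'o', 'e', 'e', 'o'] (a plain list works too)

# Put a quantifier after the brackets to match runs of such characters.
numbers = re.findall(r"[0-9]+", text)      # ['42', '7']

# A "^" immediately after "[" negates the set: any character NOT listed.
non_digits = re.findall(r"[^0-9]+", "a1b22c")  # ['a', 'b', 'c']

# Inside brackets most operators lose their special meaning, so this set is
# just three literal characters. A set holds characters, ranges and shorthand
# classes such as \d, never a whole sub-expression.
symbols = re.findall(r"[.?*]", "a.b?c*")   # ['.', '?', '*']

print(digits, some_digits, early_caps, vowels, numbers, non_digits, symbols)
```

So "[A-Za-z]+" means "one or more characters, each of which is an upper- or lower-case letter".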

Thanks in advance to you or anyone else who has time to give me a bit of an 
explanation / mini tute! The docs are great, but sometimes I wish they were 
more in-depth, with more examples and possibilities etc.

As a beginner, sometimes I read one thing and don't realise there's actually 
a whole universe under that scratch of the surface. And I know, as a 
beginner, I'm not ready to NEED most of what is there, but knowing more 
(again, more actual examples with explanations) helps to put it into 
perspective and understand the hows and whys of it. If that makes any sense! 
Which is why RevOnline is great, as are the forums, the examples and 
workshops, and of course this group!

:)

Cheers,
Heather

----- Original Message ----- 
From: "viktoras didziulis" <viktoras at ekoinf.net>
Subject: Re: Getting the text content of a HTML page


One more way to do things using regular expressions:

put the replaceText(myText,"</?[A-Za-z]+>","") into myText

will simply replace all matching tags with an empty string. Here myText is 
the text in which the replacements are made, </?[A-Za-z]+> is a regular 
expression matching simple html tags (a tag name made only of letters, with 
no attributes), and "" is the empty replacement string.
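To see what the pattern does and does not catch, here is a sketch that uses Python's `re.sub(pattern, "", text)` as a stand-in for `replaceText(text, pattern, "")` (assumption: same matching behaviour for this pattern). The broader `<[^>]+>` variant is a commonly used alternative, not something from this thread.

```python
import re

# Assumption: re.sub(pattern, "", text) does the same job as
# replaceText(text, pattern, "") in LiveCode for these patterns.

# The pattern from the replaceText call: tag names made only of letters.
SIMPLE_TAG = re.compile(r"</?[A-Za-z]+>")

# A broader, commonly used variant (NOT the one from the thread): "<", then
# one or more characters that are not ">", then ">". It also catches
# attributes and digits in tag names.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_tags(html: str, pattern: re.Pattern) -> str:
    """Replace every match of pattern with the empty string."""
    return pattern.sub("", html)


simple = "<p>Hello <b>world</b></p>"
harder = '<h1>Title</h1> <a href="x">link</a>'

print(strip_tags(simple, SIMPLE_TAG))  # Hello world
print(strip_tags(harder, SIMPLE_TAG))  # <h1>Title</h1> <a href="x">link
print(strip_tags(harder, ANY_TAG))     # Title link
```

The simple pattern leaves <h1> (digit in the name) and <a href="x"> (attribute) untouched. The broader pattern string should drop into the same replaceText call, but neither one is a real HTML parser: comments, script blocks and ">" inside attribute values can still trip them up.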

Viktoras 