Getting the text content of a HTML page
H Baric
hbaric at gmail.com
Mon Aug 4 07:01:17 EDT 2008
Thanks Viktoras,
Could you (or anyone else reading who has a moment) please help me understand
what each part of "</?[A-Za-z]+>" does? Mostly, what does the "/?" bit do?
(I was reading the manual on this yesterday and got confused, because I
thought "/" before a character means the exact literal character that
follows it. Or is that the backslash? I haven't got the manual open to
check.)
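A small sketch may help here. It is written in Python rather than LiveCode, on the assumption that these particular pieces (a literal "/", the "?" after it, the backslash escape) behave the same way in most regular-expression dialects, including the one replaceText accepts. The helper name is_simple_tag is made up for this illustration.

```python
import re

# The pattern from the quoted message. "/" has no special meaning in a
# regular expression, so it matches a literal slash; the "?" right after
# it makes that slash optional (zero or one occurrence).
TAG = re.compile(r"</?[A-Za-z]+>")

def is_simple_tag(text: str) -> bool:
    """Hypothetical helper: True if the whole string is one tag like <b> or </b>."""
    return TAG.fullmatch(text) is not None

print(is_simple_tag("<b>"))    # True: zero slashes
print(is_simple_tag("</b>"))   # True: one slash
print(is_simple_tag("<//b>"))  # False: "?" allows at most one slash

# The backslash is the escape character: "\." matches only a literal dot,
# while an unescaped "." matches any single character.
print(re.fullmatch(r"a\.b", "a.b") is not None)  # True
print(re.fullmatch(r"a\.b", "axb") is not None)  # False
print(re.fullmatch(r"a.b", "axb") is not None)   # True
```

So the "/" is just a slash, "?" makes it optional, and it is the backslash that turns a special character into a literal one.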
And I thought "?" stands for just one character, unlike "*", which stands
for multiple/any characters?
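In a regular expression, "?", "*" and "+" do not stand for characters at all; each one says how many times the item just before it may occur. The "one character" versus "any run of characters" meaning is how shell-style wildcard patterns (such as file-name globs) use "?" and "*". The Python sketch below, again standing in for LiveCode, contrasts the two using the standard re and fnmatch modules; regex_matches is a hypothetical helper.

```python
import fnmatch
import re

def regex_matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the regex matches the whole string."""
    return re.fullmatch(pattern, text) is not None

# Regex quantifiers apply to the item just before them:
#   b?  zero or one "b"
#   b*  zero or more "b"
#   b+  one or more "b"
for text in ["ac", "abc", "abbc"]:
    print(text,
          regex_matches(r"ab?c", text),   # True, True, False
          regex_matches(r"ab*c", text),   # True, True, True
          regex_matches(r"ab+c", text))   # False, True, True

# Shell-style wildcards: "?" is any one character, "*" is any run of them.
print(fnmatch.fnmatchcase("abc", "a?c"))   # True
print(fnmatch.fnmatchcase("abbc", "a?c"))  # False
print(fnmatch.fnmatchcase("abbc", "a*c"))  # True
```

In Viktoras's pattern, "/?" therefore means "an optional slash", and the "+" after [A-Za-z] means "one or more letters".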
And lastly, does everything between "[" and "]" mean any combination of any
letter in that range? If so, what else could be used within the square
brackets in other circumstances? Examples? e.g. 1-9? 3-7? A-M? p-w?! What
about anything else, like another expression? Or is there a set of ummm...
"arguments" (is that the term?) that can only be used there?
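Inside square brackets you list the characters allowed at one position: single characters, ranges such as 1-9, 3-7, A-M or p-w, or several of these side by side. The bracket as a whole still matches exactly one character; it is a quantifier after it (like the "+" in [A-Za-z]+) that allows a run. A "^" right after "[" negates the set, and most special characters such as "." or "*" become literal inside brackets, but a whole sub-expression cannot go inside. The Python sketch below shows these cases on a made-up sample string; find_all is a hypothetical wrapper around re.findall.

```python
import re

def find_all(pattern: str, text: str) -> list:
    """Hypothetical helper: every non-overlapping match, left to right."""
    return re.findall(pattern, text)

sample = "Room 42b, shelf M7, code p-w!"

print(find_all(r"[1-9]", sample))          # one digit from 1 to 9 per match
print(find_all(r"[A-M]+", sample))         # runs of capitals A..M ("R" is outside)
print(find_all(r"[p-w]", sample))          # single lowercase letters p..w
print(find_all(r"[A-Za-z0-9]+", sample))   # several ranges side by side
print(find_all(r"[aeiou]", sample))        # a plain list of characters
print(find_all(r"[^A-Za-z0-9 ]", sample))  # "^" first: anything NOT listed
print(find_all(r"[.?*]", "a.b?c*d"))       # special characters are literal here
print(find_all(r"\d+", sample))            # \d means "a digit", like [0-9] for ASCII
```

The [A-Za-z0-9]+ line, for example, should print ['Room', '42b', 'shelf', 'M7', 'code', 'p', 'w'].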
Thanks in advance to you or anyone else who has time to give me a bit of an
explanation / mini tute! The docs are great, but sometimes I wish they went
into more depth, with more examples and possibilities, etc.
As a beginner, sometimes I read one thing and don't realise there's actually
a whole universe under that scratch of the surface. And I know, as a
beginner, I'm not ready to NEED most of what is there, but knowing more
(again, more actual examples with explanations) helps to put it into
perspective and understand the hows and whys of it. If that makes any sense!
Which is why RevOnline is great, as are the forums, the examples and
workshops, and of course this group!
:)
Cheers,
Heather
----- Original Message -----
From: "viktoras didziulis" <viktoras at ekoinf.net>
Subject: Re: Getting the text content of a HTML page
One more way to do this, using regular expressions:
put replaceText(myText,"</?[A-Za-z]+>","") into myText
This replaces every simple tag with an empty string. Here myText is the
text in which the replacements are made, </?[A-Za-z]+> is a regular
expression matching simple HTML tags such as <p> or </b> (a tag name made
only of letters, with no attributes), and "" is the empty replacement string.
Viktoras
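For comparison, here is the same substitution sketched with Python's re.sub as a stand-in for replaceText (strip_simple_tags is a hypothetical helper name). It also shows where the pattern stops matching: tags with attributes, and tag names containing digits such as h1. The broader pattern <[^>]*> ("<", then any run of characters other than ">", then ">") removes those too. It is still only an approximation; a ">" inside an attribute value, for instance, would confuse it. The same pattern string could presumably be passed to replaceText unchanged.

```python
import re

# Pattern from the message above: optional "/", then one or more letters.
SIMPLE_TAG = re.compile(r"</?[A-Za-z]+>")

def strip_simple_tags(html: str) -> str:
    """Hypothetical helper: remove <name> and </name> tags, keep everything else."""
    return SIMPLE_TAG.sub("", html)

print(strip_simple_tags("<p>Hello <b>world</b></p>"))
# Hello world
print(strip_simple_tags('<a href="x.html">link</a>'))
# <a href="x.html">link      (the attribute stops the first match)
print(strip_simple_tags("<h1>Title</h1>"))
# <h1>Title</h1>             (digits are not in [A-Za-z])

# Broader, still approximate: "<", anything except ">", then ">".
ANY_TAG = re.compile(r"<[^>]*>")
print(ANY_TAG.sub("", '<h1>Title</h1> <a href="x.html">link</a>'))
# Title link
```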