HTML Tags and multiline regular expressions.

Jim Ault JimAultWins at yahoo.com
Thu Aug 10 10:43:05 EDT 2006


On 8/10/06 4:06 AM, "David Bovill" <david at openpartnership.net> wrote:
> Super cool reply!
> 
> It's often that way isn't it - blinded by science. That is just because you
> do not really understand something (regular expressions in this case) - you
> are over impressed by them.
> 
> I will give this a try. NB - any problems (given its limitations) with this
> method you've come across would be good to know. I have run into problems
> with filter before when using it to filter tables - which got very slow when
> using a few "*" - for instance?

One big caveat with filter -- it does not handle null characters.
Do 'replace null with empty' if you don't know the source of your incoming
data.  A null (ASCII 0) is commonly used to mark the end of a string in
memory, and many applications reserve it for special meanings.  Formatted
data files opened directly in Rev can contain nulls.

You can test for nulls like this:

replace cr with empty in textBlock
replace null with cr in textBlock
--more than one line means nulls are present
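
Note that the test above changes textBlock, and a single null at the very
end of the data would not add a second line, because a trailing cr does not
start a new line.  A minimal sketch of a non-destructive version, assuming a
hypothetical function name hasNulls (not a built-in):

```livecode
function hasNulls pText
   -- pText is passed by value, so the caller's data is not changed
   replace cr with empty in pText
   replace null with cr in pText
   -- any cr left at this point was originally a null
   return cr is in pText
end function
```

It could be used as:  if hasNulls(textBlock) then replace null with empty in textBlock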

As far as speed goes, you do have to consider the job at hand and how often it
needs to be run, as well as the user experience.  Do they really have to
wait while a set of filter operations is done?

I believe if you try to make your first filter rules positive (inclusion)
and operating on the beginning of the line, it will work fastest. Thus

filter textBlock with "<*"
--will cut the number of text lines that have to be considered next
put ("*"&tab) into C
filter textBlock without ("*img*"&C&C&C&C&"jpg")
--a pattern with this many wildcards means a lot of repeat loop cycles inside the filter command
--the same goes for regEx, or any text parser.

Moral:  try to chop down as much of the textBlock as possible using simple
rules, then get intricate.  And if possible, structure the data so that it
works better with the filter command (or regEx, or other text parser).
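
Since the speed claim above is a belief rather than a measurement, it may be
worth timing both orders on your own data.  A rough sketch, assuming a
hypothetical handler name compareFilterOrder and reusing the patterns from
the example above; both orders leave the same lines, only the elapsed time
differs:

```livecode
on compareFilterOrder pText
   local tC, tPattern, tCopy, tStart, tCheapFirst, tCostlyFirst
   -- the multi-wildcard pattern from the example above
   put "*" & tab into tC
   put "*img*" & tC & tC & tC & tC & "jpg" into tPattern

   -- order 1: cheap start-of-line rule first, costly rule second
   put pText into tCopy
   put the milliseconds into tStart
   filter tCopy with "<*"
   filter tCopy without tPattern
   put the milliseconds - tStart into tCheapFirst

   -- order 2: costly rule first, cheap start-of-line rule second
   put pText into tCopy
   put the milliseconds into tStart
   filter tCopy without tPattern
   filter tCopy with "<*"
   put the milliseconds - tStart into tCostlyFirst

   -- report both timings in the message box
   put "cheap first:" && tCheapFirst && "ms, costly first:" && tCostlyFirst && "ms"
end compareFilterOrder
```

On small text blocks both numbers will likely be 0 or 1, so test with data
of realistic size.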

repeat 1 time
   filter textBlock with "<*"
   if textBlock is empty then exit repeat -- we're done
   filter textBlock with "*>"
   if textBlock is empty then exit repeat -- we're done
   filter textBlock with "*img*"
   if textBlock is empty then exit repeat -- we're done
   filter textBlock with "*>*<font*"
   if textBlock is empty then exit repeat -- we're done
      --you get the idea.. why have the loop do extra work?
end repeat
--textBlock now has the residue that matches the rules

Hope this helps

Jim Ault
Las Vegas




