Selecting text using REGEX

Mark Brownell gizmotron at earthlink.net
Sun Sep 28 11:06:01 EDT 2003


On Friday, September 26, 2003, at 09:16  PM, Bojsza wrote:

> The text is always between "value=question?>"  and ends with "<" #the 
> quotes are not part of the text
>
> example
>
> value=question?> TEXT I WISH TO PARSE OUT<
>
> Any suggestions would be helpful (I have several hundred lines to 
> search through).

Hi Bojsza,

This looks like a case for a pull-parser...

Your fragment looks like part of a fuller tag set in which the 
attribute "value" always appears at the end of the start tag.
Example: <grabTag value=question?> TEXT I WISH TO PARSE OUT</grabTag>

However, the same "value=question?>" fragment could belong to several 
different tag sets.
Example: <grabTag value=question?> TEXT I WISH TO PARSE OUT</grabTag>
Example: <findTag value=question?> TEXT I WISH TO PARSE OUT</findTag>
Example: <dumpTag value=question?> TEXT I WISH TO PARSE OUT</dumpTag>

This technique depends on the text being well formed in one respect: 
no other tag may start or end inside the text you want, because the 
parser stops at the first "<" it meets after the fragment. In the 
following example it would return only " TEXT I WISH TO ".
Example: value=question?> TEXT I WISH TO </b> PARSE OUT<

I would use a pair of offset() calls inside a loop to build an array of 
results. The array ends up keyed numerically, with the text from the 
first instance of the fragment stored under key 1.

If it turns out that you are using several different tag names and 
need to tell their results apart, combine the numerical value and the 
tag name when you build each array key.
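
One possible way to do that, as a rough sketch rather than tested 
code: look backwards from the fragment for the nearest "<" and take the 
first word after it as the tag name. The function name tagNameBefore 
and the "," separator in the key are assumptions made for illustration, 
and the sketch assumes every fragment really sits inside a start tag.

   function tagNameBefore pText, pPos
      -- pPos is the position of the "v" in "value=question?>"
      put char 1 to (pPos - 1) of pText into tBefore
      set the itemDelimiter to "<"
      -- the last item is whatever follows the most recent "<"
      return word 1 of the last item of tBefore
   end tagNameBefore

In the loop of the script below, tStart1 holds that position, so the 
line that fills the array could become something like:

   put zapped into tArray[tElementNum & "," & tagNameBefore(tText, tStart1)]

Copying the leading text on every pass is wasteful on a large field, so 
treat this as a starting point only.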

pull-parser:

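   put the text of field "targetText" into tText
   put empty into tArray

   put 0 into tStart1   -- absolute position of the latest fragment found
   put 1 into tElementNum
   put the number of chars in "value=question?>" into dChars
   repeat
     -- offset() skips tStart1 chars and returns a position relative to the skip
     put offset("value=question?>", tText, tStart1) into tNum1
     if tNum1 < 1 then exit repeat   -- no more fragments
     put (tNum1 + tStart1) into tStart1
     -- find the first "<" after the start of the fragment
     put offset("<", tText, tStart1) into tNum2
     if tNum2 < 1 then exit repeat   -- fragment with no closing "<"
     put (tNum2 + tStart1) into tStart2
     -- the wanted text lies between the end of the fragment and the "<"
     put char (tStart1 + dChars) to (tStart2 - 1) of tText into zapped
     put zapped into tArray[tElementNum]
     add 1 to tElementNum
   end repeat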

You will get an array, tArray, that is either empty or is filled with 
results.
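
To read the results back in order, something like the following sketch 
should do (tCount, i and tList are names made up here). It counts the 
keys and walks them numerically, because the keys of an array are not 
returned in any guaranteed order. It assumes the plain numeric keys 
used in the script above.

   put the number of lines of the keys of tArray into tCount
   put empty into tList
   repeat with i = 1 to tCount
      put tArray[i] & return after tList
   end repeat
   delete the last char of tList   -- drop the trailing return

If nothing was found, tCount is 0 and tList stays empty.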

There is probably a regEx way, but I have found that the offset() 
approach tends to be faster in most speed tests.
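
For comparison, a regEx version might look roughly like the sketch 
below. It uses matchChunk(), which reports the start and end positions 
of the parenthesized part of the pattern. The function name 
extractWithRegex and the variable names are assumptions, and it has not 
been timed against the offset() loop.

   function extractWithRegex pText
      put empty into tArray
      put 0 into tCount
      -- create the position variables before matchChunk fills them
      put 0 into tFrom
      put 0 into tTo
      repeat while matchChunk(pText, "value=question\?>([^<]*)<", tFrom, tTo)
         add 1 to tCount
         put char tFrom to tTo of pText into tArray[tCount]
         -- drop everything through the captured text, then search again
         delete char 1 to tTo of pText
      end repeat
      return tArray
   end extractWithRegex

It could be called like this:

   put extractWithRegex(the text of field "targetText") into tArray

Deleting from the front of the text on every pass copies the remainder 
each time, which may be one reason a regEx loop of this kind loses to 
offset() on long text.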

Mark



