regex question

J. Landman Gay jacque at hyperactivesw.com
Sun May 18 13:21:38 EDT 2008


jbv wrote:
> Hi list,
> 
> Does anyone have a regex to remove any text between parentheses
> (including the parentheses)?
> 
> I've tried many options, but none seems to work perfectly...
> 
> For instance:
> "\(\)" only removes parentheses with no text inside
> 
> I've also tried "\([a-z 0-9 ]+\)", but no luck...

In my RevLive presentation I addressed this kind of processing and 
showed the results of a number of speed tests. In those tests, a 
"repeat for each" loop was almost 200 times faster than a regular 
expression. My test did something very similar to what you want to do: 
I was removing all HTML from a web page, but it would be just as easy 
to search for parentheses instead of the "<" and ">" characters.
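For comparison, here is a sketch of the regex route the original question asked about, assuming LiveCode's built-in `replaceText` function (PCRE syntax, every match replaced). The pattern `\([^()]*\)` matches an opening parenthesis, any run of non-parenthesis characters (possibly none), and a closing parenthesis. The quoted attempt `\([a-z 0-9 ]+\)` fails on empty parentheses because of the `+`, and on any contents with characters outside that class, such as punctuation. The name `removeParensRegex` is hypothetical.

```livecode
-- Hypothetical helper: strip "(...)" groups with a regular expression.
-- Assumes replaceText(string, pattern, replacement) replaces all matches.
-- Each pass removes only the innermost groups, so nested parentheses such
-- as "(a (b) c)" are cleared by repeating until the text stops changing.
function removeParensRegex pText
   local tPrevious
   repeat
      put pText into tPrevious
      put replaceText(pText, "\([^()]*\)", empty) into pText
      if pText is tPrevious then exit repeat
   end repeat
   return pText
end removeParensRegex
```

For example, `removeParensRegex("keep (drop) this")` should return `keep  this`, with the spaces on either side of the removed group left in place.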

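The key is to use the offset function with its "skip" parameter to 
find the first character (the left parenthesis in your example), then 
get the offset of the second character (the right parenthesis) and 
keep only the text outside the pair. Note that when a skip count is 
given, offset returns a position counted from the end of the skipped 
characters, so the skip count has to be added back to get the position 
within the whole string.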
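A small illustration of that skip arithmetic, written as a hypothetical helper (not part of the original test code) that returns the contents of the first parenthesized group:

```livecode
-- Hypothetical helper illustrating offset() with its charsToSkip parameter.
-- offset(x, s, n) searches s after its first n chars and returns a position
-- counted from that point (0 if not found), so n must be added back to get
-- an absolute character position.
function firstParenContents pText
   local tOpen, tClose
   put offset("(", pText) into tOpen
   if tOpen = 0 then return empty
   put offset(")", pText, tOpen) into tClose
   if tClose = 0 then return empty
   add tOpen to tClose -- convert the relative position to an absolute one
   return char tOpen+1 to tClose-1 of pText
end firstParenContents
```

With "ab (cd) ef (gh)", the first call returns 4 and the second returns 3 (counted from character 5), so tClose becomes 7 and the function returns "cd".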

Here is my test example which should be easy for you to modify:

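function removeRepeat pData
   local tNewData, tSkip, tStart, tEnd, l
   -- strip everything between "<" and ">" (inclusive) on each line
   repeat for each line l in pData
      put 0 into tSkip -- number of chars of l already processed
      repeat
         -- offset() returns a position relative to the skipped chars
         put offset("<",l,tSkip) into tStart
         if tStart = 0 then
            -- no more tags: keep the rest of the line
            put char tSkip+1 to -1 of l after tNewData
            exit repeat
         end if
         -- keep the text before the tag, replacing the tag with a space
         put char tSkip+1 to tSkip+tStart-1 of l & space after tNewData
         add tStart to tSkip
         put offset(">",l,tSkip) into tEnd
         if tEnd = 0 then exit repeat -- unclosed tag: drop the rest
         add tEnd to tSkip
      end repeat
      put cr after tNewData
   end repeat
   filter tNewData without empty
   return tNewData
end removeRepeat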
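A sketch of the modification for parentheses, assuming single-character delimiters passed in as the hypothetical parameters `pOpen` and `pClose`. It keeps the text after the last closing delimiter as well as lines that contain no delimiters at all. It drops the `& space` padding (useful for HTML, where tags often separate words) and the final `filter`, so blank lines survive. Nested groups are not handled.

```livecode
-- Hypothetical adaptation: remove text between single-char delimiters,
-- e.g. removeBetween(tText, "(", ")"). Nested groups are not handled:
-- "(a (b) c)" leaves " c)".
function removeBetween pData, pOpen, pClose
   local tNewData, tSkip, tStart, tEnd, tLine
   repeat for each line tLine in pData
      put 0 into tSkip -- chars of tLine already processed
      repeat
         put offset(pOpen, tLine, tSkip) into tStart
         if tStart = 0 then
            -- no more opening delimiters: keep the rest of the line
            put char tSkip+1 to -1 of tLine after tNewData
            exit repeat
         end if
         -- keep the text before the opening delimiter
         put char tSkip+1 to tSkip+tStart-1 of tLine after tNewData
         add tStart to tSkip
         put offset(pClose, tLine, tSkip) into tEnd
         if tEnd = 0 then exit repeat -- unclosed group: drop the rest
         add tEnd to tSkip
      end repeat
      put cr after tNewData
   end repeat
   return tNewData
end removeBetween
```

For example, `removeBetween("keep (drop) this", "(", ")")` should return `keep  this` followed by a return character.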

-- 
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com



More information about the use-livecode mailing list