regex question in matchChunk function

Peter Brigham MD pmbrig at gmail.com
Tue Dec 15 17:21:29 EST 2009


Here is one way. These are utility functions I use constantly for text
processing. offsets(str,cntr) returns a comma-delimited list of all
the offsets of str in cntr. lineOffsets(str,cntr) does the same with
lineoffsets. Then you can iterate over the list of offsets to do
whatever you want to each instance of str in cntr. I keep them in a
utility stack that is in the stacksInUse, so they are available to all
stacks. I don't use regex, as I have never gotten the regex syntax to
stick in my head firmly enough to find it natural, and in any case
doing it by script turns out, in my experience, to be as fast or faster.

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig

---------

function offsets str,cntr
    -- returns a comma-delimited list of
    -- all the offsets of str in cntr
    put "" into oList
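    put 0 into startPoint -- number of chars to skip before searching
    repeat
       -- offset() counts from the first char after the skipped ones
       put offset(str,cntr,startPoint) into os
       if os = 0 then exit repeat
       add os to startPoint -- now the absolute position of this match
       put startPoint & "," after oList
    end repeat
    if char -1 of oList = "," then delete last char of oList
    if oList = "" then return "0"
    return oList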
end offsets

function lineOffsets str,cntr
    -- returns a comma-delimited list of
    -- all the lineoffsets of str in cntr
    put offsets(str,cntr) into charList
    if charList = "0" then return "0"
    put "" into oList
    repeat for each item n in charList
       -- the line number is the count of lines up to and including char n
       put the number of lines of (char 1 to n of cntr) \
                & "," after oList
    end repeat
    if char -1 of oList = "," then delete char -1 of oList
    return oList
end lineOffsets

---------
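
For the boxing problem in the quoted message below, a minimal, untested sketch of how the list from offsets() might be used. The handler name boxWords is hypothetical and chosen only for illustration; the field name "StoryText", the variable names and the style logic are taken from Chris's script. The whole-word test treats only A-Z and a-z as word characters, which is an assumption and will likely need adjusting for accented letters, apostrophes and the like.

```livecode
on boxWords tDiffWords
   -- hypothetical handler: boxes every whole-word instance of each
   -- line of tDiffWords in fld "StoryText"
   put fld "StoryText" into tStoryText
   repeat for each line tWord in tDiffWords
      if tWord is empty then next repeat
      put offsets(tWord, tStoryText) into tList
      if tList = "0" then next repeat
      put the number of chars of tWord into tLen
      repeat for each item tStartChar in tList
         put tStartChar + tLen - 1 into tEndChar
         -- assumption: only A-Z and a-z count as word characters;
         -- skip any match that sits inside a longer word
         if tStartChar > 1 and \
               matchText(char (tStartChar - 1) of tStoryText, "[A-Za-z]") \
               then next repeat
         if matchText(char (tEndChar + 1) of tStoryText, "[A-Za-z]") \
               then next repeat
         -- same style handling as in the quoted script
         put the textStyle of char tStartChar to tEndChar \
               of fld "StoryText" into tStyle
         if tStyle is empty or tStyle is "plain" then
            put "box" into tStyle
         else
            put comma & "box" after tStyle
         end if
         set the textStyle of char tStartChar to tEndChar \
               of fld "StoryText" to tStyle
      end repeat
   end repeat
end boxWords
```

It would be called as: boxWords tDiffWords. Since offset() ignores case unless the caseSensitive property is set to true, this should match the same way as the "(?i)" in the regex version.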

On Dec 15, 2009, at 1:46 PM, Chris Sheffield wrote:

> I am not very familiar with regular expressions, and I'm wondering  
> if someone more knowledgeable could give me a hint as to how to  
> accomplish this.
>
> Given a passage of text, I need to find every instance of certain  
> words within that text and draw a box around them. The box drawing I  
> can handle just fine by including "box" in the textStyle of the  
> found chunk. But it's finding the instances that I'm struggling  
> with. Here is my code. Big warning! This should not be run as is, if  
> anyone wants to attempt it. The second repeat will go forever.
>
> repeat for each line tWord in tDiffWords
>        repeat until matchChunk(tStoryText, "(?i)\b(" & tWord & ")\b", tStartChar, tEndChar) is false
>
>            put the textStyle of char tStartChar to tEndChar of fld "StoryText" into tStyle
>            if tStyle is empty or tStyle is "plain" then
>                put "box" into tStyle
>            else
>                put comma & "box" after tStyle
>            end if
>            set the textStyle of char tStartChar to tEndChar of fld "StoryText" to tStyle
>
>        end repeat
>    end repeat
>
> What I need is some way to use the matchChunk function and continue  
> the search where the last search ended. I read through some regex  
> documentation and came across "\G", but this doesn't seem to work in  
> Rev. But maybe I'm not putting it in the right place in my search  
> string.
>
> Can anyone help? Is there a way to do this? Or can someone recommend  
> another method of accomplishing the same thing? Keep in mind that  
> this needs to search whole words in a story passage, and we're  
> dealing with all kinds of punctuation here, including hyphens, em  
> dashes, etc.
>
> Thanks,
> Chris
>
> --
> Chris Sheffield
> Read Naturally, Inc.
> www.readnaturally.com



