Small regex project for pay [CLOSED]

Peter M. Brigham pmbrig at gmail.com
Thu Mar 17 10:23:14 EDT 2016


On Mar 17, 2016, at 8:20 AM, David Bovill wrote:

> Hi Peter, any chance of sharing it?

Sure. Below is the offsets function, which returns all the offsets of a string in a container. With that in place, all you have to do is something like this:

function getStringChunks pSearchStr,pText,beginsWholeWord,endsWholeWord
   -- returns one line per match, in the form startChar,endChar
   -- returns empty if pSearchStr is not found
   if beginsWholeWord = empty then put false into beginsWholeWord
   if endsWholeWord = empty then put false into endsWholeWord
   -- default to simple offsets, not whole word offsets
   put offsets(pSearchStr,pText) into offSts
   if offSts = 0 then return empty -- offsets() returns 0 when there is no match
   replace comma with cr in offSts
   put len(pSearchStr) into strLen
   put len(pText) into textLen
   put cr & space & tab & numToCodepoint(160) into wSpace
   -- numToCodepoint(160) is the non-breaking space (needs LC 7 or later)
   repeat for each line i in offSts
      put char i-1 of pText into charBefore
      put char i+strLen of pText into charAfter
      -- the start and end of pText count as word boundaries
      if beginsWholeWord and i > 1 and not (charBefore is in wSpace) then next repeat
      if endsWholeWord and i+strLen <= textLen and not (charAfter is in wSpace) then next repeat
      put i & comma & (i+strLen-1) & cr after outList
   end repeat
   return line 1 to -1 of outList -- drops the trailing cr
end getStringChunks

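Pass beginsWholeWord = true and endsWholeWord = true for whole-word matches. The function returns one line per match, each line holding the start and end character positions separated by a comma.
It might not be really fast for pText of 100K+ characters, but should be quite efficient on smaller texts. Often LC's chunking functions are faster than regex anyway.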
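
As a rough illustration of how the returned chunks might be used, here is a minimal sketch. The mouseUp handler, the sample text, and the variable names tText, tChunks, tChunk and tFound are made up for the example. It assumes both getStringChunks and offsets are in the message path and that strict compilation is off.

```livecode
on mouseUp
   -- hypothetical sample text: "cat" occurs at char 5 (a whole word)
   -- and at char 15 (inside "concat")
   put "the cat in concat sat" into tText

   -- whole-word search: only the standalone "cat" qualifies, giving "5,7"
   put getStringChunks("cat", tText, true, true) into tChunks

   -- each line is startChar,endChar, so it can be fed back into a char chunk
   repeat for each line tChunk in tChunks
      put char (item 1 of tChunk) to (item 2 of tChunk) of tText & cr after tFound
   end repeat
   answer tFound
end mouseUp
```

Without the two whole-word flags, the same call would also report the match inside "concat", returning the lines 5,7 and 15,17.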

---------

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --     ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --     use "item -1 of offsets(...)"
   
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets
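
A few sample calls may make the behaviour easier to check. This is only a sketch: the mouseUp handler and the variable names are hypothetical, and it assumes strict compilation is off. The values in the comments come from tracing the loop by hand. The third parameter of offset() is the number of characters to skip, and the result is counted from that point, which is why thisOffset is added to startPoint.

```livecode
on mouseUp
   -- "banana" contains "an" at chars 2 and 4
   put offsets("an","banana") into tList -- "2,4"

   -- last occurrence, as suggested in the function's comments
   put item -1 of tList into tLast -- 4

   -- overlapping matches are skipped
   put offsets("xx","xxxxxx") into tNoOverlap -- "1,3,5"

   -- no match at all
   put offsets("q","banana") into tMissing -- 0

   answer tList & cr & tLast & cr & tNoOverlap & cr & tMissing
end mouseUp
```

Because a failed search returns 0 and not empty, a caller that loops over the result should test for 0 first.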

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig





More information about the use-livecode mailing list