Small regex project for pay [CLOSED]
Peter M. Brigham
pmbrig at gmail.com
Thu Mar 17 10:23:14 EDT 2016
On Mar 17, 2016, at 8:20 AM, David Bovill wrote:
> Hi Peter, any chance of sharing it?
Sure. The offsets function at the bottom of this message returns all the offsets of a string in a container. With it in place, all you have to do is something like this:
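function getStringChunks pSearchStr,pText,beginsWholeWord,endsWholeWord
   -- returns a cr-delimited list of "start,end" char ranges where pSearchStr occurs in pText
   -- returns empty if there are no qualifying matches
   -- default to simple offsets, not whole word offsets
   if beginsWholeWord is empty then put false into beginsWholeWord
   if endsWholeWord is empty then put false into endsWholeWord
   put offsets(pSearchStr,pText) into offSts
   if offSts = 0 then return empty -- offsets() returns 0 when there is no match
   replace comma with cr in offSts
   put len(pSearchStr) into strLen
   -- word separators: return, space, tab, and non-breaking space (U+00A0)
   put cr & space & tab & numToCodepoint(160) into wSpace
   repeat for each line i in offSts
      put char i-1 of pText into charBefore
      put char i+strLen of pText into charAfter
      -- an empty neighbor means the start or end of pText, which counts as a word boundary
      if beginsWholeWord and charBefore is not empty and charBefore is not in wSpace then next repeat
      if endsWholeWord and charAfter is not empty and charAfter is not in wSpace then next repeat
      put i & comma & (i+strLen-1) & cr after outList
   end repeat
   return line 1 to -1 of outList -- delete trailing cr
end getStringChunks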
Pass beginsWholeWord = true and endsWholeWord = true to get whole-word matches only.
It may not be very fast for pText of 100K+ characters, but it should be quite efficient on smaller texts. LC's chunking functions are often faster than regex anyway.
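For anyone who wants to check the whole-word logic outside LiveCode, here is a rough Python transliteration of getStringChunks. It is a sketch, not Peter's code: the names get_string_chunks and WORD_SEPARATORS are hypothetical, it returns a list of 1-based (start, end) tuples instead of cr-delimited text, and unlike LC's default it is always case-sensitive.

```python
from typing import List, Tuple

# Assumed separator set, mirroring wSpace in the LiveCode version:
# line feed (LiveCode's cr), space, tab, and non-breaking space (U+00A0).
WORD_SEPARATORS = "\n \t\u00a0"


def get_string_chunks(search: str, text: str,
                      begins_whole_word: bool = False,
                      ends_whole_word: bool = False) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) char ranges of search in text."""
    chunks: List[Tuple[int, int]] = []
    if not search:
        return chunks  # assumption: an empty search string matches nothing
    pos = text.find(search)
    while pos != -1:
        end = pos + len(search)  # 0-based index just past the match
        before = text[pos - 1] if pos > 0 else ""
        after = text[end] if end < len(text) else ""
        # An empty neighbor means the start or end of text, which counts as a boundary
        ok_start = not begins_whole_word or before == "" or before in WORD_SEPARATORS
        ok_end = not ends_whole_word or after == "" or after in WORD_SEPARATORS
        if ok_start and ok_end:
            chunks.append((pos + 1, end))  # same as i & comma & (i+strLen-1)
        pos = text.find(search, end)  # continue after this match: no overlaps
    return chunks


print(get_string_chunks("cat", "concatenate the cat"))              # [(4, 6), (17, 19)]
print(get_string_chunks("cat", "concatenate the cat", True, True))  # [(17, 19)]
```

Note that punctuation is not treated as a word boundary, so in "cat. cat" only the second "cat" is a whole-word match; add punctuation characters to the separator set if you need regex-style \b behavior.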
---------
function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   -- i.e., overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   -- use "item -1 of offsets(...)"
   -- note: like offset(), matching follows the caseSensitive property (false by default)
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      -- offset() skips the first startPoint chars and returns a position relative to them
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint -- startPoint is now the absolute offset of this match
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint -- resume searching after the end of this match
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets
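The same non-overlapping scan can be sketched in Python for comparison. This is an illustrative port, not part of the original post: str.find's start argument plays the role of offset()'s charsToSkip, positions are converted to LiveCode's 1-based numbering, and the hypothetical function returns an empty list rather than 0 when nothing is found.

```python
from typing import List


def offsets(needle: str, haystack: str) -> List[int]:
    """1-based start positions of all non-overlapping occurrences of needle."""
    if not needle:
        return []  # assumption: an empty needle matches nothing
    result: List[int] = []
    start = 0  # 0-based index where the next search begins
    while True:
        idx = haystack.find(needle, start)
        if idx == -1:
            break
        result.append(idx + 1)  # convert to LiveCode's 1-based char numbering
        start = idx + len(needle)  # skip past this match, so overlaps are not counted
    return result


print(offsets("xx", "xxxxxx"))                # [1, 3, 5]
print(offsets("cat", "concatenate the cat"))  # [4, 17]
```

Advancing by the full needle length is what makes "xx" in "xxxxxx" yield 1, 3, 5; advancing by one character instead would report all five overlapping positions.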
-- Peter
Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig