Pattern Matching in Livecode
Peter M. Brigham
pmbrig at gmail.com
Wed Nov 9 10:38:39 EST 2016
On Nov 9, 2016, at 7:46 AM, Alejandro Tejada <capellan2000 at gmail.com> wrote:
>
> Hi all,
>
> Recently I made a very long script for searching 3 words among a list
> of 1080 lines of words.
>
> Download the zipped stack MatchingPatternsv02
> from this forum thread:
> http://forums.livecode.com/viewtopic.php?f=7&t=28288
>
> I suspect that LiveCode provides better tools for this task, but I
> don't know which are and how to use them. Maybe a simpler solution is
> to employ a regex, arrays operation or a really clever handler.
>
> How many different methods (functions and commands) provides Livecode
> to make this task of comparing and finding 3 words (taken from a list
> of 12 words) among 1080 lines of 4 words?
Here’s one way, using some text-munching utility functions. The list of handlers below is longish, but as you can see, the main function find3words() is pretty compact. Once you have the utilities in a library, you can reuse them as shortcuts in all kinds of contexts.
You might be able to do this with a regex, I don’t know; I’m allergic to regex. I like to work in pure LC.
-- Peter
Peter M. Brigham
pmbrig at gmail.com
--------
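-- the following function is not tested:
function find3words pList, pWords
   -- returns a comma-delimited list of the line numbers in pList that
   -- contain all 3 words of pWords as whole words, or 0 if none do
   -- assumes pWords and each line of pList are space-delimited;
   -- for comma-delimited data use "item"/"items" in place of "word"/"words"
   -- requires lineOffsets(), intersectLists()
   repeat with w = 1 to 3
      put lineOffsets(word w of pWords,pList) into A[w]
      if A[w] = 0 then return 0 -- this word appears nowhere in pList
   end repeat
   -- pass comma explicitly: a one-item list holds no delimiter to detect
   put intersectLists(A[1],A[2],comma) into out1
   put intersectLists(out1,A[3],comma) into outlist
   -- now have to check that we have found whole words, not just part of a word
   repeat for each item i in outlist
      put true into allFound
      repeat with w = 1 to 3
         if word w of pWords is not among the words of line i of pList then
            put false into allFound
            exit repeat
         end if
      end repeat
      if allFound then put i & comma after finalList
   end repeat
   if finalList = empty then return 0
   return item 1 to -1 of finalList -- delete trailing comma
end find3words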
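
A quick way to try find3words() is a button script along these lines. This is only a sketch: the sample list is made up for illustration, the data is assumed to be space-delimited, and the handlers in this message are assumed to be in the message path. The intent is that lines 1 and 2 are reported (in no guaranteed order) and that line 4 is rejected, because it contains "greenish" rather than the whole word "green".

```livecode
on mouseUp
   -- made-up sample data: four lines of four space-delimited words
   put "red green blue gold" & cr & \
         "blue red pink green" & cr & \
         "gold pink red cyan" & cr & \
         "greenish red blue teal" into tList
   put find3words(tList, "red green blue") into tFound
   -- tFound should list the lines that hold all three words, or 0
   answer "Matching lines:" && tFound
end mouseUp
```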
function lineOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the lineOffsets of str
   --    in pContainer
   -- returns 0 if str is not found
   -- if matchWhole = true then only whole lines are located
   --    else finds line matches everywhere str is part of a line in pContainer
   -- duplicates are stripped out
   -- note: to get the last lineOffset of a string in a container (often useful)
   --    use "item -1 of lineOffsets(...)"
   -- requires offsets()
   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = "0" then return "0"
   repeat for each item i in offList
      put the number of lines of (char 1 to i of pContainer) into lineNbr
      if matchWhole then
         if line lineNbr of pContainer <> str then next repeat
      end if
      put 1 into A[lineNbr]
      -- using an array avoids duplicates
   end repeat
   put the keys of A into lineList
   if lineList = empty then return "0" -- matchWhole rejected every hit
   sort lines of lineList ascending numeric
   replace cr with comma in lineList
   return lineList
end lineOffsets
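
To show what the matchWhole parameter changes, here is a small sketch. The fruit list and variable names are made up, and lineOffsets() and offsets() are assumed to be in the message path.

```livecode
on mouseUp
   put "apple" & cr & "pineapple" & cr & "apple pie" into tFruit
   -- "apple" occurs somewhere in every line
   put lineOffsets("apple",tFruit) into tAnywhere -- "1,2,3"
   -- only line 1 consists of exactly "apple"
   put lineOffsets("apple",tFruit,true) into tWholeLine -- "1"
   answer tAnywhere & cr & tWholeLine
end mouseUp
```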
function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if str is not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --    i.e., overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --    use "item -1 of offsets(...)"
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets
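
The notes in the offsets() comments can be checked with a few calls like these. This is a sketch that assumes offsets() is in the message path; the variable names are arbitrary.

```livecode
on mouseUp
   -- overlapping matches are skipped
   put offsets("xx","xxxxxx") into tAll -- "1,3,5"
   -- last occurrence: "an" is at chars 2 and 4 of "banana"
   put item -1 of offsets("an","banana") into tLast -- 4
   -- a string that does not occur gives 0
   put offsets("q","banana") into tNone -- 0
   answer tAll & cr & tLast & cr & tNone
end mouseUp
```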
function intersectLists listA, listB, pDelim
   -- returns the intersection of two lists, i.e., a list of items/lines common to both
   -- if pDelim = empty then looks first for the presence of cr in the lists,
   --    if found, defaults to cr as the delimiter
   -- if no cr found, looks for the presence of comma in the lists,
   --    if found, defaults to comma as the delimiter
   -- if neither found, returns empty (user should have specified another delim)
   -- note: two one-item lists contain no delimiter, so pass pDelim explicitly
   --    if the lists may be that short
   -- order of items may be changed, result may require sorting
   -- by Peter M. Brigham, pmbrig at gmail.com -- freeware
   -- the idea of using "split tArray with pDelim and pDelim"
   --    comes from Peter Hayworth on the use-LC list -- it's very clever!
   -- requires getDelimiters(), noDupes
   if listA = empty or listB = empty then return empty
   if pDelim = empty then
      if listA & listB contains cr then
         put cr into pDelim
      else if listA & listB contains comma then
         put comma into pDelim
      else
         return empty
      end if
   end if
   noDupes listA,pDelim
   noDupes listB,pDelim
   put getDelimiters(listA & listB) into tempDelim
   if tempDelim begins with "Error" then return "Error in getDelimiters()"
   split listA with pDelim and pDelim
   split listB with pDelim and pDelim
   intersect listA with listB
   combine listA with pDelim and tempDelim
   replace tempDelim with empty in listA
   return listA
end intersectLists
function getDelimiters pText, nbrNeeded
   -- returns a cr-delimited list of <nbrNeeded> characters
   --    none of which are found in the variable pText
   -- use for delimiters for, e.g., parsing text files, manipulating arrays, etc.
   -- usage: put getDelimiters(pText,2) into tDelims
   --        if tDelims begins with "Error" then exit to top -- or whatever
   --        put line 1 of tDelims into lineDivider
   --        put line 2 of tDelims into itemDivider
   --        etc.
   if pText = empty then return "Error: no text specified."
   if nbrNeeded = empty then put 1 into nbrNeeded -- default 1 delimiter
   put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
   -- low ASCII values, excluding CR, LF, tab, etc.
   put the number of items of baseList into maxNbr
   if nbrNeeded > maxNbr then return "Error: max" && maxNbr && "delimiters."
   repeat for each item testCharNbr in baseList
      put numtochar(testCharNbr) into testChar
      if testChar is not in pText then
         -- found one, store and get next delim
         put testChar & cr after delimList
         if the number of lines of delimList = nbrNeeded then
            return line 1 to -1 of delimList -- done
         end if
      end if
   end repeat
   -- if we got this far, there was an error
   put the number of lines of delimList into totalFound
   if totalFound = 0 then
      return "Error: cannot get any delimiters."
   else if totalFound = 1 then
      return "Error: can only get 1 delimiter."
   else
      return "Error: can only get" && totalFound && "delimiters."
   end if
end getDelimiters
on noDupes @pList, pDelim
   -- strips duplicate (and empty) lines/items from a list
   -- note: pList is referenced, so the original list will be changed.
   -- if pDelim = empty then looks first for the presence of cr in pList,
   --    if found, defaults to cr as the delimiter
   -- if no cr found, looks for the presence of comma in pList,
   --    if found, defaults to comma as the delimiter
   -- if neither found, exits without changing pList
   --    (user should have specified another delim)
   -- note: the order of the list will likely be changed, may require sorting
   -- note: the split command is inherently case-sensitive
   --    (irrespective of the value of the caseSensitive property),
   --    so "Chuck" and "chuck" will not be considered duplicates
   --    if you need case insensitive, use the noDupes() function instead
   --    this command scales better with very large lists than noDupes()
   -- note: pDelim could be a string of characters, so you could do:
   --    put "apple and orange and pear and orange and banana and apple" into pList
   --    noDupes pList," and "
   -- after which pList will be: "pear and banana and apple and orange"
   -- thanks to Peter Hayworth of the use-LC mailing list --
   --    the idea of using "split tArray with pDelim and pDelim" is very clever!
   -- adjusted by Peter M. Brigham, pmbrig at gmail.com
   -- requires getDelimiters()
   if pDelim = empty then
      if cr is in pList then
         put cr into pDelim
      else if comma is in pList then
         put comma into pDelim
      else
         answer "noDupes: no delimiter specified" as sheet
         exit noDupes
      end if
   end if
   put getDelimiters(pList) into tempDelim
   -- leave pList unchanged if no spare delimiter character is available
   if tempDelim begins with "Error" then exit noDupes
   replace pDelim with tempDelim in pList
   split pList by tempDelim and tempDelim
   put the keys of pList into pList
   filter pList without empty
   replace cr with pDelim in pList
end noDupes
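
For anyone who does want to try the regex route mentioned at the top of this message, here is a minimal, untested sketch using the built-in matchText() function. The handler name find3wordsRegex is hypothetical, not part of the library above. It assumes space-delimited words that contain no regex metacharacters, and unlike "is among", the pattern match is case-sensitive.

```livecode
function find3wordsRegex pList, pWords
   -- hypothetical alternative to find3words(), not tested
   -- returns a comma-delimited list of line numbers, or 0 if none match
   put 0 into lineNbr
   repeat for each line thisLine in pList
      add 1 to lineNbr
      put true into allFound
      repeat with w = 1 to 3
         -- \b anchors the match at word boundaries,
         -- so "green" does not match inside "greenish"
         put "\b" & word w of pWords & "\b" into tPattern
         if not matchText(thisLine, tPattern) then
            put false into allFound
            exit repeat
         end if
      end repeat
      if allFound then put lineNbr & comma after finalList
   end repeat
   if finalList = empty then return 0
   return item 1 to -1 of finalList -- delete trailing comma
end find3wordsRegex
```

This makes a single pass over the list and needs none of the utility handlers, at the cost of running up to three pattern matches per line.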