problem with counting words
Peter M. Brigham
pmbrig at gmail.com
Fri Oct 17 15:39:48 EDT 2014
Continuing the extension of the itemDelimiter: Richard's original nDepth() function can now be expanded by using getItem(), so you can get nested "items" with different itemDelimiters (any or all of which may be text strings instead of single characters). I renamed Richard's function to something more easily remembered:
function getNestedItems
   -- allows specifying different delimiters to extract chunks
   -- e.g. put "aaa,bbb,ccc#ddd#eee#fff,ggg,hhh" into tSomeData
   --      put getNestedItems(tSomeData,3,comma,2,"#")
   -- returns "ddd" -- #-delim item 2 of comma-delim item 3
   -- based on a function by Richard Gaskin, use-LC list, originally named nDepth()
   -- expanded to use getItem(), which allows itemdelimiters of more than one char
   -- so you could do this:
   --      put "a//b//1,2,3,4,5//d//e" into tData
   --      put getNestedItems(tData,3,"//",4,comma) -> 4
   -- requires getItem(), getDelimiters()
   put param(1) into workingString
   repeat with i = 2 to paramcount()-1 step 2
      put getItem(workingString,param(i),param(i+1)) into workingString
   end repeat
   return workingString
end getNestedItems
function getItem tList,tIndex,tDelim
   -- returns item # tIndex of tList, given itemdelimiter = tDelim
   -- could just "get item tIndex of tList" in the calling handler but
   --    then have to set and restore the itemDelimiter, so this is less hassle
   -- defaults to tDelim = comma
   -- allows tDelim to be a string of characters
   -- so you could do this:
   --    getItem("a//b//c//d//e//f",4,"//") -> d
   -- requires getDelimiters()
   if tDelim = empty then put comma into tDelim
   put getDelimiters(tList) into workingDelim
   replace tDelim with workingDelim in tList
   set the itemdelimiter to workingDelim
   return item tIndex of tList
end getItem
function getDelimiters tText,nbr
   -- returns a cr-delimited list of <nbr> characters
   --    not found in the variable tText
   -- use for delimiters for, e.g., parsing text files, loading arrays, etc.
   -- usage: put getDelimiters(tText,2) into tDelims
   --        put line 1 of tDelims into lineDivider
   --        put line 2 of tDelims into itemDivider
   --        etc.
   if nbr = empty then put 1 into nbr -- default 1 delimiter
   put "2,3,4,5,6,7,8,16,17,18,19,20" into baseList
   -- could use other non-printing ASCII values
   put the number of items of baseList into maxNbr
   if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
   repeat with tCount = 1 to nbr
      put true into failed
      repeat with i = 1 to the number of items of baseList
         put item i of baseList into testNbr
         put numtochar(testNbr) into testChar
         if testChar is not in tText then
            -- found one, store and get next delim
            put false into failed
            put testChar into line tCount of delimList
            exit repeat
         end if
      end repeat
      if failed then
         put the number of lines of delimList into nbrFound
         if nbr = 1 then
            return "Cannot get delimiter!"
         else if nbrFound = 0 then
            return "Cannot get any delimiters!"
         else
            return "Can only get" && nbrFound && "delimiters!"
         end if
      end if
      -- i still indexes the value just used, so remove it from the candidates
      delete item i of baseList
   end repeat
   return delimList
end getDelimiters
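Here is a rough sketch of using getDelimiters() directly to load an array from text that uses multi-character separators. The sample text, the ";;" and "::" separators, and the variable names are made up for illustration; only getDelimiters() comes from the handlers above.

```livecode
on mouseUp
   -- hypothetical sample data: records split by ";;", key/value by "::"
   put "name::Ann;;age::41;;city::Lyon" into tText
   -- ask for two single characters that do not occur in tText
   put getDelimiters(tText,2) into tDelims
   -- every error return from getDelimiters() is a single line
   if the number of lines of tDelims < 2 then exit mouseUp
   put line 1 of tDelims into recDiv
   put line 2 of tDelims into keyDiv
   -- swap the multi-character separators for the safe single characters
   replace ";;" with recDiv in tText
   replace "::" with keyDiv in tText
   -- split needs single-character delimiters, which we now have
   split tText by recDiv and keyDiv
   put tText["age"] -- 41
end mouseUp
```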
This should simplify complex text parsing considerably.
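For instance, here is a minimal sketch of pulling values out of a record with two levels of delimiters. The record format and the variable names are assumptions for the sake of the example; it relies only on getNestedItems(), getItem() and getDelimiters() as posted above.

```livecode
on mouseUp
   -- hypothetical record: fields separated by "||", coordinates by comma
   put "Boston||42.36,-71.06||MA" into tRecord
   -- comma-delimited item 1 of "||"-delimited item 2
   put getNestedItems(tRecord,2,"||",1,comma) into tLat
   -- comma-delimited item 2 of "||"-delimited item 2
   put getNestedItems(tRecord,2,"||",2,comma) into tLong
   put tLat && tLong -- 42.36 -71.06
end mouseUp
```

Since the itemDelimiter is only ever set inside getItem(), the calling handler's own itemDelimiter is left alone.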
-- Peter
Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig
On October 17, 2014, I wrote:
> Re extending the itemdelimiter keyword... The handler I posted for getItem() is easily modified to allow the itemdel to consist of a string of characters, so you could do:
> getItem("a//b//c//d//e//f",4,"//") -> d
>
> So while we're waiting for the language to be expanded, you could use this:
>
> function getItem tList,tIndex,tDelim
>    -- returns item # tIndex of tList, given itemdelimiter = tDelim
>    -- could just "get item tIndex of tList" in the calling handler but
>    --    then have to set and restore the itemDelimiter, so this is less hassle
>    -- defaults to tDelim = comma
>    -- also allows tDelim to be a string of characters !
>
>    -- requires getDelimiters()
>
>    if tDelim = empty then put comma into tDelim
>    put getDelimiters(tList) into workingDelim
>    replace tDelim with workingDelim in tList
>    set the itemdelimiter to workingDelim
>    return item tIndex of tList
> end getItem
>
> function getDelimiters tText,nbr
>    -- returns a cr-delimited list of <nbr> characters
>    --    not found in the variable tText
>    -- use for delimiters for, eg, parsing text files, loading arrays, etc.
>    -- usage: put getDelimiters(tText,2) into tDelims
>    --        put line 1 of tDelims into lineDivider
>    --        put line 2 of tDelims into itemDivider
>    --        etc.
>
>    if nbr = empty then put 1 into nbr -- default 1 delimiter
>    put "2,3,4,5,6,7,8,16,17,18,19,20" into baseList
>    -- could use other non-printing ASCII values
>    put the number of items of baseList into maxNbr
>    if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
>    repeat with tCount = 1 to nbr
>       put true into failed
>       repeat with i = 1 to the number of items of baseList
>          put item i of baseList into testNbr
>          put numtochar(testNbr) into testChar
>          if testChar is not in tText then
>             -- found one, store and get next delim
>             put false into failed
>             put testChar into line tCount of delimList
>             exit repeat
>          end if
>       end repeat
>       if failed then
>          put the number of lines of delimList into nbrFound
>          if nbr = 1 then
>             return "Error: cannot get delimiter!"
>          else if nbrFound = 0 then
>             return "Error: cannot get any delimiters!"
>          else
>             return "Error: can only get" && nbrFound && "delimiters!"
>          end if
>       end if
>       delete item i of baseList
>    end repeat
>    return delimList
> end getDelimiters
>