problem with counting words

Peter M. Brigham pmbrig at gmail.com
Fri Oct 17 15:39:48 EDT 2014


Continuing the extension of itemdelimiter: Richard's original nDepth() function can now be expanded by using getItem(), so you can get nested "items" with different "itemdelimiters" (any or all of which may be text strings instead of single characters). I renamed Richard's function to something more easily remembered:

function getNestedItems
   -- allows specifying different delimiters to extract chunks
   -- eg  put "aaa,bbb,ccc#ddd#eee#fff,ggg,hhh" into tSomeData
   --     put getNestedItems(tSomeData,3,comma,2,"#")
   -- returns "ddd" -- #-delim item 2 of comma-delim item 3
   -- based on a function by Richard Gaskin, use-LC list, originally named nDepth()
   -- expanded to use getItem(), which allows itemdelimiters of more than one char
   -- so you could do this:
   --    put "a//b//1,2,3,4,5//d//e" into tData
   --    put getNestedItems(tData,3,"//",4,comma) -> 4
   
   -- requires getItem(), getDelimiters()
   
   put param(1) into workingString
   repeat with i = 2 to paramcount()-1 step 2
      put getItem(workingString,param(i),param(i+1)) into workingString
   end repeat
   return workingString
end getNestedItems

function getItem tList,tIndex,tDelim
   -- returns item # tIndex of tList, given itemdelimiter = tDelim
   -- could just "get item tIndex of tList" in the calling handler but
   --    then have to set and restore the itemDelimiter, so this is less hassle
   -- defaults to tDelim = comma
   -- allows tDelim to be a string of characters
   --    so you could do this:
   --    getItem("a//b//c//d//e//f",4,"//") -> d
   
   -- requires getDelimiters()
   
   if tDelim = empty then put comma into tDelim
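   put getDelimiters(tList) into workingDelim
   -- getDelimiters() returns a message instead of a single char if it fails
   if the number of chars of workingDelim <> 1 then return workingDelim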
   replace tDelim with workingDelim in tList
   set the itemdelimiter to workingDelim
   return item tIndex of tList
end getItem

function getDelimiters tText,nbr
   -- returns a cr-delimited list of <nbr> characters
   --    not found in the variable tText
   -- use for delimiters for, eg, parsing text files, loading arrays, etc.
   -- usage: put getDelimiters(tText,2) into tDelims
   --        put line 1 of tDelims into lineDivider
   --        put line 2 of tDelims into itemDivider
   --             etc.
   
   if nbr = empty then put 1 into nbr -- default 1 delimiter
   put "2,3,4,5,6,7,8,16,17,18,19,20" into baseList
   -- could use other non-printing ASCII values
   put the number of items of baseList into maxNbr
   if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
   repeat with tCount = 1 to nbr
      put true into failed
      repeat with i = 1 to the number of items of baseList
         put item i of baseList into testNbr
         put numtochar(testNbr) into testChar
         if testChar is not in tText then
            -- found one, store and get next delim
            put false into failed
            put testChar into line tCount of delimList
            exit repeat
         end if
      end repeat
      if failed then
         put the number of lines of delimList into nbrFound
         if nbr = 1 then
            return "Error: cannot get delimiter!"
         else if nbrFound = 0 then
            return "Error: cannot get any delimiters!"
         else
            return "Error: can only get" && nbrFound && "delimiters!"
         end if
      end if
      delete item i of baseList
   end repeat
   return delimList
end getDelimiters

This should simplify complex text parsing considerably.
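
As a rough illustration of the kind of parsing I mean, here is a sketch that pulls one sub-field out of a multi-level record in a single call. The handler name testNestedItems and the sample data are made up for this example, and it assumes getNestedItems(), getItem() and getDelimiters() above are available in the message path:

```livecode
on testNestedItems
   -- hypothetical sample data: records separated by "||",
   -- fields by tab, sub-fields by ":"
   put "id1" & tab & "red:green:blue" & "||" & "id2" & tab & "cyan:magenta" into tData
   -- "||"-delim item 2 -> id2<tab>cyan:magenta
   -- tab-delim item 2  -> cyan:magenta
   -- ":"-delim item 1  -> cyan
   put getNestedItems(tData,2,"||",2,tab,1,":") into tResult
   answer tResult -- should show "cyan"
end testNestedItems
```

Each index/delimiter pair narrows the working string one level further, so the calling handler never has to touch the itemDelimiter itself.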

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig


On October 17, 2014, I wrote:

> Re extending the itemdelimiter keyword... The handler I posted for getItem() is easily modified to allow the itemdel to consist of a string of characters, so you could do:
>   getItem("a//b//c//d//e//f",4,"//") -> d
> 
> So while we're waiting for the language to be expanded, you could use this:
> 
> function getItem tList,tIndex,tDelim
>   -- returns item # tIndex of tList, given itemdelimiter = tDelim
>   -- could just "get item tIndex of tList" in the calling handler but
>   --    then have to set and restore the itemDelimiter, so this is less hassle
>   -- defaults to tDelim = comma
>   -- also allows tDelim to be a string of characters !
> 
>   -- requires getDelimiters()
> 
>   if tDelim = empty then put comma into tDelim
>   put getDelimiters(tList) into workingDelim
>   replace tDelim with workingDelim in tList
>   set the itemdelimiter to workingDelim
>   return item tIndex of tList
> end getItem
> 
> function getDelimiters tText,nbr
>   -- returns a cr-delimited list of <nbr> characters
>   --    not found in the variable tText
>   -- use for delimiters for, eg, parsing text files, loading arrays, etc.
>   -- usage: put getDelimiters(tText,2) into tDelims
>   --        put line 1 of tDelims into lineDivider
>   --        put line 2 of tDelims into itemDivider
>   --             etc.
> 
>   if nbr = empty then put 1 into nbr -- default 1 delimiter
>   put "2,3,4,5,6,7,8,16,17,18,19,20" into baseList
>   -- could use other non-printing ASCII values
>   put the number of items of baseList into maxNbr
>   if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
>   repeat with tCount = 1 to nbr
>      put true into failed
>      repeat with i = 1 to the number of items of baseList
>         put item i of baseList into testNbr
>         put numtochar(testNbr) into testChar
>         if testChar is not in tText then
>            -- found one, store and get next delim
>            put false into failed
>            put testChar into line tCount of delimList
>            exit repeat
>         end if
>      end repeat
>      if failed then
>         put the number of lines of delimList into nbrFound
>         if nbr = 1 then
>            return "Error: cannot get delimiter!"
>         else if nbrFound = 0 then
>            return "Error: cannot get any delimiters!"
>         else
>            return "Error: can only get" && nbrFound && "delimiters!"
>         end if
>      end if
>      delete item i of baseList
>   end repeat
>   return delimList
> end getDelimiters
> 



