tokens and parsing

David Bovill david.bovill at opn-technologies.com
Thu Jan 10 00:11:01 EST 2002


Don't know if these help. I use them for the function "replaceWords". The
function "knows" not to touch stuff inside quotes, etc. It's old (not
optimised for speed, etc.), and I haven't checked it thoroughly, so use with
care:

------------------------------------------------------------------------
function wordDelim someChar
  -- version 9.0
  -- could be much better!
  put quote & space & return & "!?':;()[]{}<>,." into test
  return (someChar is in test)
end wordDelim

function replaceWords someWord, someText, newWord, evenWithinQuotes, \
    startCharNum
  -- version replaceWords 9.1, 2/2/00
  
  put space & someText & space into testText
  put the number of chars of someWord into someWordLength
  repeat
    set the cursor to busy
    put offsetAfter(someWord, testText, startCharNum) into startCharNum
    put startCharNum + someWordLength - 1 into endCharNum
    put endCharNum + 1 into charNumAfterWord
    put startCharNum - 1 into charNumBeforeWord
    put wordDelim(char charNumAfterWord of testText) and \
        wordDelim(char charNumBeforeWord of testText) into wholeWord
    if startCharNum is 0 then
      delete char 1 of testText
      delete last char of testText
      return testText
    else if wholeWord is false then
      next repeat
    else
      if evenWithinQuotes is not true then
        get charToChunk(startCharNum, testText)
        put item 1 of it into wordNum
        put item 2 of it into itemNum
        put item 3 of it into lineNum
        put item 4 of it into charNumOfline
        put line lineNum of testText into testLine
        if withinQuotes(charNumOfline, testLine) is true then
          next repeat
        end if
      end if
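      put newWord into char startCharNum to endCharNum of testText
      if newWord is empty then
        if char startCharNum of testText is in " " then
          delete char startCharNum of testText
        end if
      end if
      -- resume the search after the inserted text, so that a newWord
      -- containing someWord cannot be matched again and loop forever
      put startCharNum + length(newWord) - 1 into startCharNum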
    end if
  end repeat
end replaceWords 

function charToChunk wordOrCharNum, someText
  -- version latest (for old times sake)
  if wordOrCharNum is not a number then
    put offset(wordOrCharNum, someText) into wordOrCharNum
  end if
  construct_LineChunk wordOrCharNum, someText, charOfLine, wordOfLine, \
      itemOfLine, lineNum
  return wordOfLine & "," & itemOfLine & "," & lineNum & "," & charOfLine
end charToChunk

on construct_LineChunk charNum, someText, @charOfLine, @wordOfLine, \
    @itemOfLine, @lineNum
  -- version latest,3/9/01
  get char 1 to charNum of someText
  if someText is empty then
    put 1 into lineNum
    put 1 into itemOfLine
    put 1 into wordOfLine
    put 1 into charOfLine
  else
    put the number of lines of it into lineNum
    put the number of items of line lineNum of it into itemOfLine
    put the number of words of item itemOfLine of line lineNum of it \
        into wordOfLine
    put the number of chars of line lineNum of it into charOfLine
  end if
end construct_LineChunk

function offsetAfter someString, someText, startChar
  -- version 9.0
  if startChar < 0 then put 0 into startChar
  delete char 1 to startChar of someText
  get offset(someString, someText)
  if it is 0 then return 0
  else return it + startChar
end offsetAfter
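
-- Possible alternative to offsetAfter (an assumption, not part of the
-- original script): where the engine's built-in offset() accepts a third
-- charsToSkip parameter, the copy-and-delete step can be avoided. The name
-- offsetAfterSkip is made up for this sketch.
function offsetAfterSkip someString, someText, startChar
  if startChar is not a number or startChar < 0 then put 0 into startChar
  -- with charsToSkip, offset() counts from the first char after the skip
  get offset(someString, someText, startChar)
  if it is 0 then return 0
  else return it + startChar
end offsetAfterSkip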

function withinQuotes textOrOffset, someContainer
  -- version latest, 2/2/00
  
  put item 1 of textOrOffset into startCharNum
  put item 2 of textOrOffset into endCharNum
  if endCharNum is empty then put startCharNum into endCharNum
  if not (startCharNum is a number and endCharNum is a number) then
    put offset(textOrOffset, someContainer) into startCharNum
    put startCharNum + the length of textOrOffset - 1 into endCharNum
  end if
  
  if startCharNum is 0 then
    return "Not Found"
  else
    put char 1 to (startCharNum - 1) of someContainer into textBefore
    put countStrings(quote, textBefore) into quotesBefore
    put the number of chars of someContainer into lastCharNum
    put char (endCharNum + 1) to lastCharNum of someContainer into textAfter
    put countStrings(quote, textAfter) into quotesAfter
    if isOdd(quotesBefore) and quotesAfter >= 1 then
      return true
    else
      return false
    end if
  end if
end withinQuotes
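
-- countStrings and isOdd are called by withinQuotes but are not included
-- in this post. The versions below are only a minimal sketch of what they
-- might look like (assumed behaviour: count non-overlapping occurrences,
-- and test for an odd whole number); they are not the original handlers.
function countStrings someString, someText
  put 0 into stringCount
  if someString is empty then return 0
  put 0 into foundAt
  repeat
    -- offsetAfter returns the absolute position of the next match, or 0
    put offsetAfter(someString, someText, foundAt) into foundAt
    if foundAt is 0 then exit repeat
    add 1 to stringCount
    -- move to the last char of this match so matches never overlap
    add length(someString) - 1 to foundAt
  end repeat
  return stringCount
end countStrings

function isOdd someNumber
  return (someNumber mod 2 is 1)
end isOdd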

------------------------------------------------------------------------
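
To try it out, something like the following should do (a rough sketch, not
tested here: the mouseUp handler and the sample text are made up for the
example, and it assumes the handlers above, plus countStrings and isOdd,
are somewhere in the message path). The quoted "cat" should be left alone
unless evenWithinQuotes is true:

------------------------------------------------------------------------
on mouseUp
  -- sample text with one occurrence outside quotes and one inside
  put "the cat sat; " & quote & "the cat" & quote & " stayed" into sampleText
  -- args: word to find, text, replacement, evenWithinQuotes, startCharNum
  put replaceWords("cat", sampleText, "dog", false, 0) into newText
  answer newText
end mouseUp
------------------------------------------------------------------------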

> From: "Ivers, Doug E" <Doug_Ivers at lord.com>
> Subject: RE:  tokens and parsing
> 
>> -----Original Message-----
>> From: Scott Raney [mailto:raney at metacard.com]
>> Subject: Re: popups and tokens (was "Digest-something")
>> 
>> 
> snip
>> 
>>> 2.
>>> Seems that the word parser is little more than an item parser with the
>>> itemDelimiter set to " ",
>> 
>> It's a bit more, because it also skips multiple spaces, and tabs and
>> returns in addition to spaces (neither is possible with item chunks).
>> 
>>> except for the stupid behavior with quotes.  I
>>> would like a true word parser.  Or a parser for which we can specify
>>> multiple delimiters.  Like a java token function.  What is the best/fastest
>>> way to parse words even in the presence of quotes and punctuation?
>> 
>> Use "token" chunks (e.g., "repeat for each token t in
>> <somecontainer>").
>> It's the same parser the engine uses for compiling scripts.
>> 
> snip
>> 
> 
> I did a little testing of the token and it doesn't seem to weed out chars
> such as "." and "!".  So it appears that I will have to write my own word
> parser.
> 
> 
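
If you do end up rolling your own, one simple approach is to turn every
delimiter into a space and then let the ordinary word chunk do the
splitting. A rough sketch (the function name and the delimiter list are
assumptions, the list being borrowed from wordDelim above; note that it
also splits on the apostrophe, so "don't" comes back as two words):

------------------------------------------------------------------------
function plainWords someText
  -- hypothetical helper: returns one word per line, punctuation removed
  put quote & "!?':;()[]{}<>,." into delims
  repeat for each char delimChar in delims
    replace delimChar with space in someText
  end repeat
  put empty into wordList
  -- word chunks skip runs of spaces, tabs and returns
  repeat for each word someWord in someText
    put someWord & return after wordList
  end repeat
  delete last char of wordList
  return wordList
end plainWords
------------------------------------------------------------------------

Since the quotes are replaced before the word chunk is used, the word
parser's special handling of quoted strings never comes into play.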




