tokens and parsing
David Bovill
david.bovill at opn-technologies.com
Thu Jan 10 00:11:01 EST 2002
Don't know if these help. I use them for the function "replaceWords". The
function "knows" not to touch stuff inside quotes, etc. It's old (not
optimised for speed etc.), and I haven't checked it thoroughly, so use with
care:
------------------------------------------------------------------------
function wordDelim someChar
-- version 9.0
-- could be much better!
put quote & space & return & "!?':;()[]{}<>,." into test
if someChar is in test then
return true
else return false
end wordDelim
function replaceWords someWord, someText, newWord, evenWithinQuotes, startCharNum
-- version replaceWords 9.1, 2/2/00
put space & someText & space into testText
put the number of chars of someWord into someWordLength
repeat
set the cursor to busy
put offsetAfter(someWord, testText, startCharNum) into startCharNum
put startCharNum + someWordLength - 1 into endCharNum
put endCharNum + 1 into charNumAfterWord
put startCharNum - 1 into charNumBeforeWord
put wordDelim(char charNumAfterWord of testText) and wordDelim(char charNumBeforeWord of testText) into wholeWord
if startCharNum is 0 then
delete char 1 of testText
delete last char of testText
return testText
else if wholeWord is false then
next repeat
else
if evenWithinQuotes is not true then
get charToChunk(startCharNum, testText)
put item 1 of it into wordNum
put item 2 of it into itemNum
put item 3 of it into lineNum
put item 4 of it into charNumOfline
put line lineNum of testText into testLine
get char charNumOfline of testLine
if withinQuotes(charNumOfline, testLine) is true then
next repeat
end if
end if
put newWord into char startCharNum to endCharNum of testText
if newWord is empty then
if char startCharNum of testText is in " " then
delete char startCharNum of testText
end if
end if
-- resume the search after the inserted text, so a newWord that
-- contains someWord is not matched again on the next pass
put startCharNum + the number of chars of newWord - 1 into startCharNum
end if
end repeat
end replaceWords
function charToChunk wordOrCharNum, someText
-- version latest (for old times sake)
if wordOrCharNum is not a number then put offset(wordOrCharNum, someText) into wordOrCharNum
construct_LineChunk wordOrCharNum, someText, charOfLine, wordOfLine, itemOfLine, lineNum
return wordOfLine & "," & itemOfLine & "," & lineNum & "," & charOfLine
end charToChunk
on construct_LineChunk charNum, someText, @charOfLine, @wordOfLine, @itemOfLine, @lineNum
-- version latest,3/9/01
get char 1 to charNum of someText
if someText is empty then
put 1 into lineNum
put 1 into itemOfLine
put 1 into wordOfLine
put 1 into charOfLine
else
put the number of lines of it into lineNum
put the number of items of line lineNum of it into itemOfLine
put the number of words of item itemOfLine of line lineNum of it into wordOfLine
put the number of chars of line lineNum of it into charOfLine
end if
end construct_LineChunk
function offsetAfter string, text, startChar
-- version 9.0
if startChar < 0 then put 0 into startChar
delete char 1 to startChar of text
get offset(string, text)
if it is 0 then return 0
else return it + startChar
end offsetAfter
function withinQuotes textOrOffset, someContainer
-- version latest, 2/2/00
put item 1 of textOrOffset into startCharNum
put item 2 of textOrOffset into endCharNum
if endCharNum is empty then put startCharNum into endCharNum
if not (startCharNum is a number and endCharNum is a number) then
put offset(textOrOffset, someContainer) into startCharNum
put startCharNum + the length of textOrOffset - 1 into endCharNum
end if
if startCharNum is 0 then
return "Not Found"
else
put char 1 to (startCharNum - 1) of someContainer into textBefore
put countStrings(quote, textBefore) into quotesBefore
put the number of chars of someContainer into lastCharNum
put char (endCharNum + 1) to lastCharNum of someContainer into textAfter
put countStrings(quote, textAfter) into quotesAfter
if isOdd(quotesBefore) and quotesAfter >= 1 then
return true
else
return false
end if
end if
end withinQuotes
------------------------------------------------------------------------
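The listing calls countStrings and isOdd, which are not included above. A minimal sketch of what they might look like, assuming countStrings counts non-overlapping occurrences and isOdd is only ever given a non-negative whole number. Both bodies are guesses at the missing handlers, not the originals:

```livecode
-- Hypothetical stand-in for the missing handler: counts non-overlapping
-- occurrences of someString in someText, using offsetAfter from the listing.
function countStrings someString, someText
  if someString is empty then return 0
  put the number of chars of someString into stringLength
  put 0 into stringCount
  put 0 into startCharNum
  repeat
    put offsetAfter(someString, someText, startCharNum) into startCharNum
    if startCharNum is 0 then exit repeat
    add 1 to stringCount
    -- move to the last char of this match so the next search starts after it
    add stringLength - 1 to startCharNum
  end repeat
  return stringCount
end countStrings

-- Hypothetical stand-in: true for odd non-negative whole numbers.
function isOdd someNumber
  return (someNumber mod 2) is 1
end isOdd
```

With those in place, a call along these lines is meant to replace the whole word cat but leave the quoted "cat" alone (pass true as the fourth parameter to replace inside quotes as well):

```livecode
put replaceWords("cat", "the cat saw a" && quote & "cat" & quote, "dog")
```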
> From: "Ivers, Doug E" <Doug_Ivers at lord.com>
> Subject: RE: tokens and parsing
>
>> -----Original Message-----
>> From: Scott Raney [mailto:raney at metacard.com]
>> Subject: Re: popups and tokens (was "Digest-something")
>>
>>
> snip
>>
>>> 2.
>>> Seems that the word parser is little more than an item parser with the
>>> itemDelimiter set to " ",
>>
>> It's a bit more, because it also skips multiple spaces, and tabs and
>> returns in addition to spaces (neither is possible with item chunks).
>>
>>> except for the stupid behavior with quotes. I
>>> would like a true word parser. Or a parser for which we can specify
>>> multiple delimiters. Like a java token function. What is the best/fastest
>>> way to parse words even in the presence of quotes and punctuation?
>>
>> Use "token" chunks (e.g., "repeat for each token t in
>> <somecontainer>").
>> It's the same parser the engine uses for compiling scripts.
>>
> snip
>>
>
> I did a little testing of the token and it doesn't seem to weed out chars
> such as "." and "!". So it appears that I will have to write my own word
> parser.
>
>
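If tokens leave the punctuation in, one option is to split on the same delimiter set that wordDelim uses. A rough sketch, assuming wordDelim from the listing above is available; splitWords is a made-up name, and it has not been tested for speed. Note that the apostrophe counts as a delimiter there, so "don't" comes out as two words, and tabs are not in the set at all:

```livecode
-- Hypothetical helper: returns the words of someText, one per line,
-- treating every char for which wordDelim() is true as a separator.
function splitWords someText
  put empty into wordList
  put empty into currentWord
  repeat for each char c in someText
    if wordDelim(c) then
      -- a delimiter ends the current word, runs of delimiters are skipped
      if currentWord is not empty then
        put currentWord & return after wordList
        put empty into currentWord
      end if
    else
      put c after currentWord
    end if
  end repeat
  if currentWord is not empty then put currentWord & return after wordList
  delete last char of wordList -- drop the trailing return
  return wordList
end splitWords
```

The result can then be walked with "repeat for each line w in splitWords(someText)".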