Bug with token chunk type?

Richard Gaskin ambassador at fourthworld.com
Tue Mar 20 10:50:16 EDT 2007


Dave Cragg wrote:
> On 20 Mar 2007, at 05:47, Richard Gaskin wrote:
>> Why should "s" be parsed as a separate "token" from the numeric  
>> portion  if it trails or is anywhere in the middle, but only if  
>> it's at the beginning then it's considered a part of the  
>> alphanumeric string? :\
>>
>> Unless someone can come up with a good reason why this is what one  
>> should expect I'll log it as a bug tomorrow.
> 
> Does there need to be a good reason?

I'm one of those old-fashioned guys who still believes computers are 
deterministic systems. :)

It's possible to discover bugs in software, even Rev.  When this one 
turned up it wasn't clear whether it was a bug or not.  To some degree 
that question remains open, but I have too much to do to explore it 
further, so I'm willing to accept the guesses provided here and move on 
by simply replacing the token chunk type in my code.

As with any seemingly anomalous behavior, if a definitive answer can't 
be arrived at by the readers here it can be helpful to bring it to the 
attention of the development team.  Bugs happen.

> From the docs, "The token chunk is implemented for the Transcript
> language, and probably isn't suitable for use in a general-purpose
> language parser."
> 
> I've always taken this to mean that "token" is the engine's private  
> property, and we have no basis for questioning how it works. We can  
> guess, but we can also expect to be surprised.

When I'm delivering a product I try to avoid guessing. :)

"Token" is a documented keyword which is useful for a great many 
solutions, as evidenced by the frequency with which it's used in 
handlers provided by readers of this list.   While its main benefit is 
for those of us who write IDEs and IDE tools, its value extends to 5GLs 
and a great many other uses as well.  It can be really handy.

Phil Jimmieson's suggestion that this issue may be related to how the 
parser scans for variable names may be the clue that's needed, and Bill 
Marriott's observation that differences in interpreting strings appear 
to be limited to those which might be valid hex symbols appears to cinch it.
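
To make the variable-name idea concrete, here is a minimal sketch in 
Python.  It is not Transcript and it is not the engine's actual 
tokenizer; the pattern and the tokens() function are hypothetical, and 
the hex-related behavior Bill noticed is not modeled at all.  It only 
shows how a lexer that requires identifiers to start with a letter can 
treat a leading "s" differently from a trailing one.

```python
import re

# Hypothetical lexer rules, assumed for illustration only:
#   - an identifier starts with a letter or underscore, then letters/digits
#   - a number is a run of digits
#   - any other non-space character is its own token
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S")

def tokens(text):
    """Split text into tokens using the assumed rules above."""
    return TOKEN_RE.findall(text)

print(tokens("s12"))  # leading letter: scanned as one identifier
print(tokens("12s"))  # leading digit: number first, then an identifier
```

Under these assumed rules "s12" stays whole while "12s" splits in two, 
simply because the scanner commits to "number" or "identifier" based on 
the first character it sees.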

So don't worry, I won't pollute Bugzilla with yet another non-bug report 
(last time I reviewed the bugs there it seemed about a third were junk, 
either duplicates or the result of not reading the docs).  I tend to be somewhat 
cautious about entering bug reports there, which is why I brought the 
issue here first.

Thanks to all who helped pin this down. Last night I rewrote the handler 
that used the token chunk type, which had prompted a bug report from one 
of my customers.  While the current solution is a tad slower, the 
rewrite was a good exercise: the new version is clearer and easier to 
maintain and enhance.

There is a cautionary lesson here for those who use the "token" chunk 
type:  Test thoroughly, and expect behavior that may seem anomalous at 
first but reflects the many different needs of the engine's parser. 
Given the testing requirements implied by being unable to anticipate all 
possible behaviors, it may be best to avoid using the "token" chunk type 
except as a last resort when no other solution is available, or in the 
small subset of uses where such anomalies aren't likely to affect outcomes.

-- 
  Richard Gaskin
  Fourth World Media Corporation
  ___________________________________________________________
  Ambassador at FourthWorld.com       http://www.FourthWorld.com


