Word chunk includes punctuation

Paul Dupuis paul at researchware.com
Mon Aug 13 16:32:58 EDT 2012


On 8/13/2012 12:13 PM, Ken Ray wrote:
> On Aug 13, 2012, at 11:10 AM, James Hale wrote:
>
>> Is this a bug?
> No, it's a 'convention'… it mimics the way that HyperCard recognized a "word"; as stated in the Dictionary under "word":
>
> "A word is delimited by one or more spaces, tabs, or returns, or enclosed by double quotes. A single word can contain multiple characters and multiple items, but not multiple lines."
>
>> If not, has anyone got a workaround that doesn't require me testing for a punctuation character at the end of every word or replacing them all with spaces?
> As Mark pointed out, the use of "token" helps separate the wheat from the chaff (see the entry on "token" in the Dictionary for how a token is defined).
>
>

One caution: "token" does not separate . (period), ! (exclamation mark),
or ? (question mark). If you are really trying to process English text,
you will probably want to write your own punctuation remover, since it
can then tell the difference between a period at the end of a sentence
and a period at the end of an abbreviation like "Dr." or "Mr."
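
To make the idea concrete, here is a rough sketch of such a punctuation
remover. It is written in Python rather than LiveCode purely for
illustration; the function name strip_punctuation and the ABBREVIATIONS
list are made up for this example, and the list is far from complete.

```python
import string

# Hypothetical, deliberately short list; extend it for your own text.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}


def strip_punctuation(text):
    """Split text on whitespace and trim punctuation from the edges of
    each word, keeping the period of a known abbreviation."""
    words = []
    for chunk in text.split():
        # Drop leading punctuation such as opening quotes or brackets.
        chunk = chunk.lstrip(string.punctuation)
        # Trim everything trailing, then restore the period only if the
        # original chunk really was a known abbreviation ("Dr." or "Dr.,").
        bare = chunk.rstrip(string.punctuation)
        candidate = bare + "."
        if candidate in ABBREVIATIONS and chunk.startswith(candidate):
            bare = candidate
        if bare:
            words.append(bare)
    return words


if __name__ == "__main__":
    print(strip_punctuation('Dr. Smith arrived. "Is Mr. Hale here?" he asked!'))
```

Punctuation inside a word (the apostrophe in "don't") is left alone,
because only the edges of each word are trimmed. A list-based check like
this cannot tell whether a sentence happens to end with an abbreviation,
so treat it as a starting point only.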

-- 
Paul Dupuis
Cofounder
Researchware, Inc.
http://www.researchware.com/
http://www.twitter.com/researchware
http://www.facebook.com/researchware
http://www.linkedin.com/company/researchware-inc




