problem with counting words

Ralph DiMola rdimola at evergreeninfo.net
Mon Oct 13 18:54:03 EDT 2014


This is something I know about. Between a pretty decent VB COM .dll and
additional in-house rules I get about 99.98% accuracy splitting up
US/international name components. But there is still the .02% that needs
individual attention. I never process lists > 100,000 so this error rate is
acceptable. My system would have heart failure with a > 3 part last name.
Into the .02% bucket... "Dr. Bob Brown Trustee for Ms. June Smith" would not
be split correctly. A reference to a relationship between two people is
beyond my system's ability. Spaces like Mac Donald, apostrophes like
O'Connel, or hyphens like Foster-Smith are the easy cases, even though one
never knows what apostrophe variation will be used. It is when the last name
is space-delimited with nonstandard prefixes that it starts to get
interesting. The
only way to sort names with 100% accuracy is to have the name components
from the get-go and use Unicode from start to finish. Maybe Watson can do it
100%, but I can't afford the CPU time. I can't wait until LC 7 gets settled
down and I can use Unicode LC for production text processing.
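
A minimal sketch of the "easy cases" described above, written in Python rather
than LiveCode. It is not the VB COM .dll or the in-house rules mentioned in
this message. The names normalize_apostrophes and split_name, the list of
apostrophe variants, and the SURNAME_PREFIXES set are all assumptions made up
for illustration, and the first names in the examples are hypothetical.

```python
# Assumed (incomplete) set of characters that show up in place of a plain
# apostrophe: right/left single quote, modifier apostrophe, backtick, acute.
APOSTROPHE_VARIANTS = "\u2019\u2018\u02bc\u0060\u00b4"
_APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})

# Assumed, illustrative list of space-delimited surname prefixes.
SURNAME_PREFIXES = {"mac", "mc", "van", "von", "de", "der", "la", "di"}


def normalize_apostrophes(text):
    """Map the assumed apostrophe variants to a plain ASCII apostrophe."""
    return text.translate(_APOSTROPHE_TABLE)


def split_name(full_name):
    """Return (first, middle, last) using a simple prefix rule.

    The last word is the surname; any known prefixes directly before it
    are pulled into the surname as well. The first word is never absorbed.
    """
    words = normalize_apostrophes(full_name).split()
    if len(words) < 2:
        return (" ".join(words), "", "")
    start = len(words) - 1
    while start > 1 and words[start - 1].lower() in SURNAME_PREFIXES:
        start -= 1
    return (words[0], " ".join(words[1:start]), " ".join(words[start:]))


print(split_name("Ian Mac Donald"))
print(split_name("Sean O\u2019Connel"))
print(split_name("Dr. Bob Brown Trustee for Ms. June Smith"))
```

The first two calls come out as intended. The third still treats "Dr." as the
first name and "Smith" as the only surname, which is the kind of entry that
ends up in the .02% bucket. A prefix list like this also misfires whenever a
middle name happens to match a prefix.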

Ralph DiMola
IT Director
Evergreen Information Services
rdimola at evergreeninfo.net


-----Original Message-----
From: use-livecode [mailto:use-livecode-bounces at lists.runrev.com] On Behalf
Of Bob Sneidar
Sent: Monday, October 13, 2014 4:20 PM
To: How to use LiveCode
Subject: Re: problem with counting words

Understandable. And yet the question is not how you are to interpret a word
boundary, but how a computer which only knows ones and zeros can. It's the
age-old (computer) problem: Computers don't do what you want them to. They
only do what you tell them to. ;-)

A great example is how to discern a first, middle and last name in a full
name field. Turns out it cannot be done with 100% reliability. Some names
have spaces in them like Mac Donald or apostrophes like O'Connel or hyphens
like Foster-Smith. Some people have more than three words in their full
name. You would have to create a series of special case statements because
when mankind invented last names, computers had not been invented yet.
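
To make the failure concrete, here is a minimal sketch in Python (not
LiveCode) of the obvious approach: split on whitespace, take the first word as
the first name, the last word as the last name, and whatever is left as the
middle. The function name naive_split and the first names in the examples are
hypothetical.

```python
def naive_split(full_name):
    """Split on whitespace: first word, last word, the rest is 'middle'."""
    words = full_name.split()
    if not words:
        return ("", "", "")
    if len(words) == 1:
        return (words[0], "", "")
    return (words[0], " ".join(words[1:-1]), words[-1])


# Hyphens and apostrophes survive because they are not whitespace.
print(naive_split("Mary Foster-Smith"))
print(naive_split("Sean O'Connel"))

# A space inside the last name does not: "Mac" lands in the middle slot.
print(naive_split("Ian Mac Donald"))
```

Each such failure needs its own special case, which is where the series of
special case statements comes from.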

Bob S


On Oct 12, 2014, at 13:04, larry at significantplanet.org wrote:

Hi Terry,
Here is the real problem.  I don't know much.
I'm sitting here assuming that a word is a word, regardless of whether it is
inside quotes.
