problem with counting words

Robert Brenstein rjb at robelko.com
Mon Oct 13 19:18:02 EDT 2014


Not particularly challenging but a cool name:

Maximilian Maria Thurn and Taxis


On 13.10.2014 at 18:54 -0400 Ralph DiMola apparently wrote:
>This is something I know about. Between a pretty decent VB COM .dll and
>additional in-house rules I get about 99.98% accuracy splitting up
>US/international name components. But there is still the .02% that needs
>individual attention. I never process lists > 100,000 so this error rate is
>acceptable. My system would have heart failure with a > 3 part last name.
>Into the .02% bucket... "Dr. Bob Brown Trustee for Ms. June Smith" would not
>be split correctly. A reference to a relationship between two people is
>beyond my system's ability. Names like Mac Donald, apostrophes like O'Connel, or
>hyphens like Foster-Smith are the easy cases, even though one never knows
>what apostrophe variation will be used. It is when the last name is space
>delimited with nonstandard prefixes that it starts to get interesting. The
>only way to sort names with 100% accuracy is to have the name components
>from the get-go and use Unicode from start to finish. Maybe Watson can do it
>100%, but I can't afford the CPU time. I can't wait until LC 7 gets settled
>down and I can use Unicode LC for production text processing.
>
>Ralph DiMola
>IT Director
>Evergreen Information Services
>rdimola at evergreeninfo.net
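
For what it's worth, the "easy cases" mentioned above (apostrophe variants, hyphens, and a space-delimited prefix such as Mac) can be roughly sketched as follows. This is only an illustration in Python, not the VB component or in-house rules described in the quote; the function names and the particle list are hypothetical and far from complete.

```python
# Hypothetical sketch: the particle list and function names are illustrative
# assumptions, not taken from any system described in this thread.

# Common apostrophe look-alikes: right/left single quote, modifier letter
# apostrophe, grave accent, acute accent.
APOSTROPHES = "\u2019\u2018\u02bc\u0060\u00b4"

# A small, incomplete set of surname prefixes (compared in lowercase).
PARTICLES = {"mac", "mc", "van", "von", "de", "der", "den", "la", "le", "di", "du"}


def normalize_apostrophes(name):
    """Map every apostrophe variant to the plain ASCII apostrophe."""
    return name.translate({ord(ch): "'" for ch in APOSTROPHES})


def split_name(full_name):
    """Return (given_names, surname) using a naive prefix rule.

    The surname is the last word plus any run of known particles directly
    before it. At least one word is always left for the given names when
    the name has more than one word.
    """
    words = normalize_apostrophes(full_name).split()
    if not words:
        return "", ""
    start = len(words) - 1
    while start > 1 and words[start - 1].lower() in PARTICLES:
        start -= 1
    return " ".join(words[:start]), " ".join(words[start:])
```

With these rules "Ian Mac Donald" splits into "Ian" and "Mac Donald", and a hyphenated surname stays in one piece because it contains no space. "Maximilian Maria Thurn and Taxis" still comes out wrong (the surname is reduced to "Taxis"), which is exactly the kind of name that ends up needing individual attention.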



