New chunks

Bob Sneidar bobsneidar at iotecdigital.com
Thu Mar 13 11:45:21 EDT 2014


Office products stumble on name abbreviations and standard street abbreviations as well. Floating-point numbers, however, are ignored. Apparently what they look for is period-space or period-<break of some kind>. Not very comprehensive, but of course nothing could accommodate all abbreviations. The example "I used to know H. G. Wells." will produce three sentences. But if you ignore a single letter followed by a period, then the phrase “A is greater than B. Therefore, B is not greater than A.” will produce only one sentence.
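
To make the two rules above concrete, here is a minimal Python sketch. It is only an illustration of the period-followed-by-a-break heuristic as described in this message; it is not how Word or ICU actually work, and the function names (naive_sentences, merge_single_letter_breaks) are made up for this example.

```python
import re

# A break is any run of whitespace that directly follows ., ! or ?
# A period inside a number such as 3.14 is followed by a digit, so it is ignored.
BREAK = re.compile(r'(?<=[.!?])\s+')


def naive_sentences(text):
    """Rule 1: split at every period-space (or period-break)."""
    return [piece for piece in BREAK.split(text.strip()) if piece]


def merge_single_letter_breaks(text):
    """Rule 2: same as rule 1, but undo any break that follows
    a single letter plus a period (for example an initial)."""
    result = []
    for piece in naive_sentences(text):
        # Does the previous piece end in a lone letter followed by a period?
        if result and re.search(r'(?:^|\s)[A-Za-z]\.$', result[-1]):
            result[-1] += ' ' + piece
        else:
            result.append(piece)
    return result


wells = 'I used to know H. G. Wells.'
logic = 'A is greater than B. Therefore, B is not greater than A.'

print(len(naive_sentences(wells)))             # 3
print(len(merge_single_letter_breaks(wells)))  # 1
print(len(naive_sentences(logic)))             # 2
print(len(merge_single_letter_breaks(logic)))  # 1
```

Neither rule gets both examples right: the first over-splits the initials, and the second glues together two genuine sentences.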

This makes me seriously doubt that the number of sentences can ever be determined with 100% confidence. I suppose that had the English language been developed in the digital age, someone would have thought of this conundrum and used different characters for abbreviations, decimal points and sentence endings.

Things like this make me wonder in what scenario it would be necessary to isolate sentences at all. If Microsoft Word, the de facto word processor of the world, cannot detect all sentences in all situations, Microsoft obviously doesn’t think there is a real need for it. Can anyone cite an application that can detect sentences with 100% certainty? If so, we could figure out what it is using.

Bob


On Mar 13, 2014, at 02:47 , Fraser Gordon <fraser.gordon at runrev.com> wrote:

> 
> On 13 Mar 2014, at 04:48, Jim Hurley <jhurley0305 at sbcglobal.net> wrote:
>> 
>> So I really can't see the purpose of RR's "sentence chunk". I wish they would explain.
>> 
> 
> We'd be using ICU's sentence breaking code. They include a whole bunch of language-related knowledge with the library and can use that to tell the difference between decimal points, full stops, abbreviations, etc. You're right about it not being perfect but it does seem pretty reliable.
> 
> Regards,
> Fraser




