New chunks

Bob Sneidar bobsneidar at iotecdigital.com
Wed Mar 12 19:58:46 EDT 2014


Pretty sure LiveCode is going to use the period as a simple delimiter. You would have to prep the data first: replace the period in any word that is a number with a placeholder, process your sentences, then restore the placeholders to periods (if you need to). 
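A minimal sketch of that placeholder approach follows. It is written in Python rather than LiveCode script, only to show the three steps. The function name split_sentences_placeholder and the placeholder character are hypothetical choices. It assumes that a "number" is any period with a digit on each side, as in 3.14. It does nothing about abbreviations like "Dr.", and it drops the sentence-ending period from each result.

```python
import re

# Hypothetical placeholder: any character that never occurs in the text.
PLACEHOLDER = "\u0000"

# Assumption: a period with a digit on both sides is a decimal point, e.g. "3.14".
DECIMAL_POINT = re.compile(r"(?<=\d)\.(?=\d)")

def split_sentences_placeholder(text):
    """Split text on periods, protecting decimal points with a placeholder."""
    # 1. Prep: hide the periods that sit inside numbers.
    protected = DECIMAL_POINT.sub(PLACEHOLDER, text)
    sentences = []
    # 2. Process: a plain period delimiter.
    for chunk in protected.split("."):
        chunk = chunk.strip()
        if chunk:
            # 3. Restore the hidden decimal points.
            sentences.append(chunk.replace(PLACEHOLDER, "."))
    return sentences
```

For example, "Pi is 3.14. The party was on time." comes back as two sentences, and 3.14 survives intact.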

You could get fancy by setting the lineDelimiter to space, then finding every line that ends in a period and processing everything in between. It’s doubtful a number would end in a period without it being the end of a sentence. 
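The token approach can be sketched the same way, again in Python as a stand-in for LiveCode. Splitting on whitespace plays the role of setting the lineDelimiter to space. A token that ends in a period closes the current sentence, so decimals like 3.14 need no placeholder. The helper sentences_containing is hypothetical and is aimed at Jim's manuscript query. Only a trailing period ends a sentence here; "?" and "!" are ignored, just as in the description above.

```python
import re

def split_sentences_by_token(text):
    """Group whitespace-separated tokens into sentences.

    Each token acts like a "line" when the delimiter is a space; a token
    ending in "." closes the current sentence.
    """
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        if token.endswith("."):
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing text with no closing period still counts as a sentence.
        sentences.append(" ".join(current))
    return sentences

def sentences_containing(text, *words):
    """Hypothetical helper: sentences in which every given word occurs."""
    wanted = {w.lower() for w in words}
    hits = []
    for sentence in split_sentences_by_token(text):
        found = set(re.findall(r"\w+", sentence.lower()))
        if wanted <= found:
            hits.append(sentence)
    return hits
```

Calling sentences_containing(manuscript, "time", "party") returns each sentence in which both words appear. An abbreviation such as "Dr." still ends a sentence in this sketch, which is exactly the weakness Jim raised.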

Bob


On Mar 11, 2014, at 15:34, Jim Hurley <jhurley0305 at sbcglobal.net> wrote:

> Can someone explain how the “sentence” chunk would work?
> How are decimal points, and the periods in abbreviations, distinguished from the “period” that delineates the end of a “sentence”?
> Does it presume that the existing text has special embedded “periods”?
> 
> I’ve written my own, but it is very cumbersome and not flawless. I use it to do manuscript analysis.
> Like: Find all sentences in which “time” and “party” occur anywhere in the same sentence.
> 
> My ignorance on unicode is profound.
> Jim
> 
>> Message: 15
>> Date: Tue, 11 Mar 2014 18:15:18 +0000
>> From: Benjamin Beaumont <ben at runrev.com>
>> To: LiveCode Developer List <livecode-dev at lists.runrev.com>, How to use LiveCode <use-livecode at lists.runrev.com>
>> Subject: New chunks
>> Message-ID:
>> 	<CADd0_Txbhdem4PbKXifXUsujqPLs9HROME6vKhF=Sk1zNp29cQ at mail.gmail.com>
>> 
>> Hi All,
>> 
>> We're in the process of adding some new chunk types in LiveCode 7 and we
>> would appreciate suggestions for a particular chunk name.
>> 
>> The new chunk types are:
>> 
>> naturalword (breaks on Unicode word boundaries)
>> sentence (breaks on Unicode sentence boundaries)
>> paragraph (same behaviour as the current 'line' chunk)
>> 
>> The first chunk is called 'naturalword' because 'word' is already in use.
>> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
>> option for backward compatibility. We are also limited by the current
>> parser which doesn't allow us to use the form:
>> 
>> put natural word 1 of "this is a string of words"
>> 
>> 'naturalword' is the clearest internal suggestion at the moment and we'd
>> love to get the input from community members if there is an even clearer
>> option.
>> 
>> Warm regards and thank you for your input.
>> 
>> Ben
>> 
>> _____
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode




