New chunks

Richmond richmondmathewson at gmail.com
Tue Mar 11 15:26:55 EDT 2014


On 11/03/14 20:15, Benjamin Beaumont wrote:
> Hi All,
>
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
>
> The new chunk types are:
>
> naturalword (breaks on unicode word boundaries)

Well, in theory that looks good, until you start to think about languages (such as Sanskrit) that are written with no obvious word boundaries, and that show both vowel mutation (Sandhi) at what would be word boundaries and consonant fusion.

Languages such as Inuit and Hungarian are agglutinative, and in some cases what we (speakers of Western European languages) would term a sentence consists of a single word with loads of affixes, some of them at the front (prefixes).

Many Austronesian languages use infixes (i.e. twiddly bits shoved into the middle of 'words').

These also crop up in Afro-Asiatic languages such as Arabic.

There are also some examples in English such as "fan-f*cking-tabulous".

We could also get sweaty about circumfixes, where one bit is put on the front and another on the back as a sort of split morpheme (not to be confused with split-pea bara).
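
To make the point concrete, here is a minimal Python sketch (Python is my choice; nothing in this thread specifies a language). It assumes a splitter that treats whitespace as the only word boundary. That is only a rough approximation of a space-delimited 'word' chunk, not LiveCode's actual implementation, and the function name is hypothetical. A phrase in a script written without spaces between words, such as Thai, comes back as one single 'word'.

```python
# Hypothetical splitter: whitespace is the ONLY word boundary.
# A rough approximation of a space-delimited word chunk, not LiveCode's code.
def whitespace_words(text: str) -> list[str]:
    return text.split()

english = "this is a string of words"
thai = "ภาษาไทยไม่เว้นวรรคระหว่างคำ"  # Thai is written without spaces between words

print(len(whitespace_words(english)))  # 6
print(len(whitespace_words(thai)))     # 1
```

As far as I understand it, the Unicode word-boundary rules (UAX #29) concede much the same thing: for scripts such as Thai they say an implementation should use a more sophisticated, dictionary-based method rather than rely on the default rules.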

> sentence (breaks on unicode sentence boundaries)

That looks a bit fishy.

How are you going to work out what marks a sentence boundary in every language that can be written with Unicode? And there are languages where the idea of a 'sentence' is absent.
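
Even English is awkward. Here is a minimal Python sketch of the most obvious rule, "a sentence ends at '.', '!' or '?' followed by whitespace". The function is hypothetical and is not how UAX #29 sentence breaking or LiveCode works; it just shows how quickly a punctuation-based rule trips over something as ordinary as an abbreviation.

```python
import re

# Hypothetical naive rule: split after ., ! or ? when followed by whitespace.
# Not the Unicode (UAX #29) algorithm and not LiveCode's implementation.
def naive_sentences(text: str) -> list[str]:
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

print(naive_sentences("It rained. We stayed in! Did you?"))
# ['It rained.', 'We stayed in!', 'Did you?']
print(naive_sentences("Dr. Smith arrived."))
# ['Dr.', 'Smith arrived.']  <- "Dr." wrongly treated as a sentence
```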

> paragraph (Same behaviour as current 'line' chunk)
>
> The first chunk is called 'naturalword' because 'word' is already in use.
> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
> option for backward compatibility. We are also limited by the current
> parser which doesn't allow us to use the form:
>
> put natural word 1 of "this is a string of words"
>
> 'naturalword' is the clearest internal suggestion at the moment and we'd
> love to get the input from community members if there is an even clearer
> option.

I'm sorry to be such a "pill", but word and sentence boundaries are such culture-bound concepts that they will only be any good for languages that mark word and sentence boundaries.

This is about the same as stating dogmatically that "all bananas are yellow", when they are not.

> Warm regards and thank you for your input.

You may not thank me.

Richmond.

>
> Ben
>
> _____________________________________________
>
> Benjamin Beaumont . RunRev Ltd
>
>

More information about the use-livecode mailing list