New chunks
Richmond
richmondmathewson at gmail.com
Tue Mar 11 15:26:55 EDT 2014
On 11/03/14 20:15, Benjamin Beaumont wrote:
> Hi All,
>
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
>
> The new chunk types are:
>
> naturalword (breaks on unicode word boundaries)
Well, in theory that looks good until you start to think about languages
such as Sanskrit, which are written with no obvious word boundaries and
which show both vowel mutation (sandhi) at what would be word boundaries
and consonant fusion.
Languages such as Inuktitut and Hungarian are agglutinative, and in some
cases what we (speakers of West European languages) would term a sentence
consists of a single word with loads of affixes: some at the front
(prefixes) and some at the back (suffixes).
Many Austronesian languages use infixes (i.e. twiddly bits shoved into
the middle of 'words').
These also crop up in Afro-Asiatic languages such as Arabic.
There are also some examples in English such as "fan-f*cking-tabulous".
We could also get sweaty about circumfixes, where a bit gets put on the
front and a bit gets put on the back as a sort of split morpheme (not to
be confused with split-pea bara).
> sentence (breaks on unicode sentence boundaries)
That looks a bit fishy.
How are you going to work out what marks a sentence boundary in every
language that one can write with Unicode? And there are languages where
the idea of a 'sentence' is absent.
> paragraph (Same behaviour as current 'line' chunk)
>
> The first chunk is called 'naturalword' because 'word' is already in use.
> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
> option for backward compatibility. We are also limited by the current
> parser which doesn't allow us to use the form:
>
> put natural word 1 of "this is a string of words"
>
> 'naturalword' is the clearest internal suggestion at the moment and we'd
> love to get the input from community members if there is an even clearer
> option.
I'm sorry to be such a "pill", but word and sentence boundaries are such
culture-bound concepts that these chunks will only be any good for
languages that actually mark word and sentence boundaries.
This is about the same as stating dogmatically that "all bananas are
yellow", when they are not.
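To make that concrete, here is a minimal Python sketch (not LiveCode, and
not a claim about how the engine does or will do anything). It assumes a
'word' is whatever sits between runs of whitespace, which is only an
approximation of the existing 'word' chunk. The function name
whitespace_words and both sample strings are made up for illustration.

```python
def whitespace_words(text):
    """Split text on runs of whitespace (an assumed stand-in for a
    space-delimited 'word' chunk, not LiveCode's actual rules)."""
    return text.split()


# The sample string from Ben's message: spaces mark every word boundary.
english = "this is a string of words"

# "I like bananas" in Chinese: no spaces are written between the words.
chinese = "我喜欢香蕉"

print(whitespace_words(english))       # ['this', 'is', 'a', 'string', 'of', 'words']
print(len(whitespace_words(chinese)))  # 1
```

The English string comes apart into six pieces, while the Chinese one
comes back as a single 'word', although a reader of Chinese would see
several. Anything smarter has to know about the language, not just the
characters.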
> Warm regards and thank you for your input.
You may not thank me.
Richmond.
>
> Ben
>
> _____________________________________________
>
> Benjamin Beaumont . RunRev Ltd
>
>