Unicode and languages
paul at researchware.com
Fri Jun 5 13:15:11 EDT 2020
Among all the additions in the Unicode engine of LC7 and higher, is
there any way to determine the LANGUAGE of a range of text?
We have a tool that helps researchers transcribe text from digital
media. It is used internationally. We have added spell checking using
lclSpell from Live Code Labs, a LiveCode store add-on.
For lclSpell, we only have Dictionaries for a small set of languages.
You can build your own Dictionaries for lclSpell, but we'll still only
have Dictionaries for a small subset of the languages people transcribe
in. We also have people who do BOTH transcription AND translation.
For example, they transcribe a Chinese-language media recording, typing
the Simplified or Traditional Chinese characters, AND then translate it
into English, typing the English translation after the transcription.
With lclSpell (and, I suspect, ANY LiveCode-compatible spell checker), if
you try to spell check a reasonably large chunk of text that is NOT in
the same language as your Dictionary, it ties up LiveCode forever, or at
least for so long that most people would force-quit. It is, after all,
marking every word as misspelled and trying to do whatever it does to
Now, you might respond that the researcher should just KNOW better than
to spell check text in a language that does not match their loaded
Dictionary! However, people are people: they will do such things and
expect software to protect them from their own mistakes. Also, with
mixed transcription and translation, you do want to spell check the
English part and skip the Chinese (if you do not have a Chinese
Dictionary).
So, we're looking for a way to detect the LANGUAGE of a range of text
in a LiveCode field, so that we can then determine whether it matches
the current (or any available) Dictionary and act accordingly.
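One direction that might work: Unicode itself does not record language, but it does assign every character a script, and script alone is enough to separate the Chinese part from the English part in the mixed transcription/translation case. Below is a minimal, hypothetical sketch of that idea, written in Python only because it is easy to test; the same logic would have to be ported to LiveCode script. It assumes that a few hand-picked code point ranges (Han ideographs, Latin letters up to U+024F) are a good enough stand-in for the full Unicode Script property, and the names script_of, script_runs and runs_to_check are made up for this example.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: hand-picked ranges standing in for the full Unicode Script
# property (UAX #24). Only Han ideographs are listed; everything else is
# bucketed by a simple Latin / Other / Common split in script_of().
_HAN_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]


def script_of(ch: str) -> str:
    """Rough script bucket for a single character (hypothetical helper)."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in _HAN_RANGES):
        return "Han"
    if ch.isalpha():
        # Basic Latin through Latin Extended-B ends at U+024F.
        return "Latin" if cp < 0x250 else "Other"
    # Spaces, digits and punctuation (including CJK punctuation such as
    # U+3002) have no script of their own here.
    return "Common"


def script_runs(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs.

    "Common" characters are attached to the run they appear in, so
    punctuation and spaces never start a new run on their own.
    """
    runs: List[Tuple[str, str]] = []
    current_script: Optional[str] = None
    current: List[str] = []
    for ch in text:
        s = script_of(ch)
        if s == "Common" or s == current_script:
            current.append(ch)
            continue
        if current_script is None:
            # First real script seen: leading Common chars belong to it.
            current_script = s
            current.append(ch)
            continue
        # Script changed: close the current run and start a new one.
        runs.append((current_script, "".join(current)))
        current_script, current = s, [ch]
    if current:
        runs.append((current_script or "Common", "".join(current)))
    return runs


def runs_to_check(text: str, dictionary_script: str = "Latin") -> List[str]:
    """Return only the runs whose script matches the loaded Dictionary."""
    return [chunk for script, chunk in script_runs(text)
            if script == dictionary_script]
```

For example, runs_to_check("我们今天去了市场。 Today we went to the market.") returns only ["Today we went to the market."], so the Chinese run would never be handed to the spell checker. Script is not language, though: English, French and German are all Latin script, so this only guards against the wrong-script case; telling Latin-script languages apart would still need another heuristic, such as checking what fraction of a sample's words the loaded Dictionary recognizes before checking the whole range.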
There is a "fontLanguage" function in LC, but it seems to predate
Unicode Everywhere and seems pretty useless now.
For example, in a new stack with a single scrolling field, we paste in
some Chinese text and then execute:
put the fontLanguage of (the effective textfont of char 1 to -1 of fld 1)
and get "ansi". Even if you set the range to char 2 to 3, which is
purely Chinese (no white space), it still returns "ansi". The textFont
returns empty and the effective textFont returns "Segoe UI".
I don't even know whether language exists in the IBM Unicode engine
(ICU) as some exportable property that a future version of LiveCode
could expose.
Any clever ideas or thoughts on this problem are welcome.