Unicode and languages

Richmond richmondmathewson at gmail.com
Fri Jun 5 14:40:23 EDT 2020


I doubt that. But if you can determine the Unicode range that is being 
used, you can at least know which writing system is being used. You could 
then trap for individual glyphs (such as 'џ', which is used in Serbian and 
Macedonian Cyrillic but not in Russian) to narrow things down a bit.
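
Here is a rough sketch of that idea. It is written in Python only so it can be run outside LiveCode. The range table, the glyph hints and every function name are hypothetical, and the table covers just a handful of Unicode blocks rather than the full Unicode Script property. The sketch counts letters per script, splits mixed text into same-script runs (so only the Latin runs would be handed to an English dictionary), and intersects glyph hints to narrow down a Cyrillic language:

```python
from collections import Counter

# Hypothetical, hand-picked subset of Unicode block ranges (inclusive).
# It only approximates the Unicode Script property; e.g. the 0x00C0-0x024F
# Latin range also contains the symbols × and ÷.
SCRIPT_RANGES = [
    ("Latin", 0x0041, 0x005A),    # A-Z
    ("Latin", 0x0061, 0x007A),    # a-z
    ("Latin", 0x00C0, 0x024F),    # accented Latin letters
    ("Greek", 0x0370, 0x03FF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Arabic", 0x0600, 0x06FF),
    ("Kana", 0x3040, 0x30FF),     # Hiragana + Katakana
    ("Han", 0x3400, 0x4DBF),      # CJK Unified Ideographs Extension A
    ("Han", 0x4E00, 0x9FFF),      # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),   # Hangul syllables
]

# Hypothetical glyph hints (lowercase): illustrative, not exhaustive.
CYRILLIC_HINTS = {
    "џ": {"Serbian", "Macedonian", "Bosnian", "Montenegrin"},
    "ѓ": {"Macedonian"},
    "ќ": {"Macedonian"},
}


def script_of(ch):
    """Script name for one character, or None for digits, spaces,
    punctuation and anything the table does not cover."""
    cp = ord(ch)
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return None


def dominant_script(text):
    """Script with the most matching letters, or None if none matched."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    return counts.most_common(1)[0][0] if counts else None


def script_runs(text):
    """Split text into (script, substring) runs. Neutral characters
    (spaces, digits, punctuation) stay with the run they follow."""
    runs = []  # each entry is [script, list_of_chars]
    for ch in text:
        s = script_of(ch)
        if not runs:
            runs.append([s, [ch]])
        elif s is None or s == runs[-1][0]:
            runs[-1][1].append(ch)
        elif runs[-1][0] is None:
            # Leading neutral text adopts the first real script it meets.
            runs[-1][0] = s
            runs[-1][1].append(ch)
        else:
            runs.append([s, [ch]])
    return [(s, "".join(chars)) for s, chars in runs]


def spellcheckable(text, dictionary_script="Latin"):
    """Only the runs whose script matches the loaded dictionary."""
    return [chunk for s, chunk in script_runs(text) if s == dictionary_script]


def cyrillic_candidates(text):
    """Intersect the hint sets of distinctive glyphs found in text.
    Returns None when no distinctive glyph appears."""
    candidates = None
    for ch in text.lower():
        hint = CYRILLIC_HINTS.get(ch)
        if hint is not None:
            candidates = set(hint) if candidates is None else candidates & hint
    return candidates


if __name__ == "__main__":
    mixed = "你好，世界。Hello, world."
    for script, chunk in script_runs(mixed):
        print(script, repr(chunk))
```

Bear in mind that a script is not a language: "Latin" covers English, French, German and many more, so this only tells you which dictionaries are plausible. In LiveCode 7 and later, the same loop could presumably be written with `repeat for each codepoint` and `codepointToNum`.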

On 5.06.20 20:15, Paul Dupuis via use-livecode wrote:
> With everything the Unicode engine in LC7 and higher adds, is there
> any way to determine the LANGUAGE of a range of text?
>
> USE-CASE
>
> We have a tool that helps researchers transcribe text from digital 
> media. It is used internationally. We have added spell checking using 
> lclSpell from Live Code Labs, a LiveCode store add-on.
>
> For lclSpell, we only have Dictionaries for a small set of languages. 
> You can build your own Dictionaries for lclSpell, but we'll still only
> have Dictionaries for a small subset of the languages people 
> transcribe in. We also have people who do BOTH transcription AND 
> translations.
>
> For example, someone transcribing a Chinese-language media recording
> types in the Simplified or Traditional Chinese characters AND then
> translates it into English, typing the English translation after the
> transcription.
>
> With lclSpell (or, I suspect, ANY LiveCode-compatible spell checker), if
> you try to spell check a reasonably large chunk of text that is NOT in
> the same language as your Dictionary, it ties up LiveCode forever, or
> at least for so long that most people would force-quit. It is, after
> all, marking every word as misspelled and doing whatever it does to
> determine that.
>
> Now, you might respond that the researcher should just KNOW better than
> to spell check text in a language that is not their loaded
> Dictionary! However, people are people: they will do such things and
> expect software to protect them from their own mistakes. Also, with
> mixed transcription and translation, you do want to spell check the
> English part and skip the Chinese (if you do not have a Chinese
> Dictionary).
>
> So, we're looking for a way to detect the LANGUAGE of a range of text, 
> in a LiveCode field, to be able to then determine whether it matches 
> the current (or any available) dictionary or not and act accordingly.
>
> There is a "fontLanguage" function in LC, but it seems to predate
> Unicode Everywhere and seems pretty useless now.
>
> For example, in a new stack with a single scrolling field, we paste
> in a Chinese text and then execute:
>
> put the fontLanguage of (the effective textfont of char 1 to -1 of fld 1)
>
> and get "ansi". Even if you restrict the range to chars 2 to 3, which
> are purely Chinese (no white space), it still returns "ansi". The
> textFont returns empty and the effective textFont returns "Segoe UI".
>
> I don't even know whether the ICU Unicode engine (originally from IBM)
> has language as some exportable property a future version of LiveCode
> could expose.
>
> Any clever ideas or thoughts on this problem are welcome.
>
>
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your 
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode




