Languages & Internationalisation, part I ( was Intern. II )

manuel companys mcompanys at mac.com
Tue Jan 7 19:04:01 EST 2003


The following applies especially to speakers of 'western european 
languages' using macOS, but it can be useful for other R-R users as well.

EASIEST TO LOCALIZE LANGUAGES:
******************************************
No special fonts are needed, either to create or to use the software. 
This has been easy since the 1984 Mac 128. To type the translation more 
easily you may just need to select the keyboard layout, which can be 
done on the fly in the menu bar since System 7. The fonts are called 
'Western European Languages' (Latin1).

This group includes:
- Romance languages: French, Occitan (Provençal, Languedocian, 
North Occitan, Gascon), Catalan, Italian, Corsican, Sardinian, 
Spanish, Portuguese [but NOT Romanian]
- Germanic languages: English, Dutch, Platt Deutsch, German, Yiddish, 
Danish, Swedish, Norwegian [but NOT Icelandic; I don't know about 
Frisian and Faroese]
- Finno-Ugric languages: Finnish [but NOT Hungarian; I don't know 
about Estonian and other northern languages]
- Euskarian (Basque)
- many so called 'third world' languages not needing extra diacritics 
(accents, cedillas, bars, etc.)
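
A minimal sketch of the grouping above, assuming that Python's 
"mac_roman" codec is a fair stand-in for the classic Mac 'Western 
European' font set (that mapping is my assumption, not something the 
fonts themselves document). The helper name and the sample words are 
hypothetical; the idea is only to test whether every letter of a word 
has a one-byte code.

```python
# Assumption: the "mac_roman" codec approximates the classic Mac
# 'Western European' font set discussed above.

def fits_single_byte(text, codec="mac_roman"):
    """True if every character of `text` has a one-byte code in `codec`."""
    try:
        return len(text.encode(codec)) == len(text)
    except UnicodeEncodeError:
        return False

# Hypothetical sample words, chosen only for their accented letters.
samples = {
    "French": "fa\u00e7ade, \u0153uvre",
    "Spanish": "\u00bfma\u00f1ana?",
    "German": "Stra\u00dfe, f\u00fcr",
    "Danish": "sm\u00f8rrebr\u00f8d, \u00e6ble, \u00e5r",
    "Icelandic": "\u00fea\u00f0, \u00fe\u00fa",
    "Hungarian": "f\u0151z\u0151, f\u0171t\u0151",
    "Romanian": "rom\u00e2n\u0103, \u0219i",
}

for language, words in samples.items():
    verdict = "fits" if fits_single_byte(words) else "needs another font"
    print(f"{language:10} {verdict}")
```

The Icelandic, Hungarian and Romanian samples are the ones expected to 
fall outside the set, in line with the exceptions listed above.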

EASIER TO LOCALIZE LANGUAGES
****************************************

Central European languages (Latin 2, I guess) use a font set very 
similar to the Western one, but with some differences in the letters 
carrying diacritics. Of course both the programmer and the user need 
Central European fonts; but a 1984 Mac could be used as far as the 
language is concerned.

This group includes:

--Germanic languages: English, Dutch, Platt Deutsch, German, Yiddish

--Finno-Ugric: Finnish, Hungarian

--Slavic languages: Polish, Czech, Slovak [but NOT Slovene nor 
Serbo-Croatian; I don't know about other Slavic languages using the 
Latin script]
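
To see which letters make the difference between the two font sets, 
here is a small sketch. It assumes Python's "mac_roman" and 
"mac_latin2" codecs approximate the Western and Central European Mac 
fonts respectively; the function name and the sample words are 
hypothetical.

```python
# Assumption: "mac_roman" ~ Western font set, "mac_latin2" ~ Central
# European font set. The sample words are hypothetical.

def missing_chars(text, codec):
    """Return the sorted characters of `text` that have no code in `codec`."""
    missing = set()
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)

# Polish, Czech and Hungarian samples.
for word in ("\u0141\u00f3d\u017a", "\u0159eka", "f\u0151z\u0151"):
    print(word,
          "| missing in Western:", missing_chars(word, "mac_roman"),
          "| missing in Central European:", missing_chars(word, "mac_latin2"))
```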

EASY TO LOCALIZE LANGUAGES
**************************************
A. LATIN EXTENDED LANGUAGES
You can
~~EITHER: 1) make a compatible font that includes all the desired 
characters (with a unique ASCII address for the most frequent 
characters, or with a zero-offset diacritic). This font must of course 
be given to the user; 2) make the appropriate keyboard map 
configuration to make input easy for the translator. This solution may 
frighten some people, but it is easy to do since all the needed 
diacritics are already there in Western European, and creating and 
testing the keyboard KCHR resource with ResEdit is a matter of hours. 
AND this solution DOES NOT require a brand new powerful computer, 
either to create or to use the program: any Mac since 1984 can do it.
~~OR: you can simply use the Extended Latin subset of Unicode. All the 
fonts in macOS X have 360 ASCII addresses including all the chars 
supposedly used in all the languages using the Latin alphabet. If you 
use an English keyboard, you are lucky: you have the 'Extended 
English' keyboard mapping in the input language menu (not quite 
ergonomic but reasonably easy to use); otherwise, get an 
ergonomically designed keyboard mapping from Apple (according to your 
wishes) before the typing mistakes drive you crazy.
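
The two font strategies of the EITHER option (one code per accented 
letter, or a zero-width diacritic struck over a base letter) have close 
counterparts in Unicode, namely precomposed and combining forms. The 
sketch below only illustrates that parallel with Python's standard 
unicodedata module; the Esperanto sample word is my own choice.

```python
import unicodedata

word = "\u0109evalo"  # Esperanto "horse", typed with a precomposed c-circumflex

composed = unicodedata.normalize("NFC", word)    # one code point per accented letter
decomposed = unicodedata.normalize("NFD", word)  # base letter + combining accent

print(len(composed), [hex(ord(c)) for c in composed[:1]])
print(len(decomposed), [hex(ord(c)) for c in decomposed[:2]])

# Both spellings describe the same text once normalized the same way.
print(unicodedata.normalize("NFC", decomposed) == composed)
```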

With the two-byte Unicode system, many characters now happen to have 
two ASCII codes, since the first 256 one-byte addressable characters 
are all still there. Besides, this Unicode stuff is not yet perfectly 
finalized; most fonts are still incomplete and/or have blurry or 
style-mismatched characters. AND MOST IMPORTANT: you need a fast, 
powerful computer with lots of RAM and a disc with hundreds of 
megabytes.
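
A rough illustration of the storage side of this, assuming that 
'two-byte Unicode' corresponds to what Python calls UTF-16 (my 
assumption) and using a hypothetical sample sentence. It also checks 
that the first 256 Unicode code points line up with ISO Latin-1; note 
that this holds for Latin-1 and not for the Mac's own one-byte codes.

```python
text = "\u0108u vi parolas Esperanton?"  # hypothetical sample sentence

# For these characters UTF-16 uses exactly two bytes each.
two_byte = text.encode("utf-16-be")
print(len(text), "characters,", len(two_byte), "bytes in UTF-16")

# The first 256 Unicode code points coincide with ISO Latin-1.
same_as_latin1 = all(bytes([b]).decode("latin_1") == chr(b) for b in range(256))
print("first 256 code points match Latin-1:", same_as_latin1)

# The Mac's one-byte codes do not line up the same way, e.g. byte 0x8E.
mac_differs = bytes([0x8E]).decode("mac_roman") != chr(0x8E)
print("Mac Roman 0x8E differs from U+008E:", mac_differs)
```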

Using Unicode in 2003 to write such languages as Lithuanian, Esperanto, 
Slovene, Croatian, Albanian, Romanian, Maltese or Turkish amounts to 
using a whole battery of bazookas to kill a mosquito.** You could even 
miss the mosquito and get some unexpected 'dommages collatéraux' 
(collateral damage), as we say in French.

B. OTHER ALPHABETIC SCRIPT 'SIMPLE' LANGUAGES
I mean a) really alphabetic (not syllabic like Japanese katakana); b) 
only one shape for each letter, no matter which letters surround it 
(this excludes Arabic); c) not needing to change our standard 
horizontal left-to-right system (this also excludes Hebrew).

The case is technically the same as for Latin extended languages. You 
only need the appropriate font, Cyrillic, for instance. Of course, if 
the translator is used to a Latin alphabet he may want to have an 
ergonomically defined keyboard according to his habits.

This group includes, among other languages, Greek and the Cyrillic 
alphabet group, which is in a situation pretty similar to the Latin 
one: a central nucleus ('easier': Russian, Ukrainian, Bulgarian) and 
the 'extended', more or less 'easy' ones: Cyrillic Serbo-Croat (Serbia, 
Montenegro) and most non-Indo-European languages of the former Soviet 
Union.
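
As a quick check that such scripts fit the same one-byte scheme, here 
is a sketch that assumes Python's "mac_cyrillic" and "mac_greek" codecs 
approximate the corresponding Mac fonts. The function name and the 
sample strings are hypothetical, and the Greek sample deliberately uses 
unaccented letters only.

```python
# Assumption: these codecs approximate the Mac Cyrillic and Greek fonts.

def single_byte_roundtrip(text, codec):
    """True if `text` encodes to one byte per character and decodes back."""
    try:
        data = text.encode(codec)
    except UnicodeEncodeError:
        return False
    return len(data) == len(text) and data.decode(codec) == text

samples = {
    "mac_cyrillic": "\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440",  # Russian
    "mac_greek": "\u03b1\u03bb\u03c6\u03b1 \u03b2\u03b7\u03c4\u03b1",            # Greek
}

for codec, text in samples.items():
    print(codec, text, single_byte_roundtrip(text, codec))
```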

C. SIMPLE SYLLABIC SCRIPT LANGUAGES
I mean a) close to one-to-one correspondence between characters and 
phonemic syllables, b) no (or very few) context-sensitive shape 
changes, and c) not needing to change our horizontal left-to-right 
system. This is exactly the case* of Japanese katakana; or hiragana, 
for that matter.

Technically speaking the problem is very similar to that of 
alphabetical languages: there is enough room in the 256 single-byte 
addressed Apple fonts to fit the whole katakana AND the Latin 
alphanumerics plus frequently used punctuation and symbols. You can 
even find such fonts. You may just need to make (or have made for you) 
an ergonomic keyboard fitting your habits.
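
The 'enough room' argument can be checked with a back-of-the-envelope 
count. The sketch below counts the assigned code points of the Unicode 
Katakana block and compares them with the upper half of a one-byte 
font; the 128/128 split is a hypothetical layout of mine, not the 
layout of any actual Apple font.

```python
import unicodedata

# Assigned code points in the Unicode Katakana block (U+30A0..U+30FF).
katakana = [chr(cp) for cp in range(0x30A0, 0x3100)
            if unicodedata.name(chr(cp), "")]

# Hypothetical layout: lower half keeps ASCII (Latin letters, digits,
# punctuation), upper half is free for the syllabary.
ascii_slots = 128
free_slots = 256 - ascii_slots

print(len(katakana), "katakana code points,", free_slots, "free slots")
print("fits in one byte alongside ASCII:", len(katakana) <= free_slots)
```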

.................................

[For the not so easy languages, "La suite au prochain numéro"*** as we 
say in French. Well, if I am not kicked off the list before then for 
boring all the nice people out there with my linguistic junk.]  ;-)
______________
* OK, they write HA for 'wa' but this does not deserve another 
Hiroshima or Nagasaki trick, does it?
** Bill Gates's Entourage mailing program does still better: it compels 
Europeans to switch to Unicode if they want to use their currency 
sign, which has had an accessible ASCII address since 1984 (the euro 
sign took the place of the so-called 'currency' sign, which nobody has 
used for decades).
*** 'To be continued in the next issue.'


