ASCII, Unicode and Livecode

Peter W A Wood peterwawood at gmail.com
Wed Aug 17 00:16:09 EDT 2011


Richmond

> So, there I am reading a BBC computer manual (as one does) where I am
> informed that the ASCII set consists of 128 chars . . .
> 
> . . . that's funny, because I had 256 (i.e. 2 x 128) rumbling around in my mind.

A common misconception. ASCII is a 7-bit code, so it defines only 128 characters (values 0 to 127); the other 128 values an 8-bit byte can hold are not part of it. The different ways people worked out to use those extra 128 combinations (MacRoman, ISO-8859-1, ISO-8859-2, etc. and, not forgetting, the myriad Windows code pages) are the source of endless problems in applications which share data between multiple computers or are run all over the world.
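To make the point concrete, here is a small sketch in Python rather than LiveCode (the codec names are Python's own, and byte 0xE8 is an arbitrary example value). It decodes one byte with the high bit set under several of the 8-bit encodings mentioned above. It also checks that the lower 128 values decode identically everywhere, since that half is plain ASCII.

```python
# Sketch: the same high byte means different things in different 8-bit encodings.
# Byte 0xE8 is an arbitrary example; the names are Python codec names.
raw = bytes([0xE8])
codecs_to_try = ("latin_1", "iso8859_2", "mac_roman", "cp1251")

for codec in codecs_to_try:
    print(f"{codec:10} -> {raw.decode(codec)!r}")

# ASCII itself is 7-bit: a byte above 0x7F is simply not ASCII.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii      ->", err.reason)

# The lower 128 values (the ASCII range) decode the same in all of them.
low = bytes(range(128))
assert all(low.decode(c) == low.decode("ascii") for c in codecs_to_try)
```

On a standard CPython 3 install this should print four different characters for the one byte (è, č, Ë and и), followed by the ASCII decoding error.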

> As I have to develop a script that determines whether a char is a single-byte char or the first half of a double-byte char . . . ouch . . . this is probably fairly важно ("important"), to use a rather useful Bulgarian word . . .
> 
> Messing around in Livecode, I set up a stack with 2 fields; one containing a string of ASCII text:
> 
> "LAT" Еat my cheese
> 
> and the other containing a Unicode string:
> 
> "UNIK" 'a Sanskrit word' I won't try to represent here [anyway, a spot of juicy double-byted goodness]
> 
> then I set up a third field called "C1"
> 
> 2 buttons; one to get the charToNum for the first char in fld "LAT", and one to do the
> same for fld "UNIK"
> 
> they both contain this sort of script:
> 
> on mouseUp
> put charToNum(char 1 of fld "LAT") into fld "C1"
> end mouseUp
> 
> and that's all very jolly, except that
> 
> on mouseUp
> put charToNum(char 1 of fld "UNIK") into fld "C1"
> end mouseUp
> 
> returns "9", which is within the ASCII range, so there is no way I can use that sort of script to determine what I want to.

It is just not possible to distinguish between two consecutive single-byte chars and a double-byte char; after all, both are just combinations of 16 bits. To handle any text, you need to know how the text is encoded. If you haven't read http://www.joelonsoftware.com/articles/Unicode.html, you probably should. It even mentions Bulgaria!
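A minimal Python sketch of why the encoding has to be known up front. The helper `char_byte_lengths` is hypothetical, written just for this illustration. It feeds bytes one at a time to one of Python's incremental decoders and reports how many bytes each character used. The two bytes 0x09 0x09 are chosen purely for illustration: read as Latin-1 they are two single-byte tab characters, and read as UTF-16 (little-endian) they are one double-byte character, U+0909.

```python
from __future__ import annotations

import codecs


def char_byte_lengths(data: bytes, encoding: str) -> list[tuple[str, int]]:
    """Split data into characters, reporting how many bytes each one used.

    The encoding must be known in advance: the same bytes give different
    answers under different encodings.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    result = []
    consumed = 0
    for byte in data:
        consumed += 1
        # Feed one byte at a time; the decoder buffers incomplete sequences
        # and only returns text once a whole character has arrived.
        text = decoder.decode(bytes([byte]))
        if text:
            result.append((text, consumed))
            consumed = 0
    decoder.decode(b"", final=True)  # raises if the data ends mid-character
    return result


# The same two bytes, read under two different (assumed) encodings.
raw = bytes([0x09, 0x09])
print(char_byte_lengths(raw, "latin_1"))    # two 1-byte chars (tabs)
print(char_byte_lengths(raw, "utf_16_le"))  # one 2-byte char, U+0909
```

Nothing in the bytes themselves says which reading is right. Only the encoding you pass in decides whether a byte is a whole character or half of one.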

Regards

Peter






More information about the Use-livecode mailing list