Elegant way to express constant UTF8 string in script?

Ben Rubinstein benr_mc at cogapp.com
Mon Jun 30 10:38:35 EDT 2014


I think this problem should be solved in LC 7 (possibly using normaliseText); 
but I need a solution that I can ship now (and it's been threatened that LC 7 
will 'fix' a 'bug' which isn't, so I'm not sure if I'll ever be able to use it).

My app processes some data from - and then, re-organised, to - UTF8 text 
files. Occasionally it needs to insert a constant string; and for various 
reasons (all of them excellent) I want to specify these constant strings in 
the script.  So far, so good.  Now however one of these constant strings needs 
to contain a character which is not in ASCII.  Actually two of them.  So I 
need to express a UTF8 string in my script.  And I'm searching for an elegant 
way to do this.

My constant string used to look something like this:

    constant kMyConstantString = "This is my ice cream"

but now it needs to read something like

    constant kMyConstantString = "This ice cream is (c) Ben and Jerry's Inc"

(only with a smart apostrophe and a proper copyright symbol).

I thought I could just about manage with this

   put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc", "ANSI"), \
         "UTF8") into kMyConstantString

(that is, encode from ANSI to Unicode, then from Unicode into UTF8).

I tested it on Mac and it seemed to work.  The UTF8 file was generated and 
this text came out just right.


However, it turned out that when the code was compiled and run on Windows, the 
copyright symbol came out OK, but the apostrophe came out as o-tilde.

This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless; 
instead it interprets the source encoding as whatever is typical for the 
operating system.  I wrote the script on Mac; in MacRoman, © is 0xA9 and smart 
apostrophe is 0xD5; in ISO-8859-1 (and as a Unicode code point), 0xA9 is ©, 
but 0xD5 is O-tilde (Õ).

So... what's the most elegant way to do this (is there one)?  Is there any 
alternative to just looking up the UTF8 encodings and writing:

   put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc") \
         into kMyConstantString

?
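
For what it's worth, one possible alternative is to compute the UTF8 bytes 
from the Unicode code points instead of looking them up by hand.  This is 
only a minimal sketch, not a tested solution: utf8FromCodepoint and 
myConstantString are hypothetical names (nothing built in), it assumes a 
pre-7 engine where numToChar(n) for n below 256 yields a single byte, and it 
only covers code points below 65536 (no surrogate handling).

```
-- Hypothetical helper (not a built-in): returns the UTF8 byte sequence
-- for a Unicode code point in the range 0..65535.
-- Assumption: pre-7 engine, where numToChar(n) for n < 256 is one byte.
function utf8FromCodepoint pCode
   if pCode < 128 then
      -- 1 byte: 0xxxxxxx
      return numToChar(pCode)
   else if pCode < 2048 then
      -- 2 bytes: 110xxxxx 10xxxxxx
      return numToChar(192 + pCode div 64) & numToChar(128 + pCode mod 64)
   else
      -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      return numToChar(224 + pCode div 4096) & \
            numToChar(128 + (pCode div 64) mod 64) & \
            numToChar(128 + pCode mod 64)
   end if
end utf8FromCodepoint

-- Hypothetical accessor standing in for the constant.
-- 169 is U+00A9 (copyright sign), 8217 is U+2019 (right single quote).
function myConstantString
   return "This ice cream is " & utf8FromCodepoint(169) & \
         " Ben and Jerry" & utf8FromCodepoint(8217) & "s Inc"
end myConstantString
```

Working the arithmetic by hand, 169 gives the bytes 0xC2 0xA9 and 8217 gives 
0xE2 0x80 0x99, the same bytes as the format() version above.  Since only 
ASCII appears in the script itself, the result should not depend on which 
platform the script was written or compiled on.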

TIA,

Ben



