Elegant way to express constant UTF8 string in script?
J. Landman Gay
jacque at hyperactivesw.com
Mon Jun 30 17:18:15 CEST 2014
This is exactly what I've been dealing with for a week. You need two steps: first check the platform, and if it's Windows, run macToISO on the string. After that, your existing conversion to UTF8 should work.
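A minimal sketch of those two steps, assuming the literal was typed into a script saved on a Mac (so its bytes are MacRoman) and that only Windows needs the extra conversion. The handler name utf8FromMacLiteral is hypothetical, and this has not been verified on every platform or engine version:

```livecode
-- Hypothetical helper: converts a non-ASCII literal authored on Mac to UTF8.
-- Assumption: pText holds MacRoman bytes because the script was written on Mac.
function utf8FromMacLiteral pText
   -- Step 1: on Windows, remap the MacRoman bytes to the Windows/ISO character set
   if the platform is "Win32" then
      put macToISO(pText) into pText
   end if
   -- Step 2: the existing conversion, native -> Unicode -> UTF8
   return uniDecode(uniEncode(pText, "ANSI"), "UTF8")
end utf8FromMacLiteral
```

It could then be called once at startup, for example `put utf8FromMacLiteral("This ice cream is © Ben and Jerry’s Inc") into tMyString`, where tMyString is an ordinary variable rather than a true constant.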
On June 30, 2014 9:38:35 AM CDT, Ben Rubinstein <benr_mc at cogapp.com> wrote:
>I think this problem should be solved in LC 7 (possibly using
>but I need a solution that I can ship now (and it's been threatened
>that LC 7
>will 'fix' a 'bug' which isn't, so I'm not sure if I'll ever be able to
>My app processes some data from - and then, re-organised, to - UTF8
>files. Occasionally it needs to insert a constant string; and for
>reasons (all of them excellent) I want to specify these constant
>the script. So far, so good. Now however one of these constant
>to contain a character which is not in ASCII. Actually two of them.
>need to express a UTF8 string in my script. And I'm searching for an
>way to do this.
>My constant string used to look something like this:
> constant kMyConstantString = "This is my ice cream"
>but now it needs to read something like
>constant kMyConstantString = "This ice cream is (c) Ben and Jerry's
>(only with a smart apostrophe and a proper copyright symbol).
>I thought I could just about manage with this
>put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc,
>"UTF8") into kMyConstantString
>(that is, encode from ANSI to Unicode, then from Unicode into UTF8).
>I tested it on Mac and it seemed to work. The UTF8 file was generated
>this text came out just right.
>However, it turned out that when the code was compiled and run on
>copyright symbol came out OK, but the apostrophe came out as o-tilde.
>This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless;
>instead it interprets the source encoding as whatever is typical for
>operating system. I wrote the script on Mac; in MacRoman, © is 0xA9
>apostrophe is 0xD5; in ISO-8859-1 (and UTF8), 0xA9 is ©, but 0xD5 is
>So... what's the most elegant way to do this (is there one)? Is there any
>alternative to just looking up the UTF8 encodings and writing:
>put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc")
Jacqueline Landman Gay | jacque at hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com