Elegant way to express constant UTF8 string in script?
J. Landman Gay
jacque at hyperactivesw.com
Mon Jun 30 11:18:15 EDT 2014
This is exactly what I've been dealing with for a week. You need two steps: first check the platform, and if it's Windows, run macToISO on the string. After that, your existing conversion to UTF8 should work.
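As a rough, untested sketch of those two steps in script form: the handler name nativeToUTF8 is made up for illustration, it assumes the string literal was typed into the script on a Mac, and the uniEncode/uniDecode call is simply the conversion from the original message quoted below. Linux is not covered by this check.

```livecode
-- Hypothetical helper; the name nativeToUTF8 is an assumption, not from the thread.
function nativeToUTF8 pText
   -- Step 1: the literal was typed on a Mac, so its bytes are MacRoman.
   -- On Windows, remap them to the Windows/ISO character set first.
   if the platform is "Win32" then
      put macToISO(pText) into pText
   end if
   -- Step 2: the existing conversion (native text to Unicode, then to UTF8).
   return uniDecode(uniEncode(pText, "ANSI"), "UTF8")
end nativeToUTF8
```

Since the value is computed at run time, the result would go into a variable rather than a constant.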
On June 30, 2014 9:38:35 AM CDT, Ben Rubinstein <benr_mc at cogapp.com> wrote:
>I think this problem should be solved in LC 7 (possibly using normaliseText);
>but I need a solution that I can ship now (and it's been threatened that LC 7
>will 'fix' a 'bug' which isn't, so I'm not sure if I'll ever be able to use it).
>
>My app processes some data from - and then, re-organised, to - UTF8 text
>files. Occasionally it needs to insert a constant string; and for various
>reasons (all of them excellent) I want to specify these constant strings in
>the script. So far, so good. Now however one of these constant strings needs
>to contain a character which is not in ASCII. Actually two of them. So I
>need to express a UTF8 string in my script. And I'm searching for an elegant
>way to do this.
>
>My constant string used to look something like this:
>
> constant kMyConstantString = "This is my ice cream"
>
>but now it needs to read something like
>
> constant kMyConstantString = "This ice cream is (c) Ben and Jerry's Inc"
>
>(only with a smart apostrophe and a proper copyright symbol).
>
>I thought I could just about manage with this
>
> put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc", "ANSI"), "UTF8") into kMyConstantString
>
>(that is, encode from ANSI to Unicode, then from Unicode into UTF8).
>
>I tested it on Mac and it seemed to work. The UTF8 file was generated and
>this text came out just right.
>
>However, it turned out that when the code was compiled and run on Windows, the
>copyright symbol came out OK, but the apostrophe came out as o-tilde.
>
>This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless;
>instead it interprets the source encoding as whatever is typical for the
>operating system. I wrote the script on Mac; in MacRoman, © is 0xA9 and smart
>apostrophe is 0xD5; in ISO-8859-1 (and Unicode), 0xA9 is ©, but 0xD5 is
>o-tilde.
>
>So... what's the most elegant way to do this (is there one)? Is there any
>alternative to just looking up the UTF8 encodings and writing:
>
> put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc") into kMyConstantString
>
>?
>
>TIA,
>
>Ben
--
Jacqueline Landman Gay | jacque at hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com