Elegant way to express constant UTF8 string in script?

J. Landman Gay jacque at hyperactivesw.com
Mon Jun 30 11:18:15 EDT 2014


This is exactly what I've been dealing with for a week.  You need two steps: first check the platform, and if it's Windows, run macToISO on the string.  After that, your existing conversion to UTF8 should work.
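
A minimal sketch of those two steps, assuming the script was typed on a Mac (so the literal's bytes are MacRoman) and that only Windows needs the extra translation.  The handler name utf8FromMacLiteral is made up for illustration; macToISO, uniEncode and uniDecode are the built-in functions already mentioned in this thread:

```livecode
function utf8FromMacLiteral pText
   -- Step 1: on Windows, translate the MacRoman bytes to the
   -- Windows character set before doing anything else
   if the platform is "Win32" then
      put macToISO(pText) into pText
   end if
   -- Step 2: the existing conversion, unchanged:
   -- native text -> Unicode -> UTF8
   return uniDecode(uniEncode(pText, "ANSI"), "UTF8")
end utf8FromMacLiteral
```

It would be called in place of the original expression, something like: put utf8FromMacLiteral("This ice cream is © Ben and Jerry’s Inc") into kMyConstantString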

On June 30, 2014 9:38:35 AM CDT, Ben Rubinstein <benr_mc at cogapp.com> wrote:
>I think this problem should be solved in LC 7 (possibly using normaliseText);
>but I need a solution that I can ship now (and it's been threatened that LC 7
>will 'fix' a 'bug' which isn't, so I'm not sure if I'll ever be able to use it).
>
>My app processes some data from - and then, re-organised, to - UTF8 text
>files. Occasionally it needs to insert a constant string; and for various
>reasons (all of them excellent) I want to specify these constant strings in
>the script.  So far, so good.  Now however one of these constant strings needs
>to contain a character which is not in ASCII.  Actually two of them.  So I
>need to express a UTF8 string in my script.  And I'm searching for an elegant
>way to do this.
>
>My constant string used to look something like this:
>
>    constant kMyConstantString = "This is my ice cream"
>
>but now it needs to read something like
>
>    constant kMyConstantString = "This ice cream is (c) Ben and Jerry's Inc"
>
>(only with a smart apostrophe and a proper copyright symbol).
>
>I thought I could just about manage with this
>
>    put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc", "ANSI"), "UTF8") into kMyConstantString
>
>(that is, encode from ANSI to Unicode, then from Unicode into UTF8).
>
>I tested it on Mac and it seemed to work.  The UTF8 file was generated and
>this text came out just right.
>
>However, it turned out that when the code was compiled and run on Windows, the
>copyright symbol came out OK, but the apostrophe came out as o-tilde.
>
>This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless;
>instead it interprets the source encoding as whatever is typical for the
>operating system.  I wrote the script on Mac; in MacRoman, © is 0xA9 and
>smart apostrophe is 0xD5; in ISO-8859-1 (and Unicode), 0xA9 is ©, but
>0xD5 is o-tilde.
>
>So... what's the most elegant way to do this (is there one)?  Is there any
>alternative to just looking up the UTF8 encodings and writing:
>
>    put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc") into kMyConstantString
>
>?
>
>TIA,
>
>Ben
>

-- 
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com



