Elegant way to express constant UTF8 string in script?

Mark Schonewille m.schonewille at economy-x-talk.com
Mon Jun 30 14:10:45 EDT 2014


Hi Ben,

My solution will work in pre-7 and is 100% vanilla LiveCode (no idea why you explicitly mention again that it should be script-only). You'll have to change your script when you move to 7. Obviously, you could write a script for both versions using the do command for the 7-specific part of your script.

--
Kind regards,

Mark Schonewille
Economy-x-Talk
Http://economy-x-talk.com

Share the clipboard of your computer over a local network with Clipboard Link http://clipboardlink.economy-x-talk.com


On 30 Jun 2014, at 19:24, Ben Rubinstein <benr_mc at cogapp.com> wrote:

> Hi Mark,
> 
> Thanks for the reply.  The problem is
> 
> a) I want to do this purely in script
> 
> b) A character directly entered into the script on a Mac comes out different on Windows (i.e. the scripts don't know what character set they're in; they're simply stored with no indication of character set, and on every platform they're interpreted in the supposedly 'native' character set for that platform).
> 
> Presumably in 7.0 I won't even need to use normaliseText, because the scripts will themselves be stored in Unicode or UTF8, and therefore I can use any Unicode character in a real script constant.  But not in 6.x.
> 
> Ben
> 
> On 30/06/2014 16:09, Mark Schonewille wrote:
>> Hi Ben,
>> 
>> The apostrophe doesn't work because you convert to ASCII text that looks different on different platforms. If you don't use unidecode and just set the unicodeText of a field to your Unicode string, it should work. If that's not practical, you could use macToIso() to convert your string to Latin-1.
>> 
>> 
>> On 30 Jun 2014, at 16:38, Ben Rubinstein <benr_mc at cogapp.com> wrote:
>> 
>>> I think this problem should be solved in LC 7 (possibly using normaliseText); but I need a solution that I can ship now (and it's been threatened that LC 7 will 'fix' a 'bug' which isn't, so I'm not sure if I'll ever be able to use it).
>>> 
>>> My app processes some data from - and then, re-organised, to - UTF8 text files. Occasionally it needs to insert a constant string; and for various reasons (all of them excellent) I want to specify these constant strings in the script.  So far, so good.  Now however one of these constant strings needs to contain a character which is not in ASCII.  Actually two of them.  So I need to express a UTF8 string in my script.  And I'm searching for an elegant way to do this.
>>> 
>>> My constant string used to look something like this:
>>> 
>>>   constant kMyConstantString = "This is my ice cream"
>>> 
>>> but now it needs to read something like
>>>   constant kMyConstantString = "This ice cream is (c) Ben and Jerry's Inc"
>>> 
>>> (only with a smart apostrophe and a proper copyright symbol).
>>> 
>>> I thought I could just about manage with this
>>> 
>>>  put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc", "ANSI"), "UTF8") into kMyConstantString
>>> 
>>> (that is, encode from ANSI to Unicode, then from Unicode into UTF8).
>>> 
>>> I tested it on Mac and it seemed to work.  The UTF8 file was generated and this text came out just right.
>>> 
>>> 
>>> However, it turned out that when the code was compiled and run on Windows, the copyright symbol came out OK, but the apostrophe came out as o-tilde.
>>> 
>>> This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless; instead it interprets the source encoding as whatever is typical for the operating system.  I wrote the script on Mac; in MacRoman, © is 0xA9 and smart apostrophe is 0xD5; in ISO-8859-1 (and as Unicode code points), 0xA9 is ©, but 0xD5 is Õ (O-tilde).
>>> 
>>> So... what's the most elegant way to do this (is there one)?  Is there any alternative to just looking up the UTF8 encodings and writing:
>>> 
>>>  put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc") into kMyConstantString
>>> 
>>> ?
>>> 
>>> TIA,
>>> 
>>> Ben
>>> 
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> 

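The byte values quoted in Ben's original message can be checked independently of LiveCode. The following is a minimal Python sketch, not LiveCode and not something posted in the thread; the names decode_as, mac_view, win_view and utf8_bytes are made up for illustration, and Latin-1 stands in for the Windows "native" character set, which is an assumption (Windows-1252 gives the same result for these two bytes).

```python
# The two non-ASCII characters as a script saved on a Mac would store them
# (MacRoman): 0xA9 for the copyright sign, 0xD5 for the smart apostrophe.
RAW_SCRIPT_BYTES = b"\xa9 Jerry\xd5s"


def decode_as(raw: bytes, charset: str) -> str:
    """Interpret the same raw bytes in a given 'native' character set."""
    return raw.decode(charset)


# What the Mac sees: copyright sign and right single quotation mark.
mac_view = decode_as(RAW_SCRIPT_BYTES, "mac_roman")

# What a Latin-1 platform sees: 0xA9 is still the copyright sign,
# but 0xD5 is now capital O with tilde.
win_view = decode_as(RAW_SCRIPT_BYTES, "latin-1")

# The UTF-8 bytes for the intended text; these are the values used in the
# format() escapes in the original message (\xC2\xA9 and \xE2\x80\x99).
utf8_bytes = "\u00a9 Jerry\u2019s".encode("utf-8")

print(mac_view)
print(win_view)
print(utf8_bytes.hex(" "))
```

Because the escaped form spells out the UTF-8 bytes directly, it does not depend on which character set the platform assumes for the script text, which is why it behaves the same on both systems.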


