Elegant way to express constant UTF8 string in script?
Mark Schonewille
m.schonewille at economy-x-talk.com
Mon Jun 30 14:10:45 EDT 2014
Hi Ben,
My solution will work in pre-7 and is 100% vanilla LiveCode (no idea why you explicitly mention again that it should be script-only). You'll have to change your script when you move to 7. Obviously, you could write a script for both versions using the do command for the 7-specific part of your script.
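A rough sketch of the "one script for both versions" idea, for illustration only. The handler name toUTF8 and its parameter are made up for this example. It assumes that textEncode is available only in 7 and later, and that "do" can read and set the handler's local variables.

```livecode
-- Sketch only: toUTF8 is a hypothetical helper, not a built-in.
-- It returns the UTF-8 form of a native-encoded string on pre-7 and 7+ engines.
function toUTF8 pNativeText
   local tUTF8
   set the itemDelimiter to "."
   if item 1 of the version >= 7 then
      -- Assumption: textEncode exists only in LiveCode 7+, so the call is
      -- built as a string and run with "do"; a pre-7 engine never resolves it.
      do "put textEncode(pNativeText, " & quote & "UTF-8" & quote & ") into tUTF8"
   else
      -- Pre-7 route used in this thread: native -> UTF-16 -> UTF-8.
      put uniDecode(uniEncode(pNativeText), "UTF8") into tUTF8
   end if
   return tUTF8
end toUTF8
```

This only covers the version switch. In 6.x the string passed in still has to be in the native encoding of the platform the script runs on, which is the problem described in the quoted messages.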
--
Kind regards,
Mark Schonewille
Economy-x-Talk
Http://economy-x-talk.com
Share the clipboard of your computer over a local network with Clipboard Link http://clipboardlink.economy-x-talk.com
On 30 Jun 2014, at 19:24, Ben Rubinstein <benr_mc at cogapp.com> wrote:
> Hi Mark,
>
> Thanks for the reply. The problem is
>
> a) I want to do this purely in script
>
> b) A character directly entered into the script on a Mac comes out different on Windows (i.e. the scripts don't know what character set they're in; they're simply stored with no indication of character set, and on every platform they're interpreted in the supposedly 'native' character set for that platform).
>
> Presumably in 7.0 I won't even need to use normaliseText, because the scripts will themselves be stored in Unicode or UTF8, and therefore I can use any Unicode character in a real script constant. But not in 6.x.
>
> Ben
>
> On 30/06/2014 16:09, Mark Schonewille wrote:
>> Hi Ben,
>>
>> The apostrophe doesn't work because you convert to ASCII text that looks different on different platforms. If you don't use unidecode and just set the unicodeText of a field to your Unicode string, it should work. If that's not practical, you could use macToIso() to convert your string to Latin-1.
>>
>> --
>> Kind regards,
>>
>> Mark Schonewille
>> Economy-x-Talk
>> Http://economy-x-talk.com
>>
>> Share the clipboard of your computer over a local network with Clipboard Link http://clipboardlink.economy-x-talk.com
>>
>>
>> On 30 Jun 2014, at 16:38, Ben Rubinstein <benr_mc at cogapp.com> wrote:
>>
>>> I think this problem should be solved in LC 7 (possibly using normaliseText); but I need a solution that I can ship now (and it's been threatened that LC 7 will 'fix' a 'bug' which isn't, so I'm not sure if I'll ever be able to use it).
>>>
>>> My app processes some data from - and then, re-organised, to - UTF8 text files. Occasionally it needs to insert a constant string; and for various reasons (all of them excellent) I want to specify these constant strings in the script. So far, so good. Now however one of these constant strings needs to contain a character which is not in ASCII. Actually two of them. So I need to express a UTF8 string in my script. And I'm searching for an elegant way to do this.
>>>
>>> My constant string used to look something like this:
>>>
>>> constant kMyConstantString = "This is my ice cream"
>>>
>>> but now it needs to read something like
>>> constant kMyConstantString = "This ice cream is (c) Ben and Jerry's Inc"
>>>
>>> (only with a smart apostrophe and a proper copyright symbol).
>>>
>>> I thought I could just about manage with this
>>>
>>> put uniDecode(uniEncode("This ice cream is © Ben and Jerry’s Inc", "ANSI"), "UTF8") into kMyConstantString
>>>
>>> (that is, encode from ANSI to Unicode, then from Unicode into UTF8).
>>>
>>> I tested it on Mac and it seemed to work. The UTF8 file was generated and this text came out just right.
>>>
>>>
>>> However, it turned out that when the code was compiled and run on Windows, the copyright symbol came out OK, but the apostrophe came out as o-tilde.
>>>
>>> This is because uniEncode(..., "ANSI") is a lie; "ANSI" is meaningless; instead it interprets the source encoding as whatever is typical for the operating system. I wrote the script on Mac; in MacRoman, © is 0xA9 and smart apostrophe is 0xD5; in ISO-8859-1 (and as Unicode code points), 0xA9 is ©, but 0xD5 is o-tilde.
>>>
>>> So... what's the most elegant way to do this (is there one)? Is there any alternative to just looking up the UTF8 encodings and writing:
>>>
>>> put format("This ice cream is \xC2\xA9 Ben and Jerry\xE2\x80\x99s Inc") into kMyConstantString
>>>
>>> ?
>>>
>>> TIA,
>>>
>>> Ben
>>>
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>
>
>