unidecode broken on Intel Macs?
Mark Schonewille
m.schonewille at economy-x-talk.com
Wed Aug 1 10:36:23 EDT 2007
Hi Klaus,
You are right, but only with regard to languages that can be
expressed in single-byte characters. If you are working with
double-byte languages, you can't simply unidecode into single-byte
characters. Apparently, Rev 1.0 couldn't handle Chinese and Arabic
(but I never tried that with version 1.0).
I am not sure that the addition of the capability to handle double-
byte characters automatically implies that unidecode no longer cuts
off the second byte of each pair regardless of platform. I did have a
Mac file recently, which I had to convert to little endian before
unidecoding it on Windows.
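To make the byte order concrete, here is a small sketch in Python
rather than Revolution (my choice, purely so it can be run anywhere).
The helper name swap_byte_pairs is hypothetical and mirrors the
swapBytes function quoted below; it assumes plain UTF-16 data without
a byte order mark, and it says nothing about what uniDecode itself
does internally.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit unit (hypothetical helper).

    A trailing odd byte, if any, is dropped, as in the quoted swapBytes.
    """
    swapped = bytearray()
    for i in range(0, len(data) - 1, 2):
        swapped += bytes((data[i + 1], data[i]))
    return bytes(swapped)


# "A" (code point 65) in both byte orders
big_endian = "A".encode("utf-16-be")     # most significant byte first
little_endian = "A".encode("utf-16-le")  # least significant byte first

# Swapping each pair turns big-endian data into little-endian data,
# which is the conversion I had to do with the Mac file.
converted = swap_byte_pairs("Hi Klaus".encode("utf-16-be"))
text = converted.decode("utf-16-le")
```

Inspecting list(big_endian) and list(little_endian) shows which byte
of the pair carries the 65 in each format.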
Best regards,
Mark Schonewille
--
Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http://www.salery.com
Quickly extract data from your HyperCard stacks with DIFfersifier.
http://differsifier.economy-x-talk.com
On 1 Aug 2007, at 16:20, Klaus Major wrote:
> Hi Mark,
>
>
>> A big-endian (Motorola) Unicode character has the form msb lsb,
>> so if the character falls within the ASCII range, say "A", then
>> it will be <numToChar(0) numToChar(65)>.
>>
>> In little-endian (Intel) format, the same char will be
>> <numToChar(65) numToChar(0)>.
>>
>> uniDecode simply removes the most significant byte of each Unicode
>> char/pair, so on Intel that's the second byte, and on Motorola
>> that's the first byte.
>
> Yep, that's what I read in the docs.
>
> But the docs also read:
> "The ability to handle double-byte characters on "little-endian"
> processors was added in version 2.0. In previous versions, the
> uniDecode function always removed the second byte of each pair of
> bytes, regardless of platform."
>
> This gives me the impression that the function itself will take
> care of the differences between the processors -> "...regardless of
> platform"!
> Maybe I am wrong?
>
>> So the upshot is that if your data is big-endian (motorola), then
>> to work with unidecode on intel, you'll need to swap each pair of
>> bytes.
>>
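>> function swapBytes pString
>>    -- swap the two bytes of each 16-bit pair;
>>    -- a trailing odd byte is dropped
>>    put empty into swappedString
>>    repeat with n = 1 to length(pString) - 1 step 2
>>       put char (n + 1) of pString & char n of pString after swappedString
>>    end repeat
>>    return swappedString
>> end swapBytes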
>
> Thanks a lot, will try this (well maybe... ;-)
>
>> I'm hoping that we'll get a complete revamp of Rev's Unicode
>> handling one of these days, but we're stuck with this sort of
>> thing for now. :(