unidecode broken on Intel Macs?

Mark Schonewille m.schonewille at economy-x-talk.com
Wed Aug 1 10:36:23 EDT 2007


Hi Klaus,

You are right, but only with regard to languages that can be
expressed in single-byte characters. If you are working with
double-byte languages, you can't simply unidecode into single-byte
characters. Apparently, Rev 1.0 couldn't handle Chinese and Arabic
(but I never tried that with version 1.0).
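
To make that concrete at the byte level, here is a minimal sketch in Python rather than Rev script. The helper name drop_high_bytes is made up for this example; it only mimics the "remove one byte of each pair" behaviour described in the docs and is not how Rev implements uniDecode.

```python
def drop_high_bytes(data: bytes) -> bytes:
    """Keep only the first (low) byte of each little-endian 16-bit pair."""
    return data[0::2]

# "A" is U+0041: the high byte is zero, so nothing is lost.
ascii_text = "A".encode("utf-16-le")          # b'A\x00'
# U+4E2D is a Chinese character: the high byte 0x4E carries information.
chinese_text = "\u4e2d".encode("utf-16-le")   # b'-N'

print(drop_high_bytes(ascii_text))    # b'A'
print(drop_high_bytes(chinese_text))  # b'-'  (the original character is gone)
```

The second result shows why a double-byte language cannot simply be reduced to single bytes: the discarded byte is not padding.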

I am not sure that adding the capability to handle double-byte
characters automatically implies that unidecode no longer cuts
off the second byte of each pair regardless of platform. I did have a
Mac file recently, which I had to convert to little-endian before
unidecoding it on Windows.
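
Roughly what that conversion amounts to, sketched in Python (not Rev script). The function name swap_byte_pairs and the sample string are only for illustration, and I am assuming the file is plain 16-bit text without a byte order mark.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit pair; a trailing odd byte is dropped."""
    swapped = bytearray()
    for i in range(0, len(data) - 1, 2):
        swapped += bytes((data[i + 1], data[i]))
    return bytes(swapped)

mac_data = "Hi".encode("utf-16-be")    # big-endian bytes: b'\x00H\x00i'
win_data = swap_byte_pairs(mac_data)   # little-endian bytes: b'H\x00i\x00'
print(win_data.decode("utf-16-le"))    # Hi
```

This is the same pair-wise swap as the swapBytes function quoted below, just written out so the byte order before and after is visible.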

Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http://www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier.  
http://differsifier.economy-x-talk.com


On 1 Aug 2007, at 16:20, Klaus Major wrote:

> Hi Mark,
>
>
>> A big-endian (Motorola) Unicode character will be in the form
>> msb lsb, so if the character falls within the ASCII range, say
>> "A", then it will be <numToChar(0) numToChar(65)>.
>>
>> If it's in little-endian (Intel) format, the same char will be
>> <numToChar(65) numToChar(0)>.
>>
>> Unidecode simply removes the most significant byte of each Unicode
>> char/pair, so on Intel, that's the second byte, and on Motorola
>> that's the first byte.
>
> Yep, that's what I read in the docs.
>
> But the docs also read:
> "The ability to handle double-byte characters on "little-endian"  
> processors was added in version 2.0. In previous versions, the  
> uniDecode function always removed the second byte of each pair of  
> bytes, regardless of platform."
>
> This gives me the impression that the function itself will take  
> care of the differences between the processors -> "...regardless of  
> platform"!
> Maybe I am wrong?
>
>> So the upshot is that if your data is big-endian (Motorola), then
>> to work with unidecode on Intel, you'll need to swap each pair of
>> bytes.
>>
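>> function swapBytes pString
>>   local swappedString -- declared so that empty input returns empty
>>   repeat with n = 1 to length(pString) - 1 step 2
>>     -- emit each pair of bytes in reverse order
>>     put char n+1 of pString & char n of pString after swappedString
>>   end repeat
>>   return swappedString
>> end swapBytes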
>
> Thanks a lot, will try this (well maybe... ;-)
>
>> I'm hoping that we'll get a complete revamp of Rev's Unicode
>> handling one of these days, but we're stuck with this sort of
>> thing for now. :(




More information about the use-livecode mailing list