unidecode broken on Intel Macs?

Klaus Major klaus at major-k.de
Wed Aug 1 10:20:25 EDT 2007


Hi Mark,


> A big-endian (Motorola) Unicode character will be in the form: msb
> lsb, so if the character falls within the ASCII range, say "A",
> then it will be <numToChar(0) numToChar(65)>.
>
> If it's in little-endian (Intel) format, the same char will be
> <numToChar(65) numToChar(0)>.
>
> uniDecode simply removes the most significant byte of each Unicode
> char/pair, so on Intel, that's the second byte, and on Motorola
> that's the first byte.

Yep, that's what I read in the docs.

But the docs also read:
"The ability to handle double-byte characters on "little-endian"  
processors was added in version 2.0. In previous versions, the  
uniDecode function always removed the second byte of each pair of  
bytes, regardless of platform."

This gives me the impression that the function itself will take care  
of the differences between the processors -> "...regardless of  
platform"!
Maybe I am wrong?

> So the upshot is that if your data is big-endian (motorola), then  
> to work with unidecode on intel, you'll need to swap each pair of  
> bytes.
>
> function swapBytes pString
>   -- swap the two bytes of each pair (converts between byte orders)
>   local swappedString
>   repeat with n = 1 to length(pString) - 1 step 2
>     put char (n + 1) of pString & char n of pString after swappedString
>   end repeat
>   return swappedString
> end swapBytes

Thanks a lot, will try this (well maybe... ;-)
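
Before using it on real data, a quick sanity check might look like the
following. This is only a minimal sketch: the handler name testSwapBytes
and the variables tPair and tSwapped are made up for illustration, and it
assumes that a char is a single byte, as in the function quoted above.

```livecode
function swapBytes pString
  -- swap the two bytes of each pair (same logic as the quoted function)
  local swappedString
  repeat with n = 1 to length(pString) - 1 step 2
    put char (n + 1) of pString & char n of pString after swappedString
  end repeat
  return swappedString
end swapBytes

on testSwapBytes
  local tPair, tSwapped
  -- a two-byte "A" with the zero byte first
  put numToChar(0) & numToChar(65) into tPair
  put swapBytes(tPair) into tSwapped
  -- show both byte values in the message box; after the swap
  -- the 65 should come first and the 0 second
  put charToNum(char 1 of tSwapped) && charToNum(char 2 of tSwapped)
end testSwapBytes
```

Note that an odd trailing byte is silently dropped by the loop, so the
input should always have an even length.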

> I'm hoping that we'll get a complete revamp of Rev's Unicode
> handling one of these days, but we're stuck with this sort of
> thing for now. :(
>
> Best,
>
> Mark

Regards from Germany

Klaus Major
klaus at major-k.de
http://www.major-k.de




