best/fastest way to tell if a field contains unicode text?

Fraser Gordon fraser.gordon at runrev.com
Thu Mar 20 13:50:28 EDT 2014


On 20 Mar 2014, at 17:28, Mark Wieder <mwieder at ahsoftware.net> wrote:
> 
> In that case, shouldn't the setUnicode value be "true" rather than false?
> And does it make sense to have that property settable any more?

All the previous Unicode functionality has been left as-is in order to avoid breaking existing stacks. However, it can be completely ignored in future. I don't think that there are any existing LiveCode commands/functions/properties/etc containing the word "unicode" that are actually useful in the 7.0 engine (but they all need to remain for backwards compatibility).

One annoyance is that the unicodeText of a field is not, in fact, Unicode text in the 7.0 engine - it is binary data. Similarly, the uniEncode and uniDecode functions convert between two different forms of binary data rather than between binary data and text. As uniEncode and uniDecode do completely the wrong thing as far as 7.0 is concerned, they are deprecated and should be replaced with textDecode (binary encoding -> text) and textEncode (text -> binary encoding). Again, they remain only for backwards compatibility.
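As a rough illustration of the two directions, here is a small Python sketch (not LiveCode). Python's str.encode and bytes.decode are used purely as stand-ins for textEncode and textDecode, and the sample string is made up for the example:

```python
# Python stand-in for the text/binary split described above.
# str.encode plays the role of textEncode (text -> binary),
# bytes.decode plays the role of textDecode (binary -> text).

text = "na\u00efve caf\u00e9"           # hypothetical sample text (10 characters)

utf8_bytes = text.encode("utf-8")       # text -> binary, UTF-8
utf16_bytes = text.encode("utf-16-le")  # text -> binary, UTF-16 little-endian

# The same text has different binary forms with different lengths.
print(len(text), len(utf8_bytes), len(utf16_bytes))

# Decoding with the matching encoding recovers the original text.
round_trip = utf16_bytes.decode("utf-16-le")
print(round_trip == text)
```

The point is only that text and its encoded bytes are different kinds of value, and that you need to name the encoding to move between them.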

> put unidecode("hello bucko")
> 
> converts the text to 敨汬Ɐ戠捵潫.

What you asked the engine to do there is convert UTF-16 to binary data (as uniDecode expects binary), which it does, giving you "hello bucko" in an 8-bit encoding. uniDecode then takes that and drops the high byte of each UTF-16 code unit that it expects the binary data to contain. But the data isn't UTF-16, so bad things happen.
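To see why 8-bit data turns into CJK-looking characters when something treats it as UTF-16, here is a Python sketch (again, not LiveCode, and not a description of the engine's actual internals). It assumes the original string was "hello, bucko" with a comma, since the six garbled characters quoted above imply twelve bytes, and it uses Latin-1 as a stand-in for the native 8-bit encoding:

```python
# Assumption: the original string was "hello, bucko" (12 characters), which is
# what six garbled characters imply. Latin-1 stands in for the native encoding.
native_bytes = "hello, bucko".encode("latin-1")  # 8-bit data, one byte per character

# Misreading those bytes as little-endian UTF-16 pairs them into 16-bit code units.
garbled = native_bytes.decode("utf-16-le")

# Show the code point each byte pair turned into, e.g. "he" -> 0x68 0x65 -> U+6568.
for ch in garbled:
    print(f"U+{ord(ch):04X}")

# Reading the bytes with the encoding they were written in gives the text back.
print(native_bytes.decode("latin-1"))
```

Each pair of ASCII bytes lands in the CJK or extended Latin ranges once it is read as a single 16-bit unit, which matches the kind of output quoted above.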

In 7.0 you should instead say textEncode("hello bucko", "native") and you'll get some nice, 8-bit binary data. Or, if you just pass it to something that expects binary data (like a file), it'll get converted to 8-bit automatically.

Regards,
Fraser




