Ideal Unicode?

Jeff Massung massung at gmail.com
Mon Aug 15 20:26:08 EDT 2011


On Mon, Aug 15, 2011 at 3:32 PM, Kee Nethery <kee at kagi.com> wrote:

> In my perfect programming world ...
>
> I'd want all characters all the time for any place characters are displayed
> to be displayed and entered as unicode characters and represented as UTF8
> bytes.
>
> If the display version has "割劥" I'd want the language to recognize those as
> two characters and as 6 bytes.
>
> I want UTF8 instead of UTF16 because UTF8 is the same byte stream
> regardless of processor endian-ness and more importantly, the entire web
> uses UTF8.
>
> Is this crazy talk or would this be your ideal programming system for
> unicode?
>
>
You had me up until UTF8 for everything. While I understand the sentiment,
this has the potential to absolutely *suck* the performance out of LC apps.
UTF8 is great because plain ASCII text is already valid UTF8. Other than
that, it's an absolute PITA to work with because you can't just index into
the data. You can't say "give me the 104th character" of a string without
traversing the 103 characters preceding it, because each of them may be
anywhere from 1 to 4 bytes long.
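
To make the traversal cost concrete, here is a minimal Python sketch of
"give me the n-th character" over raw UTF8 bytes. The function names are
made up for illustration and have nothing to do with the LiveCode engine;
the sketch also assumes the input is well-formed UTF8.

```python
def utf8_char_len(lead: int) -> int:
    """Return the length in bytes of a UTF-8 sequence, given its lead byte."""
    if lead < 0x80:            # 0xxxxxxx: plain ASCII
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_char_at(data: bytes, n: int) -> str:
    """Return the n-th character (1-based) of well-formed UTF-8 data.

    Every character before the n-th one has to be stepped over,
    so the cost grows linearly with n.
    """
    if n < 1:
        raise IndexError("character index is 1-based")
    pos = 0
    for _ in range(n - 1):
        if pos >= len(data):
            raise IndexError("character index out of range")
        pos += utf8_char_len(data[pos])
    if pos >= len(data):
        raise IndexError("character index out of range")
    width = utf8_char_len(data[pos])
    return data[pos:pos + width].decode("utf-8")
```

For example, with data = "a割劥z".encode("utf-8") (8 bytes), asking for
utf8_char_at(data, 3) has to step over "a" (1 byte) and "割" (3 bytes)
before it can read "劥".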

Now, think about all the LC scripts out there that do things like "replace
the second character of the fourth word of the fifth line of myString with
...." There are a lot of them. Similarly, getting the length of a string in
characters would require walking the entire string.
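
As a rough illustration of the length problem, a character count over
UTF8 bytes can be sketched in Python by counting every byte that is not a
continuation byte (10xxxxxx). The name utf8_length is hypothetical, and
the sketch assumes well-formed input.

```python
def utf8_length(data: bytes) -> int:
    """Count the characters in well-formed UTF-8 data.

    Continuation bytes look like 10xxxxxx, so every other byte starts
    a new character. The whole buffer is scanned, which is O(n).
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)
```

With the "割劥" example from the quoted message, the encoded data is 6
bytes long and utf8_length reports 2 characters, but only after looking at
all 6 bytes.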

Now, there are ways around this performance hit, but they all require using
more memory. And if you are already using more memory, why not just use wide
characters for everything anyway? Your end-user will never know the
difference, and you can write data out using UTF8 if it's more convenient,
and read it in that way. But, internally, I'd prefer using a fixed 16 or
32 bits for each character and knowing that when I ask for character 103843
of a very large buffer, I get it back in O(1) time.
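
The fixed-width approach can be sketched in Python as well. This sketch
assumes 32 bits per character (UTF-32, little-endian), since 16-bit units
would still need surrogate pairs for characters outside the Basic
Multilingual Plane. The function names are hypothetical and this is not how
any particular engine stores its strings.

```python
import struct


def utf32_char_at(buf: bytes, n: int) -> str:
    """Return the n-th character (1-based) of a UTF-32-LE buffer.

    Every character is exactly 4 bytes wide, so the byte offset is
    computed directly and nothing before it has to be scanned.
    """
    offset = (n - 1) * 4
    if n < 1 or offset + 4 > len(buf):
        raise IndexError("character index out of range")
    (code_point,) = struct.unpack_from("<I", buf, offset)
    return chr(code_point)


def load_utf8(data: bytes) -> bytes:
    """Read UTF-8 from disk or the network into the wide internal form."""
    return data.decode("utf-8").encode("utf-32-le")


def save_utf8(buf: bytes) -> bytes:
    """Write the wide internal form back out as UTF-8."""
    return buf.decode("utf-32-le").encode("utf-8")
```

The trade-off shows up in the sizes: "a割劥z" takes 8 bytes as UTF8 and 16
bytes in this wide form, in exchange for constant-time indexing.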

But that's just me. ;-)

Of course, this all assumes Rev is moving towards 5.0 and the new field,
which is 100% unicode. But over the past month or so several questions have
been asked about progress on this front with zero feedback - not even an
acknowledgement that the questions were asked.

/shrug

Jeff M.


