What is LC's internal text format?

Monte Goulding monte at appisle.net
Mon Nov 12 18:50:00 EST 2018


Text strings in LiveCode are native encoded (MacRoman or ISO 8859) where possible, and wherever you don’t explicitly tell the engine the text is Unicode (via textDecode), so that they can follow the faster single-byte code paths. If you use textDecode, the engine first checks whether the text can be native encoded and uses native encoding if so; otherwise it uses UTF-16.

For what it’s worth, using `offset` is the wrong thing to do if you have textEncoded your strings into binary data. You want to use `byteOffset`; otherwise the engine will convert your data to a string and assume native encoding. This is probably why you are getting some case insensitivity.
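
As a rough sketch of what I mean (the handler and variable names are just for illustration, and I'm assuming textEncode does not prepend a byte order mark for "UTF-32"), searching the encoded data with `byteOffset` might look something like this:

```livecode
on mouseUp
   local tText, tNeedle, tTextBytes, tNeedleBytes, tByteIndex, tCharIndex
   put "The quick brown fox" into tText
   put "quick" into tNeedle

   -- Encode both sides the same way so the byte patterns are comparable.
   put textEncode(tText, "UTF-32") into tTextBytes
   put textEncode(tNeedle, "UTF-32") into tNeedleBytes

   -- byteOffset compares raw bytes and returns a 1-based byte position,
   -- or 0 if the pattern is not found. No string conversion is involved.
   put byteOffset(tNeedleBytes, tTextBytes) into tByteIndex

   -- UTF-32 uses 4 bytes per codepoint, so only a match that starts on a
   -- 4-byte boundary corresponds to a real codepoint position.
   put 0 into tCharIndex
   if tByteIndex > 0 and (tByteIndex - 1) mod 4 = 0 then
      put (tByteIndex - 1) div 4 + 1 into tCharIndex
   end if

   put tCharIndex
end mouseUp
```

Because the comparison is done on bytes, it is case sensitive. A match that does not start on a 4-byte boundary would need to be skipped and the search resumed from there, which this sketch does not do.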

I haven’t been following the offset discussion. I’ll have to take a look to see if there were some speed comparisons between offset and codepointOffset.

Cheers

Monte

> On 13 Nov 2018, at 9:35 am, Ben Rubinstein via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> This is something that I've been wondering about for a while.
> 
> My unexamined assumption had been that in the 'new' fully unicode LC, text was held in UTF-8. However when I saved some text strings in binary I got something like UTF-8 - but not quite. And the recent experiments with offset suggested that LC at the least is able to distinguish between a string which is fully represented as single-byte (or perhaps ASCII?) and one which is not. And the reports of the ingenious investigators using UTF-32 to speed up offsets, and discovering that offset somehow managed to be case-insensitive in this case, made me wonder whether after using textEncode(xt, "UTF-32") LC marks the string in some way to give a clue about how to interpret it as text?
> 
> So could someone who is familiar with this bit of the engine enlighten us? In particular:
> - What is the internal format?
> - Is it different on different platforms?
> - Given that it appears to include a flag to indicate whether it is single-byte text or not, are there any other attributes?
> - Does saving a string in 'binary' file faithfully report the internal format?
> 
> TIA,
> 
> Ben
> 