What is LC's internal text format?

Ben Rubinstein benr_mc at cogapp.com
Tue Nov 13 06:43:08 EST 2018

I'm grateful for all the information, but _outraged_ that the thread that I 
carefully created separate from the offset thread was so quickly hijacked for 
the continuing (useful!) detailed discussion on that topic.

From recent contributions on both threads I'm getting some more insights, but 
I'd really like to understand clearly what's going on. I do think that I 
should have asked this question more broadly: how does the engine represent 
values internally?

I believe from what I've read that the engine can distinguish the following 
kinds of value:
	- empty
	- array
	- number
	- string
	- binary string

From Monte I get that the internal encoding for 'string' may be MacRoman, ISO 
8859 (I thought it would be CP1252), or UTF16 - presumably with some attribute 
to tell the engine which one in each case.

So then my question is whether a 'binary string' is a pure blob, with no clues 
as to interpretation; or whether in fact it does have some attributes to 
suggest that it might be interpreted as UTF8, UTF32 etc?

If there are no such attributes, how does codepointOffset operate when passed 
a binary string?

If there are such attributes, how do they get set? Evidently if textEncode is 
used, the engine knows that the resulting value is the requested encoding. But 
what happens if the program reads a file as 'binary'? Presumably the result 
is a binary string; how does the engine treat it?
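
To make the scenario concrete, here is an untested sketch of the round trip I 
have in mind. The handler, the variable names and the temporary file name are 
made up for illustration, and I'm assuming LC 7 or later for numToCodepoint, 
textEncode and codepointOffset. It deliberately makes no claim about what the 
engine reports; that is exactly what I'm asking.

```livecode
on mouseUp
   local tText, tEncoded, tPath, tFromFile

   -- a 'string' containing one non-ASCII character (U+00E9)
   put "caf" & numToCodepoint(233) into tText

   -- explicitly encoded: does the engine remember this is UTF-8, or is it just bytes?
   put textEncode(tText, "UTF-8") into tEncoded

   -- write the value to a binary file and read it straight back
   -- (hypothetical file name in the temporary folder)
   put specialFolderPath("temporary") & "/lc-encoding-test.bin" into tPath
   put tEncoded into URL ("binfile:" & tPath)
   put URL ("binfile:" & tPath) into tFromFile

   -- compare the counts, then ask for a codepoint offset in the binary value
   put the number of chars in tText & comma & \
         the number of bytes in tEncoded & comma & \
         the number of bytes in tFromFile & comma & \
         codepointOffset(numToCodepoint(233), tFromFile)
end mouseUp
```

If the two byte counts agree, that would at least suggest that the file holds 
the same bytes as the value in memory.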

Is there any way at LiveCode script level to detect what a value is, in the 
above terms?

And one more question: if a string, or binary string, is saved in a 'binary' 
file, are the bytes stored on disk a faithful rendition of the bytes that 
composed the value in memory, or an interpretation of some kind?


