What is LC's internal text format?
Ben Rubinstein
benr_mc at cogapp.com
Tue Nov 13 06:43:08 EST 2018
I'm grateful for all the information, but _outraged_ that the thread that I
carefully created separate from the offset thread was so quickly hijacked for
the continuing (useful!) detailed discussion on that topic.
From recent contributions on both threads I'm getting some more insights, but
I'd really like to understand clearly what's going on. I do think that I
should have asked this question more broadly: how does the engine represent
values internally?
I believe from what I've read that the engine can distinguish the following
kinds of value:
- empty
- array
- number
- string
- binary string
From Monte I get that the internal encoding for 'string' may be MacRoman, ISO
8859 (I thought it would be CP1252), or UTF16 - presumably with some attribute
to tell the engine which one in each case.
So then my question is whether a 'binary string' is a pure blob, with no clues
as to interpretation; or whether in fact it does have some attributes to
suggest that it might be interpreted as UTF8, UTF32 etc?
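
To make concrete why this matters, here is a minimal sketch in Python (chosen only because it is easy to run; it makes no claim about how the LiveCode engine stores anything). The sample word and the codec names are my own assumptions for illustration. It shows the same text producing different bytes under each encoding, and the same bytes reading back as different text when the wrong encoding is assumed:

```python
# Illustration only: Python codecs stand in for the encodings discussed here.
# This does not describe LiveCode's internal representation.
text = "caf\u00e9"  # "cafe" with an acute accent on the final e

# Encode the same text four ways and keep the resulting raw bytes.
encoded = {
    "mac_roman": text.encode("mac_roman"),
    "cp1252": text.encode("cp1252"),
    "utf-8": text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
}

for name, blob in encoded.items():
    print(name, blob.hex())

# A blob carries no label: reading UTF-8 bytes as CP1252 succeeds,
# but yields two wrong characters in place of the accented one.
misread = encoded["utf-8"].decode("cp1252")
print(misread)
```

Without some attribute attached to the bytes, nothing in the data itself says which of these readings is the intended one.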
If there are no such attributes, how does codepointOffset operate when passed
a binary string?
If there are such attributes, how do they get set? Evidently if textEncode is
used, the engine knows that the resulting value is the requested encoding. But
what happens if the program reads a file as 'binary' - presumably the result
is a binary string, how does the engine treat it?
Is there any way at LiveCode script level to detect what a value is, in the
above terms?
And one more question: if a string, or binary string, is saved in a 'binary'
file, are the bytes stored on disk a faithful rendition of the bytes that
composed the value in memory, or an interpretation of some kind?
TIA,
Ben