Character Encodings and LiveCode fields

Richmond richmondmathewson at gmail.com
Sun Jan 26 11:03:33 EST 2014


On 26/01/14 17:09, Graham Samuel wrote:
> The recent discussions under 'Character Encodings' and other related subjects brought me back to some questions:
>
> 1. Is there an actual property of an LC field that can be examined programmatically to show whether the field works as a Unicode string or not? (I know there is 'unicodeText', which is a property of a text string, but not actually of a field AFAIKS).
>
> 2. How does the mechanism (which I apparently unearthed using Richmond's little Unicode-querying utility) work whereby a non-Unicode character string appended to a Unicode string in an LC field itself becomes Unicode? Is everything made into two-byte characters? I assume this is the case, but I want to be sure. Experiments with Richmond's utility are confusing - the whole string appears to be Unicode if one explicitly Unicode character is present; but if that character is deleted while the others remain, it seems that the string stops being Unicode - this is scarcely credible, but it's what I seem to be seeing.

Nothing confusing about it: my stack does NOT detect whether some text within
a field is Unicode (play around with it).

What it DOES do is this:

1. It takes the text in the field "ORIGIN" and copies it into the field "OOT"
using this command:

put fld "ORIGIN" into fld "OOT"

Now, if the text is ASCII, the two fields will have identical contents;
but if anything (even a single char) in the field "ORIGIN" is NOT ASCII,
it will get mucked about with and appear differently in the field "OOT".

2. It compares the two fields like this:

if fld "ORIGIN" = fld "OOT" then
     put "This is NOT Unicode"
else
     put "This IS Unicode"
end if

So, quite obviously, if fld "ORIGIN" contains plain, vanilla ASCII
and one then appends a non-ASCII char,
the result is "This IS Unicode" in the Message box. Delete that char
again and the two fields match once more, so the result goes back to
"This is NOT Unicode".
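
For anyone who wants to see the same copy-and-compare idea outside LiveCode, here is a minimal sketch in Python (not LiveCode script). It rests on an assumption: that the field-to-field copy behaves like a lossy conversion in which any char that is not ASCII is replaced by "?". The function names survives_plain_copy and describe are made up for the illustration and are not part of my stack.

```python
def survives_plain_copy(text: str) -> bool:
    """Return True if text is unchanged after a lossy copy through ASCII."""
    # Assumption: the copy acts like an ASCII conversion where every
    # char that cannot be represented is replaced by "?".
    copied = text.encode("ascii", errors="replace").decode("ascii")
    return copied == text


def describe(text: str) -> str:
    """Mimic the message the stack puts into the Message box."""
    if survives_plain_copy(text):
        return "This is NOT Unicode"
    return "This IS Unicode"


plain = "plain vanilla"
mixed = plain + "\u221a"        # append a square root sign

print(describe(plain))          # This is NOT Unicode
print(describe(mixed))          # This IS Unicode
print(describe(mixed[:-1]))     # This is NOT Unicode (the char was deleted)
```

The verdict depends only on what is in the text at the moment of the check, which is why it flips back as soon as the one non-ASCII char is removed.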

>
> 3. If I paste one of the Mac-only 'special' non-straight-ascii characters (Mac-Roman - like the square root character) into a Unicode string, will it end up as the Unicode version of the same symbol? I think not, so some kind of pre-filtering would be needed using platform knowledge before allowing the characters in the field to be parsed as Unicode.
>
> My objective as before is to allow a user to type or paste text into a field from any reasonable source, such as a text processor (any platform), a web page or maybe a Tex document, and for non-ascii characters like pi, square root etc to be included, with the whole field always ending up as Unicode (you can see I'm interested in mathematical stuff, but this could also work for other languages and special characters). The recent discussions make me doubt the feasibility of this. Does anyone know exactly what the mother ship is planning in respect of 'promoting' non-Unicode character strings to Unicode?
>
> TIA
>
> Graham

Richmond.



