Double-Value Unicode?
Richmond
richmondmathewson at gmail.com
Sun May 18 06:20:32 EDT 2014
"Surrogates Area: U+D800–U+DFFF
When using UTF-16 to represent supplementary characters, pairs of 16-bit code units are used for each character. These units are called surrogates. To distinguish them from ordinary characters, they are allocated in a separate area. The Surrogates Area consists of 1,024 low-half surrogate code points and 1,024 high-half surrogate code points. For the formal definition of a surrogate pair and the role of surrogate pairs in the Unicode Conformance Clause, see Section 3.8, Surrogates, and Section 5.4, Handling Surrogate Pairs in UTF-16.
The use of surrogate pairs in the Unicode Standard is formally equivalent to the Universal Transformation Format-16 (UTF-16) defined in ISO 10646."
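To see what "pairs of 16-bit code units" looks like in practice, here is a minimal Python sketch (Python chosen only for illustration). It assumes nothing beyond Python's built-in "utf-16-be" codec; `utf16_code_units` is a hypothetical helper name, not part of any library.

```python
# Sketch: list the 16-bit UTF-16 code units Python produces for a string.
# "utf-16-be" = big-endian UTF-16 with no byte-order mark.

def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of `text` as integers."""
    data = text.encode("utf-16-be")
    # Every code unit is exactly two bytes, most significant byte first.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

if __name__ == "__main__":
    # A BMP letter, a BMP accented letter, and a supplementary character (U+1F600).
    for ch in ["A", "\u00e9", "\U0001F600"]:
        units = utf16_code_units(ch)
        print(f"U+{ord(ch):04X} -> {[f'{u:04X}' for u in units]}")
```

The first two characters come out as a single code unit each; U+1F600 comes out as two (D83D DE00), one from each half of the Surrogates Area.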
"High-Surrogate. The high-surrogate code points are assigned to the range U+D800..U+DBFF. The high-surrogate code point is always the first element of a surrogate pair.
Low-Surrogate. The low-surrogate code points are assigned to the range U+DC00..U+DFFF. The low-surrogate code point is always the second element of a surrogate pair.
Private-Use High-Surrogates. The high-surrogate code points from U+DB80..U+DBFF are private-use high-surrogate code points (a total of 128 code points)."
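The three ranges quoted above can be written down directly. This is a small Python sketch, assuming the input is a single 16-bit code unit; `classify` and the range constants are hypothetical names chosen for illustration. It also checks the quoted counts (1,024 high, 1,024 low, 128 private-use high).

```python
# Sketch: classify one 16-bit UTF-16 code unit using the quoted ranges.
HIGH_SURROGATES = range(0xD800, 0xDC00)   # U+D800..U+DBFF
LOW_SURROGATES = range(0xDC00, 0xE000)    # U+DC00..U+DFFF
PRIVATE_USE_HIGH = range(0xDB80, 0xDC00)  # U+DB80..U+DBFF, a subset of the high range

def classify(unit: int) -> str:
    """Say which part of the Surrogates Area (if any) a code unit falls in."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("not a 16-bit code unit")
    # Check the private-use subset first, since it sits inside the high range.
    if unit in PRIVATE_USE_HIGH:
        return "private-use high surrogate"
    if unit in HIGH_SURROGATES:
        return "high surrogate"
    if unit in LOW_SURROGATES:
        return "low surrogate"
    return "not a surrogate"

if __name__ == "__main__":
    # The sizes match the numbers in the quoted text.
    print(len(HIGH_SURROGATES), len(LOW_SURROGATES), len(PRIVATE_USE_HIGH))
    for unit in (0x0041, 0xD83D, 0xDB80, 0xDE00):
        print(f"{unit:04X}: {classify(unit)}")
```

Why those 128 are called "private-use": any pair that starts with DB80..DBFF decodes to U+F0000..U+10FFFF, which is exactly planes 15 and 16, the supplementary private-use planes.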
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
-----------------------------------------------
Lots of bl**dy boring reading . . . certainly NOT "Lady Chatterley's Lover."
-----------------------------------------------
Oddly enough, there seems to be no formula here for working out the pairs!
Here's a formula:
http://perldoc.perl.org/Encode/Unicode.html
and, joy, oh joy! The word "ensurrogate": Wow, way cool.
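For anyone who wants the arithmetic without leaving the thread, here is the standard UTF-16 surrogate calculation as a Python sketch: subtract 0x10000, put the top 10 bits of the 20-bit remainder on 0xD800 and the bottom 10 bits on 0xDC00. The function names `ensurrogate` and `desurrogate` are hypothetical, borrowed from the word above for fun; they are not a claim about any library's API. The self-check compares against Python's own UTF-16 encoder.

```python
# Sketch: standard UTF-16 surrogate-pair arithmetic (hypothetical function names).

def ensurrogate(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into (high, low)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    offset = code_point - 0x10000     # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low

def desurrogate(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

if __name__ == "__main__":
    cp = 0x1F600  # an example supplementary code point
    hi, lo = ensurrogate(cp)
    # Cross-check against Python's built-in big-endian UTF-16 encoder.
    assert bytes.fromhex(f"{hi:04X}{lo:04X}") == chr(cp).encode("utf-16-be")
    assert desurrogate(hi, lo) == cp
    print(f"U+{cp:X} -> {hi:04X} {lo:04X}")
```

Using shifts and masks rather than division and modulo keeps the 10-bit split explicit; for non-negative integers the two forms give the same result.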
Richmond.
More information about the use-livecode mailing list