Double-Value Unicode?

Richmond richmondmathewson at gmail.com
Sun May 18 06:20:32 EDT 2014


"Surrogates Area: U+D800–U+DFFF
When using UTF-16 to represent supplementary characters, pairs of 16-bit
code units are used for each character. These units are called
surrogates. To distinguish them from ordinary characters, they are
allocated in a separate area. The Surrogates Area consists of 1,024
low-half surrogate code points and 1,024 high-half surrogate code
points. For the formal definition of a surrogate pair and the role of
surrogate pairs in the Unicode Conformance Clause, see Section 3.8,
Surrogates, and Section 5.4, Handling Surrogate Pairs in UTF-16.
The use of surrogate pairs in the Unicode Standard is formally
equivalent to the Universal Transformation Format-16 (UTF-16) defined
in ISO 10646."

"High-Surrogate. The high-surrogate code points are assigned to the 
range U+D800..U+DBFF. The high-surrogate code point is always the first
element of a surrogate pair.
Low-Surrogate. The low-surrogate code points are assigned to the range
U+DC00..U+DFFF. The low-surrogate code point is always the second
element of a surrogate pair.
Private-Use High-Surrogates. The high-surrogate code points from
U+DB80..U+DBFF are private-use high-surrogate code points (a total of
128 code points)."

http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
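
The ranges quoted above are easy to check mechanically. Here is a rough
Python sketch of them; the names (HIGH_SURROGATES, classify and so on)
are made up for illustration and do not come from the Standard or from
any library.

```python
# Ranges as quoted from the Unicode Standard above.
# All names here are hypothetical, chosen only for this illustration.
HIGH_SURROGATES = range(0xD800, 0xDC00)    # U+D800..U+DBFF
LOW_SURROGATES = range(0xDC00, 0xE000)     # U+DC00..U+DFFF
PRIVATE_USE_HIGH = range(0xDB80, 0xDC00)   # U+DB80..U+DBFF


def classify(unit: int) -> str:
    """Say which part of the Surrogates Area a 16-bit code unit falls in."""
    if unit in PRIVATE_USE_HIGH:
        # Private-use high surrogates are a subset of the high surrogates.
        return "private-use high surrogate"
    if unit in HIGH_SURROGATES:
        return "high surrogate"
    if unit in LOW_SURROGATES:
        return "low surrogate"
    return "not a surrogate"


if __name__ == "__main__":
    print(len(HIGH_SURROGATES), len(LOW_SURROGATES), len(PRIVATE_USE_HIGH))
    for unit in (0xD83D, 0xDE00, 0xDB80, 0x0041):
        print(f"U+{unit:04X}: {classify(unit)}")
```

The three lengths should come out as the 1,024, 1,024 and 128 code
points mentioned in the quoted text.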

-----------------------------------------------

Lots of bl**dy boring reading . . . certainly NOT "Lady Chatterley's Lover."

-----------------------------------------------

Oddly enough: there seems to be no formula here for working the pairs out!

Here's a formula:

http://perldoc.perl.org/Encode/Unicode.html

and, joy, oh joy! The word "ensurrogate": Wow, way cool.
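
For anyone who wants the arithmetic spelled out, here is a minimal
Python sketch of the usual UTF-16 surrogate-pair calculation: subtract
0x10000, put the top 10 bits on 0xD800 and the bottom 10 bits on
0xDC00. The function names to_surrogates and from_surrogates are
invented for this example; they are not part of Perl's Encode module or
of any other library.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    (high, low) surrogate pair. Hypothetical helper for illustration."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a pair")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Rebuild the code point from a (high, low) surrogate pair."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first unit is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second unit is not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
    print(f"back again -> U+{from_surrogates(high, low):X}")
```

One way to sanity-check the result is to compare it with what Python's
own encoder produces, for instance chr(0x1F600).encode("utf-16-be"),
which should contain the same two 16-bit units in big-endian order.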

Richmond.



