the mouseText and Unicode: a 3-char puzzle

Slava Paperno slava at lexiconbridge.com
Tue Jun 21 02:40:43 EDT 2011


Following Tariel's report, here is a puzzle:

Make a text entry field and set its font to Arial,Unicode.

Put these three characters in the field and lock it:

«—»

The first one is decimal 171, the last one is decimal 187; they are called
Double Angle Quotation Marks.

The one in the middle is called Em-Dash, decimal 8212.
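
For reference, here is a minimal sketch in Python (used only as an outside check, not part of the LiveCode stack) that turns the three decimal codes above back into characters and looks up their Unicode names. The variable names are made up for the illustration.

```python
import unicodedata

# Decimal code points quoted in the message above
code_points = [171, 8212, 187]

# Build the three-character test string from the code points
test_string = "".join(chr(cp) for cp in code_points)

for cp in code_points:
    # unicodedata.name() returns the official Unicode character name
    print(cp, hex(cp), unicodedata.name(chr(cp)))
```

The same string can be pasted into the field if the characters are awkward to type.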

Give the field this mouseDown script:

on mouseDown
   -- Dump the field's text byte by byte (the unicodeText is UTF-16)
   put "FIELD"
   repeat with i = 1 to length(the unicodeText of field "TextToClick")
      put cr & byteToNum(byte i of the unicodeText of field "TextToClick") after msg
   end repeat
   
   -- Dump the same bytes again, this time from a variable
   put the unicodeText of field "TextToClick" into locEntireText -- this is UTF-16
   put cr & "VAR UTF-16" after msg
   repeat with i = 1 to length(locEntireText)
      put cr & byteToNum(byte i of locEntireText) after msg
   end repeat
   
   -- Convert the UTF-16 text to UTF-8 and dump the result
   put uniDecode(locEntireText, "UTF8") into locEntireText -- this is UTF-8
   put cr & "VAR UTF-8" after msg
   repeat with i = 1 to length(locEntireText)
      put cr & byteToNum(byte i of locEntireText) after msg
   end repeat
end mouseDown

When I click the field in LC 4.6.1 on my Windows 7 machine, I get this
display in the Message box:

FIELD
171
0
20
32
187
0
VAR UTF-16
171
0
20
32
187
0
VAR UTF-8
194
171
226
128
148
194
187

The FIELD and the VAR UTF-16 reports are entirely predictable, but the VAR
UTF-8 list is puzzling to me. I expected six bytes, not seven.
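
As a cross-check outside LiveCode, the sketch below encodes the same three characters in Python and prints the byte values. It assumes that the unicodeText on Windows is little-endian UTF-16 with no byte order mark, which is what the FIELD list above suggests. The names `text`, `utf16_bytes` and `utf8_bytes` are made up for the illustration.

```python
# The three test characters, built from their decimal code points
text = "".join(chr(cp) for cp in (171, 8212, 187))

# Assumption: LiveCode's unicodeText is UTF-16 little-endian, no BOM
utf16_bytes = list(text.encode("utf-16-le"))
utf8_bytes = list(text.encode("utf-8"))

print("UTF-16:", utf16_bytes)
print("UTF-8: ", utf8_bytes)

# Per-character byte counts: code point, UTF-16 bytes, UTF-8 bytes
for ch in text:
    print(ord(ch), len(ch.encode("utf-16-le")), len(ch.encode("utf-8")))
```

Under that assumption both lists match the Message box output. The per-character counts come out as 2, 2 and 2 bytes in UTF-16 but 2, 3 and 2 bytes in UTF-8, so the two encodings need not agree on the total length.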

There is a practical reason for trying to solve this puzzle: these three
characters throw off the byte count that I used in the workaround for the
"clickedUnicodeText" problem that was discussed under this Subject line the
other day. I feel obliged to restore order in this chaotic universe, or fall
asleep trying.

Thanks, Tariel, and thank you all for reading this,

Slava

> -----Original Message-----
> From: Tariel Gogoberidze [mailto:tariel at me.com]
> Sent: Monday, June 20, 2011 11:58 AM
> To: slava at lexiconbridge.com
> Subject: Re: the mouseText and Unicode: CONCLUSION
> 
> 
> Hi Slava,
> 
> Tried your script (nice job), but with text I copied from some Russian
> web site it breaks on the word "dikanky" and all words after that.
> Try the attached stack; you will see on which char it breaks further word
> selection, and removing this char will allow correct selection again.
> 
> regards
> Tariel






