double byte chars?

Slava Paperno slava at lexiconbridge.com
Sat Jun 11 15:12:00 EDT 2011


The "set the useUnicode to true" command is necessary only if you use the charToNum() or numToChar() functions. Otherwise it is not useful.

The text in your fields is in UTF-16, and you should access it as the unicodeText of field "MyField".

Word chunks of unicodeText can be correctly retrieved if you use:

word 2 of unicodeText of field "MyField"
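
As a rough illustration of why the byte-oriented chunk expressions in the quoted message below lose a character, here is a sketch in Python (not LiveCode). It assumes the field text is UTF-16 in little-endian byte order, as is typical on Intel machines, and that a byte-oriented word chunk breaks on the single space byte. Both points are assumptions made for illustration, and the variable names are made up.

```python
# Assumption: field text is UTF-16, little-endian; names are illustrative only.
t1 = "зо".encode("utf-16-le")                   # bytes 37 04 3E 04, shown as 7(square)>(square)
t2 = "меня зовут Виктор".encode("utf-16-le")

# A byte-oriented word chunk breaks on the single byte 0x20. In UTF-16 a space
# is two bytes (20 00), so the 00 byte is left at the front of the next "word".
naive_word2 = t2.split(b"\x20")[1]
naive_prefix = naive_word2[:4]                  # bytes 00 37 04 3E: shifted by one byte

print(naive_prefix == t1)                       # False

# Treating the data as Unicode text (the unicodeText approach) avoids the shift.
word2 = t2.decode("utf-16-le").split(" ")[1]
print(word2.startswith(t1.decode("utf-16-le"))) # True
```

If these assumptions hold, this would also explain why the problem never shows up for word 1: no space precedes it, so no stray byte is left at its front.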

There is a tutorial by Devin Asay on the use of Unicode in LiveCode, with examples of retrieving a specific word chunk: http://www.runrev.com/developers/lessons-and-tutorials/tutorials/unicode-in-revolution/

If you start processing Russian text in your variables, you will often find it better to convert it to UTF-8 first:

put uniDecode(unicodeText of field "MyField", "UTF8") into MyUTF8String

To put the result back into a field, convert it back to UTF-16:

set unicodeText of field "MyField" to uniEncode(MyUTF8String, "UTF8")
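
One reason UTF-8 tends to be easier to process byte by byte: ASCII bytes such as the space (0x20) never occur inside a multi-byte UTF-8 sequence, so byte-oriented word splitting stays aligned. A small Python sketch of the round trip (not LiveCode; the names are illustrative, and little-endian UTF-16 is assumed for the field data):

```python
# Assumption: the field holds UTF-16 little-endian data; names are illustrative.
field_utf16 = "меня зовут Виктор".encode("utf-16-le")

# UTF-16 -> UTF-8 (the role uniDecode plays in the text above).
my_utf8 = field_utf16.decode("utf-16-le").encode("utf-8")

# Splitting on the space byte is now safe: each Cyrillic letter is two bytes,
# both >= 0x80, so no stray byte ends up at the front of a word.
utf8_word2 = my_utf8.split(b" ")[1]
print(utf8_word2.decode("utf-8"))               # зовут

# UTF-8 -> UTF-16 before putting the text back (the role uniEncode plays).
round_trip = my_utf8.decode("utf-8").encode("utf-16-le")
print(round_trip == field_utf16)                # True
```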

A sure-fire way to do any sort of string comparison is to convert everything to decimal code points and then work with the numbers. Some parts of LC are not capable of handling Unicode strings, and in those situations using the numbers solves the problem.
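
The code-point idea can be sketched as follows, in Python rather than LiveCode. The helper names code_points and starts_with are hypothetical, chosen only for this illustration; the sample text comes from the quoted message.

```python
def code_points(text):
    """Return the decimal Unicode code points of text as a list of ints."""
    return [ord(ch) for ch in text]

def starts_with(word, prefix):
    """Compare numerically: do the first code points of word equal those of prefix?"""
    p = code_points(prefix)
    return code_points(word)[:len(p)] == p

print(code_points("зо"))                        # [1079, 1086]

# Filter the words of a line by the typed prefix.
matches = [w for w in "меня зовут Виктор".split() if starts_with(w, "зо")]
print(matches)                                  # ['зовут']
```

Working with lists of numbers sidesteps any question of how the bytes are laid out, at the cost of an extra conversion step.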

If you are reading your UTF-8 text from Unicode text files (e.g. saved from Notepad with the UTF-8 encoding option), you may have to take into account the first three bytes that you read in: they are the Byte Order Mark (BOM), EF BB BF. You'll want to delete them from your strings before trying to access a specific byte in the string.
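
A minimal sketch of the BOM check, again in Python rather than LiveCode. The function name strip_utf8_bom is hypothetical, and the file contents are simulated in memory rather than read from disk.

```python
import codecs

def strip_utf8_bom(data):
    """Drop the three-byte UTF-8 BOM (EF BB BF) if the data starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Simulated file contents, as an editor that writes a BOM might save them.
raw = codecs.BOM_UTF8 + "зовут".encode("utf-8")
clean = strip_utf8_bom(raw)
print(len(raw) - len(clean))                    # 3
print(clean.decode("utf-8"))                    # зовут
```

Testing for the marker before deleting anything keeps the function safe to call on files that were saved without a BOM.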

If you still get into trouble, feel free to ask me offlist (sp27 at cornell.edu) for a sample application that shows these operations. I'm still working on it, but when I'm done, I'll make it available online.

Best regards,

Slava


> -----Original Message-----
> From: use-livecode-bounces at lists.runrev.com [mailto:use-livecode-
> bounces at lists.runrev.com] On Behalf Of Richmond Mathewson
> Sent: Saturday, June 11, 2011 2:24 PM
> To: How to use LiveCode
> Subject: Re: double byte chars?
> 
> On 06/11/2011 09:14 PM, Lars Brehmer wrote:
> > My project has Russian text fields (Arial,Russian). With one
> exception, everything works fine.
> >
> > Problem: a filter-as-you-type script.
> >
> > field "t1":     зо
> > field "t2":     меня зовут Виктор  --underscoring shows the matches--
> > field "t3":     зовут курить почему
> >
> > What I want to do is find a word in fields t2 and t3 that begins with the
> 2 letters in field t1. Word 2 in field t2 and word 1 in field t3 should
> be matches. But this only works if the matching word is the first word
> in the field!
> >
> > Some simple message box scripts:
> 
> At the risk of insulting you, as you are using Unicode I have a funny
> feeling you have to
> prefix this sort of thing with
> 
> set the useUnicode to true
> > put fld "t1"&  cr&  fld "t2"&  cr&  fld "t3"
> >
> > The result is a bunch of numbers, symbols and squares. You can
> clearly spot the matches.
> >
> > Next in the message box:   --char 1 to 4 -- double byte chars--
> >
> > put char 1 to 4 in fld "t1" into aText
> > put char 1 to 4 in word 2 in fld "t2" into bText
> > put char 1 to 4 in word 1 in fld "t3" into cText
> > put aText&  cr&  bText&  cr&  cText
> >
> > This should be 3 identical lines, right? But no. Line 2 is missing
> the final char.
> >
> > 7(square)>(square)
> > 7(square)>
> > 7(square)>(square)
> >
> > Next: comparing the strings
> >
> > if cText = aText then beep - it beeps
> > if cText is in aText then beep - it beeps
> > if bText = aText then beep - no beep, obviously
> >
> > BUT
> >
> > if bText is in aText then beep - also no beep!
> >
> > And then
> >
> > put char 1 to 5 in word 2 in field "t2", it returns the same as the
> other two:
> >
> > 7(square)>(square)
> >
> > so then
> >
> > put char 1 to 5 in word 2 into bText
> >
> > but
> >
> > if bText = (or is in) aText still returns nothing
> >
> > Why is that last double byte char always missing when the word is not
> word 1 in its field? If I do char 1 to 3 I get this (again!)
> >
> > 7(square)>
> > 7(square)   --last char missing!
> > 7(square)>
> >
> > Using itemDEL = space and char 1 to x in item z behaves the same.
> >
> > Anyone know the answer?
> >
> > Cheers,
> >
> > Lars
> > _______________________________________________
> > use-livecode mailing list
> > use-livecode at lists.runrev.com
> > Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> > http://lists.runrev.com/mailman/listinfo/use-livecode
> 
> 





