Importing Unicode text to a field .. How?

Richmond Mathewson richmondmathewson at gmail.com
Fri May 27 16:04:38 EDT 2011


Something just occurred to me . . .
>>>
>>> #  THE FOLLOWING IS A SINGLE LINE IN THE LC FIELD:
>>>
>>> Converted from மயிலை text in 
>>> /Users/sivakatirswami/Documents/Tamil/Natchintanai in Unicode/3 
>>> Thannai Thannaal.txt தன்னைத்தன்னால்தன்னைத் தன்னால் அறிந்திட வேண்டுமேதானா 
>>> யெங்குஞ் செறிந்திட வேண்டுமே[snip]
>>>
>>> on mouseup
>>>    answer file "Choose the Unicode for this song" with "OK"
>>>    put url ("binfile:/"&  it) into tUnicode
>>>    set the useUnicode to true
>>>   set the unicodetext of  fld "Unicode_Script" to  tUnicode
>>>   replace numtochar(13) with numtochar(10) in fld "unicode_Script"
>>>    # the above line restores the line breaks but destroys the text...
>>>    # or rather, converts it to some other encoding displays
>>>    # Japanese characters
>>> end mouseup
>>>

Unicode text in a field is double-byte stuff (two bytes per character), 
while CR and LF, as plain ASCII characters, are single-byte things.

SO . . . double-byte strings SHOULD always consist of an even number of 
bytes,

AND, if a text field is flagged as containing unicodeText when the 
engine starts reading its contents
it will start taking "double-byte bites" of the string.

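THEREFORE, inserting either a single-byte numToChar(10) or numToChar(13) 
into a double-byte text will throw the engine out of kilter: from that 
point on it will "bite" its double-bytes off the wrong boundary, pairing 
the second byte of one character with the first byte of the next.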
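
To make the "biting" concrete, here is a rough sketch in Python (not 
LiveCode, simply because it makes the raw bytes easy to see). It assumes 
the text is UTF-16 with the low byte first, and the sample string "ABC" 
and the function name are made up purely for illustration; it is not a 
description of what the engine does internally.

```python
# Sketch only: what one stray single byte does to double-byte (UTF-16) text.
# Assumptions: little-endian UTF-16; "ABC" is an invented sample string.

def misread_after_stray_byte(text: str, stray_byte: int) -> str:
    """Encode text as UTF-16, push one lone byte in front of it, and
    read the result back two bytes at a time, as a double-byte reader would."""
    utf16_bytes = text.encode("utf-16-le")
    damaged = bytes([stray_byte]) + utf16_bytes
    # The damaged string now has an odd length; a reader taking two bytes
    # at a time never sees the last byte as part of a full pair.
    whole_pairs = damaged[: len(damaged) - (len(damaged) % 2)]
    return whole_pairs.decode("utf-16-le", errors="replace")

clean_bytes = "ABC".encode("utf-16-le")
print(len(clean_bytes))                      # even number of bytes

misread = misread_after_stray_byte("ABC", 10)  # 10 is a single-byte LF
print([hex(ord(ch)) for ch in misread])      # code points the reader now sees
```

Every pair after the stray byte is made of halves of two different 
characters, so plain Latin letters come back as code points in the CJK 
ideograph range, which would look very much like the "Japanese 
characters" described above.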

TAKE A LOOK at this:

http://en.wikipedia.org/wiki/Newline

and, just possibly, you need to replace your CR/LF with the Unicode line 
separator (U+2028):

numToChar(8232)

worth a try . . .  :)
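
Why this might help, again sketched in Python rather than LiveCode: 
character 8232 is itself a double-byte character in UTF-16, so swapping 
it in character by character keeps the byte count even. The function 
name and the sample text are assumptions for illustration, and whether a 
LiveCode field actually draws a line break for this character is 
something I have not tested.

```python
# Sketch only: replace CR / LF with U+2028 at the character level,
# so the UTF-16 byte stream keeps an even length.
# Assumptions: little-endian UTF-16; function name and sample are invented.

LINE_SEPARATOR = chr(8232)  # U+2028 LINE SEPARATOR

def replace_breaks_with_line_separator(utf16_bytes: bytes) -> bytes:
    """Decode UTF-16, replace CRLF, CR and LF with U+2028, re-encode."""
    text = utf16_bytes.decode("utf-16-le")
    # Handle the two-character CRLF first so it becomes one separator.
    text = text.replace("\r\n", LINE_SEPARATOR)
    text = text.replace("\r", LINE_SEPARATOR)
    text = text.replace("\n", LINE_SEPARATOR)
    return text.encode("utf-16-le")

sample = "தன்னை\rதன்னால்".encode("utf-16-le")
fixed = replace_breaks_with_line_separator(sample)

print(len(LINE_SEPARATOR.encode("utf-16-le")))  # bytes used by U+2028
print(len(fixed) % 2)                           # remainder after pairing
```

The point of working on decoded characters rather than raw bytes is 
that every replacement swaps one whole double-byte unit for another, so 
nothing after it can slip out of step.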



