Importing Unicode text to a field .. How?
Richmond Mathewson
richmondmathewson at gmail.com
Fri May 27 16:04:38 EDT 2011
Something just occurred to me . . .
>>>
>>> # THE FOLLOWING IS A SINGLE LINE IN THE LC FIELD:
>>>
>>> Converted from மயிலை text in
>>> /Users/sivakatirswami/Documents/Tamil/Natchintanai in Unicode/3
>>> Thannai Thannaal.txt தன்னைத்தன்னால்தன்னைத் தன்னால் அறிந்திட வேண்டுமேதானா
>>> யெங்குஞ் செறிந்திட வேண்டுமே[snip]
>>>
>>> on mouseup
>>> answer file "Choose the Unicode for this song" with "OK"
>>> put url ("binfile:/"& it) into tUnicode
>>> set the useUnicode to true
>>> set the unicodetext of fld "Unicode_Script" to tUnicode
>>> replace numtochar(13) with numtochar(10) in fld "unicode_Script"
>>> # the above line restores the line breaks but destroys the
>>> # text... or rather, converts it to some other encoding displays
>>> # Japanese characters
>>> end mouseup
>>>
Unicode text in a field is double-byte stuff (UTF-16), while a plain CR or
LF is a single-byte thing.
SO . . . double-byte strings SHOULD always consist of an even number of
bytes,
AND, if a text field is flagged as containing unicodeText, the engine,
when it starts reading the field's contents,
will take "double-byte bites" of the string.
THEREFORE, inserting a single-byte numToChar(10) or numToChar(13) into a
double-byte text will
throw the engine out of kilter, because every "bite" after the insertion
point will straddle two characters.
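A rough sketch of that arithmetic, not tested against the original stack: the handler below builds "AB" by hand as four bytes and then pushes one single-byte LF into the middle. Little-endian byte order is assumed purely for illustration, and tUTF16, tBefore and tAfter are made-up names. It relies on the useUnicode being false (its default), so that numToChar returns single bytes.

```livecode
on mouseUp
   -- Hypothetical illustration: "AB" as UTF-16, built byte by byte
   -- (little-endian order assumed only for this example).
   put numToChar(65) & numToChar(0) & numToChar(66) & numToChar(0) into tUTF16
   put the number of chars in tUTF16 into tBefore   -- 4 bytes: two whole pairs

   -- Insert ONE single-byte LF after the first pair.
   put numToChar(10) after char 2 of tUTF16
   put the number of chars in tUTF16 into tAfter    -- 5 bytes: no longer even

   answer "Before:" && tBefore & return & "After:" && tAfter
end mouseUp
```

With five bytes the pairs after the insertion point no longer line up. Read two at a time, bytes 10 and 66 would pair up as 0x420A on a little-endian machine, which falls in the CJK ideograph range and might account for the "Japanese characters".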
TAKE A LOOK at this:
http://en.wikipedia.org/wiki/Newline
and, just possibly, you need to replace your CR/LF with the Unicode
LINE SEPARATOR (U+2028):
numToChar(8232)
worth a try . . . :)
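An untested sketch of how that might look, doing the replacement on the raw data before it ever reaches the field, so that every swap is two bytes for two bytes. It assumes the file is UTF-16 in the machine's own byte order, and it assumes the field will treat U+2028 as a line break, which I have not checked. The field name is taken from the quoted script.

```livecode
on mouseUp
   answer file "Choose the Unicode for this song"
   if it is empty then exit mouseUp   -- user cancelled
   put url ("binfile:" & it) into tUnicode

   -- With the useUnicode set to true, numToChar returns two-byte characters,
   -- so the replacement keeps the byte count even.
   set the useUnicode to true
   replace numToChar(13) with numToChar(8232) in tUnicode

   set the unicodeText of fld "Unicode_Script" to tUnicode
end mouseUp
```

One caveat: replace works on bytes, so in principle it could match a pair that straddles two characters. For a short song text that is unlikely, but it is worth checking the result by eye.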