Getting Kanji from a .csv file

Dar Scott dsc at swcp.com
Fri Jun 7 22:24:09 EDT 2013


OK, using the Unicode byte order mark as a signature does work for TextEdit.

The "byte order mark" (BOM) is a non-displaying Unicode character.  Its code is U+FEFF.  That is, it is FEFF in base 16, which we write as the numeral 0xFEFF in LiveCode.  It is just a character whose code is too big to fit in one byte.  It can be used as a pattern, a signature, to indicate which encoding form and scheme the Unicode text in the file uses.  It can even be used to recognize UTF8 as opposed to other encodings.

It is enough for TextEdit to decide that the file is UTF8.  (TextEdit is not that smart and relies on cheats it puts into resources, so the signature is important.)

You can put it in front of your Unicode data either before or after you put it into the field.  The field preserves it (my worries were for naught).

Just make sure you have it at the front of the file before you save.

Here is how to put it in front of your unicode text:

set the useUnicode to true -- make numToChar use 16-bit chunks not bytes
put numToChar( 0xFEFF ) before myUnicodeText

That's it!

After you convert myUnicodeText (the variable name from my example) to UTF8 and save it, your file will be 3 bytes bigger than it would be without the mark, because that character takes 3 bytes in UTF8 (EF BB BF).  If you keep reading, editing, and re-saving the same file, it will grow by another 3 bytes each time, so once you have the above working, change it to add the mark only if it is not already there.
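The "add it only if it is not already there" step amounts to a prefix check. Here is a minimal Python sketch, assuming the text has already been decoded into a Unicode string (playing the role of myUnicodeText). The name `with_bom` is hypothetical.

```python
BOM = "\ufeff"  # U+FEFF, the byte order mark


def with_bom(text: str) -> str:
    """Prepend U+FEFF unless the text already starts with it."""
    # Checking first keeps repeated load/edit/save cycles from stacking marks.
    return text if text.startswith(BOM) else BOM + text


body = "漢字,kanji\n"
once = with_bom(body)
twice = with_bom(once)  # already marked, so returned unchanged

# In UTF-8 the mark costs exactly 3 bytes: EF BB BF.
print(len(once.encode("utf-8")) - len(body.encode("utf-8")))  # 3
print(once.encode("utf-8")[:3].hex())  # efbbbf
print(twice == once)  # True
```

Python's built-in `utf-8-sig` codec handles the same chore. Encoding with it always writes the mark, and decoding with it drops one leading mark if present, so text read that way can be re-saved without the file growing.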

I know this is a lot to take in, and I apologize for not being able to explain things more simply.  Just ask and I will try.  Or somebody who can follow what I'm saying might be able to explain it better.

Dar



On Jun 7, 2013, at 7:46 PM, Dar Scott wrote:

> Oh, TextEdit cheats.  Did this come from TextEdit?  It puts some info in the resource fork.  That is lost when you write the file back out.
> 
> I'll ponder this.  Or maybe some OS X resource experts know.
> 
> Dar
> 
> On Jun 7, 2013, at 5:21 PM, Howard Bornstein wrote:
> 
>>> I don't know what characters the field might throw away.  So, putting the
>>> file into the field and then modifying the field seems scary to me.  Maybe
>>> all the data is there, but maybe not.
>>> 
>> 
>> The actual command I used was <set the unicodeText of fld "ProcessedFile"
>> to uniEncode(fld "ProcessedFile", "UTF8")> (I had an extraneous "the" in my
>> first example)
>> 
>> I had no problems with this. In fact, it processed a file with about
>> 300,000 lines in just a few seconds.
>> 
>> And then save the field much like this:
>>> 
>>> put uniDecode( the unicodeText of field "Processed File", "UTF8" ) into
>>> URL ("file:"&kFile2)
>>> 
>> 
>> I tried some variations of this but was not able to save the file from
>> within LC and still have the Kanji viewable in TextEdit. I guess you didn't
>> read the part about teaching to the imbecile because the rest of your
>> explanation was way over my head.
>> 
>> But thanks for trying.
>> 
>> I would still like to find a way to do this from within LC.
>> 
>> -- 
>> Regards,
>> 
>> Howard Bornstein
>> -----------------------
>> www.designeq.com
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 

