Getting Kanji from a .csv file

Dar Scott dsc at swcp.com
Sat Jun 8 01:35:05 EDT 2013


   put URL ("file:" & inputFile) into originalFile  -- raw file contents, UTF-8
   put uniEncode(originalFile, "UTF8") into UofO  -- UTF-8 converted to UTF-16
   set the unicodeText of field "Test" to UofO  -- the field takes UTF-16
   --
   -- process the field
   --
   put the unicodeText of field "Test" into UofF  -- UTF-16
   set the useUnicode to true  -- numToChar now makes a 16-bit character
   put numToChar(0xFEFF) before UofF  -- byte order mark in front, still UTF-16
   set the useUnicode to false
   put uniDecode(UofF, "UTF8") into UTF8ofF  -- UTF-16 converted to UTF-8
   put UTF8ofF into URL ("file:" & outputFile)  -- UTF-8 on disk
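
Here is a minimal sketch of the "only add it if it is not already there" step from my earlier message quoted below. It checks the UTF-8 bytes just before saving instead of touching the UTF-16 text. The handler name withUTF8Signature and its parameter are made up for illustration, not anything built in. It assumes that numToChar makes a single byte when useUnicode is false, which is the default inside a handler.

```livecode
-- Hypothetical helper: returns UTF-8 data with exactly one signature in front.
function withUTF8Signature pUTF8
   local tBOM
   -- compare bytes exactly, not as case-insensitive text
   set the caseSensitive to true
   -- U+FEFF written as UTF-8 is the three bytes EF BB BF
   put numToChar(239) & numToChar(187) & numToChar(191) into tBOM
   if byte 1 to 3 of pUTF8 is not tBOM then
      put tBOM before pUTF8
   end if
   return pUTF8
end withUTF8Signature
```

With something like this, the three lines around numToChar(0xFEFF) could be dropped and the last line would become: put withUTF8Signature(UTF8ofF) into URL ("file:" & outputFile). A file that already starts with the signature would then stay the same size however many times it is saved.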


On Jun 7, 2013, at 11:14 PM, Howard Bornstein wrote:

> Hmmm, I tried what you suggested but it didn't seem to work.
> 
> Here's my code with your snippet inserted:
> 
>   put uniEncode (the unicodeText of field "ConvertedText", "UTF8") into thetext
> 
>   set the useUnicode to true -- make numToChar use 16-bit chunks not bytes
> 
>   put numToChar( 0xFEFF ) before thetext
> 
>   put thetext into URL ("file:Messages.txt")
> 
> I noticed that at one point you suggested <put uniDecode( the unicodeText
> of field "Processed File", "UTF8" ) into URL ("file:"&kFile2)>
> 
> 
> if I change the first line of my code to use uniDecode:
> 
>   put uniDecode (the unicodeText of field "ConvertedText", "UTF8") into thetext
> 
> then the entire document shows up as Kanji and lots of garbage characters.
> It should contain only English and Kanji.
> 
> 
> On Fri, Jun 7, 2013 at 7:24 PM, Dar Scott <dsc at swcp.com> wrote:
> 
>> OK, using the Unicode byte order mark as a signature does work for
>> TextEdit.
>> 
>> The "byte order mark" is a non-displaying Unicode character.  Its code is
>> U+FEFF.  That is, it is FEFF in base 16, which we write as the numeral
>> 0xFEFF in LiveCode.  It is just a big character.  It can be used as a
>> pattern, a signature, to indicate what form and encoding scheme the Unicode
>> in the file is in.  It can even be used to recognize UTF8 in contrast with
>> other encodings.
>> 
>> It is sufficient for TextEdit to decide the file is UTF8.  (TextEdit is
>> not that smart and relies on cheats it puts into resources, so the
>> signature is important.)
>> 
>> You can put it in front of your unicode data before you put it into the
>> field or after.  It is preserved by the field (my worries were for naught).
>> 
>> Just make sure you have it at the front of the file before you save.
>> 
>> Here is how to put it in front of your unicode text:
>> 
>> set the useUnicode to true -- make numToChar use 16-bit chunks not bytes
>> put numToChar( 0xFEFF ) before myUnicodeText
>> 
>> That's it!
>> 
>> After you convert myUnicodeText (so named in my example) to UTF8 and save
>> it, your file will be 3 bytes bigger than the original (that character is
>> expanded to 3 bytes in UTF8).  The file can keep growing each time you
>> edit the same file, so once you have the above working, change it to add
>> the mark only if it is not already there.
>> 
>> I know this is a lot to take in and I apologize for not being able to
>> explain things simply.  Just ask and I will try.  Or somebody who can
>> figure out what I'm saying might be able to explain it better.
>> 
>> Dar
>> 
>> 
>> 
>> On Jun 7, 2013, at 7:46 PM, Dar Scott wrote:
>> 
>>> Oh, TextEdit cheats.  Did this come from TextEdit?  It puts some info in
>>> the resource fork.  That is lost when you write back out.
>>> 
>>> I'll ponder this.  Or maybe some OS X resource experts might know.
>>> 
>>> Dar
>>> 
>>> On Jun 7, 2013, at 5:21 PM, Howard Bornstein wrote:
>>> 
>>>>> I don't know what characters the field might throw away.  So, putting the
>>>>> file into the field and then modifying the field seems scary to me.  Maybe
>>>>> all the data is there, but maybe not.
>>>>> 
>>>> 
>>>> The actual command I used was <set the unicodetext of fld "ProcessedFile"
>>>> to uniencode(fld "ProcessedFile", "UTF8")> (extraneous "the" in my first
>>>> example)
>>>> 
>>>> I had no problems with this. In fact, it processed a file with about
>>>> 300,000 lines in just a few seconds.
>>>> 
>>>>> And then save the field much like this:
>>>>> 
>>>>> put uniDecode( the unicodeText of field "Processed File", "UTF8" ) into
>>>>> URL ("file:"&kFile2)
>>>>> 
>>>> 
>>>> I tried some variations of this but was not able to save the file from
>>>> within LC and still have the Kanji viewable in TextEdit. I guess you didn't
>>>> read the part about teaching to the imbecile because the rest of your
>>>> explanation was way over my head.
>>>> 
>>>> But thanks for trying.
>>>> 
>>>> I would still like to find a way to do this from within LC.
>>>> 
>>>> --
>>>> Regards,
>>>> 
>>>> Howard Bornstein
>>>> -----------------------
>>>> www.designeq.com
>>>> _______________________________________________
>>>> use-livecode mailing list
>>>> use-livecode at lists.runrev.com
>>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Regards,
> 
> Howard Bornstein
> -----------------------
> www.designeq.com




