Getting Kanji from a .csv file
Dar Scott
dsc at swcp.com
Sat Jun 8 01:35:05 EDT 2013
put URL ("file:" & inputFile) into originalFile -- read UTF-8 text
put uniEncode(originalFile,"UTF8") into UofO -- convert UTF-8 to UTF-16
set the unicodeText of field "Test" to UofO -- the field takes UTF-16
--
-- process the field
--
put the unicodeText of field "Test" into UofF -- UTF-16
set the useUnicode to true -- numToChar now returns one 16-bit char
put numToChar(0xFEFF) before UofF -- prepend the byte order mark (UTF-16)
set the useUnicode to false
put uniDecode(UofF,"UTF8") into UTF8ofF -- convert UTF-16 to UTF-8
put UTF8ofF into URL ("file:" & outputFile) -- write UTF-8
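The script above prepends the mark every time. If the input file already starts with one, the field keeps it and a second one is added, so the file gains 3 more bytes on each pass. Below is a minimal sketch of a check. It assumes the pre-7 (LiveCode 6-era) Unicode handling used throughout this thread, where char chunks are always bytes and useUnicode affects only numToChar and charToNum. The function name addBOMIfMissing is hypothetical.

```livecode
-- Hypothetical helper (LiveCode 6-era Unicode model assumed):
-- prepend U+FEFF to UTF-16 text only if it does not already start with it.
function addBOMIfMissing pUTF16
   local tBOM
   set the useUnicode to true -- numToChar returns one 16-bit char (2 bytes)
   put numToChar(0xFEFF) into tBOM -- BOM in the same byte order as unicodeText
   set the caseSensitive to true -- compare the raw bytes exactly
   -- chars are bytes here, so the first UTF-16 character is chars 1 to 2
   if char 1 to 2 of pUTF16 is not tBOM then
      put tBOM before pUTF16
   end if
   return pUTF16
end addBOMIfMissing
```

With this in place, the three lines around numToChar in the script above would become a single line: put addBOMIfMissing(UofF) into UofF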
On Jun 7, 2013, at 11:14 PM, Howard Bornstein wrote:
> Hmmm, I tried what you suggested but it didn't seem to work.
>
> Here's my code with your snippet inserted:
>
> put uniEncode (the unicodeText of field "ConvertedText", "UTF8") into thetext
>
> set the useUnicode to true -- make numToChar use 16-bit chunks not bytes
>
> put numToChar( 0xFEFF ) before thetext
>
> put thetext into URL ("file:Messages.txt")
>
> I noticed that at one point you suggested <put uniDecode( the unicodeText
> of field "Processed File", "UTF8" ) into URL ("file:"&kFile2)>
>
>
> if I change the first line of my code to use uniDecode:
>
> put uniDecode (the unicodeText of field "ConvertedText", "UTF8") into thetext
>
> then the entire document shows up as Kanji and lots of garbage characters.
> It should contain only English and Kanji.
>
>
> On Fri, Jun 7, 2013 at 7:24 PM, Dar Scott <dsc at swcp.com> wrote:
>
>> OK, using the Unicode byte order mark as a signature does work for
>> TextEdit.
>>
>> The "byte order mark" is a non-displaying Unicode character. Its code is
>> U+FEFF. That is, it is FEFF in base 16, which we write as the numeral
>> 0xFEFF in LiveCode. It is just a big character. It can be used as a
>> pattern, a signature, to indicate which form and encoding scheme the Unicode
>> in the file uses. It can even be used to recognize UTF8 in contrast with
>> other encodings.
>>
>> It is sufficient for TextEdit to decide the file is UTF8. (TextEdit is
>> not that smart and relies on cheats it puts into resources, so the
>> signature is important.)
>>
>> You can put it in front of your Unicode data before you put it into the
>> field or after. It is preserved by the field (my worries were for naught).
>>
>> Just make sure you have it at the front of the file before you save.
>>
>> Here is how to put it in front of your unicode text:
>>
>> set the useUnicode to true -- make numToChar use 16-bit chunks not bytes
>> put numToChar( 0xFEFF ) before myUnicodeText
>>
>> That's it!
>>
>> After you convert myUnicodeText (so named in my example) to UTF8 and save
>> it, your file will be 3 bytes bigger than the original (that character is
>> expanded to 3 bytes in UTF8). The file can grow if you keep editing the
>> same file, so once you have the above working, change it to add the mark
>> only if it is not already there.
>>
>> I know this is a lot to take in and I apologize for not being able to
>> explain things simply. Just ask and I will try. Or somebody who can
>> figure out what I'm saying might be able to explain it better.
>>
>> Dar
>>
>>
>>
>> On Jun 7, 2013, at 7:46 PM, Dar Scott wrote:
>>
>>> Oh, TextEdit cheats. Did this come from TextEdit? It puts some info in
>>> the resource fork. That is lost when you write back out.
>>>
>>> I'll ponder this. Or maybe some OS X resource experts might know.
>>>
>>> Dar
>>>
>>> On Jun 7, 2013, at 5:21 PM, Howard Bornstein wrote:
>>>
>>>>> I don't know what characters the field might throw away. So, putting the
>>>>> file into the field and then modifying the field seems scary to me. Maybe
>>>>> all the data is there, but maybe not.
>>>>>
>>>>
>>>> The actual command I used was <set the unicodetext of fld "ProcessedFile"
>>>> to uniencode(fld "ProcessedFile", "UTF8")> (extraneous "the" in my first
>>>> example)
>>>>
>>>> I had no problems with this. In fact, it processed a file with about
>>>> 300,000 lines in just a few seconds.
>>>>
>>>> And then save the field much like this:
>>>>>
>>>>> put uniDecode( the unicodeText of field "Processed File", "UTF8" ) into
>>>>> URL ("file:"&kFile2)
>>>>>
>>>>
>>>> I tried some variations of this but was not able to save the file from
>>>> within LC and still have the Kanji viewable in TextEdit. I guess you didn't
>>>> read the part about teaching to the imbecile because the rest of your
>>>> explanation was way over my head.
>>>>
>>>> But thanks for trying.
>>>>
>>>> I would still like to find a way to do this from within LC.
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Howard Bornstein
>>>> -----------------------
>>>> www.designeq.com
>>>> _______________________________________________
>>>> use-livecode mailing list
>>>> use-livecode at lists.runrev.com
>>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>
>>>
>>
>>
>>
>
>
>
> --
> Regards,
>
> Howard Bornstein
> -----------------------
> www.designeq.com