text file import & charsets - what's wrong

Ueliweb ueliweb at gmx.ch
Wed Feb 13 01:57:16 EST 2013


Hi


today I have 2 questions:


At the moment I am importing some data from a tab-delimited text file.
As text editors I use TextWrangler on Mac and Notepad++ Portable (a
'PortableApps' app) on Windows.


First Question:
*****************
When I import the text manually, by copying it from a text editor and
pasting it into a dialog with the code below, I get the content shown
after the code in the variable 'theTabText':
        ask question "Copy the tab-Text here:"
        put it into theTabText

ID token Long_DE Semester_ID Semester_token Teacher_ID Teacher_token
1 11A 11A_12/13-2 2 12/13-2 2 HER
2 11B 11B_12/13-2 2 12/13-2 ???


If I import the same file with this code, I get the output below instead:
        put URL ("file:" & tPath) into theTabText

˛ˇ I D  t o k e n  L o n g _ D E  S e m e s t e r _ I D  S e m e s t e r _
t o k e n  T e a c h e r _ I D  T e a c h e r _ t o k e n
 1  1 1 A  1 1 A _ 1 2 / 1 3 - 2  2  1 2 / 1 3 - 2  2  H E R
 2  1 1 B  1 1 B _ 1 2 / 1 3 - 2  2  1 2 / 1 3 - 2   ? ? ?
 3  1 1 A  1 1 A _ 1 2 / 1 3 - 1  1  1 2 / 1 3 - 1  2  H E R

I am confused about the "˛ˇ" before the text starts, and about why every
character is separated by a space.
With the text file of another table, both ways of importing give me the
same output in the variable 'theTabText'.
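
One possible explanation, which I have not been able to confirm: "˛ˇ" is
how the two bytes 0xFE 0xFF look when displayed as MacRoman, and those two
bytes are the byte order mark of a UTF-16 (big-endian) file. That would
also fit the "spaces", because in UTF-16 every second byte of plain ASCII
text is zero. A minimal sketch of reading such a file by looking at its
first bytes might be the following. It assumes LiveCode 7 or later (for
textDecode); the handler name readTextGuessingEncoding and the parameter
pFallbackEncoding are made up for illustration.

```livecode
-- Hypothetical helper, not an official API.
-- Assumes LiveCode 7 or later, where textDecode() is available.
function readTextGuessingEncoding pPath, pFallbackEncoding
   local tRaw
   -- compare the bytes exactly, not case-insensitively
   set the caseSensitive to true
   -- "binfile:" returns the bytes of the file unchanged
   put URL ("binfile:" & pPath) into tRaw
   if byte 1 to 3 of tRaw is (numToByte(239) & numToByte(187) & numToByte(191)) then
      -- EF BB BF: UTF-8 byte order mark
      return textDecode(byte 4 to -1 of tRaw, "UTF-8")
   else if byte 1 to 2 of tRaw is (numToByte(254) & numToByte(255)) then
      -- FE FF: UTF-16 big-endian byte order mark
      return textDecode(byte 3 to -1 of tRaw, "UTF-16BE")
   else if byte 1 to 2 of tRaw is (numToByte(255) & numToByte(254)) then
      -- FF FE: UTF-16 little-endian byte order mark
      return textDecode(byte 3 to -1 of tRaw, "UTF-16LE")
   else
      -- no byte order mark: the encoding cannot be detected reliably,
      -- so use what the caller expects, e.g. "CP1252" or "MacRoman"
      return textDecode(tRaw, pFallbackEncoding)
   end if
end function
```

It would be called like this:
        put readTextGuessingEncoding(tPath, "CP1252") into theTabText
Because "binfile:" does no translation, Windows line endings (CRLF) stay
in the text as they are.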


Second Question:
********************

If I import the same source file on Windows or on a Mac, the umlauts
(äöü ...) are replaced by unreadable symbols.
I know that this also has to do with the charset used by the text file.
But the result is the same whether I use "Western (Windows latin 1)",
"Western (MacOS roman)", "Western (ISO Latin 1)" or even UTF-8.

As far as I have learned, LiveCode uses UTF-16 internally.
I also know about the unicode properties of LiveCode and the macToISO and
ISOToMac functions.
But I could not figure out where and how to use or set these so that text
is written to and read from files correctly.
It looks like it works if I use URLEncode/URLDecode or
arrayEncode/arrayDecode, but the resulting files are not human-readable.

Most importantly, I need to import text data generated by other programs,
where we do not get direct access to the databases, and I need to send
data to external programs like Word, Excel, Pages, Numbers and OpenOffice
for the users.

At the moment the best solution is to copy the text from the text editor
into an "ask question" dialog on the system I am working on. No matter
which format the text has, it is always correct after that kind of import.
But if I take a copy of the stack to another platform, the umlauts are
mixed up and I need to reimport the data every time.
While I am working in the development environment that is OK, but it is
not suitable for real use: data exchange, reading and writing must work
properly.

Is there any way to find out the source encoding/charset of a text file,
and then select the right way to import it?
Which format should I use for export so that all platforms and the
programs mentioned above can use it correctly?
How can I set/control the encoding/charset when writing to a file?
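
To make the last question concrete, what I imagine is something like the
sketch below: encode the text explicitly and write the bytes with
"binfile:" so that nothing is translated on the way out. It assumes
LiveCode 7 or later (for textEncode), and writeTextAsUTF8 is a made-up
name. I have not tested whether Excel and the other programs accept UTF-8
written this way without further hints.

```livecode
-- Hypothetical helper; assumes LiveCode 7 or later for textEncode().
command writeTextAsUTF8 pPath, pText
   -- convert the text to UTF-8 bytes and write them unchanged
   put textEncode(pText, "UTF-8") into URL ("binfile:" & pPath)
   -- pass on the result, which holds an error message if the write failed
   return the result
end writeTextAsUTF8
```

It would be called like this:
        writeTextAsUTF8 tPath, theTabText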

I hope someone can point me in the right direction.

thanks
ueliweb


