Transforming Unicode to simple text
Sannyasin Sivakatirswami
katir at hindu.org
Tue May 18 00:53:20 EDT 2004
I'm struggling here with repurposing text out of InDesign back down to
the 0-127 character set. I am a Unicode, two-byte-character baby (I know
about as much as will fit on a spoon) and could really use some help.
InDesign, Mac OS X and Rev all seem to be doing new things I can't
get a grip on.
My XML output from InDesign, opened in BBEdit (a few words with 2-byte
characters), looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root><Story><Question>Who Am I? Where Did I Come From?</Question>
<SlokaBhashyaNo>ßloka 1</SlokaBhashyaNo>
<Sloka><Italic>Âishis </Italic>proclaim ....Vedic <Italic> ®ishis
</Italic>have given us courage by uttering the simple truth, “God is
the Life of our life.” A great sage carried it further by saying there
is one thing God cannot do: God cannot separate Himself from us. This
is because God is our life.
but imported into a field "input" in Rev I get:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root><Story><Question>ÔªøWho Am I? Where Did I Come
From?</Question>Ä©<SlokaBhashyaNo>üloka
1</SlokaBhashyaNo>Ä©<Sloka><Italic>Çishis </Italic>proclaim
..</Sloka>Ä©<SlokaBhashyaNo>bh¢shya</SlokaBhashyaNo>Ä©<Bhashya>We
are immortal souls ..... by uttering the simple truth, ÔªøÄúGod is the
ÔªøLife of our life.Äù A great sage carried it further by saying there
is one thing God cannot do: ÔªøGod cannot separate Himself from us.
This is be¬cause
etc., plus some weird things like
Ôªøçiva
which in InDesign in the Setu script is "Siva" with a mark under the "S"
So now the old script I used to map
Ä to A
Å to A
Ç to Ch
É to E
etc. is failing. Also, and this is weird: if I cut one of these
strange words and then try to paste it elsewhere in the field, Rev will
unexpectedly quit! If I then use something like "put the unidecode of
fld "input"", the result is total garbage in the msg box.
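The garbage shown above looks consistent with UTF-8 bytes being displayed
as MacRoman: "Ôªø" is how the UTF-8 byte-order mark (EF BB BF) appears in
MacRoman, and "Äú" / "Äù" match the tail bytes of curly quotes. Below is a
minimal Python sketch (run outside Rev) of reversing that misreading and
then folding the text down to 0-127. The function names and the SPECIAL
table are hypothetical, and stripping accents via NFKD decomposition is an
assumption about the mapping wanted; it is not the table the old script
used (that script maps Ç to Ch, for example).

```python
import unicodedata

# Hypothetical overrides for characters that do not decompose to ASCII.
SPECIAL = {
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u2028": "\n", "\u2029": "\n",  # Unicode line / paragraph separators
    "\ufeff": "",                    # byte-order mark
    "\u00ad": "",                    # soft hyphen
}

def undo_macroman_mojibake(garbled: str) -> str:
    """Recover text whose UTF-8 bytes were displayed as MacRoman."""
    return garbled.encode("mac_roman").decode("utf-8")

def fold_to_ascii(text: str) -> str:
    """Reduce text to the 0-127 range, dropping diacritical marks."""
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        for c in unicodedata.normalize("NFKD", ch):
            if ord(c) < 128:
                out.append(c)    # plain ASCII base letter
            elif unicodedata.combining(c):
                continue         # accent or dot mark: drop it
            else:
                out.append("?")  # no ASCII equivalent known
    return "".join(out)
```

With this sketch, an "S" carrying a combining mark folds to a plain "S",
while a character with no decomposition (such as "ß") becomes "?" so it
can be spotted and added to the table by hand.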
If I use this:
on MouseUp
answer file "chose a file"
put url ("file:" & it) into tUTF-8Input
set the unicodeText of fld "input" to tUTF-8Input
end mouseup
it gets even worse: Rev goes into a tailspin (beachball) for about two
minutes (it's only a 104K file) and then fills the field with Osaka
Japanese font, all on a single line.
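One plausible explanation for the Osaka result, offered as a guess rather
than a diagnosis: the unicodeText property expects two-byte (UTF-16) data,
but the file is UTF-8, so each pair of one-byte characters is read as a
single code point. Many such pairs land in the CJK range, and line feeds
disappear because they are absorbed into those pairs. The Python sketch
below illustrates the effect and the conversion that would be needed
first; the function names are hypothetical, and big-endian byte order is
assumed.

```python
def misread_as_utf16(raw: bytes) -> str:
    """Show what UTF-8 bytes look like when treated as big-endian UTF-16."""
    if len(raw) % 2:
        raw += b"\x00"  # pad so the bytes pair up evenly
    return raw.decode("utf-16-be", errors="replace")

def utf8_to_utf16(raw: bytes) -> bytes:
    """Decode UTF-8 (dropping a leading BOM if present), re-encode as UTF-16."""
    text = raw.decode("utf-8-sig")
    return text.encode("utf-16-be")

if __name__ == "__main__":
    sample = b"God is"
    print(misread_as_utf16(sample))  # three CJK characters, not six Latin ones
    print(utf8_to_utf16(sample))     # two bytes per character
```

In Rev, uniEncode with a "UTF8" language argument appears to be the
equivalent conversion step, applied to data read via "binfile:" rather
than "file:"; treat that as an assumption to check against the Rev
documentation.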
Oddly, if I just tell InDesign to export selected text, and it asks
me what format (Macintosh, no encoding), then I get clean output
that I can manipulate any way I want. But I can't use InDesign's XML
tagging for this, and I need the markup, and I don't find a
preference for InDesign to let me export simple un-encoded text with
XML tags.
I'm stuck.
Sannyasin Sivakatirswami
Himalayan Academy Publications
at Kauai's Hindu Monastery
katir at hindu.org
www.HimalayanAcademy.com,
www.HinduismToday.com
www.Gurudeva.org
www.Hindu.org