Transforming Unicode to simple text

Sannyasin Sivakatirswami katir at hindu.org
Tue May 18 00:53:20 EDT 2004


I'm struggling here with repurposing text out of InDesign back down to 
the 0-127 char set... I am a Unicode, two-byte character baby (know 
about as much as will fit on a spoon) and could really use some help, 
and InDesign, Mac OS X and Rev all seem to be doing new things I can't 
get a grip on...

My XML output from InDesign (a few words with two-byte characters), 
opened in BBEdit, looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root><Story><Question>Who Am I? Where Did I Come From?</Question>
<SlokaBhashyaNo>ßloka 1</SlokaBhashyaNo>
<Sloka><Italic>Âishis </Italic>proclaim ....Vedic <Italic>  ®ishis 
</Italic>have given us courage by uttering the simple truth, “God is 
the Life of our life.” A great sage carried it further by saying there 
is one thing God cannot do: God cannot separate Himself from us. This 
is be­cause God is our life.

but imported into a field "input" in Rev, I get:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root><Story><Question>ÔªøWho Am I? Where Did I Come 
From?</Question>‚Ä©<SlokaBhashyaNo>üloka 
1</SlokaBhashyaNo>‚Ä©<Sloka><Italic>Çishis </Italic>proclaim 
..</Sloka>‚Ä©<SlokaBhashyaNo>bh¢shya</SlokaBhashyaNo>‚Ä©<Bhashya>We 
are immortal souls ..... by uttering the simple truth, “God is the 
Life of our life.” A great sage carried it further by saying there 
is one thing God cannot do: ÔªøGod cannot separate Himself from us. 
This is be¬cause

etc. Some weird things like

Ôªøçiva

which in InDesign in the Setu script is "Siva" with a mark under the "S"
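
The "Ôªø" and "‚Ä©" runs above are consistent with UTF-8 bytes being shown one byte at a time in the MacRoman character set. A minimal Python sketch of that reading, assuming the file really is UTF-8 as its XML declaration says (the variable names are illustrative only):

```python
# Sketch: how UTF-8 byte sequences look when each byte is shown
# as a separate MacRoman character.

# U+FEFF (byte order mark) and U+2029 (paragraph separator) as UTF-8.
bom_utf8 = "\ufeff".encode("utf-8")        # b'\xef\xbb\xbf'
para_sep_utf8 = "\u2029".encode("utf-8")   # b'\xe2\x80\xa9'

# Misreading the same bytes as MacRoman yields three characters each.
bom_as_macroman = bom_utf8.decode("mac_roman")            # 'Ôªø'
para_sep_as_macroman = para_sep_utf8.decode("mac_roman")  # '‚Ä©'
```

If that reading is right, the stray sequences would correspond to U+FEFF and U+2029 characters present in the exported file, not to damage in the text itself.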

so, now the old script I used to map
Ä to A
Å to A
Ç to Ch
É to E

etc. is failing. Also, and this is weird... if I cut one of these 
strange words and then try to paste it elsewhere in the field, Rev will 
unexpectedly quit! If I then use something like: put the unidecode of 
fld "input", the result is total garbage in the msg box.
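
One way to picture the mapping step, once the text has been decoded from UTF-8, is a per-character translation table. The Python sketch below is only an illustration: the first four pairs come from the list above, the function name `to_plain_ascii` is hypothetical, and the handling of U+FEFF, soft hyphens and U+2029 is an assumption about what the cleanup should do.

```python
# Sketch: decode UTF-8 first, then map characters down to plain ASCII.

CHAR_MAP = {
    # The four pairs quoted in the post; a real table would be longer.
    "\u00c4": "A",    # Ä
    "\u00c5": "A",    # Å
    "\u00c7": "Ch",   # Ç
    "\u00c9": "E",    # É
    # Assumptions, not from the post: drop the BOM character and soft
    # hyphens, and turn the Unicode paragraph separator into a newline.
    "\ufeff": "",
    "\u00ad": "",
    "\u2029": "\n",
}
TRANSLATION = str.maketrans(CHAR_MAP)


def to_plain_ascii(utf8_bytes: bytes) -> str:
    """Return a 0-127 version of UTF-8 input using CHAR_MAP."""
    text = utf8_bytes.decode("utf-8")
    mapped = text.translate(TRANSLATION)
    # Anything still outside 0-127 becomes "?" so it is easy to spot.
    return mapped.encode("ascii", errors="replace").decode("ascii")
```

The point of the design is the order of operations: a table keyed on single characters can only match after the multi-byte sequences have been decoded into characters.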

If I use this:
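on mouseUp
   answer file "Choose a file"
   -- read the contents of the chosen file (UTF-8, per its XML declaration)
   put url ("file:" & it) into tUTF8Input
   -- hand that data to the field unchanged
   set the unicodeText of fld "input" to tUTF8Input
end mouseUp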

it gets even worse: Rev goes into a tailspin (beachball) for about two 
minutes (it's only a 104K file) and then fills the field with the Osaka 
Japanese font, all on a single line.



Oddly, if I just tell InDesign to export selected text... and it asks 
me what format: Macintosh... no encoding... then I get clean output 
that I can manipulate any way I want... but I can't use InDesign's XML 
tagging for this and I need the markup... and I don't find a 
preference for InDesign to let me export simple un-encoded text with 
XML tags...

I'm stuck.


Sannyasin Sivakatirswami
Himalayan Academy Publications
at Kauai's Hindu Monastery
katir at hindu.org

www.HimalayanAcademy.com,
www.HinduismToday.com
www.Gurudeva.org
www.Hindu.org

