char(4) not replaceable?
Sannyasin Sivakatirswami
katir at hindu.org
Thu Apr 22 20:23:35 EDT 2004
That does help, a lot... I was kind of coming to that conclusion after
doing my homework with the Rev docs, reading all the Unicode entries
and testing a number of the built-in Rev Unicode functions. But it
still didn't get me from A to B.
Here's the challenge.
If the clipboard from InDesign contains a two-byte character, and when I
paste it into Rev (or BBEdit, for that matter) it appears in Osaka as a
Japanese character, I think we know we have a two-byte character. Why
it looks one way in InDesign and another way in Rev, I don't know...
In order to "downsize" that two-byte character to a suitable equivalent
string in the 0-127 range (in this context, which is lang:English
alpha:Roman, I want ALL text to be super dumb and pass painlessly
through any and all future user agents in any hardware/software
context), how do I do that?
E.g., our editors use some odd glyph in InDesign, our web guy
repurposes the text for the web, pastes it into my little web pager
Rev app, and sees weird characters... In theory, if I knew what the two
values were, I would do what I usually do and clean the text in the
background first:
put numToChar(26) into tStringToReplace
replace tStringToReplace with quote in tIncomingText
so he never sees anything but 1-127 from the start.
So the challenge is: find any way to programmatically identify
a) that an incoming character *is* two-byte, and
b) if it is, what it is, so it can be replaced with its low-ASCII
range equivalent.
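A minimal sketch of (a) and (b), written in Python rather than Rev's Transcript purely to show the logic. It assumes the pasted text has already been decoded into characters, so a two-byte character counts as one item whose code point is above 127. The table name and its two entries are hypothetical placeholders for the real matrix.

```python
# Sketch only: a hypothetical, deliberately tiny lookup table.
ASCII_EQUIVALENTS = {"\u00d1": "N", "\u00c9": "E"}  # Ñ -> N, É -> E


def find_non_ascii(text):
    """(a) List (position, character, code point) for every char above 127."""
    return [(i, ch, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 127]


def to_ascii(text, table=ASCII_EQUIVALENTS, unknown="?"):
    """(b) One pass: keep chars 0-127, map the rest via the table.

    Characters missing from the table become `unknown`, so they are
    easy to spot and add to the table later.
    """
    return "".join(ch if ord(ch) <= 127 else table.get(ch, unknown) for ch in text)


if __name__ == "__main__":
    print(find_non_ascii("\u00d1andu"))  # [(0, 'Ñ', 209)]
    print(to_ascii("\u00c9t\u00e9"))     # Et?  (é is not in the table)
```

Everything happens in a single pass over the text, and anything unmapped is flagged with "?" rather than passed through.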
If it could be translated, would it look like char(204,218), or what?
Then, do you concatenate the two?
put numToChar(204) & numToChar(218) into tStringToReplace
replace tStringToReplace with "Y" in tIncomingText
## where this could be some two-byte character "Y" with marks above it of some kind
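Roughly, yes: the search string is the two one-byte characters concatenated in order. A Python sketch of that idea, assuming the two-byte form is UTF-16 big-endian (high byte first) and that the pasted text is held one byte per character; the helper names are hypothetical, and U+0176 (a "Y" with a circumflex) stands in for the "Y with marks above it".

```python
def utf16_pair(ch):
    """Return the (high, low) byte values of `ch` in UTF-16 big-endian.

    Only characters that fit in two bytes work here; characters that
    need four bytes make the unpacking raise ValueError.
    """
    high, low = ch.encode("utf-16-be")
    return high, low


def replace_pair(byte_text, ch, replacement):
    """Replace the two-byte form of `ch` in text held one byte per char."""
    high, low = utf16_pair(ch)
    key = chr(high) + chr(low)  # concatenate the two one-byte chars
    return byte_text.replace(key, replacement)


if __name__ == "__main__":
    print(utf16_pair("\u0176"))  # (1, 118)
    # Simulate the pasted bytes as one-byte characters via Latin-1.
    pasted = "\u0176".encode("utf-16-be").decode("latin-1")
    print(replace_pair(pasted, "\u0176", "Y"))  # Y
```

One caveat of matching on raw byte pairs: in a longer two-byte string, the pair can also match across the boundary between two neighboring characters, so decoding into characters before replacing is the safer route.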
I know that if I actually paste some weird string into the script
editor, assuming I know for sure what its equivalent is... this does work:
replace "[paste 2-byte char here]" with "sh" in tIncomingText
But I won't always know what the incoming weird character is... Also,
since examining every single incoming char might slow operations down
considerably, I might just let the user fix these manually. So, at
minimum, the user needs to be able to select the two-byte character in
a Rev field and then run a script that examines the selected chunk and
does the necessary replacement. This could work for small articles in
our magazine, but I'm about to embark on repurposing 1000-page books
from InDesign to the web, so I'd like to get a better handle on this
from inside Rev.
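For the manual route, the script only has to look at the selected chunk (in Rev that would come from the selectedText) and report what each character is, so it can be added to the matrix. A Python sketch of that report, assuming the selection has already been decoded into characters; describe_selection is a hypothetical name.

```python
import unicodedata


def describe_selection(selected):
    """Describe each char above 127 in a selected chunk.

    Returns lines like "Ñ U+00D1 (209) LATIN CAPITAL LETTER N WITH TILDE",
    giving both the number to search for and a hint for the ASCII
    equivalent to add to the matrix.
    """
    report = []
    for ch in selected:
        if ord(ch) > 127:
            name = unicodedata.name(ch, "UNNAMED")
            report.append(f"{ch} U+{ord(ch):04X} ({ord(ch)}) {name}")
    return report


if __name__ == "__main__":
    for line in describe_selection("\u0176es"):
        print(line)  # Ŷ U+0176 (374) LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
```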
I already have a matrix for HTML entities that looks like this:
Ä A
Å A
Ç Ch
É E
Ñ N
etc. (with every possible >127 character in the fonts in use)
So, if I could identify the two-byte characters, I would just extend
this matrix...
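Once characters can be identified, extending the matrix could be paired with a fallback for letters it does not list. The Python sketch below swaps in a different technique from the hand-built matrix: Unicode compatibility decomposition (NFKD, via Python's unicodedata module), which splits an accented letter into its base letter plus combining marks so the marks can be dropped. The matrix is consulted first, so project-specific choices such as Ç to Ch still win; the MATRIX contents and the "?" placeholder are assumptions.

```python
import unicodedata

# Hypothetical excerpt of the matrix: entries here override the fallback.
MATRIX = {"\u00c7": "Ch"}  # Ç -> Ch


def downsize(text, matrix=MATRIX, unknown="?"):
    """Reduce text to chars 0-127: matrix first, then NFKD, else `unknown`."""
    out = []
    for ch in text:
        if ord(ch) <= 127:
            out.append(ch)
        elif ch in matrix:
            out.append(matrix[ch])
        else:
            # NFKD turns e.g. "ñ" into "n" + combining tilde; keep the ASCII part.
            decomposed = unicodedata.normalize("NFKD", ch)
            ascii_part = decomposed.encode("ascii", "ignore").decode("ascii")
            out.append(ascii_part or unknown)
    return "".join(out)


if __name__ == "__main__":
    print(downsize("\u00c7iva"))    # Chiva
    print(downsize("Espa\u00f1a"))  # Espana
    print(downsize("\u65e5"))       # ?  (no Roman decomposition)
```

Anything that still comes out as "?" is a character that needs its own matrix entry.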
Sannyasin Sivakatirswami
Himalayan Academy Publications
at Kauai's Hindu Monastery
katir at hindu.org
www.HimalayanAcademy.com,
www.HinduismToday.com
www.Gurudeva.org
www.Hindu.org
On Apr 21, 2002, at 1:18 PM, Brian Yennie wrote:
> Sannyasin,
>
> I don't know if this is something you already have a handle on, but
> the first thing to know about Unicode is that each character is _two_
> bytes instead of one, so some of this weird pasting behavior happens
> because the receiving application treats the two bytes as two
> consecutive characters.
>
> The reason why, most likely, you think you are getting a valid ASCII
> number but not seeing a valid ASCII character is that you are
> actually testing the charToNum() of a two-character string, and
> charToNum() only considers the first character.
>
> For example, charToNum("apple") is the same as charToNum("a"), even
> though they are obviously different strings to the human eye.
>
> HTH!
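Both of Brian's points can be simulated outside Rev. The Python sketch below assumes the two-byte form is UTF-16 big-endian and uses Latin-1 as a stand-in for the receiving application's one-byte encoding (the glyphs actually shown would depend on the font and encoding, e.g. Mac Roman). char_to_num is a hypothetical stand-in for charToNum() that, as described above, looks only at the first character.

```python
def as_one_byte_chars(ch):
    """Show how a one-byte-per-char app sees the UTF-16BE bytes of `ch`."""
    return ch.encode("utf-16-be").decode("latin-1")


def char_to_num(s):
    """Stand-in for charToNum(): only the first character is examined."""
    return ord(s[0])


if __name__ == "__main__":
    pasted = as_one_byte_chars("\u65e5")  # a Japanese character
    print(len(pasted), repr(pasted))      # 2 'eå': two "characters"
    print(char_to_num("apple"), char_to_num("a"))  # 97 97
    print(char_to_num(pasted))  # 101 ("e"): valid ASCII, but only half the character
```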
More information about the use-livecode mailing list