Unicode in mixed texts

pkc pkc at mac.com
Thu May 12 20:13:51 EDT 2005


I think I have solved this problem in my application, and I hope my comments will be helpful to Thomas McGrath (glad to help out a fellow of the Dal gCais Tuath but that is for another day).

Devin is quite right: there is a general strategy here that allows the intermediate use of HTML to automate some standard Unicode handling. But after 14 hours of experimentation, there appear to be some idiosyncrasies of Transcript and/or Revolution that are worth noting, to save time, frustration and tears for those who come after.

First, if you are just dealing with texts and their display inside a bigger application (that is, your application doesn't do anything critical in terms of evaluating or manipulating East Asian text), useUnicode is not important to you.  If you throw it in, it will only create the risk of complications elsewhere in your app. On the other hand, setting caseSensitive to true could be very important if you wish for any reason to make substitutions in lines of Unicode-rendered text that has not been html-ized (see below).
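One plausible reason caseSensitive matters, offered here as an assumption rather than something confirmed for Transcript: if raw UTF-16 text is matched as a string of single bytes, two different characters can differ only in the "case" of one byte. The Python sketch below is not Transcript; it only illustrates that byte-level effect. The two characters and the helper names are arbitrary choices for the illustration.

```python
# Two different CJK characters, chosen only because their UTF-16 bytes
# differ by the ASCII "case" of a single byte.
a = "\u4e41"  # bytes 4E 41 in big-endian UTF-16, i.e. "N" then "A"
b = "\u4e61"  # bytes 4E 61 in big-endian UTF-16, i.e. "N" then "a"

def as_byte_chars(text):
    """View UTF-16 text as a string of one-byte characters (hypothetical helper)."""
    return text.encode("utf-16-be").decode("latin-1")

def equal_ignoring_case(x, y):
    """Compare two byte-character strings the way a case-insensitive match would."""
    return x.lower() == y.lower()

raw_a = as_byte_chars(a)
raw_b = as_byte_chars(b)

# A case-sensitive comparison keeps the two characters apart;
# a case-insensitive one treats them as the same text.
case_sensitive_match = raw_a == raw_b
case_insensitive_match = equal_ignoring_case(raw_a, raw_b)
```

With a case-insensitive comparison, a substitution aimed at one character could also hit the other, which would fit the advice above.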

Second, there is a sequence in which this must be done that is not entirely intuitive, particularly if you already have a lot of code that you are trying to organize around Devin's principle. 

1) Take the existing file and set its textFont to Unicode.  Do this only once. If you accidentally have two commands for this in your mass of lines, you will get something else (it might degrade to 8-bit, I do not know; but it won't work).

2) Put the HTML text of this source file into another container ("B").

3) SET the htmlText of a destination container to the contents of that intermediate container "B".

I'm sure that more messing around would show a way to collapse this to two steps instead of three, but if you have other things going on in the application, this seems to be the most stable method.
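For readers who want to see the shape of the round trip outside Revolution, here is a minimal Python sketch of the same three steps. It is not Transcript and does not reproduce what htmlText actually emits; it assumes only that the intermediate HTML carries non-ASCII characters as numeric character references. The function name and the sample text are made up for the illustration.

```python
import html

def to_html_entities(text):
    """Write each non-ASCII character as a decimal numeric reference."""
    return "".join(
        html.escape(ch, quote=False) if ord(ch) < 128 else "&#%d;" % ord(ch)
        for ch in text
    )

# Step 1: the source text, held as UTF-16.
source_bytes = "\u6f22\u5b57 text".encode("utf-16-le")
source_text = source_bytes.decode("utf-16-le")

# Step 2: the intermediate container "B" holds a plain-ASCII HTML form.
container_b = to_html_entities(source_text)

# Step 3: the destination is built from "B", recovering the characters.
destination_text = html.unescape(container_b)
```

Because container "B" is plain ASCII at step 2, it can be edited with ordinary string operations before it is handed to the destination.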

Third, when your file goes from Unicode to HTML, there will be many reasons why you might want to edit (if only by some sequence of commands) the content of that intermediate HTML container: adding or removing carriage returns, changing the font, etc.  You may imagine that the normal range of HTML properties is available, and spend many hours trying to get the effects that HTML would promise.  WRONG. Most of the established Asian fonts will do nothing.  I had the best results with PMingLiU, which is common to both Mac and Windows and will probably give the least trouble if you just want to get this over with for now.

BEWARE:  I could find no way at all to get Transcript to respond to anything other than "ja" for the language property in the font tag in the HTML container. There probably is one, but I could not find it. Yes, the Transcript documentation says you can use "chinese". If you want to go through many, many revisions without any result, you can try it. Only "ja" works. It's Unicode, so really, "Venusian" should probably work, and it is not logical to require anything there other than just "unicode" (but you will get crazy results if you don't have "ja").  Just a friendly word of advice: if you want to get anywhere with East Asian fonts in Transcript, don't bother trying to call your language by its name. That will only end in tears.  Just stick with "ja" (Japanese) if you know what is good for you.
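The kind of edit described above (one font, language fixed at "ja") can be made mechanically on the intermediate HTML. The Python sketch below is an illustration only, not Transcript. It assumes the attributes are spelled face and lang, and the sample HTML string is hypothetical rather than real Revolution output. Note that it discards any other attributes on the font tag.

```python
import re

def force_font(html_text, face="PMingLiU", lang="ja"):
    """Replace every opening <font ...> tag with one fixed face and lang.

    Closing </font> tags are left alone; any other attributes on the
    opening tag (size, color, ...) are dropped.
    """
    tag = '<font face="%s" lang="%s">' % (face, lang)
    return re.sub(r"<font\b[^>]*>", tag, html_text, flags=re.IGNORECASE)

# Hypothetical contents of the intermediate container "B".
container_b = '<p><font lang="chinese">&#28450;&#23383;</font></p>'
fixed = force_font(container_b)
```

Doing this once, just before setting the destination's htmlText, keeps the workaround in one place if a later version turns out to accept other language values.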

I owe Devin a big acknowledgement for the pointer.  Someday I hope to understand the kind remarks made by Dar and Lynn. After a lot of reading I understand some (not all) of their comments, but I still don't see any way to get Transcript to deal with anything other than UTF-16, which is fine with me. 

This is a nice, direct, clean way to solve the problem, but there are these few little pointers that can make a big difference.  So, enjoy.

Pamela Crossley
Dartmouth
USA

