htmltext to unicode

Devin Asay devin_asay at byu.edu
Fri Jan 15 19:25:09 EST 2010


Kee,
On Jan 15, 2010, at 4:53 PM, Kee Nethery wrote:

> First, thanks to Devin Asay for the wonderful articles on Rev and  
> Unicode.

Thanks. Glad it's helped.
>
> Second, I'm going unicode with my code and I'm stumped.
>
> There is a database I access and I get what appears to be HTML  
> encoded text. For example:
>
> I am not unicode text but what follows is:
> セキュリティー  
> 個人情報  
> 代金返却方針  
> ヘルプ
> And although I have no idea what that says, it is UTF8 in the  
> database.

These are HTML unicode entities (numeric character references that stand for Unicode code points).
>
> I think this is what revTalk refers to as htmltext. I'd like to put  
> this into a field as unicode (UTF16). How do I display this as the  
> characters or glyphs that it should be viewed as, and how do I get  
> it into UTF16?
>
> -- horrific thought --
> Am I going to have to manually convert these HTML entities into  
> UTF16 using the numToChar function for each set?

It's not as terrible as you fear. You have to make a couple of  
educated guesses about the language, but just do this:

Paste your unicode entities into a field, say, "code", and surround  
them with html tags:

<font face="Osaka" lang="ja">セキュリティー  
個人情報  
代金返却方針  
ヘルプ</font>

If they don't look quite right, try different fonts in the face  
attribute or other languages in the lang attribute. For example,  
Chinese Simplified is "zh-CN" and Traditional is "zh-TW". Google html  
language codes if you need to try others.

Then make another field "final" and do this:

set the htmltext of fld "final" to fld "code"

Now you have real unicode text, and you can save it or do whatever you  
need with it by referring to the unicodeText of fld "final".
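If you'd rather do the same thing from a script instead of pasting the tags in by hand, here is a minimal sketch. The function name entitiesToUTF16 and the mouseUp usage handler are hypothetical; the sketch assumes the two fields "code" and "final" described above exist on the current card, and the font face and lang code are still passed in as educated guesses, just as in the manual version.

```livecode
-- Hypothetical helper (a sketch): assumes fields "code" and "final"
-- exist on the current card. pFace and pLang are your guesses,
-- e.g. "Osaka" and "ja".
function entitiesToUTF16 pFace, pLang
   local tHtml
   -- Wrap the entity text in <font face="..." lang="..."> ... </font>
   put "<font face=" & quote & pFace & quote && "lang=" & quote & pLang & quote & ">" \
         & field "code" & "</font>" into tHtml
   -- Setting the htmlText makes the engine turn each entity into a character
   set the htmlText of field "final" to tHtml
   -- The unicodeText property returns the field contents as UTF-16
   return the unicodeText of field "final"
end entitiesToUTF16

-- Example usage (hypothetical button script)
on mouseUp
   local tUTF16, tUTF8
   put entitiesToUTF16("Osaka", "ja") into tUTF16
   -- Convert to UTF-8 if you need to write the text back to the database
   put uniDecode(tUTF16, "UTF8") into tUTF8
end mouseUp
```

Since the original text lives in the database as UTF-8, the uniDecode step is only needed if you want to store or send the decoded text back in that encoding; for display, the field itself is enough.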

HTH,

Devin

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University



