Displaying Foreign Web Pages

revolution at knowledgeworks.plus.com
Thu Jun 10 21:20:11 EDT 2004


(Apologies to those for whom non-European languages are not foreign!)

I am preparing an application that will manipulate HTML text that is 
supplied in multiple languages.  However, in trying to understand how this 
works in Rev, I'm kind of confused.

First of all, if I have a web page written in e.g. a combination of 
English and an Asian character set (such as Thai), my browser (either 
Internet Explorer, Firefox or even the embedded browser inside Lotus 
Notes) displays the Thai characters just fine.  I do have Thai language 
accessibility options installed on Windows XP (through Control Panel), so 
maybe this will not work for those who have not done that.  If I view the 
source of these web pages, the Thai is displayed in amongst the HTML, even 
in an application as simple as Notepad.

However, what confuses me is why I cannot get Rev to display a page in 
Thai.  Take http://www.google.co.th.  Issuing the instruction
answer URL "http://www.google.co.th"
displays the page as styled text, but the Thai characters are replaced 
with inappropriate Roman characters. 

Furthermore, if I just put the URL into a field to see the HTML that makes 
up the page, these inappropriate Roman characters are there too in the 
HTML.  But if I copy the HTML from Notepad, and paste it into a field in 
Rev, the Thai characters appear within the HTML just as they do in 
Notepad. 

I see from the list's archives that people have been using unicodeText and 
htmlText in various ways to manipulate non-Roman character sets, but I 
can't seem to get either to work even in this simple example.   Here's 
what I did:

I saved the Google page locally and tried to use the unicodeText/binfile 
combination with:
set the unicodeText of field "field 1" to URL "binfile:c:\temp\google.htm" 
as per the Transcript dictionary entry on unicodeText.  This loads the 
HTML from the local file, but the Thai characters have been replaced by 
square blocks (Notepad can still display this local file with Thai 
characters intact).

I tried to use htmlText to set the contents of a second field to the HTML 
(with Thai) that I successfully pasted into a Rev field, but that gave me 
a result similar to 
answer URL "http://www.google.co.th"

I am sure that, since several people have written about these issues on 
the list before, we can at least nail a simple example like this, one 
that others can build on when dealing with non-Roman character sets.  The 
fundamental problem seems to be that Rev cannot read the Thai characters 
from a file or URL, but can display them when they are copied over from 
the clipboard. 
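
To make the symptom concrete outside Rev, here is a small sketch in Python 
(not Transcript) of how the same bytes can come out as Roman characters or 
as unrelated glyphs when they are read with the wrong encoding.  It rests 
on assumptions I have not verified: that the page is stored as either 
UTF-8 or the Windows Thai code page (cp874), and that the reader is 
guessing Latin-1 or UTF-16.  The sample string and the helper names 
read_as and looks_thai are made up for illustration; this is not a 
description of what Rev does internally.

```python
# Illustration only: a short Thai string stands in for the text on the page.
THAI_SAMPLE = "ไทย"


def read_as(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given codec.

    Undecodable input is replaced with U+FFFD instead of raising an error.
    """
    return raw.decode(encoding, errors="replace")


def looks_thai(text: str) -> bool:
    """Return True if every character lies in the Unicode Thai block."""
    return bool(text) and all("\u0e00" <= ch <= "\u0e7f" for ch in text)


if __name__ == "__main__":
    # Assumed storage encodings for the saved page (not confirmed).
    for stored_as in ("utf-8", "cp874"):
        raw = THAI_SAMPLE.encode(stored_as)
        # Two wrong guesses, followed by the matching codec.
        for guess in ("latin-1", "utf-16-le", stored_as):
            decoded = read_as(raw, guess)
            print(stored_as, "read as", guess, "->",
                  repr(decoded), "Thai:", looks_thai(decoded))
```

If something like this is what is happening, then the question becomes 
which encoding the file or URL data is actually in, and how to tell Rev 
about it before the text is placed in a field.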

Regards,
Bernard



