Unicode revisited, this time with htmlText

Alex Rice alex at mindlube.com
Sat Nov 22 00:58:17 EST 2003


On Nov 21, 2003, at 5:03 PM, tuviah snyder wrote:
> Well that's the way you specify unicode characters in the HTML spec. Any
> other way would have byteorder issues associated with it, and would
> require binary data to be embedded into HTML, which is supposed to be
> plain text.

Not true in practice. The encoding of an HTML page can be specified in 
the HTTP Content-Type header sent by the web server, or in a META tag in 
the HTML itself. Read this article that was posted to improve-rev 
recently:

<http://www.joelonsoftware.com/articles/Unicode.html>

Here is a section from that article that talks about this issue:

"""
For a web page, the original idea was that the web server would return 
a similar Content-Type http header along with the web page itself -- 
not in the HTML itself, but as one of the response headers that are 
sent before the HTML page. 

This causes problems. Suppose you have a big web server with lots of 
sites and hundreds of pages contributed by lots of people in lots of 
different languages and all using whatever encoding their copy of 
Microsoft FrontPage saw fit to generate. The web server itself wouldn't 
really know what encoding each file was written in, so it couldn't send 
the Content-Type header.

It would be convenient if you could put the Content-Type of the HTML 
file right in the HTML file itself, using some kind of special tag. Of 
course this drove purists crazy... how can you read the HTML file until 
you know what encoding it's in?! Luckily, almost every encoding in 
common use does the same thing with characters between 32 and 127, so 
you can always get this far on the HTML page without starting to use 
funny letters:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

But that meta tag really has to be the very first thing in the <head> 
section because as soon as the web browser sees this tag it's going to 
stop parsing the page and start over after reinterpreting the whole 
page using the encoding you specified.
"""
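
To make the lookup order concrete, here is a rough Python sketch of how a 
client might pick the encoding: use the charset from the Content-Type 
header if the server sent one, otherwise scan the first bytes of the page 
for the meta tag. The function name sniff_charset, the 1024-byte scan 
window, and the iso-8859-1 fallback are my own assumptions for 
illustration, not something taken from the article or from any browser.

```python
import re

# Hypothetical helper names; none of this comes from the article or a library.
# Both patterns accept an optional quote before the charset value.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)
HEADER_CHARSET = re.compile(
    r"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def sniff_charset(body, content_type=None, default="iso-8859-1"):
    """Return the charset named by the HTTP header, else by a meta tag."""
    # 1. The HTTP Content-Type header, when the server included a charset.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. Otherwise scan the start of the raw bytes. The tag itself is plain
    #    ASCII, so it can be found before the real encoding is known.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Assumed fallback for this sketch only.
    return default


page = (
    b"<html>\n<head>\n"
    b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    b"<title>caf\xc3\xa9</title>\n</head>\n</html>\n"
)
charset = sniff_charset(page)   # found in the meta tag
text = page.decode(charset)     # reinterpret the whole page with it
```

The page bytes carry no byte-order information at all here. The meta tag 
is readable as ASCII, and once the charset is known the accented character 
in the title decodes correctly.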


Alex Rice <alex at mindlube.com> | Mindlube Software | 
<http://mindlube.com>

what a waste of thumbs that are opposable
to make machines that are disposable  -Ani DiFranco


