XML Headaches

David Bovill david at openpartnership.net
Mon Jul 9 07:48:28 EDT 2007


Is the text actually UTF-8 encoded? You say it contains an accented e
(é), and reading the docs and doing this by hand can be a bit error prone. The
first thing I'd do is check the XML with a validator and make sure it
passes before looking for bugs.
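
As a quick way to answer the encoding question, here is a minimal sketch in
Python (not part of the original exchange; the function name is made up for
illustration) that checks whether raw bytes decode cleanly as UTF-8:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "é" saved as UTF-8 is two bytes (0xC3 0xA9); saved as ISO-8859-1 it is
# a single byte (0xE9), which is not a valid UTF-8 sequence on its own.
print(is_valid_utf8("café".encode("utf-8")))       # True
print(is_valid_utf8("café".encode("iso-8859-1")))  # False
```

In practice you would read the file in binary mode and pass its contents to
the function.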

I've got some documentation with links to the best sources I can find here:
http://handlers.rev-co.de/wiki/XML

This bit would seem relevant:

> That means that in a UTF-8 XML document, you cannot simply use a single byte
> with decimal value 233 to represent "é" (and there is no predefined &eacute;
> entity as there is in HTML); instead, you must either enter the UTF-8
> multi-byte escape sequence, or use a special kind of XML reference called a
> character reference:
>
> <p>That is everyone's favourite caf&#233;.</p>
>
> When your text consists primarily of unaccented Roman characters, this is
> often the easiest way to escape the occasional accented or non-Roman
> character. Since "é" appears at position 233 in Unicode (as in ISO-8859-1),
> the XML parser will read the string correctly as "That is everyone's
> favourite café."
>
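
To see the three cases from the quoted passage side by side, here is a rough
sketch using Python's standard xml.etree.ElementTree parser (my choice for
illustration; the variable and function names are hypothetical). It assumes
the document declares UTF-8:

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="UTF-8"?>'

# Character reference: plain ASCII bytes, so the file encoding cannot break it.
with_reference = DECL + b"<p>caf&#233;</p>"
# The UTF-8 multi-byte sequence for "é" (0xC3 0xA9).
with_utf8 = DECL + b"<p>caf\xc3\xa9</p>"
# A lone byte 233 (0xE9) is ISO-8859-1, not valid UTF-8.
with_latin1_byte = DECL + b"<p>caf\xe9</p>"


def parse_text(data: bytes):
    """Return the root element's text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


for label, doc in [("reference", with_reference),
                   ("utf-8", with_utf8),
                   ("byte 233", with_latin1_byte)]:
    print(label, "->", parse_text(doc))
```

The first two documents should parse to the same text, while the third is
rejected as not well-formed.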
I also put your XML through this online validation service and found a bunch
of errors:  http://www.xml.com/pub/a/tools/ruwf/check.html

Hope this helps.

