XML encoding oddities
Mark Smith
lists at futilism.com
Sun Nov 9 11:50:04 EST 2008
I'm seeing some (I think) very strange behaviour from the XML library...
(warning, this is quite long, and won't be of much interest to anyone
who isn't using the library...)
This is on an Intel Macintosh, Mac OS X 10.4.11.
In a button, I have the following script:
-----
on mouseUp
   put toXml() into tXml
   put tXml & cr & fromXml(tXml)
end mouseUp

function toXml
   put "<whatshappening></whatshappening>" into tXml
   put revCreateXmlTree(tXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode
   revAddXmlNode tTree, tNode, "name", "fred"
   put revXmlText(tTree) into tText
   revDeleteXmlTree tTree
   return tText
end toXml

function fromXml pXml
   put revCreateXmlTree(pXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode
   put revXmlFirstChild(tTree, tNode) into tChild
   put revXmlNodeContents(tTree, tChild) into tContent
   revDeleteXmlTree tTree
   return tChild & cr & tContent
end fromXml
-----
The output is:
<?xml version="1.0"?>
<whatshappening><name>fred</name></whatshappening>
/whatshappening/name
fred
So all is good. If I change "fred" in the toXml function to "fréd",
(acute accent on the 'e'), I get this:
<?xml version="1.0"?>
<whatshappening><name></name></whatshappening>
/whatshappening/name
The content has simply disappeared, so I guess I need to encode
non-ASCII material. OK, but as what? (Ideally UTF-8.) And how do I
indicate what I've done in my XML document?
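If it does come down to encoding by hand, a minimal sketch of
converting to and from UTF-8 might look like this. The handler names
are hypothetical, and I'm assuming uniEncode and uniDecode behave as
the docs describe; I haven't confirmed that the XML library then
round-trips the result:
-----
-- Hypothetical helper: native (MacRoman on this machine) text -> UTF-8 bytes
function nativeToUtf8 pText
   -- uniEncode gives UTF-16; uniDecode with "UTF8" turns that into UTF-8
   return uniDecode(uniEncode(pText), "UTF8")
end nativeToUtf8

-- Hypothetical helper: UTF-8 bytes -> native text
function utf8ToNative pUtf8
   -- read the bytes as UTF-8 into UTF-16, then back to the native encoding
   return uniDecode(uniEncode(pUtf8, "UTF8"))
end utf8ToNative
-----
The call in toXml would then become
revAddXmlNode tTree, tNode, "name", nativeToUtf8("fréd")
and presumably the initial document would have to start with
<?xml version="1.0" encoding="UTF-8"?>, but that last part is a guess.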
However, if I now add an accented string as an attribute:
-----
function toXml
   put "<whatshappening></whatshappening>" into tXml
   put revCreateXmlTree(tXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode
   revAddXmlNode tTree, tNode, "name", "fred"
   revSetXmlAttribute tTree, tNode & "/name", "orig", "fréd"
   put revXmlText(tTree) into tText
   revDeleteXmlTree tTree
   return tText
end toXml

function fromXml pXml
   put revCreateXmlTree(pXml, true, true, false) into tTree
   put revXmlRootNode(tTree) into tNode
   put revXmlFirstChild(tTree, tNode) into tChild
   put revXmlNodeContents(tTree, tChild) into tContent
   put revXmlAttribute(tTree, tChild, "orig") into tAtt
   revDeleteXmlTree tTree
   return tChild & cr & tContent & cr & tAtt
end fromXml
-----
I get:
<?xml version="1.0" encoding="ISO-8859-1"?>
<whatshappening><name orig="fréd">fred</name></whatshappening>
/whatshappening/name
fred
frŽd
An encoding attribute has now been added to the XML header, and some
version of the "orig" attribute value (not ISO-8859-1, as far as I
can tell) has been produced. ????
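One way to check what actually came back would be to look at the byte
values rather than the displayed glyphs. A minimal sketch (byteDump is
just a name I've made up, and it assumes each char is a single byte,
as in this version of Revolution):
-----
-- Hypothetical helper: list the numeric value of each char in pText
function byteDump pText
   put empty into tOut
   repeat for each char tChar in pText
      -- charToNum gives the byte value of a single-byte char
      put charToNum(tChar) & space after tOut
   end repeat
   return tOut
end byteDump
-----
As far as I know, an é is 142 in MacRoman, 233 in ISO-8859-1, and the
pair 195 169 in UTF-8, so calling byteDump on the attribute value in
fromXml should show which encoding the library is handing back.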
So, finally, is there a way to encode xml documents as UTF-8 (or
whatever) without having to encode each part myself, and add the
encoding attribute to the header myself?
What is slightly worrying is that it seems the library will add an
encoding attribute to the header in some circumstances, but not others.
Ken (if you're reading this), does your library deal with this stuff
better?
Best,
Mark