XML encoding oddities

Mark Smith lists at futilism.com
Sun Nov 9 11:50:04 EST 2008


I'm seeing some (I think) very strange behaviour from the XML library...

(warning, this is quite long, and won't be of much interest to anyone  
who isn't using the library...)

This is on an Intel Macintosh, OS X 10.4.11.

In a button, I have the following script:

-----
on mouseUp
    put toXml() into tXml
    put tXml & cr & fromXml(tXml)
end mouseUp

function toXml
    put "<whatshappening></whatshappening>" into tXml
    put revCreateXmlTree(tXml, true, true, false) into tTree
    put revXmlRootNode(tTree) into tNode

    revAddXmlNode tTree, tNode, "name", "fred"

    put revXmlText(tTree) into tText
    revDeleteXmlTree tTree
    return tText
end toXml

function fromXml pXml
    put revCreateXmlTree(pXml, true, true, false) into tTree
    put revXmlRootNode(tTree) into tNode
    put revXmlFirstChild(tTree, tNode) into tChild

    put revXmlNodeContents(tTree, tChild) into tContent
    revDeleteXmlTree tTree
    return tChild & cr & tContent
end fromXml
-----

The output is:

<?xml version="1.0"?>
<whatshappening><name>fred</name></whatshappening>

/whatshappening/name
fred

So all is good. If I change "fred" in the toXml function to "fréd",  
(acute accent on the 'e'), I get this:

<?xml version="1.0"?>
<whatshappening><name></name></whatshappening>

/whatshappening/name

The content has simply disappeared, so I guess I need to encode
non-ASCII material. OK, but as what (ideally UTF-8), and how do I
indicate what I've done in my XML document?

However, if I now add an accented string as an attribute:

-----
function toXml
    put "<whatshappening></whatshappening>" into tXml
    put revCreateXmlTree(tXml, true, true, false) into tTree
    put revXmlRootNode(tTree) into tNode

    revAddXmlNode tTree, tNode, "name", "fred"
    revSetXmlAttribute tTree, tNode & "/name", "orig", "fréd"

    put revXmlText(tTree) into tText
    revDeleteXmlTree tTree
    return tText
end toXml

function fromXml pXml
    put revCreateXmlTree(pXml, true, true, false) into tTree
    put revXmlRootNode(tTree) into tNode
    put revXmlFirstChild(tTree, tNode) into tChild

    put revXmlNodeContents(tTree, tChild) into tContent
    put revXmlAttribute(tTree, tChild, "orig") into tAtt
    revDeleteXmlTree tTree
    return tChild & cr & tContent & cr & tAtt
end fromXml
-----

I get:

<?xml version="1.0" encoding="ISO-8859-1"?>
<whatshappening><name orig="fréd">fred</name></whatshappening>

/whatshappening/name
fred
frŽd

An encoding attribute has now been added to the XML header, and some
version of the "orig" attribute value (not ISO-8859-1, as far as I
can tell) has been produced. ????
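
One way to check what actually came back is to look at the character
codes rather than the glyphs. A minimal sketch (charCodes is a
hypothetical helper name, not part of the library):

-----
function charCodes pText
    -- list the numeric code of each char, separated by spaces
    put empty into tCodes
    repeat for each char tChar in pText
        put charToNum(tChar) & space after tCodes
    end repeat
    -- drop the trailing space
    return char 1 to -2 of tCodes
end charCodes
-----

For reference, "é" is 142 in MacRoman and 233 in ISO-8859-1, and in
UTF-8 it is the two bytes 195 169, so calling charCodes(tAtt) in
fromXml should show which of these, if any, the library is handing back.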

So, finally, is there a way to encode xml documents as UTF-8 (or  
whatever) without having to encode each part myself, and add the  
encoding attribute to the header myself?
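
The do-it-yourself route would look something like the sketch below.
It is only a sketch: toUtf8 and fromUtf8 are hypothetical helper names,
it assumes uniEncode and uniDecode accept "UTF8" as the language, and it
assumes (unverified) that the library passes UTF-8 bytes through
untouched once the header declares them.

-----
function toUtf8 pNative
    -- native (e.g. MacRoman) text -> UTF-16 -> UTF-8
    return uniDecode(uniEncode(pNative), "UTF8")
end toUtf8

function fromUtf8 pUtf8
    -- UTF-8 -> UTF-16 -> native text
    return uniDecode(uniEncode(pUtf8, "UTF8"))
end fromUtf8

function toXml
    -- declare the encoding up front instead of leaving it to the library
    put "<?xml version='1.0' encoding='UTF-8'?>" into tXml
    put "<whatshappening></whatshappening>" after tXml
    put revCreateXmlTree(tXml, true, true, false) into tTree
    put revXmlRootNode(tTree) into tNode

    -- every piece of non-ASCII content has to be converted by hand
    revAddXmlNode tTree, tNode, "name", toUtf8("fréd")

    put revXmlText(tTree) into tText
    revDeleteXmlTree tTree
    return tText
end toXml
-----

On the way back in, each value read with revXmlNodeContents or
revXmlAttribute would then need wrapping in fromUtf8 before display.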

What is slightly worrying is that it seems the library will add an  
encoding attribute to the header in some circumstances, but not others.

Ken (if you're reading this), does your library deal with this stuff  
better?

Best,

Mark



