XML Headaches

David Bovill david at openpartnership.net
Mon Jul 9 14:10:37 EDT 2007


On 09/07/07, Malte Brill <revolution at derbrill.de> wrote:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> Works as expected (unless there is more to it)
>
> <?xml version="1.0" encoding="UTF-8"? >
>
> (mind the space before ">") does not. However, the parser does not
> complain and builds the tree. It just loses data then. Seems like
> having a hole in the bucket where certain chars slip through :-)


This was one of the errors picked up when I ran your XML through the validator:

> I also put your XML through this online validation service and found a bunch
> of errors:  http://www.xml.com/pub/a/tools/ruwf/check.html

> Instead of unidecode(uniencode(myXML,"UTF8"),"ANSI") for the whole
> XML data I have the following script now:
>
> -- Remove byte order mark from UTF8 text
>    if charToNum(char 1 of tVar) is 239 then
>      if charToNum(char 2 of tVar) is 187 then
>        if charToNum(char 3 of tVar) is 191 then
>          delete char 1 to 3 of tVar
>        end if
>      end if
>    end if


So you do this for any node data? In other words, before adding any data to a
node, you should run it through a handler like this?

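on xml_SafeEncode @nodeContents
    -- Remove the byte order mark first, while the text is still raw UTF-8
    -- (after the conversion below the three BOM bytes would no longer match)
    put numToChar(239) & numToChar(187) & numToChar(191) into testBomHeader
    if char 1 to 3 of nodeContents = testBomHeader then
        delete char 1 to 3 of nodeContents
    end if

    -- Convert the UTF-8 text to the single-byte ANSI encoding
    put unidecode(uniencode(nodeContents,"UTF8"),"ANSI") into nodeContents
end xml_SafeEncode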
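
To make the question concrete, here is a rough sketch of how I imagine calling it. This is untested and assumes the tree was created with the revXML library and that nodes are added with revAddXMLNode. The wrapper name xml_AddSafeNode and its parameter names are made up for illustration:

```livecode
-- Hypothetical wrapper: clean the data, then add it as a new node
on xml_AddSafeNode pTreeID, pParentPath, pNodeName, pNodeContents
    -- pNodeContents is passed by reference and cleaned in place
    xml_SafeEncode pNodeContents
    revAddXMLNode pTreeID, pParentPath, pNodeName, pNodeContents
end xml_AddSafeNode
```

That way every piece of node data would go through the same clean-up before it reaches the parser, rather than converting the whole XML document in one go.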


