XML+MySQL+UTF8 = MISERY

Tereza Snyder tereza at califex.com
Fri Sep 17 01:39:01 EDT 2010


Hi all,

In Rev (version 4.0) on a Mac :

1. Build an xml file starting with a 'shell' :
    put revCreateXMLTree( encodeUtf8( merge(fld "XMLshell")), true,true,false) into tmp
    
    1a. the shell has the header: <?xml version="1.0" encoding="utf-8"?>

    1b. the encodeUtf8 function:
         return uniDecode(uniEncode(pStr,"UTF8"),"english")
    
2. I add about a thousand nodes to the xml. (It's a branching survey.)
    
3. I save it with:  put tXML into URL ("file:" & tDestinationFilePath) 
    where tDestinationFilePath is a local file

4. The file is copied to a remote server

5. I download it in a Rev app via FTP (using libURL)

6. In the app, I make a new xml tree: 
    put revCreateXMLTree( tXML, false, true, false) into tTreeID
    
    6a I 'score' the survey by updating attributes and contents,

    6b I put the xml into a variable:
         put encodeUtf8( revXMLText( tID ,,true)) into tData

    6c I save it back to the FTP server.
    
7. I download the file as in step 5 and work with it and save it via FTP, ad infinitum
    
=============================
    ALL THE ABOVE STEPS WORK RELIABLY
=============================

Now I want to alter the sequence to store the scored survey xml in a MySQL database :

1. As above

2. As above

3. As above

4. As above

5. As above

6. As above

   6a As above
   
   6b I upload the scored survey to a MySQL database on the remote server, first escaping it:
	   put revXMLText( gTaxoDataA[gCurrentSlot][kID] ,,true) into tXML
	   replace "\" with "\\" in tXML
	   replace quote with "\" & quote in tXML
	   replace "'" with "\'" in tXML
	   replace tab with "\t" in tXML
	   replace cr with "\n" in tXML
	   put encodeUtf8( tXML ) into tData
	   
    6c I use INSERT to put tData into a column named "xml" in the MySQL database
    
    6d Using other tools like Navicat, I see that the xml column does indeed contain the xml
  
7. I use a SELECT statement to retreive the xml, and attempt to make an xml tree again BUT no matter what I do with encoding or decoding or whatever, I get this error from revCreateXMLTree:

    "xmlerr, can't parse xml no root element"


--> If I load the file from the first sequence into BBEdit, BBEdit reports that it is utf8, with "Unix (LF)" line endings.

--> If save the xml I get back from the MySQL in the second sequence--as a file or a binfile--and load the file into BBEdit, BBEdit reports that it is utf8, with "Classic Mac (CR)" line endings.

I thought, "Oh boy! thats it!" but NO. 

I replaced numToChar(13) with numToChar(10) in the xml from MySQL, but still the same error on attempting to make an xml tree. 

The xml itself looks identical and BBEdit says there are no differences, but it also says there are about 7000 more characters in the MySQL version! What's up with that?

Does anything stand out to those of you who have used MySQL? and revXML? and utf8? 

Is there hope?

t




-- 
Tereza Snyder
Califex Software, Inc.
<www.califexsoftware.com>







More information about the use-livecode mailing list