Strange Entities from htmlText

Sivakatirswami katir at hindu.org
Sat Jul 23 01:10:29 EDT 2005


I don't really think this is a Rev issue. Is it actually some weird
Microsoft issue, or an email issue?

Some text that originally came from MS Word is passed along in an email
(by cut and paste into Mail.app on the Mac) and then into a field. I
then output it via the "htmlText" property to an XML document, which is
destined to be run against an XSLT stylesheet using xsltProc (invoked
via shell commands from a Rev UI). This XML file is then URL-encoded in
preparation for uploading via POST. A Rev CGI (the engine is Darwin,
running on an Xserve) receives the POST, URL-decodes it, and saves it
back to an XML file. The goal, obviously, is for the XML on the server
to be exactly the same as the XML my Rev app generated on the remote
client before uploading.
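For reference, percent-encoding by itself is reversible: decoding returns exactly the bytes that were encoded. A difference can appear only if the two ends disagree about which character encoding those bytes are in. The sketch below illustrates this in Python (not Rev) with a made-up XML fragment. It assumes UTF-8 on the client, which may not be what the Rev engine actually uses.

```python
from urllib.parse import quote, unquote

# Hypothetical stand-in for the XML the client generates; it contains
# curly quotes, i.e. non-ASCII characters.
original = "<p>the family\u2019s real wealth \u201cquoted\u201d</p>"

# Percent-encode the text as UTF-8 (Python's default), as a POST body would carry it.
encoded = quote(original, safe="")

# "Server" side: decode using the same character encoding.
decoded = unquote(encoded)
print(decoded == original)  # True: a lossless round trip

# If the server assumes a different encoding, the text silently changes.
mismatched = unquote(encoded, encoding="cp1252")
print(mismatched == original)  # False
```

In short, if the client and the CGI agree on the encoding, urlEncode/urlDecode should not be the culprit.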

This system is working really well, btw... until I decided to make use
of the htmlText of that field.

In the original document I see curly quotes and curly apostrophes,
which were pasted into the original input field. My script first cleans
these up to straight quotes, and only then do we get the htmlText.
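A rough Python equivalent of that cleanup step is sketched below. The four code points are an assumption about what Word's smart quotes produce. A mapping like this only matches if the characters arrive as those code points. It will miss them if they were already garbled into other characters somewhere upstream.

```python
# Hypothetical mapping: the code points Word's "smart quotes" typically use.
SMART_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def straighten_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with their ASCII equivalents."""
    return text.translate(SMART_TO_STRAIGHT)

print(straighten_quotes("the family\u2019s \u201creal\u201d wealth"))
# prints: the family's "real" wealth
```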

htmlText result: [snippet from a complete XML file]

<p>In The Blessings of Children  Tiruvalluvar begins by  
describing the benefits of having children and states that an  
intelligent child is the greatest blessing to the family and is  
indeed the family s real wealth.</p>

If I run this through xsltProc against my stylesheet (which turns the
XML into a .shtml file), these all error out as "unknown entities...
unable to parse /file".
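One possible reading of that error: XML 1.0 predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;). Any other named entity, such as the HTML entity &rsquo;, is "undefined" unless a DTD declares it. Numeric character references like &#8217; always parse. The entities in the snippet above did not survive the mail archive, so &rsquo; here is only an illustrative assumption. The sketch also uses Python's ElementTree rather than xsltproc itself.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A named HTML entity is undefined in plain XML (no DTD declares it).
print(parses("<p>the family&rsquo;s real wealth</p>"))  # False

# A numeric character reference needs no declaration.
print(parses("<p>the family&#8217;s real wealth</p>"))  # True

# The five predefined XML entities always work.
print(parses("<p>&amp; &lt; &gt; &quot; &apos;</p>"))  # True
```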

I don't see these entities in BBEdit's entity list, and the other
weird thing is the introduction of a space before the closing quote or
apostrophe.

And we are also seeing another gruesome manifestation:

  “To foster a sense of self-worth in children, corporal  
punishment must be eliminated completely. To think that assaulting a  
child--a criminal offense between adults--constitutes discipline, is  
virtually insane. Yet, in the US, it is still legal in many states.  
Discipline means to teach. The only thing the paddle teaches, is  
hatred. This hatred is very often repressed and unconsciously  
directed toward self. When this happens, you have crippled a mind for  
life.”
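The leading "“" matches what a left curly quote (U+201C) turns into when its UTF-8 bytes (E2 80 9C) are read as Windows-1252. That suggests an encoding mismatch somewhere in the chain rather than a bad entity. Which step misreads the bytes is an assumption; the Python sketch below only reproduces the effect. The closing quote's third byte (0x9D) has no character assigned in Windows-1252, which may be why the closing sequence looks truncated.

```python
# U+201C LEFT DOUBLE QUOTATION MARK, as Word might produce it.
curly_open = "\u201c"

# Encode as UTF-8 (three bytes: E2 80 9C), then misread them as Windows-1252.
garbled = curly_open.encode("utf-8").decode("cp1252")
print(garbled)  # â€œ

# Reversing the mistake recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == curly_open)  # True
```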

Could the URL encoding/decoding be doing something nasty here?

And what is even weirder: if the user manually enters a quote or
apostrophe in the field, we get what we expect to get:

"e;

Any clues?

Sivakatirswami



More information about the use-livecode mailing list