Strange Entities from htmlText

Martin Baxter martin at
Sat Jul 23 03:41:06 EDT 2005


As a clue... One of the originating applications is (reasonably enough) 
encoding the curly quotes as Unicode entities:

decimal 8220 and 8221.

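This is 201C and 201D in hex. Split each into its two bytes:
20 in hex = 32 in decimal, which is a space
1C in hex = 28 in decimal, hence &#28;
and 1D in hex = 29 in decimal, hence &#29;
So each curly quote would come out as a space followed by a control-character 
entity, which would also account for the extra space you are seeing.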
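A minimal Python sketch of the byte arithmetic above. It assumes (as the hex breakdown suggests) that some step in the chain treats each byte of a curly quote's 16-bit code as a separate character. `split_code_unit` and `misencode` are hypothetical names for illustration, not part of Rev or any email client.

```python
def split_code_unit(ch):
    """Return (high byte, low byte) of a character's 16-bit code point."""
    code = ord(ch)
    return code >> 8, code & 0xFF

def misencode(ch):
    """Hypothetical faulty encoder: writes the high byte as a literal character
    and the low byte as a decimal numeric character reference."""
    high, low = split_code_unit(ch)
    return chr(high) + f"&#{low};"

# Left/right double quotes and the right single quote (curly apostrophe).
for ch in ("\u201c", "\u201d", "\u2019"):
    print(f"U+{ord(ch):04X} -> {misencode(ch)!r}")
# U+201C -> ' &#28;'
# U+201D -> ' &#29;'
# U+2019 -> ' &#25;'
```

In every case the high byte is 0x20, a plain space, which is where a leading space before each quote would come from under this assumption.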

Martin Baxter

Sivakatirswami wrote:
> I don't really think this is a Rev issue. Could it be some weird
> Microsoft issue, or an email issue?
> Some text which originally came from MS Word is pasted into an email
> (by cut and paste on the Mac) and then into a field. I then output it
> via the "htmlText" property to an XML document, which is destined to
> be run against an XSLT style sheet using xsltproc (run via shell
> commands from a Rev UI). This XML file is then urlEncoded in
> preparation for uploading via POST to a Rev CGI, which gets the POST
> (the engine is Darwin, running on an Xserve), urlDecodes it and saves
> it back to an XML file. The goal, obviously, is that the XML on the
> server be exactly the same as the XML generated by my Rev app on the
> remote client before uploading.
> This system is working really well, btw... until I decided to make use
> of the htmlText of that field.
> In the original document I am seeing curly quotes and curly
> apostrophes, which were pasted into the original input field. Now, my
> script cleans these up to straight quotes first, and then we get the
> htmlText...
> htmlText result: [snippet from a complete XML file]
> <p>In The Blessings of Children  Tiruvalluvar begins by  
> describing the benefits of having children and states that an  
> intelligent child is the greatest blessing to the family and is  indeed 
> the family s real wealth.</p>
> If I run this through xsltproc against my style sheet (which turns the
> XML into an .shtml file), these all error out as "unknown entities...
> unable to parse /file"
> I don't see these entities on BBEdit's entity list... and the other
> weird thing is the introduction of a space before the closing quote or
> apostrophe...
> And we also are seeing another gruesome manifestation:
> “To foster a sense of self-worth in children, corporal punishment
> must be eliminated completely. To think that assaulting a child--a
> criminal offense between adults--constitutes discipline, is virtually
> insane. Yet, in the US, it is still legal in many states. Discipline
> means to teach. The only thing the paddle teaches, is hatred. This
> hatred is very often repressed and unconsciously directed toward self.
> When this happens, you have crippled a mind for life.”
> Could the urlEncoding/decoding be doing something nasty here?
> And what is even weirder: if the user manually enters a quote or
> apostrophe in the field, we get what we expect to get:
> &quot;
> Any clues?
> Sivakatirswami
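The two suspects raised in the question (the entities themselves, and the urlEncode/urlDecode step) can be checked with a short Python sketch. Python's `xml.etree.ElementTree` and `urllib.parse` stand in here for xsltproc and Rev's own urlEncode/urlDecode, and the sample strings are hypothetical. The sketch shows that a reference such as &#28; makes an XML 1.0 document ill-formed, while &#8220;/&#8221; parse cleanly. It also shows that percent-encoding in general round-trips curly quotes unchanged. That is not a test of Rev's urlEncode itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippet: references to C0 control characters such as #x1C
# are not legal XML 1.0 characters, so a conforming parser rejects them.
bad = "<p>In &#28;The Blessings of Children&#29; Tiruvalluvar</p>"
# The same text with proper references to U+201C / U+201D.
good = "<p>In &#8220;The Blessings of Children&#8221; Tiruvalluvar</p>"

print(parses(bad))   # False
print(parses(good))  # True

# Percent-encoding (UTF-8 by default in urllib) followed by decoding
# returns the original text, curly apostrophe included.
text = "family\u2019s real wealth"
print(unquote(quote(text)) == text)  # True
```

If the same holds for Rev's functions, the damage would have to happen before the XML is written out, i.e. when the field's htmlText is generated.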
