Strange Entities from htmlText

Sivakatirswami katir at hindu.org
Sat Jul 23 11:25:40 EDT 2005


Aloha, Martin:

Thanks, that's definitely a clue... I'm not trained in these encoding  
issues, so I'm way out of my depth...

What's mystifying is that I thought the basic ANSI set (0-255)  
was the same in ASCII and Unicode:

a) on BBEdit's decimal ASCII list, 28 and 29 are empty
b) put numToChar(28) returns a strange square box with an X thru it  
in Rev.
c) 147 and 148 are not even on BBEdit's list at all... and these are  
definitely double quotes of some kind in the original text
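
The numbers in (a)-(c) can be checked with a minimal Python sketch (Python is used purely for illustration; none of this is Rev code, and the function names are hypothetical). It assumes the original Word text stored its quotes as Windows-1252 bytes, which is a guess. Unicode matches ISO-8859-1 for 0-255, but Windows-1252 puts printable characters, including the curly double quotes at 147 and 148, in the 128-159 range. 28 and 29 are ASCII control characters, which would explain the empty slots and the box glyph.

```python
def cp1252_code_point(byte_value: int) -> int:
    """Unicode code point for one byte, ASSUMING the byte is Windows-1252."""
    return ord(bytes([byte_value]).decode("cp1252"))


def split_code_point(code_point: int) -> tuple[int, int]:
    """Split a 16-bit code point into (high byte, low byte)."""
    return divmod(code_point, 256)


for byte_value in (147, 148):
    code_point = cp1252_code_point(byte_value)
    high, low = split_code_point(code_point)
    # Report the byte, its code point in decimal and hex, and the two halves.
    print(f"byte {byte_value} -> U+{code_point:04X} (decimal {code_point}), "
          f"high byte {high}, low byte {low}")
```

If the assumption holds, bytes 147 and 148 map to 8220 and 8221, whose halves are 32 (space) followed by 28 and 29.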

But this is helpful... I need to do a low-level, byte-by-byte  
examination of some of the source text files...
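
One way such a byte-by-byte examination might look, as a rough Python sketch rather than anything Rev-specific: list every byte that is not printable ASCII, tab, or a line ending, together with its offset. The function name and the sample bytes are made up for illustration.

```python
def suspicious_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) pairs for bytes outside printable ASCII, tab, LF and CR."""
    allowed_controls = {9, 10, 13}  # tab, line feed, carriage return
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (32 <= value <= 126 or value in allowed_controls)
    ]


# Hypothetical sample: two Windows-1252 curly quotes, then a space plus byte 28.
sample = b"He said \x93hello\x94 and \x20\x1cleft"

for offset, value in suspicious_bytes(sample):
    print(f"offset {offset}: decimal {value}, hex {value:02X}")
```

For a real file, the bytes would come from reading it in binary mode, for example `open(path, "rb").read()`, where `path` stands in for one of the source text files.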

Thanks

Sivakatirswami




On Jul 22, 2005, at 9:41 PM, Martin Baxter wrote:

> Greetings,
>
> As a clue... One of the originating applications is (reasonably  
> enough) encoding the curly quotes as unicode entities:
>
> decimal 8220 and 8221.
>
> This is 201C and 201D in hex.
> 20 in hex = 32 in decimal which is space
> 1C in hex = 28 in decimal hence 
> so 1D in hex = 29 in decimal hence 
>
> Martin Baxter
>
> Sivakatirswami wrote:
>
>> I don't really think this is a Rev issue but actually some weird  
>> Microsoft issue or an email issue?
>> Some text which originally came from MSWord is passed to an  
>> email (by cut and paste into mail.app on the Mac) and then to a  
>> field, and is then output via the "htmlText" property to an XML  
>> document which is destined to run against an XSLT using xsltProc  
>> (run via shell commands from a Rev UI). This XML file is then  
>> urlEncoded as prep for uploading via POST... a Rev CGI gets the  
>> POST (engine is Darwin running on Xserve), urlDecodes it,  
>> and saves it back to an XML file. The goal (obviously) is that  
>> the XML on the server is exactly the same as was generated by my  
>> Rev app on the remote client before uploading.
>> This system is working really well, btw...until I decided to make  
>> use  of the htmlText of that field...
>> In the original document I am seeing curly quotes and curly   
>> apostrophes... which were pasted into the original input field...   
>> now, my script cleans these up to  straight quotes first, and then  
>> we  get the htmlText...
>> htmlText result: [snippet from a complete XML file]
>> <p>In The Blessings of Children  Tiruvalluvar begins by   
>> describing the benefits of having children and states that an   
>> intelligent child is the greatest blessing to the family and is   
>> indeed the family s real wealth.</p>
>> if I run this thru xsltProc against my style sheet (which is  
>> turning  the xml into a .shtml file) these all error out as  
>> "unknown  entities... unable to parse /file"
>> I don't see these entities on BBEdit's entity list... and the  
>> other weird thing is the introduction of a space before the  
>> closing quote or apostrophe...
>> And we also are seeing another gruesome manifestation:
>>  “To foster a sense of self-worth in children, corporal   
>> punishment must be eliminated completely. To think that assaulting  
>> a  child--a criminal offense between adults--constitutes  
>> discipline, is  virtually insane. Yet, in the US, it is still  
>> legal in many states.  Discipline means to teach. The only thing  
>> the paddle teaches, is  hatred. This hatred is very often  
>> repressed and unconsciously  directed toward self. When this  
>> happens, you have crippled a mind for  life.”
>> could the urlEncoding/Decoding be doing something nasty here?
>> And what is even weirder: if the user manually enters a quote or  
>> apostrophe in the field... we get what we expect to get:
>> &quot;
>> Any clues?
>> Sivakatirswami
>> _______________________________________________
>> use-revolution mailing list
>> use-revolution at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your  
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-revolution
>>
>



