Strange Entities from htmlText
Sivakatirswami
katir at hindu.org
Sat Jul 23 11:25:40 EDT 2005
Aloha, Martin:
Thanks, that's definitely a clue... I'm not trained in these encoding
issues, so I'm way out of my depth...
What's mystifying is that I thought that the basic ANSI set (0-255)
was the same in ASCII and Unicode:
a) on the decimal list in BBEdit's ASCII table, 28 and 29 are empty
b) put numToChar(28) returns a strange square box with an X thru it
in Rev.
c) 147 and 148 are not even on BBEdit's list at all... and these are
definitely double quotes of some kind in the original text
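
A note on point c): ASCII itself stops at 127, and Unicode's first 256 code points match ISO-8859-1 (Latin-1), not the Windows "ANSI" code page. In Windows-1252, bytes 147 and 148 are the curly double quotes, which would fit text that began life in MSWord. The sketch below is in Python rather than Rev, purely for illustration, and it assumes the source bytes really are Windows-1252; the function name is made up for this example.

```python
def cp1252_code_point(byte_value: int) -> int:
    """Return the Unicode code point that a single Windows-1252 byte maps to.

    Assumption: the text was saved in Windows-1252 (cp1252).
    """
    return ord(bytes([byte_value]).decode("cp1252"))


for byte_value in (147, 148):
    code_point = cp1252_code_point(byte_value)
    # Print the byte, then the code point in decimal and in hex.
    print(byte_value, code_point, hex(code_point))
```

If the bytes turn out to be in another encoding (MacRoman, for instance, puts the curly quotes elsewhere), the same check works with a different codec name.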
But this is helpful... I need to do a low-level, byte-by-byte
examination of some of the source text files...
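
One way such a byte-by-byte examination could be sketched, again in Python rather than Rev: list every byte that is neither printable ASCII nor ordinary whitespace, together with its offset. The function name and the sample bytes are hypothetical, made up for this illustration; they are not taken from the actual source files.

```python
def suspicious_bytes(data: bytes):
    """Return (offset, value) pairs for bytes outside printable ASCII.

    Tab (9), line feed (10) and carriage return (13) are treated as normal.
    """
    allowed_controls = {9, 10, 13}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if (value < 32 and value not in allowed_controls) or value > 126
    ]


# Hypothetical sample: a control byte (28) and a high byte (147) mixed into text.
sample = b"a\x1cb\x93\n"
for offset, value in suspicious_bytes(sample):
    print(offset, value, hex(value))
```

For a real file, the bytes would come from reading it in binary mode, for example `open(path, "rb").read()`, so that nothing is translated on the way in.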
Thanks
Sivakatirswami
On Jul 22, 2005, at 9:41 PM, Martin Baxter wrote:
> Greetings,
>
> As a clue... One of the originating applications is (reasonably
> enough) encoding the curly quotes as unicode entities:
>
> decimal 8220 and 8221.
>
> This is 201C and 201D in hex.
> 20 in hex = 32 in decimal, which is a space
> 1C in hex = 28 in decimal, hence the character 28
> and 1D in hex = 29 in decimal, hence the character 29
>
> Martin Baxter
>
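
The arithmetic in Martin's message can be checked with a short Python sketch (Python rather than Rev, for illustration only). It splits a 16-bit code point into a high byte and a low byte, the way a two-byte character would look if each byte were read as a separate character. The helper name is hypothetical, and whether this is what actually happens inside Rev's htmlText is an assumption, not something the thread confirms.

```python
def split_high_low(code_point: int):
    """Split a 16-bit code point into (high byte, low byte)."""
    return divmod(code_point, 256)


for code_point in (8220, 8221):
    high, low = split_high_low(code_point)
    # 8220 is 0x201C: high byte 0x20 (32, a space), low byte 0x1C (28).
    print(code_point, hex(code_point), high, low)

# The same two bytes appear when the character is encoded as big-endian UTF-16.
print("\u201c".encode("utf-16-be"))
```

This would also account for the extra space reported before the closing quote: the high byte 32 is the space character.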
> Sivakatirswami wrote:
>
>> I don't really think this is a Rev issue, but rather some weird
>> Microsoft issue or an email issue?
>> Some text which originally came from MSWord is pasted into an
>> email (by cut and paste into Mail.app on the Mac) and then into a
>> field. It is then output via the "htmlText" property to an XML
>> document, which is destined to be run against an XSLT style sheet
>> using xsltProc (run via shell commands from a Rev UI). This XML
>> file is then urlEncoded as prep for uploading via POST. A Rev CGI
>> (the engine is Darwin, running on an Xserve) gets the POST,
>> urlDecodes it and saves it back to an XML file. The goal
>> (obviously) is that the XML on the server is exactly the same as
>> what was generated by my Rev app on the remote client before
>> uploading.
>> This system is working really well, btw...until I decided to make
>> use of the htmlText of that field...
>> In the original document I am seeing curly quotes and curly
>> apostrophes... which were pasted into the original input field...
>> now, my script cleans these up to straight quotes first, and then
>> we get the htmlText...
>> htmlText result: [snippet from a complete XML file]
>> <p>In The Blessings of Children Tiruvalluvar begins by
>> describing the benefits of having children and states that an
>> intelligent child is the greatest blessing to the family and is
>> indeed the family s real wealth.</p>
>> if I run this thru xsltProc against my style sheet (which is
>> turning the xml into a .shtml file) these all error out as
>> "unknown entities... unable to parse /file"
>> I don't see these entities on BBEdit's Entity list... and the
>> other weird thing is the introduction of a space before the
>> closing quote or apostrophe...
>> And we also are seeing another gruesome manifestation:
>> To foster a sense of self-worth in children, corporal
>> punishment must be eliminated completely. To think that assaulting
>> a child--a criminal offense between adults--constitutes
>> discipline, is virtually insane. Yet, in the US, it is still
>> legal in many states. Discipline means to teach. The only thing
>> the paddle teaches, is hatred. This hatred is very often
>> repressed and unconsciously directed toward self. When this
>> happens, you have crippled a mind for life.
>> could the urlEncoding/Decoding be doing something nasty here?
>> And what is even weirder: if the user manually enters a quote or
>> apostrophe in the field... we get what we expect to get:
>> &quot;
>> Any clues?
>> Sivakatirswami
>> _______________________________________________
>> use-revolution mailing list
>> use-revolution at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-revolution
>>
>