Strange Entities from htmlText
Martin Baxter
martin at materiaprima.fsnet.co.uk
Sat Jul 23 03:41:06 EDT 2005
Greetings,
As a clue... One of the originating applications is (reasonably enough)
encoding the curly quotes as Unicode entities:
decimal 8220 and 8221.
This is 201C and 201D in hex.
20 in hex = 32 in decimal, which is a space
1C in hex = 28 in decimal, hence &#28;
and 1D in hex = 29 in decimal, hence &#29;
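
To check the arithmetic, here is a minimal Python sketch (not Rev code). The helper name split_code_point is made up for illustration, and the "space plus entity" output is only my guess at how the 16-bit value is being split into two bytes somewhere along the way, not a confirmed description of what the engine does.

```python
def split_code_point(code_point):
    """Split a 16-bit code point into (high byte, low byte)."""
    return code_point >> 8, code_point & 0xFF


def mangled_form(code_point):
    """Assumed mangling: high byte emitted as a raw character,
    low byte emitted as a decimal entity."""
    high, low = split_code_point(code_point)
    return chr(high) + "&#%d;" % low


for cp in (8220, 8221):
    high, low = split_code_point(cp)
    print("%d = hex %04X -> high byte %02X (%d), low byte %02X (%d)"
          % (cp, cp, high, high, low, low))
    print("  correct entity: &#%d;   assumed mangled form: %r"
          % (cp, mangled_form(cp)))
```

If that guess is right, it would also account for the extra space before the closing quote, and an XML 1.0 parser will reject character references to control characters such as 28 and 29.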
Martin Baxter
Sivakatirswami wrote:
> I don't really think this is a Rev issue but actually some weird
> Microsoft issue or an email issue?
>
> Some text which originally came from MS Word is passed to an email (by
> cut and paste into Mail.app on the Mac) and then to a field. The field
> is then output via the "htmlText" property to an XML document, which is
> destined to run against an XSLT using xsltProc (run via shell commands
> from a Rev UI). This XML file is then urlEncoded as prep for uploading
> via POST. A Rev CGI gets the POST (the engine is Darwin running on an
> Xserve), urlDecodes it and saves it back to an XML file. The goal
> (obviously) is that the XML on the server is exactly the same as what
> was generated by my Rev app on the remote client before uploading.
>
> This system is working really well, btw...until I decided to make use
> of the htmlText of that field...
>
> In the original document I am seeing curly quotes and curly
> apostrophes... which were pasted into the original input field... now,
> my script cleans these up to straight quotes first, and then we get
> the htmlText...
>
> htmlText result: [snippet from a complete XML file]
>
> <p>In The Blessings of Children Tiruvalluvar begins by
> describing the benefits of having children and states that an
> intelligent child is the greatest blessing to the family and is indeed
> the family s real wealth.</p>
>
> if I run this thru xsltProc against my style sheet (which is turning
> the xml into a .shtml file) these all error out as "unknown entities...
> unable to parse /file"
>
> I don't see these entities on BBEdit's entity list... and the other
> weird thing is the introduction of a space before the closing quote or
> apostrophe...
>
> And we also are seeing another gruesome manifestation:
>
> To foster a sense of self-worth in children, corporal punishment
> must be eliminated completely. To think that assaulting a child--a
> criminal offense between adults--constitutes discipline, is virtually
> insane. Yet, in the US, it is still legal in many states. Discipline
> means to teach. The only thing the paddle teaches, is hatred. This
> hatred is very often repressed and unconsciously directed toward self.
> When this happens, you have crippled a mind for life.
>
> could the urlEncoding/Decoding be doing something nasty here?
>
> And what is even weirder: if the user manually enters a quote or
> apostrophe in the field... we get what we expect to get:
>
> &quot;
>
> Any clues?
>
> Sivakatirswami