Unwanted characters from pasted-in text.

stephen barncard stephenREVOLUTION2 at barncard.com
Wed Jun 24 15:05:10 EDT 2009


I have a WordPress blog that has survived many upgrades. In the early
years, users entered some of the text by pasting it into the simple
WordPress entry fields of the time. The text was readable and correct
back then. I'm guessing that WordPress changed its MySQL character
encoding at some point, because since the transition to the later
versions the text has a lot of character errors, usually involving
'smart' quotes, apostrophes and dashes.
An occasional null also works its way in, and you know how Rev loves
nulls.

In trying to clean this up with Rev, here's my brute-force method:


<code>

-- work on a copy of the pasted text
put fld "output2" into tBlock
replace numToChar(0) with empty in tBlock -- nulls
-- garbled 'smart' quotes and apostrophes
replace "â€ù" with quote in tBlock
replace "’" with "'" in tBlock
replace "â€a" with "'" in tBlock
replace "�" with quote in tBlock
replace "“" with quote in tBlock
-- garbled dashes
replace "–" with "-" in tBlock
replace "ˆ" with "-" in tBlock
put tBlock into fld "output"

</code>



Does this look familiar?


I'd be OK with this, but every time I run it on other posts, I get new
codes that aren't covered above.


Is there some kind of Unicode trick that would cover everything?


Or am I stuck with this method?
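
For what it's worth, here is a small sketch of what I suspect is going on.
It is written in Python rather than Rev, purely for illustration, and it
assumes the damage comes from UTF-8 bytes being read as Windows-1252
(cp1252). I have not confirmed that this is what happened to my database,
and the function names garble and repair are made up for this example.

```python
# Sketch only: assumes UTF-8 text was misread as a single-byte encoding.
# The default "cp1252" is a guess; the real wrong encoding may differ.

def garble(text, wrong_encoding="cp1252"):
    """Simulate the damage: UTF-8 bytes read back as a single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(text, wrong_encoding="cp1252"):
    """Try to reverse the damage; return the text unchanged if that fails."""
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not damaged in the assumed way (or only partly), so leave it alone.
        return text


original = "it\u2019s"          # "it" + right single quotation mark + "s"
damaged = garble(original)      # the apostrophe becomes a three-character run
restored = repair(damaged)      # round-trips back to the original
```

If that assumption holds, one reverse conversion would cover every
character at once instead of one replace per code. Text that does not
round-trip cleanly is returned untouched, so those posts would still need
the brute-force list.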


thanks,


sqb

-------------------------
Stephen Barncard
San Francisco
http://barncard.com


