A peculiar character substitution problem with URL
Dave Cragg
dave.cragg at lacscentre.co.uk
Mon Aug 19 13:12:00 EDT 2013
On 19 Aug 2013, at 17:04, Jonathan Lynch <jonathandlynch at gmail.com> wrote:
> This is just the strangest thing. On some websites - but not all - trying
> to get the html of that website using "get url" or "put url" is causing
> some characters to be substituted.
>
> These are not obscure unicode characters. They seem to be characters in the
> upper ANSI set.
>
> For example, on this web page:
> http://emergency.cdc.gov/disasters/wildfires/facts.asp
>
> If I use the following code:
>
> put URL "http://emergency.cdc.gov/disasters/wildfires/facts.asp" into field 1
>
> The right single quote character --> ’ <-- (which is character number 146)
> gets converted into ’
>
> I do not understand why ’ becomes ’
>
Jonathan,
The page source for the URL indicates the page is encoded as UTF-8. This is from the 'head' section of the page.
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
So it looks like it may be 'obscure unicode characters'. :-)
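The substitution can be reproduced outside LiveCode. The Python sketch below assumes two things. First, the page bytes are UTF-8, as the meta tag says. Second, the receiving side reads them as Windows-1252, the "ANSI" code page in which the right single quote is character 146. The function name `misread_as_cp1252` is hypothetical.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as Windows-1252.

    This reproduces the mismatch: a page served as UTF-8 but read as a
    single-byte "ANSI" code page. Strict decoding raises UnicodeDecodeError
    for the few bytes Windows-1252 leaves undefined (e.g. 0x81).
    """
    return text.encode("utf-8").decode("cp1252")


# U+2019 RIGHT SINGLE QUOTATION MARK is the single byte 146 (0x92) in Windows-1252...
assert "\u2019".encode("cp1252") == bytes([146])

# ...but three bytes (0xE2 0x80 0x99) in UTF-8.
assert "\u2019".encode("utf-8") == b"\xe2\x80\x99"
```

Read one byte at a time as Windows-1252, those three UTF-8 bytes come out as `â`, `€` and `™`. That is why `misread_as_cp1252("\u2019")` gives `’`. Plain ASCII text is unaffected because its UTF-8 and Windows-1252 encodings are identical, so only some characters are substituted.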
What happens when you do something like this:
put URL "http://emergency.cdc.gov/disasters/wildfires/facts.asp" into tTemp
put uniDecode(uniEncode(tTemp, "UTF8")) into field 1
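The suggested code treats the downloaded text as UTF-8 before it reaches the field. The sketch below expresses that general idea in Python, not LiveCode's exact behaviour. It decodes the raw bytes as UTF-8, then re-encodes them into the single-byte character set a plain-text display expects. Windows-1252 is assumed as that target, and it may differ by platform. The function name `utf8_to_native` is hypothetical.

```python
def utf8_to_native(raw: bytes, native: str = "cp1252") -> bytes:
    """Decode UTF-8 page bytes, then re-encode them in a single-byte charset.

    `native` is an assumption (Windows-1252 here). Characters the target
    charset cannot represent are replaced with b"?" instead of raising.
    """
    text = raw.decode("utf-8")  # interpret the bytes as the page declares
    return text.encode(native, errors="replace")  # convert for display
```

With this conversion the UTF-8 bytes for "it’s" become `b"it\x92s"`, so the quote is back to character 146. A character with no Windows-1252 equivalent, such as a CJK ideograph, turns into `?`.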
Cheers
Dave Cragg