A peculiar character substitution problem with URL

Dave Cragg dave.cragg at lacscentre.co.uk
Mon Aug 19 13:12:00 EDT 2013


On 19 Aug 2013, at 17:04, Jonathan Lynch <jonathandlynch at gmail.com> wrote:

> This is just the strangest thing. On some websites - but not all - trying
> to get the html of that website using "get url" or "put url" is causing
> some characters to be substituted.
> 
> These are not obscure unicode characters. They seem to be characters in the
> upper ANSI set.
> 
> For example, on this web page:
> http://emergency.cdc.gov/disasters/wildfires/facts.asp
> 
> If I use the following code:
> 
> put URL "http://emergency.cdc.gov/disasters/wildfires/facts.asp" into field
> 1
> 
> The right single quote character --> ’ <-- ( which is character number 146)
> gets converted into ’
> 
> 
> I do not understand why ’ becomes ’
> 

Jonathan,

The page source for that URL indicates the page is encoded as UTF-8. This is from the 'head' section of the page:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

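So it looks like it may be 'obscure Unicode characters' after all. :-) In UTF-8 the right single quote (U+2019) is stored as three bytes, so if the downloaded data is treated as single-byte text, you see three characters where the page has one.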

What happens when you do something like this:

put URL "http://emergency.cdc.gov/disasters/wildfires/facts.asp" into tTemp
put uniDecode(uniEncode(tTemp, "UTF8")) into field 1
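
If it helps to see what is probably going on at the byte level, here is a rough sketch in Python (outside LiveCode, purely as an illustration). It assumes the bytes are being read as Windows-1252; the actual native encoding on your machine may differ, in which case the three substituted characters will look different but the cause is the same.

```python
# Illustration only: a UTF-8 right single quote read as single-byte text.
# The choice of Windows-1252 ("cp1252") is an assumption about the platform.
quote = "\u2019"                       # RIGHT SINGLE QUOTATION MARK

# In Windows-1252 this character is a single byte, number 146 (0x92).
ansi_bytes = quote.encode("cp1252")

# In UTF-8 the same character takes three bytes: E2 80 99.
utf8_bytes = quote.encode("utf-8")

# Misreading: each UTF-8 byte is shown as its own Windows-1252 character.
misread = utf8_bytes.decode("cp1252")

# Correct reading: decode the bytes as UTF-8 to get the one character back.
correct = utf8_bytes.decode("utf-8")

print(list(ansi_bytes))                # [146]
print(utf8_bytes.hex(" "))             # e2 80 99
print(len(misread), len(correct))      # 3 1
```

The uniEncode/uniDecode pair above is meant to do the equivalent of the "correct reading" step: interpret the data as UTF-8 first, then convert it to the native encoding for the field.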

Cheers
Dave Cragg