Problems downloading accented characters from a web page

Devin Asay devin_asay at byu.edu
Fri May 19 23:51:07 EDT 2006


On May 19, 2006, at 7:55 PM, Sarah Reichelt wrote:

> Hi All,
>
> I have a routine that downloads a web page and extracts certain text.
> This works fine except when the characters are accented. I'm not sure
> how well the characters will transfer in the email, but I'll try to
> give an example:
>
> Accented e (é) - I never could remember which was an acute and which
> was a grave but it's numToChar(142). Viewed in a browser, and in the
> page source, it looks perfect. When I download that page into Rev,
> the é becomes "√©", i.e. square root & copyright, charToNum 195 & 169.
>
> I've tried using ISOtoMac and uniDecode, and the two combined in
> various ways, but I can't get it to give me the correct accented e.
>
> Any ideas?

Sarah,

If it's on a web page it is probably UTF-8 (the charset declared in the
page's meta tag should tell you for sure), especially since Rev renders
the character as two characters: UTF-8 stores é as the two bytes 195
and 169, which Rev displays as "√©". You could try this to see what
happens:

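-- download the raw page; a UTF-8 page arrives as raw UTF-8 bytes
put url ("http://the.web.page/file.html") into tRawHtmlTxt
-- extract the stuff you want here
-- convert the UTF-8 bytes to Unicode (UTF-16) and display them
set the unicodeText of fld "myfld" to uniEncode(tRawHtmlTxt,"UTF8")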
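Outside Rev, the same approach (check the charset the page declares in its meta tag, then decode the downloaded bytes with it) can be sketched in Python. This is a simplified illustration, not Rev code: the function name decode_html, the 2048-byte scan window, and the UTF-8 fallback are hypothetical choices, and the sketch ignores the HTTP Content-Type header and the full HTML5 encoding-sniffing rules.

```python
import re

# Matches charset=utf-8, charset="utf-8" or charset='utf-8' (case-insensitive).
CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE)


def decode_html(raw: bytes, default: str = "utf-8") -> str:
    """Decode downloaded HTML using the charset declared in the page.

    Hypothetical helper: scans only the first 2048 bytes for a meta-tag
    charset and falls back to `default`. It does not consult the HTTP
    Content-Type header or implement full HTML5 encoding sniffing.
    """
    # The charset declaration itself is ASCII, so drop any non-ASCII bytes.
    head = raw[:2048].decode("ascii", errors="ignore")
    match = CHARSET_RE.search(head)
    encoding = match.group(1) if match else default
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset label, or bytes that do not fit it: use the
        # default and mark undecodable bytes with U+FFFD.
        return raw.decode(default, errors="replace")


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=utf-8"></head>'
        b'<body>caf\xc3\xa9</body></html>')
text = decode_html(page)
print(text.endswith("caf\u00e9</body></html>"))  # True
```

Falling back to UTF-8 when no charset is declared is a judgement call; a page that is really Latin-1 or Mac Roman would then need its charset passed in explicitly as `default`.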

See if that helps.
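The diagnosis above can be checked byte by byte. The Python sketch below (Python is used only as a neutral illustration; it is not Rev code) encodes é in Mac Roman and in UTF-8, then reads the UTF-8 bytes back as Mac Roman, reproducing the charToNum 195 and 169 from the question. It assumes Python 3's built-in mac_roman and utf-8 codecs.

```python
import unicodedata

# The letter from the question: é, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
e_acute = "\u00e9"

# Mac Roman stores é as one byte, 142 (Sarah's numToChar(142)).
mac_bytes = e_acute.encode("mac_roman")

# UTF-8 stores the same letter as two bytes.
utf8_bytes = e_acute.encode("utf-8")

# Reading the UTF-8 bytes as Mac Roman yields two separate characters.
misread = utf8_bytes.decode("mac_roman")

# Decoding them as UTF-8 instead recovers the original letter.
restored = utf8_bytes.decode("utf-8")

print(list(mac_bytes))                         # [142]
print(list(utf8_bytes))                        # [195, 169]
print([unicodedata.name(c) for c in misread])  # ['SQUARE ROOT', 'COPYRIGHT SIGN']
print(restored == e_acute)                     # True
```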

Devin

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University




More information about the use-livecode mailing list