Problems downloading accented characters from a web page
Devin Asay
devin_asay at byu.edu
Fri May 19 23:51:07 EDT 2006
On May 19, 2006, at 7:55 PM, Sarah Reichelt wrote:
> Hi All,
>
> I have a routine that downloads a web page and extracts certain text.
> This works fine except when the characters are accented. I'm not sure
> how well the characters will transfer in the email, but I'll try to
> give an example:
>
> Accented e (é) - I never could remember which was an acute and which
> was a grave but it's numToChar(142). On the web page viewed in a
> browser and checking the source, it looks perfect. When I download
> that page into a Rev, the é becomes "√©" i.e. square root &
> copyright, charToNum 195 & 169.
>
> I've tried using ISOtoMac and uniDecode and the 2 combined in various
> ways, but I can't get it to give me the correct accented e.
>
> Any ideas?
Sarah,
If it's on a web page, it might be UTF-8 (the charset in the page's
meta tag should tell you for sure), especially because Rev is
rendering the one character as two. UTF-8 stores é as the two bytes
195 and 169, and Rev shows each of those bytes as a separate
character. You could try this to see what happens:
put url ("http://the.web.page/file.html") into tRawHtmlTxt
-- extract the stuff you want here
-- uniEncode converts the UTF-8 text to UTF-16, which unicodeText expects
set the unicodetext of fld "myfld" to uniencode(tRawHtmlTxt,"utf8")
See if that helps.
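To see why one é turns into two characters, here is a small illustration written in Python rather than Revolution script. It is only a sketch: Python's "utf-8" and "mac_roman" codecs stand in for the page's encoding and the Mac's native text encoding, and the helper names misread_as_mac_roman and repair are hypothetical. It shows that é is the single byte 142 in Mac Roman but the two bytes 195 and 169 in UTF-8, and that those two bytes read as Mac Roman are the square root and copyright signs.

```python
# Illustration only (not Rev code): a UTF-8 "é" misread as Mac Roman text.
# Assumption: Python's "mac_roman" codec matches the Mac's native encoding.

def misread_as_mac_roman(text: str) -> str:
    """Encode as UTF-8, then (incorrectly) decode those bytes as Mac Roman."""
    return text.encode("utf-8").decode("mac_roman")

def repair(garbled: str) -> str:
    """Recover the original bytes via Mac Roman, then decode them as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")

if __name__ == "__main__":
    e_acute = "\u00e9"  # é
    print(list(e_acute.encode("utf-8")))      # [195, 169]: two bytes on the web page
    print(list(e_acute.encode("mac_roman")))  # [142]: numToChar(142) on a Mac
    garbled = misread_as_mac_roman(e_acute)
    print(garbled == "\u221a\u00a9")          # True: square root + copyright
    print(repair(garbled) == e_acute)         # True
```

The repair step works because the misreading loses nothing: every byte survives as some Mac Roman character, so decoding the same bytes as UTF-8 gives back the original text. In Rev, the uniEncode call above plays the role of that UTF-8 decoding.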
Devin
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University