reading and converting web page HTML text
François Chaplais
francois.chaplais at mines-paristech.fr
Sat Mar 6 17:10:52 EST 2010
On 6 March 2010 at 23:01, Mark Stuart wrote:
> Hi all,
> I'm reading the HTML text of a web page and parsing it. Some of the text
> that I'm parsing contains (&quot;) - braces not included.
>
> What runrev function do I use to convert that HTML text to the double quote
> (") character?
> There will be other characters that I also need to convert, such as
> (Björnke).
> After reading and parsing the text, I'll be loading a DataGrid.
>
> I've tried some functions, but with no success.
>
> Regards,
> Mark Stuart
>
Digging in my mail archive, I found this post from Sarah. If I am correct, it puts Unicode text into a field from an HTML source.
HTH
--------------------------------------------------------------------
On Sun, Jul 26, 2009 at 7:18 AM, Sivakatirswami<katir at hindu.org> wrote:
> Is there a way to get htmlEntities
>
> "&#8220;Kanwar&#8221;
>
> The rest of their lifestyle — names, marriage rituals, dressing styles
> — continued to be the same...."
>
> to appear correctly in a field where such entities are part of the html used
> to set the htmltext of a field?
I had to wrestle with this recently and after numerous attempts with
uniencode, unidecode, macToISO etc., I ended up writing my own
function to do it:
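function decodeEntities pText
   -- Nothing to do if the text holds no numeric entities such as &#8220;
   if pText contains "&#" is false then return pText
   -- With useUnicode true, numToChar accepts codes above 255
   -- and returns a double-byte character
   set the useUnicode to true
   put empty into tNew
   repeat until pText is empty
      if char 1 to 2 of pText <> "&#" then
         -- Ordinary character (including a bare "&"): copy it through
         put char 1 of pText after tNew
         delete char 1 of pText
      else
         -- Collect the digits between "&#" and ";"
         put empty into tCode
         delete char 1 to 2 of pText
         repeat until char 1 of pText = ";"
            put char 1 of pText after tCode
            delete char 1 of pText
            if pText is empty then exit repeat
         end repeat
         delete char 1 of pText -- drop the closing ";"
         put numToChar(tCode) into tChar
         -- Pass the character through the templateField to get it back as text
         set the unicodeText of the templateField to tChar
         put the text of the templateField after tNew
      end if
   end repeat
   -- Restore the default before returning
   set the useUnicode to false
   return tNew
end decodeEntities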
Use it like this: put decodeEntities("&#8220;Kanwar&#8221;")
which returns: “Kanwar” (curly opening & closing quotes which
may not show in the email).
I feel sure that there must be a better method, but until someone
discovers it, this function seems to do the job.
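For named entities such as &quot;, which the function above does not touch, a plain lookup may be enough. What follows is only a sketch under assumptions: decodeNamedEntities is a hypothetical helper name, and the entity list is a small hand-picked set that would need extending for real pages.

```livecode
function decodeNamedEntities pText
   -- Assumption: only this small set of named entities occurs in the text
   replace "&quot;" with quote in pText
   replace "&lt;" with "<" in pText
   replace "&gt;" with ">" in pText
   -- Decode "&amp;" last, so that "&amp;quot;" ends up as the
   -- literal text "&quot;" rather than as a double quote
   replace "&amp;" with "&" in pText
   return pText
end decodeNamedEntities
```

For example, put decodeNamedEntities("&quot;Hello&quot; &amp; goodbye") should give "Hello" & goodbye. Accented entities such as &ouml; are left out on purpose, because the right replacement character depends on the platform's character set.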
Cheers,
Sarah