Converting hex character references

Alex Tweedly alex at tweedly.net
Mon Jan 10 16:15:54 EST 2005


Richard Gaskin wrote:

> Converting ISO-8859-1 character references to displayable text is a snap:
>
> If field 1 contains this:
>
>       Don’t give up & call it quits.
>
> ...I can get the plain text like this:
>
>      set the htmlText of fld 2 to the text of fld 1
>      get the text of fld 2
>
>
> But what do I do when the data I'm working with contains hex character 
> references?:
>
>      Don’t give up & call it quits.
>
> I have a bunch of XML files that are UTF-8 encoded and chock full o' 
> hex character references like that, and doing a replace on each or 
> hunting them down to do a baseConvert would be inefficient.
>
> I'd like to think some combination of Unicode functions/properties 
> would do the trick, but alas I'm too braindead to come up with the 
> winning solution.


Sorry, I'm clueless about Unicode; nothing leaps out of the docs to 
suggest itself.

If there isn't a clever Unicode method, you could do the following. 
Note that it ignores the more complex parts of UTF-8, and deals only 
with those chars that can be represented in 2 hex digits.

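It uses replace to do the actual changes, turning each hex reference 
into the equivalent decimal one (which the htmlText approach above can 
then handle). It only does one replace for each distinct reference in 
the original, so it should be pretty fast (NB: not tested for speed, 
only for working correctly in simple cases).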
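
In case the split step looks odd: with two delimiters, split builds an 
array whose keys are whatever sits between each "&" and the following 
";", so every distinct reference should turn up exactly once as a key. 
A quick way to check that, as a sketch only (the handler name 
showRefKeys and the sample string are made up for illustration):

```livecode
on showRefKeys
  local tSample
  -- made-up sample text containing two hex references
  put "caf&#xE9; &#x26; bar" into tSample
  -- "&" separates the elements, ";" separates each key from its value
  split tSample by "&" and ";"
  -- the keys of an array come back one per line, in no particular order
  put the keys of tSample
end showRefKeys
```

As far as I can tell this lists caf, #xE9 and #x26 in the message box; 
the filter in the handler below then keeps only the lines that begin 
with #x.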


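on mouseUp
  local tText, tArr, tNew, tmp
  put the text of field "inField" into tText
  put tText into tmp
  -- "&" separates elements and ";" separates key from value, so each
  -- reference such as "&#x26;" ends up as the key "#x26"
  split tmp by "&" and ";"
  put the keys of tmp into tArr
  filter tArr with "#x*"
  repeat for each line L in tArr
    -- only two-digit hex references are handled; skip anything else
    if the length of L is not 4 then next repeat
    put baseConvert(char 3 to 4 of L, 16, 10) into tNew
    -- rewrite the whole reference, e.g. "&#x26;" becomes "&#38;", so
    -- that a stray "x26" elsewhere in the text is left alone
    replace ("&" & L & ";") with ("&#" & tNew & ";") in tText
  end repeat
  put tText after msg
end mouseUp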
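
To tie this back to the htmlText trick in the original question, the 
same steps could be wrapped in a function and the result handed to a 
second field. This is only a sketch: the function name hexRefsToDecimal, 
the handler name showPlainText and the field "outField" are made up 
here, and it assumes the references are well formed and that htmlText 
copes with the resulting decimal references the way Richard describes.

```livecode
-- hypothetical helper: turns two-digit hex references in pText
-- (e.g. "&#x26;") into decimal ones (e.g. "&#38;")
function hexRefsToDecimal pText
  local tArr, tKeys, tDec
  put pText into tArr
  split tArr by "&" and ";"
  put the keys of tArr into tKeys
  filter tKeys with "#x*"
  repeat for each line tRef in tKeys
    -- two-digit hex references only, as in the handler above
    if the length of tRef is not 4 then next repeat
    put baseConvert(char 3 to 4 of tRef, 16, 10) into tDec
    replace ("&" & tRef & ";") with ("&#" & tDec & ";") in pText
  end repeat
  return pText
end hexRefsToDecimal

on showPlainText
  -- "inField" holds the raw text; "outField" is a made-up second field
  set the htmlText of field "outField" to hexRefsToDecimal(the text of field "inField")
  -- the text of the field is now the displayable version
  put the text of field "outField" after msg
end showPlainText
```

Keeping the conversion in a function means the same code can be run 
over each XML file in turn rather than being tied to one button.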

-- Alex.

