Translating escape sequences

J. Landman Gay jacque at hyperactivesw.com
Wed Mar 15 12:43:43 EDT 2017


The problem with the pseudocode is that it gives no clear indication of how 
many trailing characters to preserve. I'm not sure how the libraries deal 
with that.
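
One possible way around that, offered as a sketch rather than a tested solution: in JSON itself (RFC 8259) a \u escape is always followed by exactly four hex digits, and a code point above U+FFFF such as U+111F4 is written as a surrogate pair of two escapes (\ud804\uddf4). If the data really is standard JSON, the number of characters to take after each "\u" is therefore fixed at four. The handler name decodeUnicodeEscapes below is made up for illustration. It assumes LiveCode 7 or later (for numToCodepoint), assumes well-formed input, is not tuned for speed, and does not handle an escaped backslash ("\\u") in front of the u.

```livecode
-- Sketch only: decodes JSON-style \uXXXX escapes found in pText.
-- decodeUnicodeEscapes is a hypothetical name, not a built-in.
function decodeUnicodeEscapes pText
   local tOut, tPos, tLen, tCode, tLow
   set the caseSensitive to true -- match "\u" but not "\U"
   put the number of chars of pText into tLen
   put 1 into tPos
   repeat while tPos <= tLen
      if (char tPos to (tPos + 1) of pText is "\u") and (tPos + 5 <= tLen) then
         -- assumption: "\u" is always followed by exactly four hex digits
         put baseConvert(toUpper(char (tPos + 2) to (tPos + 5) of pText), 16, 10) into tCode
         add 6 to tPos
         -- 55296..56319 (0xD800..0xDBFF) is a high surrogate;
         -- a low surrogate escape is expected to follow it
         if (tCode >= 55296) and (tCode <= 56319) and (tPos + 5 <= tLen) \
               and (char tPos to (tPos + 1) of pText is "\u") then
            put baseConvert(toUpper(char (tPos + 2) to (tPos + 5) of pText), 16, 10) into tLow
            if (tLow >= 56320) and (tLow <= 57343) then
               -- combine the pair into one code point above U+FFFF
               put 65536 + (tCode - 55296) * 1024 + (tLow - 56320) into tCode
               add 6 to tPos
            end if
         end if
         put numToCodepoint(tCode) after tOut
      else
         -- ordinary character: copy it through unchanged
         put char tPos of pText after tOut
         add 1 to tPos
      end if
   end repeat
   return tOut
end decodeUnicodeEscapes
```

Called as decodeUnicodeEscapes("Eduardo Ba\u00f1uls"), this should return "Eduardo Bañuls". Because the loop walks the whole string, any number of escapes in the same text is handled in one call.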

--
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com



On March 15, 2017 2:28:57 AM Richmond Mathewson via use-livecode 
<use-livecode at lists.runrev.com> wrote:

> No; it won't always be 4 characters. Here's an admittedly extremely
> obscure ancient Sinhala number:
> 0x111F4.
>
> Of course the chances of encountering wacky characters like that are
> small, but you'll have to make sure you
> can cope with them should they crop up.
>
> If you look at Eduardo Ba\u00f1uls you will have to take what comes
> after the '\', strip the prefix 'u'
> and the suffix 'uls', and then you can cope with whatever is left:
>
> Rough pseudo-code follows:
>
> set the item delimiter to \
> put what's after the item delimiter into HOLDER
> delete char 1 of HOLDER
> delete the last char of HOLDER
> delete the last char of HOLDER
> delete the last char of HOLDER
> put "0x" & HOLDER into NUNUM
>
> At this point "NUNUM" could be almost any length, but that should not
> matter unduly.
>
> Richmond.
>
> On 3/14/17 11:26 pm, J. Landman Gay via use-livecode wrote:
>> I'm dealing with non-English languages, and JSON data retrieved from a
>> database comes in with unicode escape sequences like this: Eduardo
>> Ba\u00f1uls.
>>
>> I need to translate those. I can do it by replacing the "\u" with "0x"
>> and then using numToCodepoint() to get the UTF16 character. But there
>> could be many of these in the same string, so I'm looking for a
>> one-shot command that might just do them all. I don't think we have one.
>>
>> The alternative is to loop through all the text, getting an offset for
>> each "\u" and then calculating the number of characters after that to
>> use with numToCodepoint(). But will it always be 4 characters in any
>> language?
>>
>> Or is there an easier way?
>>





