Translating escape sequences

Richmond Mathewson richmondmathewson at gmail.com
Wed Mar 15 16:16:57 EDT 2017


Just knock off the last 3, and what is left is what you want.

Richmond.
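
A possible way round the "how many characters" question, offered as a sketch rather than a tested solution: if the data is standard JSON (RFC 8259), each \u escape carries exactly four hex digits, and a code point above U+FFFF such as 0x111F4 arrives as a surrogate pair (\ud804\uddf4). The handler name decodeUnicodeEscapes and its variable names are made up for illustration. It assumes LiveCode 7 or later (for numToCodepoint) and well-formed input with no literal "\\u" sequences.

```livecode
-- Sketch: decode JSON-style \uXXXX escapes in pText.
-- Assumes LiveCode 7+ and well-formed escapes (always four hex digits).
function decodeUnicodeEscapes pText
   local tOut, tPos, tLen, tMarker, tCode, tLow
   set the caseSensitive to true -- so that "\U" is not treated as an escape
   put backslash & "u" into tMarker
   put the number of chars in pText into tLen
   put 1 into tPos
   repeat while tPos <= tLen
      if (char tPos to (tPos + 1) of pText is tMarker) and (tPos + 5 <= tLen) then
         -- convert the four hex digits after the marker to a decimal number
         put baseConvert(char (tPos + 2) to (tPos + 5) of pText, 16, 10) into tCode
         add 6 to tPos
         -- a high surrogate (0xD800-0xDBFF) should be followed by a second escape
         if (tCode >= 55296) and (tCode <= 56319) and (char tPos to (tPos + 1) of pText is tMarker) then
            put baseConvert(char (tPos + 2) to (tPos + 5) of pText, 16, 10) into tLow
            -- only combine when the second escape is a low surrogate (0xDC00-0xDFFF)
            if (tLow >= 56320) and (tLow <= 57343) then
               put 65536 + (tCode - 55296) * 1024 + (tLow - 56320) into tCode
               add 6 to tPos
            end if
         end if
         put numToCodepoint(tCode) after tOut
      else
         -- ordinary character: copy it through unchanged
         put char tPos of pText after tOut
         add 1 to tPos
      end if
   end repeat
   return tOut
end decodeUnicodeEscapes
```

Called as decodeUnicodeEscapes("Eduardo Ba\u00f1uls"), this should give "Eduardo Bañuls" and leave the "uls" alone, because only the four digits after "\u" are consumed. Input that is not strict JSON (longer escapes, stray backslashes) would need extra checks.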

On 3/15/17 6:43 pm, J. Landman Gay via use-livecode wrote:
> The problem with the pseudo code is that there's no clear indication 
> of how many characters at the end to preserve. I'm not sure how the 
> libraries deal with that.
>
> -- 
> Jacqueline Landman Gay         |     jacque at hyperactivesw.com
> HyperActive Software           |     http://www.hyperactivesw.com
>
>
>
> On March 15, 2017 2:28:57 AM Richmond Mathewson via use-livecode 
> <use-livecode at lists.runrev.com> wrote:
>
>> No, it won't always be 4 characters. Here's an admittedly extremely
>> obscure ancient Sinhala number:
>> 0x111F4.
>>
>> Of course the chances of encountering whacky characters like that are
>> small, but you'll have to make sure you
>> can cope with them should they crop up.
>>
>> If you look at Eduardo Ba\u00f1uls, you will have to take what comes
>> after the '\', strip the prefix 'u'
>> and the suffix 'uls', and then you can cope with whatever is left:
>>
>> Reasonably pseudo-code following:
>>
>> -- tText is a hypothetical variable holding the string, e.g. Eduardo Ba\u00f1uls
>> set the itemDelimiter to backslash
>> put item 2 of tText into HOLDER
>> delete char 1 of HOLDER -- the prefix "u"
>> delete char -3 to -1 of HOLDER -- the suffix "uls"
>> put "0x" & HOLDER into NUNUM
>>
>> At this point "NUNUM" could be almost any length, but that should not
>> matter unduly.
>>
>> Richmond.
>>
>> On 3/14/17 11:26 pm, J. Landman Gay via use-livecode wrote:
>>> I'm dealing with non-English languages, and JSON data retrieved from a
>>> database comes in with unicode escape sequences like this: Eduardo
>>> Ba\u00f1uls.
>>>
>>> I need to translate those. I can do it by replacing the "\u" with "0x"
>>> and then using numToCodepoint() to get the UTF16 character. But there
>>> could be many of these in the same string, so I'm looking for a
>>> one-shot command that might just do them all. I don't think we have 
>>> one.
>>>
>>> The alternative is to loop through all the text, getting an offset for
>>> each "\u" and then calculating the number of characters after that to
>>> use with numToCodepoint(). But will it always be 4 characters in any
>>> language?
>>>
>>> Or is there an easier way?
>>>
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your 
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode



