Translating escape sequences

Mike Bonner bonnmike at gmail.com
Wed Mar 15 17:03:48 EDT 2017


Does this mean one could replace \u with 0x, then replace "uls" with empty,
and end up with the correct end result?

On Wed, Mar 15, 2017 at 2:16 PM, Richmond Mathewson via use-livecode <
use-livecode at lists.runrev.com> wrote:

> Just knock off the last 3 characters, and what is left is what you want.
>
> Richmond.
>
> On 3/15/17 6:43 pm, J. Landman Gay via use-livecode wrote:
>
>> The problem with the pseudo code is that there's no clear indication of
>> how many characters at the end to preserve. I'm not sure how the libraries
>> deal with that.
>>
>> --
>> Jacqueline Landman Gay         |     jacque at hyperactivesw.com
>> HyperActive Software           |     http://www.hyperactivesw.com
>>
>>
>>
>> On March 15, 2017 2:28:57 AM Richmond Mathewson via use-livecode <
>> use-livecode at lists.runrev.com> wrote:
>>
>>> No; it won't always be 4 characters. Here's an admittedly extremely
>>> obscure ancient Sinhala number:
>>> 0x111F4.
>>>
>>> Of course the chances of encountering wacky characters like that are
>>> small, but you'll have to make sure you can cope with them should they
>>> crop up.
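For reference, this is a property of the JSON format rather than of LiveCode: JSON (RFC 8259) writes a character above U+FFFF as a UTF-16 surrogate pair, that is, two \u escapes of exactly four hex digits each. For U+111F4 the arithmetic works out as:

```latex
\begin{aligned}
v &= \mathtt{0x111F4} - \mathtt{0x10000} = \mathtt{0x11F4} \\
\text{high} &= \mathtt{0xD800} + \lfloor v / \mathtt{0x400} \rfloor = \mathtt{0xD800} + \mathtt{0x4} = \mathtt{0xD804} \\
\text{low} &= \mathtt{0xDC00} + (v \bmod \mathtt{0x400}) = \mathtt{0xDC00} + \mathtt{0x1F4} = \mathtt{0xDDF4}
\end{aligned}
```

So in JSON text this character would arrive as \uD804\uDDF4, and decoding reverses the steps: 0x10000 + (0xD804 - 0xD800) * 0x400 + (0xDDF4 - 0xDC00) = 0x111F4.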
>>>
>>> If you look at Eduardo Ba\u00f1uls, you will have to strip the prefix 'u'
>>> and the suffix 'uls' from what comes after the '\', and then you can cope
>>> with whatever is left:
>>>
>>> Reasonable pseudo-code follows:
>>>
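>>> set the itemDelimiter to "\"
>>> -- tText is a hypothetical variable holding "Eduardo Ba\u00f1uls"
>>> put item 2 of tText into HOLDER -- "u00f1uls"
>>> delete char 1 of HOLDER -- drop the "u" prefix
>>> delete the last char of HOLDER -- drop the "uls" suffix,
>>> delete the last char of HOLDER -- one char at a time
>>> delete the last char of HOLDER
>>> put "0x" & HOLDER into NUNUM -- "0x00f1"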
>>>
>>> At this point NUNUM could be almost any length, but that should not
>>> matter unduly.
>>>
>>> Richmond.
>>>
>>> On 3/14/17 11:26 pm, J. Landman Gay via use-livecode wrote:
>>>
>>>> I'm dealing with non-English languages, and JSON data retrieved from a
>>>> database comes in with Unicode escape sequences like this: Eduardo
>>>> Ba\u00f1uls.
>>>>
>>>> I need to translate those. I can do it by replacing the "\u" with "0x"
>>>> and then using numToCodepoint() to get the UTF-16 character. But there
>>>> could be many of these in the same string, so I'm looking for a
>>>> one-shot command that might just do them all. I don't think we have one.
>>>>
>>>> The alternative is to loop through all the text, getting an offset for
>>>> each "\u" and then calculating the number of characters after that to
>>>> use with numToCodepoint(). But will it always be 4 characters in any
>>>> language?
>>>>
>>>> Or is there an easier way?
>>>>
>>>>
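A minimal LiveCode sketch of the offset loop described above, assuming well-formed JSON input. The handler name decodeUnicodeEscapes is hypothetical. The fixed width comes from the JSON format itself (RFC 8259): every \u is followed by exactly four hex digits, and a character above U+FFFF, such as U+111F4, is written as a surrogate pair of two such escapes (\uD804\uDDF4). So the loop always consumes four digits and recombines pairs. It does not handle JSON's other escapes (\n, \", \\), so text containing an escaped backslash before a "u" would be mis-decoded.

```livecode
-- Hypothetical handler: decodes JSON-style \uXXXX escapes in pText.
-- Assumes exactly four hex digits per escape and no escaped backslashes.
function decodeUnicodeEscapes pText
   local tResult, tPos, tCode, tLow
   set the caseSensitive to true -- match "\u" only, not "\U"
   repeat
      put offset("\u", pText) into tPos
      if tPos is 0 then exit repeat
      -- copy the plain text before the escape, then read its 4 hex digits
      put char 1 to (tPos - 1) of pText after tResult
      put baseConvert(char (tPos + 2) to (tPos + 5) of pText, 16, 10) into tCode
      delete char 1 to (tPos + 5) of pText
      -- a high surrogate (0xD800-0xDBFF) followed by a low surrogate
      -- (0xDC00-0xDFFF) encodes one character above U+FFFF
      if tCode >= 55296 and tCode <= 56319 and char 1 to 2 of pText is "\u" then
         put baseConvert(char 3 to 6 of pText, 16, 10) into tLow
         if tLow >= 56320 and tLow <= 57343 then
            -- 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
            put 65536 + (tCode - 55296) * 1024 + (tLow - 56320) into tCode
            delete char 1 to 6 of pText
         end if
      end if
      put numToCodepoint(tCode) after tResult
   end repeat
   -- whatever follows the last escape is ordinary text
   return tResult & pText
end function
```

For example, decodeUnicodeEscapes("Eduardo Ba\u00f1uls") should return "Eduardo Bañuls": the "uls" is simply the rest of the name, so nothing needs to be knocked off the end once exactly four digits are consumed. Where a JSON parser is available, letting it decode the strings is likely the more robust route.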
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>
>


