How to find offsets in Unicode Text fast

Richmond richmondmathewson at gmail.com
Sat Nov 10 14:41:03 EST 2018


I don't know who told you that ð was an Icelandic d.

The ð is called the "eth", and was used in Anglo-Saxon interchangeably with the
thorn to represent the 2 sounds that are now represented in English by the
digraph th.

Icelandic has retained the eth sign in that role.

In Icelandic the /d/ sound is represented by the letter d.

Richmond.


On 10.11.18 at 21:30, Geoff Canyon via use-livecode wrote:
> This is faster -- under some circumstances, much faster! Any idea why
> textEncoding suddenly fixes everything?
>
> On Sat, Nov 10, 2018 at 5:13 AM Niggemann, Bernd via use-livecode <
> use-livecode at lists.runrev.com> wrote:
>
>> This is a little late but there was a discussion about the slowness of
>> simple offset() when dealing with text that contains Unicode characters.
>>
>> Geoff Canyon and Brian Milby found a faster solution by setting the
>> itemDelimiter to the search string.
>> They even provided a way to find the position of substrings in the search
>> string which the offset() command does by design.
>>
>> Here I propose a variant of the offset() form that uses UTF16 to search,
>> easily adaptable to UTF32 if necessary.
>>
>> To test (as in Brian's testStack) add a unicode character to the text to
>> be searched e.g. at the end. Just any non-ASCII character to see the speed
>> penalty of simple offset(). I used ð (Icelandic d) or use any Chinese
>> character.
>>
>>
>> Kind regards
>> Bernd
>>
>> -------------------------------------------
>> function allOffsets pDelim, pString, pCaseSensitive
>>     local tNewPos, tPos, tResult
>>
>>     put textEncode(pDelim,"UTF16") into pDelim
>>     put textEncode(pString,"UTF16") into pString
>>
>>     set the caseSensitive to pCaseSensitive is true
>>     put 0 into tPos
>>     repeat forever
>>        put offset(pDelim, pString, tPos) into tNewPos
>>        if tNewPos = 0 then exit repeat
>>        add tNewPos to tPos
>>        put tPos div 2 + tPos mod 2,"" after tResult
>>     end repeat
>>     if tResult is empty then return 0
>>     else return char 1 to -2 of tResult
>> end allOffsets
>> -----------------------------------------
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode




