How to find offsets in Unicode Text fast

Geoff Canyon gcanyon at gmail.com
Sat Nov 10 14:30:11 EST 2018


This is faster -- under some circumstances, much faster! Any idea why
running the text through textEncode() suddenly fixes everything?
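The quoted message below describes a test: add one non-ASCII character to the text and compare plain offset() against a search on UTF-16 data. Below is a minimal LiveCode sketch of that test. The helper compareOffsetTimes and the sample text are hypothetical, not from the thread. The UTF-16 timing includes the cost of textEncode(). Converting the byte position back to a position assumes the first hit lies on a UTF-16 code unit boundary. Actual timings will depend on the text, the LiveCode version and the machine.

-------------------------------------------
-- compareOffsetTimes is a hypothetical helper, not from the thread.
-- It returns "plainPos,utf16Pos,plainMs,utf16Ms" for one search.
function compareOffsetTimes pNeedle, pText
   local tStart, tPlainPos, tPlainMs, tBytePos, tUTF16Ms

   -- 1. plain offset() on the Unicode text
   put the milliseconds into tStart
   put offset(pNeedle, pText) into tPlainPos
   put the milliseconds - tStart into tPlainMs

   -- 2. offset() on UTF-16 binary data (encoding time included)
   put the milliseconds into tStart
   put offset(textEncode(pNeedle, "UTF16"), textEncode(pText, "UTF16")) into tBytePos
   put the milliseconds - tStart into tUTF16Ms

   -- byte position 2k-1 is code unit k; a result of 0 stays 0
   -- (assumes the first hit is aligned on a code unit boundary)
   return tPlainPos & comma & ((tBytePos + 1) div 2) & comma & \
         tPlainMs & comma & tUTF16Ms
end compareOffsetTimes

on mouseUp
   local tText
   put "The quick brown fox jumps over the lazy dog. " into tText
   repeat 12 times
      put tText after tText
   end repeat
   -- append one non-ASCII character, as suggested in the thread (ð, U+00F0)
   put numToCodepoint(240) after tText
   -- with no container, put shows the result in the message box
   put compareOffsetTimes(numToCodepoint(240), tText)
end mouseUp
-------------------------------------------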

On Sat, Nov 10, 2018 at 5:13 AM Niggemann, Bernd via use-livecode <use-livecode at lists.runrev.com> wrote:

> This is a little late, but there was a discussion about the slowness of
> simple offset() when dealing with text that contains Unicode characters.
>
> Geoff Canyon and Brian Milby found a faster solution by setting the
> itemDelimiter to the search string.
> They even provided a way to find the position of substrings within the
> search string, something the offset() function does by design.
>
> Here I propose a variant of offset() that searches the UTF-16 encoded
> text; it is easily adapted to UTF-32 if necessary.
>
> To test (as in Brian's testStack), add a Unicode character to the text to
> be searched, e.g. at the end. Any non-ASCII character will show the speed
> penalty of simple offset(). I used ð (Icelandic eth), but any Chinese
> character works as well.
>
>
> Kind regards
> Bernd
>
> -------------------------------------------
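> function allOffsets pDelim, pString, pCaseSensitive
>    local tNewPos, tPos, tResult
>
>    -- search raw UTF-16 bytes instead of Unicode characters
>    put textEncode(pDelim,"UTF16") into pDelim
>    put textEncode(pString,"UTF16") into pString
>
>    set the caseSensitive to pCaseSensitive is true
>    put 0 into tPos
>    repeat forever
>       -- the third parameter skips tPos bytes, so tNewPos is relative to tPos
>       put offset(pDelim, pString, tPos) into tNewPos
>       if tNewPos = 0 then exit repeat
>       add tNewPos to tPos
>       -- each UTF-16 code unit starts on an odd byte (1, 3, 5, ...);
>       -- a hit on an even byte spans two code units, so skip it
>       if tPos mod 2 = 1 then
>          -- convert the byte position to a code unit index
>          put tPos div 2 + tPos mod 2,"" after tResult
>       end if
>    end repeat
>    if tResult is empty then return 0
>    else return char 1 to -2 of tResult
> end allOffsets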
> -----------------------------------------
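Bernd notes that the function is easily adapted to UTF-32. Below is a minimal sketch of that adaptation. It assumes textEncode() accepts "UTF32" the same way it accepts "UTF16", and that offset() compares the encoded data byte by byte. The name allOffsetsUTF32 is hypothetical. In UTF-32 every code point takes exactly 4 bytes, so a hit is genuine only when it starts on byte 1, 5, 9, and so on. Hits at other byte positions straddle two code points and are skipped. The returned positions are code point indexes. These can differ from LiveCode char positions when the text contains combining sequences.

-------------------------------------------
-- allOffsetsUTF32 is a hypothetical name for a UTF-32 adaptation.
-- Returns a comma-separated list of code point positions, or 0.
function allOffsetsUTF32 pDelim, pString, pCaseSensitive
   local tNewPos, tPos, tResult

   -- every code point becomes exactly 4 bytes
   put textEncode(pDelim, "UTF32") into pDelim
   put textEncode(pString, "UTF32") into pString

   set the caseSensitive to pCaseSensitive is true
   put 0 into tPos
   repeat forever
      -- search after the previous hit; tNewPos is relative to tPos
      put offset(pDelim, pString, tPos) into tNewPos
      if tNewPos = 0 then exit repeat
      add tNewPos to tPos
      -- code point k starts at byte 4k-3; other hits straddle code points
      if tPos mod 4 = 1 then
         put (tPos + 3) div 4,"" after tResult
      end if
   end repeat
   if tResult is empty then return 0
   else return char 1 to -2 of tResult
end allOffsetsUTF32
-------------------------------------------

Because each search resumes one byte after the previous hit, overlapping occurrences are still found, as with Bernd's UTF-16 version.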


