How to find offsets in Unicode Text fast
richmondmathewson at gmail.com
Sat Nov 10 14:41:03 EST 2018
I don't know who told you that ð was an Icelandic d.
The ð is called the "eth", and was used in Anglo-Saxon interchangeably with
thorn (þ) to represent the two sounds that are now represented in English by "th".
As such Icelandic has retained the eth sign.
In Icelandic the /d/ sound is represented by the letter d.
On 10.11.18 г. 21:30 ч., Geoff Canyon via use-livecode wrote:
> This is faster -- under some circumstances, much faster! Any idea why
> textEncoding suddenly fixes everything?
> On Sat, Nov 10, 2018 at 5:13 AM Niggemann, Bernd via use-livecode <
> use-livecode at lists.runrev.com> wrote:
>> This is a little late but there was a discussion about the slowness of
>> simple offset() when dealing with text that contains Unicode characters.
>> Geoff Canyon and Brian Milby found a faster solution by setting the
>> itemDelimiter to the search string.
>> They even provided a way to find the position of substrings in the search
>> string which the offset() command does by design.
>> Here I propose a variant of the offset() form that uses UTF16 to search,
>> easily adaptable to UTF32 if necessary.
>> To test (as in Brian's testStack) add a Unicode character to the text to
>> be searched, e.g. at the end. Any non-ASCII character will show the speed
>> penalty of simple offset(). I used ð (Icelandic d), or use any Chinese character.
>> Kind regards
>> function allOffsets pDelim, pString, pCaseSensitive
>>    local tNewPos, tPos, tResult
>>    -- encode both strings as UTF16 so that offset() searches binary data
>>    put textEncode(pDelim,"UTF16") into pDelim
>>    put textEncode(pString,"UTF16") into pString
>>    set the caseSensitive to pCaseSensitive is true
>>    put 0 into tPos
>>    repeat forever
>>       -- tPos is the number of bytes to skip; the result is relative to it
>>       put offset(pDelim, pString, tPos) into tNewPos
>>       if tNewPos = 0 then exit repeat
>>       add tNewPos to tPos
>>       -- convert the byte position to a character position (2 bytes per char)
>>       put tPos div 2 + tPos mod 2,"" after tResult
>>    end repeat
>>    if tResult is empty then return 0
>>    else return char 1 to -2 of tResult
>> end allOffsets
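
For anyone wanting to check the byte arithmetic in allOffsets outside LiveCode, here is a rough Python sketch of the same idea. It is not LiveCode and is not Bernd's code: the name all_offsets is made up for illustration, it assumes little-endian UTF-16 without a byte-order mark, and it leaves out the caseSensitive handling. It encodes both strings, searches the raw bytes, and maps each 1-based byte position to a 1-based character position with the same "div 2 + mod 2" step.

```python
from typing import List


def all_offsets(needle: str, haystack: str) -> List[int]:
    """Return 1-based character positions of needle in haystack.

    Hypothetical helper: mirrors the quoted allOffsets by searching
    UTF-16 bytes instead of characters.
    """
    n = needle.encode("utf-16-le")    # assumption: little-endian, no BOM
    h = haystack.encode("utf-16-le")
    positions = []
    pos = 0  # 1-based byte position of the last match (0 = none yet)
    while True:
        # Start just past the first byte of the previous match,
        # like passing tPos as the skip argument of offset().
        idx = h.find(n, pos)
        if idx == -1:
            break
        pos = idx + 1  # convert the 0-based index to a 1-based byte position
        # Two bytes per UTF-16 code unit: bytes 1-2 are char 1, 3-4 are char 2, ...
        positions.append(pos // 2 + pos % 2)
    return positions


print(all_offsets("bc", "abcabcð"))
```

The positions count UTF-16 code units, so they line up with character positions only while the text stays within the Basic Multilingual Plane; beyond that, the UTF32 variant mentioned above would be the safer choice.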
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences: