How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Niggemann, Bernd Bernd.Niggemann at uni-wh.de
Thu Nov 1 11:27:53 EDT 2018


Hi Geoff,

Thank you for this beautiful script.

I modified it a bit to accept multi-character search strings and to support case sensitivity.

It is definitely a lot faster for Unicode text than anything I have seen.

-----------------------------
function offsetList D,S, pCase
   -- returns a comma-delimited list of the offsets of D in S
   -- pCase is a boolean for caseSensitive
   set the caseSensitive to pCase
   set the itemDel to D
   put the length of D into tDelimLength
   repeat for each item i in S
      add length(i) + tDelimLength to C
      put C - (tDelimLength - 1),"" after R
   end repeat
   set the itemDel to comma
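   -- if S ends with D, every entry in R is a real offset; just drop the trailing comma
   -- ("ends with" also covers a multi-character D, which a single char never equals)
   if S ends with D then return char 1 to -2 of R
   -- otherwise the last entry in R points past the end of S and must be removed;
   -- measure the entry as it was written (C - (tDelimLength - 1)), since C itself
   -- can have one more digit, e.g. C = 10 but the entry is 9
   put length(C - (tDelimLength - 1)) + 1 into lenC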
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
------------------------------
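
As a rough usage sketch for the original question (the last instance), one can take the last item of the returned list. The handler and the test string below are made up for illustration, and this assumes LiveCode 7 or later, where the itemDelimiter may be longer than one character.

-----------------------------
on mouseUp
   -- hypothetical test string: "ab" occurs at chars 3 and 7
   put "xxabyyabzz" into tText
   put offsetList("ab", tText, false) into tOffsets
   -- tOffsets should now hold 3,7
   -- the last item is the offset of the last instance;
   -- it is 0 when the search string does not occur at all
   put item -1 of tOffsets into tLast
   put tOffsets & return & tLast
end mouseUp
-----------------------------

Note that matches found this way do not overlap, because each match is consumed as an item delimiter.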

Kind regards
Bernd





> 
> Date: Thu, 1 Nov 2018 00:15:37 -0700
> From: Geoff Canyon
> To: How to use LiveCode <use-livecode at lists.runrev.com>
> Subject: Re: How to find the offset of the last instance of a repeating character in a string?
> 
> I was curious whether using the itemDelimiter might work for this, so I
> wrote the code below; but in my quick testing with single-byte
> characters it was only about 30% faster than the above methods, so I didn't
> bother to post it.
> 
> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
> this same thing for text with Unicode characters. So I ran a simple test:
> with 8,000-character strings that start with a single Unicode character,
> this is about 15x faster than offset() with skip. For 100,000-character
> lines it's about 300x faster, so it seems to be immune to the line-painter
> issues skip is subject to. So for what it's worth:
> 
> function offsetList D,S
>   -- returns a comma-delimited list of the offsets of D in S
>   set the itemDel to D
>   repeat for each item i in S
>      add length(i) + 1 to C
>      put C,"" after R
>   end repeat
>   set the itemDel to comma
>   if char -1 of S is D then return char 1 to -2 of R
>   put length(C) + 1 into lenC
>   put length(R) into lenR
>   if lenC = lenR then return 0
>   return char 1 to lenR - lenC - 1 of R
> end offsetList
> 




