How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Alex Tweedly alex at tweedly.net
Sat Nov 3 09:32:57 EDT 2018


Hi Geoff,

Unfortunately, the impact of overlapping delimiter strings is more severe
than simply not finding them. The code on GitHub gets the wrong answer
if there is an overlapping match at the very end of the string being
searched, e.g.

   alloffsets("aaaa", "aaaaaaaaa")

wrongly gives 1,5,10. The string is only 9 characters long, so there
cannot be a match at 10.
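
A slow but simple cross-check for results like this can be built on
offset() and its charsToSkip parameter. The handler below is a minimal
sketch: the name referenceOffsets and the pOverlap flag are made up for
illustration and are not part of the alloffsets repository. It assumes
pFind is not empty, and it is likely to be much slower than the
itemDelimiter approach on long Unicode strings.

```livecode
function referenceOffsets pFind, pString, pOverlap
   -- Hypothetical helper, not part of the alloffsets repository.
   -- Walks pString with offset() and its charsToSkip argument.
   local tSkip, tPos, tStart, tResult
   if pFind is empty then return 0
   put 0 into tSkip
   repeat forever
      -- offset() counts from the first character after the skipped ones
      put offset(pFind, pString, tSkip) into tPos
      if tPos is 0 then exit repeat
      put tSkip + tPos into tStart
      put tStart & comma after tResult
      if pOverlap is true then
         -- resume one character after the start of this match
         put tStart into tSkip
      else
         -- resume after the end of this match
         put tStart + length(pFind) - 1 into tSkip
      end if
   end repeat
   if tResult is empty then return 0
   -- drop the trailing comma
   return char 1 to -2 of tResult
end referenceOffsets
```

With pOverlap false, referenceOffsets("aaaa", "aaaaaaaaa") gives 1,5.
With pOverlap true, referenceOffsets("aba", "abababa") gives 1,3,5.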

I suspect the test

   if char -dLength to -1 of S is D then return char 1 to -2 of R

should be (something like)

   if item -1 of S is empty then return char 1 to -2 of R

but to be honest, I'm not 100% certain of that.
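
Another way to phrase the test, one that does not depend on what S ends
with, is to compare the final counter with the length of S: the last
entry is a real match only if D still fits at position C. What follows
is a minimal sketch, assuming D is not empty and keeping the loop from
the offsetList version quoted below. The name allOffsetsSketch is
hypothetical, and this has not been run against the GitHub code.

```livecode
function allOffsetsSketch D, S, pCase
   -- Sketch only: same loop as the quoted offsetList, different tail test.
   local dLength, C, R, lenC, lenR
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   repeat for each item i in S
      add length(i) + dLength to C
      put C & comma after R
   end repeat
   set the itemDel to comma
   if R is empty then return 0
   -- The last entry is a real match only if D fits at position C.
   if C + dLength - 1 <= length(S) then return char 1 to -2 of R
   -- Otherwise drop the last entry together with the trailing commas.
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end allOffsetsSketch
```

For allOffsetsSketch("aaaa", "aaaaaaaaa") the loop ends with C = 10 and
R = "1,5,10,". Since 10 + 4 - 1 is greater than 9, the last entry is
dropped, leaving 1,5.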

Alex.



On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
>
> gc
>
> On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
> use-livecode at lists.runrev.com> wrote:
>
>> how about allOffsets?
>>
>> Bob S
>>
>>
>>> On Nov 2, 2018, at 09:16, Geoff Canyon via use-livecode <use-livecode at lists.runrev.com> wrote:
>>> All of those return a single value; I wanted to convey the concept of
>>> returning multiple values. To me listOffset implies it does the same thing
>>> as itemOffset, since items come in a list. How about:
>>>
>>> offsets -- not my favorite because it's almost indistinguishable from offset
>>> offsetsOf -- seems a tad clumsy
>>>
>>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
>>> use-livecode at lists.runrev.com> wrote:
>>>
>>>> It probably should be named listOffset, like itemOffset or lineOffset.
>>>>
>>>> Bob S
>>>>
>>>>
>>>>> On Nov 1, 2018, at 17:04, Geoff Canyon via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>> Nice! I *just* finished creating a github repository for it, and adding
>>>>> support for multi-char search strings, much as you did. I was coming to the
>>>>> list to post the update when I saw your post.
>>>>>
>>>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
>>>>>
>>>>> Here's my updated version:
>>>>>
>>>>> function offsetList D,S,pCase
>>>>>   -- returns a comma-delimited list of the offsets of D in S
>>>>>   set the caseSensitive to pCase is true
>>>>>   set the itemDel to D
>>>>>   put length(D) into dLength
>>>>>   put 1 - dLength into C
>>>>>   repeat for each item i in S
>>>>>      add length(i) + dLength to C
>>>>>      put C,"" after R
>>>>>   end repeat
>>>>>   set the itemDel to comma
>>>>>   if char -dLength to -1 of S is D then return char 1 to -2 of R
>>>>>   put length(C) + 1 into lenC
>>>>>   put length(R) into lenR
>>>>>   if lenC = lenR then return 0
>>>>>   return char 1 to lenR - lenC - 1 of R
>>>>> end offsetList
>>>>>
>>>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
>>>>> use-livecode at lists.runrev.com> wrote:
>>>>>
>>>>>> Hi Geoff,
>>>>>>
>>>>>> thank you for this beautiful script.
>>>>>>
>>>>>> I modified it a bit to accept multi-character search string and also for
>>>>>> case sensitivity.
>>>>>>
>>>>>> It definitely is a lot faster for unicode text than anything I have seen.
>>>>>> -----------------------------
>>>>>> function offsetList D,S, pCase
>>>>>>   -- returns a comma-delimited list of the offsets of D in S
>>>>>>   -- pCase is a boolean for caseSensitive
>>>>>>   set the caseSensitive to pCase
>>>>>>   set the itemDel to D
>>>>>>   put the length of D into tDelimLength
>>>>>>   repeat for each item i in S
>>>>>>      add length(i) + tDelimLength to C
>>>>>>      put C - (tDelimLength - 1),"" after R
>>>>>>   end repeat
>>>>>>   set the itemDel to comma
>>>>>>   if char -1 of S is D then return char 1 to -2 of R
>>>>>>   put length(C) + 1 into lenC
>>>>>>   put length(R) into lenR
>>>>>>   if lenC = lenR then return 0
>>>>>>   return char 1 to lenR - lenC - 1 of R
>>>>>> end offsetList
>>>>>> ------------------------------
>>>>>>
>>>>>> Kind regards
>>>>>> Bernd
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
>>>>>>> From: Geoff Canyon
>>>>>>> To: How to use LiveCode <use-livecode at lists.runrev.com>
>>>>>>> Subject: Re: How to find the offset of the last instance of a
>>>>>>>      repeating       character in a string?
>>>>>>>
>>>>>>> I was curious if using the itemDelimiter might work for this, so I wrote
>>>>>>> the below code out of curiosity; but in my quick testing with single-byte
>>>>>>> characters it was only about 30% faster than the above methods, so I didn't
>>>>>>> bother to post it.
>>>>>>>
>>>>>>> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
>>>>>>> this same thing for text with unicode characters. So I ran a simple test
>>>>>>> with 8000 character long strings that start with a single unicode
>>>>>>> character, this is about 15x faster than offset() with skip. For
>>>>>>> 100,000-character lines it's about 300x faster, so it seems to be immune to
>>>>>>> the line-painter issues skip is subject to. So for what it's worth:
>>>>>>>
>>>>>>> function offsetList D,S
>>>>>>>   -- returns a comma-delimited list of the offsets of D in S
>>>>>>>   set the itemDel to D
>>>>>>>   repeat for each item i in S
>>>>>>>      add length(i) + 1 to C
>>>>>>>      put C,"" after R
>>>>>>>   end repeat
>>>>>>>   set the itemDel to comma
>>>>>>>   if char -1 of S is D then return char 1 to -2 of R
>>>>>>>   put length(C) + 1 into lenC
>>>>>>>   put length(R) into lenR
>>>>>>>   if lenC = lenR then return 0
>>>>>>>   return char 1 to lenR - lenC - 1 of R
>>>>>>> end offsetList
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> use-livecode mailing list
>>>>>> use-livecode at lists.runrev.com
>>>>>> Please visit this url to subscribe, unsubscribe and manage your
>>>>>> subscription preferences:
>>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>>>>>
