How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Brian Milby brian at milby7.com
Sat Nov 3 12:27:39 EDT 2018


I've posted a binary stack that includes my version.  I cloned the
repository and made a "bwm" branch in my clone.  Here's the direct link to
the script with the posted code (updated to use private functions):

https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1009.livecodescript

The binary stack can be found here:

https://github.com/bwmilby/alloffsets/tree/bwm/bwm

There are three buttons across the top.  The first is Geoff's version.  The
second is my combined version.  The third is the one with private functions
added.  The first button replaces the contents of the results field.  The
second and third append their results to it.

The top field is the string to find (needle), the second is the string to
search (haystack), and the third is for the results.  Everything is in a
background group so you can add cards for unique searches.
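
For cross-checking the results field, a brute-force reference that walks
the haystack with offset() and its chars-to-skip parameter can be handy.
This is only a sketch, not one of the versions in the stack: the handler
name allOffsetsBrute and its variable names are made up for illustration,
and it will be slow on long Unicode text, as discussed further down the
thread.

```livecode
function allOffsetsBrute pNeedle, pHaystack, pCase
   -- hypothetical reference handler: returns a comma-delimited list of
   -- every offset of pNeedle in pHaystack, including overlapping ones,
   -- or 0 if there are none
   local tSkip, tFound, tResult
   set the caseSensitive to pCase is true
   put 0 into tSkip
   repeat forever
      -- offset() counts from the char after the skipped ones
      put offset(pNeedle, pHaystack, tSkip) into tFound
      if tFound is 0 then exit repeat
      -- convert to an absolute position; skipping up to and including
      -- the start of this match lets the next match overlap it
      add tFound to tSkip
      put tSkip & comma after tResult
   end repeat
   if tResult is empty then return 0
   delete char -1 of tResult
   return tResult
end allOffsetsBrute
```

With that, allOffsetsBrute("aba","abababa") should give 1,3,5 and
allOffsetsBrute("aaaa","aaaaaaaaa") should give 1,2,3,4,5,6, which are the
overlapping results the faster versions are aiming for.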

On Sat, Nov 3, 2018 at 9:17 AM Brian Milby <brian at milby7.com> wrote:

> Good catch Alex.  My code was closer, but didn't handle repeating
> characters correctly.  Here is an updated version.
>
> function allOffsets2 D,S,pCase
>    local dLength, C, R
>    -- returns a comma-delimited list of the offsets of D in S
>    set the caseSensitive to pCase is true
>    set the itemDel to D
>    put length(D) into dLength
>    put 1 - dLength into C
>
>    if dLength > 1 then
>       local n, i, j, D2, L2
>       put 0 into n
>       repeat with i = 2 to dLength
>          if char i to -1 of D is char 1 to -i of D then
>             add 1 to n
>             put char (1-i) to -1 of D into D2[n]
>             put i-1 into L2[n]
>          end if
>       end repeat
>    end if
>
>    repeat for each item i in S
>       if C > 0 and n > 0 then
>          repeat with j = 1 to n
>             if i&D begins with D2[j] then
>                put C+L2[j],"" after R
>             end if
>          end repeat
>       end if
>       add length(i) + dLength to C
>       put C,"" after R
>    end repeat
>    set the itemDel to comma
>    delete char -1 of R
>
>    if item -1 of R > len(S) then
>       if the number of items of R is 1 then
>          return 0
>       else
>          delete item -1 of R
>       end if
>    end if
>
>    if len(i) > 0 then
>       repeat with j = n down to len(i)+1
>          if char -len(D2[j]) to -1 of S is D2[j] then
>             delete item -1 of R
>          end if
>       end repeat
>    end if
>    return R
> end allOffsets2
>
>
> On Sat, Nov 3, 2018 at 8:33 AM Alex Tweedly via use-livecode <
> use-livecode at lists.runrev.com> wrote:
>
>> Hi Geoff,
>>
>> unfortunately the impact of overlapping delimiter strings is more severe
>> than simply not finding them. The code on github gets the wrong answer
>> if there is an overlapping string at the very end of the search string,
>> e.g.
>>
>> alloffsets("aaaa", "aaaaaaaaa")    wrongly gives  1,5,10
>>
>> I suspect the test for
>>
>>   if char -dLength to -1 of S is D then return char 1 to -2 of R
>> should be (something like)
>>    if item -1 of S is empty then return char 1 to -2 of R
>> but to be honest, I'm not 100% certain of that.
>>
>> Alex.
>>
>>
>>
>> On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
>> > I like that, changing it. Now available at
>> > https://github.com/gcanyon/alloffsets
>> >
>> > One thing I don't see how to do without significantly impacting
>> > performance is to return all offsets if there are overlapping strings.
>> > For example:
>> >
>> > allOffsets("aba","abababa")
>> >
>> > would return 1,5, when it might be reasonable to expect it to return
>> > 1,3,5. Using the offset function with numToSkip would make that easy;
>> > adapting allOffsets to do so would be harder to do cleanly, I think.
>> >
>> > gc
>> >
>> > On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
>> > use-livecode at lists.runrev.com> wrote:
>> >
>> >> how about allOffsets?
>> >>
>> >> Bob S
>> >>
>> >>
>> >>> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <
>> >> use-livecode at lists.runrev.com> wrote:
>> >>> All of those return a single value; I wanted to convey the concept of
>> >>> returning multiple values. To me listOffset implies it does the same
>> >>> thing as itemOffset, since items come in a list. How about:
>> >>>
>> >>> offsets -- not my favorite because it's almost indistinguishable from
>> >>> offset
>> >>> offsetsOf -- seems a tad clumsy
>> >>>
>> >>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
>> >>> use-livecode at lists.runrev.com> wrote:
>> >>>
>> >>>> It probably should be named listOffset, like itemOffset or
>> >>>> lineOffset.
>> >>>>
>> >>>> Bob S
>> >>>>
>> >>>>
>> >>>>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
>> >>>> use-livecode at lists.runrev.com> wrote:
>> >>>>> Nice! I *just* finished creating a GitHub repository for it, and
>> >>>>> adding support for multi-char search strings, much as you did. I was
>> >>>>> coming to the list to post the update when I saw your post.
>> >>>>>
>> >>>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
>> >>>>>
>> >>>>> Here's my updated version:
>> >>>>>
>> >>>>> function offsetList D,S,pCase
>> >>>>>   -- returns a comma-delimited list of the offsets of D in S
>> >>>>>   set the caseSensitive to pCase is true
>> >>>>>   set the itemDel to D
>> >>>>>   put length(D) into dLength
>> >>>>>   put 1 - dLength into C
>> >>>>>   repeat for each item i in S
>> >>>>>      add length(i) + dLength to C
>> >>>>>      put C,"" after R
>> >>>>>   end repeat
>> >>>>>   set the itemDel to comma
>> >>>>>   if char -dLength to -1 of S is D then return char 1 to -2 of R
>> >>>>>   put length(C) + 1 into lenC
>> >>>>>   put length(R) into lenR
>> >>>>>   if lenC = lenR then return 0
>> >>>>>   return char 1 to lenR - lenC - 1 of R
>> >>>>> end offsetList
>> >>>>>
>> >>>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
>> >>>>> use-livecode at lists.runrev.com> wrote:
>> >>>>>
>> >>>>>> Hi Geoff,
>> >>>>>>
>> >>>>>> thank you for this beautiful script.
>> >>>>>>
>> >>>>>> I modified it a bit to accept multi-character search strings and
>> >>>>>> also for case sensitivity.
>> >>>>>>
>> >>>>>> It definitely is a lot faster for Unicode text than anything I have
>> >>>>>> seen.
>> >>>>>> -----------------------------
>> >>>>>> function offsetList D,S, pCase
>> >>>>>>   -- returns a comma-delimited list of the offsets of D in S
>> >>>>>>   -- pCase is a boolean for caseSensitive
>> >>>>>>   set the caseSensitive to pCase
>> >>>>>>   set the itemDel to D
>> >>>>>>   put the length of D into tDelimLength
>> >>>>>>   repeat for each item i in S
>> >>>>>>      add length(i) + tDelimLength to C
>> >>>>>>      put C - (tDelimLength - 1),"" after R
>> >>>>>>   end repeat
>> >>>>>>   set the itemDel to comma
>> >>>>>>   if char -1 of S is D then return char 1 to -2 of R
>> >>>>>>   put length(C) + 1 into lenC
>> >>>>>>   put length(R) into lenR
>> >>>>>>   if lenC = lenR then return 0
>> >>>>>>   return char 1 to lenR - lenC - 1 of R
>> >>>>>> end offsetList
>> >>>>>> ------------------------------
>> >>>>>>
>> >>>>>> Kind regards
>> >>>>>> Bernd
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
>> >>>>>>> From: Geoff Canyon
>> >>>>>>> To: How to use LiveCode <use-livecode at lists.runrev.com>
>> >>>>>>> Subject: Re: How to find the offset of the last instance of a
>> >>>>>>>      repeating       character in a string?
>> >>>>>>>
>> >>>>>>> I was curious if using the itemDelimiter might work for this, so I
>> >>>>>>> wrote the below code out of curiosity; but in my quick testing with
>> >>>>>>> single-byte characters it was only about 30% faster than the above
>> >>>>>>> methods, so I didn't bother to post it.
>> >>>>>>>
>> >>>>>>> But Ben Rubinstein just posted about a terrible slow-down doing
>> >>>>>>> pretty much this same thing for text with unicode characters. So I
>> >>>>>>> ran a simple test with 8000-character strings that start with a
>> >>>>>>> single unicode character; this is about 15x faster than offset()
>> >>>>>>> with skip. For 100,000-character lines it's about 300x faster, so
>> >>>>>>> it seems to be immune to the line-painter issues skip is subject
>> >>>>>>> to. So for what it's worth:
>> >>>>>>>
>> >>>>>>> function offsetList D,S
>> >>>>>>> -- returns a comma-delimited list of the offsets of D in S
>> >>>>>>> set the itemDel to D
>> >>>>>>> repeat for each item i in S
>> >>>>>>>     add length(i) + 1 to C
>> >>>>>>>     put C,"" after R
>> >>>>>>> end repeat
>> >>>>>>> set the itemDel to comma
>> >>>>>>> if char -1 of S is D then return char 1 to -2 of R
>> >>>>>>> put length(C) + 1 into lenC
>> >>>>>>> put length(R) into lenR
>> >>>>>>> if lenC = lenR then return 0
>> >>>>>>> return char 1 to lenR - lenC - 1 of R
>> >>>>>>> end offsetList
>> >>>>>>>
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> use-livecode mailing list
>> >>>>>> use-livecode at lists.runrev.com
>> >>>>>> Please visit this url to subscribe, unsubscribe and manage your
>> >>>>>> subscription preferences:
>> >>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> >>>>>>
>
>


