How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Brian Milby brian at milby7.com
Sat Nov 3 10:17:43 EDT 2018


Good catch, Alex.  My code was closer, but it didn't handle repeating
characters correctly.  Here is an updated version:

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C

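   -- If D can overlap itself (e.g. "aba" or "aaaa"), record each overlap:
   -- L2[n] is the shift between two overlapping matches, and D2[n] is the
   -- tail of D that must follow a match for the shifted match to exist.
   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if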

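   repeat for each item i in S
      -- C is the offset of the previous match (no match yet while C <= 0)
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            -- if the text after the previous match starts with D2[j], an
            -- overlapping match begins L2[j] characters after C
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      -- advance C to the start of the next occurrence of the delimiter
      add length(i) + dLength to C
      put C,"" after R
   end repeat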
   set the itemDel to comma
   delete char -1 of R

   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if

   if len(i) > 0 then
      repeat with j = n down to len(i)+1
         if char -len(D2[j]) to -1 of S is D2[j] then
            delete item -1 of R
         end if
      end repeat
   end if
   return R
end allOffsets2
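
For cross-checking results on small inputs, a brute-force version built on
offset() with its skip parameter (the approach Geoff mentions below) may be
useful. This is only a minimal sketch: the name allOffsetsBrute is
hypothetical and is not part of this thread or the GitHub repository, and it
is likely to be much slower than the itemDel approach on long or unicode
strings.

```livecode
function allOffsetsBrute D,S,pCase
   -- hypothetical reference version: returns a comma-delimited list of
   -- every offset of D in S, including overlapping matches, or 0 if none
   local tSkip, tPos, R
   set the caseSensitive to pCase is true
   put 0 into tSkip
   repeat forever
      -- offset() reports the position relative to the skipped characters
      put offset(D, S, tSkip) into tPos
      if tPos = 0 then exit repeat
      -- convert to an absolute position; skipping up to and including the
      -- first character of this match lets the next search find overlaps
      add tPos to tSkip
      put tSkip & comma after R
   end repeat
   if R is empty then return 0
   return char 1 to -2 of R
end allOffsetsBrute
```

With this sketch, allOffsetsBrute("aba","abababa") should return 1,3,5 and
allOffsetsBrute("aaaa","aaaaaaaaa") should return 1,2,3,4,5,6, which gives
something to compare allOffsets2 against.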


On Sat, Nov 3, 2018 at 8:33 AM Alex Tweedly via use-livecode <
use-livecode at lists.runrev.com> wrote:

> Hi Geoff,
>
> unfortunately the impact of overlapping delimiter strings is more severe
> than simply not finding them. The code on github gets the wrong answer
> if there is an overlapping string at the very end of the search string,
> e.g.
>
> alloffsets("aaaa", "aaaaaaaaa")    wrongly gives  1,5,10
>
> I suspect the test for
>
>   if char -dLength to -1 of S is D then return char 1 to -2 of R
> should be (something like)
>    if item -1 of S is empty then return char 1 to -2 of R
> but to be honest, I'm not 10% certain of that.
>
> Alex.
>
>
>
> On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
> > I like that, changing it. Now available at
> > https://github.com/gcanyon/alloffsets
> >
> > One thing I don't see how to do without significantly impacting performance
> > is to return all offsets if there are overlapping strings. For example:
> >
> > allOffsets("aba","abababa")
> >
> > would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> > Using the offset function with numToSkip would make that easy; adapting
> > allOffsets to do so would be harder to do cleanly I think.
> >
> > gc
> >
> > On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
> > use-livecode at lists.runrev.com> wrote:
> >
> >> how about allOffsets?
> >>
> >> Bob S
> >>
> >>
> >>> On Nov 2, 2018, at 09:16, Geoff Canyon via use-livecode <
> >>> use-livecode at lists.runrev.com> wrote:
> >>>
> >>> All of those return a single value; I wanted to convey the concept of
> >>> returning multiple values. To me listOffset implies it does the same thing
> >>> as itemOffset, since items come in a list. How about:
> >>>
> >>> offsets -- not my favorite because it's almost indistinguishable from offset
> >>> offsetsOf -- seems a tad clumsy
> >>>
> >>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
> >>> use-livecode at lists.runrev.com> wrote:
> >>>
> >>>> It probably should be named listOffset, like itemOffset or lineOffset.
> >>>>
> >>>> Bob S
> >>>>
> >>>>
> >>>>> On Nov 1, 2018, at 17:04, Geoff Canyon via use-livecode <
> >>>>> use-livecode at lists.runrev.com> wrote:
> >>>>>
> >>>>> Nice! I *just* finished creating a github repository for it, and adding
> >>>>> support for multi-char search strings, much as you did. I was coming to the
> >>>>> list to post the update when I saw your post.
> >>>>>
> >>>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
> >>>>>
> >>>>> Here's my updated version:
> >>>>>
> >>>>> function offsetList D,S,pCase
> >>>>>   -- returns a comma-delimited list of the offsets of D in S
> >>>>>   set the caseSensitive to pCase is true
> >>>>>   set the itemDel to D
> >>>>>   put length(D) into dLength
> >>>>>   put 1 - dLength into C
> >>>>>   repeat for each item i in S
> >>>>>      add length(i) + dLength to C
> >>>>>      put C,"" after R
> >>>>>   end repeat
> >>>>>   set the itemDel to comma
> >>>>>   if char -dLength to -1 of S is D then return char 1 to -2 of R
> >>>>>   put length(C) + 1 into lenC
> >>>>>   put length(R) into lenR
> >>>>>   if lenC = lenR then return 0
> >>>>>   return char 1 to lenR - lenC - 1 of R
> >>>>> end offsetList
> >>>>>
> >>>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
> >>>>> use-livecode at lists.runrev.com> wrote:
> >>>>>
> >>>>>> Hi Geoff,
> >>>>>>
> >>>>>> thank you for this beautiful script.
> >>>>>>
> >>>>>> I modified it a bit to accept a multi-character search string and also for
> >>>>>> case sensitivity.
> >>>>>>
> >>>>>> It definitely is a lot faster for unicode text than anything I have seen.
> >>>>>>
> >>>>>> -----------------------------
> >>>>>> function offsetList D,S, pCase
> >>>>>>   -- returns a comma-delimited list of the offsets of D in S
> >>>>>>   -- pCase is a boolean for caseSensitive
> >>>>>>   set the caseSensitive to pCase
> >>>>>>   set the itemDel to D
> >>>>>>   put the length of D into tDelimLength
> >>>>>>   repeat for each item i in S
> >>>>>>      add length(i) + tDelimLength to C
> >>>>>>      put C - (tDelimLength - 1),"" after R
> >>>>>>   end repeat
> >>>>>>   set the itemDel to comma
> >>>>>>   if char -1 of S is D then return char 1 to -2 of R
> >>>>>>   put length(C) + 1 into lenC
> >>>>>>   put length(R) into lenR
> >>>>>>   if lenC = lenR then return 0
> >>>>>>   return char 1 to lenR - lenC - 1 of R
> >>>>>> end offsetList
> >>>>>> ------------------------------
> >>>>>>
> >>>>>> Kind regards
> >>>>>> Bernd
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
> >>>>>>> From: Geoff Canyon
> >>>>>>> To: How to use LiveCode <use-livecode at lists.runrev.com>
> >>>>>>> Subject: Re: How to find the offset of the last instance of a repeating character in a string?
> >>>>>>>
> >>>>>>> I was curious if using the itemDelimiter might work for this, so I wrote
> >>>>>>> the below code out of curiosity; but in my quick testing with single-byte
> >>>>>>> characters it was only about 30% faster than the above methods, so I didn't
> >>>>>>> bother to post it.
> >>>>>>>
> >>>>>>> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
> >>>>>>> this same thing for text with unicode characters. So I ran a simple test
> >>>>>>> with 8000 character long strings that start with a single unicode
> >>>>>>> character, this is about 15x faster than offset() with skip. For
> >>>>>>> 100,000-character lines it's about 300x faster, so it seems to be immune to
> >>>>>>> the line-painter issues skip is subject to. So for what it's worth:
> >>>>>>>
> >>>>>>> function offsetList D,S
> >>>>>>> -- returns a comma-delimited list of the offsets of D in S
> >>>>>>> set the itemDel to D
> >>>>>>> repeat for each item i in S
> >>>>>>>     add length(i) + 1 to C
> >>>>>>>     put C,"" after R
> >>>>>>> end repeat
> >>>>>>> set the itemDel to comma
> >>>>>>> if char -1 of S is D then return char 1 to -2 of R
> >>>>>>> put length(C) + 1 into lenC
> >>>>>>> put length(R) into lenR
> >>>>>>> if lenC = lenR then return 0
> >>>>>>> return char 1 to lenR - lenC - 1 of R
> >>>>>>> end offsetList
> >>>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> use-livecode mailing list
> >>>>>> use-livecode at lists.runrev.com
> >>>>>> Please visit this url to subscribe, unsubscribe and manage your
> >>>>>> subscription preferences:
> >>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
> >>>>>>


