How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)
Brian Milby
brian at milby7.com
Sat Nov 3 10:17:43 EDT 2018
Good catch, Alex. My code was closer, but didn't handle repeating
characters correctly. Here is an updated version.
function allOffsets2 D,S,pCase
   -- returns a comma-delimited list of the offsets of D in S,
   -- including overlapping matches; returns 0 if there are none
   local dLength, C, R, n, i, j, D2, L2
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   -- n must be set even when D is a single character
   put 0 into n
   -- record every shift at which D overlaps itself: D2[n] holds the last
   -- L2[n] chars of D, which must follow a match for another match to
   -- start L2[n] chars after it
   if dLength > 1 then
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if
   repeat for each item i in S
      -- C is the offset of the previous split match (none yet while C < 1)
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R
   -- drop trailing offsets whose match would run past the end of S: the
   -- final running total (len(S)+1) and any overlaps tested against it
   repeat while R is not empty and item -1 of R > len(S) - dLength + 1
      delete item -1 of R
   end repeat
   if R is empty then return 0
   return R
end allOffsets2
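
To check allOffsets2 against known answers outside the IDE, a model of the same approach can help. The Python below is a language swap for testing, not code from the thread. It assumes `repeat for each item` splits left to right on non-overlapping copies of the delimiter and that one trailing delimiter adds no empty item. It compares case-sensitively only, and the function names are hypothetical. The (shift, tail) table plays the role of D2/L2. The end-of-string cleanup uses a single rule: drop trailing offsets whose match would run past the end of S.

```python
def livecode_items(s: str, d: str) -> list:
    """Assumed model of `repeat for each item` with the itemDelimiter set to d."""
    if s == "":
        return []
    parts = s.split(d)      # left to right, non-overlapping
    if parts[-1] == "":     # one trailing delimiter adds no empty item
        parts.pop()
    return parts


def all_offsets2_model(d: str, s: str) -> str:
    """Case-sensitive 1-based offsets of d in s, overlaps included, or "0"."""
    n = len(d)
    if n == 0:
        return "0"
    # (shift, tail) pairs: d overlaps itself at `shift` when d[shift:] equals
    # d[:n - shift]; text after a match that starts with `tail` (the last
    # `shift` chars of d) completes a second, overlapping match.
    table = [(k, d[n - k:]) for k in range(1, n) if d[k:] == d[:n - k]]
    c = 1 - n               # offset of the previous split match (none yet)
    r = []
    for item in livecode_items(s, d):
        if c > 0:
            for shift, tail in table:
                if (item + d).startswith(tail):
                    r.append(c + shift)
        c += len(item) + n
        r.append(c)
    # The last running total (len(s) + 1) and overlaps tested against it
    # describe matches that would end past the end of s.
    while r and r[-1] > len(s) - n + 1:
        r.pop()
    return ",".join(map(str, r)) if r else "0"


print(all_offsets2_model("aba", "abababa"))     # 1,3,5
print(all_offsets2_model("aaaa", "aaaaaaaaa"))  # 1,2,3,4,5,6
```

In this model only shifts at which D matches itself are ever tested. A delimiter with no self-overlap, such as "abc", therefore takes the plain single-pass path.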
On Sat, Nov 3, 2018 at 8:33 AM Alex Tweedly via use-livecode <
use-livecode at lists.runrev.com> wrote:
> Hi Geoff,
>
> unfortunately the impact of overlapping delimiter strings is more severe
> than simply not finding them. The code on github gets the wrong answer
> if there is an overlapping string at the very end of the search string,
> e.g.
>
> alloffsets("aaaa", "aaaaaaaaa") wrongly gives 1,5,10
>
> I suspect the test for
>
> if char -dLength to -1 of S is D then return char 1 to -2 of R
> should be (something like)
> if item -1 of S is empty then return char 1 to -2 of R
> but to be honest, I'm not 100% certain of that.
>
> Alex.
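
Alex's failing case is easy to reproduce with a model of the single-pass handler. The Python sketch below is a language swap for illustration, and its function names are hypothetical. It makes the same assumption about how items split: left to right, non-overlapping, and no empty item for one trailing delimiter. Under that assumption, the running total always ends with one extra entry, len(S) + 1, unless the split consumed a delimiter at the very end of S. Testing the last characters of S against D cannot tell those cases apart when D overlaps itself. Comparing the last entry with len(S) can.

```python
def livecode_items(s: str, d: str) -> list:
    """Assumed model of `repeat for each item` with the itemDelimiter set to d."""
    if s == "":
        return []
    parts = s.split(d)
    if parts[-1] == "":
        parts.pop()
    return parts


def running_offsets(d: str, s: str) -> list:
    """The values R collects before any trimming: one entry per item."""
    c = 1 - len(d)
    r = []
    for item in livecode_items(s, d):
        c += len(item) + len(d)
        r.append(c)
    return r


def all_offsets_fixed(d: str, s: str) -> str:
    """Non-overlapping 1-based offsets of d in s, or "0"."""
    if not d:
        return "0"
    r = running_offsets(d, s)
    if r and r[-1] > len(s):   # the phantom total, not a real match
        r.pop()
    return ",".join(map(str, r)) if r else "0"


print(running_offsets("aaaa", "aaaaaaaaa"))    # [1, 5, 10]
print(all_offsets_fixed("aaaa", "aaaaaaaaa"))  # 1,5
```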
>
>
>
> On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
> > I like that, changing it. Now available at
> > https://github.com/gcanyon/alloffsets
> >
> > One thing I don't see how to do without significantly impacting performance
> > is to return all offsets if there are overlapping strings. For example:
> >
> > allOffsets("aba","abababa")
> >
> > would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> > Using the offset function with numToSkip would make that easy; adapting
> > allOffsets to do so would be harder to do cleanly I think.
> >
> > gc
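
Geoff's numToSkip remark can be sketched as a loop. LiveCode's offset(charsToFind, stringToSearch, charsToSkip) returns a position relative to the skip point, so each hit's absolute offset is skip + result. Advancing the skip only to the start of the last hit finds overlapping matches; advancing it past the whole hit does not. The Python below models that behaviour. It is a language swap, and lc_offset and offsets_with_skip are hypothetical names. It is mainly useful for checking results, since later in the thread offset() with a skip is reported to slow down badly on long Unicode strings.

```python
def lc_offset(d: str, s: str, skip: int = 0) -> int:
    """Model of offset(d, s, skip): 1-based position relative to the skip
    point, or 0 when d does not occur after the first `skip` characters."""
    i = s.find(d, skip)
    return 0 if i == -1 else i - skip + 1


def offsets_with_skip(d: str, s: str, overlap: bool = True) -> str:
    """1-based offsets of d in s as a comma-delimited string, or "0"."""
    if not d:
        return "0"
    hits = []
    skip = 0
    while True:
        rel = lc_offset(d, s, skip)
        if rel == 0:
            break
        pos = skip + rel    # absolute offset of this hit
        hits.append(pos)
        # Overlapping: next search starts one char after this hit's start.
        # Non-overlapping: next search starts one char after this hit's end.
        skip = pos if overlap else pos + len(d) - 1
    return ",".join(map(str, hits)) if hits else "0"


print(offsets_with_skip("aba", "abababa"))                 # 1,3,5
print(offsets_with_skip("aba", "abababa", overlap=False))  # 1,5
```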
> >
> > On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
> > use-livecode at lists.runrev.com> wrote:
> >
> >> how about allOffsets?
> >>
> >> Bob S
> >>
> >>
> >>> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <
> >>> use-livecode at lists.runrev.com> wrote:
> >>> All of those return a single value; I wanted to convey the concept of
> >>> returning multiple values. To me listOffset implies it does the same thing
> >>> as itemOffset, since items come in a list. How about:
> >>>
> >>> offsets -- not my favorite because it's almost indistinguishable from offset
> >>> offsetsOf -- seems a tad clumsy
> >>>
> >>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
> >>> use-livecode at lists.runrev.com> wrote:
> >>>
> >>>> It probably should be named listOffset, like itemOffset or lineOffset.
> >>>>
> >>>> Bob S
> >>>>
> >>>>
> >>>>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
> >>>>> use-livecode at lists.runrev.com> wrote:
> >>>>> Nice! I *just* finished creating a github repository for it, and adding
> >>>>> support for multi-char search strings, much as you did. I was coming to
> >>>>> the list to post the update when I saw your post.
> >>>>>
> >>>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
> >>>>>
> >>>>> Here's my updated version:
> >>>>>
> >>>>> function offsetList D,S,pCase
> >>>>>    -- returns a comma-delimited list of the offsets of D in S
> >>>>>    set the caseSensitive to pCase is true
> >>>>>    set the itemDel to D
> >>>>>    put length(D) into dLength
> >>>>>    put 1 - dLength into C
> >>>>>    repeat for each item i in S
> >>>>>       add length(i) + dLength to C
> >>>>>       put C,"" after R
> >>>>>    end repeat
> >>>>>    set the itemDel to comma
> >>>>>    if char -dLength to -1 of S is D then return char 1 to -2 of R
> >>>>>    put length(C) + 1 into lenC
> >>>>>    put length(R) into lenR
> >>>>>    if lenC = lenR then return 0
> >>>>>    return char 1 to lenR - lenC - 1 of R
> >>>>> end offsetList
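
The lenC/lenR lines remove the final entry of R without switching the itemDelimiter again. That entry is C itself, so it occupies length(C) characters plus its trailing comma, and `char 1 to lenR - lenC - 1` also drops the comma before it. The trim relies on the last entry being exactly C, as it is in this version. Below is a small Python mirror of that string arithmetic (trim_last_entry is a hypothetical helper; LiveCode's `char 1 to N of R` corresponds to the slice `r[:N]`):

```python
def trim_last_entry(r: str, c: int) -> str:
    """r is a comma-terminated list such as "1,5,10," whose last entry is c."""
    len_c = len(str(c)) + 1          # digits of c plus the comma after it
    len_r = len(r)
    if len_c == len_r:               # c was the only entry, so no real match
        return "0"
    return r[: len_r - len_c - 1]    # also drops the comma before c


print(trim_last_entry("1,5,10,", 10))  # 1,5
print(trim_last_entry("4,", 4))        # 0
```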
> >>>>>
> >>>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
> >>>>> use-livecode at lists.runrev.com> wrote:
> >>>>>
> >>>>>> Hi Geoff,
> >>>>>>
> >>>>>> thank you for this beautiful script.
> >>>>>>
> >>>>>> I modified it a bit to accept a multi-character search string and to
> >>>>>> handle case sensitivity.
> >>>>>>
> >>>>>> It definitely is a lot faster for unicode text than anything I have seen.
> >>>>>> -----------------------------
> >>>>>> function offsetList D,S, pCase
> >>>>>>    -- returns a comma-delimited list of the offsets of D in S
> >>>>>>    -- pCase is a boolean for caseSensitive
> >>>>>>    set the caseSensitive to pCase
> >>>>>>    set the itemDel to D
> >>>>>>    put the length of D into tDelimLength
> >>>>>>    repeat for each item i in S
> >>>>>>       add length(i) + tDelimLength to C
> >>>>>>       put C - (tDelimLength - 1),"" after R
> >>>>>>    end repeat
> >>>>>>    set the itemDel to comma
> >>>>>>    if char -1 of S is D then return char 1 to -2 of R
> >>>>>>    put length(C) + 1 into lenC
> >>>>>>    put length(R) into lenR
> >>>>>>    if lenC = lenR then return 0
> >>>>>>    return char 1 to lenR - lenC - 1 of R
> >>>>>> end offsetList
> >>>>>> ------------------------------
> >>>>>>
> >>>>>> Kind regards
> >>>>>> Bernd
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
> >>>>>>> From: Geoff Canyon
> >>>>>>> To: How to use LiveCode <use-livecode at lists.runrev.com>
> >>>>>>> Subject: Re: How to find the offset of the last instance of a
> >>>>>>> repeating character in a string?
> >>>>>>>
> >>>>>>> I was curious whether using the itemDelimiter might work for this, so
> >>>>>>> I wrote the code below; but in my quick testing with single-byte
> >>>>>>> characters it was only about 30% faster than the above methods, so I
> >>>>>>> didn't bother to post it.
> >>>>>>>
> >>>>>>> But Ben Rubinstein just posted about a terrible slow-down doing pretty
> >>>>>>> much this same thing for text with unicode characters. So I ran a
> >>>>>>> simple test with 8000-character strings that start with a single
> >>>>>>> unicode character: this is about 15x faster than offset() with skip.
> >>>>>>> For 100,000-character lines it's about 300x faster, so it seems to be
> >>>>>>> immune to the line-painter issues skip is subject to. So for what it's
> >>>>>>> worth:
> >>>>>>>
> >>>>>>> function offsetList D,S
> >>>>>>>    -- returns a comma-delimited list of the offsets of D in S
> >>>>>>>    set the itemDel to D
> >>>>>>>    repeat for each item i in S
> >>>>>>>       add length(i) + 1 to C
> >>>>>>>       put C,"" after R
> >>>>>>>    end repeat
> >>>>>>>    set the itemDel to comma
> >>>>>>>    if char -1 of S is D then return char 1 to -2 of R
> >>>>>>>    put length(C) + 1 into lenC
> >>>>>>>    put length(R) into lenR
> >>>>>>>    if lenC = lenR then return 0
> >>>>>>>    return char 1 to lenR - lenC - 1 of R
> >>>>>>> end offsetList
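
The core idea of this version is that every item ends one character before a delimiter, so the delimiter positions are the running totals of length(item) + 1. For a single-character delimiter, the `char -1 of S is D` test is exact, because one-character matches cannot overlap. The split consumes a trailing D whenever S ends with one, and only then is the last total a real match. Below is a compact Python model; it is a language swap, it assumes the same item-splitting behaviour as LiveCode, and offset_list_1 is a hypothetical name:

```python
from itertools import accumulate


def offset_list_1(d: str, s: str) -> str:
    """1-based offsets of the single character d in s, or "0"."""
    items = s.split(d)
    if items[-1] == "":        # one trailing delimiter adds no empty item
        items.pop()
    totals = list(accumulate(len(item) + 1 for item in items))
    if not s.endswith(d):      # last total is len(s) + 1, not a match
        totals = totals[:-1]
    return ",".join(map(str, totals)) if totals else "0"


print(offset_list_1(",", "a,b,c"))  # 2,4
print(offset_list_1(",", "a,b,"))   # 2,4
```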
> >>>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> use-livecode mailing list
> >>>>>> use-livecode at lists.runrev.com
> >>>>>> Please visit this url to subscribe, unsubscribe and manage your
> >>>>>> subscription preferences:
> >>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
> >>>>>>