to find the first numeric line?

Richard Gaskin ambassador at fourthworld.com
Mon Feb 4 12:39:07 EST 2013


Bravo.

Most xTalk binary search algos depend on line-counting, which still 
requires considerable under-the-hood effort as the engine needs to 
traverse chunks to count CRs, over and over again.

Your byte-based algo us much better - thanks for posting that.

--
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  Follow me on Twitter:  http://twitter.com/FourthWorldSys


Alex Tweedly wrote:
> So I went back, did the thinking, eliminated kBrute (and thankfully the
> need for an unnecessary assumption/requirement to limit line length).  I
> also changed it so that when we are about to test, we always have a
> complete line identified - and so it is much easier to adapt the code
> for other purposes.
>
> And - I changed the problem definition from
>      "Return the line number of the first line with a leading numeric"
> to
>      "return the character number of the first character of the first
> line with a leading numeric"
>
> This is much more efficient to use outside the function, as well as more
> efficient inside the function. The caller can now do something like
>
>      put getCharacterPosition(myData) into tPos
>      repeat for each line L in (char tPos to -1 of myData)
>         ...
>
> instead of
>     put getLinePosition(myData) into tLPos
>     repeat for each line L in (line tLPos to -1 of myData)
>         ...
>
> or
>     put line 1 of (char tPos to -1 of myData) into temp
> rather than
>    put line tLPos of Mydata into temp
>
> Both of these are much more efficient (will save 10s or 100s of
> millisecs in large data sets).
>
> With these changes, the binary search can now do 10M lines of input in
> typically <1 msec
> Even the worst case I can think of (all 300 Million characters are in a
> single line, no CRs anywhere), it gets a result in 1800 msec.
>
> So - here's the code for anyone who needs an efficient binary search of
> large string data.
> -- Alex.
>
>> function f3 @pD
>>    local theResult
>>    local tLower, tUpper, tMid, c, t
>>
>>    -- NB we are always going to maintain theResult as a feasible answer
>>    -- if there is no 'leading numeric' line, we answer with a number
>> beyond the input data
>>    put (the number of chars in pD) + 1 into theResult
>>
>>    put 1 into tLower
>>    put theResult into tUpper
>>    put trunc( (tUpper + tlower ) / 2) into tMid
>>
>>    repeat 10000 times
>>       -- check for a CR in the upper half
>>       put the offset(CR, char tMid to tUpper of pD) into t
>>       if t = 0 or tMid + t = tUpper then
>>          -- there is no CR in upper half
>>          -- move tUpper down to near tMid.
>>          -- Can't just move it down to exactly tMid because of the
>> corner case
>>          -- where tMid to tUpper exactly spans the last line
>>          put tMid+1 into tUpper
>>       else
>>          -- there is a CR in this upper half, move tMid to just beyond it
>>          add t to tMid
>>          -- now tMid points to the start of a line
>>          -- do the test here - easy to adapt
>>          put char tMid of pD into c
>>          if c is a number then
>>             put tMid into tUpper
>>             put tMid into theResult
>>          else   --
>>             put tMid into tLower
>>          end if
>>       end if
>>
>>       put trunc( (tUpper + tLower ) / 2) into tMid
>>       if tMid = tLower then
>>          -- we're done, just decide if this single remaining line is a
>> candidate for the answer
>>          if char tMid of pD is a number then  -- do the test again !!
>>             put tMid into theResult
>>          end if
>>          exit repeat
>>       end if
>>
>>       if tMid = tLower+1 then
>>          -- tLower must point to the first char of an empty lie, so we
>> can increment over it
>>          put tMid into tLower
>>       end if
>>    end repeat
>>
>>    return theResult
>>
>> end f3
>
>




More information about the use-livecode mailing list