to find the first numeric line?
Richard Gaskin
ambassador at fourthworld.com
Mon Feb 4 12:39:07 EST 2013
Bravo.
Most xTalk binary search algos depend on line-counting, which still
requires considerable under-the-hood effort as the engine needs to
traverse chunks to count CRs, over and over again.
Your byte-based algo us much better - thanks for posting that.
--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
Follow me on Twitter: http://twitter.com/FourthWorldSys
Alex Tweedly wrote:
> So I went back, did the thinking, eliminated kBrute (and thankfully the
> need for an unnecessary assumption/requirement to limit line length). I
> also changed it so that when we are about to test, we always have a
> complete line identified - and so it is much easier to adapt the code
> for other purposes.
>
> And - I changed the problem definition from
> "Return the line number of the first line with a leading numeric"
> to
> "return the character number of the first character of the first
> line with a leading numeric"
>
> This is much more efficient to use outside the function, as well as more
> efficient inside the function. The caller can now do something like
>
> put getCharacterPosition(myData) into tPos
> repeat for each line L in (char tPos to -1 of myData)
> ...
>
> instead of
> put getLinePosition(myData) into tLPos
> repeat for each line L in (line tLPos to -1 of myData)
> ...
>
> or
> put line 1 of (char tPos to -1 of myData) into temp
> rather than
> put line tLPos of Mydata into temp
>
> Both of these are much more efficient (will save 10s or 100s of
> millisecs in large data sets).
>
> With these changes, the binary search can now do 10M lines of input in
> typically <1 msec
> Even the worst case I can think of (all 300 Million characters are in a
> single line, no CRs anywhere), it gets a result in 1800 msec.
>
> So - here's the code for anyone who needs an efficient binary search of
> large string data.
> -- Alex.
>
>> function f3 @pD
>> local theResult
>> local tLower, tUpper, tMid, c, t
>>
>> -- NB we are always going to maintain theResult as a feasible answer
>> -- if there is no 'leading numeric' line, we answer with a number
>> beyond the input data
>> put (the number of chars in pD) + 1 into theResult
>>
>> put 1 into tLower
>> put theResult into tUpper
>> put trunc( (tUpper + tlower ) / 2) into tMid
>>
>> repeat 10000 times
>> -- check for a CR in the upper half
>> put the offset(CR, char tMid to tUpper of pD) into t
>> if t = 0 or tMid + t = tUpper then
>> -- there is no CR in upper half
>> -- move tUpper down to near tMid.
>> -- Can't just move it down to exactly tMid because of the
>> corner case
>> -- where tMid to tUpper exactly spans the last line
>> put tMid+1 into tUpper
>> else
>> -- there is a CR in this upper half, move tMid to just beyond it
>> add t to tMid
>> -- now tMid points to the start of a line
>> -- do the test here - easy to adapt
>> put char tMid of pD into c
>> if c is a number then
>> put tMid into tUpper
>> put tMid into theResult
>> else --
>> put tMid into tLower
>> end if
>> end if
>>
>> put trunc( (tUpper + tLower ) / 2) into tMid
>> if tMid = tLower then
>> -- we're done, just decide if this single remaining line is a
>> candidate for the answer
>> if char tMid of pD is a number then -- do the test again !!
>> put tMid into theResult
>> end if
>> exit repeat
>> end if
>>
>> if tMid = tLower+1 then
>> -- tLower must point to the first char of an empty lie, so we
>> can increment over it
>> put tMid into tLower
>> end if
>> end repeat
>>
>> return theResult
>>
>> end f3
>
>
More information about the use-livecode
mailing list