to find the first numeric line?

Alex Tweedly alex at tweedly.net
Sat Feb 2 04:08:06 EST 2013


There may well be a bug in my code - I wrote it between midnight and 1am 
last night, tested it on 50 random cases, and let it be free :-)    I'll 
look over it carefully this morning and see if I find anything.

But no, there isn't a problem with many short lines. We have a block of 
data where the first instance of a leading digit (if there is one) 
occurs; that block is (inclusively) between tLower and tUpper. We don't 
care how many lines there are in that block -only that the first leading 
digit is within it. There's no need to decrease kBrute to deal with 
short lines, because we still do a full search (in this case using 
filter) when we get down to a block of size kBrute.

It would actually be more efficient to decrease kBrute, because then 
we'd do the efficient binary reduction further, until we switch to the 
slower - but easier - brute force method. The purpose of kBrute is to 
avoid a couple of corner cases that I didn't want to think through :-)  
When we calculate a new tMid, we then need to find the start of a line 
near to it, so that we can decide whether to adjust the lower or upper 
limit. Easiest way to do that is to assume we are somewhere in the 
middle of a line, and use the following line (hence the use of line *2* 
of the remaining chunk). However, this means that we need to be sure the 
new 1/2-block contains a CR - hence the need for kBrute.

[Aside - I said "assume we are in the middle of a line" - even if we are 
actually at the start or end of it, it sill works OK to use the 
following line, so it's not a critical assumption]

So - I'll go check the code again (and probably just handle this corner 
case to eliminate kBrute - that should make it marginally faster as well).

-- Alex.

On 02/02/2013 04:13, J. Landman Gay wrote:
> I'm glad you did this, it's interesting. I'm not sure about the binary 
> results. Doesn't the repeat loop exit early if the search block 
> contains a few short lines? I think you could get a block of text that 
> might contain the first instance of a leading digit, but they wouldn't 
> be checked because their total length would be less than kBrute.
>
> Anyway, for lists with shorter lines the script would need to lower 
> the kBrute variable to a smaller number, and then the timing might be 
> different. 10,000 seems like a pretty large exit condition.
>
> I could easily not be understanding what's going on though. These 
> things warp my brain.
>
>
> On 2/1/13 6:41 PM, Alex Tweedly wrote:
>>
>> I just can't resist a good benchmark / coding challenge.
>>
> ...
>>>
>>> function f3 @pD
>>>    local r, tLower, tUpper, tMid, c, t, temp
>>>    constant kBrute = 10000
>>>
>>>    put 1 into tLower
>>>    put (the number of chars in pD) + 1 into tUpper
>>>
>>>    repeat 10000 times -- more than enough :-)
>>>       if tUpper - tLower < kBrute then exit repeat
>>>       put (tUpper + tlower ) / 2 into tMid
>>>       put char 1 of line 2 of (char tMid to tUpper of pD) into c
>>>       if c is a number then
>>>          put tMid into tUpper
>>>       else
>>>          put tMid into tLower
>>>       end if
>>>    end repeat
>>>
>>>    put char tLower to tUpper of pD into temp
>>>    filter temp without "[0-9]*"
>>>    if temp is empty then
>>>       -- no lines start with numeric
>>>       return "No lines"
>>>    else
>>>       put the number of lines in (char  1 to tLower of pD) into t
>>>       add the number of lines in temp to t
>>>       return t
>>>    end if
>>> end f3
>





More information about the use-livecode mailing list