Line numbers for soft-wrapped styled text?

Alex Tweedly alex at tweedly.net
Sun Apr 2 09:49:10 EDT 2017


Sure, here it is.

I couldn't resist doing some simple benchmarks, just to verify my 
intuition. Very glad I did.

OK - here's the theory :

  we're using variations of binary search to find the lowest numbered 
line that is visible, and then again to find the highest numbered line.

So at each stage we have an interval within which the right answer must 
lie.

1. Simple binary search : the next guess is the middle of the current 
interval.

This is simple, and very consistent for the number of guesses needed: 
the worst case is almost identical to the average/typical.

For my test case (25000 lines of heavily styled text), we need approx 14 
guesses for the first calculaiton, and 8 for the second.

2. Simple Newton-Raphson (linear interpolation).

This is usually better than simple binary, *if* there is a reasonable 
correlation between the measurable values and the guessable values. In 
this case there - the height of any chunk of lines is reasonably 
correlated to the number of lines, though not exactly in all cases.

Usually this will take significantly fewer guesses (and in this case it 
takes around 8 + 4 compared to the 14+8 above).

BUT - the worst case can be much, much worse :-(   Imagine a field with 
1000 lines - the first 500 are in 5-point text, and the other 500 are in 
1000 point text, scrolled forward by 400 lines; this makes our guessing 
VERY poor, and will take up to 200 or more guesses.

3. Use a mix of linear interpolation and binary search. I tried simply 
alternating them - one calculated guess, then one 'average' guess, then 
another calculated one, ....

This should significantly limit the worst case - but makes the average 
somewhat worse.

4. Use linear interpolation - but disallow very small (or very large) 
estimates.

This again limits the worst case (not quite so well), and makes the 
average only slightly worse.


I tried each of these strategies on a range of scroll positions in my 
test case field. Results were

(1) Binary   6283

(2) N-R      3757

(3) N-R/binary 4200

(4) N-R limited 3782


So - the script below does (4) above - linear interpolation, with the 
very small or very large percentage guesses disallowed. But, I've left 
in the code (commented out) to also do alternating binary guesses, so it 
can be easily used if your use case is very sensitive to extreme examples.

Note - this only uses the accurate height of a chunk function once it 
has almost determined its answer - up until then, it only needs 
approximate results to make the next guess, so no need for the extra 
work of doing the pixel-accurate calculation.

function visibleTextLines pT
    local vs, sw, m, t, b, L1, L2
    local t1, t2, t3  -- timing
    lock screen; lock messages
    put the vscroll of fld pT into vs
    put the scrollbarWidth of fld pT into sw
    put the margins of fld pT into m
    put (m,m,m,m) into m -- now we have always at least 4 items
    -- value of pixel within the field where top line will be
    put   (item 2 of m) + vs into t
    -- value of pixel within the field where bottom line will be
    put  - (item 4 of m)  + vs + the effective height of fld pT into b
    if the hscrollbar of fld pT then subtract sw from b

    put the millisecs into t1   -- timing
    put findTopLine(pT, t-5) into L1
    put the millisecs into t2   -- timing
    put findBottomLine(pT, b+5, L1) into L2
    put the millisecs into t3   -- timing
    --    put "times" && t2-t1 && t3-t2 &CR after fld "Flog" -- timing
    return (L1,L2)
end visibleTextLines


function findTopLine pFName, pPixel
    -- for fld 'pFName', find the lowest numbered line which will be at 
or beyond pPixel

    local tBelow, tAbove, tGuess, tTotal
    local tTarget

    put 1 into tBelow
    put 0 into tHBelow

    put the number of lines in fld pFName into tAbove
    put the formattedheight of fld pFName into tHAbove

    put pPixel into tTarget

    -- we need to find the first line whose formattedheight is >= tTarget
    local tD, tIteration
    repeat with tIteration = 1 to 1000
       if tAbove = tBelow+1 then exit repeat
       --       if tIteration mod 2 = 0 then  -- alternate between 
lienar interpolation and simple binary halving
       --          put 0.5 into tD
       --       else
       put (tTarget-tHBelow) / (tHAbove-tHBelow) into tD
       --       end if

       -- limit very small or large [percentages
       if td < 0.05 then put 0.05 into tD
       if td > 0.95 then put 0.95 into tD

       put tBelow + trunc( (tAbove-tBelow) * tD) into tGuess
       if tGuess = tBelow then add 1 to tGuess
       --       put tGuess && tBelow && tAbove &CR after fld "Flog"

       put the formattedheight of line 1 to tGuess of fld pFName into tt 
-- NB may be inaccurate !!
       if tt > tTarget then
          put tGuess into tAbove
          put tt into tHAbove
       else
          put tGuess into tBelow
          put tt into tHBelow
       end if
    end repeat
    -- now deal with the possible inaccuracy
    repeat 2 times -- should be no more than 1 !?
       if tBelow = 1 then exit repeat
       if getaccurateheight(1, tBelow-1, pFName) >= tTarget then
          --          put "NEEDED TO SUBTRACT" && tBelow & CR after fld 
"Flog"
          subtract 1 from tBelow
       else
          exit repeat
       end if
    end repeat
    return tBelow
end findTopLine

function findBottomLine pFName, pPixel, pBelow
    -- for fld pFName, find highest numbered line before pPixel
    -- pBelow is the known lower bound
    local tBelow, tAbove, tGuess, tTotal
    local tTarget

    put pBelow-1 into tBelow
    put the formattedheight of line 1 to tBelow of fld pFName into tHBelow

    -- limit to what could possibly fit into the actual field (min text 
size is 5)
    put min(the number of lines in fld pFName, pBelow + the effective 
height of fld pFName div 5)  into tAbove
    put the formattedheight of line 1 to tAbove of fld pFName into tHAbove

    put pPixel into tTarget
    -- we need to find the last line whose formattedheight is < tTarget

    local tD, tIteration
    repeat with tIteration = 1 to 1000
       if tAbove = tBelow+1 then exit repeat

       --       if tIteration mod 2 = 0 then  -- alternate between 
lienar interpolation and simple binary halving
       --          put 0.5 into tD
       --       else
       put (tTarget-tHBelow) / (tHAbove-tHBelow) into tD
       --       end if

       if td < 0.1 then put 0.1 into tD
       if td > 0.9 then put 0.9 into tD
       put tBelow + trunc( (tAbove-tBelow) * tD) into tGuess
       if tGuess = tBelow then add 1 to tGuess

       put the formattedHeight of line 1 to tGuess of fld pFName into tt

       if tt > tTarget then
          put tGuess into tAbove
          put tt into tHAbove
       else
          put tGuess into tBelow
          put tt into tHBelow
       end if

    end repeat
    return tBelow
end findBottomLine

function getAccurateHeight N, M, pFld
    -- return the accurate height of the chunk from N .. M  of field pFld
    local t1, t2
    put the millisecs into t1
    put the formattedHeight of line N to M of fld pFld into h
    --    put "acc" && N && M && the millisecs-t1 &CR after fld "Flog"
    if line M of fld pFld is empty then
       --       put "getaccurate" && the millisecs &CR after fld "Flog"
       add the formattedheight of line M of fld pFLD to h
       --       put the styledText of fld pFld into tA
       --       put the vscroll of fld pFld into tempScroll
       --       put tA[N+1] into tempA    -- save a copy of the 
following line, complete with styling
       --       put tA[N] into tA[N+1]
       --       set the styledText of fld pFld to tA
       --       put the formattedHeight of line 1 to N of fld pFld into h
       --       put tempA into tA[N+1]
       --       set the styledText of fld pFld to tA
       --       set the vscroll of fld pFld to tempScroll
    end if
    add the spacebelow of line M of fld pFld to h
    put the millisecs into t2
    --    put "accurate" && N && M && t2-t1 &CR after fld "Flog"
    return h
end getAccurateHeight

-- Alex.







On 01/04/2017 17:42, hh via use-livecode wrote:
>> Alex T. wrote:
>> I set out to optimise the 'visibleLineNumber' function, and succeeded
>> in getting it down to approx 20% of the original version, by:
>> - reduce from 2 to 1 height calculations per iteration
>> - convert from recursive to iterative
>> - use Newton-Raphson style linear interpolation, with tweaks to avoid
>> the pathological end cases
>> - optimise the start values for finding the last visible line based on
>> the effective field height and min allowed text size
> Alex,
> please think about posting this. It is certainly valuable for other use
> cases.
>
> Hermann
> (This post is NOT an April 1 - joke)
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode





More information about the use-livecode mailing list