Line numbers for soft-wrapped styled text?

hh hh at hyperhh.de
Sun Apr 2 20:24:17 EDT 2017


Alex,

a brilliant technique. It is certainly usable for other "searching"
tasks. And it simply needs for this use case a faster measurement-method
as input. We could try the selectedLoc:

The script below gets for me the exakt topLefts of the visible ones of
20000 "worse case"-lines in < 1 second, with LC 9, and in LC 6 it is
three times faster, on a medium fast machine (2.5 GHz macMini).
I'm sure you can take the following quick and dirty routine again down
to at least 50% of needed time with your optimization technique.

Hermann

## Commented for occasional readers.
## You still has to use the locations for a numbers display: Use either
## one num-field per line or one for all using Alex's method of space below.

local rg=TEXT", l0, t0, b0, v0
local gpoints -- collects the toplefts of the lines

on updateNbs2
  put the millisecs into m1
  lock screen; lock messages
  put the top of fld rg into t0
  put the left of fld rg into l0
  put the bottom of fld rg into b0
  put the height of fld rg into h0
  put the selectedChunk into sc
  put the vscroll of fld rg into v0
  put the num of lines of fld rg into nL
  put visibleTextLines(nL) into tL
  put "Lines: " & tL & " of " & nL into fld "range"
  put gPoints into fld "info" -- ready to use
  -- avoid bug(?) with the field num:
  if sc is not empty then
    do ("select "& (word 1 to 4 of sc) &" of fld "&quote& rg &quote)
  end if
  set the vscroll of fld rg to v0
  put (the millisecs-m1) & " ms" into fld "timing"
  unlock screen; unlock messages
end updateNbs2

-- returns the visible lines range and (in gPoints) the toplefts
-- n is the num of lines
function visibleTextLines n
  put the scrollbarWidth of fld rg into sw
  put the margins of fld rg into m
  put (m,m,m,m) into m -- now we have always at least 4 items
  put findTopLine(v0+t0-1+item 2 of m,1,n) into L1
  put L1-1 into L2; put v0+b0+6-item 4 of m into x
  put empty into gPoints
  -- now a simple line-after-line check could/should be made faster,
  -- its advantage: We get at the same time already the locations!
  repeat while L2 <= n
    add 1 to L2
    set vscroll of fld rg to v0 -- important for measuring
    select before line L2 of fld rg
    put item 2 of the selectedLoc into slc
    put cr & (l0,slc) after gPoints -- adjust here the left
    if (the vscroll of fld rg) + slc > x then
      put line 2 to -2 of gPoints into gPoints
      exit repeat
    end if
  end repeat
  return L1,L2-1
end visibleTextLines

-- the simplest recursive method as base for optimization
function findTopLine x,n1,n -- x=top pixel value, n1=start, n=max
  put n1+((n-n1) div 2) into m
  set vscroll of fld rg to v0
  select before line m+1 of fld rg
  put the vscroll of fld rg + item 2 of the selectedLoc into vsl
  if vsl >= x then
    set vscroll of fld rg to v0
    select before line m of fld rg
    put the vscroll of fld rg + item 2 of the selectedLoc into vsl
    if vsl < x then return word 2 of the selectedLine
    else
      if m <= n1 then return n1
      else return findTopLine(x,n1,m-1)
    end if
  else
    if m >= n then return n
    else return findTopLine(x,m+1,n)
  end if
end findTopLine

## The field's script is again:
on textchanged
  updateNbs2
end textchanged

on scrollbardrag
  updateNbs2
end scrollbardrag
### END_OF_SCRIPT

> Alex T. wrote:
> Sure, here it is.
> 
> I couldn't resist doing some simple benchmarks, just to verify my 
> intuition. Very glad I did.
> 
> OK - here's the theory :
> 
>   we're using variations of binary search to find the lowest numbered 
> line that is visible, and then again to find the highest numbered line.
> 
> So at each stage we have an interval within which the right answer must 
> lie.
> 
> 1. Simple binary search : the next guess is the middle of the current 
> interval.
> 
> This is simple, and very consistent for the number of guesses needed: 
> the worst case is almost identical to the average/typical.
> 
> For my test case (25000 lines of heavily styled text), we need approx 14 
> guesses for the first calculaiton, and 8 for the second.
> 
> 2. Simple Newton-Raphson (linear interpolation).
> 
> This is usually better than simple binary, *if* there is a reasonable 
> correlation between the measurable values and the guessable values. In 
> this case there - the height of any chunk of lines is reasonably 
> correlated to the number of lines, though not exactly in all cases.
> 
> Usually this will take significantly fewer guesses (and in this case it 
> takes around 8 + 4 compared to the 14+8 above).
> 
> BUT - the worst case can be much, much worse :-(   Imagine a field with 
> 1000 lines - the first 500 are in 5-point text, and the other 500 are in 
> 1000 point text, scrolled forward by 400 lines; this makes our guessing 
> VERY poor, and will take up to 200 or more guesses.
> 
> 3. Use a mix of linear interpolation and binary search. I tried simply 
> alternating them - one calculated guess, then one 'average' guess, then 
> another calculated one, ....
> 
> This should significantly limit the worst case - but makes the average 
> somewhat worse.
> 
> 4. Use linear interpolation - but disallow very small (or very large) 
> estimates.
> 
> This again limits the worst case (not quite so well), and makes the 
> average only slightly worse.
> 
> 
> I tried each of these strategies on a range of scroll positions in my 
> test case field. Results were
> 
> (1) Binary   6283
> 
> (2) N-R      3757
> 
> (3) N-R/binary 4200
> 
> (4) N-R limited 3782
> 
> 
> So - the script below does (4) above - linear interpolation, with the 
> very small or very large percentage guesses disallowed. But, I've left 
> in the code (commented out) to also do alternating binary guesses, so it 
> can be easily used if your use case is very sensitive to extreme examples.




More information about the use-livecode mailing list