Translate metadata to field content

Niggemann, Bernd Bernd.Niggemann at uni-wh.de
Thu Feb 20 14:21:18 EST 2020


In reply to Mark Waddingham's comments


Thank you, Mark Waddingham, for the improved scripts and for the hints on why they are faster.

I adapted Mark's version to handle a unique occurrence and changed how the position of the target word within the target line is determined.
It is not safe to assume that the sum of the word counts of the runs equals the number of words in the line up to the target word. Runs depend on formatting, and a formatting change can start a new run in the middle of a word, which splits that word and increases the word count.
I did not adopt Mark's use of codeunits because I had the impression that it was not faster and that it makes the code less obvious.
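To make the word-count point concrete, here is a small sketch with hypothetical run texts (the handler name and the strings are made up for illustration). A formatting change inside the word "example" splits it across two runs, so adding up the per-run word counts overcounts, while counting the words of the joined text does not.

```livecode
on demoRunWordCount
   -- Hypothetical runs: formatting has split the word "example" in two
   local tRunsA, tSum, tJoined, tJoinedCount
   put "an exam" into tRunsA[1]["text"]
   put "ple here" into tRunsA[2]["text"]
   put 0 into tSum
   repeat with m = 1 to 2
      -- per-run counts: "an", "exam" and then "ple", "here"
      add the number of words of tRunsA[m]["text"] to tSum
      -- joined text becomes "an example here"
      put tRunsA[m]["text"] after tJoined
   end repeat
   put the number of words of tJoined into tJoinedCount
   -- summed per-run count next to the count of the joined text
   put tSum && tJoinedCount
end demoRunWordCount
```

Run from the message box, this should show `4 3`: four words when the runs are counted separately, three when the text is joined first, which is why the script above concatenates the runs before counting.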

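--------------------------------------
-- assumes tSearchText holds the metadata value to look for and that
-- tDataA is the styledText of the field, as in Mark's version
local tDataA, tTextOfRuns, tNumWords, tFlagExit
put false into tFlagExit
put the styledText of field "x" into tDataA
repeat for each key i in tDataA -- i is the paragraph (line) number
   local tRunsA
   put tDataA[i]["runs"] into tRunsA
   repeat for each key j in tRunsA -- j is the run number within that line
      if tRunsA[j]["metadata"] is tSearchText then
         -- join the texts of runs 1 to j, then count the words of the
         -- joined text instead of summing the word counts of the runs
         repeat with m = 1 to j
            put tRunsA[m]["text"] after tTextOfRuns
         end repeat
         put the number of words of tTextOfRuns into tNumWords
         put true into tFlagExit
         exit repeat
      end if
   end repeat
   if tFlagExit then
      exit repeat
   end if
end repeat
--------------------------------------
-- i still holds the key of the line where the match was found
select word tNumWords of line i of field "x"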

The test text consists of 96881 words in 23161 lines of heavily formatted text
(it is the script of RevDataGridLibraryBehaviorsDataGridButtonBehavior copied twice into a field, as described before).

Times in ms for the old and the new version, by number of the target word:

target word #   old (ms)   new (ms)
        96881        240        110
        80000        220        100
        60000        180         60
        30000        120        125
        10000         85        125
         1000         50         90
            1         50         60

Timing this is a bit tricky. With "repeat with i = 1 to item 2 of the extents", the lines are visited in order, so the time clearly increases as the target word number increases.

With "repeat for each key i in tDataA", the keys are not visited in sequential order, but the loop is faster. However, this also means the speed varies with the internal state of the array structure.

All timings are estimated averages of 5 to 10 measurements. Variability is typically about +/- 5 to 10 milliseconds, with occasional outliers.
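A minimal sketch of how such averaged timings can be taken in LiveCode, assuming the lookup loop is wrapped in a hypothetical handler `findTaggedWord` that takes the metadata value to search for (neither that handler nor `timeLookup` is part of the original scripts):

```livecode
-- Sketch: average the run time of a lookup over several trials.
-- findTaggedWord is a hypothetical handler wrapping the lookup loop.
on timeLookup pSearchText, pTrials
   local tStart, tTotal
   put 0 into tTotal
   repeat pTrials times
      put the milliseconds into tStart
      findTaggedWord pSearchText
      add (the milliseconds - tStart) to tTotal
   end repeat
   -- report the mean elapsed time in the message box
   put "average ms:" && round(tTotal / pTrials)
end timeLookup
```

Called as `timeLookup "someTag", 10`, it reports the mean of ten runs; pTrials must be greater than 0. Averaging over several trials smooths out, but does not remove, the run-to-run variation and outliers mentioned above.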

However, the overall speed gain is quite impressive and well worth the change.
I learned a lot about handling larger datasets with arrays, thank you.

Kind regards
Bernd
