[script optimization challenge] Multi dimensional array filtering

Brian Yennie briany at qldlearning.com
Thu Aug 6 16:26:40 EDT 2009


Malte,

Here is a quick example of what binary search can do. It's a bit of  
work to maintain the sorted data, but the speed is impressive if you  
have very large data. If you are not familiar with binary search  
(forgive me if you are), it is very intuitive -- it searches like you  
might search an alphabetized phonebook. Look at the middle page, and  
determine if the name is after or before the page you are on. At each  
step you cut the search in half by looking at the "middle" page of the  
remaining section and deciding which side to look on next. Doing the  
simple math, you can see that you can search 2^n records in n steps.  
Suppose you have 1,000,000 records. A linear search will take  
1,000,000 steps. Binary will take 20!

There are two caveats, though:

1) To do a binary search, you must have your data sorted first
2) Even with binary search you will still need to read all of the  
matching rows after you have found one. So if all of your records are  
the same, you lose some benefit.

This sample creates 500,000 rows with 1000 different names.

Whether this is worth doing is going to depend very much on your  
application, but the times I get for this data set are roughly:

BINARY SEARCH:

Sort time (upfront cost): 2.5 seconds
Search time: <= 1 millisecond

ORIGINAL METHOD:

Sort time (upfront cost): 0
Search time: 300 milliseconds

global sortedData

on mouseUp

   ## prep the test data
   local testarray,tprocess,test
   repeat with i=1 to 500000
      put "bob"&random(1000) into testarray[i]["name"]
   end repeat
   answer the number of lines of the keys of testarray

   ## create a sorted version of the data
   put the millisecs into test
   put the keys of testarray into tprocess
   repeat for each line theLine in tprocess
      put testarray[theLine]["name"]&cr after temp
   end repeat
   delete char -1 of temp
   sort lines of temp
   put 1 into i
   put empty into sortedData
   repeat for each line l in temp
      put l into sortedData[i]
      add 1 to i
   end repeat
   put i-1 into sortedData[0]
   answer ("data sorted")&cr&(the millisecs-test)

   ## lookup bob100
   put the millisecs into test
   get count("bob100")
   answer ("found="&it&&(the millisecs-test))
end mouseUp

## calls binary search algorithm and then scans before and after found  
record
function count pName
    put nameOffset(pName) into startLine

    if (startLine < 0) then return 0

    repeat with i=startLine down to 1
       if (sortedData[i] = pName) then
          add 1 to tCount
       else
          exit repeat
       end if
    end repeat

    repeat with i=startLine+1 to sortedData[0]
       if (sortedData[i] = pName) then
          add 1 to tCount
       else
          exit repeat
       end if
    end repeat

    return tCount
end count

## binary search
function nameOffset pName,startKey, endKey
    if (startKey is empty) then put 1 into startKey
    if (endKey is empty) then put sortedData[0] into endKey

    if (startKey > endKey) then return -1

    put ((startKey + endKey) div 2) into tIndex
    put sortedData[tIndex] into tName

    if (pName > tName) then put tIndex + 1 into startKey
    else if (pName < tName) then put tIndex - 1 into endKey
    else if (pName = tName) then return tIndex

    return nameOffset(pName, startKey, endKey)
end nameOffset

> Thank you all.
>
> I ended up creating a new array anduse repeat for each key.
>
> Brian, I would be very much interested in the binary method on  
> sorted data.
>
> Cheers,
>
> Malte






More information about the use-livecode mailing list