# [script optimization challenge] Multi dimensional array filtering

Brian Yennie briany at qldlearning.com
Thu Aug 6 16:26:40 EDT 2009

```Malte,

Here is a quick example of what binary search can do. It's a bit of
work to maintain the sorted data, but the speed is impressive if you
have very large data. If you are not familiar with binary search
(forgive me if you are), it is very intuitive -- it searches like you
might search an alphabetized phonebook. Look at the middle page, and
determine if the name is after or before the page you are on. At each
step you cut the search in half by looking at the "middle" page of the
remaining section and deciding which side to look on next. Doing the
simple math, you can see that you can search 2^n records in n steps.
Suppose you have 1,000,000 records. A linear search will take
1,000,000 steps. Binary will take 20!

There are two caveats, though:

1) To do a binary search, you must have your data sorted first
2) Even with binary search you will still need to read all of the
matching rows after you have found one. So if all of your records are
the same, you lose some benefit.

This sample creates 500,000 rows with 1000 different names.

Whether this is worth doing is going to depend very much on your
application, but the times I get for this data set are roughly:

BINARY SEARCH:

Sort time (upfront cost): 2.5 seconds
Search time: <= 1 millisecond

ORIGINAL METHOD:

Sort time (upfront cost): 0
Search time: 300 milliseconds

global sortedData

on mouseUp

## prep the test data
local testarray,tprocess,test
repeat with i=1 to 500000
put "bob"&random(1000) into testarray[i]["name"]
end repeat
answer the number of lines of the keys of testarray

## create a sorted version of the data
put the millisecs into test
put the keys of testarray into tprocess
repeat for each line theLine in tprocess
put testarray[theLine]["name"]&cr after temp
end repeat
delete char -1 of temp
sort lines of temp
put 1 into i
put empty into sortedData
repeat for each line l in temp
put l into sortedData[i]
end repeat
put i-1 into sortedData[0]

## lookup bob100
put the millisecs into test
get count("bob100")
end mouseUp

## calls binary search algorithm and then scans before and after found
record
function count pName
put nameOffset(pName) into startLine

if (startLine < 0) then return 0

repeat with i=startLine down to 1
if (sortedData[i] = pName) then
else
exit repeat
end if
end repeat

repeat with i=startLine+1 to sortedData[0]
if (sortedData[i] = pName) then
else
exit repeat
end if
end repeat

return tCount
end count

## binary search
function nameOffset pName,startKey, endKey
if (startKey is empty) then put 1 into startKey
if (endKey is empty) then put sortedData[0] into endKey

if (startKey > endKey) then return -1

put ((startKey + endKey) div 2) into tIndex
put sortedData[tIndex] into tName

if (pName > tName) then put tIndex + 1 into startKey
else if (pName < tName) then put tIndex - 1 into endKey
else if (pName = tName) then return tIndex

return nameOffset(pName, startKey, endKey)
end nameOffset

> Thank you all.
>
> I ended up creating a new array anduse repeat for each key.
>
> Brian, I would be very much interested in the binary method on
> sorted data.
>
> Cheers,
>
> Malte

```