alex at tweedly.net
Fri Jan 20 19:53:56 EST 2023
On 20/01/2023 18:26, Geoff Canyon via use-livecode wrote:
> I'm sure someone has done the work to create a more efficient algorithm for
> this. Off the top of my head if I were trying to I'd probably do something
Hmmm. Maybe. But I kind of doubt it (though I'd love to find out I'm wrong).
This (or closely related problems) got a lot of attention in the
mid-70s to mid-80s, when we were trying to do Design Rule Verification for
VLSI circuits, with millions of transistors. Where the rules could
exploit hierarchy, the "clever tree" data structures and algorithms did
well (See Quad-CIF etc.) But for non-hierarchical problems (such as this
'closest points' case), nothing came close to this 'scanning window' or
'linear scan' approach.
But looking more closely, I realized that the number of times a new
"closest pair" was found is remarkably small - typically between 6 & 10
times for 50,000 points. So it's very feasible to bear the cost of
calculating the actual distance (i.e. doing the sqrt call) each time a
new 'closest pair' is found, and that means the quick filtering test can
be done on the x-distance itself (rather than on the squared distance).
You can also apply a similar filter test to the Y coordinate (though
that one doesn't let you exit the inner loop; it only lets you skip the
full comparison).
Adding those changes gets the time down by another 10% or so - the
original 2000-point case comes down from approx 35ms to around 28ms
(almost too small to measure reliably). More meaningfully, 50,000 points
comes down from 880ms to 810ms.
function closestPointsSQ pLines
   sort pLines numeric by item 1 of each
   put pLines into pPoints
   split pPoints by CR
   put infinity into minDistSQ -- best squared distance found so far
   put infinity into minDist   -- best actual distance found so far
   put the number of elements in pPoints into N
   repeat with i = 1 to N-1
      repeat with j = i + 1 to N
         put item 1 of pPoints[j] - item 1 of pPoints[i] into t1
         -- points are sorted by x, so no later j can beat minDist
         if t1 > minDist then exit repeat
         put item 2 of pPoints[j] - item 2 of pPoints[i] into t2
         -- Y filter: can't exit the loop, but skips the full comparison
         if t2 > minDist OR t2 < -minDist then next repeat
         -- add 1 to compareCount
         put t1 * t1 + t2 * t2 into distSQ
         if distSQ < minDistSQ then
            put distSQ into minDistSQ
            put sqrt(distSQ) into minDist
            put pPoints[i] & " " & pPoints[j] into closestPoints
            -- add 1 to newClosest
         else if distSQ = minDistSQ then
            put return & pPoints[i] & " " & pPoints[j] after closestPoints
            -- add 1 to tAlsoClosest
         end if
      end repeat
   end repeat
   -- put "SQ compared" && compareCount && newClosest && tAlsoClosest & CR after msg
   return closestPoints
end closestPointsSQ
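For anyone who doesn't read LiveCode, here's a rough Python sketch of the same linear-scan idea (function and variable names are mine, and points are (x, y) tuples rather than space-and-comma-delimited lines):

```python
import math

def closest_points(points):
    """Linear scan over x-sorted points, as in the LiveCode function above.

    sqrt is taken only when a new closest pair appears, so the cheap
    inner-loop filters can compare raw x- and y-distances against the
    actual (non-squared) minimum distance found so far.
    """
    pts = sorted(points)                  # sort by x (then y)
    min_dist_sq = float("inf")            # best squared distance so far
    min_dist = float("inf")               # best actual distance so far
    closest = []
    n = len(pts)
    for i in range(n - 1):
        for j in range(i + 1, n):
            t1 = pts[j][0] - pts[i][0]    # x-distance; non-negative after sort
            if t1 > min_dist:
                break                     # no later j can beat min_dist
            t2 = pts[j][1] - pts[i][1]    # y-distance; may be negative
            if t2 > min_dist or t2 < -min_dist:
                continue                  # fails the Y filter; skip full comparison
            d_sq = t1 * t1 + t2 * t2
            if d_sq < min_dist_sq:
                min_dist_sq = d_sq
                min_dist = math.sqrt(d_sq)
                closest = [(pts[i], pts[j])]
            elif d_sq == min_dist_sq:
                closest.append((pts[i], pts[j]))
    return closest, min_dist
```

The two early tests are the whole trick: the `break` on t1 is valid only because of the initial sort, and the Y test can only `continue`, never `break`, since later points may still have a small y-difference.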
> 1. Grab two points at random (in case the points are pre-sorted in some
> way) and get the distance.
> 2. Assume that's a reasonable average distance between points.
> 3. Define that as the number to beat, and define a grid based on it.
> 4. Go through all the points, tossing them into buckets based on the grid.
> I'd define the buckets as fully overlapping to avoid missing close pairs.
> 5. The size of the grid buckets is critical: too big and you do too much
> work. Too small and you end up with all singletons in the buckets. This
> would require some experimentation and thought.
> 6. Go through the buckets only comparing the points within them.
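Geoff's grid-bucket idea could be sketched roughly like this in Python (the names are mine, and instead of the fully-overlapping buckets he describes I've used the more common variant of comparing each point against its own cell plus the 8 neighbouring cells, which gives the same no-missed-pairs guarantee):

```python
import math
import random
from collections import defaultdict

def closest_pair_grid(points, seed=0):
    """Grid-bucket sketch: hash points into square cells whose side is
    the distance between two randomly chosen points, then compare each
    point only against points in its own and neighbouring cells.

    Any pair's distance is >= the true minimum, so a cell side taken
    from a random pair can never be smaller than the answer; checking
    the 3x3 block of cells around each point is therefore sufficient.
    """
    rng = random.Random(seed)
    a, b = rng.sample(points, 2)          # step 1: random pair
    cell = math.dist(a, b) or 1.0         # step 2-3: guessed spacing; guard zero
    grid = defaultdict(list)
    for p in points:                      # step 4: toss into buckets
        grid[(int(p[0] // cell), int(p[1] // cell))].append(p)
    best = float("inf")
    best_pair = None
    for (cx, cy), bucket in grid.items():  # step 6: compare within/near buckets
        for p in bucket:
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for q in grid.get((cx + dx, cy + dy), ()):
                        if q is p:
                            continue      # don't compare a point with itself
                        d = math.dist(p, q)
                        if d < best:
                            best, best_pair = d, (p, q)
    return best_pair, best
```

Step 5's concern shows up here as the `cell` value: a huge cell puts everything in a few buckets (back to O(N^2) within them), a tiny one wastes time walking many near-empty neighbours.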
> On Fri, Jan 20, 2023 at 10:14 AM Alex Tweedly via use-livecode <
> use-livecode at lists.runrev.com> wrote:
>> On 20/01/2023 16:52, Mark Waddingham via use-livecode wrote:
>>> On 2023-01-20 13:05, Alex Tweedly via use-livecode wrote:
>>>> We need a better algorithm. If we use a "linear scan", we can change
>>>> it from essentially Order(N**2) to approx Order(N).
>>> Slightly pedantic point (I appreciate that you did say 'approx')...
>>> Sorting can not be done in any less time than O(N*log N) - so the
>>> revised algorithm is O(N*log N) as you have to sort the input for it
>>> to be able to scan linearly.
>> I figured someone would pick me up on that :-) It was a pretty big
>> 'approx' :-)
>> The sorting step is indeed O(NlogN).
>> And the rest of it also worse than linear, but I don't have the math
>> knowledge to even begin to think about how much worse. Quick testing
>> says it's worse than O(NlogN), and in practice, it might even be as bad
>> as something like O(N * sqrt(N)), guessing about the number of points
>> within a (variable) sized window to the right of the current scan coord.
>> But in any "real app" usage that I can imagine, the 'clustering' effect
>> would seem to improve it over O(N*sqrt(N)); e.g. if the points tend to
>> cluster rather than be evenly randomly spread, then the number within
>> the scan window goes down on average. "real app" could be players in a
>> 1st person player game, or vehicles on a road simulation, or (in 3d)
>> stars in the universe - in all cases causing the minDist to decrease
>> faster than when they are evenly spread.
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences: