# Inefficient code - Solutions

Bill Marriott wjm at wjm.org
Sat Jun 27 14:27:55 CDT 2009

```I also forgot one other critical speed factor: the imagedata uses *four*
bytes per pixel in the image, and the first byte is always the same. So
checking every byte requires four times as many tests as required, compared
to checking four bytes at a time. [Weirdly, my first byte always shows as
255, not 0, as stated in the docs and as working for others.]

I spent a little bit of time putting together a stack with different kinds
of images and a script which reflects some of the concepts I discussed. (By
no means is it a formal "O(ND)" difference algorithm!) It's actually not
truly recursive. On my Core i7 920 system, it takes 10 milliseconds (versus
289) to process a 200x300 image with minimal delta from source; 842
milliseconds (versus 7312) for a 1600x1200 resolution screenshot comparison.
The stack (with various image types) can be found here:

Here is the core of the script:

-- where L is the length of the imagedata,
-- r is initially set to L, n & c are initialized to 0
repeat while c < L
repeat while char c+1 to c+r of a <> char c+1 to c+r of b
add 1 to n; if n mod 1000 = 0 then set the thumbPosition of
scrollbar "Progress" to c
if r >= 8 then
put r div 2 into r
else
put 1 into hAll[c div 4 * 4+1] -- report delta
end if
end repeat
put L - c into r
end repeat

I suspect there is a subtle math error in here as it generates a slightly
different number of changed pixels compared to the byte-by-byte method, but
it does reflect the "divide and conquer" approach, and testing four bytes at
the minimum, as opposed to one at a time. Also, I realized that in the edge
condition of an image being 100% different from the source, the original
method can't be beat. But in the case of screen shots where you might have
only a tiny portion of the screen changing, there is much room for
improvement over the original approach.

This is a very interesting challenge and I hope others pick it up and
further refine the algorithm.

- Bill

p.s.: Richmond:  Thanks for your stack/images. Remember, you can just
replace spaces with %20 in urls to get them to behave :)
http://mathewson.110mb.com/FILEZ/IMAGE%20COMPARE.rev.zip

"Bill Marriott" <wjm at wjm.org> wrote in message
news:h23uk6\$as7\$1 at ger.gmane.org...
> Bert,
>
> Others have pointed out the delay introduced by updating a progress bar
> with every pixel and suggested updating it every 100 or 500 pixels or so.
> Similarly, comparing byte-by-byte is going to be slow.
>
> An immediate, simple improvement will be achieved by comparing *groups* of
> pixels between the two images. For example, if your image is 10,000 bytes
> in size, comparing 500 bytes at a time results in 20 comparisons instead
> of 10,000 comparisons. As you find differences in a block of 500 bytes,
> you can then down-shift to find the differences within that 500-byte block
> with more granularity.
>
> A refinement on this approach is simply to "divide and conquer" by
> constantly dividing the image by half and recursively testing the halves
> for differences. If the differences between the two images are small, the
> comparison can be near-instant.
>
> One of the classic papers on checking for differences between data sets
> can be found here:
>
> http://xmailserver.org/diff2.pdf
>
> Of course, the language in that paper is way beyond my comprehension ;)
>
> I'll putter around with expressing these concepts elegantly in Rev, but
> hopefully this gives you or someone else on the list a starting point for
> an algorithm that is dramatically faster than byte-for-byte testing. (I'd
> love to see the "O(ND)" difference algorithm properly implemented in Rev
> code.)
>
> - Bill

```