Inefficient code - Solutions

Bill Marriott wjm at wjm.org
Sat Jun 27 15:27:55 EDT 2009


I also forgot one other critical speed factor: the imageData uses *four* 
bytes per pixel in the image, and the first byte is always the same. So 
checking every byte individually performs four times as many tests as 
necessary, compared to checking four bytes (one pixel) at a time. [Weirdly, 
my first byte always shows as 255, not the 0 stated in the docs and 
reported by others.]
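
To illustrate, here is a minimal (untested) sketch of the two approaches; 
the variable names are my own, and a and b are assumed to hold the 
imageData of the two images:

   -- byte-by-byte: L comparisons
   repeat with i = 1 to L
      if char i of a <> char i of b then put 1 into tDelta[i]
   end repeat

   -- pixel-by-pixel: L/4 comparisons, testing four bytes at a time
   repeat with i = 1 to L step 4
      if char i to i+3 of a <> char i to i+3 of b then put 1 into tDelta[i]
   end repeat

(You could even compare just char i+1 to i+3 of each pixel and skip the 
constant first byte entirely.)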

I spent a little bit of time putting together a stack with different kinds 
of images and a script which reflects some of the concepts I discussed. (By 
no means is it a formal "O(ND)" difference algorithm!) It's actually not 
truly recursive. On my Core i7 920 system, it takes 10 milliseconds (versus 
289) to process a 200x300 image with minimal delta from source; 842 
milliseconds (versus 7312) for a 1600x1200 resolution screenshot comparison. 
The stack (with various image types) can be found here:

http://bill.on-rev.com/linked/Compare.rev

Here is the core of the script:

-- where a and b hold the imageData of the two images,
-- L is the length of the imageData,
-- r is initially set to L, n and c are initialized to 0,
-- and hAll collects the offsets of changed pixels
   repeat while c < L
      repeat while char c+1 to c+r of a <> char c+1 to c+r of b
         add 1 to n
         if n mod 1000 = 0 then set the thumbPosition of scrollbar "Progress" to c
         if r >= 8 then
            put r div 2 into r -- narrow the comparison window by half
         else
            put 1 into hAll[c div 4 * 4 + 1] -- report a changed pixel
            add 4 to c -- step past that pixel
         end if
      end repeat
      add r to c -- the window matched; skip over it
      put L - c into r -- widen back out to the rest of the data
   end repeat
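
For context, a and b would be set up along these lines before the loop (the 
image names are just placeholders for whatever is in your own stack):

   set the caseSensitive to true -- so the byte comparisons are exact
   put the imageData of image "Before" into a
   put the imageData of image "After" into b
   put the length of a into L
   put L into r
   put 0 into n
   put 0 into c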

I suspect there is a subtle math error in here, as it generates a slightly 
different count of changed pixels than the byte-by-byte method, but it does 
reflect the "divide and conquer" approach, and it never tests fewer than 
four bytes (one pixel) at a time, as opposed to one byte at a time. Also, I 
realized that in the edge case of an image being 100% different from the 
source, the original method can't be beaten. But in the case of screen 
shots, where only a tiny portion of the screen may have changed, there is 
much room for improvement over the original approach.
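
For anyone who wants to make it truly recursive, here is a rough, untested 
sketch of the same idea written as a recursive handler; gDataA, gDataB, and 
gDeltas are assumed globals holding the two imageData strings and the array 
of changed offsets:

   on compareRange pStart, pLen
      global gDataA, gDataB, gDeltas
      if char pStart to pStart + pLen - 1 of gDataA = \
            char pStart to pStart + pLen - 1 of gDataB then
         exit compareRange -- the whole block matches; nothing to record
      end if
      if pLen <= 4 then
         put 1 into gDeltas[pStart] -- a single differing pixel
      else
         put (pLen div 8) * 4 into tHalf -- split on a pixel boundary
         compareRange pStart, tHalf
         compareRange pStart + tHalf, pLen - tHalf
      end if
   end compareRange

Calling "compareRange 1, the length of gDataA" would kick it off.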

This is a very interesting challenge and I hope others pick it up and 
further refine the algorithm.

- Bill

p.s.: Richmond: Thanks for your stack/images. Remember, you can just 
replace spaces with %20 in URLs to get them to behave :)
http://mathewson.110mb.com/FILEZ/IMAGE%20COMPARE.rev.zip
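
In Rev, that's as simple as:

   put "http://mathewson.110mb.com/FILEZ/IMAGE COMPARE.rev.zip" into tUrl
   replace space with "%20" in tUrl -- tUrl now matches the link above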




"Bill Marriott" <wjm at wjm.org> wrote in message 
news:h23uk6$as7$1 at ger.gmane.org...
> Bert,
>
> Others have pointed out the delay introduced by updating a progress bar 
> with every pixel and suggested updating it every 100 or 500 pixels or so. 
> Similarly, comparing byte-by-byte is going to be slow.
>
> An immediate, simple improvement will be achieved by comparing *groups* of 
> pixels between the two images. For example, if your image is 10,000 bytes 
> in size, comparing 500 bytes at a time results in 20 comparisons instead 
> of 10,000 comparisons. As you find differences in a block of 500 bytes, 
> you can then down-shift to find the differences within that 500-byte block 
> with more granularity.
>
> A refinement on this approach is simply to "divide and conquer" by 
> constantly dividing the image by half and recursively testing the halves 
> for differences. If the differences between the two images are small, the 
> comparison can be near-instant.
>
> One of the classic papers on checking for differences between data sets 
> can be found here:
>
> http://xmailserver.org/diff2.pdf
>
> Of course, the language in that paper is way beyond my comprehension ;)
>
> I'll putter around with expressing these concepts elegantly in Rev, but 
> hopefully this gives you or someone else on the list a starting point for 
> an algorithm that is dramatically faster than byte-for-byte testing. (I'd 
> love to see the "O(ND)" difference algorithm properly implemented in Rev 
> code.)
>
> - Bill 




