Performance issues LC8 versus earlier versions.
alex at tweedly.net
Tue Aug 23 18:23:54 EDT 2016
On 22/08/2016 15:47, Richard Gaskin wrote:
> Alex Tweedly wrote:
> > Would caseSensitive make it faster ?
> In theory yes, since it avoids having to run the internal equivalent
> of toLower on each thing being compared.
But since these are bytes, not chars, that doesn't apply.
> However in some recent experiments involving pattern matching on text
> I was unable to measure a difference. That shouldn't be taken as
> definitive; there are a lot of distracting things going on in the
> routine I was testing with. I haven't yet done a good isolated test
> of caseSensitive.
> > Re md5 for repeated use - yes, it probably is worth doing.
> The rsync algo offers an md5 option, but by default it compares files
> based only on mod date and size. The thinking is that if both of
> those match, the odds of having a changed file are very low.
> Perhaps an optimal algo in your system would reserve md5 for those
> cases where size and mod date match, which will eliminate most cases
> with less CPU time.
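A rough sketch of the check Richard describes, written in Python for illustration rather than LiveCode: compare size and mod date first, and compute an md5 only when both match. The function names (`md5_of`, `quick_check`, `files_match`) are hypothetical, and truncating mod times to whole seconds is an assumption made here, not a statement about how rsync itself behaves.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 of a file, read in chunks to limit memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def quick_check(a, b):
    """True when size and mod date (whole seconds, an assumption) both match."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def files_match(a, b, use_md5=False):
    """Cheap metadata test first; md5 is reserved for files that pass it."""
    if not quick_check(a, b):
        return False  # metadata differs, so no hashing is done
    if use_md5:
        return md5_of(a) == md5_of(b)
    return True
```

With `use_md5=False` this trusts the metadata alone; with `use_md5=True` the hash is computed only for pairs that already agree on size and mod date.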
Thanks Richard, but this is a very different context. In my case, the
mod dates will never match; the duplicate files arise because the user
has imported the same photos from a camera more than once (into
different folders, or into the same one using auto-renaming), or has
copied a folder of files to trim out the ones to be copied to another
machine, or .... any of a number of things, but all causing the copied
file to have a different mod date from the original.
My original benchmarking was faulty; in fact, taking the md5 hash for the
two files is only 50% more expensive than simply comparing them (higher
if they are actually different), but that leaves the conclusion
unchanged - it's not worth the extra complexity. There is an assumption
underlying this - that in real life (different from my development
phase), the majority of genuine duplicates will be dealt with (i.e. one
copy deleted or moved elsewhere) fairly quickly, so the same comparisons
won't be run repeatedly. The remaining cases of same file size are so
rare (around 80 in my full 50,000 file set) that pair-wise comparisons
take only 4 seconds (or 2 seconds if I use an older version of LC), so
no great impact on the user experience.
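For illustration only, here is a minimal Python sketch of the approach described above (the real code is in LiveCode and is not shown here): group files by size, then compare contents pair-wise only within each same-size group, ignoring mod dates entirely. The name `find_duplicates` is hypothetical.

```python
import filecmp
import os
from collections import defaultdict
from itertools import combinations


def find_duplicates(paths):
    """Return (a, b) pairs of files whose contents are identical."""
    # Files of different sizes cannot be duplicates, so bucket by size first.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    dupes = []
    for same_size in by_size.values():
        # Pair-wise comparison within each bucket; single-file buckets
        # produce no pairs at all.
        for a, b in combinations(same_size, 2):
            # shallow=False compares file contents, not just stat data.
            if filecmp.cmp(a, b, shallow=False):
                dupes.append((a, b))
    return dupes
```

Because same-size collisions are rare, the pair-wise step touches only a small fraction of the files, which is why no hash cache is needed in this sketch.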
(The other parts of the overall workflow - where I would like to gather
and use the exif data - are more strongly impacted by the performance
issue - but my desire to use the latest LC8 rather than an obsolete
version is probably strong enough to override that, and I'll just be
more patient - even though patient is not my natural state :-) )