randomly order a list
Alex Tweedly
alex at tweedly.net
Tue Jun 4 19:26:07 EDT 2013
No comments on the "collision-or-not-ness", but some concerns about
performance.
The performance of "random() & random()" is conveniently
data-independent, but that of md5digest() is not. With nice short
lines, it is indeed faster than the random&random version, but as the
line size increases, so does the time taken by all of the digest
methods. I didn't test it thoroughly, but the swap-over point is fairly
low - somewhere around 500 chars per line.
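
For anyone who wants to check the swap-over point on their own data, here is
a rough timing sketch. The function name timeShuffles and its two parameters
are made up for illustration, and the numbers will vary with machine and
engine version, so treat it as a starting point rather than a benchmark.

```livecode
-- Hypothetical helper (not from the thread): times both sort keys on
-- pLineCount lines, each padded with pLineLength characters.
function timeShuffles pLineCount, pLineLength
   local i, tLine, tData, tCopy, tStart, tRandomTime, tDigestTime

   -- build the padding that sets the line size
   repeat pLineLength times
      put "x" after tLine
   end repeat

   -- prefix each line with its index so that no two lines are identical
   repeat with i = 1 to pLineCount
      put i & tLine & return after tData
   end repeat
   delete the last char of tData

   -- time the data-independent key
   put tData into tCopy
   put the milliseconds into tStart
   sort lines of tCopy by random(999999999) & random(999999999)
   put the milliseconds - tStart into tRandomTime

   -- time the digest key, whose cost grows with line size
   put tData into tCopy
   put the milliseconds into tStart
   sort lines of tCopy by md5digest(each)
   put the milliseconds - tStart into tDigestTime

   return "random&random:" && tRandomTime && "ms, md5digest:" && tDigestTime && "ms"
end timeShuffles
```

Calling it from the message box with short and long lines, e.g.
put timeShuffles(10000, 50) and then put timeShuffles(10000, 1000),
should show whether the two timings cross over on a given machine.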
-- Alex.
On 04/06/2013 18:51, Geoff Canyon wrote:
> At the risk of beating the decaying equus -- the previously suggested
> random() solutions should be fine for all purposes -- I found an alternative
> that:
>
> 1. Is faster than sorting by random(999999999) & random(999999999)
> 2. Is about as fast as sorting by random(999999999)
> 3. Is (I think) less likely to have duplicate sort keys
>
> The drawback is that it is deterministic (albeit random-looking) for any given set
> of data, unless you are willing to accept performance equivalent to sorting
> by random(999999999) & random(999999999), while providing near-certainty of
> a true sort (I think).
>
> The one-time, as fast as any solution so far, sort is:
>
> sort lines of myVar by md5digest(each)
>
> Collisions are highly unlikely in 128 bits. Even random(999999999) &
> random(999999999) only provides about 60 bits, which, to be clear, is
> *more* than enough, but md5 is (I think) even more certain, and faster.
> However, it will always produce the same results.
>
> sort lines of myVar by sha1digest(each)
>
> Works roughly the same: 160 bits of guaranteed-no-collision-ness, but it's
> a little slower, although still much faster than random(999999999) &
> random(999999999). Like MD5, it will always sort the same data the same
> (random) way.
>
> The same-ness for either solution can (I think) be fixed by this:
>
> put ticks() into T
> sort lines of myVar by md5digest(T & each)
>
> or
>
> put ticks() into T
> sort lines of myVar by sha1digest(T & each)
>
> That should result in random results each time, and is a little faster
> (MD5) or about 1/3 slower (SHA1) than random(999999999) & random(999999999).
>
> If anyone has thoughts on the collision-or-not-ness of MD5 or SHA1, feel
> free to comment. Otherwise, I hope I'm done now ;-)