randomly order a list

Alex Tweedly alex at tweedly.net
Wed Jun 5 18:14:23 EDT 2013


Your code has a minor bug :-)

You "get MD5Digest(S[1]) "
instead of using   S[i]

Here's the code I used (now extended to check different lengths).

constant K=100000    -- the number of iterations
constant KLength=20   -- this * 36 --> the number of chars per line
on mouseup
    put empty into msg

    put the millisecs into t1
    repeat K times
       put random(999999999) into j3
    end repeat
    put "random" && the millisecs - t1 &CR after msg

    put the millisecs into t1
    repeat K times
       put random(999999999) & random(999999999) into j3
    end repeat
    put "random&random" && the millisecs - t1 &CR after msg

    put empty into L1
    put "abcdefghijklmnopqrstuvwxyz0123456789" after L1
    put the millisecs into t1
    repeat K times
       put md5digest(L1) into j1
    end repeat
    put "md5" && the number of chars in L1 && the millisecs - t1 &CR 
after msg

    put empty into L1
    repeat kLength times
       put "abcdefghijklmnopqrstuvwxyz0123456789" after L1
    end repeat
    put the millisecs into t1
    repeat K times
       put md5digest(L1) into j1
    end repeat
    put "md5" && the number of chars in L1 && the millisecs - t1 &CR 
after msg

end mouseup


and it gives something like

(700 chars)
random 20
random&random 157
md5 36 42
md5 720 235

(1440 chars)
random 17
random&random 161
md5 36 41
md5 1440 431

-- Alex.


On 05/06/2013 18:38, Geoff Canyon wrote:
> What code were you using Alex? I thought the first step(s) of the MD5
> process reduce (or grow) whatever input string is given to 128 bits, and
> then everything from there operates on the 128 bit data. Likewise for SHA1,
> in 160 bits.
>
> In other words, the size of the individual strings should have a limited
> impact on the MD5 algorithm. For example, the two times returned by this
> are nearly identical:
>
> on mouseUp
>     repeat 5
>        put random(999999999) after S[1]
>     end repeat
>     repeat 5000
>        put random(999999999) after S[2]
>     end repeat
>     repeat with i = 1 to 2
>        put ticks() into T
>        repeat 1000000
>           get MD5digest(S[1])
>        end repeat
>        put i && ticks() - T & cr after R
>     end repeat
>     put R
> end mouseUp
>
>
>
> On Tue, Jun 4, 2013 at 6:26 PM, Alex Tweedly <alex at tweedly.net> wrote:
>
>> No comments on the "collision-or-not-ness", but some concerns about
>> performance.
>>
>> The performance of "random() & random()" is conveniently data-independent,
>> but that for md5digest() is not. With nice short lines, it is indeed faster
>> than the random&random version, but as the line size increases, so does the
>> time taken by all of the digest methods. I didn't test it thoroughly, but
>> the swap-over point is fairly low - somewhere around 500 chars per line.
>>
>> -- Alex.
>>
>>
>> On 04/06/2013 18:51, Geoff Canyon wrote:
>>
>>> At the risk of beating the decaying equus -- the previously suggested
>>> random() solutions should be fine for all purposes --I found an
>>> alternative
>>> that:
>>>
>>> 1. Is faster than sorting by random(999999999) & random(999999999)
>>> 2. Is about as fast as sorting by random(999999999)
>>> 3. Is (I think) less likely to have duplicate sort keys
>>>
>>> The drawback is that it is determinative (albeit random) for any given set
>>> of data, unless you are willing to accept performance equivalent to
>>> sorting
>>> by random(999999999) & random(999999999), while providing near-certainty
>>> of
>>> a true sort (I think).
>>>
>>> The one-time, as fast as any solution so far, sort is:
>>>
>>>         sort lines of myVar by md5digest(each)
>>>
>>> Collisions are highly unlikely in 128 bits. Even random(999999999) &
>>> random(999999999) only provides about 60 bits, which, to be clear, is
>>> *more* than enough, but md5 is (I think) even more certain, and faster.
>>> However, it will always produce the same results.
>>>
>>>         sort lines of myVar by sha1digest(each)
>>>
>>> Works roughly the same: 160 bits of guaranteed-no-collision-ness, but it's
>>> a little slower, although still much faster than random(999999999) &
>>> random(999999999). Like MD5, it will always sort the same data the same
>>> (random) way.
>>>
>>> The same-ness for either solution can (I think) be fixed by this:
>>>
>>>         put ticks() into T
>>>         sort lines of myVar by md5digest(T & each)
>>>
>>> or
>>>
>>>         put ticks() into T
>>>         sort lines of myVar by sha1digest(T & each)
>>>
>>> That should result in random results each time, and is a little faster
>>> (MD5) or about 1/3 slower (SHA1) than random(999999999) &
>>> random(999999999)
>>>
>>> If anyone has thoughts on the collision-or-not-ness of MD5 or SHA1, feel
>>> free to comment. Otherwise, I hope I'm done now ;-)
>>> ______________________________**_________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/**mailman/listinfo/use-livecode<http://lists.runrev.com/mailman/listinfo/use-livecode>
>>>
>>
>> ______________________________**_________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/**mailman/listinfo/use-livecode<http://lists.runrev.com/mailman/listinfo/use-livecode>
>>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode





More information about the use-livecode mailing list