Generating Random numbers to conform to a distribution

Mark Waddingham mark at livecode.com
Wed Jun 8 02:07:49 EDT 2022


On 2022-06-07 21:51, David V Glasgow via use-livecode wrote:
> Quite a lot of stats and maths packages offer a feature whereby the N,
> the Mean and the SD are variables specified by the user, and N random
> numbers are then generated with the required mean and SD.  I remember
> the venerable and excellent HyperCard HyperStat
> <https://link.springer.com/content/pdf/10.3758/BF03204668.pdf> (1993)
> by David M Lane doing exactly that.
> 
> Or is there an elegant formula?  I have Googled about and can’t see
> one, but maybe I don’t know the magic words.  And if someone wanted to
> script this in LC what would be the best approach? (just general
> guidance here, wouldn’t want anyone to invest their valuable time in
> what is at present just vague musings)
> 
> Any hints from the stats gurus?

I'm not a stats guru but...

I think all you need to do here is to use some of the intrinsic 
'properties' of the Mean and SD.

Let's say you have a collection X of numbers; then the following things 
are always true:

   P1: Mean(c * X) = c * Mean(X)
   P2: Mean(X + k) = k + Mean(X)
   P3: SD(c * X) = abs(c) * SD(X)
   P4: SD(X + k) = SD(X)

In English: scaling a set of numbers scales their mean by the same 
amount, and offsetting a set of numbers offsets their mean by the same 
amount. Similarly, scaling a set of numbers scales their SD by the size 
of the scale factor, and offsetting a set of numbers makes no difference 
to the SD (the SD measures distance from the mean, not magnitude, and 
offsetting moves the numbers and their mean together).

Now, hopefully we can agree that if you generate a set of random 
numbers, then scaling and offsetting them uniformly does not reduce 
their randomness. Here the numbers are drawn from a uniform distribution 
over the range of generation; scaling and offsetting only changes that 
range, not the shape of the distribution.
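One way to see that an offset plus a positive scale changes only the range is to standardise both sets (subtract the mean, divide by the SD): the resulting z-scores come out identical. A small Python sketch of that check, where `z_scores` is a hypothetical helper and the seed, c and k are picked arbitrarily:

```python
import math
import random
import statistics

def z_scores(xs):
    """Standardise xs: subtract the mean, then divide by the SD."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

rng = random.Random(1)
source = [rng.random() for _ in range(500)]  # uniform on [0, 1)
c, k = 4.0, -7.0                              # any positive scale, any offset
target = [c * x + k for x in source]          # uniform on [-7, -3)

# Identical z-scores mean the transform changed only location and scale,
# not the shape of the distribution.
same_shape = all(math.isclose(a, b, abs_tol=1e-9)
                 for a, b in zip(z_scores(source), z_scores(target)))
print(same_shape)  # True
```

With a negative c the z-scores would all flip sign, which is why the sketch sticks to a positive scale.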

So with this in mind, let TMean and TSD be the target mean and target 
SD. Then:

   1. Generate N random numbers in the range [0, 1] - S0, ..., SN

   2. Compute SMean := Mean(S0, ..., SN)

   3. Compute SSD := SD(S0, ..., SN)
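Steps 1-3 might look like this in Python. It is a sketch that assumes the standard library's `random.random()` (which draws from [0, 1) rather than the closed [0, 1], a difference that does not matter here) and the sample SD from `statistics.stdev`, which needs at least two values; `source_stats` is a hypothetical name.

```python
import random
import statistics

def source_stats(n, seed=None):
    """Steps 1-3: draw n uniform numbers in [0, 1) and return them together
    with their mean (SMean) and sample standard deviation (SSD)."""
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(n)]
    return samples, statistics.mean(samples), statistics.stdev(samples)

samples, s_mean, s_sd = source_stats(1000, seed=7)
print(f"SMean={s_mean:.3f} SSD={s_sd:.3f}")
```

For large N, SMean should land near 0.5 and SSD near 1/sqrt(12), roughly 0.289, which are the mean and SD of a uniform distribution on [0, 1].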

Now we take a small diversion from our sequence of enumerated steps to 
ask "what offset and scale do we need to apply to the set of numbers so 
that we get TMean and TSD, rather than SMean and SSD?".

The amount we need to scale by is determined by the SD, specifically:

      c := TSD/SSD

If we scale our source numbers by c and apply SD then we see (c is 
positive because TSD and SSD are, so the abs(c) in P3 is just c):

      SD(c * S0, ..., c * SN) = c * SD(S0, ..., SN) [P3 above]
                              = c * SSD
                              = TSD / SSD * SSD
                              = TSD

i.e. our scaled input numbers give us the desired SD!
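A tiny worked example of the scaling step, written as a Python sketch with made-up numbers: the source numbers 1, 2, 3 have SMean = 2 and sample SSD = 1, so for TSD = 4 we get c = 4, and the scaled numbers 4, 8, 12 have SD 4.

```python
import statistics

source = [1, 2, 3]                    # toy source numbers: SMean = 2, SSD = 1
t_sd = 4                              # target SD
c = t_sd / statistics.stdev(source)   # c = 4 / 1 = 4.0
scaled = [c * s for s in source]      # [4.0, 8.0, 12.0]
print(c, scaled, statistics.stdev(scaled))  # 4.0 [4.0, 8.0, 12.0] 4.0
```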

So now we just need to play the same 'game' with the Mean. We have:

      Mean(c * S0, ..., c * SN) = c * Mean(S0, ..., SN) [P1 above]
                                = c * SMean

However, we really want a mean of TMean, so define:

      k := TMean - c * SMean

If we then translate our (scaled!) source numbers by k and apply Mean, 
we see:

     Mean(c * S0 + k, ..., c * SN + k) = c * Mean(S0, ..., SN) + k  [P1 and P2 above]
                                       = c * SMean + k
                                       = c * SMean + TMean - c * SMean
                                       = TMean

i.e. our scaled and offset input numbers give us the desired Mean!
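A worked example of the offset step, as a Python sketch with made-up numbers: the source numbers 1, 2, 3 have SMean = 2 and sample SSD = 1, so for TMean = 10 and TSD = 4 we get c = 4 and k = 10 - 4 * 2 = 2. The targets are then 6, 10, 14, whose mean is 10.

```python
import statistics

source = [1, 2, 3]                        # toy source numbers: SMean = 2, SSD = 1
t_mean, t_sd = 10, 4                      # target mean and SD
c = t_sd / statistics.stdev(source)       # 4.0
k = t_mean - c * statistics.mean(source)  # 10 - 4.0 * 2 = 2.0
target = [c * s + k for s in source]      # [6.0, 10.0, 14.0]
print(k, target, statistics.mean(target))  # 2.0 [6.0, 10.0, 14.0] 10.0
```

The SD of 6, 10, 14 is still 4, because the offset k does not touch the spread.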

Note that SD is invariant under offsetting (P4), so the offset does not 
disturb the SD we already fixed:

      SD(c * S0 + k, ..., c * SN + k) = SD(c * S0, ..., c * SN) = TSD!

We can now return to our sequence of steps:

   4. Compute c := TSD/SSD (SSD must be non-zero, so we need at least 
      two distinct source numbers)

   5. Compute k := TMean - c * SMean

   6. Compute the target random numbers, Tn := c * Sn + k
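Steps 4-6 fit in one small function. This is a Python sketch, not a reference implementation: `rescale` is a hypothetical name, it uses the sample SD from `statistics.stdev`, and it raises an error when SSD is zero, since c would then be undefined.

```python
import statistics

def rescale(samples, t_mean, t_sd):
    """Steps 4-6: return T_n := c * S_n + k for every S_n in samples,
    so the result has mean t_mean and sample SD t_sd."""
    s_mean = statistics.mean(samples)
    s_sd = statistics.stdev(samples)   # needs at least two samples
    if s_sd == 0:
        raise ValueError("all samples are equal, so SSD is 0 and c is undefined")
    c = t_sd / s_sd                    # step 4
    k = t_mean - c * s_mean            # step 5
    return [c * s + k for s in samples]  # step 6

targets = rescale([3, 1, 4, 1, 5, 9, 2, 6], t_mean=50, t_sd=10)
print(round(statistics.mean(targets), 9), round(statistics.stdev(targets), 9))  # 50.0 10.0
```

Any list of source numbers works here, not only the [0, 1] ones from step 1, because the scale and offset absorb the original range.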

So, assuming my maths above is correct, T0, ..., TN will still be 
'random' (for some suitable definition of random), but have a Mean of 
TMean and an SD of TSD as desired.

In LiveCode Script, the above is something like:

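    function randomNumbers pN, pTMean, pTSD
       -- The SD needs at least two numbers, so there is nothing to rescale
       if pN < 2 then return empty

       -- Steps 1-3: random() returns integers from 1 to 2^31; the
       -- rescaling below makes the source range irrelevant
       local tSource
       repeat pN times
          put random(2^31) & comma after tSource
       end repeat
       delete the last char of tSource -- drop the trailing comma

       local tSMean, tSSD
       put average(tSource) into tSMean
       put stdDev(tSource) into tSSD

       -- Steps 4-5: scale factor c and offset k
       local tC, tK
       put pTSD / tSSD into tC
       put pTMean - tC * tSMean into tK

       -- Step 6: Tn := c * Sn + k for every source number
       local tTarget
       repeat for each item tS in tSource
          put tC * tS + tK & comma after tTarget
       end repeat
       delete the last char of tTarget -- drop the trailing comma

       return tTarget
    end randomNumbers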

Hope this helps!

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps
