[OT] Weighted distribution of Numbers
hh at hyperhh.de
Mon Aug 5 04:48:30 EDT 2019
[@Mark: A (weighted) mean is a location parameter, one number.]
Here the customer (say Dagobert Duck) wants to change/weight the
distribution of the data.
As Dar says, he could do a mapping from 0-100 to bins such as
"bad, neutral, good" simply by setting limits for the bins.
For example 0-30 = bad, 31-70 = neutral, 71-100 = good.
And make these limits transparent and show their frequencies
as they are.
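
A minimal sketch of this fixed-limit mapping, in Python only for
illustration (the thread itself contains no code). The limits are the
ones from the example above; the scores and the function names
`categorize` and `relative_frequencies` are made up for this sketch.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper limits of the first two bins, from the example:
# 0-30 = bad, 31-70 = neutral, 71-100 = good.
LIMITS = [30, 70]
LABELS = ["bad", "neutral", "good"]

def categorize(value):
    """Return the label of the bin that `value` falls into."""
    # bisect_left counts how many limits lie strictly below `value`.
    return LABELS[bisect_left(LIMITS, value)]

def relative_frequencies(values):
    """Relative frequency of each bin, reported as it is (no adjusting)."""
    counts = Counter(categorize(v) for v in values)
    return {label: counts[label] / len(values) for label in LABELS}

# Hypothetical raw data on the 0-100 scale.
scores = [12, 30, 31, 55, 70, 71, 88, 95, 100, 64]
freqs = relative_frequencies(scores)
print(freqs)
```

The limits stay fixed and visible, and the frequencies are whatever
the data give.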
But now Dagobert wants to "adjust" by (1) and/or (2):
(1) Set the limits for the bins such that each bin has a relative
    frequency of 1/3 (or given relative frequencies).
    This is setting categories by their frequencies in order to
    interpret the frequency of the categories.
(2) Change the raw data such that for the given limits each bin has
    a relative frequency of 1/3 (or given relative frequencies).
    This is filling categories by changing data in order to interpret
    the frequency of the categories of the changed data.
In sum, Dagobert wants to change the method on the basis of the raw
data, or change the raw data, such that the results are the desired
ones. (Honi soit qui mal y pense ...)
I would accept (1) if one argues from *theoretical* reasons that
the bins are expected to have frequencies of say 30%, 50%, 20%.
This could lead to limits based on *some* (a random part) of the raw data:
In order to find these limits simply sort the random data (a random
sample drawn out of the raw data) and take the values that have
approximately 30% and 80% of the values below them, respectively (no
scaling needed for that). In statistical terms: Find the 30% and 80%
quantiles.
Then one could use these (transparent) limits for the *rest* of the
raw data and new raw data and interpret the frequencies of the bins.
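
The procedure above could be sketched roughly like this (Python,
purely illustrative). The data are randomly generated stand-ins, the
sample size of 500 is arbitrary, and `quantile_limits` and
`bin_frequencies` are hypothetical helper names. The quantile rule
used here (smallest sorted value with at least p% of the sample at or
below it) is one simple convention among several.

```python
import random

def quantile_limits(sample, cum_percents):
    """Sort the sample and return, for each cumulative percentage p,
    the smallest value with at least p% of the sample at or below it."""
    ordered = sorted(sample)
    n = len(ordered)
    limits = []
    for pct in cum_percents:
        k = (pct * n + 99) // 100          # integer ceil(pct * n / 100)
        limits.append(ordered[max(0, k - 1)])
    return limits

def bin_frequencies(values, limits):
    """Relative frequency of each bin; limits are inclusive upper bounds."""
    counts = [0] * (len(limits) + 1)
    for v in values:
        counts[sum(v > lim for lim in limits)] += 1
    return [c / len(values) for c in counts]

rng = random.Random(2019)                  # fixed seed, repeatable run
raw = [rng.randint(0, 100) for _ in range(3000)]   # made-up raw data
rng.shuffle(raw)
sample, rest = raw[:500], raw[500:]        # random part vs. the rest

# Expected frequencies 30%, 50%, 20% -> cumulative 30% and 80%.
limits = quantile_limits(sample, [30, 80])
freqs = bin_frequencies(rest, limits)
print("limits from sample:", limits)
print("frequencies in the rest:", freqs)
```

The limits come only from the sample. The frequencies in the rest of
the data are then free to differ from 30/50/20, and that difference is
what can be interpreted.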