Order statistics

The idea behind sampling is the duality between a tuple of IID random variables and a certain multivariate random variable – since a sample of a distribution is just some tuple $\mathbf{X} = (X_1, \ldots, X_n)$, one can consider this to be a random variable taking values in $\mathbb{X}^n$, where each $X_i$ is a random variable taking values in $\mathbb{X}$.

In particular, one can define measurable functions $\mathbb{X}^n \to \mathbb{X}$ such as the sample moments (the sample mean, etc.). Another family of sample statistics one may define (when $\mathbb{X}$ is a totally ordered set, such as $\mathbb{R}$ but notably not something like $\mathbb{R}^m$) is the order statistics, which we will denote $\Omega_i$, where $\Omega_i(x_1, \ldots, x_n)$ gives the $i$th value in the list sorted in ascending order. In particular, $\Omega_1$ is the $\min$ function and $\Omega_n$ is the $\max$ function.
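As a minimal sketch (the name `order_statistic` is our own, not from any library), $\Omega_i$ can be computed by sorting the sample and taking the $i$th entry, using 1-based indexing to match the convention above. It only needs the values to be comparable, which is exactly the total-order requirement.

```python
from typing import Sequence


def order_statistic(sample: Sequence[float], i: int) -> float:
    """Return Omega_i(sample): the i-th smallest value, with i counted from 1."""
    n = len(sample)
    if not 1 <= i <= n:
        raise ValueError(f"i must be between 1 and {n}, got {i}")
    # Sorting in ascending order and indexing gives the i-th order statistic.
    return sorted(sample)[i - 1]


sample = (3.2, -1.0, 7.5, 0.4)
print([order_statistic(sample, i) for i in range(1, len(sample) + 1)])
# [-1.0, 0.4, 3.2, 7.5]
```

Here `order_statistic(sample, 1)` coincides with `min(sample)` and `order_statistic(sample, len(sample))` with `max(sample)`.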

So $\Omega_i(\mathbf{X})$ is itself a random variable, and we can ask how it is distributed.

For example, to calculate the CDF of $\Omega_n = \max$, note that $\max(\mathbf{X}) \le w \iff \forall i,\ X_i \le w$. Since the $X_i$ are IID, the CDF is just:

$$F_{\Omega_n(\mathbf{X})}(w) = F_X(w)^n$$
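A quick Monte Carlo sanity check of this formula, assuming for illustration that $X \sim \text{Uniform}(0, 1)$ so that $F_X(w) = w$ and the prediction is simply $w^n$. The helper `empirical_max_cdf` is hypothetical and uses only Python's standard `random` module with a fixed seed.

```python
import random


def empirical_max_cdf(n: int, w: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(max(X_1, ..., X_n) <= w) for X_i IID Uniform(0, 1) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and check whether its maximum is at most w.
        if max(rng.random() for _ in range(n)) <= w:
            hits += 1
    return hits / trials


n, w = 5, 0.8
print(f"simulated:              {empirical_max_cdf(n, w):.4f}")
print(f"formula F_X(w)^n = w^n: {w ** n:.4f}")  # 0.3277
```

With 100,000 trials the simulated value should agree with $0.8^5 \approx 0.3277$ to about two decimal places.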

Further Insight: as $n \to \infty$, this function converges pointwise (though in general not uniformly) either to a step function or to zero (or really to the "Heaviside step function at $+\infty$"), since $F_X(w)^n \to 1$ wherever $F_X(w) = 1$ and $F_X(w)^n \to 0$ wherever $F_X(w) < 1$. This makes sense: if your distribution has a finite upper bound $b$ (the supremum of its support), the maximum of a large sample eventually gets arbitrarily close to $b$, so the limit is the Heaviside step function shifted to $b$ and the maximum of an infinite sample (i.e. of the distribution) is that bound. If it doesn't, you're eventually bound to exceed any given value, so the maximum of an infinite sample is infinite.

Illustration of $F_X(w)^n$ for different distributions as $n \to \infty$
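The limiting behaviour can also be seen numerically. As an illustrative sketch, we pick two distributions of our own choosing: $\text{Uniform}(0, 1)$, which is bounded above by $1$, and $\text{Exponential}(1)$, which is unbounded. The helper names are hypothetical.

```python
import math


def uniform_cdf(w: float) -> float:
    """CDF of Uniform(0, 1): bounded above by 1."""
    return min(max(w, 0.0), 1.0)


def exponential_cdf(w: float) -> float:
    """CDF of Exponential(1): unbounded above."""
    return 1.0 - math.exp(-w) if w > 0 else 0.0


# F_X(w)^n at a few fixed points w, for increasing sample size n.
for n in (1, 10, 1_000, 1_000_000):
    print(
        f"n={n:>9}  "
        f"U(0,1) at w=0.99: {uniform_cdf(0.99) ** n:.3g}  "
        f"U(0,1) at w=1: {uniform_cdf(1.0) ** n:.3g}  "
        f"Exp(1) at w=10: {exponential_cdf(10.0) ** n:.3g}"
    )
```

For the uniform case the value at $w = 1$ stays at $1$ while the value at $w = 0.99$ collapses to $0$, giving a step at the upper bound. For the exponential case even $w = 10$ eventually goes to $0$ (slowly, since $F_X(10) \approx 0.99995$), as does every other fixed $w$.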

Similarly, $\min(\mathbf{X}) \le w \iff \neg\forall i,\ \neg(X_i \le w)$, i.e. not all of the $X_i$ exceed $w$. So the CDF is:

$$F_{\Omega_1(\mathbf{X})}(w) = 1 - (1 - F_X(w))^n$$
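Neither derivation used continuity, only independence and the CDF, so the formulas hold for discrete distributions too. A minimal exact check, assuming a fair six-sided die as the illustrative distribution: enumerate all $6^n$ equally likely outcomes and compare with $1 - (1 - F_X(w))^n$, using `fractions.Fraction` to avoid floating-point error. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a fair die: X takes the values 1..6, each with probability 1/6


def die_cdf(w: int) -> Fraction:
    """F_X(w) = P(X <= w) for a fair die."""
    return Fraction(sum(1 for x in FACES if x <= w), 6)


def min_cdf_by_enumeration(n: int, w: int) -> Fraction:
    """P(min(X_1, ..., X_n) <= w), counting over all 6**n equally likely outcomes."""
    outcomes = list(product(FACES, repeat=n))
    hits = sum(1 for xs in outcomes if min(xs) <= w)
    return Fraction(hits, len(outcomes))


def min_cdf_formula(n: int, w: int) -> Fraction:
    """1 - (1 - F_X(w))**n, as derived above."""
    return 1 - (1 - die_cdf(w)) ** n


for w in FACES:
    assert min_cdf_by_enumeration(3, w) == min_cdf_formula(3, w)
print(min_cdf_formula(3, 2))  # 19/27
```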

OK, teaser over. Now consider $F_{\Omega_i(\mathbf{X})}(w)$, which is the probability that at least $i$ of the data points are $\le w$. The probability that a specific set of $r$ data points is exactly the set of points $\le w$ is $F_X(w)^r (1 - F_X(w))^{n-r}$, and there are $\binom{n}{r}$ such sets. Summing over $r \ge i$:

$$F_{\Omega_i(\mathbf{X})}(w)=\sum_{r=i}^{n}\binom{n}{r}F_X(w)^r(1-F_X(w))^{n-r}$$
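As a sketch, the binomial sum translates directly into code; `order_stat_cdf` is a hypothetical helper that takes the number $F_X(w)$ rather than a distribution object. A useful consistency check is that the two extreme cases reproduce the $\max$ and $\min$ formulas above.

```python
import math


def order_stat_cdf(F: float, n: int, i: int) -> float:
    """P(Omega_i(X) <= w) for n IID draws, given F = F_X(w).

    Sums the probabilities that exactly r of the n points are <= w, for r = i..n.
    """
    return sum(math.comb(n, r) * F**r * (1 - F) ** (n - r) for r in range(i, n + 1))


p, n = 0.3, 6
# The extreme cases should reproduce the max and min formulas.
print(order_stat_cdf(p, n, n), p**n)              # Omega_n = max
print(order_stat_cdf(p, n, 1), 1 - (1 - p) ** n)  # Omega_1 = min
```

For instance, with $n = 3$ and $F_X(w) = 1/2$ the median $\Omega_2$ gives $3 \cdot \tfrac18 + \tfrac18 = \tfrac12$, as symmetry suggests.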

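Interestingly, the joint PDF of the order statistics (which are very much not independent) actually has a much simpler form. Assume $X$ is continuous with density $f_X$, so that ties occur with probability zero. The density of $(\Omega_1(\mathbf{X}), \ldots, \Omega_n(\mathbf{X}))$ at $(w_1, \ldots, w_n)$ is zero if the latter is not in ascending order. If it is, the value can result from $(X_1, \ldots, X_n)$ taking any permutation of $(w_1, \ldots, w_n)$, each with density $\prod_i f_X(w_i)$, and there are $n!$ such permutations. So the joint PDF is:

$$f_{\Omega(\mathbf{X})}(\mathbf{w}) = n! \prod_i f_X(w_i) \quad \text{for } w_1 < \cdots < w_n,$$

and zero otherwise.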
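A minimal sketch of this density as a function, assuming the caller supplies the common density $f_X$ as a Python callable; `joint_order_stat_pdf` and `uniform_pdf` are hypothetical names.

```python
import math
from typing import Callable, Sequence


def joint_order_stat_pdf(f: Callable[[float], float], w: Sequence[float]) -> float:
    """Joint density of (Omega_1, ..., Omega_n) at w, for IID X_i with density f.

    Equals n! * prod f(w_i) on the region w_1 < ... < w_n and 0 elsewhere
    (points with ties form a set of measure zero, so they are sent to 0).
    """
    if any(a >= b for a, b in zip(w, w[1:])):
        return 0.0
    return math.factorial(len(w)) * math.prod(f(x) for x in w)


def uniform_pdf(x: float) -> float:
    """Density of Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


print(joint_order_stat_pdf(uniform_pdf, [0.1, 0.5, 0.9]))  # 6.0
print(joint_order_stat_pdf(uniform_pdf, [0.5, 0.1, 0.9]))  # 0.0
```

For $\text{Uniform}(0, 1)$ the joint density is the constant $n!$ on the ordered simplex $0 < w_1 < \cdots < w_n < 1$, which has volume $1/n!$, so it integrates to $1$ as it should.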

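So the formulae above can all be recovered from the joint PDF: integrating out the other coordinates gives the marginal density of each $\Omega_i$, and integrating that up to $w$ gives its CDF (for general $i$, matching the binomial sum takes repeated integration by parts).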
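To see this numerically in the simplest case, here is a rough sketch for $n = 2$ with an illustrative $\text{Exponential}(1)$ distribution: integrate the joint density $2!\, f_X(w_1) f_X(w_2)$ over the relevant part of $\{w_1 < w_2\}$ with a crude midpoint rule. The helper `sorted_pair_mass`, the grid size, and the truncation of the tail at $w + 20$ are our own choices, not anything from the text.

```python
import math


def f(x: float) -> float:
    """Density of Exponential(1), an illustrative choice of distribution."""
    return math.exp(-x) if x >= 0 else 0.0


def F(x: float) -> float:
    """CDF of Exponential(1)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def sorted_pair_mass(a: float, b: float, steps: int = 400) -> float:
    """Integrate the n = 2 joint density 2! f(w1) f(w2) over {a < w1 < w2 < b}.

    Midpoint rule on a steps x steps grid. Cells on the diagonal w1 = w2 lie
    only half inside the region, so they are given weight 1/2.
    """
    h = (b - a) / steps
    dens = [f(a + (k + 0.5) * h) for k in range(steps)]
    total = 0.0
    for j in range(steps):
        total += 0.5 * 2 * dens[j] * dens[j]  # diagonal cell, half inside
        for k in range(j + 1, steps):
            total += 2 * dens[j] * dens[k]
    return total * h * h


w = 1.0
# P(Omega_2 <= w): both order statistics lie in (0, w).
print(sorted_pair_mass(0.0, w), F(w) ** 2)
# P(Omega_1 <= w) = 1 - P(both lie in (w, inf)); the tail is truncated at w + 20.
print(1 - sorted_pair_mass(w, w + 20.0), 1 - (1 - F(w)) ** 2)
```

Each pair of printed numbers should agree to roughly four decimal places, matching $F_X(w)^2$ and $1 - (1 - F_X(w))^2$.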