Order statistics

The idea behind sampling is the duality between a tuple of IID random variables and a certain multivariate random variable. Since a sample of a distribution is just some tuple X = (X₁,…,X_n), where each X_i is a random variable taking values in 𝕏, one can consider X itself to be a random variable taking values in 𝕏ⁿ.

In particular, one can define measurable functions 𝕏ⁿ → 𝕏 such as the sample moments (the sample mean, etc.). Another set of sample statistics one may define (when 𝕏 is a totally ordered set, such as ℝ but notably not something like ℝ^m) is the order statistics, which we will denote as Ω_i, where Ω_i(x₁,…,x_n) gives the $i$th value in the list sorted in ascending order. In particular, Ω₁ is the min function and Ω_n is the max function.
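As a concrete reading of this definition, here is a minimal sketch in Python. The helper name `order_statistic` and the sample values are made up for illustration; the only thing it does is sort and index.

```python
def order_statistic(i, xs):
    """Return the i-th smallest entry of xs (1-indexed, as in the text)."""
    if not 1 <= i <= len(xs):
        raise ValueError("i must satisfy 1 <= i <= len(xs)")
    return sorted(xs)[i - 1]


sample = [3.2, -1.0, 7.5, 0.4]  # made-up data for illustration
n = len(sample)

# The first and last order statistics are the min and the max.
assert order_statistic(1, sample) == min(sample)
assert order_statistic(n, sample) == max(sample)

# All order statistics together are just the sorted sample.
print([order_statistic(i, sample) for i in range(1, n + 1)])
```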

Well, so Ω_i(X) is a random variable – we can ask about how it's distributed.

For example, to calculate the CDF of Ω_n = max, note that max(X) ≤ w ⇔ ∀i, X_i ≤ w. Since the $X_i$ are IID, the probability of all n events holding at once is the product of n identical factors, so the CDF is just:

F_{Ω_n(X)}(w) = F_X(w)ⁿ
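The formula can be checked exactly on a small discrete case. The sketch below assumes a fair six-sided die as the distribution (my choice, not from the text) and enumerates every equally likely sample of size n. The names `die_cdf` and `max_cdf_by_enumeration` are illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a fair six-sided die, assumed here as the example distribution


def die_cdf(w):
    """F_X(w) for the die: the fraction of faces that are <= w."""
    return Fraction(sum(1 for face in FACES if face <= w), len(FACES))


def max_cdf_by_enumeration(n, w):
    """P(max(X_1, ..., X_n) <= w), by listing all 6**n equally likely samples."""
    outcomes = list(product(FACES, repeat=n))
    hits = sum(1 for xs in outcomes if max(xs) <= w)
    return Fraction(hits, len(outcomes))


n = 3
for w in FACES:
    # Brute-force count agrees with F_X(w)**n exactly (rational arithmetic).
    assert max_cdf_by_enumeration(n, w) == die_cdf(w) ** n
```

Using `Fraction` keeps the comparison exact, so the check does not depend on floating-point tolerance.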

Further Insight: As n → ∞, this function converges pointwise (but not uniformly) either to the Heaviside step function located at the upper bound of the distribution, or to zero (or really the "Heaviside step function at +∞"). This makes sense: if your distribution has a finite upper bound, then the maximum of an ever larger sample gets arbitrarily close to that bound, so the maximum of an infinite sample (i.e. of the distribution) is that bound. If it doesn't, then you're eventually bound to exceed any given value, so the maximum of an infinite sample is infinity.

Illustration of F(x)ⁿ for different distributions as $n→∞$
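A rough numerical version of this picture, assuming Uniform(0, 1) as the bounded example and Exponential(1) as the unbounded one (both are my choices for illustration):

```python
import math


def uniform_cdf(w):
    """CDF of Uniform(0, 1): the distribution is bounded above by 1."""
    return min(max(w, 0.0), 1.0)


def exponential_cdf(w):
    """CDF of Exponential(rate=1): no finite upper bound."""
    return 1.0 - math.exp(-w) if w > 0 else 0.0


def max_cdf(cdf, w, n):
    """CDF of the maximum of n IID draws, i.e. F(w)**n."""
    return cdf(w) ** n


for n in (1, 10, 100, 1000):
    bounded = [max_cdf(uniform_cdf, w, n) for w in (0.5, 0.9, 0.99, 1.0)]
    unbounded = [max_cdf(exponential_cdf, w, n) for w in (1.0, 5.0, 10.0)]
    print(n, ["%.3e" % v for v in bounded], ["%.3e" % v for v in unbounded])
```

In the bounded case every column with w < 1 decays to zero while the w = 1 column stays at one (the step at the upper bound). In the unbounded case every fixed w eventually decays to zero, only more slowly for larger w.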

Similarly, min(X) ≤ w ⇔ ¬∀i, ¬(X_i ≤ w), i.e. the minimum exceeds w exactly when every X_i does, which has probability (1 − F_X(w))ⁿ. So the CDF is:

F_{Ω₁(X)}(w) = 1 − (1 − F_X(w))ⁿ
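A quick Monte Carlo sanity check of the formula for the minimum, assuming Exponential(1) draws (an arbitrary choice, for which the formula simplifies to 1 − e^{−nw}). The function names, seed, and sample sizes are illustrative.

```python
import math
import random


def empirical_min_cdf(n, w, trials, rng):
    """Fraction of simulated samples of size n whose minimum is <= w."""
    hits = 0
    for _ in range(trials):
        smallest = min(rng.expovariate(1.0) for _ in range(n))
        if smallest <= w:
            hits += 1
    return hits / trials


def predicted_min_cdf(n, w):
    """1 - (1 - F_X(w))**n with F_X(w) = 1 - exp(-w) for w >= 0."""
    f = 1.0 - math.exp(-w)
    return 1.0 - (1.0 - f) ** n


rng = random.Random(0)  # fixed seed so the run is repeatable
n, w = 5, 0.1
print(empirical_min_cdf(n, w, 50_000, rng), predicted_min_cdf(n, w))
```

The two printed numbers should agree to roughly two decimal places; the simulation is only a consistency check, not a proof.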

OK, teaser over. Now consider F_{Ω_i(X)}(w), which is the probability that at least i of the data points are ≤ w. The probability that some specific r-selection is exactly the set of data points that are ≤ w is F_X(w)^r (1 − F_X(w))^{n − r}, and there are $\binom{n}{r}$ such selections. Summing over r from i to n:

$$F_{\Omega_i(\mathbf{X})}(w)=\sum_{r=i}^{n}{\binom{n}{r}}F_X(w)^r(1-F_X(w))^{n-r}$$
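As a sketch, the sum is a binomial tail and can be evaluated directly. The function name `order_stat_cdf` is made up, and `p` stands for the value F_X(w) at whichever w is of interest. The checks confirm that i = n and i = 1 reduce to the max and min formulas.

```python
from math import comb


def order_stat_cdf(i, n, p):
    """P(Omega_i(X) <= w), where p = F_X(w): at least i of n points are <= w."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(i, n + 1))


n, p = 7, 0.3  # arbitrary illustrative values

# i = n recovers the CDF of the maximum, F_X(w)**n.
assert abs(order_stat_cdf(n, n, p) - p**n) < 1e-12

# i = 1 recovers the CDF of the minimum, 1 - (1 - F_X(w))**n.
assert abs(order_stat_cdf(1, n, p) - (1 - (1 - p) ** n)) < 1e-12

# A larger i asks for more points below w, so the probability can only drop.
values = [order_stat_cdf(i, n, p) for i in range(1, n + 1)]
assert all(a >= b for a, b in zip(values, values[1:]))
```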

Interestingly, the joint PDF of the order statistics (which are not at all uncorrelated) actually has a much simpler form. The probability density of (Ω₁(X),…,Ω_n(X)) at the point (w₁,…,w_n) is zero if the latter is not in ascending order. And if it is, the value can result from (X₁,…,X_n) taking a value that is some permutation of (w₁,…,w_n), and there are n! such permutations (assuming the w_i are distinct, which holds with probability 1 for a continuous distribution). So the joint PDF is:

f_{Ω(X)}(w) = n! ∏_i f_X(w_i) for w₁ ≤ … ≤ w_n, and 0 otherwise

So the CDF formulae above are just special cases of integrating this joint PDF over the appropriate region, i.e. marginalizing out the other order statistics.
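As a small worked case, take n = 2 and assume X has a density f_X. Integrating the joint PDF 2 f_X(w₁) f_X(w₂) over the ordered region w₁ ≤ w₂ ≤ w recovers the CDF of the maximum found earlier:

$$\begin{aligned}F_{\Omega_2(\mathbf{X})}(w) &= \int_{-\infty}^{w}\int_{-\infty}^{w_2} 2 f_X(w_1) f_X(w_2)\,dw_1\,dw_2 \\ &= \int_{-\infty}^{w} 2 F_X(w_2) f_X(w_2)\,dw_2 = F_X(w)^2\end{aligned}$$

The same computation over the region w < w₁ ≤ w₂ gives P(Ω₁(X) > w) = (1 − F_X(w))², which matches the CDF of the minimum.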