Markov Inequality

Markov Inequality

If the expectation value of a non-negative random variable is small, then the random variable must itself be small with high probability. Markov's inequality quantifies this observation.

Let $X$ be a non-negative random variable with finite expectation $\mathbb{E}[X]$. Then for any $\epsilon > 0$,

\begin{align} \mathbb{P}(X \geq \epsilon) \leq \frac{\mathbb{E}[X]}{\epsilon} \end{align}


In two steps:

\begin{align} \mathbb{E}[X] = \int_{0}^{\infty}{x dP(x)} \geq \int_{\epsilon}^{\infty}{x dP(x)} \geq \int_{\epsilon}^{\infty}{\epsilon dP(x)} = \epsilon \mathbb{P}(X \geq \epsilon) \end{align}

The first inequality uses the fact that the integrand is non-negative, the second that $\epsilon$ lower-bounds the new, shrunk integrand.

Derivative Inequalities

Let $X$ be any random variable, and $f$ a non-negative increasing function. Then $X \geq \epsilon$ if and only if $f(X) \geq f(\epsilon)$. Supposing that $\mathbb{E}[f(X)] \leq \infty$, applying the basic Markov inequality gives

\begin{align} \mathbb{P}(X \geq \epsilon) \leq \frac{\mathbb{E}[f(X)]}{f(\epsilon)} \end{align}

Relationship to the Chebyshev Inequality

Take $f(x) = (x - \mathbb{E}[X])^2$. This is non-negative, whether or not $X$ is, and its expectation, if it exists, is the variance $\mathbb{V}[X]$. Therefore

\begin{align} \mathbb{P}(|X-\mathbb{E}[X]| \geq \epsilon) = \mathbb{P}((X-\mathbb{E}[X])^2 \geq \epsilon^2) \leq \frac{\mathbb{V}[X]}{\epsilon^2} \end{align}

which is the Chebyshev inequality. Similar bounds can be derived for higher moments, but do not have eponyms.

Exponential Inequalities

Picking a $t > 0$ and taking $f(x) = e^{tx}$,

\begin{align} \mathbb{P}(X \geq \epsilon) \leq e^{-t\epsilon} \mathbb{E}[e^{tX}] \end{align}

when the expectation is finite.

The cumulant generating function is $K_X(t) = \log{\mathbb{E}[e^{tX}]}$. The above inequality gives

\begin{align} -\log{\mathbb{P}(X \geq \epsilon)} \leq \sup_{t > 0}{t\epsilon - K_X(t)} \end{align}

the right-hand side being the Legendre-Frenchel transform of the cumulant generating function.

This approach is often useful for getting exponential inequalities for unbounded variables, and is part of Cramér's theorem in large deviations theory.