Point KL-Divergence is not Very Negative Very Often

If X\sim P, then for any distribution Q it is unlikely that Q ascribes much greater density to X's outcome than P does. In fact, if P, Q have PDFs f_P, f_Q, then for any c > 0:

\begin{align} \mathbb{P}(f_P(X)\leq c f_Q(X)) &= \int \mathbf{1}_{\{x:f_P(x)\leq cf_Q(x) \}} f_P(x)dx \\ &\leq \int c f_Q(x) dx \\ &= c. \end{align}
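As a quick sanity check, here is a small Monte Carlo sketch of this bound; the particular Gaussians, sample size, and values of c are arbitrary choices for illustration.

```python
# Monte Carlo check of  P(f_P(X) <= c f_Q(X)) <= c  for X ~ P.
# The distributions, sample size, and values of c are arbitrary choices
# made only for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
P = norm(loc=0.0, scale=1.0)  # P = N(0, 1)
Q = norm(loc=2.0, scale=1.0)  # Q = N(2, 1)

x = P.rvs(size=1_000_000, random_state=rng)  # draw X ~ P

for c in (0.01, 0.1, 0.5):
    # Empirical estimate of P(f_P(X) <= c * f_Q(X)); it should not exceed c
    # (up to Monte Carlo error).
    estimate = np.mean(P.pdf(x) <= c * Q.pdf(x))
    print(f"c = {c}: estimated probability {estimate:.5f} (bound: {c})")
```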

This carries over to relative entropy:

Note that D_{KL}(P\|Q)=\mathbb{E}[Z] for Z=\ln \frac{f_P(X)}{f_Q(X)}. Then for any z, applying the bound above with c=e^z,

\begin{align} \mathbb{P}(Z\leq z) &= \mathbb{P}\left(\ln \frac{f_P(X)}{f_Q(X)} \leq z \right) \\ &= \mathbb{P}(f_P(X)\leq e^z f_Q(X)) \\ &\leq e^z. \end{align}
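For a concrete picture, consider a worked Gaussian example (the particular distributions are an arbitrary choice): take P=\mathcal{N}(0,1) and Q=\mathcal{N}(\mu,1) with \mu>0. Then

\begin{align} Z = \ln\frac{f_P(X)}{f_Q(X)} = \frac{(X-\mu)^2 - X^2}{2} = \frac{\mu^2}{2} - \mu X, \end{align}

so Z\sim\mathcal{N}(\mu^2/2,\,\mu^2) with \mathbb{E}[Z]=\mu^2/2=D_{KL}(P\|Q), and \mathbb{P}(Z\leq z)=\Phi\!\left(\frac{z-\mu^2/2}{\mu}\right), where \Phi is the standard normal CDF. For very negative z this Gaussian tail is far smaller than the e^z guaranteed above.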

This is actually just an interesting instance of the Chernoff bound, spelled out below. The same thing can be done when P, Q aren't over \mathbb{R} or don't have PDFs (working with densities with respect to a common dominating measure), or even with other types of divergences.
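To make the Chernoff-bound connection explicit, here is one way to spell it out in the notation above: apply Markov's inequality to e^{-Z}, which is the Chernoff bound with parameter t=1:

\begin{align} \mathbb{P}(Z\leq z) &= \mathbb{P}\left(e^{-Z}\geq e^{-z}\right) \\ &\leq e^{z}\,\mathbb{E}\left[e^{-Z}\right] \\ &= e^{z}\int_{\{f_P>0\}} \frac{f_Q(x)}{f_P(x)}\, f_P(x)\, dx \\ &\leq e^{z}. \end{align}

The last step uses \int_{\{f_P>0\}} f_Q(x)\,dx \leq 1, and nothing in the argument relies on P, Q living on \mathbb{R}.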
