Definitions
Let \(X\) and \(Y\) be discrete random variables defined on finite alphabets \(\mathcal{X}\) and \(\mathcal{Y}\), respectively, with joint probability mass function \(p_{X,Y}\). The mutual information of \(X\) and \(Y\) is the random variable \(I(X,Y)\) defined by
\[ I(X,Y) = \log\frac{p_{X,Y}(X,Y)}{p_X(X)p_Y(Y)}.\]
As with entropy, the base of the logarithm determines the units of mutual information. If the logarithm is to the base \(e\), the unit of mutual information is the nat.
The average mutual information of \(X\) and \(Y\) is the expectation of the random variable \(I(X,Y)\), and is denoted by \(\mathbb{I}(X,Y)\):
\[ \mathbb{I}(X,Y) = \mathbb{E}\big(I(X,Y)\big)=\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}}p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}. \]
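To make the definition concrete, here is a minimal numerical sketch (not from the text) that evaluates \(\mathbb{I}(X,Y)\) in nats from a joint probability mass function given as a 2-D array; the function name and the example distributions are illustrative assumptions.

```python
import numpy as np

def average_mutual_information(p_xy):
    """Average mutual information (in nats) of a joint pmf p_xy,
    where p_xy[i, j] = p_{X,Y}(x_i, y_j)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p_X(x_i)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p_Y(y_j)
    mask = p_xy > 0                          # terms with p_{X,Y}(x,y) = 0 contribute 0
    # Sum p(x,y) * log( p(x,y) / (p(x) p(y)) ) over the support.
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))

# X and Y independent, each uniform on {0, 1}: I(X, Y) = 0.
print(average_mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # ~0.0
# X = Y, uniform on {0, 1}: I(X, Y) = log 2 ~ 0.6931 nats.
print(average_mutual_information([[0.5, 0.0], [0.0, 0.5]]))
```

Using `np.log` gives the result in nats; replacing it with `np.log2` would give bits, matching the remark above about the base of the logarithm.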