GMM Appendix A: MLE for Gaussian Parameters: Complete Derivation

In this appendix, we derive the maximum likelihood estimates for the parameters of a Gaussian distribution.¹

Setup

We have \(n\) i.i.d. observations \(x_1, x_2, \ldots, x_n\) from \(\mathcal{N}(\mu, \sigma^2)\).

The log-likelihood is:

\[ \ell(\mu, \sigma^2) = \sum_{i=1}^{n} \log p(x_i \mid \mu, \sigma^2) \]

\[ = \sum_{i=1}^{n} \left[-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{(x_i-\mu)^2}{2\sigma^2}\right] \]

\[ = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \]
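This closed form is easy to check numerically. Below is a minimal sketch (assuming NumPy and SciPy are installed; the parameter values and seed are illustrative) comparing it against a direct sum of per-observation log-densities:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma2 = 1.5, 4.0                       # illustrative parameter values
x = rng.normal(mu, np.sqrt(sigma2), size=1000)
n = len(x)

# Closed-form log-likelihood from the derivation above
ll = (-n / 2 * np.log(2 * np.pi)
      - n / 2 * np.log(sigma2)
      - np.sum((x - mu) ** 2) / (2 * sigma2))

# Reference: sum of per-observation Gaussian log-densities
ll_ref = stats.norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)).sum()

assert np.isclose(ll, ll_ref)
```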


Derivative with Respect to \(\mu\)

Taking the Partial Derivative

\[\frac{\partial \ell}{\partial \mu} = \frac{\partial}{\partial \mu}\left[-\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right]\]

The first two terms don’t depend on \(\mu\), so their derivatives are zero:

\[\frac{\partial \ell}{\partial \mu} = -\frac{1}{2\sigma^2} \frac{\partial}{\partial \mu}\sum_{i=1}^{n}(x_i-\mu)^2\]

\[= -\frac{1}{2\sigma^2} \sum_{i=1}^{n}\frac{\partial}{\partial \mu}(x_i-\mu)^2\]

Using the chain rule:

\[\frac{\partial}{\partial \mu}(x_i-\mu)^2 = 2(x_i-\mu) \cdot (-1) = -2(x_i-\mu)\]

Therefore:

\[\frac{\partial \ell}{\partial \mu} = -\frac{1}{2\sigma^2} \sum_{i=1}^{n}[-2(x_i-\mu)]\]

\[= \frac{1}{\sigma^2} \sum_{i=1}^{n}(x_i-\mu)\]

\[\boxed{\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n}(x_i-\mu)}\]
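As a sanity check on the algebra, we can ask SymPy (if installed) to differentiate the log-likelihood symbolically; a small sketch with a fixed \(n = 3\) reproduces the boxed expression:

```python
import sympy as sp

mu = sp.Symbol('mu', real=True)
v = sp.Symbol('v', positive=True)          # v stands for sigma^2
n = 3                                      # small fixed n; x_i stay symbolic
x = sp.symbols('x1:4', real=True)          # x1, x2, x3

ell = (-sp.Rational(n, 2) * sp.log(2 * sp.pi)
       - sp.Rational(n, 2) * sp.log(v)
       - sum((xi - mu) ** 2 for xi in x) / (2 * v))

# Boxed formula: (1/sigma^2) * sum_i (x_i - mu)
expected = sum(xi - mu for xi in x) / v
assert sp.simplify(sp.diff(ell, mu) - expected) == 0
```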

Setting Equal to Zero

\[\frac{1}{\sigma^2} \sum_{i=1}^{n}(x_i-\mu) = 0\]

Since \(\sigma^2 > 0\), we can multiply through by \(\sigma^2\):

\[\sum_{i=1}^{n}(x_i-\mu) = 0\]

\[\sum_{i=1}^{n}x_i - n\mu = 0\]

\[\boxed{\hat{\mu}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}}\]

The MLE for the mean is simply the sample mean!
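A quick numerical check (a NumPy sketch with illustrative values) confirms that the log-likelihood, viewed as a function of \(\mu\) with \(\sigma^2\) held fixed, peaks at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=500)

def loglik_mu(mu, sigma2=1.0):
    """Log-likelihood as a function of mu, with sigma^2 held fixed."""
    n = len(x)
    return -n / 2 * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

# Grid search for the maximizer around the sample mean
grid = np.linspace(x.mean() - 1, x.mean() + 1, 2001)
best = grid[np.argmax([loglik_mu(m) for m in grid])]
assert abs(best - x.mean()) < 1e-3   # maximizer ≈ sample mean
```

Note that the fixed value of \(\sigma^2\) is irrelevant here: the maximizing \(\mu\) is \(\bar{x}\) no matter what variance is plugged in, a fact we will use again below.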


Derivative with Respect to \(\sigma^2\)

Taking the Partial Derivative

Let’s work with \(v = \sigma^2\) for clarity:

\[\ell(\mu, v) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(v) - \frac{1}{2v}\sum_{i=1}^{n}(x_i-\mu)^2\]

\[\frac{\partial \ell}{\partial v} = \frac{\partial}{\partial v}\left[-\frac{n}{2}\log(v)\right] + \frac{\partial}{\partial v}\left[-\frac{1}{2v}\sum_{i=1}^{n}(x_i-\mu)^2\right]\]

First Term

\[\frac{\partial}{\partial v}\left[-\frac{n}{2}\log(v)\right] = -\frac{n}{2} \cdot \frac{1}{v} = -\frac{n}{2v}\]

Second Term

Let \(S = \sum_{i=1}^{n}(x_i-\mu)^2\) (this doesn’t depend on \(v\)):

\[\frac{\partial}{\partial v}\left[-\frac{S}{2v}\right] = -\frac{S}{2} \cdot \frac{\partial}{\partial v}(v^{-1})\]

\[= -\frac{S}{2} \cdot (-v^{-2}) = \frac{S}{2v^2}\]

Combining

\[\frac{\partial \ell}{\partial v} = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^{n}(x_i-\mu)^2\]

Switching back to \(\sigma^2\):

\[\boxed{\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2}\]

Or equivalently:

\[\frac{\partial \ell}{\partial \sigma^2} = \frac{1}{2\sigma^4}\left[\sum_{i=1}^{n}(x_i-\mu)^2 - n\sigma^2\right]\]
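The same SymPy sketch as before (again assuming SymPy is installed and using a fixed \(n = 3\)) verifies this derivative symbolically:

```python
import sympy as sp

mu = sp.Symbol('mu', real=True)
v = sp.Symbol('v', positive=True)          # v = sigma^2
n = 3
x = sp.symbols('x1:4', real=True)

ell = (-sp.Rational(n, 2) * sp.log(2 * sp.pi)
       - sp.Rational(n, 2) * sp.log(v)
       - sum((xi - mu) ** 2 for xi in x) / (2 * v))

S = sum((xi - mu) ** 2 for xi in x)
# Boxed formula: -n/(2v) + S/(2v^2)
expected = -sp.Rational(n, 2) / v + S / (2 * v ** 2)
assert sp.simplify(sp.diff(ell, v) - expected) == 0
```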

Setting Equal to Zero

\[-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0\]

Multiply both sides by \(2\sigma^4\):

\[-n\sigma^2 + \sum_{i=1}^{n}(x_i-\mu)^2 = 0\]

\[n\sigma^2 = \sum_{i=1}^{n}(x_i-\mu)^2\]

\[\boxed{\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}\]

(This stationary point is the global maximum: provided the \(x_i\) are not all identical, \(\ell \to -\infty\) both as \(\sigma^2 \to 0^+\) and as \(\sigma^2 \to \infty\), so the unique critical point must be the maximizer.)

With Unknown Mean

In practice, we don’t know the true \(\mu\). Fortunately, the first-order condition for \(\mu\) did not involve \(\sigma^2\), so \(\hat{\mu} = \bar{x}\) holds at the joint maximum; plugging it in gives:

\[\boxed{\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\]

The MLE for the variance is the sample variance!

(Note: this estimator is biased, with \(E[\hat{\sigma}^2_{\text{MLE}}] = \frac{n-1}{n}\sigma^2\). The unbiased estimator divides by \(n-1\) instead of \(n\) (Bessel’s correction), but maximum likelihood gives \(n\).)
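To see both points at once (dividing by \(n\) is what maximum likelihood produces, and it is biased downward), here is a small Monte Carlo sketch using NumPy, with illustrative values for the true variance and sample size:

```python
import numpy as np

rng = np.random.default_rng(2)
true_sigma2 = 4.0
n = 10

# Average the two estimators over many samples to expose the bias
mle, unbiased = [], []
for _ in range(20000):
    x = rng.normal(0.0, np.sqrt(true_sigma2), size=n)
    dev2 = np.sum((x - x.mean()) ** 2)
    mle.append(dev2 / n)             # MLE: divide by n
    unbiased.append(dev2 / (n - 1))  # Bessel's correction: divide by n-1

print(np.mean(mle))        # ≈ true_sigma2 * (n-1)/n = 3.6
print(np.mean(unbiased))   # ≈ true_sigma2 = 4.0
```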


Summary of Results

Log-Likelihood

\[\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\]

Partial Derivatives

\[\boxed{\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n}(x_i-\mu)}\]

\[\boxed{\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2}\]

Maximum Likelihood Estimates

\[\boxed{\hat{\mu}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}}\]

\[\boxed{\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\]

These are the sample mean and sample variance—exactly what intuition suggests!
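As a final check, `scipy.stats.norm.fit` performs the same maximum likelihood fit and returns exactly these closed forms. A sketch (assuming SciPy is installed; note that SciPy returns the standard deviation, not the variance):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(-1.0, 2.0, size=1000)

mu_hat = x.mean()                        # MLE for the mean
sigma2_hat = np.mean((x - mu_hat) ** 2)  # MLE for the variance (n denominator)

loc, scale = stats.norm.fit(x)           # SciPy's MLE fit: (mean, std dev)
assert np.isclose(loc, mu_hat)
assert np.isclose(scale, np.sqrt(sigma2_hat))
```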


Footnotes

  1. Courtesy of Claude.ai