GMM Appendix C: Derivation of the Complete Data Likelihood

Starting Point: The Generative Model¹

Recall that the GMM generates each point by:

  1. First: Choose cluster \(k\) with probability \(\pi_k\)
  2. Then: Generate point from that cluster: \(x_i \sim \mathcal{N}(\mu_k, \Sigma_k)\)
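The two-step generative process above can be sketched directly in code. This is a minimal 1-D sketch; the mixture weights, means, and variances are illustrative values, not parameters from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D mixture parameters (assumed values)
pi = np.array([0.5, 0.3, 0.2])      # mixing weights pi_k, sum to 1
mu = np.array([-2.0, 0.0, 3.0])     # cluster means mu_k
sigma2 = np.array([0.5, 1.0, 0.8])  # cluster variances sigma_k^2

def sample_gmm(n):
    # Step 1: choose a cluster z_i with probability pi_k
    z = rng.choice(len(pi), size=n, p=pi)
    # Step 2: draw x_i from N(mu_{z_i}, sigma_{z_i}^2)
    x = rng.normal(mu[z], np.sqrt(sigma2[z]))
    return x, z

x, z = sample_gmm(1000)
```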

Step 1: Joint Probability with Known Assignment

If we know that point \(x_i\) was generated by cluster \(z_i\), the joint probability is:

\[ p(x_i, z_i \mid \theta) = p(z_i) \cdot p(x_i \mid z_i, \theta) = \pi_{z_i} \cdot \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \]

This is just applying the chain rule: probability of choosing cluster \(z_i\) times probability of generating \(x_i\) from that cluster.
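Numerically, the chain rule is just the product of the mixing weight and the Gaussian density. A 1-D sketch with assumed parameters:

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

pi = np.array([0.6, 0.4])    # assumed mixing weights
mu = np.array([0.0, 4.0])    # assumed means
sigma2 = np.array([1.0, 1.0])

x_i, z_i = 0.5, 0            # a point and its (known) cluster assignment

# p(x_i, z_i | theta) = pi_{z_i} * N(x_i | mu_{z_i}, sigma_{z_i})
joint = pi[z_i] * normal_pdf(x_i, mu[z_i], sigma2[z_i])
```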

Step 2: Complete Data Likelihood (All Points)

For all \(n\) independent data points:

\[ p(X, Z \mid \theta) = \prod_{i=1}^{n} \pi_{z_i} \cdot \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \]

Step 3: Take the Log

\[ \log p(X, Z \mid \theta) = \sum_{i=1}^{n} \left[ \log \pi_{z_i} + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right] \]

Step 4: The Indicator Function Trick

Here’s the key insight! Since \(z_i\) takes on exactly one value from \(\{1, 2, \ldots, K\}\), we can rewrite \(\pi_{z_i}\) using an indicator function:

\[ \log \pi_{z_i} = \sum_{k=1}^{K} \mathbb{1}(z_i = k) \log \pi_k \]

Why? Because only one term in this sum is non-zero (when \(k = z_i\)), and it equals \(\log \pi_{z_i}\).

Similarly: \[ \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) = \sum_{k=1}^{K} \mathbb{1}(z_i = k) \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \]
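The indicator identity is easy to check numerically: the one-hot vector for \(z_i\) picks out exactly the \(k = z_i\) term of the sum. A sketch with assumed weights:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])  # assumed mixing weights
K = len(pi)
z_i = 1                          # the (known) cluster of point i

# Direct form: log pi_{z_i}
direct = np.log(pi[z_i])

# Indicator form: sum over k of 1(z_i = k) * log pi_k
one_hot = (np.arange(K) == z_i).astype(float)   # [0., 1., 0.]
indicator_form = np.sum(one_hot * np.log(pi))

assert np.isclose(direct, indicator_form)
```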

Step 5: Final Form

Combining everything:

\[ \log p(X, Z \mid \theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}(z_i = k) \left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right] \]
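The Step 3 and Step 5 forms are algebraically identical, which can be confirmed on synthetic complete data. A 1-D sketch; all mixture parameters here are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed 1-D mixture parameters and synthetic complete data (x, z)
pi = np.array([0.5, 0.3, 0.2])
mu = np.array([-2.0, 0.0, 3.0])
sigma2 = np.array([0.5, 1.0, 0.8])
K, n = len(pi), 200

z = rng.choice(K, size=n, p=pi)
x = rng.normal(mu[z], np.sqrt(sigma2[z]))

def log_normal(x, m, s2):
    """log N(x | m, s2) for a 1-D Gaussian."""
    return -0.5 * (np.log(2 * np.pi * s2) + (x - m) ** 2 / s2)

# Step 3 form: sum_i [ log pi_{z_i} + log N(x_i | mu_{z_i}, sigma_{z_i}) ]
step3 = np.sum(np.log(pi[z]) + log_normal(x, mu[z], sigma2[z]))

# Step 5 form: sum_i sum_k 1(z_i = k) [ log pi_k + log N(x_i | mu_k, sigma_k) ]
one_hot = (z[:, None] == np.arange(K)[None, :]).astype(float)          # (n, K)
per_k = np.log(pi)[None, :] + log_normal(x[:, None], mu[None, :], sigma2[None, :])
step5 = np.sum(one_hot * per_k)

assert np.isclose(step3, step5)
```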

Why Use This Form?

As mentioned earlier, this form is much easier to work with because:

  • The log operates on individual terms (not on a sum)
  • In the E-step, we replace the indicator \(\mathbb{1}(z_i = k)\) with its expected value (the responsibility \(\gamma_{ik}\))
  • This makes the optimization tractable with closed-form updates!
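The E-step replacement described above can be sketched as follows: each responsibility \(\gamma_{ik}\) is the posterior probability that point \(i\) came from cluster \(k\), computed in log space for numerical stability. Parameters are again assumed, illustrative values:

```python
import numpy as np

# Assumed 1-D mixture parameters
pi = np.array([0.5, 0.3, 0.2])
mu = np.array([-2.0, 0.0, 3.0])
sigma2 = np.array([0.5, 1.0, 0.8])

def responsibilities(x):
    """gamma_ik = pi_k N(x_i|mu_k, sigma_k) / sum_j pi_j N(x_i|mu_j, sigma_j)."""
    x = np.asarray(x, dtype=float)[:, None]                    # (n, 1)
    log_dens = -0.5 * (np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)
    log_w = np.log(pi) + log_dens                              # (n, K)
    log_w -= log_w.max(axis=1, keepdims=True)                  # stabilize exp
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

gamma = responsibilities([-2.1, 0.2, 3.5])
# Each row sums to 1; gamma[i, k] is E[1(z_i = k)], the expected indicator
```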

Footnotes

  1. Courtesy of Claude.ai