GMM Appendix C: Derivation of the Complete Data Likelihood
Starting Point: The Generative Model[1]
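Recall that the GMM generates each point \(x_i\) in two steps:
- First: Choose a cluster \(z_i = k\) with probability \(\pi_k\) (the mixing weights satisfy \(\pi_k \ge 0\) and \(\sum_{k=1}^{K} \pi_k = 1\))
- Then: Generate the point from that cluster: \(x_i \sim \mathcal{N}(\mu_k, \Sigma_k)\)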
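This two-step process is ancestral sampling: draw the cluster label first, then draw the point from that cluster's Gaussian. Below is a minimal NumPy sketch of it. The function name `sample_gmm`, the example parameters, and the 0-based cluster labels (\(0, \ldots, K-1\) instead of \(1, \ldots, K\)) are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def sample_gmm(n, pis, mus, sigmas, seed=0):
    """Hypothetical helper: draw n points from a GMM by ancestral sampling.

    Clusters are labelled 0..K-1 here (the text uses 1..K).
    """
    rng = np.random.default_rng(seed)
    # First: choose a cluster z_i for every point with probabilities pi_k
    z = rng.choice(len(pis), size=n, p=pis)
    # Then: draw x_i from the Gaussian of its chosen cluster
    X = np.array([rng.multivariate_normal(mus[k], sigmas[k]) for k in z])
    return X, z

# Illustrative parameters: K = 2 clusters in 2-D
pis = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])

X, z = sample_gmm(1000, pis, mus, sigmas)
print(X.shape)                                # (1000, 2)
print(np.bincount(z, minlength=2) / len(z))   # empirical cluster frequencies, close to pis
```

Keeping the labels `z` alongside `X` is exactly the "complete data" \((X, Z)\) that the rest of this appendix works with; in practice only `X` is observed.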
Step 1: Joint Probability with Known Assignment
If we know that point \(x_i\) was generated by cluster \(z_i\), the joint probability is:
\[ p(x_i, z_i \mid \theta) = p(z_i \mid \theta) \cdot p(x_i \mid z_i, \theta) = \pi_{z_i} \cdot \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \]
This is just the chain rule of probability: the probability of choosing cluster \(z_i\) times the probability of generating \(x_i\) from that cluster.
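To make Step 1 concrete, the sketch below evaluates \(\log p(x_i, z_i \mid \theta)\) for a single point. It works in log space and uses `np.linalg.slogdet` and `np.linalg.solve` instead of forming \(\Sigma^{-1}\) explicitly. The helper names `log_gaussian` and `log_joint` and the example parameters are hypothetical, and clusters are 0-indexed.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """log N(x | mu, Sigma) for a single d-dimensional point x."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)          # log|Sigma|, computed stably
    maha = diff @ np.linalg.solve(sigma, diff)    # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def log_joint(x, z, pis, mus, sigmas):
    """log p(x, z | theta) = log pi_z + log N(x | mu_z, Sigma_z)."""
    return np.log(pis[z]) + log_gaussian(x, mus[z], sigmas[z])

# Illustrative parameters: K = 2 clusters in 2-D, identity covariances
pis = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.array([np.eye(2), np.eye(2)])

x = np.array([0.0, 0.0])
print(log_joint(x, 0, pis, mus, sigmas))   # equals log(0.3) - log(2*pi)
print(log_joint(x, 1, pis, mus, sigmas))   # much smaller: x is far from mu_1
```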
Step 2: Complete Data Likelihood (All Points)
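Writing \(X = \{x_1, \ldots, x_n\}\) and \(Z = \{z_1, \ldots, z_n\}\), and assuming the pairs \((x_i, z_i)\) are independent given \(\theta\), the joint probability factorizes over the points: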
\[ p(X, Z \mid \theta) = \prod_{i=1}^{n} \pi_{z_i} \cdot \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \]
Step 3: Take the Log
\[ \log p(X, Z \mid \theta) = \sum_{i=1}^{n} \left[ \log \pi_{z_i} + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right] \]
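Steps 2 and 3 can be checked numerically. The sketch below (hypothetical helper names, illustrative parameters, 0-indexed clusters) computes the complete-data likelihood both as the raw product from Step 2 and as the sum of logs from Step 3. On a few points the two agree; on 2,000 points the raw product underflows to `0.0` in double precision while the log version stays finite, which is one practical reason to take the log.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """log N(x | mu, Sigma) for a single d-dimensional point x."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def complete_data_likelihood(X, z, pis, mus, sigmas):
    """Step 2: prod_i pi_{z_i} * N(x_i | mu_{z_i}, Sigma_{z_i})."""
    p = 1.0
    for x, k in zip(X, z):
        p *= pis[k] * np.exp(log_gaussian(x, mus[k], sigmas[k]))
    return p

def complete_data_log_likelihood(X, z, pis, mus, sigmas):
    """Step 3: sum_i [log pi_{z_i} + log N(x_i | mu_{z_i}, Sigma_{z_i})]."""
    return sum(np.log(pis[k]) + log_gaussian(x, mus[k], sigmas[k]) for x, k in zip(X, z))

pis = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.array([np.eye(2), np.eye(2)])

# A few points with known assignments: both forms agree
X_small = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
z_small = np.array([0, 0, 1])
print(np.log(complete_data_likelihood(X_small, z_small, pis, mus, sigmas)),
      complete_data_log_likelihood(X_small, z_small, pis, mus, sigmas))

# Many points: the raw product underflows, the log-likelihood does not
X_big = np.zeros((2000, 2))
z_big = np.zeros(2000, dtype=int)
print(complete_data_likelihood(X_big, z_big, pis, mus, sigmas))      # 0.0
print(complete_data_log_likelihood(X_big, z_big, pis, mus, sigmas))  # finite
```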
Step 4: The Indicator Function Trick
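Here’s the key insight! Since \(z_i\) takes exactly one value from \(\{1, 2, \ldots, K\}\), we can rewrite \(\log \pi_{z_i}\) as a sum over all clusters using the indicator function \(\mathbb{1}(z_i = k)\), which equals 1 if \(z_i = k\) and 0 otherwise: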
\[ \log \pi_{z_i} = \sum_{k=1}^{K} \mathbb{1}(z_i = k) \log \pi_k \]
Why? Every term with \(k \ne z_i\) is multiplied by 0, so only the term with \(k = z_i\) survives, and it equals \(\log \pi_{z_i}\).
Similarly: \[ \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) = \sum_{k=1}^{K} \mathbb{1}(z_i = k) \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \]
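The indicator trick is easy to sanity-check: multiplying by \(\mathbb{1}(z_i = k)\) and summing over \(k\) gives the same result as indexing with \(z_i\), and the indicators for one point form a one-hot vector. Below is a minimal sketch with illustrative mixing weights; `indicator_sum` is a hypothetical name and clusters are 0-indexed. It assumes every \(\pi_k > 0\): with \(\pi_k = 0\), floating point evaluates \(0 \cdot \log 0\) as `nan` rather than applying the usual convention \(0 \cdot \log 0 = 0\).

```python
import numpy as np

def indicator_sum(z_i, values):
    """Hypothetical helper: sum_k 1(z_i = k) * values[k], evaluated term by term."""
    return sum((1.0 if z_i == k else 0.0) * values[k] for k in range(len(values)))

# Illustrative mixing weights; all pi_k > 0 so every log pi_k is finite
log_pis = np.log(np.array([0.2, 0.5, 0.3]))
K = len(log_pis)

for z_i in range(K):
    one_hot = np.eye(K)[z_i]   # the indicator vector (1(z_i = 0), ..., 1(z_i = K-1))
    # The three values on each line coincide: indicator sum, dot product, direct indexing
    print(z_i, indicator_sum(z_i, log_pis), one_hot @ log_pis, log_pis[z_i])
```

The same check applies unchanged to the Gaussian term: replace `log_pis` with the vector of \(\log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\) values for one point.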
Step 5: Final Form
Substituting both identities into Step 3 and collecting the common indicator:
\[ \log p(X, Z \mid \theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}(z_i = k) \left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right] \]
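The final form vectorizes naturally: stack the indicators into an \(n \times K\) one-hot matrix, build an \(n \times K\) matrix whose \((i, k)\) entry is \(\log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\), multiply elementwise, and sum. The NumPy sketch below is one possible implementation (hypothetical helper names, illustrative parameters, 0-indexed clusters); it also checks that the double sum matches picking the \(k = z_i\) entry from each row.

```python
import numpy as np

def log_gaussian_rows(X, mu, sigma):
    """log N(x_i | mu, Sigma) for every row x_i of the n x d matrix X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Row-wise Mahalanobis terms (x_i - mu)^T Sigma^{-1} (x_i - mu)
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def log_terms(X, pis, mus, sigmas):
    """n x K matrix with entries log pi_k + log N(x_i | mu_k, Sigma_k)."""
    return np.column_stack([np.log(pis[k]) + log_gaussian_rows(X, mus[k], sigmas[k])
                            for k in range(len(pis))])

def complete_data_ll(X, z, pis, mus, sigmas):
    """sum_i sum_k 1(z_i = k) [log pi_k + log N(x_i | mu_k, Sigma_k)]."""
    indicators = np.eye(len(pis))[z]   # one-hot rows: indicators[i, k] = 1(z_i = k)
    return np.sum(indicators * log_terms(X, pis, mus, sigmas))

pis = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [5.0, 5.0]])
sigmas = np.array([np.eye(2), np.eye(2)])
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
z = np.array([0, 0, 1])

L = log_terms(X, pis, mus, sigmas)
print(complete_data_ll(X, z, pis, mus, sigmas))
print(L[np.arange(len(z)), z].sum())   # same value: keep only the k = z_i entry of each row
```

Writing it with an explicit indicator matrix, rather than fancy indexing, is deliberate: the E-step keeps this exact structure and only swaps the 0/1 matrix for a matrix of responsibilities.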
Why Use This Form?
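As mentioned earlier, this form is much easier to work with than the observed-data log-likelihood \(\sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\) because:
- The log acts directly on each \(\pi_k\) and each Gaussian density, not on a sum over clusters
- In the E-step, we replace the indicator \(\mathbb{1}(z_i = k)\) with its expected value under the posterior at the current parameters, the responsibility \(\gamma_{ik} = p(z_i = k \mid x_i, \theta)\)
- The resulting objective separates into one term involving only \(\pi\) and one term per Gaussian, so the M-step has closed-form updates for \(\pi_k\), \(\mu_k\), and \(\Sigma_k\)!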
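These two bullets can be sketched end to end. Replacing each indicator by \(\gamma_{ik}\) turns the final form into the expected complete-data log-likelihood
\[ Q(\theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} \gamma_{ik} \left[ \log \pi_k + \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right], \]
whose maximizers are the standard updates \(N_k = \sum_{i} \gamma_{ik}\), \(\pi_k = N_k / n\), \(\mu_k = \frac{1}{N_k} \sum_{i} \gamma_{ik} x_i\), and \(\Sigma_k = \frac{1}{N_k} \sum_{i} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^\top\). The NumPy code below is a minimal sketch under illustrative assumptions: synthetic data from two well-separated clusters, a crude initialization, a fixed 50 iterations instead of a convergence test, no safeguards against collapsing covariances, 0-indexed clusters, and hypothetical helper names. It is not a production EM implementation.

```python
import numpy as np

def log_terms(X, pis, mus, sigmas):
    """n x K matrix with entries log pi_k + log N(x_i | mu_k, Sigma_k)."""
    d = X.shape[1]
    cols = []
    for pi_k, mu, sigma in zip(pis, mus, sigmas):
        diff = X - mu
        _, logdet = np.linalg.slogdet(sigma)
        maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
        cols.append(np.log(pi_k) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    return np.column_stack(cols)

def e_step(X, pis, mus, sigmas):
    """Responsibilities gamma_ik = p(z_i = k | x_i, theta), i.e. E[1(z_i = k)]."""
    L = log_terms(X, pis, mus, sigmas)
    L_max = L.max(axis=1, keepdims=True)   # log-sum-exp trick to avoid underflow
    log_px = L_max + np.log(np.exp(L - L_max).sum(axis=1, keepdims=True))
    return np.exp(L - log_px), L

def m_step(X, gamma):
    """Closed-form maximizers of Q for pi, mu and Sigma."""
    n, d = X.shape
    Nk = gamma.sum(axis=0)                 # effective number of points per cluster
    pis = Nk / n
    mus = (gamma.T @ X) / Nk[:, None]
    sigmas = np.empty((len(Nk), d, d))
    for k in range(len(Nk)):
        diff = X - mus[k]
        sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return pis, mus, sigmas

# Illustrative synthetic data: 300 points near (0, 0), then 700 points near (5, 5)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)),
               rng.normal(5.0, 0.7, size=(700, 2))])

# Crude initialization (an assumption): the first and last data points as means
pis = np.array([0.5, 0.5])
mus = np.array([X[0], X[-1]])
sigmas = np.array([np.eye(2), np.eye(2)])

for _ in range(50):
    gamma, L = e_step(X, pis, mus, sigmas)
    Q = np.sum(gamma * L)   # final form with 1(z_i = k) replaced by gamma_ik, at the current theta
    pis, mus, sigmas = m_step(X, gamma)

print(pis)   # roughly [0.3, 0.7]
print(mus)   # roughly [[0, 0], [5, 5]]
```

Note that `Q` has the same shape as the complete-data expression computed with a one-hot indicator matrix; the only change is that the 0/1 entries become soft weights in \([0, 1]\) that sum to 1 across each row.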
Footnotes
[1] Courtesy of Claude.ai