# From K-Means to GMM: Hard vs Soft Clustering

Source: DEV Community
You have a pile of unlabelled data and you want to find groups in it. K-Means is the algorithm everyone reaches for first — it's fast, simple, and usually works. But it makes a bold assumption: every data point belongs to exactly one cluster. No uncertainty, no hedging. What happens to a point sitting right between two clusters? K-Means forces a choice. Gaussian Mixture Models (GMMs) offer an alternative — soft assignments that express how uncertain we are. By the end of this post, you'll implement K-Means from scratch, see why it's secretly a special case of the EM algorithm, and understand exactly when soft clustering beats hard clustering.

## Quick Win: K-Means from Scratch

Let's cluster some data. Here's K-Means in about 30 lines — Lloyd's algorithm:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=42):
    """
    K-Means clustering (Lloyd's algorithm).

    Args:
        X: (N, D) array of data points
        K: number of clusters
        max_iter: maximum iterations
        seed: random seed for centroid initialisation

    Returns:
        labels: (N,) hard cluster assignments
        centroids: (K, D) cluster centres
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking K distinct data points at random
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        new_centroids = np.array(
            [X[labels == k].mean(axis=0) for k in range(K)]
        )
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids
```
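Before diving into the full EM derivation, it helps to see what a soft assignment actually looks like. The sketch below (not from the original post; the means, variances, and mixing weights are illustrative assumptions) computes GMM responsibilities for a point in a 1-D mixture of two Gaussians — a point midway between the components gets roughly 50/50, where K-Means would be forced to pick a side:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    # Density of a 1-D Gaussian N(mu, var) evaluated at x
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Two illustrative components (parameters are assumptions, not fitted values)
mus = np.array([0.0, 4.0])   # component means
vars_ = np.array([1.0, 1.0]) # component variances
pis = np.array([0.5, 0.5])   # mixing weights

def responsibilities(x):
    # Soft assignment: posterior probability that x came from each component,
    # i.e. pi_k * N(x | mu_k, var_k) normalised over components
    weighted = pis * gaussian_pdf(x, mus, vars_)
    return weighted / weighted.sum()

print(responsibilities(2.0))  # midpoint between clusters: ~[0.5, 0.5]
print(responsibilities(0.0))  # deep inside cluster 0: ~[1.0, 0.0]
```

This is exactly the E-step of EM for a GMM; shrinking the variances toward zero makes the responsibilities snap to 0/1, which is one way to see K-Means as a limiting special case.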