The Case for Hidden Variables
Suppose you are looking at a pile of documents.
Some are about sports. Some are about finance. Some are about politics. You never see a field called topic. You only see words.
And yet it is hard to shake the feeling that the documents make more sense if some hidden variable is sitting behind them; something like “this article is mostly about sports,” which then makes words like goal, league, and season more likely.
That instinct is the doorway to latent variable models.
The main idea
A latent variable model says: the visible data may be easier to explain if we assume there is some unobserved structure underneath it.
The basic setup is a probability distribution over observed variables $x$ and unobserved variables $z$:

$$p(x, z)$$

The variables $x$ are observed in the dataset. The variables $z$ are never observed directly. The definition is simple. It does not tell you whether $z$ is a topic, a cluster, a hidden state, or something continuous and geometric. It just says: there is what you see, and there is what the model posits behind what you see.
A mixture model
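Suppose each observation comes from one of $K$ hidden components. Then the joint factorizes as

$$p(x, z = k) = p(z = k)\, p(x \mid z = k)$$

and the marginal over the observed variable is

$$p(x) = \sum_{k=1}^{K} p(z = k)\, p(x \mid z = k)$$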
That formula means something very ordinary: first choose a hidden cause, then generate the visible effect.
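To make the marginal concrete, here is a minimal sketch of the weighted sum over hidden components, p(x) = Σ_k p(z = k) p(x | z = k), for a single fixed observation. The function name `mixtureMarginal` and the two-component numbers are assumptions made up for illustration; they do not come from any particular model.

```typescript
// Marginal of a finite mixture for one fixed observation x:
//   p(x) = sum over k of p(z = k) * p(x | z = k)
function mixtureMarginal(
  weights: number[],     // p(z = k), assumed to sum to 1
  likelihoods: number[], // p(x | z = k) for the same fixed x
): number {
  if (weights.length !== likelihoods.length) {
    throw new Error("weights and likelihoods must have the same length");
  }
  let total = 0;
  for (let k = 0; k < weights.length; k++) {
    total += weights[k] * likelihoods[k];
  }
  return total;
}

// Illustrative (made-up) numbers: two hidden components.
const exampleWeights = [0.7, 0.3];     // p(z = 1), p(z = 2)
const exampleLikelihoods = [0.2, 0.9]; // p(x | z = 1), p(x | z = 2)

// 0.7 * 0.2 + 0.3 * 0.9 = 0.14 + 0.27 = 0.41
const pX = mixtureMarginal(exampleWeights, exampleLikelihoods);
console.log(pX.toFixed(2));
```

Each term in the sum is one way the observation could have been produced: pick component k, then generate x from it.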
Why bother introducing something you cannot observe?
Because sometimes one blunt model of the visible data is worse than several sharper models indexed by a hidden cause.
Instead of trying to model all documents with one giant word distribution, you might assume there is a hidden topic variable that changes the likely vocabulary.
Latent variables let the model say “these observations may come from different hidden situations.”
That single sentence already covers mixtures, topic models, hidden Markov models, factor models, and much of generative modeling.
A tiny discrete toy
Imagine a hidden topic z that can be either “sports” or “finance.” Once the topic is chosen, the observed word distribution changes.
```typescript
// Draw one item from a categorical distribution by walking the cumulative sum.
function sampleCategorical<T>(items: T[], probs: number[]): T {
  const u = Math.random();
  let acc = 0;

  for (let i = 0; i < items.length; i++) {
    acc += probs[i];
    if (u <= acc) return items[i];
  }

  // Fallback in case floating-point rounding leaves acc slightly below 1.
  return items[items.length - 1];
}

const topics = ["sports", "finance"] as const;
const topicProbs = [0.6, 0.4];

const wordsByTopic = {
  sports: {
    words: ["goal", "coach", "league", "market"],
    probs: [0.35, 0.25, 0.30, 0.10],
  },
  finance: {
    words: ["goal", "coach", "league", "market"],
    probs: [0.05, 0.05, 0.10, 0.80],
  },
};

// First draw the hidden topic z, then draw the visible word x given z.
const z = sampleCategorical([...topics], topicProbs);
const x = sampleCategorical(wordsByTopic[z].words, wordsByTopic[z].probs);

console.log({ z, x });
```

Not a serious topic model. Just the latent-variable pattern in its smallest readable form: first draw z, then draw x | z.
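The toy numbers can also illustrate the earlier claim that one blunt model can be worse than several sharper ones. The sketch below goes one step beyond the toy and assumes a short document whose words all share a single hidden topic; that assumption, and the names `bluntDocProb` and `mixtureDocProb`, are mine for illustration. It compares the probability of a sports-flavored document under a single word distribution (the marginal over words) and under the two-topic mixture.

```typescript
type Topic = "sports" | "finance";
type Word = "goal" | "coach" | "league" | "market";

// Same made-up probabilities as the toy above.
const topicPrior: Record<Topic, number> = { sports: 0.6, finance: 0.4 };
const wordGivenTopic: Record<Topic, Record<Word, number>> = {
  sports: { goal: 0.35, coach: 0.25, league: 0.30, market: 0.10 },
  finance: { goal: 0.05, coach: 0.05, league: 0.10, market: 0.80 },
};
const topicNames = Object.keys(topicPrior) as Topic[];

// Marginal word probability: p(x) = sum_z p(z) p(x | z).
function wordMarginal(word: Word): number {
  return topicNames.reduce(
    (sum, t) => sum + topicPrior[t] * wordGivenTopic[t][word],
    0,
  );
}

// Blunt model: every word is drawn independently from the single marginal.
function bluntDocProb(doc: Word[]): number {
  return doc.reduce((p, w) => p * wordMarginal(w), 1);
}

// Mixture model (assumption): one hidden topic per document, shared by all words.
function mixtureDocProb(doc: Word[]): number {
  return topicNames.reduce((sum, t) => {
    const wordsGivenT = doc.reduce((p, w) => p * wordGivenTopic[t][w], 1);
    return sum + topicPrior[t] * wordsGivenT;
  }, 0);
}

const sportsDoc: Word[] = ["goal", "league", "coach"];
const blunt = bluntDocProb(sportsDoc);
const mixed = mixtureDocProb(sportsDoc);
console.log({ blunt, mixed });
```

With these numbers the blunt model assigns the document a probability of about 0.0086, while the mixture assigns about 0.0159. The hidden topic lets the model notice that the three words belong together.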
What makes learning harder
Latent variables can make the model richer, but they also make learning harder. The moment the hidden variables are unobserved, you inherit new questions:
- What is the latent value for this example?
- How do I fit parameters when I never see $z$?
- How do I compute $p(x)$ if it is expensive or intractable?
Those questions are not bugs. They are the normal price of latent modeling. You gain a richer story about the data, but you also inherit an inference problem.
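For the first question, a model this small can answer exactly with Bayes' rule: p(z | x) = p(z) p(x | z) / Σ_z' p(z') p(x | z'). The sketch below reuses the made-up numbers from the sports/finance toy; the function name `posteriorOverTopics` is an assumption for illustration. Enumerating every value of z is only cheap here because there are two topics.

```typescript
type Topic = "sports" | "finance";
type Word = "goal" | "coach" | "league" | "market";

// Same made-up probabilities as the sports/finance toy.
const prior: Record<Topic, number> = { sports: 0.6, finance: 0.4 };
const likelihood: Record<Topic, Record<Word, number>> = {
  sports: { goal: 0.35, coach: 0.25, league: 0.30, market: 0.10 },
  finance: { goal: 0.05, coach: 0.05, league: 0.10, market: 0.80 },
};

// Exact posterior over the hidden topic given one observed word.
function posteriorOverTopics(word: Word): Record<Topic, number> {
  const topicList = Object.keys(prior) as Topic[];

  // Joint p(z, x) = p(z) * p(x | z) for each candidate topic.
  const joint = topicList.map((t) => prior[t] * likelihood[t][word]);

  // Evidence p(x): the sum over every hidden value.
  const evidence = joint.reduce((a, b) => a + b, 0);

  const posterior = {} as Record<Topic, number>;
  topicList.forEach((t, i) => {
    posterior[t] = joint[i] / evidence;
  });
  return posterior;
}

const postMarket = posteriorOverTopics("market");
const postGoal = posteriorOverTopics("goal");
console.log({ postMarket, postGoal });
```

Seeing "market" moves the belief to roughly 84% finance even though sports is the more likely topic a priori, while seeing "goal" gives roughly 91% sports. The denominator is the same sum over hidden values that becomes expensive or intractable once z has many possible configurations.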
Why this belongs after softmax
Softmax gave us a clean distribution over visible labels. Latent variable models introduce a different idea: a model can generate visible data by first sampling hidden causes.
The role of sampling changes again. In a classifier, sampling might choose a class. In a latent-variable model, sampling may choose an unseen state before any visible output is produced.
That moves the series from “distributions over answers” to “distributions over explanations.”
A hidden variable is already a powerful idea. But a lot of modern generative modeling wants something stronger: not just hidden variables, but a latent space you can actually sample from in a smooth, meaningful way. A plain autoencoder can learn a hidden code, but random points in that code space often decode into nonsense. Something extra is needed. That is the next post.