Skip to main content

The Case for Hidden Variables

· 4 min read
Series parts
  1. Part 11The Case for Hidden Variables
On this page

Suppose you are looking at a pile of documents.

Some are about sports. Some are about finance. Some are about politics. You never see a field called topic. You only see words.

And yet it is hard to shake the feeling that the documents make more sense if some hidden variable is sitting behind them; something like “this article is mostly about sports,” which then makes words like goal, league, and season more likely.

That instinct is the doorway to latent variable models.

The main idea

A latent variable model says: the visible data may be easier to explain if we assume there is some unobserved structure underneath it.

The basic setup is a probability distribution over observed variables xx and unobserved variables zz:

p(x,z;θ).p(x, z; \theta).

The xx variables are observed in the dataset. The zz variables are never observed directly. The definition is simple. It does not tell you whether zz is a topic, a cluster, a hidden state, or something continuous and geometric. It just says: there is what you see, and there is what the model posits behind what you see.

A mixture model

Suppose each observation comes from one of KK hidden components. Then the joint factorizes as

p(x,z)=p(xz)p(z),p(x, z) = p(x \mid z)\, p(z),

and the marginal over the observed variable is

p(x)=k=1Kp(xz=k)p(z=k).p(x) = \sum_{k=1}^{K} p(x \mid z = k)\, p(z = k).

That formula means something very ordinary: first choose a hidden cause, then generate the visible effect.

Why bother introducing something you cannot observe?

Because sometimes one blunt model of the visible data is worse than several sharper models indexed by a hidden cause.

Instead of trying to model all documents with one giant word distribution, you might assume there is a hidden topic variable that changes the likely vocabulary.

Latent variables let the model say “these observations may come from different hidden situations.”

That single sentence already covers mixtures, topic models, hidden Markov models, factor models, and much of generative modeling.

A tiny discrete toy

Imagine a hidden topic z that can be either “sports” or “finance.” Once the topic is chosen, the observed word distribution changes.

function sampleCategorical<T>(
items: T[],
probs: number[]
): T {
const u = Math.random();
let acc = 0;
for (let i = 0; i < items.length; i++) {
acc += probs[i];
if (u <= acc) return items[i];
}
return items[items.length - 1];
}
const topics = ["sports", "finance"] as const;
const topicProbs = [0.6, 0.4];
const wordsByTopic = {
sports: {
words: ["goal", "coach", "league", "market"],
probs: [0.35, 0.25, 0.30, 0.10],
},
finance: {
words: ["goal", "coach", "league", "market"],
probs: [0.05, 0.05, 0.10, 0.80],
},
};
const z = sampleCategorical([...topics], topicProbs);
const x = sampleCategorical(
wordsByTopic[z].words,
wordsByTopic[z].probs
);
console.log({ z, x });

Not a serious topic model. Just the latent-variable pattern in its smallest readable form: first draw z, then draw x | z.

What makes learning harder

Latent variables can make the model richer, but they also make learning harder. The moment the hidden variables are unobserved, you inherit new questions:

  • What is the latent value for this example?
  • How do I fit parameters when I never see z?
  • How do I compute p(zx)p(z \mid x) if it is expensive or intractable?

Those questions are not bugs. They are the normal price of latent modeling. You gain a richer story about the data, but you also inherit an inference problem.

Why this belongs after softmax

Softmax gave us a clean distribution over visible labels. Latent variable models introduce a different idea: a model can generate visible data by first sampling hidden causes.

The role of sampling changes again. In a classifier, sampling might choose a class. In a latent-variable model, sampling may choose an unseen state before any visible output is produced.

That moves the series from “distributions over answers” to “distributions over explanations.”


A hidden variable is already a powerful idea. But a lot of modern generative modeling wants something stronger: not just hidden variables, but a latent space you can actually sample from in a smooth, meaningful way. A plain autoencoder can learn a hidden code, but random points in that code space often decode into nonsense. Something extra is needed. That is the next post.