Skip to main content

When Uniform Isn't Enough

· 4 min read
Series parts
  1. Part 2When Uniform Isn't Enough
On this page

The first post in this series worked because the options were symmetric. This one starts when that symmetry breaks.

Imagine three cache nodes with roughly 1, 2, and 7 units of spare capacity. Sending requests to them uniformly sounds fair, but it is wasteful. The smallest node should not get the same odds as the largest one. At the same time, always picking the current winner is too deterministic; it gives noisy estimates more authority than they deserve and turns a soft preference into a hard rule. You still want randomness, but it should favor stronger options.

That is weighted sampling.

From equal odds to relative odds

Suppose those three nodes have weights 1, 2, and 7.

Those numbers are not probabilities yet. They are just weights: relative strengths. The second node should be twice as likely as the first. The third should be seven times as likely as the first.

To turn weights into probabilities, normalize:

P(X=i)=wijwj.P(X = i) = \frac{w_i}{\sum_j w_j}.

So the total weight is 1+2+7=101 + 2 + 7 = 10, and the actual probabilities are 1/101/10, 2/102/10, and 7/107/10.

Uniform sampling says every option gets the same slice. Weighted sampling says every option gets a slice proportional to the evidence you have.

Why not just pick the max?

The obvious alternative is greedy choice: always pick the option with the highest weight. Sometimes that is correct. But many systems do not have weights that deserve that much trust. Capacity estimates go stale, heuristics are noisy, model scores are uncertain, and early winners are often just early.

Weighted sampling keeps the bias without turning it into dogma. Better options win more often. They do not win forever.

The interval picture

The easiest way to visualize weighted sampling is as a line segment.

Take the total weight W=iwiW = \sum_i w_i. Imagine the interval [0,W)[0, W) split into consecutive chunks whose lengths are the weights. For weights [1, 2, 7]:

  • the first node owns [0, 1)
  • the second node owns [1, 3)
  • the third node owns [3, 10)

Now draw one number uniformly at random from [0,10)[0, 10). Wherever it lands, route the request to the owner of that interval.

Weighted sampling is uniform sampling on a line whose intervals have different widths. Bigger weight means bigger target.

Weighted Sampling Playground
Weights
A1
B2
C3
D7
Probability intervals
A
B15.4%
C23.1%
D53.8%
7.7%
Draw
101001k10k
frequency
A
B
C
D

Drag the weight sliders and watch the interval bar stretch and compress. Then draw a few thousand samples to see the empirical frequencies converge on the theoretical probabilities. The formula P(X=i)=wi/jwjP(X = i) = w_i / \sum_j w_j stops looking abstract. It turns into an object you can poke.

A minimal implementation

Here is the classic cumulative-sum version in TypeScript:

sample-weighted.ts
export type NonEmptyArray<T> = readonly [T, ...T[]];

export function sampleWeighted<T>(
  xs: NonEmptyArray<T>,
  weights: NonEmptyArray<number>,
): T {
  if (xs.length !== weights.length) {
    throw new Error("Items and weights must have the same length");
  }

  let total = 0;

  for (const w of weights) {
    if (w < 0 || !Number.isFinite(w)) {
      throw new Error("Weights must be finite and non-negative");
    }
    total += w;
  }
  if (total === 0) {
    throw new Error("At least one weight must be positive");
  }

  let threshold = Math.random() * total;

  for (let i = 0; i < xs.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) {
      return xs[i];
    }
  }

  return xs[xs.length - 1];
}

The algorithm mirrors the interval picture exactly: compute the total weight, choose a random threshold in [0,W)[0, W), then walk through the weights until you cross it.

What weights are, and are not

Weights do not need to sum to 1. The lists [1, 2, 7], [10, 20, 70], and [0.1, 0.2, 0.7] all define the same distribution. Only the ratios matter.

Probabilities are the cleaned-up public interface. Weights are the messy internal state. Real systems emit spare capacity, urgency scores, heuristic values, logits, or confidence numbers. Weighted sampling is useful because it lets you stay in that mess a little longer, then normalize only when an actual draw has to happen.

This matters a lot later in the series. Search procedures, evolutionary methods, and language models all tend to produce scores first and probabilities second.

Where weighted choice shows up

Once you start seeing weighted sampling as structured chance, it shows up everywhere. A load balancer can favor healthier machines without dogpiling one node. A search heuristic can spend more time around promising moves without going blind to alternatives. A genetic algorithm can amplify fitter candidates without collapsing diversity too early. A language model can strongly prefer likely tokens without turning every continuation into the same sentence.

Same mechanism, different stakes. The randomness is still real. It just reflects the structure of the problem.


Weighted sampling still hides one clean assumption: after each draw, the item goes back into the pool. That is fine for request routing or token decoding. It is wrong for raffles, reviewer assignment, or any setting where duplicates would feel absurd. Once an item leaves, the distribution itself changes. That is the next post.