When Randomness Should Take Sides

Imagine three cache nodes with roughly 1, 2, and 7 units of spare capacity. Sending requests to them uniformly sounds fair, but it is wasteful. The smallest node should not get the same odds as the largest one. At the same time, always picking the current winner is too deterministic; it gives noisy estimates more authority than they deserve and turns a soft preference into a hard rule. You still want randomness, but it should favor stronger options.

Give the nodes 10%, 20%, and 70% of the requests in expectation. Chance remains, but it now takes sides. That is weighted sampling.

From equal odds to relative odds

Suppose those three nodes have weights 1, 2, and 7.

Those numbers are not probabilities yet. They are just weights: relative strengths. The second node should be twice as likely as the first. The third should be seven times as likely as the first.

To turn weights into probabilities, normalize:

$$P(X = i) = \frac{w_i}{\sum_j w_j}.$$

So the total weight is $1 + 2 + 7 = 10$, and the actual probabilities are $1/10$, $2/10$, and $7/10$.
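The normalization step is small enough to write out. The sketch below is one way to do it; the name `normalize` is hypothetical and the validation rules are an assumption about what counts as a sensible weight.

```typescript
// Hypothetical helper: turn relative weights into probabilities that sum to 1.
export function normalize(weights: readonly number[]): number[] {
  let total = 0;
  for (const w of weights) {
    // Assumption: negative, NaN, and infinite weights are rejected.
    if (w < 0 || !Number.isFinite(w)) {
      throw new Error("Weights must be finite and non-negative");
    }
    total += w;
  }
  if (total === 0) {
    throw new Error("At least one weight must be positive");
  }
  // Each probability is the weight's share of the total.
  return weights.map((w) => w / total);
}

const probabilities = normalize([1, 2, 7]);
console.log(probabilities); // the three shares: 0.1, 0.2, 0.7
```

Multiplying every weight by the same positive constant leaves the result unchanged, because the constant cancels in the ratio.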

Uniform sampling says every option gets the same slice. Weighted sampling says every option gets a slice proportional to its weight, which stands in for whatever evidence you have.

Why not just pick the max?

The obvious alternative is greedy choice: always pick the option with the highest weight. Sometimes that is correct. But many systems do not have weights that deserve that much trust. Capacity estimates go stale, heuristics are noisy, model scores are uncertain, and early winners are often just early.

Weighted sampling keeps the preference without turning it into a verdict. Better options win more often. They do not win every draw.

The cache example is deliberately small. Real load balancers also care about in-flight requests, queue depth, stale health data, and failure domains. A weighted draw honors the weights it receives; it cannot repair a bad measurement model.

The interval picture

I find the line-segment view easier to trust than the formula.

Take the total weight $W = \sum_i w_i$. Imagine the interval $[0, W)$ split into consecutive chunks whose lengths are the weights. For weights [1, 2, 7]:

the first node owns [0, 1)
the second node owns [1, 3)
the third node owns [3, 10)

Now draw one number uniformly at random from $[0, 10)$. Wherever it lands, route the request to the owner of that interval.

Weighted sampling is uniform sampling on a line whose intervals have different widths. Bigger weight means bigger target.
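The interval picture can be made literal in a few lines. This is a sketch for illustration only: `Interval`, `buildIntervals`, and `ownerOf` are hypothetical names, and the code assumes the weights have already been checked to be finite and non-negative.

```typescript
// Hypothetical shape for one chunk of the line [0, W).
export interface Interval {
  index: number; // position of the owning item
  start: number; // inclusive left edge
  end: number; // exclusive right edge
}

// Lay the weights end to end, starting at 0.
export function buildIntervals(weights: readonly number[]): Interval[] {
  const intervals: Interval[] = [];
  let start = 0;
  weights.forEach((w, index) => {
    intervals.push({ index, start, end: start + w });
    start += w;
  });
  return intervals;
}

// Return the index of the interval containing `point`, or -1 if it is outside [0, W).
export function ownerOf(intervals: readonly Interval[], point: number): number {
  for (const { index, start, end } of intervals) {
    if (point >= start && point < end) return index;
  }
  return -1;
}

const intervals = buildIntervals([1, 2, 7]); // [0, 1), [1, 3), [3, 10)
console.log(ownerOf(intervals, 0.5)); // 0
console.log(ownerOf(intervals, 1)); // 1, because left edges are inclusive
console.log(ownerOf(intervals, 9.99)); // 2
```

A zero-weight item gets an empty interval, so no point can ever land in it.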

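[Interactive figure: Weighted Sampling Playground. Weight sliders, a probability-interval bar (shown with segments of 7.7%, 15.4%, 23.1%, and 53.8%), and a Draw control for 10, 100, 1k, or 10k samples with a frequency chart.]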

Drag the weight sliders and watch the interval bar stretch and compress. Then draw a few thousand samples to see the empirical frequencies converge on the theoretical probabilities. The formula $P(X = i) = w_i / \sum_j w_j$ stops looking abstract. It turns into an object you can poke.
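If the interactive version is not at hand, the same convergence can be checked in code. The sketch below is an assumption-laden stand-in for the playground, not its actual source: `empiricalFrequencies` is a hypothetical helper, it assumes a non-empty list of valid weights with a positive total, and it accepts an optional `rng` so a fixed generator can be swapped in.

```typescript
// Hypothetical helper: draw `draws` samples and report each index's observed frequency.
export function empiricalFrequencies(
  weights: readonly number[],
  draws: number,
  rng: () => number = Math.random,
): number[] {
  const total = weights.reduce((sum, w) => sum + w, 0);
  const counts = weights.map(() => 0);

  for (let d = 0; d < draws; d++) {
    let threshold = rng() * total;
    let chosen = weights.length - 1; // fallback if rounding leaves the threshold at 0
    for (let i = 0; i < weights.length; i++) {
      threshold -= weights[i] ?? 0;
      if (threshold < 0) {
        chosen = i;
        break;
      }
    }
    counts[chosen] = (counts[chosen] ?? 0) + 1;
  }

  return counts.map((c) => c / draws);
}

// Same draw counts as the playground buttons; output varies from run to run.
for (const draws of [10, 100, 1000, 10000]) {
  const freqs = empiricalFrequencies([1, 2, 7], draws);
  console.log(draws, freqs.map((f) => f.toFixed(3)).join("  "));
}
```

With ten draws the frequencies can be far from 0.1, 0.2, and 0.7; with ten thousand they typically agree to about two decimal places.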

A minimal implementation

Here is the classic cumulative-sum version in TypeScript:

sample-weighted.ts

export type NonEmptyArray<T> = readonly [T, ...T[]];

export function sampleWeighted<T>(
  xs: NonEmptyArray<T>,
  weights: NonEmptyArray<number>,
): T {
  if (xs.length !== weights.length) {
    throw new Error("Items and weights must have the same length");
  }

  let total = 0;

  for (const w of weights) {
    if (w < 0 || !Number.isFinite(w)) {
      throw new Error("Weights must be finite and non-negative");
    }
    total += w;
  }
  if (total === 0) {
    throw new Error("At least one weight must be positive");
  }

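  // Pick a point in [0, total), then walk the intervals until it is crossed.
  let threshold = Math.random() * total;
  let lastPositive = 0;

  for (let i = 0; i < xs.length; i++) {
    if (weights[i] > 0) {
      lastPositive = i;
    }
    threshold -= weights[i];
    if (threshold < 0) {
      return xs[i];
    }
  }

  // Floating-point rounding can leave the threshold at 0 after the loop.
  // Fall back to the last item with positive weight, never a zero-weight one.
  return xs[lastPositive];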
}

The algorithm mirrors the interval picture exactly: compute the total weight, choose a random threshold in $[0, W)$, then walk through the weights until you cross it.

This implementation costs $O(n)$ per draw. If the weights stay fixed across many draws, you can preprocess cumulative weights and binary-search them in $O(\log n)$, or build an alias table for $O(1)$ draws. Those faster structures earn their complexity only when the distribution is reused; rapidly changing weights make the plain loop attractive again.
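For the fixed-weights case, here is a minimal sketch of the cumulative-sum plus binary-search variant. The factory name `makeCumulativeSampler` and the optional `rng` parameter are hypothetical; the alias-table variant is not shown.

```typescript
// Hypothetical factory: preprocess once in O(n), then draw in O(log n).
export function makeCumulativeSampler<T>(
  items: readonly T[],
  weights: readonly number[],
  rng: () => number = Math.random,
): () => T {
  if (items.length === 0 || items.length !== weights.length) {
    throw new Error("Items and weights must be non-empty and the same length");
  }

  // cumulative[i] is the right edge of item i's interval.
  const cumulative: number[] = [];
  let total = 0;
  let lastPositive = -1;
  weights.forEach((w, i) => {
    if (w < 0 || !Number.isFinite(w)) {
      throw new Error("Weights must be finite and non-negative");
    }
    if (w > 0) lastPositive = i;
    total += w;
    cumulative.push(total);
  });
  if (lastPositive === -1) {
    throw new Error("At least one weight must be positive");
  }

  return () => {
    const point = rng() * total;
    // Find the first index whose right edge is strictly greater than `point`.
    // Capping `hi` at lastPositive keeps rounding from selecting a zero-weight tail item.
    let lo = 0;
    let hi = lastPositive;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if ((cumulative[mid] ?? Infinity) > point) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return items[lo] as T;
  };
}

const pickNode = makeCumulativeSampler(["small", "medium", "large"], [1, 2, 7]);
console.log(pickNode(), pickNode(), pickNode());
```

The setup cost is paid once per distribution, so this only wins when the same weights serve many draws.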

What weights are, and are not

Weights do not need to sum to 1. The lists [1, 2, 7], [10, 20, 70], and [0.1, 0.2, 0.7] all define the same distribution. Only the ratios matter.

Probabilities are the cleaned-up public interface. Weights are the messy internal state. Real systems emit spare capacity, urgency scores, heuristic values, or logits. Weighted sampling lets those quantities remain relative until a draw needs actual probabilities.
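Logits need one extra step before they can act as weights, because they can be negative. A common convention, and an assumption here rather than something this article prescribes, is to exponentiate them as softmax does. The helper name `logitsToWeights` is hypothetical.

```typescript
// Hypothetical helper: map logits (any real numbers) to non-negative weights.
export function logitsToWeights(logits: readonly number[]): number[] {
  if (logits.length === 0) return [];
  const max = Math.max(...logits);
  // Subtracting the max rescales every weight by the same factor, so the ratios
  // are unchanged, and it keeps Math.exp from overflowing on large logits.
  return logits.map((z) => Math.exp(z - max));
}

const tokenWeights = logitsToWeights([2.0, 1.0, -1.0]);
console.log(tokenWeights); // the largest logit always maps to weight 1
```

The result is still a list of weights, not probabilities; normalization can wait until a draw actually needs it.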

Where weighted choice shows up

A search heuristic can spend more time around promising moves without going blind to alternatives. A genetic algorithm can favor fitter candidates without making the current winner immortal. A language model can prefer likely tokens without repeating the argmax forever. The mechanism is small; the policy encoded by the weights is not.

This sampler quietly puts every item back after a draw. Request routing can tolerate that. A raffle cannot. Replacement is part of the probability contract, even when the API forgets to say so.
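For the raffle case, one simple approach is to draw repeatedly and remove each winner from the pool. The sketch below assumes that definition of "without replacement"; `sampleWithoutReplacement` and its `rng` parameter are hypothetical names, not part of the sampler above.

```typescript
// Hypothetical helper: draw k distinct items, removing each winner from the pool.
export function sampleWithoutReplacement<T>(
  items: readonly T[],
  weights: readonly number[],
  k: number,
  rng: () => number = Math.random,
): T[] {
  if (items.length !== weights.length) {
    throw new Error("Items and weights must have the same length");
  }
  if (weights.some((w) => w < 0 || !Number.isFinite(w))) {
    throw new Error("Weights must be finite and non-negative");
  }
  const remaining = weights.slice(); // copy so the caller's array is untouched
  const eligible = remaining.filter((w) => w > 0).length;
  if (!Number.isInteger(k) || k < 0 || k > eligible) {
    throw new Error("k must be an integer between 0 and the number of positive weights");
  }

  const winners: T[] = [];
  while (winners.length < k) {
    // Recompute the total each round because the pool has shrunk.
    const total = remaining.reduce((sum, w) => sum + w, 0);
    let threshold = rng() * total;
    let chosen = -1;
    for (let i = 0; i < remaining.length; i++) {
      const w = remaining[i] ?? 0;
      if (w <= 0) continue; // already drawn, or never eligible
      chosen = i; // doubles as the rounding fallback: the last eligible index
      threshold -= w;
      if (threshold < 0) break;
    }
    winners.push(items[chosen] as T);
    remaining[chosen] = 0; // the winner leaves the pool
  }
  return winners;
}

console.log(sampleWithoutReplacement(["ann", "bo", "cy"], [1, 2, 7], 2));
```

Each round costs $O(n)$, so drawing $k$ winners costs $O(kn)$. Note that under this scheme an item's overall chance of being among the winners is no longer simply proportional to its weight.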