When Luck Is Part of the Proof

A randomized algorithm can fail on Tuesday and still come with a theorem.

Karger’s min-cut algorithm contracts randomly chosen edges until only two supernodes remain. One run may destroy the minimum cut. A lucky run returns an exact one, and the probability of luck is bounded well enough that repetition becomes part of the algorithm.

That is more interesting than the usual claim that randomness helps a search “explore.” Chance can play at least three distinct roles: widen a heuristic search, construct an exact answer with measurable failure probability, or replace an expensive exact step with an unbiased noisy one.

Randomness can widen a search

A simple hill-climber does something like this: start from a candidate solution, evaluate nearby moves, take the best improvement, repeat until no local move improves the score.

If the objective landscape is smooth and kind, this works well. If there are many local optima, the algorithm stops at the first hill it reaches.

Randomness helps in three common ways: by randomizing the starting point, by randomizing which moves get proposed, or by occasionally accepting non-greedy moves so the search can escape its current neighborhood.

The goal is not motion for its own sake. It is to avoid making the starting point an invisible permanent decision.
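The third option, occasionally accepting a worse move, can be sketched with a Metropolis-style acceptance rule. This is an illustration under stated assumptions, not a definitive implementation: the names `acceptMove` and `noisyClimb`, the fixed `temperature` parameter, and the injectable `rng` are all introduced here for the example.

```typescript
// Hypothetical helper names; a sketch for a maximization problem.

// Metropolis-style rule: always accept improvements; accept a worse move
// with probability exp(delta / temperature), where delta < 0 is the loss.
function acceptMove(
  delta: number,
  temperature: number,
  rng: () => number = Math.random
): boolean {
  if (delta >= 0) return true;
  return rng() < Math.exp(delta / temperature);
}

// Random-proposal climber with a fixed temperature (an assumption; an
// annealing schedule would usually lower the temperature over time).
function noisyClimb(
  f: (x: number) => number,
  start: number,
  step: number,
  iters: number,
  temperature: number,
  rng: () => number = Math.random
): { x: number; score: number } {
  let x = start;
  let fx = f(x);
  let best = { x, score: fx };

  for (let i = 0; i < iters; i++) {
    // Randomize which neighbor gets proposed.
    const candidate = x + (rng() < 0.5 ? -step : step);
    const fc = f(candidate);

    if (acceptMove(fc - fx, temperature, rng)) {
      x = candidate;
      fx = fc;
    }
    // Remember the best point seen, since the walk may move away from it.
    if (fx > best.score) best = { x, score: fx };
  }

  return best;
}
```

A higher `temperature` accepts more downhill moves and wanders further; a temperature near zero behaves like a plain greedy climber with random proposals.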

Suppose a random restart has probability $p$ of eventually finding a “good enough” basin. After $r$ independent restarts, the chance of succeeding at least once is

$$P(\text{success at least once}) = 1 - (1-p)^r.$$

This is just the complement rule, but it captures something important: the failure probability $(1-p)^r$ shrinks geometrically with every independent attempt.

If one restart succeeds 10% of the time, 44 independent restarts succeed at least once with probability above 99%. Sixty-six push it past 99.9%. A weak trial can become a strong procedure when failure is independent and cheap enough to repeat.
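The restart counts above can be checked with a few lines of arithmetic. The function names `successAfter` and `restartsNeeded` are hypothetical, and the sketch assumes the trials really are independent with the same success probability `p`.

```typescript
// Probability that at least one of r independent trials succeeds,
// when each trial succeeds with probability p.
function successAfter(p: number, r: number): number {
  return 1 - Math.pow(1 - p, r);
}

// Smallest r with 1 - (1 - p)^r >= target, for 0 < p < 1 and 0 < target < 1.
function restartsNeeded(p: number, target: number): number {
  if (!(p > 0 && p < 1) || !(target > 0 && target < 1)) {
    throw new RangeError("p and target must lie strictly between 0 and 1");
  }
  // Solve (1 - p)^r <= 1 - target for r, then round up.
  let r = Math.ceil(Math.log(1 - target) / Math.log(1 - p));
  // Guard against floating-point rounding at the boundary.
  while (successAfter(p, r) < target) r++;
  return r;
}

console.log(restartsNeeded(0.1, 0.99));  // 44
console.log(restartsNeeded(0.1, 0.999)); // 66
```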

random-restart-search.ts

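// A bumpy 1-D objective to maximize: two oscillations plus a gentle
// quadratic penalty that pulls the best scores toward x = 0.
function objective(x: number): number {
  return Math.sin(4 * x) + Math.cos(9 * x) - 0.05 * x * x;
}

// Greedy hill-climber: look one step left and right, move to the better
// neighbor if it improves the score, and stop at the first local peak.
function hillClimb(
  start: number,
  step = 0.05,
  iters = 200
): number {
  let x = start;

  for (let i = 0; i < iters; i++) {
    const left = x - step;
    const right = x + step;
    const fx = objective(x);
    const fl = objective(left);
    const fr = objective(right);

    if (fl > fx && fl >= fr) x = left;
    else if (fr > fx) x = right;
    else break; // neither neighbor improves: local peak reached
  }

  return x;
}

// Run hillClimb from `restarts` uniformly random starting points in
// [low, high) and keep the best local peak found.
function randomRestartSearch(
  low: number,
  high: number,
  restarts: number
): { x: number; score: number } {
  let bestX = low;
  let bestScore = -Infinity;

  for (let i = 0; i < restarts; i++) {
    const start = low + Math.random() * (high - low);
    const x = hillClimb(start);
    const score = objective(x);

    if (score > bestScore) {
      bestScore = score;
      bestX = x;
    }
  }

  return { x: bestX, score: bestScore };
}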

Run the greedy version from one start: it settles quickly. Run the restarted version: multiple starting points let it cover much more of the landscape.

Figure (interactive, "Landscape Explorer"): a single-start hill-climber launched from x = 0.5 gets trapped almost immediately on a mediocre local peak, while random restarts send climbers across the whole landscape. The one that happens to start near the global maximum finds a much better solution. Different starting points reveal regions a single deterministic run never sees.

Randomness can be the algorithm: Karger’s Min Cut

Karger’s Min Cut is one of the nicest examples of a randomized algorithm whose lucky runs return an exact answer.¹

For an undirected multigraph, the core routine is almost rude in its simplicity: pick an edge uniformly at random, contract it, discard self-loops, keep parallel edges, and stop at two supernodes.
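One contraction run, as just described, might be sketched like this. The graph representation (vertices numbered `0..n-1` and an edge list of pairs), the names `contractOnce` and `kargerMinCut`, and the injectable `rng` are assumptions made for illustration. The input is assumed to be connected, and the relabeling is deliberately naive rather than efficient.

```typescript
type Edge = [number, number];

// One run of random contraction on a connected undirected multigraph with
// vertices 0..n-1. Returns the number of edges crossing the final cut.
function contractOnce(
  n: number,
  edges: Edge[],
  rng: () => number = Math.random
): number {
  // label[v] is the supernode that currently contains vertex v.
  const label = Array.from({ length: n }, (_, v) => v);
  // Drop self-loops up front; parallel edges stay as separate entries.
  let live = edges.filter(([u, v]) => u !== v);
  let supernodes = n;

  while (supernodes > 2) {
    if (live.length === 0) throw new Error("graph is not connected");

    // Pick one remaining edge uniformly at random.
    const [u, v] = live[Math.floor(rng() * live.length)];
    const keep = label[u];
    const gone = label[v];

    // Contract it: merge supernode `gone` into supernode `keep`.
    for (let w = 0; w < n; w++) {
      if (label[w] === gone) label[w] = keep;
    }
    supernodes--;

    // Discard edges that have become self-loops.
    live = live.filter(([a, b]) => label[a] !== label[b]);
  }

  return live.length;
}

// Repeat independent runs and keep the smallest cut seen.
function kargerMinCut(
  n: number,
  edges: Edge[],
  runs: number,
  rng: () => number = Math.random
): number {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    best = Math.min(best, contractOnce(n, edges, rng));
  }
  return best;
}
```

Any single call to `contractOnce` returns the size of some cut, so it can overestimate the minimum but never underestimate it; taking the minimum over runs is what makes repetition safe.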

The lucky event is easy to state: during contraction, never contract an edge that belongs to a minimum cut. If that event happens, the cut left at the end is a true minimum cut. The beauty is not that one run is certain to succeed. It is that one run has a clean nontrivial success probability, and repeated runs amplify it.

The headline lower bound for one run is

$$P(\text{success in one run}) \ge \frac{2}{n(n-1)}.$$

That looks small because it is. The important part is that it is explicit. If one run succeeds with probability at least $p$, then $r$ independent runs all fail with probability at most $(1-p)^r$. “Try again” has become a proof obligation, not folk wisdom.
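To see what the bound costs in practice, here is a short worked derivation of how many runs suffice for a target failure probability $\delta$. The symbol $\delta$ and the example numbers are introduced here for illustration. Taking $p = \frac{2}{n(n-1)}$ and using the standard inequality $1 - p \le e^{-p}$:

$$(1-p)^r \le e^{-pr} \le \delta \quad\text{whenever}\quad r \ge \frac{1}{p}\ln\frac{1}{\delta} = \frac{n(n-1)}{2}\ln\frac{1}{\delta}.$$

For $n = 10$ and $\delta = 0.01$, that is $45 \ln 100 \approx 207.2$, so 208 runs are enough. The count grows quadratically in $n$ but only logarithmically in $1/\delta$.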

This is a different use of randomness from random restarts. Karger uses randomness as the core constructive mechanism, then uses repetition to amplify success. It shows that randomized algorithms are not only heuristics; sometimes they are exact computations wrapped in a probabilistic success story.

Randomness can make a step cheap: SGD

There is another way randomness sneaks into algorithms: not by changing the search space, but by changing the cost of a step.

Suppose your loss is an average over many training examples:

$$L(w) = \frac{1}{n}\sum_{i=1}^n \ell_i(w).$$

A full gradient step uses $\nabla L(w) = \frac{1}{n}\sum_{i=1}^n \nabla \ell_i(w)$. That is exact, but every step costs a pass over the whole dataset.

SGD replaces that expensive exact step with a stochastic one. At step $t$, pick one example (or a small minibatch) and update using only its gradient:

$$w_{t+1} = w_t - \eta_t \nabla \ell_{i_t}(w_t).$$

When $i_t$ is sampled uniformly, the single-example gradient is an unbiased estimator of the full gradient. That does not guarantee easy convergence on every objective, but it explains why the cheap step points in the right direction on average.
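A minimal sketch makes the cost difference concrete. It assumes a one-parameter model with squared-error loss $\ell_i(w) = \frac{1}{2}(w x_i - y_i)^2$ and a constant step size `eta` in place of $\eta_t$; the type `Example`, the function names, and the injectable `rng` are hypothetical choices for the example.

```typescript
type Example = { x: number; y: number };

// Gradient of the per-example loss l_i(w) = 0.5 * (w * x_i - y_i)^2.
function exampleGradient(w: number, ex: Example): number {
  return (w * ex.x - ex.y) * ex.x;
}

// Gradient of L(w) = (1/n) * sum_i l_i(w): one full pass over the data.
function fullGradient(w: number, data: Example[]): number {
  let sum = 0;
  for (const ex of data) sum += exampleGradient(w, ex);
  return sum / data.length;
}

// SGD with a constant step size (an assumption): each step touches one
// uniformly sampled example instead of the whole dataset.
function sgd(
  data: Example[],
  w0: number,
  eta: number,
  steps: number,
  rng: () => number = Math.random
): number {
  let w = w0;
  for (let t = 0; t < steps; t++) {
    const i = Math.floor(rng() * data.length); // uniform index i_t
    w -= eta * exampleGradient(w, data[i]);
  }
  return w;
}
```

Because each index is drawn with probability $1/n$, the expected value of `exampleGradient(w, data[i])` is exactly `fullGradient(w, data)`: the average of the per-example gradients. Each SGD step costs one gradient evaluation, where a full step costs $n$.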

SGD belongs in this series because it is not randomness for drama. It is randomness as a cheap proxy for an expensive exact computation.

Randomness does not remove failure. It can turn a systematic failure mode into one that can be measured, repeated, and bounded. That is a better bargain than “the heuristic usually works,” but only when the implementation matches the probability argument. My old TSP solver did not.