Skip to main content

Why Randomness Helps Algorithms

· 6 min read
Series parts
  1. Part 8Why Randomness Helps Algorithms
On this page

A deterministic algorithm can be perfectly logical and still be bad at looking around.

Hard optimization problems make this painfully clear.

If the landscape is simple, greed works beautifully. Move downhill. Keep the best improvement. Stop when nothing nearby looks better. Done.

But real search spaces are often rough. They have plateaus, traps, ridges, and local optima. If you only ever take the most obvious next step, you can get stuck in a place that is better than your neighbors but nowhere near globally good.

That is one way randomness helps. But it is not the only way.

Randomness does not only help algorithms “search better.” It can broaden a search, turn repeated trials into a probabilistic guarantee, or replace an expensive exact computation with a cheap noisy estimate.

A simple hill-climber does something like this: start from a candidate solution, evaluate nearby moves, take the best improvement, repeat until no local move improves the score.

If the objective landscape is smooth and kind, this works well. If there are many local optima, the algorithm stops at the first hill it reaches.

Randomness helps in three common ways: by randomizing the starting point, by randomizing which moves get proposed, or by occasionally accepting non-greedy moves so the search can escape its current neighborhood.

Those are three different uses of chance, but they all answer the same question:

How do I keep searching without becoming blind?

Suppose a random restart has probability pp of eventually finding a “good enough” basin. After rr independent restarts, the chance of succeeding at least once is

P(success at least once)=1(1p)r.P(\text{success at least once}) = 1 - (1-p)^r.

This is just the complement rule, but it captures something important: repeated randomized attempts compound surprisingly well.

function objective(x: number): number {
return Math.sin(4 * x) + Math.cos(9 * x) - 0.05 * x * x;
}
function hillClimb(
start: number,
step = 0.05,
iters = 200
): number {
let x = start;
for (let i = 0; i < iters; i++) {
const left = x - step;
const right = x + step;
const fx = objective(x);
const fl = objective(left);
const fr = objective(right);
if (fl > fx && fl >= fr) x = left;
else if (fr > fx) x = right;
else break;
}
return x;
}
function randomRestartSearch(
low: number,
high: number,
restarts: number
): { x: number; score: number } {
let bestX = low;
let bestScore = -Infinity;
for (let i = 0; i < restarts; i++) {
const start = low + Math.random() * (high - low);
const x = hillClimb(start);
const score = objective(x);
if (score > bestScore) {
bestScore = score;
bestX = x;
}
}
return { x: bestX, score: bestScore };
}

Run the greedy version from one start: it settles quickly. Run the restarted version: multiple starting points let it cover much more of the landscape.

Landscape Explorer
0.21.11.92.83.70123456xglobal max
Strategy
2x
Best path
Global max

Try this: Hit "Run" in single-start mode first. The climber starts at x=0.5 and immediately gets trapped in a local peak. Then switch to "Random restarts" and run again to see how multiple starting points explore the whole landscape.

Watch a single-start hill-climber get stuck on a mediocre peak, then toggle to random restarts and see multiple climbers explore the full landscape. The one that happens to start near the global maximum finds a much better solution. Different starting points reveal regions a single deterministic run never sees.

Randomness can be the algorithm: Karger’s Min Cut

Karger’s Min Cut is one of the nicest examples of a randomized algorithm that is both simple and exact in spirit.

The core routine is almost rude in its simplicity: pick a random edge, contract it, keep going until only two supernodes remain, return the cut between them.

The beauty is not that one run is certain to succeed. It is that one run has a clean nontrivial success probability, and repeated runs amplify it.

The headline lower bound for one run is

P(success in one run)2n(n1).P(\text{success in one run}) \ge \frac{2}{n(n-1)}.

That looks small. But it is not hopelessly small. Running the algorithm independently many times and keeping the best cut makes the failure probability drop multiplicatively.

This is a different use of randomness from random restarts. Karger uses randomness as the core constructive mechanism, then uses repetition to amplify success. It shows that randomized algorithms are not only heuristics; sometimes they are exact algorithms with probabilistic guarantees.

Randomness can make a step cheap: SGD

There is another way randomness sneaks into algorithms: not by changing the search space, but by changing the cost of a step.

Suppose your loss is an average over many training examples:

L(w)=1ni=1ni(w).L(w) = \frac{1}{n}\sum_{i=1}^n \ell_i(w).

A full gradient step uses L(w)=1ni=1ni(w)\nabla L(w) = \frac{1}{n}\sum_{i=1}^n \nabla \ell_i(w). Exact, but it costs a pass over the whole dataset.

SGD replaces that expensive exact step with a stochastic one. At step tt, pick one example (or a small minibatch) and update using only its gradient:

wt+1=wtηtit(wt).w_{t+1} = w_t - \eta_t \nabla \ell_{i_t}(w_t).

When the dataset is huge, a noisy cheap step can be much more useful than an exact expensive one. This is the same family of idea as Monte Carlo estimation: full computation is expensive, a random subset gives a noisier answer, repeated updates make the whole process work.

SGD belongs in this series because it is not randomness for drama. It is randomness as a cheap proxy for an expensive exact computation.

The common thread

Three faces of randomness:

  • Search: random restarts and randomized proposals keep algorithms from getting stuck.
  • Randomized exact algorithms: Karger uses chance plus repetition to solve a hard graph problem.
  • Stochastic approximation: SGD swaps an expensive exact step for a noisy cheap one.

Those are not the same story, but they rhyme. In each case, randomness changes the shape of the computation so that the algorithm can do something a purely deterministic or fully exact version would find too rigid, too expensive, or too brittle.


A TSP metaheuristic is a concrete place where several of these roles meet: candidate moves are sampled, neighborhoods are shaped, and exploration pressure is managed explicitly. That is the next post.