The Greedy Algorithm for Submodular Maximization
In Part 1, I defined submodular functions, showed that they capture the pattern of diminishing returns across sensor placement, document summarization, influence maximization, and many other domains, and formalized the constrained maximization problem: given a monotone submodular function $f: 2^V \to \mathbb{R}_{\ge 0}$ and a budget $k$, find $\max_{S \subseteq V} f(S)$ subject to $|S| \le k$. I also showed that this problem is NP-hard: enumerating all candidate subsets is intractable for any realistic problem size.
The question becomes: how do we find a good solution efficiently, given that finding the exact optimum is out of reach? The answer lies in approximation algorithms.
Approximation Algorithms
An $\alpha$-approximation algorithm for a maximization problem is an algorithm that runs in polynomial time and, for every instance of the problem, returns a solution whose value is at least $\alpha \cdot \mathrm{OPT}$, where $\mathrm{OPT}$ is the value of the optimal solution. The factor $\alpha \in (0, 1]$ is called the approximation ratio (or performance guarantee).
For instance, a $1/2$-approximation algorithm for a maximization problem always produces a solution worth at least half of the true optimum. Always. On every instance. This is a worst-case guarantee, not an average-case one.
Three reasons to care about approximation algorithms:
- Polynomial-time solutions to NP-hard problems. We cannot solve these problems exactly in polynomial time (unless P = NP), but we can get provably close.
- A rigorous metric for comparing heuristics. Instead of “this heuristic works well on my benchmark,” you get “this algorithm is within 63.2% of optimal on every possible input.” The guarantee does not depend on the dataset.
- Implicit bounds on the optimum. When an $\alpha$-approximation algorithm returns a solution of value $v$, it also tells you that $\mathrm{OPT} \le v / \alpha$. This is useful when computing the exact optimum is too expensive; you get a bound for free.
For minimization problems, the convention flips: $\alpha \ge 1$, and the algorithm’s output is at most $\alpha \cdot \mathrm{OPT}$. A $2$-approximation for a minimization problem never exceeds twice the optimal cost.
The Greedy Algorithm
The greedy algorithm for monotone submodular maximization under a cardinality constraint is exactly the algorithm you would write as a first attempt. Start with an empty set. At each step, scan all remaining elements, pick the one with the largest marginal gain, and add it to the solution. Repeat $k$ times.
That is the entire algorithm. No relaxation step, no rounding, no linear program. Just a loop with a max operation.
Runtime. At each of the $k$ iterations, the algorithm evaluates the marginal gain $\Delta(e \mid S) = f(S \cup \{e\}) - f(S)$ for every element $e \in V \setminus S$. Each marginal gain computation requires one call to the value oracle for $f$. The total cost is $O(nk)$ oracle queries, where $n = |V|$.
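The loop above fits in a few lines of Python. This is a minimal sketch, assuming the objective is exposed as a value oracle `f` that maps a frozenset of elements to a number (the names `greedy_max`, `ground_set`, and `f` are mine, not from a library):

```python
def greedy_max(ground_set, f, k):
    """Greedy maximization of a monotone submodular function f
    under the cardinality constraint |S| <= k.

    f is a value oracle: it maps a frozenset of elements to a number.
    """
    S = frozenset()
    for _ in range(k):
        # Scan all remaining elements; pick the largest marginal gain.
        best = max(ground_set - S, key=lambda e: f(S | {e}) - f(S))
        S = S | {best}
    return S
```

Each iteration makes $O(n)$ oracle calls inside the `max`, for $O(nk)$ total, matching the runtime analysis above.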
A Worked Example: Sensor Coverage
Consider 6 candidate sensor locations on a grid, labeled $1$ through $6$, and a budget of $k = 3$. Each sensor covers a circular region, and the submodular function $f(S)$ measures the total area covered by the union of all sensors in $S$.
The first-step marginal gains in the demo instance (percent of total area newly covered):

| Sensor | Marginal gain |
|---|---|
| 3 (greedy pick) | 15.2% |
| 4 | 15.2% |
| 6 | 15.2% |
| 1 | 13.7% |
| 5 | 13.4% |
| 2 | 11.8% |
Step 1. The algorithm computes the marginal gain $\Delta(e \mid \emptyset) = f(\{e\})$ for each candidate $e$, since the current set is empty. Suppose sensor 3 covers the largest area individually. We set $S_1 = \{3\}$.
Step 2. The algorithm recomputes the marginal gain of each remaining element with respect to the current solution $S_1 = \{3\}$. Sensor 5, which covers a region that barely overlaps with sensor 3, has the highest additional area. We set $S_2 = \{3, 5\}$.
Step 3. Same logic. Sensor 1 adds the most new area given that sensors 3 and 5 are already placed. We set $S_3 = \{3, 5, 1\}$.
The crucial observation: at each step, the marginal gain of the selected sensor decreases. The first sensor might cover 120 square units on its own; the second adds 95 square units of new area; the third adds only 60. This is submodularity at work: the same sensor would have contributed more if placed earlier, when less ground was already covered.
For a ground set this small, you can enumerate all $\binom{6}{3} = 20$ possible subsets and verify that the greedy solution is close to (or exactly equal to) the true optimum. On realistic instances with thousands of candidate locations, enumeration is impossible, but the greedy solution is still guaranteed to be within a precise factor of the optimum.
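That brute-force check is worth doing once. The sketch below uses hypothetical coverage sets (grid cells standing in for the circular regions; the numbers are mine, not the demo's), runs greedy with $k = 3$, and enumerates all $\binom{6}{3} = 20$ subsets:

```python
from itertools import combinations

# Hypothetical coverage sets for the 6 candidate locations: each sensor
# covers a set of grid cells (stand-ins for the circular regions above).
coverage = {
    1: {0, 1, 2, 3},
    2: {2, 3, 4},
    3: {5, 6, 7, 8, 9},
    4: {8, 9, 10, 11},
    5: {12, 13, 14, 15},
    6: {4, 5, 6},
}

def f(S):
    # Total number of distinct cells covered by the union of sensors in S.
    return len(set().union(*(coverage[e] for e in S))) if S else 0

# Greedy: pick the best marginal gain, 3 times.
S = set()
for _ in range(3):
    S.add(max(coverage.keys() - S, key=lambda e: f(S | {e}) - f(S)))

# Brute force over all C(6,3) = 20 subsets.
opt = max(f(set(T)) for T in combinations(coverage, 3))
print(f(S), opt)  # prints "13 13": greedy matches the optimum here
```

On this instance greedy is exactly optimal; the theorem below says it can never fall far short.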
The Approximation Guarantee
The following result, due to Nemhauser, Wolsey, and Fisher (1978)1, is arguably the single most celebrated theorem in submodular optimization.
Theorem. Let $f$ be a monotone submodular function with $f(\emptyset) = 0$, and let $k$ be a positive integer. The Greedy algorithm returns a set $S_k$ with $|S_k| = k$ satisfying:

$$f(S_k) \;\ge\; \left(1 - \left(1 - \tfrac{1}{k}\right)^k\right) f(S^*) \;\ge\; \left(1 - \tfrac{1}{e}\right) f(S^*),$$

where $S^*$ is the optimal solution and $e$ is Euler’s number.
Since $1 - 1/e \approx 0.632$, the greedy algorithm always returns a solution worth at least $63.2\%$ of the true optimum. The guarantee holds for every monotone submodular function and every cardinality constraint $k$. No randomization, no special structure, no tuning.
Two additional facts make this result striking:
- The bound is tight: there exist instances where the greedy algorithm achieves exactly a $1 - 1/e$ fraction of the optimum and no better.
- The bound is optimal: Feige (1998)2 proved that no polynomial-time algorithm can achieve an approximation ratio better than $1 - 1/e$ for this problem, unless P = NP.
Greedy is not just “a decent heuristic.” It is the best any efficient algorithm can do.
Interactive demo: pick a submodular objective and race two strategies. Greedy picks the element with the highest marginal gain at each step; Random picks blindly. Both select $k$ elements from a ground set of $n$. One example objective: hire a team that covers the most distinct skills, where each candidate knows 2–5 skills and overlapping skills don't count twice.
Why Does It Work? A Proof Sketch
The proof has three key steps. I will present them in a way that emphasizes the geometric intuition rather than the full formalism.
Step 1: Each Greedy Step Closes a Fraction of the Gap
Let $S_j$ be the greedy solution after $j$ steps, and let $S^*$ be the optimal solution with $f(S^*) = \mathrm{OPT}$. Define the gap at step $j$ as:

$$\delta_j \;=\; \mathrm{OPT} - f(S_j).$$

The gap measures how far the current greedy solution is from the optimum. At step $j + 1$, the greedy algorithm picks the element $e$ with the largest marginal gain $\Delta(e \mid S_j)$.
Here is the key argument. The optimal solution $S^*$ contains at most $k$ elements. By the monotonicity of $f$, we know that $f(S_j \cup S^*) \ge f(S^*) = \mathrm{OPT}$, so the total marginal gain of adding all elements of $S^*$ to $S_j$ is at least $\delta_j$. By submodularity, the marginal gains of individual elements of $S^*$ (with respect to $S_j$) sum to at least this gap3:

$$\sum_{e \in S^* \setminus S_j} \Delta(e \mid S_j) \;\ge\; f(S_j \cup S^*) - f(S_j) \;\ge\; \delta_j.$$

Since $|S^*| \le k$, at least one element in $S^*$ has marginal gain at least $\delta_j / k$. The greedy algorithm picks the best element overall, so:

$$f(S_{j+1}) - f(S_j) \;\ge\; \frac{\delta_j}{k}.$$
In words: each greedy step closes at least a $1/k$ fraction of the remaining gap.
Step 2: The Gap Shrinks Geometrically
The inequality above means that after step $j + 1$:

$$\delta_{j+1} \;\le\; \left(1 - \frac{1}{k}\right) \delta_j.$$

Applying this recursively from the initial gap $\delta_0 = \mathrm{OPT} - f(\emptyset) = \mathrm{OPT}$ gives:

$$\delta_k \;\le\; \left(1 - \frac{1}{k}\right)^k \mathrm{OPT}.$$

(Assume $f(\emptyset) = 0$, which is standard for these problems.)
Step 3: Bounding the Geometric Compound
The expression $(1 - 1/k)^k$ is a well-known sequence from calculus. It is monotonically increasing in $k$ and converges to $1/e$ as $k \to \infty$4.
The sequence $(1 - 1/k)^k$ rises steeply for small $k$ and levels off well before $k = 10$; the dashed line in the chart marks the limit $1/e \approx 0.368$.
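You can check the convergence in a few lines; the values climb monotonically toward $1/e$:

```python
import math

# The residual-gap factor (1 - 1/k)^k for increasing k, approaching 1/e.
for k in [1, 2, 5, 10, 100, 1000]:
    print(k, (1 - 1 / k) ** k)
print("limit 1/e =", 1 / math.e)  # ~0.3679
```

Already at $k = 10$ the factor is about $0.349$, so the guarantee $1 - (1 - 1/k)^k$ is only marginally better than $1 - 1/e$ for realistic budgets.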
Since $(1 - 1/k)^k \le 1/e$ for all $k \ge 1$, we get:

$$\delta_k \;\le\; \frac{\mathrm{OPT}}{e}.$$

Rearranging:

$$f(S_k) \;=\; \mathrm{OPT} - \delta_k \;\ge\; \left(1 - \frac{1}{e}\right)\mathrm{OPT}.$$
That is the entire proof structure. Each step closes at least a $1/k$ fraction of the gap; the gap compounds geometrically; and the geometric compound is bounded above by $1/e$. The result is clean because submodularity gives us the per-step bound, and elementary calculus handles the rest.
Each step closes at least a $1/k$ fraction of the remaining gap: the teal bars show $f(S_j)$ rising, while the faded region above is the gap $\delta_j$, compounding geometrically downward.
Lazy Greedy: Exploiting Diminishing Returns for Speed
The standard greedy algorithm recomputes the marginal gain for every remaining element at every step. This is wasteful: submodularity guarantees that marginal gains can only decrease over time. If an element had a low marginal gain at step $j$, it will have an even lower gain at any later step $j' > j$. Why recompute it?
Minoux (1978)5 formalized this observation into Lazy Greedy. The idea is to maintain a max-heap (priority queue) of upper bounds $\rho(e)$ on the marginal gains, preserving the invariant

$$\rho(e) \;\ge\; \Delta(e \mid S)$$

for the current solution $S$.
At each step, Lazy Greedy pops the element $e$ with the highest upper bound from the heap and recomputes its actual marginal gain $\Delta(e \mid S)$. If the recomputed gain is still at least as large as the next-highest upper bound remaining in the heap, then by submodularity $e$ must have the largest true marginal gain among all remaining elements. The algorithm selects $e$ and moves on. Otherwise, it pushes $e$ back into the heap with its updated bound and tries the next candidate.
The worst-case runtime is still $O(nk)$; you might end up recomputing every element at every step. In practice, however, Lazy Greedy often needs only a handful of recomputations per step, because elements whose gains were already low at the previous step are even lower now and stay deep in the heap. The speedup varies depending on the problem instance, but factors of 5x to 100x over standard greedy are common. The approximation guarantee remains $1 - 1/e$, unchanged.
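The pop-recompute-compare loop can be sketched with Python's `heapq` (a min-heap, so bounds are stored negated). This is a minimal sketch assuming the same value-oracle convention as before; `lazy_greedy` is my name for it:

```python
import heapq

def lazy_greedy(ground_set, f, k):
    """Lazy Greedy (Minoux 1978): sketch assuming f is a value oracle
    over frozensets. Submodularity guarantees the cached bounds in the
    heap never underestimate the true marginal gains."""
    S = frozenset()
    f_S = f(S)
    # Initialize each bound with the element's first-step marginal gain.
    heap = [(-(f(frozenset({e})) - f_S), e) for e in ground_set]
    heapq.heapify(heap)
    for _ in range(k):
        while True:
            neg_bound, e = heapq.heappop(heap)
            gain = f(S | {e}) - f_S  # recompute the true marginal gain
            # Fresh gain still beats the next-best bound => e is the max.
            if not heap or gain >= -heap[0][0]:
                S = S | {e}
                f_S = f(S)
                break
            # Bound was stale: push back with the updated (smaller) bound.
            heapq.heappush(heap, (-gain, e))
    return S
```

The inner `while` loop is where the laziness pays off: most pops succeed immediately, so only a handful of oracle calls happen per selected element.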
If you have worked with lazy evaluation in functional programming or deferred computation in systems engineering, the pattern is familiar: avoid doing work until you know it matters.
When Greedy Fails
The $1 - 1/e$ guarantee depends critically on monotonicity. When $f$ is non-monotone (i.e., adding elements can decrease $f$), the greedy algorithm can perform arbitrarily badly.
The thesis chapter I drew this material from contains a concrete counterexample; here is a function with the same structure. Consider a ground set $V = \{u, v_1, \dots, v_n\}$ and define the non-monotone submodular function:

$$f(S) \;=\; \begin{cases} 1 + \epsilon & \text{if } u \in S, \\ |S| & \text{otherwise,} \end{cases}$$

for some small $\epsilon > 0$. Let $k \le n$. At the first step, the greedy algorithm adds element $u$, because its marginal gain is $1 + \epsilon$, which is the largest among all singletons (every other element has a gain of $1$). Once $u$ is in the solution, it stays there; the algorithm only adds elements, never removes them. The consequence: no matter what the algorithm picks in the remaining $k - 1$ steps, the function value is stuck at $1 + \epsilon$, because the presence of $u$ forces $f(S) = 1 + \epsilon$ for any $S$ containing $u$.
The optimal solution is to take any $k$ elements other than $u$, yielding $f(S^*) = k$. The greedy solution achieves $1 + \epsilon$. The approximation ratio is $(1 + \epsilon)/k$, which goes to $0$ as $k$ grows. The algorithm gets trapped by a locally attractive choice that is globally catastrophic.
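The failure is easy to reproduce. The sketch below uses a stand-in objective with the shape just described (the exact function in the thesis may differ): $n$ "good" elements each worth $1$, plus one poisoned element $u$ that caps the value at $1 + \epsilon$:

```python
n, k, eps = 20, 5, 0.01
U = "u"                   # the poisoned element (hypothetical name)
ground = {U, *range(n)}

def f(S):
    # Non-monotone submodular: u caps the value at 1 + eps;
    # otherwise each element contributes 1.
    return 1 + eps if U in S else len(S)

# Standard greedy: u's singleton gain (1 + eps) beats everything else (1),
# so greedy grabs u first and every later marginal gain is 0.
S = set()
for _ in range(k):
    S.add(max(ground - S, key=lambda e: f(S | {e}) - f(S)))

print(f(S))   # 1.01, versus the optimum f = k = 5
```

Greedy ends at $1 + \epsilon$ while any $k$ good elements achieve $k$, so the ratio degrades as $1/k$.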
The lesson: for non-monotone submodular functions, standard greedy is the wrong tool. Randomized variants like the Random Greedy algorithm (Buchbinder et al. 2014)6 recover a constant approximation ratio of $1/e$ by introducing randomness into the selection step, at the cost of a weaker guarantee.
The type of constraint also matters. With a cardinality constraint, the greedy algorithm achieves $1 - 1/e$. With a single matroid constraint (a generalization of cardinality constraints), plain greedy guarantees only $1/2$; recovering $1 - 1/e$ requires the more sophisticated continuous greedy approach, and the analysis is considerably more involved7. With knapsack constraints, where each element has a cost and the total budget is limited, different algorithms are needed, and the landscape of results becomes considerably more complex.
Looking Ahead
The greedy algorithm requires $O(nk)$ function evaluations. Lazy Greedy speeds things up in practice but offers the same worst-case bound. For moderate problem sizes, this is fine. For massive datasets ($n$ in the millions, $k$ in the thousands), even $O(nk)$ becomes prohibitive.
In Part 3, I will introduce the Stochastic Greedy algorithm (Mirzasoleiman et al. 2015)8, which replaces the full scan over $V \setminus S$ at each step with a random subsample of size $\frac{n}{k}\log\frac{1}{\epsilon}$. The total cost drops from $O(nk)$ to $O(n \log \frac{1}{\epsilon})$; the factor of $k$ becomes a logarithmic term. The approximation guarantee degrades only slightly, to $1 - 1/e - \epsilon$ in expectation, where $\epsilon$ is a tunable parameter.
Key Takeaways
- The greedy algorithm for monotone submodular maximization under a cardinality constraint is simple: at each step, pick the element with the largest marginal gain. Repeat $k$ times.
- It achieves an approximation ratio of $1 - 1/e \approx 0.632$, provably the best any efficient algorithm can achieve for this problem.
- The proof relies on a geometric argument: each greedy step closes at least a $1/k$ fraction of the remaining gap to the optimum, and after $k$ steps the residual gap is at most $\mathrm{OPT}/e$.
- Lazy Greedy exploits the diminishing returns property to skip redundant computations, often yielding 5x-100x speedups in practice with the same guarantee.
- The greedy algorithm requires $O(nk)$ function evaluations. For massive datasets, even this linear dependence on $n$ at every step is too expensive, motivating the Stochastic Greedy algorithm in Part 3.
- Monotonicity is critical: without it, greedy can achieve an approximation ratio as bad as $O(1/k)$, which is effectively useless for large $k$.
Further Reading
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An Analysis of Approximations for Maximizing Submodular Set Functions” (1978). The original paper proving the $1 - 1/e$ bound.
- M. Minoux, “Accelerated Greedy Algorithms for Maximizing Submodular Set Functions” (1978). The Lazy Greedy optimization.
- U. Feige, “A Threshold of $\ln n$ for Approximating Set Cover” (1998). The hardness result establishing that $1 - 1/e$ is optimal.
- A. Krause and D. Golovin, “Submodular Function Maximization” (2014). A comprehensive survey covering greedy, lazy greedy, and continuous relaxation methods.
- D. P. Williamson and D. B. Shmoys, “The Design of Approximation Algorithms” (2011). A broader textbook on approximation algorithms.