Walking Is the Most Expensive Warehouse Operation
Walking accounts for roughly 55% of total order-picking time in warehouses [1]. Not scanning barcodes, not counting items, not packing boxes. Walking. The arithmetic is blunt: if an operator spends six hours a day fulfilling orders, more than three of those hours go to moving between shelves.
At Vivaldi srl, an Italian manufacturing and distribution company, I watched this happen daily. Operators walked redundant routes because their pick sheets listed items in whatever order the ERP produced. An order might send someone to aisle C, then across the warehouse to aisle I, then back west to aisle B for something they walked past five minutes earlier. TSP-optimized picking routes reduce walking distance by 47–83% compared to naive sequences, depending on warehouse layout and order size [2]. The waste was quantifiable, and it was large.
I built a system to fix it. From September 2019 to March 2022, working as the sole engineer, I designed and deployed a full-stack logistics platform that computes optimized walking routes through the physical warehouse and tells operators exactly where to go, in what order. The project spanned 319 commits across all branches, 14 TypeScript packages, and 2 Rust crates. The optimization engine went through three generations: pure TypeScript, then C++ with simulated annealing, then Rust compiled to WebAssembly. Each generation hit a constraint I had to design around.
A System Built for One Warehouse
Off-the-shelf warehouse management systems from Manhattan Associates, Blue Yonder, or SAP would have cost six to seven figures annually and required months of integration. Vivaldi evaluated that route early and rejected it. The warehouse was too small to justify the licensing cost, and no vendor offered a lightweight integration path with the existing Microsoft SQL Server ERP.
What Vivaldi needed was narrower: a domain-specific optimization layer wired directly into the live ERP database, built and maintained by one engineer. The system ran on a modest in-house server with no access to external services. Deploys happened over SSH. Because I couldn’t use Sentry or Datadog, I built a small error-tracking pipeline that used Vivaldi’s own SMTP server to email me when something broke. Every piece of the system was sized to the company’s actual needs, not to what a vendor catalog would suggest.
The Constraints
Every architectural decision in this series traces back to one or more of the following constraints. I’m listing all of them here so Parts 2–4 can reference them without re-explaining.
**4 GB RAM, shared server.** The production server was not dedicated to this project. It hosted other Vivaldi services, and the warehouse system got whatever memory was left. This single constraint drove more design decisions than any other: the evolution from memory-heavy TypeScript to a compact ~144 KB WASM binary, streaming uploads via gRPC instead of buffering, Pino.js for structured logging with minimal garbage-collection pressure, and process isolation via Unix pipes so each Node.js process managed its own heap independently.

**No Docker in production.** The IT team explicitly vetoed containerization. I never got a detailed reason; my best guess is that the server already ran a mix of legacy services and they didn’t want another runtime dependency to manage. The system runs directly on the host OS via systemd, which turned out to be a surprisingly clean deployment model for three coordinated services.

**No CI/CD pipeline.** Deploys happened over SSH: scp the build artifacts to the server, run an install script, restart the systemd services. Manual, but predictable. I knew it was a liability, but setting up a proper pipeline would have meant convincing IT to open SSH access to a CI runner, and that wasn’t a battle worth fighting for the deployment frequency I had.

**No external services.** The server sat on a private LAN with no outbound access to third-party SaaS. No Sentry, no Datadog, no hosted anything. Every capability the system needed (error tracking, logging, monitoring) had to be built in-house. I ended up building a small gRPC-based log pipeline that batched errors and forwarded them to Vivaldi’s SMTP server, which emailed me when something broke. That was the entire observability stack. Part 3 covers this pipeline in detail.

**Single engineer.** No code review, no second opinion on schema decisions, no one to hand off to. I could move fast and make aggressive architectural bets, like rewriting the solver twice, but every abstraction had to be simple enough to debug six months later with no context. This shaped the code as much as any technical constraint.

**Node.js v16 (forced downgrade).** The server’s Debian version couldn’t run Node.js 18. I discovered this during the first deployment attempt, which also meant giving up some language features I’d been using in development. A small thing, but representative of the gap between my development environment and the production reality.

**Existing ERP database.** The system integrated directly with Vivaldi’s live Microsoft SQL Server ERP. I didn’t have the luxury of designing my own schema; I read from and wrote to tables that the ERP’s own workflows also used. Stored procedures served as a contract layer, isolating my queries from potential schema changes on the ERP side.

**Private LAN only.** The system was accessible only within the warehouse’s local network, not from the internet. This removed the need for authentication entirely: one fewer layer of complexity. The tradeoff was that remote debugging required VPN access to the company network.
On the constraints: this list is the most important part of the post. The 4 GB server forced the WASM migration (Part 2). The lack of Docker drove the systemd deployment model (Part 3). The absence of external services produced the gRPC error-forwarding pipeline (Part 3). The live ERP database shaped the stored-procedure contract layer and the atomic import swap (Part 4). Every architectural decision in the next three posts traces back to this list.
The Warehouse Floor
The physical warehouse is modeled as a 17-row by 31-column grid. Each cell is either walkable floor, an impassable shelf rack, or a named item position. Nine main aisles labeled A through I run through the space, plus a separate section J. Aisles B through I form double-wide shelf racks separated by corridors wide enough for an operator with a cart; Aisle A runs along the left wall; Section J occupies a separate area in the bottom-right corner.
The warehouse team reorganized the layout multiple times per year: moving shelves, extending aisles, adding or removing entire sections. Each reorganization meant the system’s code and database had to match the new physical reality. This forced a tight coupling between the grid definition (a text file called grid.txt) and the database position records, a coupling that the compile-time code-generation pipeline described in Part 2 was specifically designed to manage. If the grid file and the database disagreed, routes would be wrong. The build system was designed to make that disagreement impossible.
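I won’t reproduce the real grid.txt format here, but assuming a simple character-per-cell encoding (`.` for walkable floor, `#` for shelf racks, aisle letters for named positions), the parse step might look like this sketch:

```typescript
type Cell =
  | { kind: "floor" }
  | { kind: "rack" }
  | { kind: "position"; aisle: string };

// Hypothetical grid.txt encoding: "." is walkable floor, "#" is a shelf
// rack, and a letter marks a named item position in that aisle. The real
// file's format differs; this only illustrates the text-to-cells step.
function parseGrid(text: string): Cell[][] {
  return text
    .trim()
    .split("\n")
    .map((row) =>
      [...row].map((ch): Cell => {
        if (ch === ".") return { kind: "floor" };
        if (ch === "#") return { kind: "rack" };
        return { kind: "position", aisle: ch };
      })
    );
}

const grid = parseGrid(".#A\n...\nB#.");
// grid[0][2] is a named position in aisle A; grid[2][1] is a rack.
```

Whatever the concrete encoding, the point is that the grid is data, not code, so a layout reorganization is a file edit plus a rebuild rather than a rewrite.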
Each warehouse position follows the format {Aisle}{Rack}.{Shelf}. C3.7 means aisle C, rack 3, shelf 7: a specific vertical slot on a specific rack in a specific aisle. The aisle letter and rack number identify a physical location on the 2D grid; the shelf number identifies the vertical tier, up to 12 per rack. The optimization engine uses the first two components for distance computation (how far apart are two racks on the grid?) and the third for ergonomic prioritization (ground-level shelves are faster and less fatiguing to pick from). Part 2 explains how the optimizer uses shelf height to sequence picks.
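The format is regular enough that a few lines of parsing cover it. A minimal TypeScript sketch (the `parsePosition` helper and its return shape are mine, not the system’s):

```typescript
// A warehouse position like "C3.7": aisle letter, rack number, shelf tier.
interface ParsedPosition {
  aisle: string; // A through I, plus section J
  rack: number;  // horizontal slot along the aisle
  shelf: number; // vertical tier, up to 12 per rack
}

function parsePosition(code: string): ParsedPosition {
  const match = /^([A-J])(\d+)\.(\d+)$/.exec(code);
  if (!match) throw new Error(`Invalid position code: ${code}`);
  const [, aisle, rack, shelf] = match;
  return { aisle, rack: Number(rack), shelf: Number(shelf) };
}

// Distance computation uses only aisle + rack; shelf feeds ergonomic scoring.
parsePosition("C3.7"); // { aisle: "C", rack: 3, shelf: 7 }
```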
The Data Model
The system’s data model connects six entities. Items are the core: an item has barcodes (the physical stickers used for identification and scanning), appears in order lines (when a customer requests it), and occupies inventory slots (where it physically sits in the warehouse). Orders contain order lines. Positions hold inventory slots.
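Sketched as TypeScript shapes, with illustrative field names rather than the real ERP column names, the entities and their relationships look roughly like this:

```typescript
// Illustrative shapes for the six entities; the real schema lives in
// the ERP's SQL Server tables and differs in naming and detail.
interface Item { id: string; description: string }
interface Barcode { code: string; itemId: string }          // sticker -> item
interface Order { orcCode: string; lines: OrderLine[] }
interface OrderLine { itemId: string; quantityRequested: number }
interface Position { code: string }                          // e.g. "C3.7"
interface InventorySlot {
  positionCode: string; // where the stock physically sits
  itemId: string;
  quantity: number;     // units on hand at this slot
}

// One item, many slots: the raw material of the routing decision.
function slotsForItem(slots: InventorySlot[], itemId: string): InventorySlot[] {
  return slots.filter((s) => s.itemId === itemId);
}
```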
The critical relationship for route optimization is between Items and InventorySlots, joined through Positions. A single item can appear in multiple slots across different aisles: 25 units of ESP32S3 boards at position C3.7 and another 10 units at A5.2, for instance. When an order requests that item, the system retrieves every slot where it’s available, along with current quantities at each slot. The optimizer then decides which slots to visit and in what order, factoring in walking distance to each slot, the slot’s inventory level (prefer draining nearly-empty slots to reduce fragmentation), and the shelf’s vertical position (prefer ground-level shelves for ergonomics).
This multi-slot, multi-factor decision is what separates the problem from a simple shortest-path computation. The optimizer isn’t just finding the nearest position; it’s choosing which positions to visit at all, then sequencing them. For an order with six items, each available at two or three positions, the combinatorial space is already nontrivial.
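One way to picture the per-slot tradeoff is a cost function over the three factors. The coefficients below are hypothetical, purely to illustrate the shape of the decision; the real weighting belongs to the Part 2 solver:

```typescript
interface SlotCandidate {
  distance: number; // grid walking distance from the route's current point
  quantity: number; // units currently in the slot
  shelf: number;    // vertical tier, 1 = ground level
}

// Hypothetical scoring, lower is better. The coefficients are
// illustrative only, not the production solver's weights.
function slotCost(s: SlotCandidate, needed: number): number {
  const distanceCost = s.distance;
  // Draining a slot completely reduces fragmentation: reward slots
  // whose stock this order would empty.
  const drainBonus = s.quantity <= needed ? -5 : 0;
  // Ground-level shelves are faster and less fatiguing to pick from.
  const heightPenalty = (s.shelf - 1) * 0.5;
  return distanceCost + drainBonus + heightPenalty;
}

function bestSlot(cands: SlotCandidate[], needed: number): SlotCandidate {
  return cands.reduce((a, b) =>
    slotCost(b, needed) < slotCost(a, needed) ? b : a
  );
}
```

Even this toy version shows why the choice flips with context: a small order favors the nearby slot, while a large order favors the slot it can drain.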
On the graph structure: think of positions as vertices and walking distances as weighted edges. The warehouse is a weighted graph with obstacles, shelf racks that block direct paths. The optimization engine finds a low-cost tour through a subset of vertices, a constrained variant of the Traveling Salesman Problem. The real-world constraints (inventory limits per slot, physical obstacles, ergonomic shelf priority) make it harder than the textbook version, but the core structure is the same.
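Because racks block straight-line movement, the edge weight between two positions is a shortest path on the grid, not a Euclidean distance. A plain BFS makes the idea concrete (the production engine uses Jump Point Search, covered in Part 2; this sketch only shows what the edge weights mean):

```typescript
// Walking distance between two cells on a grid with obstacles, via
// breadth-first search. Every step costs 1; racks are unwalkable.
function gridDistance(
  walkable: boolean[][], // walkable[row][col]
  from: [number, number],
  to: [number, number]
): number {
  const rows = walkable.length;
  const cols = walkable[0].length;
  const dist = walkable.map((row) => row.map(() => -1));
  dist[from[0]][from[1]] = 0;
  const queue: [number, number][] = [from];
  while (queue.length > 0) {
    const [r, c] = queue.shift()!;
    if (r === to[0] && c === to[1]) return dist[r][c];
    for (const [dr, dc] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nr = r + dr, nc = c + dc;
      if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
          walkable[nr][nc] && dist[nr][nc] === -1) {
        dist[nr][nc] = dist[r][c] + 1;
        queue.push([nr, nc]);
      }
    }
  }
  return -1; // unreachable
}
```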
What an Order Looks Like
An order is a list of item identifiers with requested quantities. It arrives as a paper order sheet. The operator scans the sheet’s barcode with a handheld scanner; the system reads the barcode data, derives the ORC code (the ERP’s internal order reference, short for “Ordine Cliente”), and queries the database for every item in that order along with its requested quantity.
A concrete order might look like this:
| Item | Quantity Requested |
|---|---|
| ESP32S3 | 4 |
| C100nF capacitor | 20 |
| M8x16 bolt | 12 |
| XT60 connector | 6 |
The operator scans once. The system resolves all four items, retrieves their warehouse positions and quantities, and passes the full mapping to the optimization engine.
For each item, the system performs a slot lookup against the InventorySlots table. ESP32S3, for instance, is available at two positions: C3.7 with 25 units on hand and A5.2 with 10 units. The optimizer needs only 4 units, so it could pick all 4 from either slot. The choice depends on which slot is closer to the other items in this order, whether one slot is at ground level (faster pick), and whether draining the smaller slot (A5.2, only 10 units) would reduce future inventory fragmentation. Every item in the order goes through this same multi-slot resolution, and every decision about one item affects the optimal path to every other.
Not every item can be fully fulfilled. If the order requests 120 M8x16 bolts but only 85 are in stock across all warehouse positions, the system flags the shortfall and caps the pick at 85 units. The remaining three items are unaffected; the operator fulfills them normally. Partial fulfillment is always better than rejecting the entire order over one line item. The system reports both the fulfillable items (with their optimized pick sequence) and the unfulfillable ones (with the gap between requested and available quantities), so the operator knows exactly what they’re short on.
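The capping logic itself is small. A sketch, with field names of my choosing:

```typescript
interface PickLine {
  itemId: string;
  requested: number;
  available: number; // total on hand across all warehouse positions
}

interface ResolvedLine extends PickLine {
  toPick: number;    // capped at availability
  shortfall: number; // reported to the operator when > 0
}

// Cap each line at available stock instead of rejecting the whole order.
function resolveLines(lines: PickLine[]): ResolvedLine[] {
  return lines.map((line) => ({
    ...line,
    toPick: Math.min(line.requested, line.available),
    shortfall: Math.max(0, line.requested - line.available),
  }));
}
```

For the bolt example, a request for 120 against 85 in stock resolves to a pick of 85 with a shortfall of 35, while the other lines pass through untouched.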
The Route Problem
Without optimization, the ERP outputs items in arbitrary order. The operator walks whatever sequence appears on the pick sheet. In a real batch order, this means jumping between distant positions: aisle C, then all the way east to aisle I, then back west past aisles H through D for something near aisle B. Each backtrack crosses corridors the operator already walked. The warehouse isn’t large by logistics-industry standards (17 rows, 31 columns, roughly the footprint of a large gymnasium), but redundant crossings add up across dozens of orders per day.
The optimized route for a real batch order requesting 8 items across aisles A through I: A2 → B3 → A5 → C7 → E5 → I5 → H1 → H2. Each transition moves to the nearest unvisited position. The operator starts near the entrance in the western aisles, sweeps through B and C, moves east through E, reaches the far east at I, and finishes the final two picks at H1 and H2, both adjacent, both in the same corridor. No backtracking. The path is a single sweep with minor local detours, not a random walk.
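The "nearest unvisited position" rule is a nearest-neighbor heuristic over precomputed distances. A bare-bones sketch (the real solver layers multiple greedy phases on top of this, as Part 2 describes):

```typescript
// Greedy nearest-neighbor tour: from the current position, always walk
// to the closest unvisited stop. Distances come from a grid pathfinder.
function nearestNeighborTour(
  stops: string[],
  start: string,
  distance: (a: string, b: string) => number
): string[] {
  const remaining = new Set(stops);
  const tour: string[] = [];
  let current = start;
  while (remaining.size > 0) {
    let best: string | null = null;
    for (const stop of remaining) {
      if (best === null || distance(current, stop) < distance(current, best)) {
        best = stop;
      }
    }
    remaining.delete(best!);
    tour.push(best!);
    current = best!;
  }
  return tour;
}
```

Nearest-neighbor alone can produce poor tours on adversarial layouts, which is part of why the production solver needed more than one phase.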
The difference compounds across every order, every shift. TSP-optimized routes reduce walking distance by 47–83% depending on warehouse layout and order size [3]. For a warehouse processing 20+ batch orders per day across 250 working days, even the conservative end of that range, 47%, recovers thousands of hours annually. These are hours operators already work but spend moving between shelves instead of picking, counting, or packing. The optimization engine computes routes in sub-second time, so there’s no waiting at the scanning station. The operator scans the barcode, the route appears, they walk.
On the business impact: walking time is operational drag that doesn’t appear on any line item. A route reduction of 47% doesn’t just save steps; it compounds into faster order turnaround, fewer late shipments, and operators who finish their shifts with less physical fatigue. The system I built computes these routes on a server that costs less per year than a single day of a consultant’s time.
Beyond Route Optimization
Route optimization is the core of the system, but operators need more than optimized pick sequences. They search for items by stock ID to find every slot where an item is currently stored, with quantities at each position. They update inventory counts at specific slots after physical counts, a shelf-by-shelf reconciliation that keeps the database accurate. They assign items to new slots when stock arrives or the warehouse team rearranges shelf contents.
The largest supporting workflow uses Excel files. An operator exports a full warehouse snapshot (one row per slot, a canonical 4-column spreadsheet with position, item ID, quantity, and description), edits it offline, and re-imports the changes. The import can create new slot references (an item now stored at a position where it wasn’t before), update existing quantities, or destroy slot references entirely (remove an item from a position by setting its quantity to zero). This round-trip is the highest-risk operation in the system: a bad import can corrupt every position in the database. Part 4 covers the entire import pipeline: gRPC streaming, row-by-row validation with typed error codes, staging tables, and the atomic database swap that makes it safe.
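To give a flavor of the validation step, here is a hypothetical sketch of row-level checks with typed error codes; the real codes, rules, and the surrounding streaming pipeline are Part 4’s subject:

```typescript
// Illustrative typed error codes for the Excel import; the real set
// is larger and is covered in Part 4.
type RowError =
  | { code: "INVALID_POSITION"; row: number }
  | { code: "UNKNOWN_ITEM"; row: number }
  | { code: "NEGATIVE_QUANTITY"; row: number };

interface ImportRow {
  position: string; // e.g. "C3.7"
  itemId: string;
  quantity: number; // zero destroys the slot reference
}

function validateRow(
  row: ImportRow,
  rowIndex: number,
  knownItems: Set<string>
): RowError[] {
  const errors: RowError[] = [];
  if (!/^[A-J]\d+\.\d+$/.test(row.position))
    errors.push({ code: "INVALID_POSITION", row: rowIndex });
  if (!knownItems.has(row.itemId))
    errors.push({ code: "UNKNOWN_ITEM", row: rowIndex });
  if (row.quantity < 0)
    errors.push({ code: "NEGATIVE_QUANTITY", row: rowIndex });
  return errors;
}
```

Typed codes rather than free-form messages matter here: the frontend can map each code to a precise, translated explanation of which spreadsheet row is wrong and why.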
Part 2 picks up at the moment an operator scans a barcode and the system needs to produce an optimized route. I’ll tell the full solver evolution story: three generations of implementations across three languages and two algorithmic paradigms, Jump Point Search pathfinding that replaced A*, compile-time code generation with perfect hashing, and the multi-phase greedy algorithm that made simulated annealing unnecessary. The constraints listed above will follow us through every post in this series.