What We Built

The simulation architecture was deliberately constrained to mirror enterprise reality. The agent received a frozen simulator (no modifications allowed), a frozen cost formula, and a single mutable configuration file containing three parameters per product: reorder point, reorder quantity, and safety stock. The demand data was deterministic with a fixed seed, so identical parameters always produced identical costs. The agent's only lever was parameter selection. This is the operational reality of most ERP-driven inventory systems: the physics of the supply chain are fixed, and the only degrees of freedom are policy parameters buried in configuration tables.
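The shape of that single mutable file can be sketched as a plain mapping of product IDs to their three policy parameters. The product names and values below are illustrative (only prod_fast_2 appears in the article itself):

```python
# Hypothetical shape of the one mutable configuration file:
# three policy parameters per product; simulator and cost formula are frozen.
config = {
    "prod_staple_1": {"reorder_point": 40, "reorder_qty": 120, "safety_stock": 20},
    "prod_fast_2":   {"reorder_point": 100, "reorder_qty": 300, "safety_stock": 50},
}
```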

We ran four versions, each adding a layer of complexity that is present in real supply chains but rarely addressed in a single optimisation loop.

V1 modelled five products with two cost components: a $10 per-unit stockout penalty and a $0.50 per-unit-per-day holding cost. This is the simplest possible inventory problem. It exists to establish a baseline for how quickly an autonomous agent can identify and correct gross misconfigurations.
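V1's two-component cost can be written down directly from those figures; this is a minimal sketch of the per-product daily cost, not the frozen formula's actual code:

```python
STOCKOUT_PENALTY = 10.0  # $ per unit of unmet demand
HOLDING_COST = 0.50      # $ per unit held, per day

def daily_cost(unmet_demand, units_on_hand):
    """V1's two-component daily cost for a single product (sketch)."""
    return STOCKOUT_PENALTY * unmet_demand + HOLDING_COST * units_on_hand
```

The tension is visible immediately: holding 100 units for a day costs the same as missing five units of demand, so the optimiser trades inventory depth against stockout exposure.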

V2 scaled to twelve products across eight demand categories (staple, seasonal, trending, slow-mover, fast-mover, new-launch, end-of-life, and correlated demand). It introduced variable lead times per product (1 to 5 days) and a third cost component: quantity discount tiers on ordering ($5/unit for small orders, $3.50/unit for medium, $2/unit for bulk). This version models the procurement economics that dominate real purchasing decisions.
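The tier structure can be sketched as a step function on order size. The article gives only the per-unit prices ($5 / $3.50 / $2), so the quantity thresholds below are assumptions for illustration:

```python
def ordering_cost(qty, tiers=((0, 5.00), (50, 3.50), (200, 2.00))):
    """Quantity-discount ordering cost (sketch): the whole order is priced
    at the deepest tier its size qualifies for. Tier thresholds (0/50/200)
    are assumed; the article specifies only the per-unit prices."""
    price = tiers[0][1]
    for threshold, unit_price in tiers:
        if qty >= threshold:
            price = unit_price
    return qty * price
```

Note the discontinuity this creates: an order of 199 units costs $696.50 while an order of 200 costs $400.00, which is exactly the kind of non-smooth structure that later produces cliff effects.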

V3 retained the V2 catalogue but introduced perishable goods mechanics: FIFO batch tracking, variable shelf life per product (ranging from 3 days to non-perishable), expiry, waste, and a fourth cost component at $5 per wasted unit. Seven of the twelve products were perishable. The simulator tracked individual inventory batches by arrival date and discarded expired units before fulfilling demand each day.
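The FIFO batch mechanics described above can be sketched in a few lines: expire the oldest batches first, then serve demand oldest-batch-first. Function and variable names are ours, not the simulator's:

```python
from collections import deque

def expire_and_fulfill(batches, today, shelf_life, demand):
    """One day of V3-style FIFO batch handling (sketch).
    batches: deque of [arrival_day, qty], oldest first.
    Returns (wasted_units, unmet_demand)."""
    waste = 0
    # Discard expired units before fulfilling demand, as the article describes.
    while batches and today - batches[0][0] >= shelf_life:
        waste += batches[0][1]
        batches.popleft()
    # Serve demand from the oldest surviving batch first.
    unmet = demand
    while unmet and batches:
        take = min(unmet, batches[0][1])
        batches[0][1] -= take
        unmet -= take
        if batches[0][1] == 0:
            batches.popleft()
    return waste, unmet
```

This is where the two-sided penalty comes from: ordering too much feeds the waste term, ordering too little feeds the stockout term, and shelf life couples the two.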

V4 extended V3 with two structural changes. First, time-phased strategies: each product could have different reorder parameters for different windows within the 90-day simulation. Second, per-product Bayesian optimisation: instead of fitting one 36-dimensional Gaussian Process across all products jointly, V4 fitted separate 3-dimensional GPs per product. The search space expanded from 36 dimensions to roughly 60.

Each version ran against 90 days of synthetic demand for all products, with 100 units of initial inventory per product. Each simulation completed in under 5 seconds. No GPU. No training. Just a fast Python loop with FIFO batch tracking.
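The daily loop itself is simple enough to sketch for one V1-style product; the structure below (receive, serve, cost, reorder) reflects the mechanics the article describes, but the code and names are ours:

```python
def simulate(days, demand, params, lead_time):
    """Skeleton of the daily simulation loop for one product (sketch).
    Deterministic demand; orders arrive after `lead_time` days."""
    on_hand, pipeline, cost = 100, {}, 0.0   # 100 units initial inventory
    for day in range(days):
        on_hand += pipeline.pop(day, 0)              # receive arriving orders
        served = min(on_hand, demand[day])
        unmet = demand[day] - served
        on_hand -= served
        cost += 10.0 * unmet + 0.50 * on_hand        # V1 cost components
        # Threshold reorder policy on inventory position (on hand + on order).
        if on_hand + sum(pipeline.values()) <= params["reorder_point"]:
            arrival = day + lead_time
            pipeline[arrival] = pipeline.get(arrival, 0) + params["reorder_qty"]
    return cost
```

A 90-day, 12-product run of a loop like this is a few thousand arithmetic operations, which is why each simulation finishes in seconds with no GPU.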

The Numbers

| Version | Products | Cost Components | Experiments | Baseline Cost | Final Cost | Reduction |
|---------|----------|------------------------------|-------------|---------------|------------|-----------|
| V1 | 5 | 2 (stockout + holding) | 65 | $61,455 | $15,988 | 74.0% |
| V2 | 12 | 3 (+ ordering discounts) | 300 | $236,277 | $143,486 | 39.3% |
| V3 | 12 | 4 (+ waste/perishability) | 70 | $241,040 | $146,695 | 39.1% |
| V4 | 12 | 4 (+ time-phased strategies) | ~15,000 | $149,202 | $143,300 | 4.0% (40.5% vs V3 naive baseline) |
[Chart: cost reduction by version — % of baseline eliminated. V1: 74.0% (5 products, 65 experiments, $61,455 → $15,988). V2: 39.3% (12 products, 300 experiments, $236,277 → $143,486). V3: 39.1% (12 products, 70 experiments, $241,040 → $146,695, + perishability). V4: 40.5% vs V3 naive baseline (12 products, ~15,000 experiments, + time-phased strategies).]

V1's 74% reduction came almost entirely from correcting a gross misconfiguration. The naive baseline used identical parameters for all five products. The highest-demand product was consuming 50 units per day and running out constantly. The agent fixed this in six experiments. The remaining 59 experiments squeezed out incremental gains by fine-tuning holding cost against stockout risk.

V2's 39% reduction required 300 experiments and revealed a fundamentally different cost surface. The early wins were similar to V1: the agent identified that fast-mover products were catastrophically under-stocked and fixed them immediately, cutting $56,000 from the baseline in the first two experiments. But the middle and late phases exposed a phenomenon with significant operational implications: cliff effects. The agent discovered that a single one-unit parameter change on prod_fast_2 caused missed sales to jump from 400 to 3,300 units — a $22,000 cost increase.

V3's 39% reduction was achieved in only 70 experiments. The introduction of perishability created a two-sided penalty structure. The productive work happened in experiments 42 and 43, where the agent switched to systematic grid search and cut $20,000 from the cost. The Bayesian optimiser provided by the frozen module produced zero accepted suggestions — the joint 36-dimensional Gaussian Process could not build a useful model with fewer than 100 data points.

V4's 4% reduction from its own baseline (40.5% from V3's naive baseline) required approximately 15,000 experiments. The agent cycled through six distinct optimisation techniques and worked in a regime where each additional dollar of cost reduction required roughly 2,500 experiments to find.

Five Structural Findings

Finding 01

The 80/20 Frontier Holds, Then Hardens

In every version, the agent captured the majority of available cost reduction within the first 10 to 15 percent of total experiments. V1 achieved 68% of its total improvement in the first 6 of 65 experiments. V2 achieved 60% of its improvement in the first 11 of 300 experiments. V4 achieved 50% of its improvement in the first 53 of 15,000 experiments.

The consistency of this pattern across increasing problem complexity is the finding. The low-hanging fruit in parameter optimisation is structurally similar regardless of the number of products, cost components, or constraint types. An autonomous agent identifies and corrects gross misconfigurations quickly because they produce large, unambiguous cost signals. The remaining improvement requires disproportionate computational effort. The ROI curve for autonomous optimisation is steep at the front and flat at the back.

[Chart: experiment efficiency — cumulative % of total reduction captured vs % of experiments completed; roughly 10% of experiments captures ~68% of the reduction. Series shown: V1 (65 experiments) and V4 (15,000 experiments).]
Finding 02

Cliff Effects Dominate the Late-Stage Cost Surface

The most operationally significant finding across V2, V3, and V4 was the prevalence of cliff effects — sharp, non-linear discontinuities where a small parameter change causes a large cost jump. They are not artefacts of the simulation. They are structural features of any inventory system with discrete ordering, integer demand, and threshold-based reorder policies.

In V2, the agent mapped cliffs on 8 of 12 products. The most severe: a one-unit reduction in reorder point on prod_fast_2 (from 100 to 99) increased cost by $22,000 — a 15% jump in total system cost from a single parameter change. An inventory system operating near a cliff is fragile. A small change in demand patterns, a one-day increase in supplier lead time, or a minor forecasting error can push the system into a dramatically worse cost regime.

The agent's ability to map cliff boundaries is arguably more valuable than its ability to find optimal parameter values. It produces a risk map of the strategy space. Any organisation running threshold-based reorder policies should want to know where their cliffs are.
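Mapping a cliff boundary for a single parameter is just a one-unit sweep that flags large jumps. A minimal sketch (the cost function and threshold here are illustrative, not the simulator's):

```python
def find_cliffs(cost_fn, lo, hi, threshold):
    """Sweep one integer parameter and flag one-unit moves whose cost
    jump exceeds `threshold` (sketch of cliff mapping)."""
    cliffs = []
    prev = cost_fn(lo)
    for x in range(lo + 1, hi + 1):
        cur = cost_fn(x)
        if abs(cur - prev) >= threshold:
            cliffs.append((x, cur - prev))
        prev = cur
    return cliffs
```

The output is exactly the risk map described above: not a single optimum, but the parameter values at which the system falls off a cliff.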

[Chart: parameter risk map — reorder point vs total system cost (illustrative). Shows a safe operating zone, a fragile zone, and a cliff boundary where a one-unit change costs +$22,000; a demand spike can push a system sitting near the boundary over the cliff.]
Finding 03

Bayesian Optimisation Failed at Scale, Succeeded When Decomposed

V3 provided the agent with a joint Gaussian Process surrogate model across all 36 parameter dimensions. Across 70 experiments, it produced zero accepted suggestions. This is a known limitation of GP-based surrogate models in high-dimensional spaces — a 36-dimensional GP needs hundreds or thousands of data points to build a useful model.

V4 addressed this by decomposing the problem into twelve separate 3-dimensional GPs, one per product. This worked. The per-product GPs produced accepted suggestions starting around experiment 28. The decomposition reduced the data requirement from hundreds of experiments to roughly 5 per product. The lesson generalises: when applying surrogate-based optimisation to enterprise systems, the dimensionality of the surrogate must match the structure of the problem, not the total parameter count.
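The decomposition can be illustrated without any GP machinery: once total cost is treated as (approximately) separable by product, each 3-parameter subproblem can be searched independently. A minimal sketch, with names and candidate sets of our own invention:

```python
def optimise_decomposed(products, candidates, cost_fn):
    """Per-product search (sketch): when total cost decomposes across
    products, each small subproblem is solved independently instead of
    as one joint high-dimensional search."""
    best = {}
    for p in products:
        # Each product's search touches only its own 3 parameters.
        best[p] = min(candidates[p], key=lambda params: cost_fn(p, params))
    return best
```

The same principle applies whether the inner search is exhaustive, gradient-guided, or a 3-dimensional GP: the data requirement scales with the subproblem's dimensionality, not the system's.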

[Diagram: optimisation structure — joint vs decomposed search. V3's joint 36-dimensional GP: 0 accepted suggestions (cannot model 36 dimensions with <100 data points). V4's per-product 3-dimensional GPs (P01, P02, P03, …): accepted suggestions from experiment 28, with ~5 data points per product sufficient.]
Finding 04

Time-Phased Strategies Capture Value in Architecture, Not in Tuning

V4's baseline cost ($149,202) was already lower than V3's converged optimum ($146,695), because the time-phased structure itself provided a better starting point before any optimisation occurred. The subsequent 15,000 experiments of iterative tuning reduced cost by only 4% from that baseline. The structural decision — how many phases, where the boundaries fall — captured most of the value. The parametric tuning within each phase captured relatively little.
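A time-phased policy reduces to a lookup from simulation day to the active parameter set. The sketch below uses illustrative phase boundaries and values, not V4's actual configuration:

```python
def params_for_day(day, phases):
    """Time-phased policy lookup (sketch): `phases` is a list of
    (start_day, params) sorted by start_day; the active phase is the
    most recent one to have started."""
    active = phases[0][1]
    for start, params in phases:
        if day >= start:
            active = params
    return active
```

The structural question Finding 04 identifies lives entirely in the `phases` list: how many entries it has and where the start days fall. The numbers inside each params dict are what the 15,000 experiments tuned, for a 4% return.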

For seasonal businesses, phase boundaries matter more than safety stock levels. For product lifecycle management, the transition points between launch, growth, maturity, and decline matter more than the reorder quantities within each phase.

Finding 05

The Optimiser is Not the Bottleneck. Problem Decomposition Is.

Across all four versions, the choice of optimisation algorithm mattered less than the structure of the search. V3's grid search (2 experiments) outperformed its Bayesian optimiser (50+ experiments). V4's tier-aware adjustments outperformed its gradient-guided moves. In every case, the technique that encoded domain knowledge into the search structure outperformed the technique that treated the problem as a generic black box.

Better problem decomposition produces higher returns than more sophisticated optimisation algorithms. A 100-parameter system that decomposes into 20 independent 5-parameter subsystems is tractable. A 100-parameter system treated as a single joint optimisation is not, regardless of how sophisticated the surrogate model is.

Industry Applications

Grocery and Food Retail

The V3 and V4 perishability mechanics model the central operational challenge of grocery: balancing freshness against availability. A product with a 3-day shelf life and a 2-day supplier lead time has an effective selling window of 1 day after arrival. For a grocery chain managing 30,000 SKUs with variable shelf lives, the value proposition is not "AI will optimise your inventory." It is: autonomous simulation will map the fragility boundaries of your reorder policies across every SKU, every store, every season, and tell you exactly where you are exposed to cost cliffs that your current parameters cannot absorb.

Pharmaceutical Distribution

Pharmaceutical supply chains combine perishability, regulatory constraints on ordering, and extreme stockout penalties. The cost asymmetry in our simulations (stockouts at $10/unit, waste at $5/unit) is conservative compared to pharmaceutical reality, where a stockout of a critical medication can trigger regulatory action, contract penalties, and reputational damage that dwarfs the unit cost. The autonomous agent's ability to map fragility boundaries across hundreds of medications, each with different lead times, shelf lives, and demand patterns, addresses a problem that manual analysis cannot scale to cover.

Manufacturing and Industrial Parts

V2's quantity discount tiers model the procurement economics of manufacturing: volume discounts from suppliers, minimum order quantities, and the trade-off between per-unit cost and carrying cost. The V4 time-phased strategies are particularly relevant — product demand in manufacturing is rarely stationary. New product introductions ramp up, legacy products decline, seasonal products cycle. The simulation demonstrated that the phase architecture matters more than the parameter values themselves.

E-Commerce and Direct-to-Consumer

E-commerce inventory operates under V2's constraint set: variable lead times, quantity discount tiers, and demand heterogeneity. The cliff effects documented in V2 have a specific e-commerce interpretation: for businesses running flash sales, influencer promotions, or seasonal campaigns, knowing where these cliffs are before the demand spike arrives is the difference between capturing the revenue and displaying "out of stock" at peak traffic.

Quick-Service Restaurants and Food Service

The V3 waste-versus-stockout trade-off is the daily operational reality of food service. The time-phased strategies from V4 map to daypart planning — breakfast, lunch, dinner, and late-night service have different demand profiles and require different prep quantities. Getting the daypart transition times right matters more than fine-tuning the exact prep quantity for each period.

Healthcare and Hospital Supply Chains

Hospital supply chains combine nearly every constraint type in our simulation series. The finding that autonomous agents produce the most value as exploration infrastructure rather than autonomous controllers is particularly relevant in healthcare. No hospital will hand reorder decisions to an unsupervised algorithm. But every hospital would benefit from a system that continuously simulates its inventory policies, maps the fragility boundaries, and flags products where current parameters are operating near a cost cliff.

Capital Allocation Implications

First, the initial deployment captures most of the value. The 80/20 pattern was consistent across all four versions. An autonomous agent correcting gross misconfigurations will produce large, fast returns. Subsequent refinement produces diminishing returns. Budget accordingly: the first project should be scoped for quick wins, not for convergence to a global optimum.

Second, invest in problem decomposition before investing in optimisation algorithms. The single most impactful architectural decision across our simulations was V4's switch from joint optimisation to per-product optimisation. This is a structural decision about how to frame the problem, not a decision about which algorithm to use. Organisations should invest in understanding the decomposition structure of their inventory system before selecting optimisation tooling.

Third, the value of autonomous exploration compounds over time. The agent's 15,000 experiments in V4 produced a comprehensive map of the cost surface: every cliff boundary, every discount tier interaction, every phase transition effect. This map does not depreciate. The ongoing value of autonomous simulation lies in maintaining a continuously updated understanding of where the system is fragile and where headroom for improvement exists.

What We Did Not Test

The simulations operated under deterministic demand with a fixed random seed. Real supply chains face stochastic demand, supplier unreliability, transportation delays, and demand shocks. The cliff effects we documented would be even more consequential under uncertainty, because a system operating near a cliff boundary would be pushed over it by normal demand variance.

We also did not test multi-echelon inventory structures, substitution effects, or dynamic pricing interactions. Each of these adds constraint dimensions that, based on our V1-to-V4 progression, would increase the required experiment budget by roughly an order of magnitude per constraint type.

The simulations used a single autonomous agent operating sequentially. Parallel agent architectures, where multiple agents explore different regions of the parameter space simultaneously, could reduce wall-clock convergence time significantly. The decomposition structure that worked in V4 (per-product optimisation) maps naturally to parallel execution.

Summary

Across four simulation versions, 15,435 total experiments, and progressively increasing operational complexity, the autonomous agent reduced inventory costs by 40 to 74 percent. The results are consistent and reproducible. The cost surface is dominated by structural discontinuities, not smooth gradients. The majority of value is captured early. Sophisticated optimisation algorithms underperform structured, domain-aware search. Problem decomposition matters more than algorithm selection. Time-phased strategy architecture matters more than parametric tuning within a fixed architecture.

| Version | Products | Cost Components | Experiments | Baseline Cost | Final Cost | Reduction |
|---------|----------|------------------------------|-------------|---------------|------------|-----------|
| V1 | 5 | 2 (stockout + holding) | 65 | $61,455 | $15,988 | 74.0% |
| V2 | 12 | 3 (+ ordering discounts) | 300 | $236,277 | $143,486 | 39.3% |
| V3 | 12 | 4 (+ waste/perishability) | 70 | $241,040 | $146,695 | 39.1% |
| V4 | 12 | 4 (+ time-phased strategies) | ~15,000 | $149,202 | $143,300 | 4.0% (40.5% vs V3 naive baseline) |

For organisations operating complex supply chains, the operational question is not whether autonomous inventory optimisation works. The simulations demonstrate that it does. The question is how to deploy it: as exploration infrastructure that maps fragility and identifies headroom, not as an autonomous controller that replaces human judgment.

The agent's value is in the map it produces, not in the single point it recommends.