The recent news of Uber blowing through its yearly AI budget in a quarter is no surprise, even if the press makes it sound like one. This is a key aspect to consider past the rollout phase, when work at pace begins and AI adoption becomes business as usual.
Basic patterns evolving into disruption
Think of it this way: when you roll out GenAI assistance for engineers, you have an average, generalised distribution of:
- 20% resistance
- 60% mild-to-moderate adoption
- 20% hardcore adoption
Putting this into metrics: on average, only 20% of the workforce is exhausting enterprise quotas and budgets once you reach BAU adoption at the individual level.
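To make that concrete, here is a minimal sketch of the 20/60/20 distribution above. The per-engineer token figures are entirely hypothetical assumptions, chosen only to illustrate how a small tier can dominate total spend:

```python
# Illustrative model of the 20/60/20 adoption distribution.
# Per-engineer monthly token figures (in millions) are hypothetical
# assumptions, not benchmarks.

TIERS = {
    # tier: (share of workforce, assumed monthly tokens per engineer, millions)
    "resistance": (0.20, 1),
    "moderate":   (0.60, 20),
    "hardcore":   (0.20, 120),
}

def spend_share(tiers):
    """Each tier's fraction of total token spend."""
    totals = {name: share * tokens for name, (share, tokens) in tiers.items()}
    grand = sum(totals.values())
    return {name: round(t / grand, 2) for name, t in totals.items()}

print(spend_share(TIERS))
# Under these assumed numbers, the hardcore 20% drives roughly
# two-thirds of total consumption.
```

The exact figures will vary by organisation, but the shape of the result holds whenever one tier's per-head usage is an order of magnitude above the rest.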
Bear in mind this is where most organisations are today, and the stage for which we have data beyond the anecdotal. It can be managed via traditional processes, and while it's not a rounding error, it's not a fundamental shift either.
The pattern shifts significantly once you go beyond individual adoption. Agentic Engineering fundamentally changes the economics of an engineering function: it is a model that allows anyone to consume an order of magnitude more resources, driven by the speed of delivery and the division of work across subagents.
This is then compounded by the natural evolution of maturity at such a stage of adoption:
- the 20% of resistance shrinks due to a number of factors (including a usage mandate)
- the 60% collectively raises the bar in terms of compute demand due to the widespread use of agents
- the 20% hardcore adopters further exacerbate the issue, as they were the ones already running out of quotas!
The old rules of Opex forecasting suddenly no longer apply; cue the Uber situation.
Now what?
It is relatively simple: you are moving away from measuring cost per interaction and towards measurable value per outcome. The budget goalposts need to move, as the old model is no longer fit for purpose.
We faced something similar as an industry when we moved from the traditional Capex-based models of on-premise infrastructure to cloud-abstracted service models.
Some practices are relatively simple: token usage needs to become a new throughput metric, much as a burndown used to be. Dynamic overages become manageable that way, and forecasting becomes important at project level.
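The burndown analogy can be sketched directly: track cumulative token consumption against a project budget and project the end-of-project total from the run rate so far. The daily figures and budget below are hypothetical, and the linear projection is deliberately naive:

```python
# Sketch of token usage as a burndown-style throughput metric.
# All figures (daily tokens, budget, sprint length) are assumptions.

def forecast_burn(daily_tokens, budget, total_days):
    """Linear forecast of end-of-project consumption from the run rate so far."""
    used = sum(daily_tokens)
    rate = used / len(daily_tokens)    # average tokens per day so far
    projected = rate * total_days      # naive linear projection
    return {
        "used": used,
        "remaining": budget - used,
        "projected_total": projected,
        "projected_overage": max(0.0, projected - budget),
    }

# 10 days into a 30-day project with a 600M-token budget (assumed figures):
report = forecast_burn([25e6] * 10, budget=600e6, total_days=30)
print(report)
# The projected overage surfaces the dynamic overage early,
# while there is still time to re-plan.
```

In practice you would feed this from provider usage APIs rather than hand-entered numbers, but the shape of the metric is the same as a story-point burndown.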
Others are new: how do you measure the actual value generated by agent usage? Where is the breakeven? It’s not about agent cost, but rather value created relative to the expenditure.
It’s essentially the law of diminishing returns applied to agents.
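One way to reason about that line is to model value as saturating while cost stays linear, and ask where the marginal agent stops paying for itself. The functional form and every number below are assumptions for illustration, not measurements:

```python
import math

# Hypothetical diminishing-returns model: each extra agent adds less value,
# while cost per agent stays flat. All numbers are illustrative assumptions.

def value(n, v_max=100_000, k=0.5):
    """Total value delivered by n agents, saturating towards v_max."""
    return v_max * (1 - math.exp(-k * n))

def cost(n, per_agent=8_000):
    """Linear cost of running n agents."""
    return per_agent * n

def last_profitable_agent(limit=50):
    """Largest n where the marginal agent still adds more value than it costs."""
    n = 0
    while n < limit and value(n + 1) - value(n) > cost(n + 1) - cost(n):
        n += 1
    return n

print(last_profitable_agent())
```

The breakeven is not where total value equals total cost, but where the marginal value of one more agent drops below its marginal cost; past that point you are burning budget for noise.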
Where do you draw the line?
The budgeting model changes, so you need full-blown observability on usage. This isn't about singling anyone out, but about understanding your engineers' usage patterns so you can plan baseline budgets more accurately. There will be overburn, but it will be grounded in current loads rather than appearing out of thin air.
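A minimal sketch of what that planning could look like, assuming you already collect per-engineer monthly token usage: take a central tendency of observed usage and scale it by a headroom factor, rather than guessing a baseline. The headroom value and usage figures are hypothetical:

```python
# Sketch: derive a baseline budget from observed per-engineer usage.
# The 25% headroom factor and the usage numbers are assumptions.

def baseline_budget(monthly_tokens, headroom=1.25):
    """Median per-engineer usage, scaled by headroom, times headcount.

    The median resists distortion by the hardcore-adopter outliers,
    who are better handled as explicit, forecast overages.
    """
    data = sorted(monthly_tokens)
    median = data[len(data) // 2]
    return headroom * median * len(data)

# Five engineers' observed monthly usage, in millions of tokens (assumed):
print(baseline_budget([5, 10, 10, 20, 200]))
```

Using the median rather than the mean is a deliberate choice here: the heavy users are real load, but budgeting them as baseline for everyone inflates the whole forecast.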
Optimisation becomes important again, with new patterns emerging and efficiencies becoming necessary requirements. I can see a resurgence of effective prompt and context engineering, to avoid the runaway train of firing off a few instructions and simply evaluating what comes out the other side of the machine.
It's also why no provider gives you truly unlimited access: if it's expensive for you as a consumer, it's mind-blowing to think what it costs the provider.
Obviously, all of this applies only until local AI models can perform like Claude Sonnet on your device; that's where things will change again, soon enough…