Enterprise software has a recurring problem. Every time a new technology paradigm emerges, the industry reaches for the wrong metric first.
When cloud arrived, we measured bookings. Bookings were clean, reportable, and completely disconnected from whether customers were getting value. The result was a generation of contracts that looked great at signature and churned in year two.
AI is repeating the pattern. The metric this time is token usage.
Token usage tells you how much the model is running. It tells you nothing about whether any work got done. And as enterprise AI moves from pilot to production, that distinction is going to separate the companies that scale from the ones that stall.
The teams that get this right are starting with a different question. Not "how much is the AI being used?" but "what job did the customer hire the AI to do, and did it help them do that job faster, cheaper, or better?"
This is Jobs to Be Done applied to AI value measurement. The framework itself is not new. The application to enterprise AI is.
The shift reframes everything: how products are designed, how outcomes are measured, and how GTM teams build renewal conversations. Consider what this looks like across three enterprise workflows already being transformed.
A finance team running monthly close. The AI processes transactions, reconciles receipts, maps accounting codes. But the job is closing the books faster with fewer errors. If cycle time doesn't move, the AI is overhead, not value.
An HR team managing contractor onboarding. The AI drafts documents and routes approvals. But the job is getting the employee productive faster. If time-to-productivity doesn't change, the renewal conversation will be difficult.
A nurse practitioner handling patient triage. The AI parses records, suggests diagnostic codes, prepares clinical notes. But the job is completing pre-diagnosis faster so the practitioner can see more patients. If throughput doesn't increase, the clinical team finds something else.

The same pattern extends well beyond those three workflows:
| Industry | Job to be done | What the agent does | Execution boundary |
|---|---|---|---|
| Healthcare | Patient triage | Parses records, suggests diagnostic codes, drafts clinical notes. | NP validates physical exam and signs the chart. |
| Finance | Monthly close | Matches receipts to accounting codes, flags anomalies. | Controller reviews anomaly report and locks the month. |
| HR | Employee offboarding | Scans non-competes, flags hardware, drafts exit scripts. | HRBP conducts exit interview and clicks final approve. |
| Procurement | Contract compliance | Analyzes contract changes, checks vendor history. | Procurement manager accepts risk and signs. |
| Sales | Deal qualification | Researches account, scores fit, drafts outreach brief. | AE reviews summary and decides whether to pursue. |
Getting to job completion requires a design decision most enterprise AI deployments skip: defining exactly where autonomous agentic work ends and human judgment begins.
This is the Execution Boundary.
Every enterprise workflow has one. It is the point where the cost of an automated error exceeds the efficiency gain of automation. Before the boundary, the agent operates autonomously — gathering records, synthesizing data, staging outputs, flagging exceptions. At the boundary, a human validates, approves, or signs. After the boundary, the work is committed to the system of record.
The Execution Boundary is not a limitation of current AI. It is a design principle for deploying it responsibly at enterprise scale. The workflows that scale fastest are the ones where this boundary is defined explicitly from the start — not discovered after a production incident.
The practical implication: enterprise AI products should be designed around the boundary, not just the model. The autonomous layer takes the job as far as it can go without human accountability. The handoff is deliberate, auditable, and consistent. The human is elevated from data-entry clerk to agentic supervisor.
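A minimal sketch of what designing around the boundary might look like in code, assuming hypothetical `agent`, `reviewer`, and `system_of_record` interfaces (none of these refer to a real product or library):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class StagedOutput:
    """Work the agent has prepared but NOT yet committed to the system of record."""
    job_id: str
    payload: dict
    exceptions: list[str] = field(default_factory=list)

@dataclass
class BoundaryDecision:
    """The human action at the Execution Boundary: approve or reject, with an audit trail."""
    reviewer: str
    approved: bool
    reviewed_at: datetime
    notes: str = ""

def run_job(job_id: str, agent, reviewer, system_of_record) -> bool:
    # Before the boundary: the agent works autonomously, gathering records,
    # synthesizing data, staging outputs, and flagging exceptions.
    staged = agent.prepare(job_id)

    # At the boundary: a human validates, approves, or signs.
    # The handoff is deliberate and produces an auditable record.
    decision = reviewer.review(staged)

    # After the boundary: only approved work reaches the system of record.
    if decision.approved:
        system_of_record.commit(staged.payload, audit=decision)
        return True
    system_of_record.log_rejection(staged, audit=decision)
    return False
```

The shape matters more than the details: every commit to the system of record carries the human decision that authorized it, which is exactly what makes the handoff auditable.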
This is also where differentiation compounds. The teams that instrument and optimize around the Execution Boundary accumulate something their competitors cannot replicate: outcome data.
When you measure by job completion — when you track whether the finance team closed faster, whether the nurse practitioner saw more patients, whether the contractor was productive sooner — you build an evidence base that is proprietary by nature.
You know which workflows your AI actually changes. You know which customers are getting value and which are at risk. You know where the Execution Boundary is calibrated correctly and where it needs to move. You know which jobs are ready for higher automation coverage and which require more human oversight.
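A hedged sketch of what that instrumentation might look like; the `JobRecord` shape and metric names are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class JobRecord:
    """One job attempt, measured by outcome rather than by usage."""
    workflow: str                 # e.g. "monthly_close", "patient_triage"
    completed: bool               # did the job actually get done?
    cycle_time_hours: float       # start of job to commit at the boundary
    human_edits_at_boundary: int  # how much correction the reviewer needed

def completion_rate(records: list[JobRecord], workflow: str) -> float:
    jobs = [r for r in records if r.workflow == workflow]
    return sum(r.completed for r in jobs) / len(jobs) if jobs else 0.0

def median_cycle_time(records: list[JobRecord], workflow: str) -> float:
    done = [r.cycle_time_hours for r in records
            if r.workflow == workflow and r.completed]
    return median(done) if done else float("nan")

# Note what is absent: token counts. The renewal evidence is whether
# cycle time moved and how often the job completed, not how much the model ran.
```

Compare `median_cycle_time` for the quarter before and after deployment, and you have the beginning of the evidence base the renewal conversation needs.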
This is the moat that enterprise AI companies should be building right now. Not proprietary models — those are increasingly commoditized. The proprietary asset is the operational knowledge of how specific jobs get done in specific enterprise environments, and the evidence that your system reliably completes them.
The companies that will define this market are already making three moves: measuring value by job completion instead of usage, defining the Execution Boundary explicitly in every workflow, and instrumenting that boundary to build a proprietary base of outcome data.
The companies still optimizing for token usage and seat counts will find themselves in the same position as early cloud vendors who optimized for bookings — great metrics until renewal, and then a very hard conversation.
The metric that will define enterprise AI winners is not how much the system ran. It is whether the job got done, how fast it got done, and whether the evidence is auditable enough to justify the next contract.
That is the framework the market is moving toward. The question is which teams get there first.
— Souji Madhurapantula writes about enterprise AI, operating models, and go-to-market strategy at jaggedfrontier.org