The AI Cost Framework
The first time a buyer asks “how much does AI cost?” they get one number. Sometimes it is $50,000. Sometimes it is $2 million. Both are wrong.
Not because the vendor is lying. Because the question is malformed, and the vendor has every incentive not to fix it.
AI implementation is not one cost. It is three. The build – the engineering line on the proposal. The run – what it costs to keep the system online once it ships. And the hidden third layer – data preparation, evaluation infrastructure, monitoring, retraining, compliance, change management – which most proposals omit entirely and which typically equals or exceeds the build itself.
The buyers who only price the build get blindsided. They sign a $250K statement of work, ship the pilot, and discover they need another $200K to make it usable, plus $80K a year to keep it from degrading. The vendors who only sell the build are not all dishonest. Most genuinely do not know what the run will cost, because they have never operated the system at your scale. This guide is the cost model you should walk into the conversation with – before the slide deck, before the SOW, before the pilot.
Why AI Cost Estimates Are Mostly Wrong
Most AI cost estimates are wrong for the same reason most software estimates are wrong, plus a few new reasons specific to AI.
The vendor prices the engineering, not the project. A proposal arrives. It lists model selection, prompt engineering, integration, UI, testing, deployment. The hours look reasonable. What it doesn’t list: building a labeled evaluation set, instrumenting for drift, setting up an on-call rotation for inference outages, redesigning the workflow your team actually uses, and the six weeks of cleanup when half the data your model needs lives in a 2014 SQL database with no schema documentation. None of that is in the SOW. All of it gets done. Someone pays for it. Often that someone is you, in change orders.
The buyer under-specifies accuracy and latency. “We need a model that can answer customer questions” is not a requirement. “We need a model that can answer customer questions correctly 95% of the time, with sub-two-second response, in seven languages, with auditable citations” is. The two specs cost dramatically different amounts. When the spec is loose, the vendor optimizes for what’s easy to demo, which is rarely what’s hard to operate.
Both sides ignore the hidden layer until it bites. Data prep, evaluation, monitoring, and change management are not optional add-ons. They’re the work that determines whether the system is actually used and trusted. A model that’s 92% accurate but has no eval harness will silently degrade to 78% over six months and nobody will notice until a customer escalation. The hidden layer is where AI projects either earn their cost or quietly become shelfware.
Token economics surprise everyone. A demo that costs $40 in API calls during the pilot can cost $40,000/month at production volume. Multiply by retries, by long-context queries, by the fact that you’ll probably need to call the model two or three times per user-facing answer (retrieval, generation, validation), and the inference line item alone can outpace an engineer’s salary. Most buyers don’t price this until the first invoice.
The 2026 market is still pricing incoherently. Some firms quote $40K for work that other firms quote at $400K. Sometimes the cheap one cuts corners. Sometimes the expensive one is doing pre-sales for a platform license. Without a cost model, buyers can’t tell which is which. This guide gives you that model. Just as our website redesign cost guide reframed agency pricing – published ranges anchor high, real ranges are narrower – AI pricing has its own fictions worth knowing.
Build Cost: Engagement Type Ranges
The first move in any cost conversation is to name the engagement type. The numbers below are 2026 market rates for U.S. and U.S.-adjacent vendors; offshore is 30–60% lower with corresponding tradeoffs in oversight and timezone.
| Engagement type | Build cost range | Typical timeline |
|---|---|---|
| Internal AI tools (Slack bot, doc Q&A, summarization) | $5K–$60K | 2–8 weeks |
| LLM-powered product feature (chatbot, copilot, search) | $25K–$150K | 6–16 weeks |
| Custom model / fine-tuning / RAG with proprietary data | $150K–$750K | 3–6 months |
| Enterprise AI platform (multi-model, pipelines, governance) | $500K–$5M+ | 6–18 months |
Internal AI tools: $5K–$60K. A Slack bot that answers questions about your handbook. A summarizer for support tickets. A Notion or Confluence Q&A layer. These are the cheapest engagements because the off-the-shelf tooling is excellent and the integration surface is small. If a vendor quotes $90K for a Slack bot, you’re paying for their overhead and their roadmap, not your project. Most internal tools should be built by a freelancer or a small team in three to six weeks.
LLM-powered product feature: $25K–$150K. A customer-facing chatbot, an in-app copilot, an AI-powered search box. The cost spread is wide because the requirements vary wildly. A bot that uses an off-the-shelf API, displays results in a basic UI, and handles a single domain lands at the low end. A copilot that has to reason across multiple data sources, route to different models depending on the query, handle multi-turn conversations with memory, and meet enterprise SSO and audit requirements lands at the high end. The cost driver here is rarely the model – it’s the integration, the eval harness, and the UI.
Custom model / fine-tuning / RAG: $150K–$750K. This is where serious data work begins. You’re either fine-tuning a base model on your data, building a retrieval-augmented generation pipeline against a proprietary corpus, or both. The cost is dominated by data preparation (often 40–60% of the engagement), eval set construction, and the ML engineering required to make the pipeline reliable. Most companies should not start here. Start with off-the-shelf, prove demand, then graduate.
Enterprise AI platform: $500K–$5M+. Multi-team, multi-model, with shared data infrastructure, governance, model registry, eval pipelines, and centralized observability. The work spans 6–18 months and usually involves a platform team, an ML team, a data engineering team, and a security/compliance review. If you’re at this scale, you don’t need this guide – you need a development partner and an internal program lead. The reason to call out the range is so you know what “enterprise AI” actually costs and don’t get talked into it when an internal tool would do. The Stanford HAI 2025 AI Index Report tracks industry-wide AI investment and adoption costs annually if you want a calibration source for your own benchmarks.
Hidden Costs: Data, Evaluation, and Drift
Here’s the part most proposals leave out. Read this section twice.
Data preparation: 30–50% of total project cost. AI runs on data, and your data is almost certainly not ready. It lives in five systems. It has inconsistent formatting, missing fields, duplicate records, and three different ways to spell the same product name. You need to extract it, clean it, normalize it, deduplicate it, label some of it, and build a pipeline that keeps it fresh. A vendor who quotes a $250K project with $20K of “data work” in it has either inherited a miraculous dataset or – far more likely – under-scoped the messiest part of the engagement. Real data prep on a typical mid-market dataset is 30–50% of build cost. On a regulated dataset (healthcare, finance), closer to 50–70%.
Evaluation infrastructure: $20K–$150K. You cannot ship an AI feature without an eval set. An eval set is a labeled corpus of inputs paired with the outputs you’d want – a few hundred to a few thousand examples that let you measure whether the model is improving or regressing. Building one takes domain experts, not engineers. It’s slow, expensive, and almost never quoted. You also need an eval harness – code that runs the eval set against any model version and reports accuracy, hallucination rate, latency, cost per query, and whatever else matters in your domain. Without this, you don’t know if the model works. You’re shipping vibes.
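A minimal harness is less code than it sounds. Below is a sketch under stated assumptions: `call_model` is a hypothetical client that returns an answer and its cost, the eval set is JSONL with `input` and `expected` fields, and the exact-match grader is a naive stand-in for whatever domain-specific scoring your problem actually needs.

```python
import json
import time

def run_eval(eval_path, call_model, grade):
    """Run every labeled example through a model version and report
    the metrics that matter: accuracy, latency, cost per query."""
    results = []
    with open(eval_path) as f:
        for line in f:
            example = json.loads(line)  # {"input": ..., "expected": ...}
            start = time.perf_counter()
            output, cost = call_model(example["input"])  # hypothetical client
            results.append({
                "correct": grade(output, example["expected"]),
                "latency": time.perf_counter() - start,
                "cost": cost,
            })
    n = len(results)
    return {
        "accuracy": sum(r["correct"] for r in results) / n,
        "p50_latency": sorted(r["latency"] for r in results)[n // 2],
        "cost_per_query": sum(r["cost"] for r in results) / n,
        "n": n,
    }

def exact_match(output, expected):
    # Placeholder grader. Real eval sets usually need rubric scoring
    # by domain experts or a judge model, not string comparison.
    return output.strip().lower() == expected.strip().lower()
```

The code is the cheap part. The expensive part is the labeled examples above it, and the discipline of re-running the harness on every model or prompt change so you get numbers you can compare across releases.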
Inference at scale. A model that costs $0.005 per query in the pilot costs $5,000/day at 1M queries. Most internal demos run hundreds of queries. Production runs millions. Token math is non-optional; we’ll work through it in the next section.
Monitoring and drift detection: $15K–$80K to set up, $20K–$60K/year to maintain. Models drift. The world changes, your data changes, user behavior changes, the underlying model gets updated by the provider. You need monitoring that catches drift before users do. That means logging inputs and outputs, sampling for human review, alerting on accuracy regressions, and having a process for what to do when an alert fires. None of this exists out of the box.
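The core loop is not exotic engineering. As a sketch, assuming you already sample production outputs for human review, a drift alert can be as simple as a rolling accuracy window compared against the launch baseline:

```python
from collections import deque

class DriftMonitor:
    """Alert when rolling accuracy on human-reviewed samples drops
    below the launch baseline by more than a tolerance."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.reviews = deque(maxlen=window)

    def record_review(self, was_correct: bool):
        self.reviews.append(was_correct)

    def check(self):
        if len(self.reviews) < self.reviews.maxlen:
            return None  # not enough reviewed samples yet
        rolling = sum(self.reviews) / len(self.reviews)
        if rolling < self.baseline - self.tolerance:
            return (f"DRIFT: rolling accuracy {rolling:.1%} "
                    f"vs baseline {self.baseline:.1%}")
        return None
```

The hard part is not this class. It is staffing the human review sample and deciding, in advance, who gets paged and what they do when `check()` starts returning alerts.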
Retraining and refresh: 10–25% of build cost annually. Every 6–12 months, you’ll retrain or refresh the model – new data, new base model version, new fine-tuning run, new RAG corpus. This is real work, not a batch job. Budget for it.
Compliance and privacy review: $10K–$100K. If you’re in healthcare, finance, legal, education, or any EU-touching business, you’ll need a privacy impact assessment, a data residency review, and probably a third-party audit. This is non-negotiable and usually omitted from the budget.
Change management: $20K–$200K, often more. Your team’s workflow has to change. Support reps have to trust the AI’s draft instead of writing from scratch. Sales has to learn when to override the recommendation. Ops has to handle the new exception path. This is the hidden cost that most often kills adoption. It’s not a software cost. It’s a humans-changing-how-they-work cost. Budget for training, documentation, ongoing support, and the productivity dip in the first three months.
Run Cost: Token Economics at Scale
Run cost is dominated by inference. Let’s do real math.
Assume you’re building a customer support copilot. Each user-facing answer involves three model calls: one to retrieve relevant context (small model), one to generate the answer (large model), and one to validate or rerank (small model). Total tokens per answer: roughly 8,000 input + 800 output, blended across the calls. At blended public pricing – call it $3 per million input tokens, $15 per million output tokens for a frontier model, with smaller models 5–10x cheaper – your blended cost per answer lands around $0.04–$0.08 once retries and the occasional long-context query are factored in.
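Here is that arithmetic made explicit. The prices match the blended figures above; the retry multiplier is an assumption covering retries, validation re-runs, and long-context queries, so substitute your own.

```python
# Back-of-envelope cost per user-facing answer, using the blended
# token counts above. Prices are illustrative list prices per million
# tokens, not any specific provider's rate card.
FRONTIER = {"input": 3.00, "output": 15.00}  # $/M tokens

def cost_per_answer(input_tokens=8_000, output_tokens=800,
                    prices=FRONTIER, retry_multiplier=1.5):
    base = (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000
    return base * retry_multiplier

print(f"clean: ${cost_per_answer(retry_multiplier=1.0):.3f}")  # $0.036
print(f"with overhead: ${cost_per_answer():.3f}")              # $0.054
# The clean number is the floor; real-world overhead is what pushes
# the blended figure into the $0.04-$0.08 range.
```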
Now scale it.
| Volume | Cost per answer | Daily cost | Monthly cost | Annual cost |
|---|---|---|---|---|
| 1,000 answers/day | $0.05 | $50 | $1,500 | $18,000 |
| 10,000 answers/day | $0.05 | $500 | $15,000 | $180,000 |
| 100,000 answers/day | $0.05 | $5,000 | $150,000 | $1,800,000 |
| 1,000,000 answers/day | $0.05 | $50,000 | $1,500,000 | $18,000,000 |
A pilot at 1,000 answers/day looks cheap. The same architecture at 100,000 answers/day costs nearly $2M/year in inference alone. This is why “the model is so cheap now” is a misleading sentence. Per-token pricing has dropped, but production-grade applications make many calls per user action and run at volumes the pilot never tested.
What changes the math:
- Caching. If 30% of queries are repeats (and they often are), prompt and response caching cuts inference cost by 20–40%. Worth building.
- Routing. Send simple queries to a small model, hard queries to a frontier model. A good router cuts blended cost by 40–60% with minimal accuracy loss (see the sketch after this list).
- Context discipline. Most prompts are 3–5x larger than they need to be. A focused prompt with retrieval beats a giant prompt with everything-and-the-kitchen-sink. This is the single biggest cost lever most teams ignore.
- Self-hosting. Above roughly 10M queries/month, self-hosting open models on dedicated GPUs starts to compete with API pricing. Below that volume, the operational burden isn’t worth it.
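To make the routing lever concrete, here is a minimal sketch. The model names, per-answer costs, and the keyword heuristic are all placeholders; production routers typically use a small classifier model or retrieval-confidence signals to decide.

```python
# Cost-aware router: cheap model for routine queries, frontier model
# only when the query looks hard. Names and costs are illustrative.
SMALL = {"name": "small-model", "cost_per_answer": 0.006}
FRONTIER = {"name": "frontier-model", "cost_per_answer": 0.050}

HARD_SIGNALS = ("refund", "legal", "escalate", "compare", "why")

def route(query: str) -> dict:
    # Placeholder heuristic: long queries or risky topics go frontier.
    is_hard = (len(query.split()) > 40
               or any(s in query.lower() for s in HARD_SIGNALS))
    return FRONTIER if is_hard else SMALL

def blended_cost(queries: list[str]) -> float:
    return sum(route(q)["cost_per_answer"] for q in queries) / len(queries)

# If 70% of traffic routes small: 0.7 * $0.006 + 0.3 * $0.050 = $0.019
# per answer, versus $0.050 frontier-only -- roughly a 60% cut.
```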
Plus the non-inference run costs: hosting and infrastructure ($10K–$80K/year), monitoring tools ($10K–$50K/year), on-call and incident response (varies), eval re-runs ($5K–$20K/year), and retraining (~10–25% of build cost/year).
Sum it all: annual run cost typically lands at 20–40% of build cost for production AI features. A $250K build carries a $50K–$100K/year run cost, not counting the hidden layer.
Total Cost of Ownership: A Worked Example
Let’s price a realistic project end-to-end.
The project: A customer support copilot for a 200-person SaaS company. Drafts replies for support reps, pulls answers from your help center and ticket history, handles 5,000 tickets/day across two languages.
Build (engineering line): $250,000. Six months of work. Two engineers, a half-time PM, a half-time designer, a part-time ML specialist. Includes integration with Zendesk, retrieval pipeline, generation pipeline, validation pass, draft-in-agent UI, basic eval harness, basic monitoring. This is the number that goes on the SOW.
Hidden third layer: $230,000.
| Item | Cost |
|---|---|
| Data preparation (cleaning ticket history, normalizing tags, building knowledge base) | $90,000 |
| Eval set construction (1,500 labeled tickets, golden answers, domain expert time) | $40,000 |
| Compliance and privacy review (PII handling, data residency) | $25,000 |
| Change management (training 40 support reps, workflow redesign, documentation) | $55,000 |
| Monitoring setup (drift detection, alerting, dashboards) | $20,000 |
This is the layer that doesn’t show up in most proposals. It’s also the layer that determines whether the project is a success.
Year-one run cost: $80,000.
| Item | Cost |
|---|---|
| Inference (5,000 tickets/day × 3 model calls, with routing and caching applied) | $35,000 |
| Hosting and infrastructure | $15,000 |
| Monitoring tools | $10,000 |
| Eval re-runs and quality assurance | $8,000 |
| On-call and incident response (allocated engineering time) | $12,000 |
Year-one total: $560,000. The build was $250K. The actual project was $560K. That’s the TCO heuristic in action – build cost is roughly 45% of year-one TCO.
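As a sanity check you can rerun with your own numbers, here is the roll-up in code, using exactly the figures from the tables above:

```python
# Year-one TCO roll-up for the worked example. The numbers are the
# line items from the tables above.
build = 250_000
hidden = {
    "data_prep": 90_000, "eval_set": 40_000, "compliance": 25_000,
    "change_mgmt": 55_000, "monitoring_setup": 20_000,
}
run_y1 = {
    "inference": 35_000, "hosting": 15_000, "monitoring": 10_000,
    "eval_reruns": 8_000, "on_call": 12_000,
}

tco_y1 = build + sum(hidden.values()) + sum(run_y1.values())
print(f"year-one TCO: ${tco_y1:,}")           # $560,000
print(f"build share:  {build / tco_y1:.0%}")  # 45%
```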
Year-two run cost: $90,000–$110,000 – inference grows with usage, plus a retrain cycle.
If you only budgeted the $250K build, you’d be in the change-order spiral by month four. If you budgeted the full $560K up front and held the vendor to it, you’d ship a system that actually works and is actually used.
What Drives Cost Up or Down
Five levers move AI cost dramatically. Knowing them lets you negotiate intelligently and lets you flag a vendor who doesn’t bring them up.
1. Build vs. buy. Always start with off-the-shelf. A $200/seat/month tool that solves 80% of your problem is almost always better than a $300K custom build that solves 95%. Most companies discover, after the off-the-shelf pilot, that the remaining 20% wasn’t worth the 10x cost. The custom-build conversation should start only after you’ve operated the off-the-shelf version for at least three months and have a written articulation of what it can’t do that’s worth more than 3x its cost. Read AI tools for small business for the buy-side version of this question.
2. Accuracy threshold. Going from 90% accuracy to 95% might double the project cost. Going from 95% to 99% might 5x it. Going from 99% to 99.9% might 10x it again. The cost curve is exponential, not linear, because each additional nine of accuracy requires more eval data, more edge case handling, more human-in-the-loop review, and a longer tail of bugs. Specify the accuracy you actually need, not the accuracy you want. For a support copilot drafting replies, 90% might be fine because a human reviews each one. For a medical diagnostic tool, 99.9% may not be enough.
3. Latency budget. Sub-second responses cost real money. They constrain model choice (smaller, faster, less accurate), require caching infrastructure, often require self-hosting, and complicate retrieval. Two-to-five-second responses are 30–60% cheaper to build. Ten-second responses (acceptable for back-office workflows) can be half the cost. Most teams over-spec latency. Ask: does this need to feel like a chat or like a queue?
4. Data residency and sovereignty. If your data has to stay in the EU, in a single region, or in a private VPC, or if the model itself has to be self-hosted, the cost climbs. Self-hosted frontier-class models require GPU infrastructure and MLOps capability most teams don’t have. Plan for a 30–80% premium over the same project on shared cloud infrastructure.
5. Auditability and explainability. If every model output has to be logged with citations, traced back to source documents, and producible on demand for an auditor, you’re building an audit trail alongside the AI. That’s another data system, another retention policy, another set of access controls. Budget 15–30% extra for regulated-industry projects. The NIST AI Risk Management Framework and the OECD AI Policy Observatory both publish governance scaffolding that helps scope this work – and the scope of governance is itself a cost driver.
A useful rule: each of the five levers, dialed to maximum, roughly doubles the cost. Stack three of them and you’ve gone from a $250K project to a $2M project. Stack none and you might have a $40K project. The vendor doesn’t decide where you sit on these levers – you do, when you write the spec.
Getting a Defensible Number from a Vendor
The goal of a vendor conversation isn’t to get a price. It’s to get a price you can defend to your CFO, your board, and your future self. That means structure.
Ask for the price broken into four buckets. Build (engineering), data preparation, evaluation infrastructure, and run (year one and year two). If a vendor can’t break it apart, they haven’t priced the project. They’ve priced a pitch. A clean breakdown looks like this:
| Bucket | Includes | Cost |
|---|---|---|
| Build | Engineering, design, PM, ML specialist | $X |
| Data | Extraction, cleaning, normalization, ongoing pipeline | $Y |
| Eval | Eval set construction, harness, ongoing eval runs | $Z |
| Run (Y1) | Inference, hosting, monitoring, on-call, retrain reserve | $W |
Ask what accuracy and latency they’re targeting and what each costs to improve. A good vendor will say “we’re targeting 92% accuracy at 2-second latency. To hit 96% would add ~$80K to the build and $25K/year to ongoing eval. To hit sub-second would require self-hosting and add $150K and reduce model quality.” A vendor who doesn’t know the answer doesn’t have the experience to do the project.
Ask who owns what. The model weights, the fine-tuning data, the eval set, the data pipeline, the prompts. You should own all of it. Vendors sometimes try to retain the model or the eval set, then license it back to you or hold it as switching-cost insurance. Don’t sign that. Use the technology vendor due diligence checklist before signing anything.
Ask what happens at 2x usage. “If our volume doubles in year one, what happens to the price?” If the vendor doesn’t have a clean answer (typically: inference scales linearly, hosting steps up, monitoring is fixed), they haven’t run a system at scale.
Ask for references at your scale. “Show me three customers with similar volume, similar accuracy targets, similar data complexity. What did they pay all-in? What surprised them?” If the vendor only has demos and pilots to point at, they haven’t shipped production AI. That’s expensive on-the-job training for you.
Ask about pricing models. Fixed-fee for clearly scoped phases (discovery, pilot, integration). Time-and-materials with a hard cap and weekly burn reporting for ambiguous research. Avoid open-ended T&M, especially with a vendor you haven’t worked with before. The fixed-fee vs. time-and-materials guide covers this in depth.
RFP language that forces a defensible number. Include these clauses:
- “Provide costs broken into build, data preparation, evaluation, and year-one run, with line items for each.”
- “Specify target accuracy and latency, and the cost delta for each 10% accuracy improvement and each halving of latency.”
- “Identify all third-party services (model APIs, vector databases, monitoring) and their projected annual cost at our stated volume.”
- “Specify ownership of model weights, eval data, fine-tuning data, and data pipelines. Default is buyer ownership of all artifacts.”
- “Include three reference customers at comparable scale, with permission to discuss costs and surprises.”
A vendor who answers these cleanly is a serious one. A vendor who hedges, says “it depends,” or asks to schedule another call to discuss them is buying time to figure it out at your expense.
Common Pricing Traps
Some pricing structures look reasonable on the page and are catastrophic in practice.
Open-ended time-and-materials. “We bill hourly. We’ll keep you posted on burn.” This is the classic AI consulting trap. Without a cap and weekly reporting, the bill compounds invisibly. By month three, you’re $200K over budget and the vendor’s response is “AI is harder than expected.” Use T&M with a hard cap, weekly burn reports, and a kill clause. Better: phased fixed-fee with a small T&M reserve for unknowns.
Platform license + per-seat + per-token. Some vendors bundle a “platform license” ($50K–$250K/year), a per-seat fee ($30–$150/seat/month), and per-token pass-through pricing on inference. Each piece looks fair. Stacked, they compound. A 50-person team can hit $300K/year before any inference. Always model the three-year fully-loaded cost, not the year-one. And ask whether the platform actually does anything you couldn’t get from the underlying model APIs and a thin layer of glue code.
Exclusive-IP clauses. “We retain the rights to the model and the data pipeline.” This sometimes shows up as a fine-print justification for a discount. It means the vendor can take what you paid them to build, package it, and sell it to your competitors. It also means you can’t switch vendors without rebuilding from scratch. Buyer-side rule: you own everything you paid to build. Period.
Per-token markup. Some implementation firms resell model API access with a 20–40% markup. The markup is hidden inside the “platform fee.” At scale, this is a tax of tens of thousands per month for nothing. Insist on direct-billed model APIs or full visibility into pass-through pricing.
“Implementation included” that’s just handoff. Same trap as the website redesign world. A vendor says “implementation is included” and means “we’ll hand off the model and your team will integrate it.” Pin down what “implementation” means: integration with your stack, deployment to production, a runbook, training your team, and 30 days of post-launch support. If they won’t commit to that, the engagement is unfinished by design.
Discovery without a decision point. A discovery phase that ends in a recommendation is fine. A discovery phase that ends in another, larger discovery phase is a sales funnel. Phase 1 should produce a go/no-go decision and a defensible price for phase 2. If it doesn’t, the vendor is selling you a longer engagement, not a result.
Annual price escalators with no cap. Some platform contracts include a 7–15% annual escalator. Compounded over five years, that’s a 40–100% price increase. Negotiate the escalator down or cap it at CPI.
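The escalator arithmetic, spelled out as a quick sketch (the $100K/year fee is illustrative; the rates are the ones quoted above):

```python
# Compounding effect of an uncapped annual escalator on a $100K/year
# platform fee, after five annual escalations.
fee = 100_000
for rate in (0.07, 0.15):
    escalated = fee * (1 + rate) ** 5
    print(f"{rate:.0%}: ${escalated:,.0f} (+{escalated / fee - 1:.0%})")
# 7%:  $140,255 (+40%)
# 15%: $201,136 (+101%)
```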
What Buyers Should Do
A few rules to take into your next AI cost conversation.
Start with off-the-shelf. Always. The cheapest version of AI is the one you didn’t build. Pilot a tool. Operate it for three months. Document what it can’t do. Only then talk about custom.
Price the project in three layers, not one. Build, run, hidden. Don’t sign a SOW that doesn’t have all three.
Specify accuracy and latency before the vendor does. These are the two biggest cost multipliers. If you don’t pin them down, the vendor will, in the direction that’s easiest to demo and most expensive to operate.
Demand the breakdown. Build, data, eval, run – line-itemed. Anyone who can’t deliver that hasn’t done the work to price the project.
Own everything you paid to build. Model, data, eval set, pipelines, prompts. No exclusive-IP clauses. No vendor-retained artifacts.
Model three-year TCO, not year-one cost. That’s where platform-plus-seat-plus-token pricing reveals itself, and where annual escalators show their teeth.
Cap your T&M and report burn weekly. AI engagements drift. Without a cap, the bill drifts with them.
Budget for change management. The technology is the easier half. Getting your team to use it, trust it, and redesign their work around it is the harder half. Plan for it explicitly.
Walk away from one-number quotes. A vendor who says “$300K total” without breaking it apart is pricing your ambiguity, not your project. The right response is “send me the breakdown, or we will find a vendor who can produce one.”
Do these things and you will pay close to what the project actually costs. Skip them and you will pay 1.5x to 3x more – to the wrong vendors, on the wrong terms, for systems that quietly become shelfware.
The math is not hard. Most buyers just do not do it.
If you would rather not run the math alone, AI implementations are the canonical use case for Managed Selection – exec-sponsored, complex, often first-of-its-kind. Once the partner is signed, Delivery Assurance keeps the engagement honest as the work moves into build, eval, and run.
Frequently Asked Questions
How much does AI implementation cost in 2026?
It depends entirely on the engagement type. Internal AI tools run $5K–$60K. LLM-powered product features run $25K–$150K. Custom or fine-tuned models run $150K–$750K. Enterprise AI platforms with data pipelines run $500K–$5M+. Any vendor quoting a single number without naming the engagement type is pricing your ambiguity.
What's the TCO multiplier for AI projects?
Annual run cost lands at 20–40% of build cost – inference, monitoring, retraining, on-call. On top of that, the hidden third layer (data preparation, evaluation infrastructure, drift detection, compliance review, change management) typically equals or exceeds the build itself. A $250K build realistically carries a $250K–$500K hidden layer plus $50K–$100K/yr of run cost.
Why are AI cost estimates so often wrong?
Because vendors quote the engineering line and call it the project. Data preparation, evaluation harnesses, monitoring, retraining, and change management are real work that has to happen and rarely shows up in the proposal. Buyers also under-spec the accuracy and latency requirements, which are the two biggest cost multipliers.
Should we build or buy?
Buy first. An off-the-shelf tool at $20–$200/seat/month is almost always cheaper than custom for the first 18 months. Build only when you've validated demand with the off-the-shelf version, hit a real limitation, and can articulate why the custom version generates more than 3x its cost in value.
What questions should I ask a vendor about AI cost?
Ask for the price broken into build, run (year one and year two), data preparation, and evaluation. Ask what accuracy and latency they're targeting and what each costs to improve. Ask who owns the model, the eval set, and the data pipeline. Ask what happens to price when usage doubles. If they can't answer cleanly, they haven't priced the project – they've priced a pitch.
What pricing models work best for AI engagements?
Fixed-fee for clearly scoped phases (discovery, pilot, integration). Time-and-materials with a hard cap and weekly reporting for ambiguous work. Avoid open-ended T&M, per-seat-plus-per-token bundles that compound, and any contract that grants the vendor exclusive rights to the model or eval set you paid to build.