PAVEL GLUKHIKH

An enterprise AI adoption framework that survives contact

A staged enterprise AI adoption framework from an architect's seat: use-case triage, build vs buy vs wait, platform foundations, and metrics that matter.

Executive summary

Enterprise AI adoption is the deliberate sequencing of use cases, platform foundations, skills, and governance so that each wave of AI deployment makes the next one cheaper and safer instead of harder. Most adoption efforts fail on sequence, not technology: pilots proliferate before a platform exists, governance arrives after the habits it needed to shape, and executives track activity metrics that reward motion over value. This article lays out the staged framework I use as an enterprise architect: how to triage use cases by blast radius and evidence, when to build versus buy versus deliberately wait, why a gateway comes before the tenth use case, how governance scales with adoption, and the handful of metrics executives should actually watch.

Enterprise AI adoption is a sequencing problem. The technology mostly works; the vendors mostly deliver what they demo; the employees are already using it whether anyone approved it or not. What separates the organizations getting compound value from the ones accumulating a pilot graveyard is the order in which they do things: which use cases first, which foundations before which proliferation, which governance at which stage, and which numbers the executives stare at while all of it happens.

I work on this daily as a Sr. Enterprise Architect in an AI enablement group, and the failure pattern is remarkably consistent. It is almost never “we bet on the wrong model.” It is thirty disconnected pilots, each with its own provider account and its own idea of logging, no shared platform, governance arriving eighteen months late as a document nobody can operate, and a board deck that counts pilots the way a failing sales team counts meetings.

Adoption done well looks different: each wave of deployment makes the next wave cheaper and safer, because the foundations were laid one stage ahead of the demand. That is the whole framework. The rest of this article is the stages and the decisions inside them.

The stages, briefly

I use five stages. They overlap in practice, and different business units will sit at different stages simultaneously, which is fine. What is not fine is skipping one, because each stage builds the asset the next stage consumes.

| Stage | Name | The asset it produces |
| --- | --- | --- |
| 0 | Visibility | Inventory of actual AI usage, sanctioned and shadow |
| 1 | Calibration | Two or three instrumented use cases and honest data about them |
| 2 | Foundation | Gateway, identity integration, logging, an intake path |
| 3 | Proliferation | A portfolio of use cases riding the platform |
| 4 | Optimization | Consolidation, cost engineering, retirement of what failed |

Stage 0 deserves one paragraph of respect before anyone reaches for strategy. Adoption has already begun in every enterprise I have seen: employees pasting text into public tools, a team that shipped an API integration last spring, a SaaS vendor that turned on AI features by default in a product you already license. You cannot sequence what you cannot see. The inventory (who is using what, with which data, under which contract) is the same artifact that anchors AI risk management, and it typically produces the program’s first genuine surprises.

Use-case triage

The intake queue fills up faster than any team can deliver, so triage is the first permanent institution of an adoption program. I sort every candidate on two axes: blast radius and evidence.

Blast radius is who a bad output reaches and whether the damage is reversible. An internal draft a human edits is small. An automated message to a customer is large. An action against a production system or a financial record is larger still.

Evidence is whether anyone has demonstrated, in your environment or a credibly similar one, that the task is within current model capability at acceptable quality. “The vendor demo was impressive” is not evidence. A measured pilot with a defined baseline is.

The sort yields four buckets:

  • Small radius, strong evidence: run. Drafting, summarization, internal search, code assistance. These are the workhorses. They will never headline a keynote, and they will produce most of the realized value for the first two years.
  • Large radius, strong evidence: engineer. Worth doing, with the full control set: evals, human gates, monitoring, and the integrity properties described in the AI integrity framework. These go slower on purpose.
  • Small radius, weak evidence: experiment. Cheap probes with explicit kill criteria and a calendar date on which someone must decide.
  • Large radius, weak evidence: refuse, for now. This bucket is where the most exciting proposals land (fully autonomous customer resolution, unsupervised actions on production infrastructure), and the discipline to say “not yet” is the triage function earning its existence.

One rule I hold firmly: every approved use case names its baseline before it starts. How long does the task take today, at what quality, and at what cost? Without the baseline, the retrospective becomes a feelings survey, and feelings surveys always come back positive, because the people answering them approved the project.
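The two-axis sort and the baseline rule can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `UseCase`, `Baseline`, and `Bucket` types, their field names, and the reduction of each axis to a yes/no flag are all assumptions made for the example. In practice both axes are judgment calls made by people.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bucket(Enum):
    RUN = "run"
    ENGINEER = "engineer"
    EXPERIMENT = "experiment"
    REFUSE_FOR_NOW = "refuse, for now"


@dataclass
class Baseline:
    # Hypothetical fields: how the task performs today, before any AI assistance.
    minutes_per_task: float
    quality_note: str
    cost_per_task: float


@dataclass
class UseCase:
    name: str
    large_blast_radius: bool  # bad output reaches customers, production, or financial records
    strong_evidence: bool     # a measured pilot with a defined baseline, not a vendor demo
    baseline: Optional[Baseline] = None


def triage(case: UseCase) -> Bucket:
    """Sort a candidate into one of the four buckets."""
    if case.strong_evidence:
        return Bucket.ENGINEER if case.large_blast_radius else Bucket.RUN
    return Bucket.REFUSE_FOR_NOW if case.large_blast_radius else Bucket.EXPERIMENT


def approve(case: UseCase) -> Bucket:
    """Approve a use case only if it is not refused and names its baseline."""
    bucket = triage(case)
    if bucket is Bucket.REFUSE_FOR_NOW:
        raise ValueError(f"{case.name}: large radius with weak evidence, not yet")
    if case.baseline is None:
        raise ValueError(f"{case.name}: no baseline named, cannot approve")
    return bucket
```

For example, an internal drafting assistant with a measured pilot and a recorded baseline would pass `approve` and land in the run bucket, while the same candidate without a baseline would be rejected at intake rather than at the retrospective.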

Build, buy, or wait

Every use case that survives triage faces the sourcing question, and AI has genuinely shifted the classic calculus in one direction: the vendors are moving faster than internal teams can, in the categories vendors care about. That sentence does the work if you let it.

Buy when the capability is generic and the category is one the vendors are competing hard on: coding assistants, meeting transcription and summarization, office-suite features, customer-service tooling. Any internal build here is a depreciating asset from the day it ships, because a vendor with a hundred times your engineering budget refreshes the capability quarterly. The evaluation effort belongs on data terms (where prompts and outputs go, what trains on them, what the contract actually says), not on feature comparison.

Build when the value depends on assets only you hold: proprietary data, internal workflows, integrations your vendors will never prioritize. In practice this means retrieval systems over your own corpus, agents wired into your internal tools, and applications where the prompt and orchestration logic encode genuine domain knowledge. The architectural patterns for this tier are covered in enterprise AI architecture patterns and, for the retrieval-heavy majority, in RAG architecture for the enterprise.

Wait is the option executives underrate because it looks like indecision. It is not. When a category is churning, when this quarter’s leading product may be next quarter’s acquisition or abandonment, a deliberate six-month deferral with a named review date is often the highest-return decision available. Waiting with a date and a criterion is strategy. Waiting without them is drift, and the two deserve to be distinguished in writing.
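One way to put the distinction in writing is to record every deferral as a structured decision and refuse to call it strategy unless both fields are filled in. The sketch below is illustrative only; the `WaitDecision` type and its field names are assumptions, not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WaitDecision:
    category: str
    review_date: Optional[date]   # the named date on which someone must decide
    criterion: Optional[str]      # what would have to be true to buy or build

    def is_strategy(self) -> bool:
        """A deferral counts as strategy only with both a date and a criterion."""
        has_criterion = bool(self.criterion and self.criterion.strip())
        return self.review_date is not None and has_criterion

    def is_due(self, today: date) -> bool:
        """True once the review date has arrived and the decision must be revisited."""
        return self.review_date is not None and today >= self.review_date
```

A record with a review date and a written criterion passes `is_strategy`; one missing either is drift and should be sent back to its sponsor.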

The trap spanning all three options is per-use-case sourcing anarchy: twenty buy decisions producing twenty vendor data-processing agreements, or ten build decisions producing ten bespoke provider integrations. Which is why the sourcing question is downstream of the platform question.

Platform before proliferation

The single highest-leverage architectural decision in an adoption program is standing up a control plane before the use-case count gets away from you. Concretely, that means an AI gateway: one governed path between everything you run and every model provider you use.

The gateway gives you, in one place: authentication tied to your identity provider, per-team and per-application cost attribution, prompt and completion logging under proper access control, rate and budget limits, policy enforcement, and model routing, so an application asks for a capability tier rather than hard-coding a vendor. When provider pricing shifts or a model is deprecated, routing changes in one place instead of in forty codebases.
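To make the routing and budget ideas concrete, here is a deliberately small sketch of the decision a gateway makes per request. It is not a real gateway and does not call any provider: the `Gateway` class, the tier names, the `provider-a/small-model` style identifiers, and the flat per-team budget are all hypothetical, chosen only to show applications asking for a capability tier while the mapping to a vendor lives in one place.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would push a team past its budget."""


@dataclass
class Gateway:
    routes: dict[str, str]          # capability tier -> provider/model (hypothetical names)
    budgets_usd: dict[str, float]   # team -> budget for the period
    spend_usd: dict[str, float] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def resolve(self, team: str, app: str, tier: str, est_cost_usd: float) -> str:
        """Check budget, attribute cost, log the request, and return the routed model."""
        if tier not in self.routes:
            raise KeyError(f"unknown capability tier: {tier}")
        spent = self.spend_usd.get(team, 0.0)
        if spent + est_cost_usd > self.budgets_usd.get(team, 0.0):
            raise BudgetExceeded(f"{team} would exceed its budget")
        self.spend_usd[team] = spent + est_cost_usd
        target = self.routes[tier]
        self.audit_log.append(
            {"team": team, "app": app, "tier": tier, "target": target, "cost_usd": est_cost_usd}
        )
        return target


gateway = Gateway(
    routes={"fast": "provider-a/small-model", "reasoning": "provider-b/large-model"},
    budgets_usd={"claims": 100.0},
)
```

When a model is deprecated, the change is a single assignment such as `gateway.routes["fast"] = "provider-c/small-model"`; applications that asked for the `fast` tier are untouched. A production gateway would add identity-provider authentication, access-controlled log storage, and rate limits, which this sketch omits.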

The timing argument is the part executives need to hear. Deployed at Stage 2, when there are three use cases, the gateway is a modest piece of infrastructure that every subsequent team simply inherits. Retrofitted at Stage 3, with thirty use cases live, it is a multi-quarter migration program negotiated team by team, and some of those negotiations will be lost. I have watched the identical dynamic play out in cloud adoption: the organizations that built landing zones before workload migration governed their estates, and the ones that did not spent years doing archaeology. Infrastructure earns its keep by being early and boring.

Two honest caveats. A gateway is a dependency and a potential single point of failure, so it gets engineered like one: highly available, with a documented break-glass path. And it is not a substitute for application-level controls; permission-aware retrieval, output evaluation, and human gates still live with the application. The gateway governs the pipe, not the behavior. Behavior is what evaluation in production is for.

Capability building

Tools without skills produce expensive demos, but the skills that matter are more specific than “AI literacy,” and generic training programs mostly miss them.

Three capabilities repay deliberate investment. Engineers who can evaluate, because writing evals, constructing golden sets, and reasoning about non-deterministic failure is the scarce engineering skill of this cycle, scarcer than prompt writing and far more durable. Domain experts who can specify, because the highest-value systems encode judgment that lives in underwriters, clinicians, and analysts, and extracting that judgment into testable expectations is a skill nobody’s job description contains yet. Executives who can calibrate, because leaders who have personally used the tools on real work make noticeably better build-buy-wait decisions than leaders operating on vendor briefings. The gap shows in meetings within minutes.

The delivery mechanism matters less than the substrate: people learn on real use cases riding the real platform, not in sandboxes disconnected from both. A useful leadership discipline during this stage is protecting teams’ time to actually build the skill, which is a resourcing decision, not a training-catalog decision. I have written elsewhere about technology roadmaps as commitments rather than aspirations; capability lines belong on the roadmap with the same status as system deliveries.

Governance that scales with adoption

Governance fails in two symmetric ways: arriving too late, after habits have formed and thirty systems each do logging differently, or arriving too heavy, as a review board in front of every experiment, which teaches the organization to stop asking. The design goal is governance that scales with the stage, tightening where blast radius grows and staying out of the way where it does not.

A shape that works:

  • Stage 1: rules of the road only. Which data classes may go to which tools, one page, plus the inventory obligation. No boards.
  • Stage 2: the platform is the policy. The gateway enforces access, logging, and budgets mechanically, which removes an entire category of review meetings: controls that execute do not need controls that convene.
  • Stage 3: risk-tiered review. The triage buckets from earlier map directly: run-bucket cases self-certify against a checklist; engineer-bucket cases get real review, evals, and named risk acceptance per the practices in AI risk management.
  • Stage 4: audit and pruning. Governance turns retrospective: which accepted risks materialized, which controls produced signal, which systems should be retired.

The principle underneath, argued at length in AI governance for engineers, is that if engineers are not to route around governance, it must be cheaper to comply with than to evade. Every week of review latency is a subsidy for shadow AI. The NIST AI RMF supplies a sound vocabulary for the govern function; the implementation that actually holds is the one compiled into the platform rather than published as a PDF.

The metrics executives should actually track

Most AI dashboards I see measure motion: pilots launched, employees trained, tokens consumed, “percent of workforce using AI weekly.” Motion metrics have a property that should disqualify them from board decks: they can all be maximized while value goes to zero.

Track a small set with teeth instead:

| Metric | What it tells you |
| --- | --- |
| Hours returned per workflow, against the pre-AI baseline | Whether value is real. No baseline, no claim. |
| Unit cost per AI-assisted task, trended | Whether the economics improve with scale or erode |
| Portfolio coverage: % of production AI systems with a named owner and eval coverage | Whether control is keeping pace with adoption |
| Intake-to-production cycle time | Whether governance is a path or a wall |
| Portfolio kill rate | Whether anyone is telling the truth |

The last one deserves a sentence of defense. A portfolio in which no experiment has ever been killed is not a portfolio with perfect judgment; it is a portfolio where retrospectives are theater and sunk costs ride forever. A healthy experiment bucket produces dead experiments at a steady rate, and each one is cheap tuition. Executives who celebrate a killed pilot in public do more for the quality of the next quarter’s proposals than any intake template.

One number I deliberately exclude: aggregate “AI ROI” as a single figure. The measurement error across a portfolio of heterogeneous workflows swamps the signal, and the number exists mostly to be put on slides. Per-workflow value against a baseline is measurable. Portfolio-level precision is theater.
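Three of these metrics reduce to simple arithmetic once the inputs exist, which is a useful reminder that the hard part is collecting the baseline, not the calculation. The sketch below is an illustration under assumptions: the `Workflow` and `System` types are hypothetical, and defining kill rate as experiments killed divided by experiments started in a period is my assumption for the example, since several definitions are reasonable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workflow:
    name: str
    tasks_per_month: int
    baseline_minutes: Optional[float]  # pre-AI time per task; None if never measured
    assisted_minutes: float            # time per task with AI assistance


def hours_returned(w: Workflow) -> Optional[float]:
    """Monthly hours returned against the baseline; None means no baseline, no claim."""
    if w.baseline_minutes is None:
        return None
    return (w.baseline_minutes - w.assisted_minutes) * w.tasks_per_month / 60


@dataclass
class System:
    name: str
    owner: Optional[str]
    has_evals: bool


def portfolio_coverage(systems: list[System]) -> float:
    """Percent of production systems with both a named owner and eval coverage."""
    if not systems:
        return 0.0
    covered = sum(1 for s in systems if s.owner and s.has_evals)
    return 100 * covered / len(systems)


def kill_rate(started: int, killed: int) -> float:
    """Share of experiments started in a period that were deliberately killed."""
    if started == 0:
        return 0.0
    return killed / started
```

A workflow of 400 tasks a month that drops from 30 to 12 minutes per task returns 120 hours a month. The same workflow with no recorded baseline returns `None`, and the function deliberately refuses to estimate one.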

Tradeoffs worth stating plainly

This framework has costs, and pretending otherwise would violate its own rules.

Sequencing sacrifices speed at the start. Standing up a gateway and an intake while competitors announce pilots feels slow for roughly two quarters, until their integration debt comes due and your marginal use case costs a fraction of theirs. The bet is on the compound curve, and it is a bet; in a genuinely winner-take-all niche, a rushed pilot might be the right call. Most enterprise use cases are not that.

Central platforms concentrate risk and can concentrate bureaucracy. A gateway team that turns into a bottleneck recreates the review-board problem in infrastructure form. The mitigation is treating the platform as a product with users, latency, and an SLA, not as a checkpoint.

And no framework substitutes for judgment. Stages describe readiness, not a calendar; triage buckets inform decisions, they do not replace them. The organizations that do this well hold the framework loosely and the baselines firmly.

Underneath all of it is the oldest lesson in enterprise technology: platforms compound, projects do not. Every previous wave (virtualization, cloud, containers) rewarded the organizations that built the boring foundations one step ahead of demand and punished the ones that scaled first and governed later. AI changes the workloads, the risks, and the speed. It has not changed that.

Frequently asked questions

Where should an enterprise start with AI adoption?
Start with an inventory of what is already happening, because adoption has already started without you: employees are using public tools and teams have shipped API calls. Then pick two or three use cases that are high-volume, low-blast-radius, and measurable (internal drafting, summarization, code assistance) and instrument them properly. The goal of the first stage is not ROI. It is calibrated judgment about what the technology can and cannot do in your environment.

When does it make sense to build AI capability instead of buying it?
Buy when the capability is generic and the vendor’s roadmap outruns anything you could fund: coding assistants, meeting summarization, office-suite features. Build when the value depends on your proprietary data, your workflows, or integrations your vendors will not prioritize, which in practice means retrieval systems, agents wired into internal tools, and anything where the prompt logic encodes your domain. Wait when the category is churning so fast that this quarter’s purchase is next quarter’s regret.

What is an AI gateway and why does it come before scaling use cases?
A gateway is a control point between your applications and model providers: one place for authentication, logging, cost attribution, rate limits, policy enforcement, and model routing. Deployed early, it makes every subsequent use case inherit observability and control for free. Retrofitted late, it means renegotiating with twenty teams who each built their own provider integration. It is the difference between governing a platform and chasing a sprawl.

What metrics should executives track for AI adoption?
Track value and control, not activity. Useful: hours returned per workflow measured against a baseline, unit cost per AI-assisted task and its trend, the fraction of production AI systems with named owners and evaluation coverage, and time from use-case approval to production. Vanity: number of pilots, number of employees “trained,” raw token consumption. Activity metrics rise fastest in exactly the organizations that are converting the least of it into value.

How do you handle shadow AI during an adoption program?
Treat it as demand signal first and policy violation second. Shadow AI exists because sanctioned paths are slower or worse than public tools. The durable fix is to make the governed path the convenient one: fast access to approved models through the gateway, clear data rules, and a lightweight intake. Enforcement without a good sanctioned alternative just pushes usage further out of sight, and what you cannot see, you cannot govern.
