Infrastructure modernization without the theater
An infrastructure modernization approach without transformation theater: what to replace, what to keep, strangler patterns, funding, and risk-first sequencing.
Executive summary
Infrastructure modernization is the disciplined replacement of systems that genuinely constrain you and the disciplined operation of everything that does not — not the wholesale reinvention the word transformation implies. Most modernization programs fail the same way: they treat an operating problem as a procurement problem, sequence around visible wins instead of risk, and run out of funding at the 90% mark, leaving two estates running where one used to. This article gives the assessment that separates replace from operate-better, applies strangler-style migration at the infrastructure layer, reads lift-and-shift honestly, defends the legacy that should stay, and explains why product funding beats project funding for work that never actually ends.
Modernization has a theater problem
Most infrastructure modernization programs are announced, not engineered. There is a kickoff deck with a mountain on it, a three-year timeline, a renamed program office, and a definition of success that amounts to “the old thing is gone.” Two years in, the old thing is still there. Next to it is a partially finished new thing. The team is now operating both, with the same headcount, and the program is being quietly rescoped into a victory.
I have watched this cycle from several chairs — twelve years running a hosting company on infrastructure I had to keep alive through every generation of hardware I owned, plant networks where the refresh cycle was measured against turnaround schedules, and enterprise accounts where “transformation” was often the largest line item on the statement of work. The failure pattern is consistent across all of them, and it is rarely technical. It is the decision, made implicitly on day one, to treat modernization as an event.
Modernization is not an event. It is a property of how you operate.
An estate where components are continuously assessed, migrated in small reversible steps, and actually decommissioned does not need a transformation program, because it never accumulates enough of a backlog to justify one. The transformation program is what you buy when that discipline has been absent for a decade — and buying it as a single heroic project imports all the risk that the missing discipline created, concentrated into one budget cycle. The rest of this article is about doing the work without the theater: what to assess, how to migrate, what to leave alone, how to fund it, and in what order.
Separate “needs replacing” from “needs operating better”
The first honest act in any modernization effort is an assessment that most programs skip, because its results are politically inconvenient: sorting the estate by the cause of the pain, not the age of the equipment. Every system lands in one of four buckets.
Working and healthy. Stable, monitored, documented, patchable, owned by someone who understands it. Leave it alone. This bucket is larger than transformation vendors want it to be.
Working but badly operated. The system is fine; the operating practice around it is not. No monitoring, no current documentation, configuration that lives in one engineer’s head, backups nobody has tested. The symptom is fear: nobody wants to touch it, so everyone concludes it must be replaced. This is usually the largest bucket, and it is the expensive mistake waiting to happen — because replacing a badly operated system produces a badly operated new system. The operating habits migrate with the team. What this bucket needs is operational investment: telemetry, runbooks, restore tests, an owner. That work is unglamorous, costs a fraction of replacement, and removes most of the risk that was driving the replacement argument.
Genuinely constraining. The system itself is the problem. The operating system cannot be patched and cannot be isolated. The vendor is gone. The platform has a capacity ceiling the business has already hit. Spare parts come from an auction site. The cost of making any change has grown larger than the value of the changes being requested. These are the real modernization candidates, and an honest assessment usually finds fewer of them than the program charter assumed.
Should not exist at all. The application was retired but the infrastructure under it was not; the report nobody reads still has a dedicated server; the “temporary” replication job has a five-year uptime. The cheapest modernization is deletion, and every estate I have assessed had more of this than anyone expected.
The sorting test I use is one question: is the pain caused by what the system is, or by how it is run? Age answers neither. A twelve-year-old system with stable requirements and a tested recovery plan is an asset. A two-year-old system nobody can safely change is technical debt with fresh paint.
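The four buckets can be written down as a small decision function. This is a minimal sketch, not a standard tool: the field names (`has_consumers`, `patchable_or_isolable`, and so on) are hypothetical stand-ins for whatever your inventory actually records, and the order of the checks is my reading of the buckets above. Note what is deliberately missing from the record: the system's age.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    HEALTHY = "working and healthy"
    BADLY_OPERATED = "working but badly operated"
    CONSTRAINING = "genuinely constraining"
    DELETE = "should not exist at all"


@dataclass
class System:
    # All fields are illustrative assumptions about what an inventory tracks.
    name: str
    has_consumers: bool          # does anything still depend on it?
    patchable_or_isolable: bool  # can it be patched, or isolated if it cannot?
    vendor_supported: bool
    at_capacity_ceiling: bool
    monitored: bool
    documented: bool
    restore_tested: bool
    has_owner: bool


def classify(s: System) -> Bucket:
    # Deletion comes first: nothing else matters if nobody uses the system.
    if not s.has_consumers:
        return Bucket.DELETE
    # Pain caused by what the system is.
    if not s.patchable_or_isolable or not s.vendor_supported or s.at_capacity_ceiling:
        return Bucket.CONSTRAINING
    # Pain caused by how the system is run.
    if not (s.monitored and s.documented and s.restore_tested and s.has_owner):
        return Bucket.BADLY_OPERATED
    return Bucket.HEALTHY


feared = System(
    name="file-transfer", has_consumers=True, patchable_or_isolable=True,
    vendor_supported=True, at_capacity_ceiling=False, monitored=False,
    documented=False, restore_tested=False, has_owner=False,
)
print(feared.name, "->", classify(feared).value)
```

The checks run in that order on purpose: a system nobody uses should be deleted before anyone argues about whether it is well operated.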
Strangler patterns work on infrastructure too
Martin Fowler described the strangler fig pattern for applications: grow the replacement around the legacy system, route traffic incrementally, and retire the old system when nothing calls it anymore. Application teams have used it for two decades. Infrastructure teams have better seams for it than application teams do, and use it less.
The seams are the indirection layers you already own. DNS names instead of IP addresses. Load balancer VIPs instead of server names. An identity provider instead of local accounts. A message broker instead of point-to-point transfers. Each one is a facade you can hold stable while everything behind it changes. The migration mechanics are always the same four steps:
- Put the facade in place first, before any replacement exists, and move consumers onto it. This step has almost no risk and creates the option value everything else depends on.
- Stand up the replacement behind the facade and route a deliberately small, low-consequence slice of traffic to it.
- Move consumers in small groups, watching telemetry after each move, with a rollback path that is a routing change rather than a restore.
- Decommission when traffic reaches zero — and prove zero with telemetry, not assumption. Watch the old system’s interfaces for a full business cycle, including month-end and year-end, before you power anything off. The job that runs quarterly is the one that finds you.
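The four steps above can be sketched in a few lines. This is a toy model under stated assumptions, not a real load balancer or DNS API: `Facade`, `shift`, `rollback`, and `still_active` are hypothetical names, and the observation window is a parameter you would size to cover your own quarter-end and year-end jobs.

```python
import random
from datetime import datetime, timedelta


class Facade:
    """Stable name that consumers call; the backend split changes behind it."""

    def __init__(self, legacy: str, replacement: str):
        self.legacy = legacy
        self.replacement = replacement
        self.replacement_share = 0.0   # fraction of traffic sent to the replacement
        self._previous_share = 0.0

    def shift(self, share: float) -> None:
        """Move to a new traffic split, remembering the one before it."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        self._previous_share = self.replacement_share
        self.replacement_share = share

    def rollback(self) -> None:
        # Rollback is a routing change, not a restore.
        self.replacement_share = self._previous_share

    def route(self, rng: random.Random) -> str:
        """Pick a backend for one request according to the current split."""
        if rng.random() < self.replacement_share:
            return self.replacement
        return self.legacy


def still_active(last_seen: dict, now: datetime, window: timedelta) -> list:
    """Return the legacy interfaces that saw traffic inside the window.

    Power-off is only defensible when this list is empty and the window
    spans a full business cycle.
    """
    return sorted(name for name, seen in last_seen.items() if now - seen < window)


facade = Facade("old-relay", "new-relay")
facade.shift(0.05)   # a deliberately small first slice
facade.shift(0.50)
facade.rollback()    # back to 5% with one routing change
```

The point of the model is the shape of `rollback`: if undoing a step needs anything heavier than restoring the previous split, the facade is not yet doing its job.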
This shape applies far beyond applications: mail relays, DNS platforms, storage migrated share by share, directory services migrated by trust and then by object, monitoring stacks run in parallel until the new one has caught a real incident. The prerequisite for all of it is knowing the dependency graph, which is why an observability investment usually has to precede the migration it justifies. You cannot move consumers you have not identified.
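Identifying consumers is mostly a matter of inverting observed traffic. A minimal sketch, assuming you can export connection records as `(source, destination)` pairs from flow logs or similar telemetry; the function names and the record format are hypothetical.

```python
from collections import defaultdict


def build_consumer_index(flows):
    """Map each destination to the set of sources observed calling it."""
    index = defaultdict(set)
    for source, destination in flows:
        index[destination].add(source)
    return index


def unmigrated_consumers(index, legacy, migrated):
    """Consumers still seen calling the legacy system that have not been moved."""
    return sorted(index.get(legacy, set()) - set(migrated))


flows = [
    ("billing", "old-dns"),
    ("hr-app", "old-dns"),
    ("billing", "new-dns"),
]
index = build_consumer_index(flows)
print(unmigrated_consumers(index, "old-dns", migrated={"billing"}))
```

This only sees consumers that generated traffic during the capture period, which is the same reason the observation window has to include the quarterly job.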
One discipline decides whether strangler migrations succeed, and it is not technical: finishing. The pattern’s known failure mode is the migration that reaches 90% and stalls, because the last two consumers are awkward and the team has been redeployed. At 90% you are paying for both systems, both sets of patches, both on-call burdens — the transition state costs more than either endpoint. A migration is done when the old system is powered off and its maintenance renewal is cancelled. Nothing before that counts.
Lift-and-shift honesty
Rehosting — lift-and-shift, the least fashionable of the migration strategies AWS catalogs as the “7 Rs” — gets sneered at in architecture reviews and quietly chosen in program plans. Both reactions miss what it actually is: a relocation, with a relocation’s costs and benefits.
What lift-and-shift genuinely buys: an exit from a facility on a deadline, escape from a hardware refresh you do not want to fund, and a timeline measured in months instead of years because nothing about the workload has to be understood deeply enough to change it. Those are real benefits, and when the driving constraint is a lease expiry or an end-of-life storage array, rehosting is often the correct engineering answer.
What it does not buy: cheaper operations or better architecture. A VM sized for a datacenter — provisioned for peak, running at 15% utilization, chatty with its neighbors — costs more in a cloud that meters exactly those behaviors, not less. The workload arrives with its operational debts intact: the same brittle startup ordering, the same undocumented configuration, now on rented hardware. The cloud-versus-on-premises constraints do not soften because the move was fast.
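The arithmetic behind that claim is simple enough to sketch. Every number here is an illustrative assumption: the hourly rate is made up, the 60% target utilization is a placeholder, and real pricing has more dimensions than vCPU hours. Only the 15% figure comes from the paragraph above.

```python
def rightsized_vcpus(vcpus: int, avg_util_pct: int, target_util_pct: int) -> int:
    """vCPUs needed to carry the same average load at the target utilization."""
    needed = -(-vcpus * avg_util_pct // target_util_pct)  # ceiling division
    return max(1, needed)


def monthly_cost(vcpus: int, rate_per_vcpu_hour: float, hours_per_month: int = 730) -> float:
    """Metered cost of keeping the instance running for a month."""
    return vcpus * rate_per_vcpu_hour * hours_per_month


RATE = 0.05  # hypothetical price per vCPU-hour, for illustration only

as_moved = monthly_cost(16, RATE)                            # sized for datacenter peak
optimized = monthly_cost(rightsized_vcpus(16, 15, 60), RATE)
print(f"as moved: {as_moved:.2f}  rightsized: {optimized:.2f}")
```

The gap between the two figures is the second-phase optimization work, and it only closes if someone is funded to do it. Sizing to average load also ignores peaks, which is exactly the understanding of the workload that a fast rehost skips.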
The honest version of lift-and-shift states its purpose as relocation, budgets a second phase for the workloads worth optimizing, and accepts that some rehosted workloads will simply run out their remaining life where they land. The dishonest version calls the relocation “transformation,” declares victory at cutover, and leaves the finance team to discover the difference over the following four quarters. Same technical work. Entirely different program integrity.
The legacy that should stay
Some systems should survive every modernization program, and saying so out loud is part of doing the assessment honestly.
Age is not a defect. A system earns the right to stay when its requirements have stopped changing, its failure modes are understood and documented, it can be patched — or genuinely isolated when it cannot be — and the risk of replacing it exceeds the risk of running it. Plenty of infrastructure meets that bar: the plant historian that has quietly done one job for a decade, the internal DNS that has not paged anyone in years, the file transfer system that four business processes depend on and none want changed. Replacing a boring system that works is spending risk to buy nothing.
The same honesty cuts the other way. GAO’s reviews of federal legacy systems document the end state of “if it isn’t broken, don’t touch it” as a strategy: systems running unsupported software with known vulnerabilities, maintained by a shrinking pool of people near retirement, where the modernization plan does not exist because the system was never officially a problem. The distinction that matters is not old versus new. It is owned versus unownable. A system is unownable when no vendor supports it, no one on staff fully understands it, its skills are leaving the labor market, and it cannot be isolated from things that can hurt it. Unownable systems go on the replacement list no matter how quietly they are running today — because the time to replace a system is while it still works.
Fund it like a product, not a project
The funding model decides more modernization outcomes than the technology choices do, and most organizations pick the wrong one by default.
Project funding (a lump sum, a fixed scope, an end date) is how big-bang modernization gets built, and it manufactures the failure modes this article keeps circling. The end date creates pressure to declare success at deployment rather than decommissioning. The lump sum runs out precisely at the 90% mark, where the awkward last consumers live. And the end date itself is a fiction: when the project closes, the team disbands, ownership lands on an operations group that was consulted twice, and the estate resumes aging with nobody funded to notice. Five to eight years later, the backlog justifies the next transformation program, and the cycle is complete.
Product funding treats the infrastructure estate as a product with a standing team, a roadmap, and a permanent budget line that covers lifecycle work — the same way serious organizations fund a platform. Under that model, modernization stops being an initiative and becomes a queue: every component has an owner, a health assessment, and a planned end of life, and migrations of the strangler shape run continuously at a rate the team can absorb. The roadmap discipline this requires is the same one I describe in digital transformation for engineers: strategy expressed as a sequence of operable states, not a rendering of the end state.
The objection is always that product funding looks more expensive, because the spend is visible every year instead of spiking every seven. The spend was always there. Project funding just books it as emergencies.
Sequence around risk, not visibility
Programs default to sequencing around visible wins — the executive-facing system first, the flashy platform first. Sequence around risk instead. The questions that order the queue:
| Question | If yes | Why |
|---|---|---|
| Is there a forced deadline? | Do it in its window | End-of-support dates, lease expiries, and compliance deadlines are not negotiable; everything else is |
| Does it block other migrations? | Do it early | Identity, network, and DNS are dependencies of every other move; modernizing them last means doing everything twice |
| Is it unownable? | Schedule it now, while it works | Replacement under failure conditions costs multiples of replacement under control |
| Is it high-churn? | Do it before the stable systems | Systems that change weekly pay modernization dividends immediately; systems that never change can wait |
| Is it stable and boring? | Do it last, or never | The quiet core is where replacement risk exceeds replacement value |
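One way to turn the table into a working queue is sketched below. The table does not rank its questions against each other, so the tier order here is my assumption, as are the `Candidate` fields and the example system names. Deadline-driven items are kept on their own calendar rather than ranked, since they happen in their window regardless.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    name: str
    deadline: Optional[date] = None   # forced date: end of support, lease, compliance
    blocks_others: bool = False
    unownable: bool = False
    high_churn: bool = False
    stable_and_boring: bool = False


def tier(c: Candidate) -> int:
    """Lower tiers go earlier in the queue (assumed ordering)."""
    if c.blocks_others:
        return 0
    if c.unownable:
        return 1
    if c.high_churn:
        return 2
    if c.stable_and_boring:
        return 4   # last, or never
    return 3


def plan(candidates):
    """Split into calendar-driven work and a risk-ordered queue."""
    fixed = sorted((c for c in candidates if c.deadline), key=lambda c: c.deadline)
    queue = sorted((c for c in candidates if not c.deadline), key=tier)
    return [c.name for c in fixed], [c.name for c in queue]


estate = [
    Candidate("internal-dns", stable_and_boring=True),
    Candidate("ci-runners", high_churn=True),
    Candidate("storage-array", deadline=date(2026, 3, 31)),
    Candidate("identity", blocks_others=True),
    Candidate("vendorless-gateway", unownable=True),
]
fixed, queue = plan(estate)
print("fixed windows:", fixed)
print("queue:", queue)
```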
Two additions from experience. First, do the riskiest migration second. The first migration should be something small and genuinely representative, chosen to prove the migration method itself (the facade mechanics, the rollback path, the telemetry), where a failure teaches instead of wounds. Once the method has survived contact with production, spend it on the migration that matters most while the team and the funding are freshest. Saving the hard one for last hands it to a tired team and an exhausted budget.
Second, sequence the enabling layers deliberately. If the estate is moving toward infrastructure as code, every system migrated before that operating model exists will be migrated by hand and then re-migrated into code later. Doing the operating-model work early feels like delay. It is the opposite: it is the difference between a program that accelerates and one that repeats itself.
What to write down
- The four-bucket assessment — healthy, badly operated, constraining, delete — with a named owner and a next decision date per system.
- The dependency graph for every retirement candidate, and the telemetry that will prove zero traffic before power-off.
- The definition of done for each migration: old system off, contracts cancelled, monitoring removed, and the retirement documented.
- The funding model, explicitly chosen, with the lifecycle line item that survives budget season.
Modernization theater persists because it photographs well: the program launch, the new platform, the mountain slide. The real work photographs poorly — an inventory that mostly concludes “operate this better,” a facade quietly installed a year before anyone migrates, an old server finally powered off with no announcement at all. But infrastructure that is continuously renewed never needs to be transformed, and that is the actual goal. The platforms will keep changing. The discipline of knowing what you run, why it is there, and when it leaves — that is what modern means.
Frequently asked questions
- What is infrastructure modernization?
- Infrastructure modernization is the ongoing practice of replacing infrastructure components that constrain the business — unsupportable platforms, capacity ceilings, unpatchable systems — while improving the operation of components that are merely old. It is a lifecycle discipline, not a one-time program. The organizations that do it well treat modernization as routine portfolio management: continuous small migrations funded like a product, rather than a three-year transformation funded like a project.
- Is lift-and-shift a legitimate modernization strategy?
- Yes, for specific problems: a datacenter exit with a hard deadline, hardware at end of life, or a facility you need to leave. Rehosting buys you an exit and a timeline. It does not buy cheaper operations or better architecture — a VM shaped for a datacenter usually costs more in cloud, not less. Lift-and-shift is honest when you call it a relocation and budget the optimization work that follows. It fails when you call it transformation and stop there.
- What is the strangler pattern for infrastructure?
- The strangler pattern retires a system by putting a stable facade in front of it — a DNS name, a load balancer VIP, an identity broker — then moving consumers to the replacement one at a time behind that facade, while both systems run. The old system is decommissioned only when telemetry proves traffic has reached zero. It replaces big-bang cutover risk with many small, reversible migrations, at the cost of operating two systems during the transition.
- Should all legacy systems be replaced?
- No. Old and stable is not a defect; a system with stable requirements, understood failure modes, and a workable patching or isolation story is often the lowest-risk component in the estate. Replace legacy when it is genuinely unownable — no vendor, no hireable skills, unsupportable software with network exposure — or when it blocks changes the business actually needs. Replacing a boring system that works is spending risk to buy nothing.
- Why do infrastructure modernization programs fail?
- The common failure modes are consistent: replacing badly operated systems and inheriting the same operating habits on new hardware; sequencing around visible wins instead of risk; project funding that runs out before the last consumers are migrated, leaving both estates running indefinitely; and declaring success at deployment instead of at decommissioning. Each of these is a management structure problem, which is why better technology alone never fixes them.