MAI Models and Multi-Model Strategy

Around June 2, 2026, at its Build developer conference, Microsoft unveiled a family of in-house MAI models, explicitly framed as a way to lessen its reliance on OpenAI and lower costs for developers. The first model in that lineup, MAI-Code-1-Flash, takes a written description from a person and produces source code for applications and websites. If the largest deployer of OpenAI’s models in the world is building its own alternatives to reduce dependence on a single provider, that is not just a Microsoft story. It is a memo to every engineering org running production AI on one vendor’s API.

The lesson is not “switch to Microsoft.” The lesson is the move itself: even Microsoft — with the deepest OpenAI relationship in the industry — decided that depending on a single model provider was a risk worth spending real money to reduce. Your AI implementation should reach the same conclusion, and the good news is you can reach it for a lot less than a frontier training run.

Single-vendor lock-in is a cost and continuity risk, not an ideology

The argument for model diversification is usually made on principle — avoid lock-in, stay flexible, support the ecosystem. Those are fine sentiments and weak motivators. The argument that actually moves a budget is concrete, and it has three parts.

Pricing moves underneath you. Model pricing in this market changes on a quarterly cadence, sometimes faster. A provider that is the obvious choice today can cost twice as much per task as a competitor six months from now, or a competitor can ship a model that is suddenly the better deal, as the MAI launch just showed. If your system is wired to one vendor’s API at every call site, you cannot act on a price change quickly: moving means touching every one of those call sites.

Capability leadership rotates. No single provider has held the top of every task category for long. One model is best at code, another at long-context reasoning, another at structured extraction at low cost. A system locked to one vendor inherits that vendor’s weakest categories along with its strongest.

Availability is a single point of failure. A provider outage, a rate-limit change, a sudden deprecation of the exact model version you depend on — each of these takes down a single-vendor system entirely. A multi-model system degrades to a fallback instead of going dark.

Microsoft’s MAI move is the institutional version of all three: reduce exposure to one provider’s pricing, capability roadmap, and availability. You don’t need to train your own model to get the same protection. You need an architecture that treats the model as a swappable component.

[Figure: central router hub directing flows to several distinct model blocks, one breaking away from a dominant block]

Multi-model architecture: routing as a first-class concern

A multi-model system is not “we use three providers.” It is a deliberate routing layer that decides, per request, which model handles the work — and that decision is driven by data, not by whoever signed the last contract.

In the systems we build, routing keys off a few signals. Task type is the first: a code-generation request, a cheap classification, a long-context summarisation, and a high-stakes reasoning task are four different jobs that may warrant four different models. Cost and latency budget is the second: a high-volume background automation routes to the cheapest model that clears the accuracy bar, while a low-volume, high-value task can afford the premium model. Fallback policy is the third: if the primary model errors or times out, the router degrades to a secondary rather than failing the whole request.
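The three signals above can be sketched as one small routing function. This is a minimal illustration under stated assumptions, not a description of any particular production router: `ModelOption`, `pick_order`, `route`, the routing table, and the policy tuples are all hypothetical names, and each model is reduced to a plain "prompt in, completion out" callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: a model is just "prompt in, completion out" for this sketch.
Completer = Callable[[str], str]


@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical label, not a real model ID
    complete: Completer
    cost_per_task_usd: float  # measured on your own workload
    accuracy: float           # measured on your own eval set, 0.0 to 1.0


def pick_order(options: list[ModelOption], min_accuracy: float,
               max_cost_usd: float) -> list[ModelOption]:
    """Return the models that clear the accuracy bar and fit the budget, cheapest first."""
    eligible = [o for o in options
                if o.accuracy >= min_accuracy and o.cost_per_task_usd <= max_cost_usd]
    return sorted(eligible, key=lambda o: o.cost_per_task_usd)


def route(prompt: str, task_type: str,
          table: dict[str, list[ModelOption]],
          policy: dict[str, tuple[float, float]]) -> tuple[str, str]:
    """Try each eligible model for the task type in order; fall back on any error."""
    min_accuracy, max_cost_usd = policy[task_type]
    last_error: Optional[Exception] = None
    for option in pick_order(table[task_type], min_accuracy, max_cost_usd):
        try:
            return option.name, option.complete(prompt)
        except Exception as exc:  # timeouts, rate limits, provider errors
            last_error = exc
    raise RuntimeError(f"no model could serve task type {task_type!r}") from last_error
```

The function returns the name of the model that answered alongside the completion, so observability can record which path each request took. Latency budgets are left out here; they would be one more filter in `pick_order`.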

The trap here is over-engineering. You do not need a router on day one for a system calling one model a hundred times a day. You need the seam where a router can live — so that when the second model becomes worth adding, you add it behind the abstraction instead of refactoring every call site. Cut the trendy agent-routing framework if it is not earning its keep; keep the thin, boring routing function you can actually reason about. The discipline is the same one we apply everywhere: keep the load-bearing tools, drop the fashionable ones.
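A seam of this kind can be very small. The sketch below assumes a single-method interface; `TextModel`, `EchoModel`, and `ModelGateway` are hypothetical names, and `EchoModel` is a stand-in that makes no network call.

```python
from typing import Protocol


class TextModel(Protocol):
    """Anything with complete(prompt) -> str can sit behind the seam."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for a real provider adapter; it only echoes the prompt."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class ModelGateway:
    """The one object call sites talk to; none of them imports a provider SDK."""

    def __init__(self, model: TextModel) -> None:
        self._model = model

    def swap(self, model: TextModel) -> None:
        # Replacing the model is a one-line change here, not a refactor elsewhere.
        self._model = model

    def generate(self, prompt: str) -> str:
        return self._model.complete(prompt)
```

With one model, `ModelGateway` is a pass-through. When a second model becomes worth adding, a router can be passed in as the `TextModel` and no call site changes.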

The portable evals layer is what makes swapping safe

Here is the part most teams skip, and it is the part that makes everything above real. You cannot safely swap models if you cannot measure whether the swap made things worse. The thing that lets you treat models as interchangeable is a portable evaluation harness — a fixed set of representative tasks with checkable outcomes, decoupled from any one provider, that you can point at any model and get back an accuracy figure and a cost-per-task figure.

Build the eval set against your work, not the leaderboard

A public benchmark tells you how a model does on someone else’s tasks. Your eval set should be your tasks: the actual prompts, the actual data shapes, the actual success criteria from your domain. For a coding model like MAI-Code-1-Flash, that means “does the generated code pass the test suite for our representative tickets,” not “what did it score on a generic coding benchmark.” For an extraction model feeding a Data Platform, it means “did it pull the right fields from our real documents.” The eval is only useful if it predicts production behaviour, and only your data does that.
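For the extraction case, a checkable outcome might look like the sketch below. It assumes the model is prompted to return a JSON object of field names to values; `ExtractionCase`, `passes`, and the sample invoice fields are hypothetical and only illustrate the shape of a domain-specific check.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionCase:
    document: str             # a real document from your domain
    expected: dict[str, str]  # fields a reviewer confirmed as correct


def passes(case: ExtractionCase, completion: str) -> bool:
    """True only if the completion is a JSON object containing every expected field."""
    try:
        got = json.loads(completion)
    except json.JSONDecodeError:
        return False
    if not isinstance(got, dict):
        return False
    # Extra fields are tolerated; a missing or wrong expected field fails the case.
    return all(str(got.get(key, "")).strip() == value
               for key, value in case.expected.items())
```

For example, a case built from one invoice:

```python
case = ExtractionCase(
    document="Invoice INV-7, total 120.50 EUR",
    expected={"invoice_id": "INV-7", "currency": "EUR"},
)
passes(case, '{"invoice_id": "INV-7", "currency": "EUR"}')  # passes
passes(case, '{"invoice_id": "INV-8", "currency": "EUR"}')  # fails: wrong ID
```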

Keep the harness provider-agnostic

The evals must not depend on any provider’s SDK quirks. The harness sends a prompt, gets a completion, scores it — through the same abstraction your application uses. When a new model ships, whether it is a MAI model, a frontier closed model, or an open model you self-host, it goes through the identical harness and produces comparable numbers. That comparability is the whole point. Without it, “should we switch” is a debate; with it, it is a measurement.
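A provider-agnostic harness can be as plain as the sketch below. It assumes every model, whatever its provider, is wrapped as a function from prompt to completion; `EvalCase` and `run_eval` are hypothetical names, and treating a provider error as a failed case is a design choice made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Completer = Callable[[str], str]  # prompt -> completion, any provider
Checker = Callable[[str], bool]   # completion -> pass or fail


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Checker


def run_eval(complete: Completer, cases: Iterable[EvalCase]) -> float:
    """Return the fraction of cases passed; a provider error counts as a failure."""
    results = []
    for case in cases:
        try:
            results.append(case.check(complete(case.prompt)))
        except Exception:
            results.append(False)
    if not results:
        raise ValueError("eval set is empty")
    return sum(results) / len(results)
```

Because `run_eval` only sees a callable, the same cases can be pointed at a hosted API adapter or a self-hosted model without changing the harness, which is what makes the resulting numbers comparable.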

Track cost alongside accuracy, always

Every eval run reports a dollar cost per task next to its accuracy. A model that is 2 percentage points more accurate at 5x the cost is the right choice for some tasks and the wrong one for most. You cannot make that call without both numbers in front of you, and you cannot make it quickly unless the harness produces them automatically. Cost tracking is non-negotiable: it is what turns “swap the model” from a leap of faith into a decision with a spreadsheet behind it.
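The arithmetic behind that decision is short. The sketch below assumes pricing is quoted per million input and output tokens and that the harness records token counts per run; `EvalReport`, `cost_per_task`, and `cheapest_clearing_bar` are hypothetical names, and the figures in the usage example are illustrative, not real prices or measured results.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EvalReport:
    model: str
    accuracy: float       # fraction of eval cases passed
    cost_per_task: float  # USD, averaged over the eval run


def cost_per_task(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float,
                  tasks: int) -> float:
    """Average USD cost per task for one eval run, from total token counts."""
    total_usd = (input_tokens * usd_per_m_input
                 + output_tokens * usd_per_m_output) / 1_000_000
    return total_usd / tasks


def cheapest_clearing_bar(reports: Iterable[EvalReport],
                          min_accuracy: float) -> Optional[EvalReport]:
    """Pick the cheapest model whose accuracy meets the bar, or None if none does."""
    eligible = [r for r in reports if r.accuracy >= min_accuracy]
    return min(eligible, key=lambda r: r.cost_per_task) if eligible else None
```

With two illustrative reports, one at 0.90 accuracy and $0.004 per task and one at 0.92 accuracy and $0.020 per task (5x the cost), the choice flips with the bar:

```python
reports = [EvalReport("base", 0.90, 0.004), EvalReport("premium", 0.92, 0.020)]
cheapest_clearing_bar(reports, min_accuracy=0.90)  # the "base" report
cheapest_clearing_bar(reports, min_accuracy=0.91)  # the "premium" report
```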

[Figure: a single evaluation rubric card run against three swappable model tiles in sequence]

What this looks like in a real platform

When we deploy a data-centric ERP for a client — a Hospital Management System with AI-assisted clinical documentation, a School ERP with automated administrative workflows — the model is never wired directly into the application logic. It sits behind an abstraction, selected by a router, watched by observability, and continuously scored by a portable eval set. The application code does not know or care which provider answered. That is what lets the platform absorb a launch like the MAI family as a tuning decision rather than a rewrite.

This is exactly where legacy ERP vendors fall down. Their AI features, where they exist at all, are bolted to one provider through an opaque integration, with no evals, no routing, and no cost visibility. When the pricing shifts or a better model lands, they cannot move — their data is trapped and their model choice is frozen. A modern platform built on this discipline reads the new model’s numbers off its own harness and decides in a day. Microsoft just spent a fortune to buy itself that optionality. With the right architecture, you can have it for the cost of building the harness once.

The right model for your workload will change. Your architecture should make that a measurement, not a migration. If you want a portable evals-and-routing layer that lets you swap models as pricing moves, let’s talk.
