Everybody loves an analogy to help picture a story, and for me, building a machine learning model is a bit like building a sports (kit) car in your garage: the engine roars perfectly on the test stand, but getting it onto the open road? That’s a different story. This is AI’s infamous “last mile.”
The “last mile problem” in AI refers to the challenges organisations face when transitioning AI models from successful testing phases to effective real-world applications. Despite billions invested in data and brilliant talent, most ML models never make it to production. And those that do? They often run on brittle, hand-assembled infrastructure: custom scripts, mismatched environments, and manual hacks. Data scientists, masters of maths and modelling, are forced into DevOps roles, wrestling with Kubernetes, GPUs, and cloud networking. It’s slow, costly, and a colossal waste of talent.
MLOps promises to fix this. MLOps, or Machine Learning Operations, is a set of practices that combines machine learning, software engineering, and data engineering to manage the machine learning lifecycle efficiently. Just like DevOps and SRE, it focuses on deploying, monitoring, and maintaining machine learning models in production. However, it’s a methodology, not a turnkey solution. Enter an ML-ready internal developer platform (IDP) – the output of Platform Engineering and a game-changer that turns MLOps chaos into a production-ready AI factory.
Machine learning systems aren’t just software; they’re living, evolving systems with unique operational challenges.
Without a standardised approach, every project becomes a bespoke infrastructure nightmare, forcing data scientists to reinvent the wheel repeatedly.

Platform Engineering builds internal developer platforms (IDPs): curated, self-service environments that abstract away the complexity of software delivery, from someone tinkering with data on their laptop to serving real customers with apps on the cloud. It therefore makes sense that a ready-to-go, already familiar way of delivering AI training and serving provides clear value. This is our IDP for MLOps. Think of it as building a factory where your data scientists focus on the craft, while the platform handles the machinery.
This creates a “golden path” — a standardised, automated route from model conception to production — reducing cognitive load and operational friction for data specialists and software developers.

In this section, I want to look at what a modern MLOps blueprint could look like. In this example, we leverage Kubernetes for training, complex workflows and high-performance inference, and serverless for event-driven inference. One of the most interesting Kubernetes developments of recent years is Dynamic Resource Allocation (DRA), which has been maturing since version 1.33 and is now stable. DRA is a Kubernetes framework that allows container workloads to request and consume specialised, shared resources, such as high-performance storage or hardware accelerators, in a far more flexible way than the traditional device-plugin model. So I want to touch on the benefits it can bring and how an IDP feature can make using it transparent to data specialists and software engineers.
First, let’s look at a potential solution summary.
Training deep learning models requires sustained computational power, and Kubernetes provides a robust foundation.
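As an illustration of how simple the golden path can keep things, here is a minimal sketch of a GPU training run expressed as a standard Kubernetes Job. It assumes a cluster with NVIDIA GPUs exposed through the usual device plugin; the namespace and image names are placeholders.

```yaml
# Minimal sketch: a single-GPU training run as a plain Kubernetes Job.
# Assumes the NVIDIA device plugin is installed; names and image are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-recommender
  namespace: ml-training            # hypothetical namespace provided by the platform
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/ml/train-recommender:1.0   # placeholder image
        command: ["python", "train.py", "--epochs", "10"]
        resources:
          limits:
            nvidia.com/gpu: 1       # classic device-plugin request; see DRA later for a more flexible model
            memory: 16Gi
            cpu: "4"
```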
For sporadic inference workloads (e.g., a user uploads a photo), serverless is cost-effective.
Serverless offerings such as AWS Lambda, Google Cloud Functions or Cloud Run generally come with reasonable metrics and logging included, but they still require some setup. As we’re looking to leverage Kubernetes’ benefits, we recommend staying in the same specification model and utilising tools like Config Connector or AWS Controllers for Kubernetes (ACK) to deploy them.
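To stay in the same specification model, the platform can expose cloud serverless services as Kubernetes objects. The sketch below uses AWS Controllers for Kubernetes (ACK); the bucket, IAM role and names are placeholders, and exact field names depend on the ACK Lambda controller version you run.

```yaml
# Illustrative only: an event-driven inference function declared through ACK.
# Assumes the ACK Lambda controller is installed with credentials; all values are placeholders.
apiVersion: lambda.services.k8s.aws/v1alpha1
kind: Function
metadata:
  name: photo-inference
  namespace: ml-serving
spec:
  name: photo-inference
  runtime: python3.12
  handler: app.handler
  role: arn:aws:iam::123456789012:role/photo-inference-exec   # placeholder IAM role
  code:
    s3Bucket: example-ml-artifacts                             # placeholder bucket
    s3Key: photo-inference/build.zip
  memorySize: 1024
  timeout: 30
```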
For low-latency, high-throughput applications, we want to use specialist hardware such as a TPU or FPGA chip. To do so, some extra wiring is needed in our cloud provider set-up. Some serverless services now offer the ability to connect to accelerators, but we’re going to stick with Kubernetes because it is robust, consistent and cloud-agnostic.
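As a sketch of what the serving side might look like when specialist hardware is needed, the Deployment below requests a TPU slice on GKE. The node selector labels, topology value and the google.com/tpu resource name are GKE-specific and version-dependent, so treat them as indicative rather than exact.

```yaml
# Illustrative only: a low-latency inference Deployment pinned to TPU nodes on GKE.
# Labels, topology values and resource names vary by provider and accelerator generation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ranker-inference
  namespace: ml-serving
spec:
  replicas: 2
  selector:
    matchLabels: { app: ranker-inference }
  template:
    metadata:
      labels: { app: ranker-inference }
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice   # indicative GKE label
        cloud.google.com/gke-tpu-topology: "1x1"
      containers:
      - name: server
        image: registry.example.com/ml/ranker-serve:1.0   # placeholder image
        ports:
        - containerPort: 8080
        resources:
          limits:
            google.com/tpu: 1      # indicative; GPU clusters would use nvidia.com/gpu instead
```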

As you may have noted, Kubernetes becomes a fundamental tool of our MLOps. As well as deploying our apps to it, as it is traditionally used, we can also use it for workflows, for security and compliance, and for deploying our cloud serverless infrastructure; but the part I really want to highlight is how we can use it to consistently manage our AI hardware: DRA.
Dynamic Resource Allocation (DRA) for AI Accelerators
The Old Way: Rigid device allocation forced fragile workarounds — data scientists couldn’t easily request specific GPUs or slices for small tests.
The New Way: Kubernetes 1.33+ introduces DRA. Workloads request accelerators by referencing curated device classes published by the platform (e.g., gpu-high-performance-training). This flexibility reduces operational complexity and improves resource utilisation dramatically.
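Below is a minimal sketch of what the golden path could generate for a DRA-based request. It assumes a vendor DRA driver is installed (the driver name gpu.example.com and the image are hypothetical) and uses the resource.k8s.io/v1beta1 API found on 1.32/1.33 clusters; the field layout shifts slightly in the stable resource.k8s.io/v1 API.

```yaml
# Illustrative only: a platform-curated DeviceClass, a claim template, and a Pod that uses it.
# Assumes a vendor DRA driver (hypothetical name "gpu.example.com") advertises the devices.
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: gpu-high-performance-training
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-training-gpu
  namespace: ml-training
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu-high-performance-training
---
apiVersion: v1
kind: Pod
metadata:
  name: train-resnet
  namespace: ml-training
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: registry.example.com/ml/train:latest   # placeholder image
    resources:
      claims:
      - name: gpu                                 # consume the device allocated below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-training-gpu
```

The point for the IDP is that data specialists only ever reference the curated class name; driver installation, device partitioning and scheduling details stay behind the platform.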
Combined with topology-aware scheduling and routing, this keeps AI requests on a GPU or TPU in the same physical zone as the request origin, making inference faster and reducing cloud costs without complex networking.
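One common way to get this behaviour for serving traffic, assuming the intent is Kubernetes’ built-in Topology Aware Routing, is a single annotation on the inference Service; the Service and app names below are placeholders.

```yaml
# Illustrative only: keep inference traffic within the client's zone where capacity allows.
apiVersion: v1
kind: Service
metadata:
  name: ranker-inference
  namespace: ml-serving
  annotations:
    service.kubernetes.io/topology-mode: Auto   # Topology Aware Routing (Kubernetes 1.27+)
spec:
  selector:
    app: ranker-inference
  ports:
  - port: 80
    targetPort: 8080
```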
A robust IDP doesn’t just deploy models; it manages their ongoing health and hardware stability through the entire lifecycle.
Deployments are defined as Configuration-as-Data manifests, not complex IaC. Tools like Config Sync or FluxCD apply changes as they land in Git, keeping the implementation constantly synchronised across each SDLC environment.
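As a sketch of that Configuration-as-Data flow with FluxCD, the pair of objects below watches a Git repository and continuously applies the manifests for one environment; the repository URL and path are placeholders.

```yaml
# Illustrative only: FluxCD keeps the cluster synchronised with the environment's manifests in Git.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: ml-platform-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example-org/ml-platform-config   # placeholder repository
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ml-serving-prod
  namespace: flux-system
spec:
  interval: 5m
  prune: true                        # remove resources that are deleted from Git
  path: ./environments/prod/serving  # placeholder path per SDLC environment
  sourceRef:
    kind: GitRepository
    name: ml-platform-config
```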
The platform provides a pre-configured monitoring stack covering distinct layers of reliability:
Model Intelligence
Device Health
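For the device-health layer, a sketch of what the pre-configured stack might include is below: a Prometheus Operator ServiceMonitor scraping NVIDIA’s dcgm-exporter for GPU temperature, utilisation and ECC error metrics. It assumes kube-prometheus-stack and dcgm-exporter are installed; the labels and namespaces are placeholders matching a typical install.

```yaml
# Illustrative only: scrape GPU health metrics (temperature, utilisation, ECC errors)
# exposed by NVIDIA's dcgm-exporter, assuming the Prometheus Operator is running.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dcgm-exporter   # placeholder label; match your exporter Service
  namespaceSelector:
    matchNames: ["gpu-operator"]              # placeholder namespace
  endpoints:
  - port: metrics
    interval: 30s
```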

Platform engineering shifts IT from a support function to a strategic driver, directly impacting the bottom line in four key areas:
The platform actively drives down the cost of AI at scale using intelligent orchestration.
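As one example of the kind of orchestration that cuts idle spend, the sketch below uses KEDA to scale an inference Deployment to zero when no requests arrive and back up under load; the Prometheus address, query and Deployment name are placeholders.

```yaml
# Illustrative only: scale the inference Deployment to zero when idle using KEDA,
# assuming KEDA and a Prometheus endpoint exposing a request-rate metric are available.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ranker-inference
  namespace: ml-serving
spec:
  scaleTargetRef:
    name: ranker-inference            # the Deployment to scale
  minReplicaCount: 0                  # scale to zero when there is no traffic
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090   # placeholder address
      query: sum(rate(http_requests_total{app="ranker-inference"}[2m]))
      threshold: "5"
```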
The gains can be significant. I’ve summarised some simple comparisons from articles, seminar talks and direct conversations I’ve had over the last couple of years:
Time-to-Market
Talent Focus
Governance
Hardware Efficiency
Reliability

MLOps tells you what to do to productionise AI. Platform Engineering tells you how to do it at scale.
Stop treating each model as a unique, artisanal project. Build a factory: an Internal ML Platform that standardises workflows, automates operations, and frees your data scientists to innovate. Kubernetes 1.33+ and modern platform engineering finally make this possible.
We design modern MLOps and AI platforms that deliver results. Reach out to Mesoform today.