The Challenge: When "Massive" Becomes "Colossal"
We've all been there. You build something incredible and it's generally considered a triumph... until someone asks how big it will need to be in 5-10 years' time.
That's precisely what happened with one of our most ambitious projects to date. With over a decade of experience building platforms, we had already built a robust Internal Developer Platform (IDP) that streamlined development, empowered engineers and hummed like a well-oiled machine for a long time. We built this global IDP for a global financial powerhouse (think "too big to fail" big). This wasn't some scrappy startup; we were dealing with stringent regulations, complex security protocols, and a sprawling development organisation. Our Version 1 platform, crafted back in 2017, had scaled admirably, supporting around 4,000 technology projects and tens of thousands of engineers. It was already one of the biggest Google Cloud implementations on the planet.
But the world doesn't stand still, and this organisation had visions of scaling even bigger over the next few years. We weren't talking about small growth here; they were projecting usage to grow to between 5x and 10x its current level. Suddenly, "massive" became "colossal". At this scale, any squeaks and creaks the platform had would quickly become rattles and bangs.
The challenge? Take an already successful IDP and turn it into something that can handle this unprecedented scale, all while addressing the unique demands of a highly regulated environment. On top of that, the platform team had become a victim of its own success and was struggling to keep on top of the growing demand for features and to keep up with industry and technology changes. No pressure, right?
Beyond Scale: The Multifaceted Puzzle
Scaling wasn't the only piece of the puzzle. As the technology environment grew, so did the complexity of its development landscape. We needed to tackle several interconnected challenges:
- Engineering Consistency: With numerous platform engineers spread across the globe, maintaining consistency in engineering practices and adherence to internal policies was like trying to herd cats. We needed a way to enforce standards without stifling innovation.
- Developer Toil: Nobody likes grunt work, especially highly skilled engineers. Manual tasks, repetitive configurations, and endless compliance checks were bogging down productivity and hindering creativity.
- Security and Compliance: In the financial world, security isn't a suggestion; it's a mandate. We were responsible for a labyrinth of controls, including VPC Service Controls (VPC-SC) perimeters, IAM policies, and data protection requirements, and had to ensure that the platform both guarded the cloud infrastructure and was a fortress in its own right.
Athena: The IDP That Could
Our answer to this multifaceted challenge? A custom, enterprise version of Mesoform Athena, our cutting-edge IDP-as-a-service. Athena was re-architected from our original approach of using disparate technologies, integrated together and managed with GitOps and a central service catalog. It fundamentally rebuilds years of solution design and engineering from the ground up, not only to handle immense scale but also to address the specific needs of demanding environments like this one.
How Athena stepped up to the plate:
Self-Service Nirvana: We implemented shared services and a self-service CAD model back in 2017, and agreed with the company to double down on both while also giving developers greater autonomy to provision infrastructure and manage their environments without relying on centralised operations teams. This was crucial for both scalability and developer satisfaction.
Platform Engineering, Standardised: To ensure consistency and adherence to best practices across the organisation, we implemented a robust and standardised framework for platform engineers. This framework revolves around an operator pattern model, where all services managed by the platform team are encapsulated within Kubernetes operators.
But we didn't just stop at creating operators; we implemented a full Software Development Lifecycle (SDLC) for each one. This included:
- Pre-defined deployment configurations: New operators are deployed automatically to each environment with pre-configured settings, ensuring consistency and reducing manual effort.
- Pre-defined CI/CD pipelines: Automated pipelines are in place for all new operators, streamlining development, testing, and deployment.
- Pre-defined build configurations for custom admission controllers: Building new custom admission controllers is simplified with pre-configured build settings and base libraries.
- Least privilege access controls: All new operators come with ready-to-go least privilege access controls, enhancing security from the outset.
This cookie-cutter approach to operator management, combined with the comprehensive SDLC, significantly reduces the cognitive load on platform engineers, promotes efficiency, and minimises friction in the development process. It ensures that new operators are consistently built, deployed, and secured according to best practices, freeing up platform engineers to focus on higher-value tasks. This was our "recursion point": the point where we turned platform engineering on itself to create a common, repeatable and low-burden way of building the custom solutions that developers request from platform engineers.
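For readers less familiar with the operator pattern, the core idea is a reconcile loop: compare the desired state with what actually exists and take whatever steps close the gap. The sketch below is a minimal illustration in plain Python, not Athena's implementation. Real operators run against the Kubernetes API, whereas this version uses in-memory dictionaries, and the names `Resource`, `plan` and `converge` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    """A hypothetical managed resource: a name plus its configuration."""
    name: str
    config: dict = field(default_factory=dict)


def plan(desired: dict[str, Resource], observed: dict[str, Resource]) -> list[tuple[str, str]]:
    """Return the (action, name) steps needed to make observed match desired."""
    actions = []
    for name, want in desired.items():
        have = observed.get(name)
        if have is None:
            actions.append(("create", name))
        elif have.config != want.config:
            # The live configuration has drifted from the declared one.
            actions.append(("update", name))
    for name in observed:
        if name not in desired:
            # Something exists that nobody declared.
            actions.append(("delete", name))
    return actions


def converge(desired: dict[str, Resource], observed: dict[str, Resource]) -> list[tuple[str, str]]:
    """Apply the plan to the in-memory observed state and return the steps taken."""
    actions = plan(desired, observed)
    for action, name in actions:
        if action == "delete":
            del observed[name]
        else:
            # Create and update both end with a copy of the desired config.
            observed[name] = Resource(name, dict(desired[name].config))
    return actions


if __name__ == "__main__":
    desired = {"bucket-a": Resource("bucket-a", {"location": "EU"})}
    observed = {"bucket-a": Resource("bucket-a", {"location": "US"})}
    print(converge(desired, observed))  # first pass fixes the drift
    print(converge(desired, observed))  # second pass has nothing left to do
```

Because the loop is idempotent, running it repeatedly is safe, which is what makes a standard template for every operator practical: each new operator only has to describe its own resources and how to create, update and delete them.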
Security Amplified: We built a multi-layered security approach into Athena from the very beginning, going beyond the basics to provide robust protection:
- Policy-Driven Control: Using OPA Gatekeeper (Open Policy Agent), we decoupled policies from the underlying infrastructure, allowing declarative specification and centralised management of security and compliance rules, with clear violation dashboards for enforcement at scale. Think of it as guardrails for your cloud environment.
- Proactive Testing: Developers who were using the platform were given a way to proactively test deployments against centrally managed policies before committing changes, allowing them to catch any issues early. We even automated the testing of our policy templates to ensure policy quality and effectiveness.
- Custom Admission Control: Where OPA Gatekeeper fell short, we leveraged Kubernetes' dynamic admission control (KDAC) for more advanced admission control, such as integrating with external approval systems and external datasets, whilst still maintaining the common approach to the security solution (OPA Gatekeeper uses KDAC underneath).
- Least Privilege Access: Every component operates with minimal permissions, reducing unauthorised access risks. Dedicated "robot" service principals with split duties are automatically set up for each project team using the platform, ensuring no single entity has excessive access.
- Continuous Reconciliation: Using Config Connector we were able to provide frequent drift reconciliation by default, automatically detecting and remediating any unauthorised configuration changes, ensuring a consistently secure and compliant environment.
Developer-Centric Design: We focused on making Athena a joy to use, with a no-code/low-code approach, YAML-based configurations and the intuitive Kubernetes interface, all built for security and manageability on GKE Enterprise (previously Anthos). Developers could focus on what they loved, building and shipping, without getting bogged down in complexity.
GitOps: The Future is Now: We embraced a next-generation GitOps approach with Config Sync, providing a single source of truth for configurations, built-in deployment dashboards, and phased version adoption for a smooth transition.
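To make the policy-testing and admission-control ideas above more concrete, here is a small sketch of how one set of centrally managed rules can serve both purposes: developers can run `evaluate` against a manifest before committing, and an admission webhook can call `review` on the `AdmissionReview` request that Kubernetes sends it. This is an illustration under assumptions, not the platform's code. Gatekeeper policies are actually written in Rego inside constraint templates, and the two example rules (`require_label` and `no_privileged_containers`) and the `cost-centre` label are hypothetical.

```python
from __future__ import annotations


def require_label(obj: dict, label: str = "cost-centre") -> list[str]:
    """Hypothetical rule: the object must carry the given metadata label."""
    labels = obj.get("metadata", {}).get("labels") or {}
    if label not in labels:
        return [f"missing required label '{label}'"]
    return []


def no_privileged_containers(obj: dict) -> list[str]:
    """Hypothetical rule: no container in a Pod spec may run privileged."""
    violations = []
    for container in obj.get("spec", {}).get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged"):
            violations.append(f"container '{container.get('name')}' must not be privileged")
    return violations


# The single, centrally managed list of rules used by both entry points.
POLICIES = [require_label, no_privileged_containers]


def evaluate(obj: dict) -> list[str]:
    """Run every rule against one manifest; an empty list means compliant."""
    return [violation for policy in POLICIES for violation in policy(obj)]


def review(admission_review: dict) -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response for a request."""
    request = admission_review["request"]
    violations = evaluate(request["object"])
    response = {"uid": request["uid"], "allowed": not violations}
    if violations:
        response["status"] = {"code": 403, "message": "; ".join(violations)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Keeping the rules in one place means a failure caught locally is the same failure the cluster would report at admission time, so developers see problems early rather than at deployment.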
The Results: Early Promise and Transformative Potential
While the full impact of Athena is still unfolding, the early signs are incredibly promising. The platform has just gone live with early adopters, and already, positive results can be demonstrated:
- Scale Ready: Leveraging the scalability of GKE Enterprise running on the organisation's existing infrastructure, the platform is ready to accommodate the anticipated explosive growth and handle 10,000+ projects with ease. Our initial choices in technology and design position the organisation for success.
- Early Productivity Gains: Platform engineers are already experiencing increased productivity thanks to automated tasks, streamlined workflows, and greater autonomy. This newfound efficiency is further fostering a culture of innovation and rapid development.
- Security Enhanced: The platform’s robust security measures have demonstrably strengthened the organisation's ability to implement controls. Centralised policies, least privilege access and strong environment separation are providing a solid foundation for a secure development lifecycle.
- Delivery Velocity Increased: Streamlined processes, automated deployments, and improvements to the GitOps-driven approach are poised to accelerate delivery times, enabling the team to respond to customer demands with agility and speed.
The Takeaway: A Foundation for Future Success
This wasn't simply a case of implementing a next-generation IDP; this organisation already had a first-generation platform demonstrating the potential of this approach. This project was about taking that foundation and providing a solution for a future of unprecedented scale, complexity, and opportunity. Even though it is a very customised version of Mesoform Athena, this financial giant hasn't just upgraded its IDP; it has made a strategic investment in its future.
Conclusion
Athena delivers major improvements in security, compliance, flexibility, and observability, while simultaneously reducing cost, complexity, delivery time, and toil. This new platform empowers developers, streamlines operations, and accelerates innovation, all while positioning organisations for easy future transformations as the industry and technology landscape evolves. The early signs are extremely positive, and the potential for Athena to drive significant and lasting value is undeniable.
Learn more about how Mesoform Athena can help your organisation achieve similar success. Visit our website at https://www.athenamulti.cloud to explore our platform and request a demo.
If you would like to discuss any of these topics in more detail, please feel free to get in touch.