Agentic Platform Engineering at Outshift by Cisco

Guy Menahem
Jul 19

Hasith Kalpage - Director, Platform Engineering & CISO @ Outshift by Cisco

Sri Aradhyula - Senior Technical Leader @ Cisco

Platform engineering was born to tame the chaos of the cloud-native world. By creating a streamlined path for developers, it brought order and governance. However, this very design often turns the platform team into a bottleneck. The constant flow of requests, from creating a new repo to getting cluster access, can lead to developer frustration and platform team burnout.

But what if we could evolve beyond the world of tickets and forms? The team at Outshift by Cisco is showing how Agentic AI is transforming platform engineering from a bottleneck into an intelligent, automated superhighway.

The Problem with "Platform Engineering V1"

For years, the solution to developer self-service has been a portal filled with forms and templates. Whether through a UI like Backstage or a CLI, developers select from a menu, fill out endless fields (CPU, memory, bucket name, user ID), and hit "submit."

This "Platform Engineering V1" was a huge step up from manual "click-ops," but it has its limits:

Scalability: As the number of services and templates grows into the hundreds or thousands, finding the right form becomes a challenge in itself.
User Experience: Forcing developers to fill out fields with information the system should already know (like their user ID or project context) is frustrating.
The "Last Mile" Problem: Even with perfect automation, users still have questions. Platform teams spend countless hours answering the same things over and over in chat, acting as a human FAQ.

This is where Agentic AI comes in. Instead of structured forms, it uses natural language to understand a developer's intent. It can have a conversation, clarify incomplete requests, and orchestrate complex tasks across multiple systems.
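As a rough illustration of what "clarifying an incomplete request" could look like, the sketch below fills in whatever the platform already knows about the user and asks only for what is still missing. Everything in it is hypothetical: the field names, the `Request` type, and the `resolve` and `next_turn` helpers are invented for this example, and a real agent would use an LLM to extract the intent and fields from free text.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the fields a "create a bucket" request needs.
REQUIRED_FIELDS = ("owner", "project", "bucket_name", "region")


@dataclass
class Request:
    intent: str
    fields: dict = field(default_factory=dict)


def resolve(request: Request, user_context: dict):
    """Fill fields from known context; return the request and what is still missing."""
    filled = dict(request.fields)
    for name in REQUIRED_FIELDS:
        if name not in filled and name in user_context:
            filled[name] = user_context[name]
    missing = [name for name in REQUIRED_FIELDS if name not in filled]
    return Request(request.intent, filled), missing


def next_turn(request: Request, user_context: dict) -> str:
    """Either ask a clarifying question or confirm the request is complete."""
    resolved, missing = resolve(request, user_context)
    if missing:
        return f"To {resolved.intent}, I still need: {', '.join(missing)}."
    return f"Ready to {resolved.intent} with {resolved.fields}."


# The developer only named the bucket; identity and project come from context.
request = Request("create a bucket", {"bucket_name": "logs"})
context = {"owner": "alice", "project": "payments"}
print(next_turn(request, context))
```

The point of the design is that the developer is asked only for the one value the system cannot infer, rather than for every field on a form.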

The Impact of an AI Platform Engineer

The Outshift team has been running their agentic platform in production for over 10 months, and the results speak for themselves. Led by Hasith Kalpage and architected by Sri Aradhyula, their system has delivered a dramatic impact.

Eliminated the Support Desk: They completely removed their dedicated, ticket-based support desk. The AI engineer is now the first point of contact.
Drastically Reduced Response Times: What used to take hours of human attention now takes seconds or minutes. Simple requests like provisioning a development VM or answering an FAQ are handled automatically.
Met Developers Where They Are: The AI isn't confined to one portal. It's accessible through the tools developers already use, including Webex, Backstage, VS Code, and the CLI.

As Hasith Kalpage puts it, "We have been able to get our AI engineer as the entry point... Response times could be hours at times, whereas now, with seconds, responses are pretty much immediate."

Open Sourcing the Future: The CNOE Project 🛶

Seeing this success, the team decided to bring their work to the broader community through the open-source CNOE project. They created a new Special Interest Group (SIG) focused on Agentic AI for Platform Engineering.

The goal isn't just to build a tool, but to define what a production-grade, multi-agent platform engineering system looks like. Just as you need medical experts to design a useful health AI, you need platform engineering experts to design the right agents, skills, and workflows for infrastructure.

The architecture is built on a few key concepts:

Multi-Agent Systems: Instead of one monolithic "uber agent" that gets easily confused, the system uses a supervisor agent that orchestrates smaller, domain-specific agents (e.g., a GitHub agent, a PagerDuty agent, a Kubernetes agent).
Standard Protocols: The system leverages emerging industry standards to ensure interoperability and scalability.
- MCP (Model Context Protocol): This protocol allows agents to discover and use "tools," which are essentially APIs. It provides the LLM with the necessary context about what a tool does and what inputs it needs, without overwhelming its context window.
- A2A (Agent-to-Agent Protocol): This is the communication bus between agents. It enables one agent (like the supervisor) to delegate tasks to another, handling complex, asynchronous interactions.

This architecture provides flexibility, allowing teams to use different LLMs for different agents (e.g., a local model for sensitive data, a powerful frontier model for complex reasoning) and ensuring the system can scale.
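The supervisor pattern described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the CNOE implementation: the `Tool`, `Agent`, and `Supervisor` classes, the model labels, and the keyword-based routing are all invented for illustration (a real supervisor would let an LLM choose the agent), and the GitHub tool returns stubbed data instead of calling an API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """A tool description in the spirit of MCP: name, purpose, expected inputs."""
    name: str
    description: str
    input_schema: dict
    handler: Callable[..., object]


@dataclass
class Agent:
    name: str
    model: str        # hypothetical label; each agent may use a different LLM
    keywords: tuple   # stand-in for LLM-based routing
    tools: dict

    def handle(self, tool_name: str, **kwargs):
        return self.tools[tool_name].handler(**kwargs)


class Supervisor:
    """Routes a prompt to the first domain agent that claims it."""

    def __init__(self, agents):
        self.agents = agents

    def route(self, prompt: str) -> Agent:
        text = prompt.lower()
        for agent in self.agents:
            if any(keyword in text for keyword in agent.keywords):
                return agent
        raise LookupError(f"no agent can handle: {prompt!r}")


def list_repos(org: str) -> list:
    # Stubbed data; a real tool would call the GitHub API.
    return [f"{org}/example-repo-a", f"{org}/example-repo-b"]


github_agent = Agent(
    name="github",
    model="frontier-model",
    keywords=("github", "repo"),
    tools={
        "list_repos": Tool(
            "list_repos",
            "List repositories in an organization",
            {"org": "string"},
            list_repos,
        )
    },
)
kubernetes_agent = Agent(
    name="kubernetes",
    model="local-model",
    keywords=("cluster", "pod", "namespace"),
    tools={},
)

supervisor = Supervisor([github_agent, kubernetes_agent])
chosen = supervisor.route("Show me all the GitHub repos in my-org")
print(chosen.name, chosen.handle("list_repos", org="my-org"))
```

Because each agent carries its own `model` field and tool set, swapping the LLM behind one domain does not touch the others, which is the flexibility the paragraph above describes.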

See It in Action: A Quick Demo

The CNOE project provides a GitHub repository where you can try this out yourself. With a single docker-compose up command, you can spin up a complete environment, including a Backstage instance with a pre-built "Agent Forge" plugin.

In a live demo, Sri Aradhyula showed two examples:

Simple Query: Asked to "Show me all the GitHub repos in cnoe-io," the supervisor agent correctly routed the request to the GitHub sub-agent, which used an MCP-enabled tool to call the GitHub API and return the list.
Complex Query: A more powerful example: "Show me who's on SRE on-call and find all their Jira tickets from the last 7 days." This is where the magic happens. The workflow ran in four steps:
- First, the orchestrator called the PagerDuty agent to find out who was on call.
- Then, using the email address PagerDuty returned, it called the Jira agent.
- The Jira agent ran a JQL query to find that person's tickets.
- Finally, the orchestrator synthesized the information from both sources into a single, formatted response.

This entire multi-step, multi-tool workflow was completed in seconds, all from a single natural language prompt.
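The chained workflow in the complex query can be sketched as follows. This is an illustrative sketch only: the agent functions are stubs returning made-up data, the function names are hypothetical, and the JQL assumes that "tickets from the last 7 days" means tickets created in that window and that the Jira instance can match an assignee by email address, which not every Jira deployment allows.

```python
def pagerduty_agent(schedule: str) -> dict:
    # Stub: a real agent would query PagerDuty for the current on-call user.
    return {
        "schedule": schedule,
        "name": "Example Engineer",
        "email": "oncall@example.com",
    }


def build_jql(email: str, days: int) -> str:
    # JQL accepts relative dates such as -7d.
    return f'assignee = "{email}" AND created >= -{days}d ORDER BY created DESC'


def jira_agent(jql: str) -> list:
    # Stub: a real agent would send the JQL to Jira's search API.
    return [
        {"key": "SRE-101", "summary": "Example ticket one"},
        {"key": "SRE-102", "summary": "Example ticket two"},
    ]


def orchestrate(schedule: str = "SRE", days: int = 7) -> str:
    on_call = pagerduty_agent(schedule)        # step 1: who is on call?
    jql = build_jql(on_call["email"], days)    # step 2: pass the email along
    tickets = jira_agent(jql)                  # step 3: run the JQL query
    # Step 4: merge both sources into one formatted response.
    lines = [f"On call for {schedule}: {on_call['name']} <{on_call['email']}>"]
    lines += [f"- {ticket['key']}: {ticket['summary']}" for ticket in tickets]
    return "\n".join(lines)


print(orchestrate())
```

The key detail is the data dependency: the second call cannot be built until the first one returns, which is why a single stateless tool call is not enough and an orchestrator is needed.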

What About Security?

Handing the keys to an AI can seem daunting. The project tackles security by treating agents like any other microservice.

Authentication & Authorization: The system uses standard practices like OAuth and JWTs. The user is authenticated, and that identity can be used to perform actions on their behalf, respecting existing RBAC policies in systems like Kubernetes.
Human in the Loop: GitOps isn't dead! For critical actions, the agent can create a pull request, allowing a human to review and approve the change, with the agent providing all the necessary context to make the decision easy.
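A minimal sketch of how these two controls might combine is shown below. It assumes the user's JWT has already been verified upstream and reduced to a set of claims. The role table, the list of critical actions, and the `Identity`, `is_allowed`, and `handle_action` names are all hypothetical, and the "pull request" is just a dictionary rather than a call to a real Git hosting API.

```python
from dataclasses import dataclass

# Hypothetical policy tables, for illustration only.
ROLE_PERMISSIONS = {
    "developer": {"read_logs", "create_dev_vm"},
    "sre": {"read_logs", "create_dev_vm", "scale_deployment"},
}
CRITICAL_ACTIONS = {"scale_deployment"}


@dataclass
class Identity:
    """Claims as they might look after the user's JWT was verified upstream."""
    subject: str
    roles: tuple


def is_allowed(identity: Identity, action: str) -> bool:
    # The agent acts on the user's behalf, so it gets the user's permissions.
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in identity.roles)


def handle_action(identity: Identity, action: str, params: dict) -> dict:
    if not is_allowed(identity, action):
        return {"status": "denied", "reason": f"{identity.subject} may not {action}"}
    if action in CRITICAL_ACTIONS:
        # Human in the loop: describe the change and wait for review.
        return {
            "status": "pending_review",
            "pull_request": {
                "title": f"[agent] {action} requested by {identity.subject}",
                "body": f"Requested parameters: {params}",
            },
        }
    return {"status": "executed", "action": action, "params": params}


developer = Identity("alice", ("developer",))
sre = Identity("bob", ("sre",))
print(handle_action(developer, "scale_deployment", {"replicas": 5}))
print(handle_action(sre, "scale_deployment", {"replicas": 5}))
print(handle_action(developer, "create_dev_vm", {"size": "small"}))
```

In this sketch the agent never holds broader rights than the person it is acting for, and anything on the critical list is turned into a reviewable proposal instead of being applied directly.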

The Platformers