Many organizations are under pressure to move their AI agent experiments and proofs of concept out of pilots and into production. Devops teams may have limited time to ensure these AI agents meet non-negotiable requirements for production deployments, including implementing observability, monitoring, and other agenticops practices.
One question devops teams must answer is what their minimum requirements are to ensure AI agents are observable. Teams can start by extracting fundamentals from devops observability practices and layering in dataops observability for data pipelines and modelops for AI models.
But organizations also must extend their observability standards, especially as AI agents take over role-based tasks, integrate with MCP servers for more complex workflows, and support both human-in-the-middle and autonomous operations.
A key observability question is: Who did what, when, why, and with what information, from where? The challenging part is centralizing this information and having an observability data standard that works regardless of whether the decision or action came from an AI agent or a person.
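One way to make that question answerable is to normalize every decision or action, whether it came from a person or an agent, into a single audit record. The sketch below is illustrative: the field names, actor IDs, and action strings are assumptions, not a standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class AuditEvent:
    """Actor-agnostic record answering: who did what, when, why,
    with what information, and from where."""
    actor_id: str    # unique ID for the human or agent
    actor_type: str  # "human" or "agent"
    action: str      # what was done, e.g. "refund.issue"
    rationale: str   # why: prompt, policy, or ticket reference
    inputs: dict     # what information the decision used
    origin: str      # from where: host, region, or client network
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# The same schema works whether the actor is a person or an AI agent.
event = AuditEvent(
    actor_id="agent-billing-07",
    actor_type="agent",
    action="refund.issue",
    rationale="policy:refunds-under-50",
    inputs={"order_id": "A123", "amount": 42.50},
    origin="us-east-1",
)
print(json.dumps(asdict(event), indent=2))
```

Because the schema is actor-agnostic, downstream tools can query human and agent activity in one place instead of stitching together separate logs.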
“Devops should apply the same content and quality processes to AI agents as they do for people by leveraging AI-powered solutions that monitor 100% of interactions from both humans and AI agents,” suggests Rob Scudiere, CTO at Verint. “The next step is observing, managing, and monitoring AI and human agents together because performance oversight and continuous improvement are equally critical.”
I asked experts to share key concepts and their best practices for implementing observable AI agents.
1. Define success criteria and operational governance
Observability is a bottom-up process for capturing data on an AI agent’s inputs, decisions, and operations. Before delving into non-functional requirements for AI agents and defining observability standards, teams should first review top-down goals, operational objectives, and compliance requirements.
Kurt Muehmel, head of AI strategy at Dataiku, says observable agents require three disciplines that many teams treat as afterthoughts:
- Define success criteria because engineers can’t determine what “good” looks like alone. Domain experts need to help build evaluation datasets that capture edge cases only they would recognize.
- Centralize visibility because agents are being built everywhere, including data platforms, cloud services, and across teams.
- Establish technical operational governance before deployment, including evaluation criteria, guardrails, and monitoring.
Observability standards should cover proprietary AI agents, those from top-tier SaaS and security companies, and those from growing startups. Regarding technical operational governance:
- Evaluation criteria can incorporate site reliability concepts around service-level objectives, but should include clear boundaries for poor, unacceptable, or dangerous performance.
- Guardrails should include deployment standards and release-readiness criteria.
- Monitoring should include clear communication and escalation procedures.
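The governance points above can be encoded as a machine-checkable release gate. This is a minimal sketch with invented thresholds and config keys; real evaluation criteria would come from domain experts and SLO reviews.

```python
# Hypothetical governance config: evaluation criteria, guardrails,
# and escalation procedures (all values are illustrative).
GOVERNANCE = {
    "slos": {"p95_latency_ms": 2000, "success_rate": 0.98},
    "boundaries": {  # clear limits for unacceptable performance
        "hallucination_rate_max": 0.02,
    },
    "release_readiness": {"eval_suite_pass_rate": 0.95},
    "escalation": {"on_breach": "page-oncall", "channel": "#agent-ops"},
}

def release_ready(metrics: dict) -> bool:
    """Gate deployment on evaluation criteria and guardrails."""
    return (
        metrics["success_rate"] >= GOVERNANCE["slos"]["success_rate"]
        and metrics["hallucination_rate"]
            <= GOVERNANCE["boundaries"]["hallucination_rate_max"]
        and metrics["eval_suite_pass_rate"]
            >= GOVERNANCE["release_readiness"]["eval_suite_pass_rate"]
    )

print(release_ready({"success_rate": 0.99,
                     "hallucination_rate": 0.01,
                     "eval_suite_pass_rate": 0.97}))
```

Keeping the criteria in declarative config makes the boundaries between acceptable, poor, and dangerous performance reviewable by non-engineers.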
2. Define the information to track
Observability of AI agents is non-trivial for a handful of reasons:
- AI agents are not only stateful but have memory and feedback loops to improve decision-making.
- Actions may be triggered by people, autonomously by the AI agent, or orchestrated by another agent via an MCP server.
- Tracking the agent’s behavior requires versioning and change tracking for the underlying datasets, AI models, APIs, infrastructure components, and compliance requirements.
- Observability must account for additional context, including identities, locations, time considerations, and other conditions that can influence an agent’s recommendations.
Given the complexity, it’s not surprising that experts had many suggestions regarding what information to track.
“Teams should treat every agent interaction like a distributed trace with instrumentation at the various decision-making boundaries and capture the prompt, model response, the latency, and the resulting action in order to spot drift, latency issues, or unsafe behaviors in real time,” says Logan Rohloff, tech lead of cloud and observability at RapDev. “Combining these metrics with model-aware signals, such as token usage, confidence scores, policy violations, and MCP interactions enables you to detect when an agent is compromised or acting outside its defined scope.”
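Treating each decision boundary as a span might look like the following stdlib-only sketch; a production system would use OpenTelemetry or a similar tracing library, and the span names and attributes here are assumptions for illustration.

```python
import time
import uuid
from contextlib import contextmanager

SPANS = []  # in a real system, spans are exported to a tracing backend

@contextmanager
def agent_span(name: str, **attrs):
    """Record one decision boundary as a trace span, capturing
    attributes and latency (stdlib sketch, not a tracing library)."""
    span = {"span_id": uuid.uuid4().hex, "name": name, "attrs": attrs}
    start = time.perf_counter()
    try:
        yield span
    finally:
        span["latency_ms"] = (time.perf_counter() - start) * 1000
        SPANS.append(span)

# Capture the prompt, model response, latency, and resulting action,
# plus model-aware signals like token usage:
with agent_span("llm.call", prompt="Summarize ticket #42") as s:
    s["attrs"]["response"] = "Customer requests refund."  # model output
    s["attrs"]["tokens_used"] = 87                        # model-aware signal
with agent_span("tool.call", action="crm.lookup", args={"ticket": 42}):
    pass  # the tool execution would happen here
```

With spans collected this way, drift, latency spikes, and out-of-scope actions become queries over structured data rather than log archaeology.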
Devops teams will need to extend microservice observability principles to support AI agents’ stateful, contextual interactions.
“Don’t overlook the bits around session, context, and workflow identifiers as AI agents are stateful, communicate with each other, and can store and rehydrate sessions,” says Christian Posta, global field CTO at Solo.io. “We need to be able to track causality and flows across this stateful environment, and with microservices, there was always a big challenge getting distributed tracing in place at an organization. Observability is not optional, and without it, there’s no way you can run AI agents and be compliant.”
Agim Emruli, CEO of Flowable, adds that “teams need to establish identity-based access controls, including unique agent credentials and defined permissions, because in multi-agent systems, traceability drives accountability.”
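A deny-by-default permission check keyed on unique agent credentials, as Emruli describes, can be sketched as follows; the agent IDs and permission strings are hypothetical.

```python
# Hypothetical per-agent identity registry with scoped permissions.
AGENT_PERMISSIONS = {
    "agent-billing-07": {"crm.read", "refund.issue"},
    "agent-support-01": {"crm.read"},
}

def authorize(agent_id: str, action: str) -> bool:
    """Deny by default; every decision is attributable to a unique
    agent identity, which is what makes multi-agent systems auditable."""
    allowed = action in AGENT_PERMISSIONS.get(agent_id, set())
    print(f"{agent_id} -> {action}: {'ALLOW' if allowed else 'DENY'}")
    return allowed

authorize("agent-billing-07", "refund.issue")  # ALLOW
authorize("agent-support-01", "refund.issue")  # DENY
```

Logging every allow/deny decision alongside the agent's unique credential is what turns traceability into accountability.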
3. Identify errors, hallucinations, and dangerous recommendations
Instrumenting observable APIs and applications helps engineers address errors, identify problem root causes, improve resiliency, and research security and operational issues. The same is true for AI agents that autonomously complete tasks or make recommendations to human operators.
“When an AI agent hallucinates or makes a questionable decision, teams need visibility into the full trajectory, including system prompts, contexts, tool definitions, and all message exchanges,” says Andrew Filev, CEO and founder of Zencoder. “But if that’s your only line of defense, you’re already exposed because agentic systems are open-ended and operate in dynamic environments, requiring real-time verification. This shift started with humans reviewing every result and is now moving toward built-in self- and parallel verification.”
Autonomous verification will be needed as organizations add agents, integrate with MCP servers, and allow agents to connect to sensitive data sources.
“Observing AI agents requires visibility not only into model calls but into the full chain of reasoning, tools, and code paths they activate, so devops can quickly identify hallucinations, broken steps, or unsafe actions,” says Shahar Azulay, CEO and co-founder of Groundcover. “Real-time performance metrics like token usage, latency, and throughput must sit alongside traditional telemetry to detect degradation early and manage the real cost profile of AI in production. And because agents increasingly execute code and access sensitive data, teams need security-focused observability that inspects payloads, validates integrations like MCP, and confirms that every action an agent takes is both authorized and expected.”
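Security-focused payload inspection at the agent boundary might look like this sketch. The tool allowlist, the SSN-style pattern, and the finding messages are all assumptions; a real deployment would use DLP rules and a validated integration registry.

```python
import re

# Hypothetical allowlist of expected tool integrations and a simple
# payload check for sensitive data crossing the agent boundary.
EXPECTED_TOOLS = {"crm.lookup", "kb.search"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def inspect_tool_call(tool: str, payload: str) -> list:
    """Return findings; an empty list means the action is both
    authorized (expected tool) and clean (no sensitive payload)."""
    findings = []
    if tool not in EXPECTED_TOOLS:
        findings.append(f"unexpected tool: {tool}")
    if SSN_PATTERN.search(payload):
        findings.append("sensitive data (SSN-like) in payload")
    return findings

print(inspect_tool_call("crm.lookup", "ticket 42"))
print(inspect_tool_call("shell.exec", "cat 123-45-6789.txt"))
```

The point is the shape of the check, every tool call validated against what is expected and authorized, not the specific patterns.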
4. Ensure AI agent observability addresses risk management
Organizations will realize greater business value and ROI as they scale AI agents across operational workflows. The implication is that AI agent observability becomes a fundamental part of the organization’s risk management strategy.
“Make sure that observability of agents extends into tool use: what data sources they access, and how they interact with APIs,” says Graham Neray, co-founder and CEO, Oso. “You should not only be monitoring the actions agents are taking, but also categorizing risk levels of different actions and alerting on any anomalies in agentic actions.”
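Categorizing action risk and alerting on anomalies, as Neray suggests, could be sketched as a lookup plus a baseline comparison. The risk tiers, action names, and 3x volume threshold are invented for illustration.

```python
# Illustrative risk tiers for agent actions; names and levels are assumptions.
RISK_LEVELS = {
    "kb.search": "low",
    "crm.read": "medium",
    "refund.issue": "high",
    "db.delete": "critical",
}

def assess(action, baseline_daily_count, todays_count):
    """Alert on critical or unrecognized actions, and on anomalous
    volume relative to a historical baseline; return None if normal."""
    risk = RISK_LEVELS.get(action, "unknown")
    if risk in ("critical", "unknown"):
        return f"ALERT: {risk}-risk action {action}"
    if todays_count > 3 * max(baseline_daily_count, 1):
        return f"ALERT: anomalous volume for {action}"
    return None

print(assess("db.delete", 10, 1))   # critical action always alerts
print(assess("crm.read", 10, 100))  # 10x the baseline volume alerts
print(assess("kb.search", 10, 5))   # normal behavior, no alert
```

Even a simple tiering like this lets risk teams prioritize alerts instead of treating all agent actions as equally dangerous.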
Risk management leaders will be concerned about rogue agents, data issues, and other IT and security risks that can impact AI agents. Auditors and regulators will expect enterprises to implement robust observability into AI agents and have remediation processes to address unexpected behaviors and other security threats.
5. Extend observability to security monitoring and threat detection
Another consumer of observability data will be security operation centers (SOCs) and security analysts. They will connect the information to data security posture management (DSPM) and other security monitoring tools used for threat detection.
“I expect real insight into how the agent reacts when it connects to external systems because integrations create blind spots that attackers target,” says Amanda Levay, CEO of Redactable. “Leaders need this level of observability because it shows where the agent strains under load, where it misreads context, and where it opens a path that threatens security.”
CISOs will need to extend their operational playbooks as threats from AI actors grow in scale and sophistication.
“Infosec and devops teams need clear visibility into the data transferred to agents, their actions on data and systems, and the requests made of them by users to look for signs of compromise, remediate issues, and perform root-cause analysis,” says Mike Rinehart, VP of AI at Securiti AI. “As AI and AI agents become part of important data pipelines, teams must fold governance into prompts, integrations, and deployments so security, privacy, and engineering leaders act from a shared view of the data landscape and the risks that come with it.”
6. Evaluate AI agent performance
Addressing risk management and security concerns is one reason to implement observability in AI agents. The other is performance: observability helps teams gauge how well an AI agent is doing its job and signals when improvements are needed.
“When I evaluate AI agents, I expect visibility into how the agent forms its decisions because teams need a clear signal when it drifts from expected behavior,” says Levay of Redactable. “I watch for moments when the agent ignores its normal sources or reaches for shortcuts because those shifts reveal errors that slip past general observability tools.”
To evaluate performance, Tim Armandpour, CTO of PagerDuty, says technology leaders must prepare for AI agents that fail subtly rather than catastrophically. He recommends, “Instrument the full decision chain from prompt to output and treat reasoning quality and decision patterns as first-class metrics alongside traditional performance indicators. The teams succeeding at this treat every agent interaction as a security boundary and build observability contracts that make agent behavior auditable and explainable in production.”
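Detecting the subtle drift Armandpour describes can start with comparing an agent's recent decision distribution to a baseline. A minimal sketch, assuming decisions are logged as labeled outcomes, using total-variation distance:

```python
from collections import Counter

def decision_drift(baseline, recent):
    """Total-variation distance between two decision distributions:
    0.0 means identical behavior, 1.0 means complete drift."""
    b, r = Counter(baseline), Counter(recent)
    labels = set(b) | set(r)
    nb, nr = len(baseline), len(recent)
    return 0.5 * sum(abs(b[l] / nb - r[l] / nr) for l in labels)

# Hypothetical decision logs: the agent has started issuing refunds
# autonomously, a pattern absent from its baseline behavior.
baseline = ["approve"] * 80 + ["escalate"] * 20
recent = ["approve"] * 50 + ["escalate"] * 10 + ["auto_refund"] * 40
print(round(decision_drift(baseline, recent), 2))  # 0.4
```

A threshold on this score, reviewed alongside reasoning-quality metrics, gives a first-class signal for "failing subtly" rather than waiting for a catastrophic error.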
7. Prepare for AI observability agents that take action
The natural evolution of observability is when devops organizations turn signals into actions using AI observability agents.
“Observability shouldn’t stop at recording; you should be able to take action if an agent is going astray easily,” says Neray of Oso. “Make sure you can easily restrict agentic actions by tightening access permissions, removing a particular tool, or even fully quarantining an agent to stop rogue behavior.”
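Turning those signals into graduated responses, tightening permissions, removing a tool, or fully quarantining an agent, might look like this sketch; the signal names and thresholds are assumptions.

```python
# Sketch of graduated responses to observability signals.
# Signal names and the risky-tool choice are illustrative assumptions.
def respond(agent_id, signal, permissions):
    """Restrict agentic actions based on the observed signal and
    return the agent's reduced permission set."""
    if signal == "policy_violation":
        # Tighten access: remove the riskiest tool, keep the rest.
        permissions = permissions - {"refund.issue"}
        print(f"{agent_id}: removed refund.issue")
    elif signal == "compromise_suspected":
        # Full quarantine: revoke everything to stop rogue behavior.
        permissions = set()
        print(f"{agent_id}: quarantined")
    return permissions

perms = {"crm.read", "refund.issue"}
perms = respond("agent-billing-07", "policy_violation", perms)
perms = respond("agent-billing-07", "compromise_suspected", perms)
print(perms)  # set()
```

Because the response path reuses the same permission registry the agent runs under, containment takes effect on the agent's next action rather than waiting for a redeploy.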
Observability data will fuel the next generation of IT and security operational AI agents that will need to monitor a business’s agentic AI operations. The question is whether devops teams will have enough time to implement observability standards, or whether business demand to deploy agents will drive a new era of AI technical debt.