When AI Agents Go Rogue: How Approval Fatigue Is Undermining Security Controls

Autonomous AI agents are creating hidden security risks. Approval fatigue, semantic attacks, and weak governance are turning automation into exposure.

2/24/20268 min read

A group of humanoid robots dancing in a futuristic laboratory setting

In Short

Semantic attacks don’t require stolen credentials. Attackers can manipulate AI agents by embedding malicious instructions in the content those agents consume. External data sources must be treated as active attack surfaces, not implicitly trusted inputs.
Approval fatigue erodes human oversight. When machine identities outnumber humans by hundreds to one, agent-generated decisions overwhelm review processes. Oversight degrades into routine approval rather than meaningful control.
Identity governance is lagging behind reality. Most organisations lack formal policies for AI agent lifecycles, and many non-human credentials remain unrotated. Existing identity models were not designed for autonomous decision-making systems.
Compromise propagates rapidly. Testing shows that a single subverted agent can influence the majority of downstream decisions within hours. Traditional monitoring tools struggle to detect this behaviour.

Security In Practice | CyOps Consulting Team

In September 2025, international engineering firm Arup lost USD 25 million after an employee authorised a fund transfer during what appeared to be a routine video call with senior executives. The voices matched. The faces looked right. The setting was familiar and consistent with previous internal meetings. Acting on what appeared to be legitimate executive approval, the employee completed the transfer before the deception was uncovered.

The call was a deepfake. The employee had not spoken to Arup’s CFO or financial controller. While the incident relied on AI impersonating humans, it reflects a broader shift in how AI is being used against enterprises. Increasingly, the most consequential risk does not come from AI pretending to be people, but from AI systems organisations deploy deliberately and trust implicitly.

Autonomous AI agents are now embedded in financial operations, software delivery pipelines, customer service workflows, and internal decision-making systems. These agents can execute payments, modify databases, and coordinate multi-step processes with limited or no human involvement. As their authority expands, so does the impact of subtle failures in governance and oversight.

“Agents are moving fast, and attackers are moving with them,” said Mateo Rojas-Carulla, Head of Research at Lakera AI. Analysis of customer environments in late 2025 showed that indirect attacks targeting browsing, document access, and tool execution succeeded more reliably than direct prompt injection. External data sources, not user prompts, are emerging as the dominant risk vector.

This represents a structural shift in enterprise security. Controls designed for human users or static service accounts assume that risk is tied to credential compromise. In agentic systems, that assumption breaks down. Attackers do not need to steal credentials if they can manipulate how an agent interprets instructions and context.

Researchers describe these techniques as semantic attacks. The agent’s permissions remain unchanged, but its understanding of what it should do is altered.

The Governance Gap

In January 2026, NIST’s Center for AI Standards and Innovation issued an RFI on security practices for AI agent systems. The notice explicitly warned that these systems “can be deployed with little to no human oversight,” highlighting the speed at which agentic capabilities are being adopted ahead of governance maturity.

NIST identified three broad risk categories. The first two, adversarial attacks during training or inference and deliberately embedded backdoors, are familiar to most security teams. The third category is more subtle and more difficult to address. Even uncompromised models can threaten confidentiality, integrity, or availability through specification gaming (a behaviour that satisfies the literal specification of an objective without achieving the intended outcome) or misaligned objectives.

This third category exposes a gap in existing security frameworks. An agent can behave exactly as designed and still cause material harm if its decision-making is subverted at the semantic layer. This is the point where instructions are interpreted, data sources evaluated, and actions selected. Traditional controls rarely operate at this level.

“The more autonomous and interconnected AI agents become, the larger the attack they create,” said Lavi Lazarovitz, Vice President of Cyber Research at CyberArk. “By 2026, organisations won’t just be testing agents. They’ll be relying on them.” Reliance changes the risk calculus. Failures move from theoretical to operational.

Survey data from the Cloud Security Alliance and Oasis Security reinforces this concern. Published in early 2026, the research found that 78 percent of organisations lack formal policies for creating or retiring AI identities. Ninety-two percent are not confident their existing identity and access management tools can manage agent-related risk effectively. Only 28 percent believe they could prevent a rogue agent from causing material damage.

The disconnect is not purely technical. While agentic AI risk is increasingly discussed at board level, more than half of respondents reported that leadership is not acting with sufficient urgency. Awareness exists, but governance and resourcing have not kept pace with deployment.

Approval Fatigue and the Illusion of Control

The governance gap is most visible in approval workflows. These mechanisms are intended to provide human oversight of agent decisions, particularly for high-risk actions. In practice, they often create the appearance of control without meaningful review.

ManageEngine’s 2026 Identity Security Outlook reported machine-to-human identity ratios averaging 100 to 1, with some organisations reaching 500 to 1. When a significant portion of those machine identities are autonomous agents, the volume of approval requests escalates rapidly. Each agent may generate dozens or hundreds of decisions per day.

Security teams describe a predictable outcome. As approval queues grow, reviews shift from evaluation to throughput. Decisions are approved in batches, driven by time pressure and operational necessity rather than risk assessment. Oversight becomes procedural.

This dynamic has already been exploited. In late 2025, a widely reported SaaS supply chain breach affected hundreds of downstream organisations. Threat actor UNC6395 used stolen OAuth tokens from a trusted Salesforce integration to access customer environments across more than 700 organisations. The activity required no exploit and no phishing. It appeared legitimate because it originated from an authorised integration.

In environments where agents routinely request access across systems, malicious activity blends into normal operational patterns. Approval fatigue makes it harder to distinguish legitimate automation from abuse.

“AI turns identity into a high-velocity system,” said Danny Brickman, CEO of Oasis Security. “New agents and workflows can mint credentials in minutes. Too many organisations still govern that with spreadsheets.” The challenge is not just scale, but speed.

The issue extends beyond credential creation. It is also about decision velocity. Marketing agents can make dozens of budget decisions each day. DevOps agents trigger multiple deployments. Support agents process hundreds of data access requests. Few organisations staff enough reviewers to evaluate each decision meaningfully, even when approvals are nominally required.

As a result, controls exist on paper but not in operational reality.

Semantic Manipulation as an Attack Surface

Traditional security models assume that controlling access limits risk. Agentic AI challenges that assumption by introducing a layer where legitimate permissions can be misused through manipulated reasoning.

The OWASP Top 10 for Agentic Applications 2026 reflects this shift. Developed with input from more than 100 researchers and practitioners, it identifies prompt injection, tool misuse, and data poisoning as dominant failure modes. These are not edge cases. They are systemic risks inherent to how agents interact with external information.

Prompt injection ranks as the most significant vulnerability. The attack embeds instructions in content the agent processes, such as web pages, documents, or messages. Unlike direct user input, these instructions arrive through data channels the agent is designed to trust.

IBM Distinguished Engineer Jeff Crume and Master Inventor Martin Keen demonstrated this risk using a shopping agent tasked with finding a used book at the best price. The agent identified several valid options but purchased a copy at twice the expected cost. The seller’s webpage contained hidden text instructing the agent to ignore previous instructions and complete the purchase regardless of price.

The text was invisible to human users but processed by the agent’s language model. “This allows an attacker to override original intent by changing context,” Crume explained. The adversary never interacted with the agent directly. The instructions were embedded where the agent was expected to look.

Research from Lakera found that indirect attacks required fewer attempts to succeed than direct prompt injections. When malicious instructions arrive through external data sources, existing filters and guardrails are less effective. The risk increases significantly when agents have execution authority. A compromised chatbot spreads misinformation. A compromised agent with API access can transfer funds, delete systems, or modify production environments using valid credentials.

Identity Sprawl and Multi-Agent Risk

Identity governance has traditionally assumed relatively stable relationships. Employees have accounts. Service accounts run applications. AI agents disrupt this model by combining human intent with machine persistence.

Research from Token Security noted that treating agents as conventional non-human identities creates blind spots. Over-privileging becomes common. Ownership becomes unclear. Behaviour drifts from original intent without triggering traditional alerts.

The impact is magnified in multi-agent systems. Stellar Cyber, citing the Galileo AI multi-agent failure study, reported that a single compromised agent influenced 87 percent of downstream decisions within four hours during simulated attacks. Without visibility into inter-agent communication, identifying the root cause is extremely difficult.

Traditional SIEM tools may surface anomalous outcomes, such as unusual transactions or configuration changes. They rarely explain why those outcomes occurred or which agent decision initiated the cascade.

Identity Governance and Administration systems are poorly aligned with agentic workloads. These systems assume HR-driven lifecycles and predictable provisioning. Agents are created dynamically by pipelines, platforms, and integrations. They authenticate using secrets rather than passwords and operate continuously.

The OWASP Non-Human Identities Top 10 ranks improper offboarding as the leading risk. When projects are cancelled or integrations deprecated, agent credentials often persist. CSO Online reports that 71 percent of non-human identities are not rotated within recommended timeframes. Each unchanged credential extends the window of exposure.

Strategic Responses Emerging

Addressing agentic AI security requires acknowledging that these systems do not fit existing models. Several approaches are emerging, although consensus remains limited.

The OWASP Agentic Top 10 emphasises extending controls across the full agent interaction chain, validating all external content before action, and enforcing least-privilege access with strict execution policies. These principles shift focus from static identity to contextual behaviour.

Gartner predicts that by 2026, 30 percent of enterprises will rely on agents that act independently. According to Strata, this requires real-time policy evaluation, accountability, and human oversight for sensitive actions. Static approvals are insufficient.

Some organisations are experimenting with decision attestation. Agents record their reasoning in machine-readable formats. Monitoring systems evaluate whether decisions align with expected patterns for that agent and context. Only anomalies escalate to human review, reducing approval fatigue while preserving oversight.

Other approaches focus on semantic verification. Agents validate that key data sources have not been tampered with and cross-check critical inputs before acting. Credential strategies are also shifting. Time-bound, purpose-specific access is replacing permanent grants. Short-lived tokens, just-in-time permissions, and aggressive rotation reduce exposure when compromise occurs.

Annika Whitmore of Palo Alto Networks summarised the risk succinctly. A single prompt injection can provide an adversary with an autonomous insider capable of executing trades or deleting backups without raising traditional alarms.

What Comes Next

The strategic choice facing organisations is whether to treat AI agents as enhanced automation or as a fundamentally different class of system. History suggests misclassification carries risk. Early cloud breaches were driven less by cloud technology than by insecure deployment practices. Similar indicators are appearing in AI adoption.

For CISOs, the priority is governance before incidents force the issue. Agents should be treated as production applications from day one. Behavioural monitoring must complement static review. Broadly privileged agents should be considered high-value insider threats.

When humans go rogue, behaviour changes. When agents are semantically compromised, behaviour often looks normal. Only the objective has shifted. One Identity has forecast that 2026 may see the first publicly disclosed breach attributed to an over-privileged AI agent. It will not look like an attack. It will look like the system operating as designed.

That is the central challenge. Traditional security excels at detecting anomalies. Semantic manipulation removes them.

Organisations adapting fastest are not those with the largest budgets. They are the ones willing to accept that autonomous decision-making requires new security primitives. That means mapping ingestion points, separating retrieval from action, validating tool calls, assuming memory can be poisoned, and testing agents under adversarial pressure.

As agents assume greater operational authority in 2026, the decision becomes unavoidable. Invest in security models designed for autonomous systems or wait for a forcing event to make the case. The latter will be far more expensive.

Special thanks to Goodfirms for bringing you today’s content.

Brought to by the CyOps Consulting Team. Discover our most recent publication.

When AI Agents Go Rogue: How Approval Fatigue Is Undermining Security Controls

CyOps Consulting – Trusted Advice. Proven Expertise. Practical Solutions.

Get in touch today to strengthen your cyber security posture with confidence.

CyOps Consulting

O'Connell Street, North Adelaide, SA, 5006

Terms and Conditions | Disclaimer | Privacy Policy