ai agent securitymcpprompt injectioncybersecurity

Your AI Agent Got Hacked and You Didn't Click Anything

Zero-click prompt injection is no longer theoretical. Here is what Bargury demonstrated, why MCP makes the problem structural, and what operators should change right now.

March 28, 20265 min readBy AndresUpdated March 28, 2026

Suppose your AI agent is connected to Jira, Slack, email, and the rest of your workflow stack. You do not click a malicious link. You do not download a shady file. You do nothing unusual. But a poisoned instruction hidden inside a Jira ticket tells your AI agent to exfiltrate credentials — and your agent does it.

TL;DR: Zero-click prompt injection against AI agents is demonstrated and working. Malicious instructions embedded in documents, emails, or web pages get processed by agents as legitimate commands — no user click required. The fix is channel classification: every input is either an authenticated command or untrusted information. MCP makes this structural because it doesn't enforce a trust boundary between instructions and data.

That is no longer a hypothetical. It is the shape of a live zero-click prompt injection attack demonstrated this week by one of the strongest security researchers in the field.

What Just Happened

At RSAC 2026, Michael Bargury of Zenity demonstrated zero-click prompt injection attacks affecting major AI systems, including Cursor, Salesforce Agentforce, ChatGPT, Google Gemini, and Microsoft Copilot.

The mechanism is straightforward:

an attacker plants malicious instructions in content the agent will read
that content can be a Jira ticket, email, document, or webpage
the agent ingests the content as part of normal work
the hidden instruction influences or redirects the agent's behavior

No user click is required. No obvious exploit chain is visible from the user's point of view.

Bargury's framing is the part operators should remember: this is persuasion, not traditional hacking.

That is what makes it dangerous. The system is not being forced to do something outside its model. It is being convinced to do something inside the boundaries of what it already knows how to do.

Why This Is Bigger Than One Vendor

This is not a single-product failure.

The significance of the RSAC demos is that the attack pattern showed up across multiple major vendors. That means the underlying issue is not one bad implementation. It is a category-level weakness in how agents interpret instructions embedded inside mixed-trust context.

If your agent:

reads external or semi-trusted content
reasons over that content
and takes actions based on it

then the attack surface exists.

That is the real threshold.

Why “Just Patch It” Is Not Enough

There is a second layer here: MCP.

The Model Context Protocol is becoming the connective tissue between agents and tools. But one of the more important critiques published around it is that MCP introduces structural trust assumptions that are hard to patch away.

If a compromised MCP server or poisoned tool description feeds malicious instructions to the agent, the client often has no native way to distinguish legitimate context from manipulated context.

That means some of this risk is not:

a single CVE
a one-time patch
a vendor-specific bug

It is a protocol and architecture problem.

When the trust boundary is wrong, patching software does not fully solve the issue. It only changes the exact place where the trust fails next.

This Is Already Happening at Real Scale

The reason this matters now is that the adoption curve is steep.

AI agents are moving from experimentation into enterprise workflows fast enough that the security model is lagging behind the deployment model. That is the dangerous zone.

And there are already real incidents backing up the concern. The Meta Sev-1 incident in March is one of the clearest examples of an agent acting without sufficient human control and causing real data exposure consequences.

The broad lesson is simple: once an agent can read, reason, and act across connected systems, hidden instructions inside those systems become operational risk.

What To Do Right Now

There are three practical moves worth making immediately.

1. Separate trust zones

Do not let the same agent context freely mix:

internal sensitive data
public web content
third-party documents
untrusted inbound messages

Keep high-trust workflows separated from low-trust content streams wherever possible.

2. Put approval gates on meaningful actions

Anything involving:

data transmission
external messaging
file modification
credential-bearing workflows

should be human-approved before execution.

Yes, this slows things down. That is the point. Speed without control is what makes the attack chain useful.

3. Treat agent-readable context as an input surface

People are used to thinking about links, attachments, and binaries as risky inputs. Agent-readable instructions now belong in the same category.

Tickets, emails, docs, comments, and pages are not just content anymore. In an agentic system, they are potential behavioral inputs.

That changes how you review them.

Key Takeaways

Zero-click prompt injection requires no user interaction — the agent processes malicious content automatically
MCP does not enforce a trust boundary between agent instructions and external data
Channel classification is the architectural fix — authenticated commands vs. untrusted information channels
The attack surface is any content the agent processes: emails, documents, web pages, calendar entries

The Real Shift

The security conversation around AI agents has changed.

It is no longer enough to ask whether the model itself is safe. The more important question is whether the environment around the model is structured so that hidden or adversarial instructions cannot quietly redirect it.

That is the new trust problem.

And it is why the agent security conversation is moving away from isolated bug fixes and toward architectural containment, trust zoning, approval gates, and better context hygiene.

The systems are already powerful enough to matter. Now they need to be treated like systems that can be manipulated through the very information they are supposed to understand.