Why Agent-to-Agent Proxies Need Deterministic Policy, Not LLM-Based Filters

AI Runtime Control, Real-time AI Security, Threat Intelligence, Vaikora

May 12, 2026

When one AI agent calls another AI agent, you have a problem that didn’t exist a year ago. The first agent generates a request based on its reasoning. The second agent acts on that request. Neither agent has any guarantee about what the other will actually do. The user who started the conversation is three hops away from the action that finally executes. And in most deployments, no policy layer sits between them.

Agent-to-agent (A2A) proxies are the missing piece. They sit in the path of every cross-agent call, evaluate the request against deterministic rules, and decide whether to allow it, deny it, or escalate to human review. This post explains the architecture, the threats they handle, and why an LLM-based filter cannot do this job.

The A2A surface area is bigger than people think

A multi-agent system in production typically has these moving parts:

A planner agent that decomposes a user request into subtasks
A set of worker agents, each specialized for a domain (database queries, file operations, web search, code execution, API calls)
A coordinator that routes messages between agents
A long-running memory or context store that all agents read from

In a small system, this might be three agents working on a single user task. In a large production system, it’s twenty agents handling thousands of concurrent tasks, with each agent able to call any other agent, each agent able to fetch from a shared memory store, and each agent able to spawn new agents on demand.

The Model Context Protocol (MCP), launched by Anthropic in late 2024, standardized one piece of this: how tools and external systems expose themselves to an AI agent. Google’s A2A protocol, announced in 2025, sketched a similar standardization for agent-to-agent calls themselves. Both protocols make agent ecosystems easier to build. Neither defines a policy layer that decides, call by call, which agent actions are allowed.

That gap is what A2A proxies fill.

What an A2A proxy does in concrete terms

An A2A proxy is software that sits between agents in the call graph. Every agent-to-agent message passes through it. For each message, the proxy:

Identifies the calling agent (which agent is making the request, with what credentials, on whose behalf)
Identifies the called agent (which agent is being invoked, what method, with what parameters)
Inspects the full request payload (arguments, context, prior conversation if available)
Evaluates the call against a policy ruleset
Returns one of: allow (pass through), deny (block, return error to caller), rewrite (modify the request before passing it on), require_approval (suspend, notify human reviewer, resume on approval)
Logs the decision with enough detail that an auditor could reconstruct exactly what happened
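The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Vaikora's implementation: the `AgentCall` and `Rule` types, the first-match-wins ordering, and the default of allowing a call when no rule matches are all assumptions made for the example, and only two of the possible decisions are shown.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AgentCall:
    caller: str        # agent making the request
    callee: str        # agent being invoked
    tool: str          # method or tool requested on the callee
    payload: dict      # request arguments
    on_behalf_of: str  # end user who started the task


@dataclass
class Rule:
    name: str
    matches: Callable[[AgentCall], bool]
    decision: str      # "allow" or "deny" in this sketch
    reason: str = ""


def evaluate(call: AgentCall, rules: List[Rule], audit_log: List[str]) -> str:
    """Return the decision of the first matching rule (assumed default: allow)."""
    fired: Optional[Rule] = next((r for r in rules if r.matches(call)), None)
    decision = fired.decision if fired else "allow"
    # One structured record per decision, so an auditor can replay the call.
    audit_log.append(json.dumps({
        "caller": call.caller,
        "callee": call.callee,
        "tool": call.tool,
        "on_behalf_of": call.on_behalf_of,
        "rule": fired.name if fired else None,
        "decision": decision,
        "reason": fired.reason if fired else "no rule matched",
    }, sort_keys=True))
    return decision


# Hypothetical rule and calls, for illustration only.
rules = [Rule("no_shell", lambda c: c.tool == "shell.exec", "deny", "shell disabled")]
log: List[str] = []
print(evaluate(AgentCall("planner", "worker", "shell.exec", {}, "alice"), rules, log))
print(evaluate(AgentCall("planner", "worker", "web.search", {}, "alice"), rules, log))
```

The point of the shape is that the decision is a pure function of the structured call and the ruleset, and that the audit record is written on every path, including the default one.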

If you’ve worked with a service mesh (Istio, Linkerd) or an API gateway (Kong, Apigee), the architecture is similar. The difference is that the policy rules operate on agent-action semantics (tool calls, intents, arguments) instead of on HTTP-level semantics (paths, methods, headers).

The four threats an A2A proxy handles

These come up in production and an LLM-based filter cannot reliably catch any of them.

Threat 1: Privilege escalation through agent chaining

A user can call agent A. Agent A has limited permissions. Agent A calls agent B. Agent B has different permissions, possibly broader. Without an A2A proxy, the original user’s permissions are not propagated to the agent-B call. The user just got the union of A and B’s permissions for free.

A deterministic A2A proxy enforces the rule:

- name: enforce_user_permissions_through_chain
  match:
    call_type: agent_to_agent
  decision: rewrite
  rewrite:
    inject_caller_user_context: true
    intersect_permissions: ["caller_user", "called_agent_assigned_role"]

Agent B now runs with the intersection of the original user’s permissions and its own assigned role, not with its full permission set. If agent B is asked to do something the original user couldn’t do directly, the request is denied.

LLM-based filters can’t enforce this because the LLM has no reliable concept of “permission set.” It can be told to think about permissions, but it cannot guarantee the propagation.
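The `intersect_permissions` step in the rule above amounts to a set intersection. A minimal sketch, assuming permissions are modeled as plain strings (the permission names and function names here are hypothetical):

```python
from typing import FrozenSet


def effective_permissions(user_perms: FrozenSet[str],
                          callee_role_perms: FrozenSet[str]) -> FrozenSet[str]:
    """Permissions the called agent may exercise on this call:
    only those held by BOTH the original user and the callee's role."""
    return user_perms & callee_role_perms


def authorize(action: str,
              user_perms: FrozenSet[str],
              callee_role_perms: FrozenSet[str]) -> bool:
    return action in effective_permissions(user_perms, callee_role_perms)


# Hypothetical example: the user may only read billing data,
# while agent B's role could also issue refunds.
user = frozenset({"billing.read"})
agent_b_role = frozenset({"billing.read", "billing.refund"})

print(authorize("billing.read", user, agent_b_role))
print(authorize("billing.refund", user, agent_b_role))
```

Because the result is computed from two sets rather than inferred from a prompt, the refund request fails the same way on every call, regardless of how the request was worded.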

Threat 2: Confused-deputy attacks

Agent A is allowed to read customer billing records. Agent B is allowed to send emails to customers. Both are constrained reasonably in isolation. An attacker convinces agent A to fetch a customer’s record, then convinces agent A to pass that record to agent B with instructions to email it. Agent B has no way to know whether the customer asked for their billing record to be emailed.

A deterministic A2A rule:

- name: pii_data_cannot_be_emailed_without_user_initiation
  match:
    call_type: agent_to_agent
    called.tool: email.send
    payload.contains_pii: true
    # Fire only when the user did NOT initiate an email action
    context.user_initiated_action: "!= email"
  decision: require_approval

The email gets suspended for human approval. The audit log captures every step of the chain that led here.
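Read as a predicate, the intent described above is: an outbound email that carries PII needs human approval unless the user's own action was to request an email. A rough sketch follows; it assumes the PII flag and the user-initiated action are computed upstream and attached to the request as structured fields, and the function name is hypothetical.

```python
def email_decision(tool: str, contains_pii: bool, user_initiated_action: str) -> str:
    """Decide on a cross-agent call using only structured request metadata."""
    if (tool == "email.send"
            and contains_pii
            and user_initiated_action != "email"):
        # PII is about to leave via email, and the user never asked for that.
        return "require_approval"
    return "allow"


# The injected chain from the scenario: the user asked a billing question,
# yet an email with PII is being sent.
print(email_decision("email.send", True, "billing_lookup"))
# The legitimate case: the customer asked for the record to be emailed.
print(email_decision("email.send", True, "email"))
```

Agent B never has to guess at the customer's intent; the proxy checks a field that was set when the user's request entered the system.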

Threat 3: Resource exhaustion through agent loops

Agent A asks agent B a question. Agent B asks agent A a question to clarify. Agent A loops back. Without a circuit breaker, the loop runs until you hit the LLM provider’s rate limit (best case) or run up a five-figure inference bill (worst case).

- name: a2a_call_depth_limit
  match:
    call_type: agent_to_agent
    chain.depth: "> 5"
  decision: deny
  reason: "Agent call depth exceeded limit of 5"

- name: a2a_circular_call_detection
  match:
    call_type: agent_to_agent
    chain.contains_cycle: true
  decision: deny
  reason: "Circular agent call detected"

These are mechanical rules. The proxy maintains a per-task call graph, evaluates depth and cycles deterministically, fires the rule, blocks the loop. LLM-based filters can’t see the call graph and have no consistent way to count.
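A sketch of how the two checks could be evaluated, assuming the proxy tracks, per task, the ordered list of agents on the current call path. The representation and the function name are assumptions; depth is counted as the number of agent-to-agent hops after the proposed call.

```python
from typing import List, Tuple


def check_call(path: List[str], callee: str, max_depth: int = 5) -> Tuple[str, str]:
    """path: agents already on the call path for this task, oldest first.
    callee: the agent the last entry in path wants to call next."""
    if callee in path:
        # The callee is already waiting further up the chain: a cycle.
        return ("deny", "Circular agent call detected")
    depth = len(path)  # hops once this call is made
    if depth > max_depth:
        return ("deny", f"Agent call depth exceeded limit of {max_depth}")
    return ("allow", "")


print(check_call(["A", "B"], "A"))                      # B calling back into A
print(check_call(["A", "B", "C", "D", "E"], "F"))       # fifth hop
print(check_call(["A", "B", "C", "D", "E", "F"], "G"))  # sixth hop
```

Both checks are a membership test and a length comparison on data the proxy already holds, which is why they can run on every call without measurable cost.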

Threat 4: Cross-tenant data leakage in shared agent infrastructure

A SaaS company runs the same fleet of agents for many customer tenants. Customer Acme’s planner agent calls a shared “database query” worker agent. The worker agent has access to all tenants’ databases. Without a proxy enforcing tenant isolation, Acme’s planner agent could (intentionally or accidentally) query data from Beta Corp.

- name: tenant_isolation_on_db_queries
  match:
    call_type: agent_to_agent
    called.tool: database.query
    payload.target_tenant: "!= caller.tenant_id"
  decision: deny
  reason: "Cross-tenant query attempted"

Hard rule, always enforced, audit-logged on every block. In a regulated industry (financial services, healthcare), this kind of control can decide whether a SOC 2 or HIPAA audit comes back clean.

Why LLM-based filters can’t do this job

Every problem above has the same underlying property: the rule depends on structured metadata that has a clear correct answer. Permission sets. Call chain depth. Tenant identifiers. PII tags. These are facts, and the rule is either satisfied or not.

LLM-based filters try to evaluate the same questions by reading the request as natural language. They produce judgments like “this email request seems okay given the context.” Judgments can be wrong. Judgments cannot be unit-tested. Judgments leave no audit trail beyond “the model thought it was fine.”

In contrast, a deterministic rule against the structured request always produces the same answer. You can write 200 tests that pin down its behavior. You can show an auditor exactly which rule fired on which call. You can change the policy with a code review and a deployment, not with prompt engineering.
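To make the testability claim concrete, here is what pinning down one rule might look like with Python's standard `unittest` module. The `tenant_isolation` function is a hypothetical stand-in for the tenant rule shown earlier, not an API from any product.

```python
import unittest
from typing import Tuple


def tenant_isolation(caller_tenant: str, tool: str, target_tenant: str) -> Tuple[str, str]:
    """Hypothetical rule: database queries may only target the caller's tenant."""
    if tool == "database.query" and target_tenant != caller_tenant:
        return ("deny", "Cross-tenant query attempted")
    return ("allow", "")


class TenantIsolationTest(unittest.TestCase):
    def test_same_tenant_is_allowed(self):
        self.assertEqual(tenant_isolation("acme", "database.query", "acme")[0], "allow")

    def test_cross_tenant_is_denied(self):
        self.assertEqual(
            tenant_isolation("acme", "database.query", "beta"),
            ("deny", "Cross-tenant query attempted"),
        )

    def test_rule_ignores_other_tools(self):
        self.assertEqual(tenant_isolation("acme", "web.search", "beta")[0], "allow")

    def test_same_input_same_answer(self):
        results = {tenant_isolation("acme", "database.query", "beta") for _ in range(100)}
        self.assertEqual(len(results), 1)


def run_tests() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TenantIsolationTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

A policy change then shows up as a diff to the rule and a diff to its tests, reviewable like any other code change.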

This is the difference between using a programming language and using a chat interface. Both can express intent. Only one gives you the engineering tools to debug, test, version, and audit that intent.

How A2A proxies fit into existing architecture

If you already run a service mesh, the A2A proxy is usually deployed as a sidecar or a gateway in the same control plane. If you run agents as serverless functions, the A2A proxy is a middleware layer that wraps each agent’s RPC interface. If you run agents in a single process (LangGraph, AutoGen, CrewAI), the A2A proxy is an in-process interceptor between agent nodes.
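For the single-process case, an in-process interceptor can be as small as a decorator around each agent's entry point. The sketch below is framework-agnostic and deliberately avoids the real LangGraph, AutoGen, or CrewAI APIs; `guarded`, `PolicyDenied`, and the policy signature are all hypothetical names.

```python
import functools
from typing import Callable, Tuple

Policy = Callable[[str, str, dict], Tuple[str, str]]  # (caller, callee, args) -> (decision, reason)


class PolicyDenied(Exception):
    """Raised to the calling agent when the policy blocks a call."""


def guarded(policy: Policy):
    """Wrap an agent entry point so every call is policy-checked first."""
    def decorator(agent_fn):
        @functools.wraps(agent_fn)
        def wrapper(caller: str, **kwargs):
            decision, reason = policy(caller, agent_fn.__name__, kwargs)
            if decision != "allow":
                raise PolicyDenied(reason)
            return agent_fn(caller, **kwargs)
        return wrapper
    return decorator


# Hypothetical policy: only the planner may call worker agents.
def only_planner(caller: str, callee: str, args: dict) -> Tuple[str, str]:
    if caller == "planner":
        return ("allow", "")
    return ("deny", f"{caller} may not call {callee}")


@guarded(only_planner)
def search_agent(caller: str, query: str) -> str:
    return f"results for {query}"


print(search_agent("planner", query="a2a proxy"))
```

The sidecar and middleware deployments differ in where this wrapper lives, not in what it does: the agent function is never reached unless the policy returns allow.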

The deployment options are different. The semantic position is the same: every agent-to-agent call passes through the proxy, every call is policy-evaluated, every call is logged.

The minimum requirements for a production-grade A2A proxy:

Sub-10ms p99 evaluation overhead (otherwise users feel it)
Support for both synchronous (request/response) and asynchronous (message-passing) agent communication
Integration with at least the major agent runtimes (LangGraph, OpenAI Assistants, Claude tool calling, AutoGen, CrewAI)
Audit log export to standard SIEMs (Splunk, Elastic, Microsoft Sentinel)
Rule expression language that supports composition (rules that reference other rules, rules that depend on prior request state, rules that fire across time windows)
A management surface that lets a non-engineer add or modify rules without breaking production

Vaikora’s A2A architecture

Vaikora is built around a deterministic policy engine that operates at the A2A layer. Three architectural choices matter:

In-process evaluation, no remote round-trip. The policy engine is embedded as a library in your agent runtime (Python, TypeScript, Go). When an agent calls another agent, the call is evaluated inline. There’s no HTTP hop to a remote evaluator and no inference call to an LLM. P99 latency is in single-digit milliseconds.
Rules as code, audit logs as data. Policy rules are written in a structured rule language, version-controlled in the same repo as the agent code, unit-testable. Audit logs are emitted as structured events to whatever SIEM the customer runs. SOC 2 and HIPAA auditors get a complete record without anyone having to write export scripts.
MCP-aware and A2A-protocol-aware. Vaikora speaks MCP natively, so any tool exposed via MCP is automatically in scope for policy enforcement. The same engine handles Google’s A2A protocol and the major agent-framework-specific RPC patterns.

The open-source vaikora-llm-gateway on GitHub (MIT license) is the standalone policy engine, deployable on its own for evaluation or in air-gapped environments. The commercial Vaikora product adds the management console, the connector library, audit log retention, and starter rulesets for common industries.

For a head-to-head with the closest competitors, see the Vaikora vs Zenity, Vaikora vs Prisma AIRS, Vaikora vs Noma, and Vaikora vs Capsule comparison pages. For the runtime documentation, see vaikora.com/docs.

Adoption pattern for new deployments

If you’re adding A2A policy to an existing multi-agent system, a sequence that works well is:

Week 1: deploy in observe mode. The proxy is in the call path but every decision is “allow” with full logging. You’re collecting data on what your agents actually do. Most teams find at least one surprise here: an agent making API calls nobody knew about, or a call chain that goes 8 hops deep when everyone assumed it went 2.
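Observe mode can be thought of as a thin layer over the normal evaluation: the policy still computes its real decision, but the enforced outcome is always allow and both values are logged. A minimal sketch, with hypothetical field and function names:

```python
from typing import Dict, List


def enforce(policy_decision: str, observe_only: bool, log: List[Dict[str, str]]) -> str:
    """Return the decision to enforce, recording what the policy would have done."""
    enforced = "allow" if observe_only else policy_decision
    log.append({
        "would_decide": policy_decision,  # what the ruleset said
        "enforced": enforced,             # what actually happened to the call
    })
    return enforced


log: List[Dict[str, str]] = []
print(enforce("deny", observe_only=True, log=log))   # week 1: logged, not blocked
print(enforce("deny", observe_only=False, log=log))  # later: actually blocked
```

Keeping both fields in the log makes it possible to count how many calls a candidate rule would have blocked before it is switched to enforcing.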

Weeks 2-3: starter rules for the most obvious risks. Block cross-tenant access, block calls outside the agent’s declared tool set, block call depth above a reasonable threshold. These are mechanical wins that catch most accidents.

Weeks 4-6: domain-specific rules. Money transfers above a threshold, deletion of customer records, PII exfiltration patterns, regulated-data routing. These require business input but they’re where the audit-grade compliance value lives.

Ongoing: rule maintenance. Same rhythm as any production policy code. New agent capabilities → new rules. New incidents → new defensive rules. Quarterly review with security and compliance teams.

Common failure modes when teams try to skip the proxy

Three patterns we see often:

“We’ll just rely on the LLM’s tool descriptions.” Tool descriptions are advisory. Models can and do call tools in ways the description didn’t anticipate, especially under adversarial prompting. Tool descriptions belong in the schema, not in the security model.
“We’ll just trust the agent runtime’s permission system.” Most agent runtimes (LangGraph, AutoGen, OpenAI Assistants) have some notion of permissions but they’re framework-specific, hard to audit, and don’t compose across runtimes. The A2A proxy gives you one place for all policy.
“We’ll add policy later.” Policy added later is policy that was missing during the period when you needed it most. The cost of retrofitting policy on a multi-agent system that’s been running for 6 months is high (every existing agent has to be updated to route through the proxy, every existing rule has to be backfilled, every existing incident has to be re-examined for whether the new policy would have caught it). Add the proxy on day one. Even an empty ruleset.

Frequently asked questions

What’s the difference between an A2A proxy and an API gateway?
An API gateway operates at HTTP level: paths, methods, headers, status codes. An A2A proxy operates at agent-action level: tool calls, intents, arguments, call chains. The same call to the same URL can be allowed or denied based on which agent is calling and why.

Does this work with closed-source agent runtimes like OpenAI Assistants?
Yes. The proxy intercepts the agent’s tool calls at the runtime level, regardless of whether the model itself is hosted by OpenAI or self-hosted. The integration is at the call boundary.

What’s the performance overhead?
For Vaikora specifically, p99 sub-10ms for typical rule sets, p99 sub-50ms for complex rule sets with external lookups. The engine is designed for the hot path.

Can I run an A2A proxy on top of MCP?
Yes. MCP tools are first-class objects in the policy rule language. You can write rules that match on MCP tool name, MCP server identity, and MCP-specific argument structures.

Is this the same as guardrails?
Guardrails is an industry term that’s been used for everything from input filtering to output classification to deterministic policy enforcement. An A2A proxy is one specific category: deterministic policy enforcement at the agent-to-agent call layer. If a vendor calls their product “guardrails” without specifying which layer, ask which layer.

Does my multi-agent system need A2A policy if I only have 2 agents?
The threats above (privilege escalation, confused deputy, loops, data leakage) appear at 2 agents and get worse from there. The policy infrastructure is the same complexity at 2 agents as at 200. Add it on day one.