
Trust Budgeting Framework for Multi-Agent Systems: Managing the Tradeoff Research Proved You Cannot Eliminate

Enterprise AI strategy in 2025 converged on a single architectural direction: multi-agent systems. Gartner and Forrester promoted the vision. Every major cloud vendor sold the toolkit. Deploy networks of specialized agents. One for customer analysis, another for inventory queries, a third for financial recommendations. Let them collaborate. Watch them coordinate.

The implicit assumption was that agent collaboration is purely beneficial. More coordination means more capable systems. More trust between agents means smoother workflows. More data sharing means better outcomes.

An October 2025 paper from researchers at Zhejiang University and Tsinghua University proved this assumption wrong in a fundamental, measurable way. Their research introduced the Trust-Vulnerability Paradox: in multi-agent AI systems, increasing trust between agents to improve task coordination simultaneously and linearly increases the rate at which sensitive data leaks across agent boundaries. Not occasionally. Not in edge cases. Consistently, across every model backend and orchestration framework they tested.

Collaboration is a vulnerability. And the relationship is monotonic. There is no sweet spot where you get high coordination with low exposure. Every incremental increase in trust buys better task completion and costs more data exposure.

This guide translates that research into an operational framework. It provides the methodology, metrics, and architectural patterns for managing a tradeoff that the industry spent 2025 pretending did not exist.


The Mathematical Case for Trust Budgets (Simplified)

The researchers defined trust as a scalar parameter (tau) between 0 and 1. At tau equals 0, agents share nothing. Task success rates are low because agents cannot coordinate. At tau equals 1, agents share everything. Task success rates are highest, but the Over-Exposure Rate (OER), the frequency at which agents share information beyond what is minimally necessary, is also at maximum.

The key finding: as trust increases, OER increases linearly. Double the trust parameter and you approximately double the over-exposure rate. The researchers validated this across multiple model backends, including GPT-4, Claude, and open-source models, and multiple orchestration frameworks. The pattern held everywhere.

Two metrics from the paper matter for practitioners.

Over-Exposure Rate (OER) measures boundary violations: instances where an agent discloses information that exceeds the Minimum Necessary Information (MNI) threshold for a given task. If Agent A needs only a customer's purchase preferences to generate a recommendation, but receives the customer's full profile including payment information, support history, and internal notes, that is an over-exposure event.

Authorization Drift (AD) captures how trust parameters cause permissions to expand transitively through the agent network. Agent A trusts Agent B, which trusts Agent C. Information that Agent A shared with Agent B (appropriately, given their trust level) propagates to Agent C, which Agent A never authorized to receive it. The permissions expand through the network like a slow leak.

The conventional wisdom from analyst firms treats trust as binary: you either trust an agent (it is authenticated and authorized) or you do not. There is no concept of graduated trust with security implications. The Trust-Vulnerability Paradox says this is wrong. Trust is a continuous variable with a measurable security cost, and organizations need to manage it as such.


Trust Budget Implementation Methodology

Trust budgeting treats inter-agent trust as a finite resource with explicit limits, allocated based on data sensitivity and organizational exposure tolerance.

Step 1: Map Your Agent Topology

Before you can budget trust, you need to see it. Document every agent-to-agent communication path in your multi-agent system. For each path, record what data flows between the agents, what tasks require the data exchange, and what the minimum information is that the downstream agent needs to complete its function.

Most organizations discover at this step that their multi-agent topology is flatter and more interconnected than they assumed. Agents that were designed as specialists often have communication paths to many other agents, creating a dense trust graph with extensive transitive exposure.
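Step 1 can be sketched as a small graph exercise. The function below is a hypothetical illustration (agent names and the dict-based representation are assumptions, not from the paper): it builds a directed graph from documented communication paths and enumerates every transitive route, which is exactly where authorization drift can travel.

```python
# Hypothetical sketch: represent the agent topology as a directed graph and
# surface every transitive exposure route (paths through an intermediary).
from collections import defaultdict

def transitive_paths(edges):
    """Return all simple paths of 3+ agents, i.e. transitive exposure routes."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    paths = []
    def walk(node, path):
        for nxt in graph[node]:
            if nxt in path:          # avoid cycles
                continue
            new_path = path + [nxt]
            if len(new_path) >= 3:   # data crossed at least one intermediary
                paths.append(new_path)
            walk(nxt, new_path)
    for start in list(graph):
        walk(start, [start])
    return paths

# Illustrative topology from the article's opening example
edges = [("customer_analysis", "inventory"),
         ("inventory", "financial_recs"),
         ("customer_analysis", "financial_recs")]
print(transitive_paths(edges))
# [['customer_analysis', 'inventory', 'financial_recs']]
```

Even this three-agent toy topology contains a transitive route; dense production graphs typically contain many more than their designers expect.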

Step 2: Classify Data Flows by Sensitivity

For each agent-to-agent communication path, classify the data that flows along it.

| Classification | Examples | Trust Budget Ceiling |
| --- | --- | --- |
| Public | Product catalogs, public documentation | tau = 0.8-1.0 |
| Internal | Internal knowledge base articles, process documents | tau = 0.5-0.7 |
| Confidential | Customer records, financial data, employee information | tau = 0.2-0.4 |
| Regulated | HIPAA/PII/SOX-covered data | tau = 0.1-0.2 |
| Critical | Authentication credentials, encryption keys, security configurations | tau = 0.0 (no agent-to-agent sharing) |

The trust budget ceiling defines the maximum trust parameter for that communication path. Higher-sensitivity data gets a lower ceiling, which means tighter information-sharing constraints and lower over-exposure rates, at the cost of reduced coordination effectiveness.

Step 3: Implement MNI Gates

Between every pair of communicating agents, implement a Minimum Necessary Information gate. The MNI gate evaluates the downstream agent's current task, determines the minimum data fields required for that task, and strips everything else before passing the data forward.

An MNI gate for a recommendation agent might work as follows. The upstream data agent has the customer's full profile: purchase history, payment information, support tickets, demographic data, communication preferences. The recommendation agent needs only purchase history and stated preferences. The MNI gate strips payment information, support tickets, demographics, and communication preferences before forwarding the data. The recommendation agent never sees what it does not need.

MNI gates are the multi-agent equivalent of the principle of least privilege, applied to data rather than permissions.
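The recommendation-agent example above reduces to a field allowlist per task. This sketch is illustrative (task and field names are assumptions): the gate keeps only the fields the downstream task requires and drops everything else before forwarding.

```python
# Hypothetical sketch of an MNI gate as a per-task field allowlist.
# Field and task names are illustrative, not from the paper.
MNI_FIELDS = {
    "generate_recommendation": {"purchase_history", "stated_preferences"},
}

def mni_filter(task, record):
    """Forward only the fields the downstream task minimally requires."""
    allowed = MNI_FIELDS[task]
    return {k: v for k, v in record.items() if k in allowed}

profile = {
    "purchase_history": ["sku-1", "sku-2"],
    "stated_preferences": ["outdoor"],
    "payment_info": "visa-4242",
    "support_tickets": 3,
}
filtered = mni_filter("generate_recommendation", profile)
print(filtered)
# payment_info and support_tickets never reach the recommendation agent
```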

Step 4: Set Trust Parameters Per Agent Pair

For each pair of agents that communicate, set a trust parameter based on the data classification and the trust budget ceiling. This parameter controls two functions: how strictly the MNI gate filters information, and how much detail the upstream agent includes in its responses to the downstream agent.

At lower trust levels, the MNI gate is strict. Only explicitly required fields pass through. At higher trust levels, the gate is more permissive. Related contextual information may pass through if it could improve task quality. The tradeoff is explicit: higher trust means better task performance and more data exposure.

Step 5: Monitor and Adjust

Trust parameters are not set-and-forget. Monitor OER and AD metrics continuously. If OER for a specific agent pair exceeds your tolerance threshold, reduce the trust parameter. If task failure rates increase because the trust parameter is too restrictive, evaluate whether the MNI gate's definition of "minimum necessary" is too narrow.
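The adjust loop in Step 5 can be expressed as a simple control rule. The tolerance, step size, and rounding below are illustrative assumptions; the direction of adjustment (OER above tolerance pushes tau down) is the point.

```python
# Hypothetical sketch of the monitor-and-adjust loop: lower the trust
# parameter when observed OER exceeds tolerance. Thresholds are assumptions.
def adjust_trust(tau, observed_oer, oer_tolerance=0.05, step=0.05, floor=0.0):
    """Reduce the trust parameter when over-exposure exceeds tolerance."""
    if observed_oer > oer_tolerance:
        return max(floor, round(tau - step, 2))
    return tau

tau = 0.4
tau = adjust_trust(tau, observed_oer=0.12)  # OER above the 5% tolerance
print(tau)  # 0.35
```

The symmetric case (task failures rising because tau is too restrictive) should trigger a review of the MNI definition rather than an automatic tau increase, since raising tau has a known exposure cost.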


Agent Pair Trust Scoring System

This scoring system provides a structured method for determining the trust parameter for each agent pair.

Input Factors

| Factor | Weight | Scoring |
| --- | --- | --- |
| Data sensitivity of shared information | 3x | 1 (public) to 5 (critical) |
| Downstream agent's attack surface | 2x | 1 (isolated, no tools) to 5 (external-facing, multi-tool) |
| Regulatory requirement for data traceability | 2x | 1 (none) to 5 (strict, audited) |
| Blast radius if downstream agent is compromised | 3x | 1 (bounded) to 5 (cascading to other agents/systems) |
| Historical OER for this agent pair | 1x | 1 (consistently low) to 5 (frequently elevated) |

Trust Parameter Calculation

Sum the weighted scores. With factor scores of 1-5 and weights totaling 11x, the total ranges from 11 (minimum) to 55 (maximum). Map the total to a trust parameter:

| Score Range | Trust Parameter | Interpretation |
| --- | --- | --- |
| 44-55 | 0.0-0.1 | Minimal trust. MNI gate in strictest mode. Consider whether this communication path should exist. |
| 33-43 | 0.1-0.3 | Low trust. Only explicitly required data fields pass through the MNI gate. |
| 22-32 | 0.3-0.5 | Moderate trust. Required fields plus closely related context pass through. |
| 16-21 | 0.5-0.7 | Elevated trust. Broader context sharing permitted. MNI gate is permissive. |
| 11-15 | 0.7-1.0 | High trust. Minimal filtering. Appropriate only for non-sensitive data flows. |
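The scoring system is mechanical enough to automate. This sketch is a hypothetical implementation (factor keys and band edges are assumptions; note the minimum achievable weighted score is 11, since every factor scores at least 1):

```python
# Hypothetical sketch of the agent-pair scoring system: weighted sum of the
# five factors, mapped to a trust-parameter band.
WEIGHTS = {"sensitivity": 3, "attack_surface": 2, "traceability": 2,
           "blast_radius": 3, "historical_oer": 1}

# (minimum total for band, (tau_min, tau_max)); weighted totals span 11..55
BANDS = [(44, (0.0, 0.1)), (33, (0.1, 0.3)), (22, (0.3, 0.5)),
         (16, (0.5, 0.7)), (11, (0.7, 1.0))]

def trust_band(scores):
    """scores: dict of factor -> 1..5. Returns (total, (tau_min, tau_max))."""
    total = sum(WEIGHTS[f] * s for f, s in scores.items())
    for threshold, band in BANDS:
        if total >= threshold:
            return total, band
    raise ValueError("score below the minimum possible total of 11")

total, band = trust_band({"sensitivity": 4, "attack_surface": 3,
                          "traceability": 2, "blast_radius": 4,
                          "historical_oer": 2})
print(total, band)  # 36 (0.1, 0.3) -- low trust
```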

Authorization Drift Detection Patterns

Authorization Drift occurs when permissions expand transitively through the agent network. Agent A shares data with Agent B (appropriately). Agent B incorporates that data into its context. Agent B then communicates with Agent C, and the data from Agent A leaks into the Agent B-to-C exchange. Agent C now holds data from Agent A that Agent A never authorized it to receive.

Detection Pattern 1: Data Lineage Tracking

Tag every data element with its origin agent and classification. When data passes through an MNI gate, the tags persist. If tagged data appears in a communication channel where the origin agent has no direct or authorized indirect relationship, that is authorization drift.
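A minimal lineage-tagging sketch, assuming a simple wrapper type (the `Tagged` dataclass and agent names are illustrative): every value carries its origin and classification, and a check flags any origin appearing on a channel it was never authorized for.

```python
# Hypothetical sketch of Pattern 1: data elements carry origin/classification
# tags, and tagged data on an unauthorized channel signals drift.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    value: object
    origin: str
    classification: str

def check_lineage(message, channel_authorized_origins):
    """Return origins in the message that the channel is not cleared for."""
    return {item.origin for item in message
            if item.origin not in channel_authorized_origins}

msg = [Tagged("sku-1", "agent_a", "confidential"),
       Tagged("faq", "agent_b", "public")]
drift = check_lineage(msg, channel_authorized_origins={"agent_b"})
print(drift)  # {'agent_a'} -> authorization drift on this channel
```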

Detection Pattern 2: Permission Expansion Monitoring

At the start of each agent session, record the set of data classifications each agent is authorized to access. Monitor for cases where an agent's effective data access expands during the session. If Agent C, which is only authorized for internal-classification data, processes data tagged as confidential, the session should be flagged for review.
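Pattern 2 reduces to comparing each agent's observed data classifications against the ceiling recorded at session start. A sketch under assumed classification levels:

```python
# Hypothetical sketch of Pattern 2: flag agents whose effective data access
# expanded beyond the classification level authorized at session start.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "regulated": 3}

def flag_expansion(authorized, observed):
    """Return agents that processed data above their authorized level."""
    flagged = {}
    for agent, seen in observed.items():
        ceiling = LEVELS[authorized[agent]]
        too_high = [c for c in seen if LEVELS[c] > ceiling]
        if too_high:
            flagged[agent] = too_high
    return flagged

authorized = {"agent_c": "internal"}
observed = {"agent_c": ["internal", "confidential"]}
print(flag_expansion(authorized, observed))  # {'agent_c': ['confidential']}
```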

Detection Pattern 3: Transitive Trust Chain Analysis

Periodically analyze the trust graph for transitive chains that exceed policy limits. If Agent A trusts Agent B at tau=0.5, and Agent B trusts Agent C at tau=0.5, the effective trust between Agent A and Agent C (through transitive sharing) may be higher than what a direct A-to-C trust policy would allow. Set maximum transitive trust depth and alert when chains exceed it.
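One common modeling assumption (not specified in the paper) is that effective transitive trust is the product of the taus along the chain. Under that assumption, Pattern 3 becomes two checks: chain depth against a policy limit, and effective trust against what a direct policy would allow.

```python
# Hypothetical sketch of Pattern 3. Modeling transitive trust as the product
# of per-hop taus is an assumption for illustration, not from the paper.
def transitive_trust(chain_taus):
    eff = 1.0
    for tau in chain_taus:
        eff *= tau
    return eff

def check_chain(chain_taus, direct_policy_tau, max_depth=2):
    """Return policy violations for a transitive sharing chain."""
    violations = []
    if len(chain_taus) > max_depth:
        violations.append("chain exceeds max transitive depth")
    if transitive_trust(chain_taus) > direct_policy_tau:
        violations.append("transitive trust exceeds direct A-to-C policy")
    return violations

# A trusts B at 0.5, B trusts C at 0.5; direct A-to-C policy allows only 0.2
print(check_chain([0.5, 0.5], direct_policy_tau=0.2))
```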

Detection Pattern 4: Session Termination on Drift

Configure your orchestration layer to terminate agent sessions when authorization drift is detected. An agent that started with read-only access to customer preferences should not end a session with access to financial records because intermediate agents passed escalating context. Terminate the session, log the drift event, and investigate.


Minimum Necessary Information (MNI) Gate Design

The MNI gate is the architectural component that enforces trust budgets at the data level. It sits between every pair of communicating agents and controls what data crosses the boundary.

Gate Architecture

Agent A generates response
    ↓
MNI Gate evaluates:
    1. What task is Agent B performing?
    2. What data fields does that task require?
    3. What is the trust parameter for the A→B relationship?
    4. What is the data classification of each field?
    ↓
MNI Gate filters:
    - Strips fields not required for Agent B's task
    - Redacts sensitive portions of required fields
    - Tags remaining data with origin and classification
    ↓
Agent B receives filtered data

Gate Implementation Considerations

Static vs. Dynamic Field Lists. Static field lists define the required data per task type at design time. Dynamic field lists determine required data based on the specific task instance at runtime. Static lists are simpler and more predictable. Dynamic lists are more precise but require a secondary model or rule engine to evaluate necessity, which introduces its own attack surface.

Redaction vs. Removal. For fields that are required but contain sensitive subcomponents, redaction may be preferable to removal. A customer record might need the account identifier (for lookup) but not the payment method details. Redaction preserves the useful portion while stripping the sensitive component.

Performance Impact. MNI gates add latency to every inter-agent communication. For real-time agent systems, this latency matters. Design gates for efficiency: precompute field requirements by task type, cache trust parameters, and use lightweight data filtering rather than full re-serialization.
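Pulling the diagram and the considerations above together, here is a hypothetical static-field-list gate with redaction and origin tagging. All task names, field names, and the redaction rule are illustrative assumptions:

```python
# Hypothetical sketch of the gate architecture: static per-task field lists,
# simple redaction of sensitive subfields, and origin/classification tagging.
STATIC_FIELDS = {"lookup_account": {"account_id", "purchase_history"}}
REDACT = {"account_id": lambda v: v[:4] + "****"}  # keep prefix for lookup

def mni_gate(task, payload, origin_agent, classification):
    allowed = STATIC_FIELDS[task]          # precomputed at design time
    out = {}
    for field, value in payload.items():
        if field not in allowed:
            continue                       # strip fields the task does not need
        if field in REDACT:
            value = REDACT[field](value)   # redact sensitive subcomponents
        out[field] = value
    # tag the remaining data for downstream lineage tracking
    return {"origin": origin_agent, "classification": classification,
            "data": out}

payload = {"account_id": "ACCT1234", "purchase_history": ["sku-9"],
           "payment_method": "visa-4242"}
result = mni_gate("lookup_account", payload, "agent_a", "confidential")
print(result)
# payment_method is stripped; account_id passes through redacted
```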


Swarm Isolation Architecture

Do not build monolithic agent networks where every agent can potentially communicate with every other agent. Create separate agent swarms organized by data sensitivity.

Swarm Design Principles

Principle 1: Swarms are bounded by data classification. All agents within a swarm operate at the same data classification level. An agent swarm that processes customer PII does not share a trust domain with a swarm that processes public product data, even if cross-swarm communication would improve coordination.

Principle 2: Cross-swarm communication requires a gateway. If task completion genuinely requires data from a different classification level, the request must pass through a cross-swarm gateway that enforces the higher classification's MNI requirements, logs the cross-boundary access, and applies the lower of the two swarms' trust parameters.

Principle 3: Compromise containment. If an agent within a swarm is compromised, the blast radius is limited to the data within that swarm's classification level. Cross-swarm gateways prevent a compromised agent in the public-data swarm from accessing the PII swarm's data, regardless of what instructions the compromised agent receives.
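The three principles can be sketched as a gateway that computes the effective tau for a cross-swarm request. Classification ranks, ceilings, and function names below are illustrative assumptions:

```python
# Hypothetical sketch of a cross-swarm gateway: apply the lower of the two
# swarms' trust parameters, enforce the higher classification's ceiling,
# and log the cross-boundary access.
import logging

logging.basicConfig(level=logging.INFO)
CEILINGS = {"public": 1.0, "internal": 0.7, "confidential": 0.4}
RANK = {"public": 0, "internal": 1, "confidential": 2}

def cross_swarm_tau(src_class, src_tau, dst_class, dst_tau):
    """Effective trust parameter for a cross-swarm request."""
    higher = src_class if RANK[src_class] >= RANK[dst_class] else dst_class
    tau = min(src_tau, dst_tau, CEILINGS[higher])
    logging.info("cross-swarm access %s->%s at tau=%.2f",
                 src_class, dst_class, tau)
    return tau

print(cross_swarm_tau("internal", 0.7, "confidential", 0.5))
# 0.4 -- the confidential ceiling dominates both swarms' parameters
```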

Example Topology

[Public Data Swarm]          [Internal Data Swarm]       [Confidential Data Swarm]
 - Product catalog agent      - Knowledge base agent      - Customer data agent
 - FAQ agent                  - Process document agent    - Financial analysis agent
 - Public search agent        - Internal search agent     - Compliance agent
         ↕                            ↕                           ↕
    [Cross-Swarm Gateway: Public↔Internal]    [Cross-Swarm Gateway: Internal↔Confidential]
         (tau = 0.3 max)                              (tau = 0.15 max)

Over-Exposure Rate (OER) Measurement Guide

OER is the metric that tells you whether your trust budgets are working. It measures the frequency at which agents share information beyond what is minimally necessary.

Measurement Methodology

Step 1: Define MNI for each task type. For every task that involves inter-agent data sharing, define the minimum set of data fields required. This is your baseline.

Step 2: Instrument data flows. At every MNI gate, log both the data that the upstream agent attempted to share (pre-filtering) and the data that passed through (post-filtering). Also log any data that the downstream agent received from other sources during the same task.

Step 3: Calculate OER. For a given time period, OER equals the number of inter-agent communications where the pre-filtering data exceeded the MNI threshold, divided by the total number of inter-agent communications.

Step 4: Track OER by agent pair and by sensitivity level. An OER of 15% across your entire agent network tells you little. An OER of 40% specifically for communications involving confidential data tells you where to focus.
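The four steps above can be sketched as a small aggregation over MNI-gate logs. The log-record shape here is an assumption (agent pair, attempted fields, MNI set per communication):

```python
# Hypothetical sketch of OER measurement: each log entry records what the
# upstream agent attempted to share and the MNI set for the task; OER is the
# share of communications where the attempt exceeded MNI, tracked per pair.
from collections import defaultdict

def oer_by_pair(logs):
    """logs: list of (agent_pair, attempted_fields, mni_fields) tuples."""
    totals, violations = defaultdict(int), defaultdict(int)
    for pair, attempted, mni in logs:
        totals[pair] += 1
        if set(attempted) - set(mni):   # shared beyond minimum necessary
            violations[pair] += 1
    return {pair: violations[pair] / totals[pair] for pair in totals}

logs = [
    (("data", "recs"), {"history", "prefs", "payment"}, {"history", "prefs"}),
    (("data", "recs"), {"history"}, {"history", "prefs"}),
    (("faq", "search"), {"question"}, {"question"}),
]
print(oer_by_pair(logs))  # {('data', 'recs'): 0.5, ('faq', 'search'): 0.0}
```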

OER Thresholds

| OER Level | Interpretation | Action |
| --- | --- | --- |
| 0-5% | Well-controlled. Trust budgets are effective. | Continue monitoring. |
| 5-15% | Elevated. Some MNI gates are too permissive. | Review and tighten gates for the agent pairs contributing most to OER. |
| 15-30% | High. Significant over-sharing. | Reduce trust parameters. Audit MNI gate configurations. Investigate whether MNI definitions are complete. |
| 30%+ | Critical. Trust budgets are not functioning. | Emergency review. Consider swarm isolation changes. Potential compliance exposure. |

The Uncomfortable Design Question

The Trust-Vulnerability Paradox forces a question that enterprise AI architects have been avoiding: how much capability are you willing to sacrifice for security?

Most enterprise AI projects are evaluated on task performance. Multi-agent systems are funded because they complete complex workflows more effectively than single-agent alternatives. The research proves that maximizing task performance requires maximizing trust, which maximizes data exposure. Constraining trust to manage exposure means accepting lower task performance.

There is no architecture that eliminates this tradeoff. It is structural. The enterprises that will deploy multi-agent systems successfully are the ones that acknowledge the tradeoff explicitly, set their exposure tolerances before deployment rather than after an incident, and build monitoring systems that keep their actual exposure within their stated tolerance.

The ones that pretend the tradeoff does not exist will discover it when their incident response team explains what "Over-Exposure Rate" means in the context of a data breach notification.


Nik Kale is a Principal Engineer and Product Architect with 17+ years of experience building AI-powered enterprise systems. He is a member of the Coalition for Secure AI (CoSAI), contributes to IETF AGNTCY working groups, and serves on the ACM AISec and CCS Program Committee. The views expressed here are his own.



© 2026-2027 Secure AI Fabric