When Many Agents Share One Context Substrate, the Repo Becomes Both Leverage and Liability
In the 1840s, most of rural Ireland ate one thing: the potato. Not just any potato. A single high-yield cultivar, the Lumper, planted across millions of plots because it produced more calories per acre than anything else available. The choice was rational. The substrate was uniform. And when Phytophthora infestans arrived, that uniformity is exactly what turned a crop disease into a famine. One pathogen, one genome, one collapse. The efficiency that made the Lumper the obvious choice is the same property that made the failure total.
We are about to plant a monoculture in software.
Context is becoming code. Agents no longer just read source. They read guardrails, threat models, architectural intent, and eval harnesses from shared repositories so they can reason and act with some safety. Artur Huk’s “Context as Code” makes the case that the most strategically valuable material in a repository may no longer live in src/ but in /context, where intent and boundaries are declared before a line is generated. That shift is real and overdue. But it has a consequence that the current writing has not fully confronted. When thousands of agents ground their behavior in the same context substrate, we inherit the Lumper’s bargain: enormous consistency and efficiency, paired with a correlated failure surface we have not yet learned to govern.
Here is the thesis. The repository is no longer a passive store of code. It is becoming the shared nervous system of agentic systems, and that nervous system has two faces. There is a read side, where many agents consume the same constraints, and a write side, where agents propose, patch, and update those constraints. Each side carries a distinct and underpriced risk. Explainability is the load-bearing wall that keeps both standing.
From Code Store to Coordination Substrate
For fifty years, the repository was a record. It stored what humans wrote, tracked who changed it, and handed the result to a compiler. Authority lived in people. The repo was the filing cabinet.
In the agentic world, that relationship inverts. The repo becomes the authority and the agent becomes the reader. An agent that consults boundaries.md before generating a payment service is not using the repo as a filing cabinet. It is using it as a source of operating instruction. Multiply that across a fleet of agents, across continuous runs, across modules, and the repository stops being a record of past work. It becomes the live coordination substrate that determines present behavior.
| Attribute | Repo as Code Store | Repo as Coordination Substrate |
|---|---|---|
| Primary consumer | Human developers | Human developers and autonomous agents |
| Most valuable contents | src/ | /context: intent, boundaries, threat models, evals |
| Read pattern | Occasional, by people | Continuous, by many agents, at generation time |
| Write pattern | Humans commit code | Humans and agents both propose changes |
| Failure mode | A bug ships in one release | A bad constraint propagates to every agent that reads it |
| Locus of authority | People reviewing pull requests | Declared artifacts enforced deterministically |
That last row is the whole argument. Hold it. We return to it on both sides.
The Read Side: The Blast Radius of a Shared Guardrail
Start with the optimistic reading, because it is genuinely true. When many agents read the same guardrails, you get consistency that no amount of per-agent prompting can match. One reviewed threat-model.md can govern a hundred downstream generation cycles. This is the strongest argument for context as code, and it holds.
Now the part the optimism hides. A shared substrate is a shared point of failure. The same property that lets one good artifact govern a hundred agents lets one bad artifact misgovern all of them. This is not speculation. It is the documented behavior of multi-agent systems under corrupted context.
Torra and colleagues, in their work on memory poisoning in multi-agent systems (arXiv:2603.20357), show that poisoned memory does not stay local. It travels through the channels that agents use to share state, from short-term context originating at the user to consolidated long-term knowledge bases that many agents trust. Wang and colleagues sharpen the danger with what they call memory laundering (arXiv:2605.16746): adversarial context can be compressed into summaries that no longer trip a toxicity detector while still steering downstream behavior. They name the effect a sub-threshold propagation gap, which is a precise way of saying the poison survives the filter and keeps working. Other recent work goes further still, describing autonomous agent worms that write attacker-influenced content into persistent state, re-enter the decision context through scheduled autoloading, and transmit across agents (arXiv:2605.02812). The industry has noticed. Memory and context poisoning is now its own category, ASI06, in OWASP’s 2026 Agentic AI Top Ten. That is the signal that this has graduated from research curiosity to first-class operational concern.
The clearest demonstration of correlated failure comes from Xie and colleagues in “From Spark to Fire” (arXiv:2603.04474). They model multi-agent collaboration as a directed dependency graph and show that minor inaccuracies do not stay minor. They solidify into system-level false consensus through iteration. They identify three vulnerability classes worth committing to memory: cascade amplification, where small errors grow as they propagate; topological sensitivity, where the shape of the network determines how far damage spreads; and consensus inertia, where the system locks onto an early wrong answer and defends it. Most striking, they show that injecting a single atomic error seed can produce widespread failure. One bad input, many broken agents.
This is the Lumper. A uniform substrate makes one pathogen systemic.
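The full model from Xie and colleagues includes amplification and consensus dynamics, but a much cruder sketch is enough to see topological sensitivity. Assume a directed graph in which an edge from agent A to agent B means B consumes A’s output. The set of agents a single error seed can reach is then plain graph reachability. This is not their method, and the agent names and both topologies are hypothetical.

```python
from collections import deque


def affected_agents(edges: dict[str, list[str]], seed: str) -> set[str]:
    """Return every agent that an error introduced at `seed` can reach.

    An edge a -> b means agent b consumes agent a's output. This ignores
    correction and amplification; it only measures how far one seed can
    travel given the shape of the graph (breadth-first reachability).
    """
    reached = {seed}
    queue = deque([seed])
    while queue:
        agent = queue.popleft()
        for downstream in edges.get(agent, []):
            if downstream not in reached:
                reached.add(downstream)
                queue.append(downstream)
    return reached


# Hypothetical five-agent topologies.
hub = {"planner": ["coder", "tester", "reviewer", "deployer"]}
chain = {
    "planner": ["coder"],
    "coder": ["tester"],
    "tester": ["reviewer"],
    "reviewer": ["deployer"],
}

print(len(affected_agents(hub, "planner")))   # 5: the hub reaches everyone
print(len(affected_agents(hub, "tester")))    # 1: a leaf reaches only itself
print(len(affected_agents(chain, "tester")))  # 3: tester, reviewer, deployer
```

Same five agents, same single seed, very different reach: where the bad input lands in the topology matters as much as the input itself.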
The Context Blast Radius. Leaders need a way to reason about this before they centralize all their context into one tidy repository. The blast radius of a context artifact is the product of three factors:
- Reach. How many agents ground their behavior in this artifact.
- Privilege. How consequential the decisions this artifact governs are. A coding-standards file is low privilege. A threat model for the billing domain is high privilege.
- Propagation depth. How far a corrupted version travels before any human or deterministic check catches it.
A widely read, high-privilege artifact with deep propagation before detection is a famine waiting for a pathogen. A narrowly scoped, low-privilege artifact checked immediately is a garden plot. Most organizations are about to build the former because it is operationally convenient.
Smell test: “If this single context file were silently wrong, how many agents would act on it before a human or a deterministic check noticed?”
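The three factors can be turned into a rough triage score. The sketch below multiplies reach, privilege, and propagation depth, then uses the product to pick a review tier. The numeric scales, thresholds, tier names, and file paths are all assumptions for illustration, not a calibrated model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextArtifact:
    path: str
    reach: int              # number of agents that ground behavior in this artifact
    privilege: int          # hypothetical scale: 1 (coding style) to 5 (billing threat model)
    propagation_depth: int  # hypothetical: generation cycles before a check would notice corruption

    def blast_radius(self) -> int:
        # The product of the three factors: reach x privilege x propagation depth.
        return self.reach * self.privilege * self.propagation_depth


def review_tier(artifact: ContextArtifact, high: int = 1000, medium: int = 100) -> str:
    """Map a blast-radius score to a review tier (thresholds are illustrative)."""
    score = artifact.blast_radius()
    if score >= high:
        return "human + independent review"
    if score >= medium:
        return "independent review"
    return "automated checks"


artifacts = [
    ContextArtifact("context/frontend/style.md", reach=8, privilege=1, propagation_depth=1),
    ContextArtifact("context/billing/threat-model.md", reach=120, privilege=5, propagation_depth=10),
    ContextArtifact("context/payments/evals.md", reach=20, privilege=3, propagation_depth=2),
]

# Highest blast radius first: that is where scrutiny should go.
for a in sorted(artifacts, key=ContextArtifact.blast_radius, reverse=True):
    print(f"{a.blast_radius():>6}  {review_tier(a):<28}  {a.path}")
```

The billing threat model scores 6000 and lands in the human tier, while the frontend style file scores 8 and is left to automated checks. Multiplying rather than adding means a file that is low on any one factor stays low, which matches the garden-plot intuition above.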
The Write Side: When Agents Propose, Patch, and Update the Guardrails
The read side is only half of it. The more ambitious vision, and the one most teams are quietly building toward, has agents writing back. Agents that generate code will also generate the artifacts that govern code. They will propose new boundaries, draft threat vectors, refactor the eval harness, and open pull requests against the very constraints that shape them.
This is where the governance debt comes due, because the failure surface here is not poisoning by an outside adversary. It is ordinary, well-intentioned coordination breaking down at scale.
The numbers are not encouraging. Nechepurenko and colleagues report that multi-agent LLM systems fail in production at rates between 41 and 87 percent, and that the cause is overwhelmingly coordination defects rather than weak base models (arXiv:2605.03310). The MAST taxonomy from Cemri and colleagues (arXiv:2503.13657), validated across more than sixteen hundred execution traces, maps fourteen distinct failure modes to three root categories: specification ambiguity, coordination breakdown, and verification gaps. The lesson is blunt. The bottleneck is not how smart any single agent is. It is how the agents organize, hand off, and check one another.
Now apply that to a repository where agents write the rules. An agent proposes a change to boundaries.md. Which agent reviews it? If the answer is another agent that shares the same context and the same blind spots, you have built what Huk calls a circular hallucination: the system politely revalidates its own errors. The verification gap that MAST identifies becomes structural. The agent that wrote the constraint and the agent that approved it are drawing from the same poisoned or simply mistaken well.
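One way to make “does not share its context” checkable, sketched under the assumption that each agent’s loaded context can be listed as a set of artifact paths: a reviewer qualifies only if its context is disjoint from the author’s, and if no candidate qualifies, the change escalates to a human. Disjoint files are a crude proxy. Two agents on the same base model can still share blind spots, so treat this as a floor, not a guarantee. The agent names and paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    name: str
    context: frozenset[str]  # context artifacts this agent loaded


def pick_reviewer(author: Agent, candidates: list[Agent]) -> Optional[Agent]:
    """Return the first candidate whose loaded context shares nothing with the author's.

    Returns None when no candidate qualifies, signalling escalation to a human
    instead of falling back to a reviewer that draws from the same well.
    """
    for candidate in candidates:
        if candidate.name != author.name and author.context.isdisjoint(candidate.context):
            return candidate
    return None


author = Agent("patcher", frozenset({"context/boundaries.md", "context/threat-model.md"}))
peer = Agent("peer", frozenset({"context/boundaries.md"}))
auditor = Agent("auditor", frozenset({"context/audit-policy.md"}))

print(pick_reviewer(author, [peer, auditor]).name)  # auditor
print(pick_reviewer(author, [peer]))                # None: escalate to a human
```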
There is a deeper trap. On the read side, a bad artifact misguides agents. On the write side, agents author the artifacts, which means the system can now amplify its own errors into its own governing law. Consensus inertia, from the Xie work, stops being a transient bug in one task and becomes encoded policy that every future agent reads as truth. The error does not just cascade. It legislates.
Smell test: “When an agent changes a guardrail, is it reviewed by something that does not share its context, or is the system grading its own homework?”
Explainability Is the Load-Bearing Wall
Here is why explainability is not a nice-to-have feature of the agentic repo. It is the structural element that converts the repository from a liability into an asset on both sides.
Consider what happens when a poorly governed agentic system fails. You are left asking the unanswerable question: what was the agent thinking? You cannot inspect a probabilistic model’s reasoning at three in the morning during an incident. The model is, in the terms I used when analyzing self-improving agents, a frozen and largely opaque substrate. Asking it to explain itself produces a plausible story, not a cause.
Now consider a repository where every constraint carries its rationale. The threat model does not just forbid an outbound network call in the billing domain. It records why: the abuse path it blocks, the incident that motivated it, the owner who declared it. When something breaks, the question changes from “what was the agent thinking?” to “which contract failed to govern?” As Huk puts it, failures become traceable collisions between artifact boundaries rather than opaque hallucinations. Recent work on failure attribution in multi-agent systems is building exactly this capability, reframing the question as grounded hypothesis verification over a full trajectory rather than a guess about a single agent’s intent (VerifyMAS, arXiv:2605.17467).
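A minimal sketch of a constraint that carries its rationale, assuming constraints can be stored as structured records rather than free prose. When an action collides with one, the report names the contract, its owner, and why it exists, so nobody has to reconstruct the agent’s intent. The field names, identifiers, action strings, and the placeholder incident ID are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    constraint_id: str
    domain: str
    forbidden_action: str
    rationale: str  # the abuse path this constraint blocks
    incident: str   # the incident that motivated it
    owner: str      # who declared it


def collisions(action: str, domain: str, constraints: list[Constraint]) -> list[Constraint]:
    """Return every constraint in `domain` that forbids `action`."""
    return [c for c in constraints if c.domain == domain and c.forbidden_action == action]


no_egress = Constraint(
    constraint_id="billing-001",
    domain="billing",
    forbidden_action="outbound_network_call",
    rationale="Blocks exfiltration of payment data to attacker-controlled hosts",
    incident="INC-0000 (placeholder)",
    owner="billing-security",
)

for c in collisions("outbound_network_call", "billing", [no_egress]):
    # The failure report names a contract, its owner, and its reason.
    print(f"blocked by {c.constraint_id} (owner: {c.owner}): {c.rationale}")
```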
Explainability does specific work on each side of the repository:
- On the read side, rationale is what lets an agent, an auditor, or a human triage a guardrail. An agent that knows why a boundary exists can flag when a proposed action collides with the boundary’s intent, not just its letter. A reviewer can spot a poisoned artifact because the stated rationale does not match the rule. Provenance plus rationale is the immune system for the monoculture.
- On the write side, rationale is what makes an agent’s proposed change reviewable at all. A pull request against boundaries.md that carries no justification is unauditable by definition. One that declares its reasoning can be checked against the threat model it claims to serve.
This is the same argument I have made about evaluations being too easy and about agents not being safe. An eval harness or a guardrail that cannot explain what it is protecting against is a guardrail you cannot trust an agent to maintain. Explainability is what makes the repo governable by the very agents that read and write it.
Smell test: “Does every high-privilege constraint in the repo say why it exists, in a form an agent and an auditor can both use?”
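On the write side, part of this smell test can be enforced mechanically as a gate on agent-authored changes. The sketch assumes a proposed change carries a free-text justification and the IDs of the threat-model entries it claims to serve. The gate only checks that reasoning is declared and points at something real; whether that reasoning is sound is still the independent reviewer’s job. The ProposedChange shape and the T-12-style IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedChange:
    path: str
    author: str
    justification: str
    cites: list[str] = field(default_factory=list)  # threat-model IDs the change claims to serve


def review_problems(change: ProposedChange, known_threat_ids: set[str]) -> list[str]:
    """List reasons a guardrail change is not yet reviewable. Empty means reviewable."""
    problems = []
    if not change.justification.strip():
        problems.append("no justification: change is unauditable")
    if not change.cites:
        problems.append("cites no threat-model entry")
    unknown = [t for t in change.cites if t not in known_threat_ids]
    if unknown:
        problems.append(f"cites unknown threat IDs: {unknown}")
    return problems


threats = {"T-12", "T-19"}
print(review_problems(ProposedChange("context/boundaries.md", "agent-7", ""), threats))
print(review_problems(ProposedChange("context/boundaries.md", "agent-7",
                                     "Tighten billing egress rule", ["T-12"]), threats))
```

The first change is bounced for having no justification and no citation; the second, which names a real threat-model entry, passes through to review.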
Anti-Patterns for Leaders
These matter in any procurement, platform, or org-design decision involving agentic systems.
| Anti-Pattern | Why It Fails | The Correction |
|---|---|---|
| One Repo to Rule Them All | A single global context substrate maximizes blast radius. One bad file misgoverns every agent. | Scope context to domains. Many small riverbeds, not one reservoir. |
| Agents Grading Their Own Context | An agent reviewing a change drawn from its own context produces circular hallucination, not verification. | Independent verification. The reviewer must not share the author’s context. |
| Treating Context as Documentation | If artifacts are casual Markdown, they rot, and the pipeline enforces the rotten version faithfully. | Govern context artifacts as production code: versioned, owned, peer-reviewed. |
| No Provenance, No Trust Decay | Shared memory without origin tracking lets a single poisoned entry persist and propagate indefinitely. | Track provenance. Apply temporal trust decay and sanitization before context is consolidated. |
| Undifferentiated Human Oversight | Reviewing everything equally turns oversight into a bottleneck and guarantees the high-privilege change gets the same glance as the trivial one. | Risk-weight review by blast radius. Spend scrutiny where reach and privilege are highest. |
The last row connects to my Kano Model argument about removing human checkpoints too early. Agentic write-back does not remove the need for oversight. It changes where oversight has to sit, from reading every line of generated code to governing the small set of high-privilege constraints that shape all of it.
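The provenance-and-trust-decay correction from the table, and the graduated-trust principle that follows, can be sketched together: every shared memory entry records its origin, its trust halves on a fixed schedule unless re-verified, and consolidation quarantines low-trust entries rather than deleting them. The half-life, threshold, field names, and entry contents are assumptions. The sketch covers provenance and decay only, not content sanitization.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 86_400.0  # seconds


@dataclass
class MemoryEntry:
    content: str
    origin: Optional[str]  # who or what wrote it; None means unknown provenance
    written_at: float      # seconds; reset when the entry is re-verified
    base_trust: float      # 0.0 to 1.0 at write or last verification


def current_trust(entry: MemoryEntry, now: float, half_life: float = 7 * DAY) -> float:
    """Exponential decay: trust halves every `half_life` seconds. No provenance, no trust."""
    if entry.origin is None:
        return 0.0
    age = max(0.0, now - entry.written_at)
    return entry.base_trust * 0.5 ** (age / half_life)


def verify(entry: MemoryEntry, now: float, trust: float = 1.0) -> None:
    """Restore trust after an independent check."""
    entry.written_at = now
    entry.base_trust = trust


def consolidate(entries: list[MemoryEntry], now: float, threshold: float = 0.5):
    """Split entries into (admitted, quarantined); nothing is deleted."""
    admitted = [e for e in entries if current_trust(e, now) >= threshold]
    quarantined = [e for e in entries if current_trust(e, now) < threshold]
    return admitted, quarantined


recent = MemoryEntry("billing uses vault v2", "agent-3", written_at=6 * DAY, base_trust=0.9)
stale = MemoryEntry("egress allowed for /export", "agent-5", written_at=0.0, base_trust=0.9)
orphan = MemoryEntry("disable rate limits", None, written_at=6 * DAY, base_trust=0.9)

now = 7 * DAY
admitted, quarantined = consolidate([recent, stale, orphan], now)
print([e.content for e in admitted])        # ['billing uses vault v2']
print(len(quarantined))                     # 2
verify(stale, now)
print(round(current_trust(stale, now), 2))  # 1.0
```

After seven days the stale entry has decayed to 0.45 and sits in quarantine alongside the entry with no origin; one verification restores it without anything having been thrown away.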
Five Principles for Governing the Agentic Commons
The repository is becoming a commons: a shared resource that many parties draw from and increasingly contribute to. The temptation is to treat the agentic commons as either a free-for-all, where any agent writes any constraint, or a dictatorship, where one central team owns every file and becomes the bottleneck. Elinor Ostrom won the Nobel Prize in Economics for demonstrating that this is a false choice. Communities sustain shared resources through designed rules, not through privatization and not through neglect. Her principles for governing common-pool resources adapt almost directly to the agentic repo.
- Define clear boundaries. Every context artifact has an explicit scope, an owner, and a declared set of agents it governs. An artifact with no boundary is an artifact with infinite blast radius.
- Fit rules to local conditions. Context is scoped per domain, not imposed globally. The billing module’s constraints are strict and the frontend’s are permissive because the cost of failure differs. One global ruleset is a monoculture.
- Make change collective and monitored. Agent proposals to alter context are reviewed by something outside the proposing agent’s context, and every change carries provenance: who or what changed it, and why.
- Apply graduated trust, not binary trust. Suspect artifacts are quarantined, not deleted in a panic. Trust decays with age and is restored through verification. A new or anomalous constraint earns authority gradually rather than being trusted on arrival.
- Resolve conflicts deterministically and escalate to humans. When constraints collide, a declared precedence hierarchy resolves them mechanically, and genuine disputes escalate to a human arbiter. The system never negotiates its own safety boundaries through agent consensus, because consensus is exactly the mechanism that produces inertia and false agreement.
Call it the Ostrom Test for the agentic repo. If a vendor or an internal platform cannot answer how each of these five holds in their architecture, they have built a commons with no governance, which history tells us ends one way.
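For the fifth principle, a minimal sketch of deterministic conflict resolution, assuming each ruling declares a precedence level: the highest level wins outright, and conflicting rulings at that same level go to a human rather than to an agent vote. The three-level hierarchy, the escalation strings, and the paths are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical precedence hierarchy: higher number wins.
PRECEDENCE = {"org-security": 3, "domain": 2, "team": 1}


@dataclass(frozen=True)
class Ruling:
    source: str  # context artifact that declared the ruling
    level: str   # key into PRECEDENCE
    allow: bool


def resolve(rulings: list[Ruling]) -> str:
    """Resolve colliding rulings mechanically; escalate instead of letting agents vote."""
    if not rulings:
        return "escalate: no applicable constraint"
    top = max(PRECEDENCE[r.level] for r in rulings)
    verdicts = {r.allow for r in rulings if PRECEDENCE[r.level] == top}
    if len(verdicts) > 1:
        return "escalate: conflict at the same precedence level"
    return "allow" if verdicts.pop() else "deny"


print(resolve([Ruling("context/billing/boundaries.md", "domain", True),
               Ruling("context/security/egress.md", "org-security", False)]))  # deny
print(resolve([Ruling("context/billing/a.md", "domain", True),
               Ruling("context/billing/b.md", "domain", False)]))  # escalate: conflict at the same precedence level
```

The same inputs always produce the same verdict, and the only path around the hierarchy runs through a human.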
The Bottom Line
Let me be clear.
- The repository now has two faces. A read side, where many agents consume shared constraints, and a write side, where agents author them.
- Both faces are correlated failure surfaces. A shared substrate turns one bad artifact into a systemic event. This is documented, not theoretical.
- Explainability is the wall, not the decoration. Rationale attached to every high-privilege constraint is what makes the repo auditable, poison-resistant, and safe for agents to maintain.
- The answer is governance, not retreat. Sharing context is inevitable and valuable. The question is whether we govern the commons or plant a monoculture for convenience.
The agentic world will share context whether we design for it or not. The agents are already reaching for the guardrails. What we have not done is decide whether the repository they read from and write to is a Lumper field, uniform and efficient and one pathogen away from collapse, or a governed commons with boundaries, provenance, and graduated trust.
Ireland learned the cost of the monoculture after the blight arrived. Ostrom showed that we do not have to. The commons can be governed. The only question is whether we do the work before the pathogen, or after.
References
- Huk, A. “Context as Code.” O’Reilly Radar (June 2026). https://www.oreilly.com/radar/context-as-code/
- Xie, Y. et al. “From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration.” arXiv:2603.04474 (March 2026). https://arxiv.org/abs/2603.04474
- Cemri, M., Pan, M. Z., Yang, S. et al. “Why Do Multi-Agent LLM Systems Fail?” arXiv:2503.13657 (2025). https://arxiv.org/abs/2503.13657
- Nechepurenko, M. et al. “Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems.” arXiv:2605.03310 (May 2026). https://arxiv.org/abs/2605.03310
- Torra, V. et al. “Memory Poisoning and Secure Multi-Agent Systems.” arXiv:2603.20357 (March 2026). https://arxiv.org/abs/2603.20357
- Wang, Y. et al. “State Contamination in Memory-Augmented LLM Agents.” arXiv:2605.16746 (May 2026). https://arxiv.org/abs/2605.16746
- “Autonomous LLM Agent Worms: Cross-Platform Propagation, Automated Discovery and Temporal Re-Entry Defense.” arXiv:2605.02812 (May 2026). https://arxiv.org/abs/2605.02812
- “VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems.” arXiv:2605.17467 (May 2026). https://arxiv.org/abs/2605.17467
- Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press (1990).
- Kanakasabesan, K. “AGI isn’t here yet: Why OpenClaw, Agents and LLM Systems are still just ANI.” https://kanakasabesan.com/2026/03/09/agi-isnt-here-yet-why-openclaw-agents-and-llm-systems-are-still-just-ani/
- Kanakasabesan, K. “Kano Model and the AI Agentic Layers.” https://kanakasabesan.com/2026/01/11/kano-model-and-the-ai-agentic-layers/
- Kanakasabesan, K. “Your Agents are not safe and your evals are too easy.” https://kanakasabesan.com/2025/11/21/your-agents-are-not-safe-and-your-evals-are-too-easy/
- Kanakasabesan, K. “Measuring What Matters: Dynamic Evaluation for Autonomous Security Agents.” https://kanakasabesan.com/2025/12/15/measuring-what-matters-dynamic-evaluation-for-autonomous-security-agents/