The Map Is Becoming the Territory: What the Repository Becomes in an Agentic World

From a Record of What We Built to the Interface Agents Act Through


In 1494, two empires divided a world neither had finished exploring. The Treaty of Tordesillas drew a meridian on a chart, 370 leagues west of Cape Verde, and declared everything east of it Portuguese and everything west of it Castilian. No fleet had crossed most of that line. Yet the line governed where ships could sail, which conquests were legitimate, and who could trade with whom for the next century. The map stopped describing the world. It started dictating action in it.

Alfred Korzybski gave us the warning that the map is not the territory. He meant it as a caution against confusing a representation with the reality it stands for. In the agentic world, that caution is quietly inverting. The repository was always a map: a record of what humans had already built. It is becoming the territory itself, the live surface that agents inhabit and act through. When an agent consults boundaries.md before generating a payment service, it is not reading a description of the system. It is reading the system.

I argued in my last post, The Repository Has a Read Side and a Write Side, that this shared substrate is a correlated failure surface and that we must govern it as a commons. That post asked how we govern the substrate. This one asks a prior and more structural question: what is the repository now? The answer is that it is no longer a noun. It is becoming a verb.


The Old Repo Was a Filing Cabinet

For fifty years the repository was a record. It stored what humans wrote, tracked who changed it, and handed the result to a compiler. The authority lived in people. Pull requests were reviewed by humans, merged by humans, and reasoned about by humans. The repo was passive infrastructure, valuable precisely because it sat still and remembered.

Artur Huk’s “Context as Code” names the shift bluntly. The most strategically valuable material in a repository may no longer live in src/. It lives in /context, where intent, boundaries, and threat models are declared before a line is generated. Huk’s frame is build-time governance: assemble the agent’s working context from prioritized artifacts, then enforce declared boundaries with deterministic checks so structurally invalid code cannot merge. The senior engineer’s new job, he argues, is declarative boundary engineering: stating what the system is forbidden from doing.

That is correct, and it understates the consequence. If context is the input that determines agent behavior, then the repository is no longer where we keep the system. It is where the system runs from. The map has become the territory.

| Attribute | Repo as Filing Cabinet | Repo as Live Territory |
|---|---|---|
| Primary consumer | Human developers | Agents at generation time, humans at review time |
| Most valuable contents | src/ | /context: intent, boundaries, threat models, evals |
| Function | Stores what was built | Determines what gets built next |
| Read pattern | Occasional, by people | Continuous, by many agents |
| Authority | People reviewing diffs | Declared artifacts enforced deterministically |
| Failure of a bad entry | One buggy release | A propagated constraint that misgoverns a fleet |

That last row is why the role change matters and is not a vocabulary game. When the artifact is a record, a mistake ships once. When the artifact is the territory, a mistake is the ground every agent walks on.


The Agentic Repository Stack

The repository’s role does not change in just one way. It absorbs four distinct functions it never held before, bound together by a fifth concern that runs through all of them, and it acquires an economic shape that product leaders cannot ignore. Call it the Agentic Repository Stack.

| Layer | What the repo becomes | What it serves | Grounded in |
|---|---|---|---|
| 1. Context Server | A source of operating instruction | Intent, boundaries, threat models, eval harnesses | Huk, MCP |
| 2. Collaboration Space | A coordination hub | Agent proposals, patches, eval runs | MAST, coordination-layer work |
| 3. Executable Governance | An enforcement plane | Machine-interpretable policy | Semgrep, OPA, CodeQL |
| 4. Provenance Authority | A chain of custody | Signed authorship, agent identity, trust | SLSA, Sigstore |
| Spine. Explainability | A query interface | The rationale behind every constraint | VerifyMAS, failure attribution |
| Economics. Context as Product | A capitalized asset | Owners, SLAs, and context debt | Product discipline |

The four layers are the new functions. Explainability is the spine that makes them safe. The economics are what turn the whole thing from an architecture diagram into a budget line. Take them in order.


Layer 1: The Repo Becomes a Context Server

Agents do not want files. They want context, scoped to the task in front of them, delivered at the moment of generation. A coding agent asked to add a payment notification needs the billing domain’s boundaries, its threat model, and its acceptance criteria, and it needs them assembled, prioritized, and conflict-resolved before inference begins. That is not a filing cabinet operation. It is a server operation.

The Model Context Protocol is the early plumbing for exactly this: a way for an agent to query the workspace and assemble the boundaries that apply, without a human manually pasting Markdown into a prompt. The repo stops being something you check out and becomes something you query.

The benefit is real and is the strongest case for context as code. One reviewed threat model can ground a hundred downstream generation cycles with a consistency no per-agent prompting can match. The cost is equally real. A served interface needs a schema. Context manifests, scoping rules, and freshness guarantees stop being nice-to-haves and become the contract. And as I argued in Your LLM Has a State Management Problem, the moment context is served rather than stored, it inherits every coherence problem that caching solved twenty years ago. Stale context is a stale cache read, and the agent cannot tell.
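To make "assembled, prioritized, and conflict-resolved" concrete, here is a minimal sketch of what a context server might do before inference. Everything in it is an assumption for illustration: the `ContextArtifact` fields, the `assemble_context` function, the 90-day freshness window, and the sample paths are hypothetical and are not part of MCP or of Huk’s article.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ContextArtifact:
    path: str            # hypothetical location under /context
    domain: str          # scope the artifact applies to, e.g. "billing"
    kind: str            # "boundaries", "threat_model", "acceptance", ...
    priority: int        # lower number wins when two artifacts conflict
    last_reviewed: date  # input to the freshness guarantee


def assemble_context(artifacts, domain, today, max_age_days=90):
    """Return (fresh, stale) artifacts for one task domain.

    Fresh artifacts are scoped to the domain, deduplicated by kind
    (lowest priority number wins), and ordered by priority. Stale ones
    are returned separately so the caller can refuse or warn instead of
    silently serving an expired rule.
    """
    cutoff = today - timedelta(days=max_age_days)
    chosen = {}
    stale = []
    for art in sorted(artifacts, key=lambda a: a.priority):
        if art.domain not in (domain, "global"):
            continue  # scoping: skip context that belongs to other domains
        if art.last_reviewed < cutoff:
            stale.append(art)  # freshness: never serve this as if current
            continue
        chosen.setdefault(art.kind, art)  # conflict resolution by priority
    return list(chosen.values()), stale


# Hypothetical sample data, invented for this example.
SAMPLE = [
    ContextArtifact("context/billing/boundaries.md", "billing", "boundaries", 1, date(2026, 5, 1)),
    ContextArtifact("context/global/boundaries.md", "global", "boundaries", 5, date(2026, 5, 20)),
    ContextArtifact("context/billing/threat_model.md", "billing", "threat_model", 2, date(2025, 11, 1)),
    ContextArtifact("context/search/boundaries.md", "search", "boundaries", 1, date(2026, 5, 1)),
]

fresh, stale = assemble_context(SAMPLE, "billing", today=date(2026, 6, 10))
print("serve:", [a.path for a in fresh])
print("stale:", [a.path for a in stale])
```

The design choice worth noticing is that stale artifacts are surfaced rather than dropped. The agent cannot tell a stale read from a fresh one, so the server has to.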

Smell test: “Is your context something an agent queries at generation time, or something a human pastes into a prompt by hand?”


Layer 2: The Repo Becomes a Multi-Agent Collaboration Space

The more ambitious move, and the one most teams are quietly building toward, has agents writing back. Agents that generate code will also propose boundaries, draft threat vectors, and refactor the eval harness. The repository becomes a coordination hub where agents and humans negotiate the rules together.

This is where the published failure data should temper the enthusiasm. The MAST taxonomy from Cemri and colleagues (arXiv:2503.13657), validated across more than sixteen hundred execution traces, maps fourteen failure modes to three root causes: specification ambiguity, coordination breakdown, and verification gaps. Nechepurenko and colleagues (arXiv:2605.03310) report that multi-agent systems fail in production overwhelmingly because of coordination defects, not weak base models. The bottleneck is not how smart any single agent is. It is how the agents hand off and check one another.

Apply that to a repo where agents author the rules. If an agent proposes a change to boundaries.md and another agent that shares its context approves it, you have built what Huk calls a circular hallucination: the system politely revalidates its own blind spot. Collaboration without independent verification is not collaboration. It is consensus inertia wearing a pull request.
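One way to make "does not share its context" checkable is to compare what the author and the reviewer actually loaded. The sketch below is an assumption-heavy illustration, not a method from Huk or the MAST paper: the Jaccard measure, the 0.5 threshold, the rule that a human approval always counts, and the function names are all hypothetical.

```python
def context_overlap(author_ctx, reviewer_ctx):
    """Jaccard overlap between two sets of loaded context artifact ids."""
    author_ctx, reviewer_ctx = set(author_ctx), set(reviewer_ctx)
    union = author_ctx | reviewer_ctx
    if not union:
        return 0.0
    return len(author_ctx & reviewer_ctx) / len(union)


def approval_counts(author_ctx, reviewer_ctx, reviewer_is_human=False, max_overlap=0.5):
    """Decide whether an approval of a guardrail change is independent.

    Assumption: a human approval always counts. An agent approval counts
    only when its loaded context overlaps the author's by at most
    max_overlap.
    """
    if reviewer_is_human:
        return True
    return context_overlap(author_ctx, reviewer_ctx) <= max_overlap


author = {"boundaries.md", "threat_model.md", "style.md"}
same_context_agent = {"boundaries.md", "threat_model.md", "style.md"}
outside_agent = {"style.md", "audit_policy.md", "incident_log.md", "evals.md"}

print(approval_counts(author, same_context_agent))  # identical context
print(approval_counts(author, outside_agent))       # mostly disjoint context
```

A set-overlap threshold is a crude proxy. It says nothing about whether the two agents share a base model or a prompt template, which can correlate their blind spots just as well.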

Smell test: “When an agent changes a guardrail, is it reviewed by something that does not share its context, or is the system grading its own homework?”


Layer 3: The Repo Becomes the Executable Governance Plane

A guardrail written only in prose is a suggestion. A guardrail compiled into a deterministic rule is a law. The repository’s third new role is to hold both: the natural-language artifact that biases the agent’s generation and the machine-interpretable policy that rejects violations mechanically.

Huk’s pattern pairs a Markdown boundaries.md with a semgrep-rule.yml so the same boundary that guides the model also fails the build deterministically. The tools here are mature and decidedly not AI: Semgrep, Bandit, and CodeQL for code-level invariants, Open Policy Agent and Rego for runtime and infrastructure policy. We do not ask a probabilistic model to certify that a boundary survived generation. We execute a prewritten rule that a human reviewed.
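To show what "fails the build deterministically" means in practice, here is a small stand-in written with Python’s standard `ast` module. It is not Semgrep and does not use Semgrep’s rule syntax. The `FORBIDDEN_IMPORTS` table, the `forbidden_imports` function, and the sample source are hypothetical, chosen only to mirror the idea of a billing boundary that forbids outbound network libraries.

```python
import ast

# Hypothetical boundary: modules the billing domain may not import.
FORBIDDEN_IMPORTS = {"billing": {"requests", "socket", "urllib"}}


def forbidden_imports(source: str, domain: str):
    """Return sorted (line, module) pairs for imports the domain forbids."""
    banned = FORBIDDEN_IMPORTS.get(domain, set())
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            if name.split(".")[0] in banned:
                hits.append((node.lineno, name))
    return sorted(hits)


GENERATED = "import json\nimport requests\nfrom urllib.request import urlopen\n"

violations = forbidden_imports(GENERATED, "billing")
for line, module in violations:
    print(f"line {line}: import of '{module}' is forbidden in billing")

# A CI step would turn a non-empty result into a failing exit code,
# for example: raise SystemExit(1 if violations else 0)
```

No model is consulted anywhere in this check. The same input always produces the same verdict, which is the whole point of the layer.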

This is the layer that lets a leader simulate a policy change before applying it, version the policy like production code, and prove compliance rather than assert it. It is also the layer most likely to ossify. As I noted in Foundation First, Not AI First, deterministic enforcement is what makes the intelligence reliable, but determinism enforces only what was explicitly declared. It can block a forbidden import. It cannot judge whether the architecture is sound. A repo that mistakes its rule set for its judgment will enforce the wrong thing perfectly.

Smell test: “Can every high-privilege guardrail in your repo fail a build on its own, or does it depend on a human or a model noticing the violation?”


Layer 4: The Repo Becomes the Provenance Authority

When humans were the only authors, provenance was a courtesy. Git blame told you which colleague to ask. When agents author artifacts, provenance becomes a safety system. You need to know whether a constraint was declared by a named architect, generated by an agent, or quietly mutated by a process no one is watching.

This is where software supply-chain security stops being hygiene and becomes governance. Signed commits, SLSA provenance, and Sigstore attestation give the repository a chain of custody: who or what produced this artifact, under what authority, and whether the signature verifies. That chain is what lets you apply graduated trust instead of binary trust. A new or anomalous constraint earns authority gradually. A suspect one is quarantined, not trusted on arrival.
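The difference between binary and graduated trust can be sketched in a few lines. This example uses a standard-library HMAC as a stand-in for real signature verification. It is not Sigstore or SLSA and should not be read as their API. The trust ceilings, the 30-day ramp, and the function names are assumptions made up for illustration.

```python
import hashlib
import hmac

# Hypothetical ceilings: how much authority each author type can ever earn.
TRUST_CEILING = {"human": 1.0, "ci": 0.8, "agent": 0.5}
RAMP_DAYS = 30  # hypothetical period over which a new artifact earns trust


def sign(body: bytes, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 of the artifact body, as hex."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def trust_level(body: bytes, signature: str, key: bytes,
                author_kind: str, clean_days: int) -> float:
    """Return a trust score in [0, 1]. A score of 0.0 means quarantine."""
    if not hmac.compare_digest(sign(body, key), signature):
        return 0.0  # signature does not verify: quarantine, do not trust
    ceiling = TRUST_CEILING.get(author_kind, 0.0)  # unknown authors get none
    # Authority is earned gradually, then capped at the author's ceiling.
    return ceiling * min(max(clean_days, 0), RAMP_DAYS) / RAMP_DAYS


key = b"demo-key-not-a-real-secret"
rule = b"billing: no outbound network calls"
sig = sign(rule, key)

print(trust_level(rule, sig, key, "agent", clean_days=15))
print(trust_level(rule + b" (edited)", sig, key, "human", clean_days=400))
```

The second call is the case that matters: an artifact that was mutated after signing gets no trust at all, regardless of who the claimed author is or how long it has been in the repository.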

The honest difficulty is that identity for non-human authors is not a solved problem. We have decades of tooling for attesting that a human or a CI system produced an artifact. We have very little for attesting that this specific agent, operating under this specific policy version, produced it, and that its authority to do so had not been poisoned upstream. Provenance is the layer most likely to be theater: signatures collected and never verified. A chain of custody no one checks is a chain of custody that does not exist.

Smell test: “For every artifact that governs agent behavior, can you name the author, verify the signature, and say how much you trust it today?”


The Spine: Explainability Is the Query Interface, Not a Footnote

Here is why explainability is not a feature bolted onto the agentic repo. It is the legend on the map. A territory you cannot read is a territory you cannot navigate, and a constraint whose reason is unknown is a constraint no agent can apply intelligently and no auditor can trust.

Consider the difference rationale makes on each layer. A threat model that forbids an outbound network call in the billing domain is a rule. A threat model that also records why (the abuse path it blocks, the incident that motivated it, the owner who declared it) is something an agent can reason against. An agent that knows why a boundary exists can flag an action that collides with the boundary’s intent, not merely its letter. A reviewer can spot a poisoned artifact because the stated rationale no longer matches the rule. When a system fails, the question shifts from the unanswerable “what was the agent thinking” to the tractable “which contract failed to govern.” Recent work on failure attribution in multi-agent systems, such as VerifyMAS (arXiv:2605.17467), is building exactly this: failures reframed as verifiable hypotheses over a full trajectory rather than guesses about one agent’s intent.

This is the same argument I made in Your Agents Are Not Safe and Your Evals Are Too Easy and in Measuring What Matters. A guardrail that cannot explain what it protects against is a guardrail you cannot trust an agent to maintain. Explainability is what makes the other four layers governable by the very agents that read and write them. Without it, the context server serves rules no one can audit, the collaboration space produces changes no one can review, the governance plane enforces logic no one can question, and the provenance authority attests to artifacts whose purpose is opaque. The legend is what makes the territory usable.

Smell test: “Does every high-privilege artifact say why it exists, in a form an agent and an auditor can both use?”
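That smell test can itself be made mechanical. The sketch below lints a context artifact’s metadata for a rationale. The field names (`privilege`, `rationale`, `why`, `owner`, `incident`), the sample values, and the dictionary shape are hypothetical. I am assuming the metadata has already been parsed from wherever your artifacts keep it.

```python
REQUIRED_RATIONALE_FIELDS = ("why", "owner", "incident")


def missing_rationale(artifact: dict) -> list:
    """Return the rationale fields a high-privilege artifact fails to state.

    Low-privilege artifacts are exempt in this sketch. A field counts as
    present only if it is a non-blank string.
    """
    if artifact.get("privilege") != "high":
        return []
    rationale = artifact.get("rationale") or {}
    missing = []
    for field in REQUIRED_RATIONALE_FIELDS:
        value = rationale.get(field)
        if not isinstance(value, str) or not value.strip():
            missing.append(field)
    return missing


billing_rule = {
    "path": "context/billing/boundaries.md",
    "privilege": "high",
    "rule": "no outbound network calls",
    "rationale": {
        "why": "blocks exfiltration of card data through a notification hook",
        "owner": "billing-security",
        "incident": "",  # nobody recorded the motivating incident
    },
}

print(missing_rationale(billing_rule))
```

A check like this proves only that a rationale was written down. Whether the stated reason still matches the rule is the semantic question a reviewer has to answer.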


The Economics: Context Becomes a Product, and Context Debt Becomes a P&L Line

The product-strategist conclusion is the one most engineering writing on this topic skips. If /context determines behavior, it is not documentation. It is a product. It has consumers (agents and humans), a contract (the schema and the rationale), and a failure cost measured in misgoverned generation cycles.

That reframing has teeth. Documentation is treated as free, optional, and perpetually deferred. A product is owned, versioned, reviewed, and budgeted. The difference shows up the moment context goes stale. Huk’s term for this is context debt, and his observation is sharp: the pipeline enforces strictly whatever was declared, even when the declaration is wrong. Stale context is worse than no context, because it carries the authority of a rule while encoding an obsolete decision. That is a liability, and like any liability it belongs on the balance sheet, not in a backlog labeled “tech debt, someday.”

This connects to the argument I made about agentic layers in the Kano Model post. In Kano terms, context freshness behaves like a must-be attribute: no one writes it on a requirements list, and everyone notices the instant it degrades. The product leader’s job is to assign owners to high-privilege context, set review SLAs weighted by blast radius, and treat the rate of context debt accrual as a metric with the same standing as latency or cost.
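A review SLA weighted by blast radius is simple enough to compute. The sketch below is one possible shape for it. The interval table, the field names, the rule that an ownerless artifact is always flagged, and the sample artifacts are assumptions for illustration, not figures from Huk or from any standard.

```python
from datetime import date

# Hypothetical policy: the wider the blast radius, the shorter the interval.
REVIEW_INTERVAL_DAYS = {"fleet": 30, "domain": 90, "service": 180}


def context_debt(artifacts, today):
    """Return (path, days_overdue, owner) for artifacts needing attention.

    An artifact is flagged when it is past its review interval or has no
    named owner. Results are ordered by days overdue, worst first.
    """
    flagged = []
    for art in artifacts:
        allowed = REVIEW_INTERVAL_DAYS[art["blast_radius"]]
        age = (today - art["last_reviewed"]).days
        if art.get("owner") is None or age > allowed:
            flagged.append((art["path"], max(age - allowed, 0), art.get("owner")))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


INVENTORY = [
    {"path": "context/global/boundaries.md", "blast_radius": "fleet",
     "last_reviewed": date(2026, 4, 1), "owner": "platform-arch"},
    {"path": "context/billing/threat_model.md", "blast_radius": "domain",
     "last_reviewed": date(2026, 5, 1), "owner": "billing-sec"},
    {"path": "context/search/boundaries.md", "blast_radius": "service",
     "last_reviewed": date(2026, 5, 15), "owner": None},
]

for path, days_overdue, owner in context_debt(INVENTORY, today=date(2026, 6, 10)):
    print(path, days_overdue, owner or "NO OWNER")
```

Tracked over time, the length of this list and the sum of its overdue days give a rough accrual rate for context debt, which is the number I am arguing belongs next to latency and cost.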

Smell test: “Does your most consequential context artifact have a named owner, a review cadence, and a place in someone’s budget, or is it an orphaned Markdown file with infinite blast radius?”


Anti-Patterns for Leaders

These matter in any platform, procurement, or org-design decision involving agentic systems.

| Anti-Pattern | Why It Fails | The Correction |
|---|---|---|
| Repo as Documentation | Casual Markdown rots, and the pipeline enforces the rotten version faithfully | Govern context as production code: owned, versioned, peer-reviewed |
| Context Without a Schema | A served interface with no manifest is a cache with no coherence policy | Define context manifests, scoping, and freshness guarantees |
| Agents Grading Their Own Context | A reviewer that shares the author’s context produces circular hallucination | Independent verification by something outside the author’s context |
| Rules Without Reasons | A guardrail with no rationale cannot be applied by an agent or trusted by an auditor | Attach why, owner, and motivating incident to every high-privilege artifact |
| Provenance Theater | Signatures collected and never verified are no chain of custody at all | Verify attestation; apply graduated trust and decay |
| Determinism as Judgment | Static checks prove compliance with declared invariants, not architectural correctness | Keep humans on the semantics; automate only what is mechanically decidable |

The last row connects to the warning that has run through this canon since the ANI classification post. Enforcement is not cognition. A repository that can mechanically reject a forbidden import is not a repository that understands your architecture. Mistaking the first for the second is how teams over-delegate autonomy to a system that is enforcing yesterday’s decision with today’s confidence.


The Bottom Line

Let me be clear.

  • The repository has changed role, not just contents. It was a record of what we built. It is becoming the interface agents act through.
  • It absorbs four new functions: context server, collaboration space, executable governance, and provenance authority.
  • Explainability is the spine that makes those functions safe. Rationale is the legend without which the territory cannot be read or trusted.
  • Context is now a product with a real liability. Context debt belongs on the balance sheet, owned and reviewed by blast radius, not deferred as documentation.

Korzybski told us the map is not the territory, and he was right about representations. But the warning was about confusing the two by accident. We are now doing it on purpose, by design, at scale. We are building repositories that agents do not consult to understand the system. They consult them to be the system. The line on the chart governs the fleets before they sail.

The Treaty of Tordesillas held until reality, the parts of the world the mapmakers had never seen, finally forced a redrawing. Our /context directories will face the same test. The question for every leader building toward this future is whether the map you are drawing today is one you will be willing to act on tomorrow, when a thousand agents treat it not as a description of your system but as the ground they stand on.

Draw it as if it were the territory. Because it is becoming exactly that.


References

  1. Huk, A. “Context as Code.” O’Reilly Radar (June 2026). https://www.oreilly.com/radar/context-as-code/
  2. Cemri, M., Pan, M. Z., Yang, S. et al. “Why Do Multi-Agent LLM Systems Fail?” arXiv:2503.13657 (2025). https://arxiv.org/abs/2503.13657
  3. Nechepurenko, M. et al. “Coordination as an Architectural Layer for LLM-Based Multi-Agent Systems.” arXiv:2605.03310 (May 2026). https://arxiv.org/abs/2605.03310
  4. “VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems.” arXiv:2605.17467 (May 2026). https://arxiv.org/abs/2605.17467
  5. Milosevic, Z. and Odell, J. “Architecting Agentic Communities using Design Patterns.” arXiv:2601.03624 (January 2026). https://arxiv.org/abs/2601.03624
  6. Korzybski, A. Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics. (1933).
  7. Kanakasabesan, K. “The Repository Has a Read Side and a Write Side: Governing the Agentic Commons.” https://kanakasabesan.com/2026/06/04/the-repository-has-a-read-side-and-a-write-side-governing-the-agentic-commons/
  8. Kanakasabesan, K. “Foundation First, Not AI First.” https://kanakasabesan.com/2026/05/04/foundation-first-not-ai-first/
  9. Kanakasabesan, K. “Your LLM Has a State Management Problem. Distributed Systems Solved It in 2005.” https://kanakasabesan.com/2026/05/25/your-llm-has-a-state-management-problem-distributed-systems-solved-it-in-2005/
  10. Kanakasabesan, K. “Kano Model and the AI Agentic Layers.” https://kanakasabesan.com/2026/01/11/kano-model-and-the-ai-agentic-layers/
  11. Kanakasabesan, K. “Your Agents Are Not Safe and Your Evals Are Too Easy.” https://kanakasabesan.com/2025/11/21/your-agents-are-not-safe-and-your-evals-are-too-easy/
  12. Kanakasabesan, K. “Measuring What Matters: Dynamic Evaluation for Autonomous Security Agents.” https://kanakasabesan.com/2025/12/15/measuring-what-matters-dynamic-evaluation-for-autonomous-security-agents/
  13. Kanakasabesan, K. “AGI isn’t here yet: Why OpenClaw, Agents and LLM Systems are still just ANI.” https://kanakasabesan.com/2026/03/09/agi-isnt-here-yet-why-openclaw-agents-and-llm-systems-are-still-just-ani/