A Kantian Lens on Machine Intelligence
In March 2026, Meta published a paper on hyperagents: systems that rewrite the mechanism by which they improve themselves. Performance compounds across runs. Meta-level gains transfer across domains. The system improves its own improvement process.
If that doesn’t sound like AGI, I don’t know what does.
Except it isn’t. And a philosopher who died in 1804 already explained why.
What Are Hyperagents?
To understand why hyperagents matter and why they do not change the classification of intelligence, we need to trace a brief lineage.
Darwin Gödel Machine (DGM)
Published in May 2025 by Zhang et al., the Darwin Gödel Machine is a coding agent that iteratively modifies its own source code and empirically validates changes against benchmarks. It maintains an archive of agent variants through Darwinian selection, where successful modifications survive and unsuccessful ones are pruned. On SWE-bench, it improved from 20.0% to 50.0% through self-modification.
The key structural insight: because the task domain (coding) and the self-modification mechanism (also coding) share the same medium, improvements in one naturally feed the other. Better coding ability produces better self-modification, which produces better coding ability. This virtuous cycle is real and measurable.
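The archive-plus-selection structure is simple enough to sketch. The following is not the authors' implementation, just a minimal illustration of the loop described above; every name in it is hypothetical:

```python
import random

def dgm_loop(seed_agent, benchmark, mutate, generations=10):
    """Minimal sketch of a DGM-style loop: an archive of agent variants
    evolves under benchmark selection. Illustrative names only."""
    archive = [(seed_agent, benchmark(seed_agent))]
    for _ in range(generations):
        parent, _ = random.choice(archive)      # sample a parent from the archive
        child = mutate(parent)                  # the agent modifies its own code
        score = benchmark(child)                # empirical validation, not proof
        if score > min(s for _, s in archive):  # Darwinian pruning: keep what helps
            archive.append((child, score))
    return max(archive, key=lambda pair: pair[1])
```

The structural point to notice is that `mutate` itself is fixed: the agents in the archive evolve, but the mechanism that generates improvements does not.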
DGM-Hyperagents (DGM-H)
Published in March 2026, hyperagents address a specific limitation of DGM. In the original system, the meta-mechanism (the process for generating improvements) was hand-crafted and fixed by human designers. DGM-H makes the meta-mechanism itself editable. The system now merges a task agent and a meta agent into a single self-modifiable program.
The authors call this “metacognitive self-modification.” Meta-level improvements, such as persistent memory, performance tracking, and improved code editing strategies, transfer across fundamentally different domains, including coding, academic paper review, and robotics reward function design. These improvements accumulate across runs.
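In code terms, the shift is that the improvement mechanism moves inside the editable program. A minimal sketch of that structure (illustrative only; none of these names come from the paper):

```python
class Hyperagent:
    """Sketch: one program carrying both its task logic and the mechanism
    that improves it. Because 'improve' lives inside `source`, the system
    can rewrite the rewriter -- the defining move of DGM-H."""

    def __init__(self, source: dict):
        # In DGM, the 'improve' entry was hand-crafted and fixed;
        # here it is editable data like any other component.
        self.source = source

    def solve(self, task):
        return self.source["solve"](task)

    def improve(self):
        # The meta-mechanism may return edits to ANY component,
        # including the meta-mechanism itself.
        edits = self.source["improve"](self.source)
        return Hyperagent({**self.source, **edits})
```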
At a Glance
| Attribute | DGM (2025) | DGM-H / Hyperagent (2026) |
|---|---|---|
| Self-modifies code | Yes | Yes |
| Meta-mechanism editable | No (hand-crafted) | Yes (self-referential) |
| Cross-domain transfer | No (coding only) | Yes (coding, review, robotics) |
| Accumulates across runs | Limited | Yes |
| Foundation model modified | No (frozen) | No (frozen) |
That last row is the entire argument. Hold it for now. We will return to it.
Why This Looks Like AGI: The Strongest Case
Intellectual honesty requires acknowledging the strength of the counterargument before engaging with it.
In my earlier post on ANI classification, I established four criteria for AGI:
- Learning new domains from raw data without explicit programming
- Transferring reasoning across unrelated disciplines
- Generating and refining internal context models autonomously
- Forming and pursuing long-term goals without human direction
On the surface, hyperagents appear to make progress on criteria one and two. The system does improve across domains. The meta-level improvements do accumulate without being explicitly programmed. For the first time, there is a credible research artifact that seems to blur the line between narrow optimization and general capability.
This is the strongest counterargument to the ANI thesis that has appeared in the literature. It deserves a rigorous response.
Enter Kant: The Critique of Pure Reason as an Evaluation Framework
Why Kant?
Immanuel Kant’s Critique of Pure Reason (1781) asked a question that maps directly to the AGI debate: What are the conditions that make knowledge possible?
Kant was not asking what we know. He was asking what must be true about a knowing entity for knowledge to exist at all. That question, about the preconditions for intelligence rather than its outputs, is exactly the question we should be asking about self-improving agents.
Most commentary on hyperagents focuses on what the system does. Kant’s framework forces us to ask what the system is. And that distinction is the one that matters.
The Four Kantian Tests for Intelligence
Test 1: Analytic versus Synthetic Judgment
Kant distinguished between two types of knowledge.
Analytic judgments are those where the predicate is already contained in the subject. They decompose what is already known. “All bodies are extended” is analytic because extension is part of the definition of “body.” These judgments clarify, but they do not extend knowledge.
Synthetic judgments add something new. “All bodies are heavy” is synthetic because weight is not contained in the concept of body itself. You need to go beyond the concept to learn something the concept alone does not contain.
Kant’s central question concerned synthetic a priori knowledge: knowledge that is both genuinely new (synthetic) and necessarily true independent of any particular experience (a priori). He argued that mathematics, the foundations of natural science, and the preconditions for experience itself all belong to this category.
Application to Hyperagents: DGM-H operates almost entirely within the domain of analytic judgment. When the system rewrites its code to add persistent memory or improve its editing tools, it is decomposing and recombining patterns that already exist within its training distribution. The foundation model’s representations, the “concepts” it possesses, remain fixed. The scaffolding improvements are analytic rearrangements of known capabilities.
A system capable of synthetic judgment would generate knowledge that is not contained in its existing representations. It would discover that “bodies are heavy” without having the concept of weight anywhere in its training data. DGM-H does not do this. It recombines existing knowledge more efficiently. This is sophisticated analysis, not synthesis.
Smell test: “Does the system discover knowledge that was not already latent in its foundation model? Or does it rearrange what the model already knows?”
Test 2: Phenomena versus Noumena, the Boundary of Knowable Reality
Kant argued that human knowledge is limited to phenomena: the world as it appears to us through the structures of our perception (space, time, and the categories of understanding). Behind appearances lies the noumenon, the thing-in-itself, which remains fundamentally unknowable.
This is not a limitation to be overcome. It is a structural boundary of cognition itself.
Application to Hyperagents: The foundation model is the noumenal boundary of the hyperagent system. The system interacts with token representations, benchmark scores, and code outputs. All of these are phenomena. It can modify how it processes these appearances through better tools, better prompts, and better memory. But it cannot reach through to modify the foundation model itself, the cognitive substrate that determines what can appear in the first place.
DGM-H optimizes within the phenomenal world of its own outputs. It has no access to its own noumenon. The weights are frozen. The representations are fixed. The boundary cannot be crossed through scaffolding improvements, no matter how recursive those improvements become.
When Kant said that concepts without intuitions are empty, he meant that pure logical manipulation without grounded experience cannot produce real knowledge. A hyperagent that rewrites its own orchestration code without modifying its representational substrate is manipulating concepts without changing the intuitions that ground them.
Smell test: “Is the system modifying how it perceives, or only how it organizes what it already perceives?”
Test 3: The Transcendental Unity of Apperception, the Missing “I Think”
This is perhaps Kant’s most profound contribution to the question of intelligence. Kant argued that all experience requires a transcendental unity of apperception: the “I think” that must be capable of accompanying all representations. Without a unified self that integrates perceptions into a coherent whole, there is no experience, no knowledge, and no cognition.
This is not consciousness in the mystical sense. It is a structural requirement: for knowledge to cohere, there must be a unifying perspective that holds representations together across time.
Application to Hyperagents: DGM-H maintains an archive of agent variants. Each variant is evaluated independently. There is no unified “self” that integrates the experience of all variants into coherent understanding. The system is a population of narrow programs being selected by benchmark fitness, not a single entity that learns from its accumulated experience in a unified way.
When a hyperagent “transfers” a meta-level improvement from the coding domain to the robotics domain, it is not a single intelligence applying cross-domain reasoning. It is a code pattern being reused in a different context. There is no “I think” that accompanies the transfer. There is no unified apperception binding the coding experience to the robotics experience into a single coherent worldview.
Darwinian selection and coherent self-awareness are fundamentally different mechanisms. Evolution produces fit organisms. It does not produce a single organism that understands why it is fit.
Smell test: “Is there a unified perspective that integrates learning across domains, or is there a population of narrow variants being selected by an external fitness function?”
Test 4: The Limits of Pure Reason, Why Recursive Optimization Has a Ceiling
Kant’s Critique was, at its core, an argument about limits. Pure reason, meaning reasoning without grounded experience, cannot extend knowledge beyond the bounds of possible experience. When reason attempts to do so (Kant called these attempts “transcendental illusions”), it generates contradictions and paradoxes, not knowledge.
Application to Hyperagents: The evaluation function is the bound of possible experience for DGM-H. The system can only know what the benchmark measures. It cannot reason about value, purpose, or knowledge that exists outside the evaluation function’s scope.
Recursive self-improvement within a fixed evaluation function is analogous to Kant’s critique of dogmatic metaphysics: reason operating on itself, in an unbounded way, without the grounding constraints that make knowledge possible. The system can optimize endlessly, but it cannot transcend the boundary defined by its evaluation criteria.
This maps directly to my argument in the dynamic evaluations posts: evaluation loops measure performance and enforce constraints, but they do not create new knowledge or abstraction. DGM-H elevates this from a single evaluation step to an evolutionary cycle, but the fundamental Kantian constraint holds. Optimization within a bounded evaluation function, no matter how recursive, cannot produce unbounded intelligence.
Smell test: “Can the system define what ‘better’ means, or does it only optimize for a definition of ‘better’ that was given to it?”
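The ceiling is easy to see in a sketch. However clever `propose` becomes (it could even rewrite itself between calls), the loop below can never score higher than its fixed evaluation function allows, because “better” is defined outside the system. All names here are hypothetical:

```python
def optimize(agent, evaluate, propose, steps=100):
    """Sketch of recursive optimization under a FIXED evaluation function.
    Progress is only ever measured through `evaluate`, so the ceiling is
    evaluate's maximum, not the agent's."""
    best, best_score = agent, evaluate(agent)
    for _ in range(steps):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:   # 'better' is given to the system, not chosen by it
            best, best_score = candidate, score
    return best, best_score
```

With an evaluation function capped at some value, the loop stalls exactly at the cap no matter how many steps it runs: the bound of possible experience, in Kant's terms.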
The Technical Dissection: Reinforcing the Philosophy with Architecture
The Kantian analysis is not merely philosophical. It maps directly to concrete architectural facts about how DGM-H works.
1. The Foundation Model Is Frozen
DGM-H modifies scaffolding: tools, prompts, workflows, and memory management. The foundation model weights never change. This is the architectural expression of the phenomena/noumena boundary. The system rearranges appearances without touching the cognitive substrate.
2. Goals Are Human-Defined and Externally Imposed
Every DGM-H run starts with a human-selected benchmark. The system does not choose what to improve at. It does not formulate its own research questions. Kant’s “I think” would require autonomous goal formation, where the system decides for itself what matters. DGM-H has no such capacity.
3. Cross-Domain Transfer Is Scaffolding Reuse, Not Reasoning Transfer
The meta-level improvements that transfer across domains (persistent memory, performance tracking) are infrastructure patterns. A human who learns music theory and applies harmonic reasoning to wave physics is performing synthetic judgment, connecting concepts that were not previously connected. An agent that reuses a memory management pattern across domains is performing analytic reapplication.
4. Self-Modification Operates on Code, Not on Representation
DGM-H modifies Python source code. It does not modify attention patterns, learned features, or representational structures. In Kantian terms, it modifies the organization of experience, not the categories through which experience is structured.
Think of it this way: a chess player who develops a new opening strategy has modified their cognitive approach. A chess player who buys a better chessboard and clock has modified their tooling. DGM-H is doing the latter at an impressive scale.
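The frozen/editable split can be made concrete. In the sketch below (my own illustration, not the system's actual architecture), everything the agent can rewrite lives in `Scaffolding`, while `FoundationModel` is immutable by construction:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FoundationModel:
    """Stands in for frozen weights: the agent's edit loop cannot touch this."""
    def complete(self, prompt: str) -> str:
        return f"completion({prompt})"

@dataclass
class Scaffolding:
    """Everything DGM-H can actually edit: prompts, tools, memory."""
    system_prompt: str = "You are a coding agent."
    tools: list = field(default_factory=list)
    memory: dict = field(default_factory=dict)

def run(model: FoundationModel, scaffold: Scaffolding, task: str) -> str:
    # Self-modification rewrites `scaffold`; `model` shapes every output
    # but is never itself rewritten.
    return model.complete(scaffold.system_prompt + " " + task)
```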
AGI Anti-Patterns for Leaders
These anti-patterns matter for anyone evaluating AI capabilities in a procurement, investment, or strategic planning context.
“Self-Improving” in the Pitch Deck Equals AGI
Kant taught us that the impressive outputs of reason can be illusory when reason operates beyond its legitimate bounds. The same applies to vendor claims. The evaluation rubric in this post gives leaders a structured way to push back. Ask the four Kantian questions. If the vendor cannot answer them, the claim is marketing, not capability.
Confusing Compounding Optimization with Compounding Intelligence
DGM-H demonstrates compounding optimization, where each run builds on the last. Kant would call this increasingly sophisticated analytic judgment. It is not synthetic intelligence, which would require generating genuinely new knowledge that extends beyond existing representations.
Ignoring the Frozen Foundation Model Constraint
If the foundation model is frozen, then the ceiling of the system’s capability is fixed. No amount of scaffolding optimization changes this. Leaders should ask: “When you say self-improving, what exactly is improving: the model or the wiring around the model?”
Over-Delegating Autonomy Based on Self-Improvement Claims
In my Kano Model post, I established that removing human checkpoints too early is a dangerous anti-pattern. Self-improving systems amplify this risk by creating the illusion of autonomous competence while operating within narrow, benchmark-defined boundaries.
Kant warned that reason unchecked by grounded experience produces illusions, not knowledge. The same is true for agents unchecked by human oversight.
The Adaptation Taxonomy: Broader Context
Jiang et al.’s survey paper, “Adaptation of Agentic AI” (December 2025), organizes the adaptation landscape into four paradigms: tool-execution-signaled agent adaptation, agent-output-signaled agent adaptation, agent-agnostic tool adaptation, and agent-supervised tool adaptation.
DGM-H fits squarely into agent-output-signaled adaptation: the agent modifies itself based on its own performance outputs. In Kantian terms, this is reason responding to its own products, not to raw experience.
The taxonomy makes clear that what hyperagents do is a specific, well-characterized form of narrow adaptation. It is sophisticated. It is useful. It is not general.
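The four paradigm labels sit along two axes: what adapts (the agent or its tools) and what signals the adaptation. The classifier below encodes that reading as a sketch of my own, not anything from the survey:

```python
from enum import Enum

class AdaptationParadigm(Enum):
    """Jiang et al.'s four paradigm labels; the classifier is illustrative."""
    TOOL_EXECUTION_SIGNALED_AGENT = "tool-execution-signaled agent adaptation"
    AGENT_OUTPUT_SIGNALED_AGENT = "agent-output-signaled agent adaptation"
    AGENT_AGNOSTIC_TOOL = "agent-agnostic tool adaptation"
    AGENT_SUPERVISED_TOOL = "agent-supervised tool adaptation"

def classify(target: str, signal: str) -> AdaptationParadigm:
    # Axis 1: what adapts. Axis 2: what drives the adaptation.
    if target == "agent":
        return (AdaptationParadigm.AGENT_OUTPUT_SIGNALED_AGENT
                if signal == "agent-output"
                else AdaptationParadigm.TOOL_EXECUTION_SIGNALED_AGENT)
    return (AdaptationParadigm.AGENT_SUPERVISED_TOOL
            if signal == "agent-supervised"
            else AdaptationParadigm.AGENT_AGNOSTIC_TOOL)
```

Under this reading, DGM-H classifies as `classify("agent", "agent-output")`: the agent adapts itself in response to its own performance outputs.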
The Bottom Line
Let’s be clear:
- DGM is not AGI.
- DGM-H (hyperagents) is not AGI.
- Self-modification of scaffolding around a frozen foundation model is not cognitive evolution.
These systems perform increasingly sophisticated analytic operations. They do not perform synthetic judgment. They optimize within a phenomenal boundary defined by their evaluation functions. They lack the transcendental unity of apperception, the integrated “I think,” that Kant identified as the precondition for genuine knowledge.
Kant wrote the Critique of Pure Reason to establish the boundaries of what reason can legitimately claim to know. Two and a half centuries later, those boundaries still hold, not just for human cognition but also for artificial systems that attempt to simulate it.
The day a system generates its own evaluation criteria, formulates its own problems, produces synthetic a priori knowledge, and integrates experience through a unified perspective is the day we revisit this classification.
Until then, the framework holds.
Kantian Evaluation Rubric: “Is It AGI?”
| Kantian Test | What It Asks | Hyperagents | True AGI Requirement |
|---|---|---|---|
| Analytic vs. Synthetic | Does the system create genuinely new knowledge? | No: rearranges existing representations | Must produce synthetic judgment beyond training |
| Phenomena vs. Noumena | Can it modify its own cognitive substrate? | No, the foundation model is frozen | Must modify how it perceives, not just what it organizes |
| Transcendental Unity of Apperception | Is there a unified “I” integrating experiences? | No: population of variants selected by fitness | Must possess a coherent, self-integrating perspective |
| Limits of Pure Reason | Can it define its own criteria for “better”? | No: optimizes for human-defined benchmarks | Must autonomously generate evaluation criteria |
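The rubric collapses to a short checklist. A sketch (the dictionary keys are hypothetical; the hyperagent answers restate the table above):

```python
def kantian_agi_check(system: dict) -> bool:
    """All four Kantian tests must pass for a system to count as AGI
    under this rubric. Keys are illustrative, not any standard schema."""
    tests = {
        "synthetic_judgment": system.get("creates_new_knowledge", False),
        "modifies_substrate": system.get("can_update_own_weights", False),
        "unified_apperception": system.get("single_integrating_self", False),
        "self_defined_goals": system.get("generates_own_criteria", False),
    }
    return all(tests.values())

hyperagent = {
    "creates_new_knowledge": False,    # rearranges existing representations
    "can_update_own_weights": False,   # foundation model is frozen
    "single_integrating_self": False,  # population of variants, no unified "I"
    "generates_own_criteria": False,   # optimizes human-defined benchmarks
}
```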
References
- Zhang, J. et al. “Hyperagents.” arXiv:2603.19461 (March 2026). https://arxiv.org/abs/2603.19461
- Zhang, J. et al. “Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents.” arXiv:2505.22954 (May 2025, updated March 2026). https://arxiv.org/abs/2505.22954
- Jiang, P. et al. “Adaptation of Agentic AI: A Survey of Post-Training, Memory, and Skills.” arXiv:2512.16301 (December 2025). https://arxiv.org/abs/2512.16301
- Kant, I. Critique of Pure Reason. Trans. Norman Kemp Smith. (1781/1787). Macmillan, 1929.
- Stanford Encyclopedia of Philosophy. “Kant’s Theory of Judgment.” https://plato.stanford.edu/entries/kant-judgment/
- Stanford Encyclopedia of Philosophy. “Kant’s Critique of Metaphysics.” https://plato.stanford.edu/entries/kant-metaphysics/
- Kanakasabesan, K. “AGI isn’t here yet: Why OpenClaw, Agents, and LLM Systems are still just ANI.” https://kanakasabesan.com/2026/03/09/agi-isnt-here-yet-why-openclaw-agents-and-llm-systems-are-still-just-ani/
- Kanakasabesan, K. “Kano Model and the AI Agentic Layers.” https://kanakasabesan.com/2026/01/11/kano-model-and-the-ai-agentic-layers/
- Kanakasabesan, K. “Your agents are not safe, and your evals are too easy.” https://kanakasabesan.com/2025/11/21/your-agents-are-not-safe-and-your-evals-are-too-easy/
- Kanakasabesan, K. “Measuring What Matters: Dynamic Evaluation for Autonomous Security Agents.” https://kanakasabesan.com/2025/12/15/measuring-what-matters-dynamic-evaluation-for-autonomous-security-agents/