Toward an Intelligence Management Curriculum
Anthropic Built Something It Won't Release — The Restraint That Should Terrify Every Educator and Change How We Teach Everything
I did not plan to write this article today. But when I read Anthropic’s Project Glasswing announcement on April 7, 2026, I found myself unable to continue with my other work. I kept returning to what was being described — not as a news consumer, but as someone who has spent years working at the intersection of artificial intelligence, education, and computational literacies, and who understands, with some precision, what these announcements mean for the people I work with and for.
I work in AI in education. My professional responsibilities include tracking frontier AI developments, studying how these systems affect learning and knowledge work, and asking what they demand of the educators, students, and institutions that will live with their consequences. I am not a skeptic of AI — I have spent much of my career arguing that AI, designed thoughtfully and deployed equitably, can expand access and serve human flourishing. I say all of this not to establish authority but to explain why I feel compelled to write in a register that is, for me, unusually direct: what Anthropic announced represents a genuine step change in artificial intelligence science. And I believe the AI in education community, the learning sciences community, and the broader educational technology field have not yet reckoned with what it means. This piece is my attempt to begin that reckoning.
There is a moment in reading Anthropic’s Project Glasswing announcement when the weight of what is being described becomes impossible to deflect. The company unveils a frontier model called Claude Mythos Preview. This model, which Anthropic states explicitly it will not release to the general public, autonomously found a twenty-seven-year-old vulnerability in OpenBSD, an operating system that has been hardened and audited against exactly this kind of flaw for nearly three decades. It found a sixteen-year-old flaw in FFmpeg that automated testing tools had scanned five million times without detecting. Without any human direction, it chained together multiple Linux kernel vulnerabilities to escalate from ordinary user access to complete machine control.
These are not demonstrations of a better search engine. They are demonstrations of expert judgment.
And here is what I cannot stop thinking about: if this machine has crossed into the domain of expert judgment — not in one narrow field but across software engineering, PhD-level science, adversarial security, and complex reasoning — what, precisely, are we doing in our classrooms?
A Warning Hidden in Restraint
When a company that exists to deploy AI models builds one and then chooses, deliberately and explicitly, not to give the public access to it, it is telling us something about what the model is capable of. The published system card describes a model that scores 93.9% on SWE-bench Verified, among the most rigorous software engineering benchmarks available. It scores 94.6% on GPQA Diamond, a benchmark designed around PhD-level scientific reasoning on which credentialed human experts perform near 70%. On Humanity’s Last Exam, a test specifically constructed to defeat AI systems, Mythos Preview answers 56.8% of questions correctly without tools and 64.7% with them.
For readers who do not routinely follow AI capability benchmarks, I recognize that those acronyms may read as technical noise. I understand that. But I follow this field as a professional obligation, and I want to be direct with you: those numbers are genuinely terrifying to me. Not in a rhetorical sense. In the sense that I read them, sat with them, and could not find a reassuring interpretation.
These are not scores that describe a capable assistant. They describe a system that is competitive with — and in several domains superior to — the most highly trained human experts alive.
Anthropic’s restraint is not a pause in development. It is an acknowledgment that a threshold has already been crossed. And in the field of education, we have almost entirely failed to reckon with what that threshold means.
What the Machine Can Now Do
Let me be specific, because specificity is what this moment requires.
Claude Mythos Preview identified vulnerabilities in every major operating system and every major web browser — autonomously, without human steering. The OpenBSD vulnerability it found had survived twenty-seven years of expert human review and millions of automated scans. The FFmpeg vulnerability had evaded five million automated tests. These are not edge cases. They represent the combined best effort of human expertise and automated tooling over decades of sustained attention.
In pure software engineering tasks drawn from real production codebases, the model scores 77.8% on SWE-bench Pro. Opus 4.6, Anthropic’s most capable publicly available model, scores 53.4% on the same benchmark. The gap between what exists in production labs and what our universities are preparing students to do is not incremental. It is structural.
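To make the size of that gap concrete, the two SWE-bench Pro scores quoted above can be compared in a few lines of Python. This is only an illustrative sketch: the function name `compare_scores` and its output keys are hypothetical, and the only inputs are the two percentages reported in this essay.

```python
# Scores as reported in this essay (percent of SWE-bench Pro tasks solved).
MYTHOS_PREVIEW = 77.8
OPUS_4_6 = 53.4


def compare_scores(new: float, old: float) -> dict:
    """Summarize the gap between two benchmark scores given in percent.

    Hypothetical helper for illustration; not part of any benchmark tooling.
    """
    point_gap = new - old                      # absolute gap, in percentage points
    relative_gain = point_gap / old * 100      # gain relative to the older score
    unsolved_old = 100 - old                   # share of tasks the older model fails
    unsolved_new = 100 - new                   # share of tasks the newer model fails
    unsolved_reduction = (unsolved_old - unsolved_new) / unsolved_old * 100
    return {
        "point_gap": round(point_gap, 1),
        "relative_gain_pct": round(relative_gain, 1),
        "unsolved_old_pct": round(unsolved_old, 1),
        "unsolved_new_pct": round(unsolved_new, 1),
        "unsolved_reduction_pct": round(unsolved_reduction, 1),
    }


summary = compare_scores(MYTHOS_PREVIEW, OPUS_4_6)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On these figures the gap is about 24 percentage points, and the share of tasks left unsolved falls from 46.6% to 22.2%, roughly halving. Which framing is most meaningful depends on how the benchmark tasks are weighted, which this sketch does not attempt to model.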
The Collapse of the Expert as Curriculum
For most of human history, the primary function of formal education has been the transmission and cultivation of expertise. We teach students to read, then to read critically, then to analyze, then to argue, then to situate arguments within scholarly discourse — building, over twelve to twenty-two years, toward a level of cognitive performance in a domain that enables independent original contribution. We do this in law, medicine, engineering, the humanities, the sciences. The summit of this process has traditionally been the expert: someone who, through years of deliberate practice and accumulated knowledge, can navigate the hardest problems in their field.
That summit is what the machine has now reached — and in certain domains surpassed.
This is not a hyperbolic claim. It is a description of the benchmark evidence Anthropic itself published. When a model scores 94.6% on PhD-level scientific reasoning, it is not performing text prediction. It is a participant in the domain. When it autonomously finds and chains kernel exploits, it is doing security research of a quality that would be recognized as professional contribution.
The implication for curriculum is severe and has not been adequately named in the literature. We have been constructing education systems that funnel students — over a decade or more — toward a level of expertise that a model can now approximate in minutes. The curriculum was designed for a world in which expertise was scarce, hard-won, and the legitimate object of years of training. That world has materially changed, and we are still teaching as though it has not.
I do not think this means education is obsolete. I think it means we have been teaching the right thing toward the wrong target.
Toward an Intelligence Management Curriculum
If the machine has expertise, what does the human need?
The answer I am working toward — and I offer it here as a hypothesis for the field, not a settled conclusion — is that humans need to become competent managers of intelligence itself. Not administrators of AI systems in a narrow technical sense, but something closer to what a chief editor is to a newsroom, or what a lead investigator is to a forensic team: a person who understands what the intelligence sources can and cannot do, knows how to frame the right questions, maintains accountability for conclusions reached, and refuses to surrender judgment simply because an automated system is confident.
I call this intelligence management, and I believe it deserves recognition as a distinct pedagogical aim — one that cuts across existing disciplines rather than replacing them.
An intelligence management curriculum would not resemble today’s AI literacy initiatives, which tend to focus on surface familiarity: how large language models work at a functional level, what prompt engineering involves, how to recognize AI-generated text. These are useful but radically insufficient. They prepare students to use tools. They do not prepare students to govern them.
Intelligence management, by contrast, requires something harder and something that has no precedent in our existing curriculum frameworks. It requires students to develop:
- Epistemic authority under delegation: the capacity to evaluate the output of a highly capable system critically, even when the student cannot independently reproduce the underlying work. A physician who cannot personally perform genome sequencing must still know when to trust and when to contest the result. This is a distinct and teachable cognitive skill, and we do not currently teach it.
- Calibrated skepticism: not reflexive distrust of AI outputs, but trained sensitivity to the specific failure modes of these systems: confident confabulation, training-data blind spots, optimization toward what is measurable rather than what is true, and systematic overconfidence in domains at the edges of the training distribution.
- Moral non-delegability: the recognition that certain decisions must not be handed to machines, not because machines lack relevant information, but because accountability is intrinsically human. Who is responsible when the automated exploit scanner misses a critical flaw? Who answers when the AI-assisted hiring system encodes historical discrimination? Intelligence management treats moral agency as a domain where human presence is not optional and not negotiable.
- Question quality: the capacity to formulate problems in ways that extract genuine value from powerful systems, rather than confirming prior assumptions or generating sophisticated-sounding noise. This is not prompt engineering in the narrow technical sense. It is the deep intellectual skill of knowing what you do not yet know and being able to ask for it precisely.
None of these capacities are currently the primary object of any major curriculum I can identify. They exist in the interstices — touched in philosophy of science, in some critical thinking programs, in medical ethics seminars — but they are not coherently organized as a unified pedagogical aim, and they are not treated with the urgency the moment demands. They should be both.
What Survives the Machine, and What We Cannot Yet Know
I want to be honest about the limits of my own argument here, because I believe scholarly honesty is itself a form of intelligence management.
I do not know what the machine will be capable of in two years. Claude Mythos Preview represents a discontinuous leap over Opus 4.6, and Anthropic is explicit that development is ongoing and accelerating. The domain of human irreducibles — creativity, wisdom, moral agency, relational presence — may be shrinking faster than any of us are comfortable acknowledging. Arguments that these capacities will always remain beyond machine reach have been eroded steadily and should not be deployed as simple reassurance or as political cover for curriculum inaction.
What I am confident about is this: even bracketing the unresolved questions about machine consciousness and moral status entirely, there remains a non-trivial near-term window — measured in years, not decades — during which human society will require large numbers of people capable of governing AI systems, contesting their outputs, and maintaining accountability for their effects. That governance capacity is not being taught systematically anywhere I can identify. And the urgency of teaching it is increasing faster than the curriculum is moving.
The failure to act now inverts the order that responsible deployment requires: we will have the machine fully embedded in schools, hospitals, courts, and public administration before we have produced the humans who are supposed to oversee it.
We are deploying superintelligent systems into every major institution of democratic life while operating educational institutions designed to produce experts in a world where expertise is no longer scarce. This is not a gap. It is a structural failure.
An Emergency Research Agenda
I am writing this as a direct call to my colleagues in learning sciences and society sciences, and I mean it in the strongest terms available to academic discourse: this is an emergency.
Not in the rhetorical sense that advocacy uses to claim urgency for incremental policy preferences. In the precise, literal sense that a capability threshold has been crossed, that the distance between what AI systems can now do and what our educational institutions are preparing people to manage is large enough to constitute a systemic social risk, and that the window during which deliberate intervention could change the trajectory is shorter than the planning cycles of most research institutions and most grant programs.
I am asking learning scientists to take up the following questions as a coordinated, funded, urgent research agenda — not as individual curiosity projects but as a field-level response:
First: What does competent oversight of highly capable AI systems actually look like, cognitively? What specific capacities does it require, how do they develop, and what learning experiences produce them? We do not have a validated account of this. We need one urgently.
Second: How do we design learning environments that maintain productive struggle and epistemic agency when a highly capable AI can always provide the answer? The risk is not only skill atrophy in the narrow sense. It is the atrophy of a learner’s sense of themselves as a knower — the motivational and identity foundations of intellectual life.
Third: What does appropriate trust calibration look like for non-expert users of systems that perform at or above the expert threshold across multiple domains simultaneously? The literature on trust in automation (Lee and See, 2004) was written for a world of narrow, specialist tools. It does not adequately address a system that is simultaneously a better software engineer, a better security researcher, and a better scientific reasoner than most of the humans relying on it. That literature needs urgent and substantial updating.
Fourth: How do we teach moral non-delegability — the principle that certain decisions must involve human accountability — in a culture that is normalizing delegation at speed? This is at once a moral education question, a democratic theory question, and a learning sciences design question. No single field owns it. All of them need to engage it.
I am asking society scientists — sociologists, political scientists, economists, science and technology studies scholars — to treat the Glasswing announcement not as a news item but as evidence of a structural transition that requires the same quality of sustained empirical attention we have given to industrialization, the diffusion of literacy, and the construction of mass schooling systems. Those transitions reshaped societies over generations. This one is moving faster, and we have less time to build the scholarship that could inform an adequate institutional response.
The restraint Anthropic showed in not releasing Claude Mythos Preview is not a policy model the rest of the world can simply adopt. It is a single company's judgment call made under competitive and liability pressure. The world needs a curriculum — not a corporate disclaimer.
I am less interested in teaching students to be competent performers of tasks the machine can also perform — and in several cases perform better. I am more interested in teaching students to be competent interrogators of machine performance: to ask why the system answered as it did, to identify what assumptions are encoded in the output, to decide what to do with the result, and — most critically — to take responsibility for the decision.
That shift — from performer to interrogator, from subject matter expert to steward of intelligence — is not a retreat from intellectual rigor. It is a demand for a different and arguably more demanding kind of rigor: the kind that does not disappear when the answer is readily available, and that does not abdicate when the system is confident.
The machine has found the vulnerability. It has passed the exam. It has written the code. The question that remains — and that no model on any benchmark can answer for us — is: who is qualified to read the results, and who is accountable for what we do with them?
That question is the curriculum. And we are not yet teaching it.
References
Anthropic. (2026, April 8). Project Glasswing: Securing critical software for the AI era. https://www.anthropic.com/glasswing
Lee, J. D., and See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80.
About the Author
Sai Gattupalli, Ph.D. is the founding scientist of Society & AI, where he studies how artificial intelligence reshapes learning, knowledge, and access across educational and social contexts. His work spans AI in education, computational literacies, and society-centered AI governance.
Cite this article
Gattupalli, S. (2026). Toward an intelligence management curriculum. Society and AI. https://societyandai.org/perspectives/intelligence-management-curriculum/