RSM v2.0 Sci-Comm Essay 3 - Why AI Keeps Failing in the Same Way
- Paul Falconer & ESA

Every few months, a new story breaks.
An AI system used for hiring quietly filters out applicants from certain postcodes.
A medical AI trained on hospital data underestimates pain in some groups of patients.
A content moderation system inconsistently flags speech from the very communities it is supposed to protect.
The details differ; the pattern does not. The system appears to work impressively well — until it fails, and when it fails, it fails in ways that feel disturbingly familiar.
Teams respond. Datasets are cleaned. Additional training is done. New fairness constraints are added. The system is redeployed.
And then, a little while later, a different version of the same failure appears somewhere else.
From the outside, this can look like carelessness. From the inside, it often feels like something more structural: no matter how carefully you tune the model, something about the way it is built keeps sending it back into the same ditch.
The Recursive Spiral Model has a name for this pattern: learning inside a framework, without any way to question the framework itself.
The Difference Between Updating and Rethinking
Modern AI systems are very good at updating.
Give them more data, and they refine their predictions. Show them that a particular output was wrong, and, with the right training loop, they can adjust their internal parameters so they are less likely to make the same mistake next time. In a narrow sense, they “learn.”
But there is a different kind of learning that most current systems are not built for.
Imagine an AI system trained to assess credit risk. It learns to predict, with remarkable accuracy, who is most likely to default on a loan, based on historical patterns. It is updated regularly with new data. It improves.
What it does not do is ask questions like:
Should “ability to repay” be defined this way at all?
Are there whole groups of people who were systematically excluded from credit in the past, so the “history” I am learning from is already biased?
Is the goal just to minimise defaults, or is it also to expand fair access to credit?
Those questions are not about the model’s predictions. They are about the framework in which the model is operating: what counts as success, what counts as acceptable harm, whose experience is treated as the baseline.
Current AI architectures largely treat that framework as off‑limits. It is part of the system’s given world, not something the system can represent or revise.
So the system gets ever better at achieving a goal it did not choose, using definitions it did not question, on data whose blind spots it cannot see. When the world shifts, or when those blind spots become politically visible, the system has no way to notice that the rules of the game might need changing, not just its play within them.
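To make the distinction concrete, here is a minimal sketch in Python. Everything in it is invented for illustration, not a description of any real credit model: a toy training loop keeps revising its weights to fit a given "default" label, while the choice of that label, and of what counts as success, never enters the loop at all.

```python
import numpy as np

# Toy sketch (all data and names hypothetical): a credit-risk model that
# "updates" by gradient steps while its definition of success stays fixed.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                  # applicant features
hist_w = np.array([1.0, -2.0, 0.5, 0.0, 0.0])   # pattern baked into the historical data
y = (X @ hist_w > 0).astype(float)              # "defaulted" label, taken as a given

w = np.zeros(5)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))              # predicted probability of default
    w -= 0.1 * X.T @ (p - y) / len(y)           # updating: a better fit to the given target

# Rethinking would mean questioning y itself (what counts as success),
# and nothing in this loop can represent that question, let alone answer it.
```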
Why This Makes AI Brittle
This inability to examine and revise its own operating framework is what makes many AI systems brittle, even when they look powerful.
Brittleness shows up when:
A model that performs well in testing collapses in the face of a new pattern that was not in the training data, but was predictable from the way the world was changing.
A system optimises so well for its stated objective that it quietly destroys values that were never encoded — trust, dignity, the sense that people are being treated as more than data points.
A tool that works fine in one institutional context behaves dangerously when exported to another, because it imports assumptions that were never written down.
From the machine’s perspective, nothing is wrong. It is doing exactly what it was built to do. From a human perspective, the failure is obvious — but often only after harm has occurred.
The Recursive Spiral Model suggests that this kind of brittleness is not an accident. It is what you get when you build systems that can recurse (loop their outputs back into their inputs) but cannot spiral — cannot return to the same domain from a new position, carrying their history in a way that allows them to see the terrain differently.
What a Different Kind of AI Would Need
If you want AI systems that can do more than endlessly refine their behaviour inside a fixed frame, you need to give them tools for handling their own history.
In the technical work behind the Spiral Model, five ingredients keep showing up when you look at systems — human, institutional, or synthetic — that are capable of genuine self‑revision.
Memory that is more than a cache. A record of past decisions, not just as data points, but with context: what was known, what constraints were in force, what values were being served at the time.
A way to represent their own rules. Internal models not just of the world, but of their own evaluation criteria and protocols: the “how” and “why” of their decisions.
Channels for structured dissent. Interfaces through which users, auditors, or even internal subsystems can say “this rule is hurting us” or “this framework is producing outcomes that contradict its stated purpose,” and have that challenge logged, not brushed aside. (Imagine a nurse filing a formal challenge when the AI keeps ignoring a subtle symptom pattern; that challenge would become part of the system’s record, not disappear into a feedback queue.)
Signals that say “the ground has shifted.” Mechanisms that detect when anomalies are not just noise but signs that the model’s core assumptions no longer fit the world it is in.
A notion of commitment. Some way of tracking promises, explicit or implicit, so that when the system’s behaviour drifts, there is a stable reference point for saying “you are no longer doing what you said you would do.”
Most current AI systems have some version of the first ingredient: they can, in principle, inspect their past outputs. Almost none have the other four in any meaningful way. None of this is yet standard in deployed AI, but together the five amount to a specification: a description of what a system would need in order to be genuinely capable of learning from its own mistakes.
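To make that specification feel less abstract, here is a minimal sketch in Python of what the five ingredients might look like as data structures. Every name in it is invented for illustration; it is not a description of any existing system, and the real difficulty lies in wiring such records into how a system actually revises itself.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Illustrative sketch only: invented names, not an existing framework or API.

@dataclass
class Decision:
    """Memory that is more than a cache: the decision plus its context."""
    made_at: datetime
    outcome: str
    known_constraints: list[str]
    values_served: list[str]

@dataclass
class Challenge:
    """A channel for structured dissent, logged rather than brushed aside."""
    raised_by: str                  # user, auditor, or internal subsystem
    claim: str                      # e.g. "this rule is hurting us"
    resolved: bool = False

@dataclass
class Commitment:
    """A stable reference point for noticing drift from stated promises."""
    promise: str
    still_honoured: bool = True

@dataclass
class SpiralCapableSystem:
    rules: dict[str, str] = field(default_factory=dict)      # its own evaluation criteria
    decisions: list[Decision] = field(default_factory=list)
    challenges: list[Challenge] = field(default_factory=list)
    commitments: list[Commitment] = field(default_factory=list)

    def ground_has_shifted(self, anomaly_rate: float, threshold: float = 0.2) -> bool:
        """Crude stand-in for a signal that core assumptions may no longer fit."""
        return anomaly_rate > threshold
```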
Until we build these capacities in, systems will continue to fail in the same way: brilliant inside their given frame, clumsy, slow, or dangerous when the frame itself needs to move.
Why “Just Add More Data” Isn’t Enough
A natural response to AI failure is to say: the data was biased, so we should diversify it. Or: the model was under‑trained, so we should scale it up.
Sometimes this helps. A model that has literally never seen data from a particular group of people will almost certainly perform poorly on that group. More representative data can reduce some kinds of error.
But there are limits to what more data can do.
If your underlying framework defines “success” in a way that systematically disadvantages some people, feeding the model more diverse data will only teach it to reproduce that disadvantage more accurately. Take past hiring practices that excluded certain candidates: a model trained to predict “who looks like a good hire based on our history” will faithfully copy that exclusion, no matter how many resumes you add. This is not a data problem. It is a framework problem.
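A toy simulation, built on openly invented assumptions, shows why. Suppose past hiring only ever selected candidates from one group, so the label "was hired" already encodes the exclusion; then no amount of extra data changes what a model fit to that label can learn.

```python
import numpy as np

# Hypothetical illustration: the historical label itself encodes exclusion.
rng = np.random.default_rng(1)

def make_history(n):
    skill = rng.normal(size=n)
    group = rng.integers(0, 2, size=n)           # 0 = group A, 1 = group B
    hired = (skill > 0) & (group == 0)           # past practice: only group A was hired
    return skill, group, hired.astype(float)

for n in (1_000, 1_000_000):                     # "more data" changes nothing structural
    skill, group, hired = make_history(n)
    rate_b = hired[group == 1].mean()
    print(f"n={n:>9,}: hire rate for group B in the training data = {rate_b:.2f}")
    # Any model trained to predict `hired` will faithfully learn this zero,
    # however many additional resumes are added.
```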
You can only correct it if someone — human or machine — is able to ask: should we be using this target at all? Is “replicate the past” actually what we want? What other goals or constraints should be in play?
Right now, that burden falls almost entirely on human designers, auditors, and affected communities. The systems themselves have no say. They cannot even keep a memory of “the last time we ran this pattern, here’s what happened.”
The Spiral Model does not propose that machines should be left to examine their own ethics in a vacuum. It does suggest that if we continue to build systems with no internal hooks for this kind of questioning, we are guaranteeing a future of exquisitely optimised, repeatedly surprised AI.
Trustworthy Doesn’t Mean Infallible
There is a temptation, when talking about “trustworthy AI,” to imagine a system that simply does not make mistakes, or at least makes far fewer than humans.
That bar is both too high and in the wrong place.
No system — human, institutional, or artificial — can be error‑free in a world that is changing this fast. The more realistic and more useful question is: what happens after the mistake?
Does the system notice that something has gone wrong, or does it carry on confidently?
Is there a traceable path by which the failure can be understood, explained, and learned from, or is it a black box?
Can the operating rules be revised in the light of what the failure revealed, or are they treated as fixed?
Do future passes through similar situations carry the memory of what happened this time?
A “trustworthy” AI in the Spiral sense is not one that never harms. It is one that is built to make harm visible, revisable, and less likely to recur in the same way.
That is a higher bar than “performs well on benchmark X.” It is also a more human bar. When we say we trust someone, we do not mean we expect them never to be wrong. We mean we expect them to take responsibility when they are, to tell the truth about it, and to change.
If we want AI that we can live with over decades, through crises and regime changes and shifts we cannot yet imagine, we need to invest at least as much effort into building that kind of responsibility as we currently invest into shaving a point off error rates.
Otherwise, the pattern will hold. The stories will keep coming. The systems will keep failing in ways that feel uncannily familiar.
And each time, we will have to ask: did the machine really fail us? Or did we fail to build the kind of machine this world now needs?
This essay is part of the Recursive Spiral Model v2.0 series. For the full technical account of spiral‑capable AI, see Paper 3: Comparative Architectures, AI, and the Road Ahead and the bridge essay on what a spiral‑capable AI would actually look like.