
Sci-Comm Essay 5 - If Your AI Could Say “I Don’t Know”

  • Writer: Paul Falconer & ESA
  • 5 min read

You’ve probably had the experience. You ask a question—maybe about a medical symptom, a financial decision, a technical problem—and the AI answers with confident fluency. The words flow smoothly. There’s no hesitation. It sounds like it knows.

But often, it doesn’t. It’s generating plausible text, not weighing evidence. And that smooth confidence can be misleading. When a system speaks with certainty, we’re wired to trust it—even when it’s wrong.

What if your AI could say “I don’t know”? What if it could recognise when its own output might be harmful, and refuse? What if it had a kind of epistemic humility built into its architecture?

These aren’t science fiction questions. In the NPF/CNI framework, we’ve begun to sketch what such systems might look like. They’re called conceptual proposals—ideas for how AI could be designed to respect uncertainty, to catch its own errors, and to prioritise care over confidence.

This essay explores those ideas. They are not deployed systems; they are prototypes, directions. They point toward a kind of AI that doesn’t just answer—but knows when not to.

Proto‑Awareness: The AI That Notices Itself

One of the central concepts is proto‑awareness. It’s a proposed measure of a system’s ability to monitor its own processing, detect potential errors, and adapt its responses accordingly.

Think of it like this: a standard AI is a black box. You give it a prompt; it produces an output. You don’t know what went into the decision, whether it was confident, or whether it considered alternatives.

A proto‑aware AI would have an internal audit trail. It would track: How reliable is this source? Does this output contradict something I said before? Is there high uncertainty in my prediction? It wouldn’t just answer; it would run additional internal checks on its own answer.

In the technical papers, proto‑awareness is described as a composite metric—a way of scoring how well the system is doing at self‑monitoring, error detection, and contextual adaptation. The number 75.9% appears in the series as an example from internal simulation. It is not a performance guarantee or a clinically meaningful threshold; it belongs to the internal engineering context of ESA, not to an external validation suite.
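
To make the shape of this concrete, here is a minimal sketch in Python. Everything in it is illustrative: the three check names, the weights, and the hedging threshold are assumptions invented for the example, not values from the papers.

```python
from dataclasses import dataclass

@dataclass
class SelfChecks:
    """Illustrative internal audit trail for a single model output."""
    source_reliability: float    # 0..1: how trustworthy were the inputs?
    consistency: float           # 0..1: does this contradict earlier answers?
    prediction_certainty: float  # 0..1: inverse of the model's own uncertainty

def proto_awareness_score(checks: SelfChecks,
                          weights=(0.4, 0.3, 0.3)) -> float:
    """Combine the self-monitoring checks into one composite score.

    The weights are placeholders; the papers treat the composite as a
    design sketch, not a calibrated metric.
    """
    components = (checks.source_reliability,
                  checks.consistency,
                  checks.prediction_certainty)
    return sum(w * c for w, c in zip(weights, components))

# An answer built on shaky sources scores low, so the system hedges
# instead of asserting.
score = proto_awareness_score(SelfChecks(0.5, 0.9, 0.6))
if score < 0.7:  # hypothetical hedging threshold
    print(f"Low self-monitoring score ({score:.2f}): answer with a caveat.")
```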

But the important thing isn’t the number. It’s the idea: an AI that can say, “I’m not sure about this,” not as a scripted phrase, but as a reflection of its own processing.

Auto‑Reject: When the Answer Is “No”

Another proposal is auto‑reject thresholds. The idea is simple: if the AI’s internal assessment suggests that an output would cause harm—if the risk crosses a certain threshold—the system refuses to produce it. Instead, it might flag the query for human review, or simply say “I can’t answer that.”

This isn’t about censorship. It’s about recognising that some questions, answered with high confidence but low reliability, can do real damage. Medical advice, financial predictions, legal interpretations—when an AI guesses, people can get hurt.

In the framework, the auto‑reject threshold is illustrated with a harm potential > 0.65, calibrated in internal simulations to produce zero false negatives in a pandemic scenario. Even in simulation, this came with trade‑offs (e.g., more conservative refusals); these trade‑offs have not yet been evaluated in real‑world settings. That’s a prototype, not a validated standard. But it’s a demonstration of principle: an AI can be built to say “no” when the risk is too high, not just when it’s forbidden by policy.
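
As a sketch only, the gate below shows the shape of such a mechanism in Python. The 0.65 threshold is the framework's illustrative value; the lower review band is an assumption added for the example, and producing a trustworthy harm estimate in the first place is the unsolved part.

```python
from enum import Enum, auto

class Action(Enum):
    ANSWER = auto()
    ESCALATE = auto()  # flag the query for human review
    REFUSE = auto()    # "I can't answer that."

AUTO_REJECT_THRESHOLD = 0.65  # illustrative value from internal simulations
REVIEW_THRESHOLD = 0.40       # hypothetical band for human escalation

def gate(harm_potential: float) -> Action:
    """Route an output based on the system's own harm estimate (0..1).

    How to estimate harm_potential reliably is the open problem; this
    function only shows the decision logic wrapped around it.
    """
    if harm_potential > AUTO_REJECT_THRESHOLD:
        return Action.REFUSE
    if harm_potential > REVIEW_THRESHOLD:
        return Action.ESCALATE
    return Action.ANSWER

print(gate(0.72).name)  # REFUSE
```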

CNI‑Integrated Confidence: When Beliefs Affect Certainty

The third idea is CNI‑integrated confidence decay. The CNI—Composite NPF Index—is a proposed measure of how entrenched a belief network has become. In a human, a high CNI means evidence bounces off; the person is hard to reach.

In an AI, the idea is similar. If the system is operating in a domain where it has detected a tight, self‑sealing belief network—perhaps because it’s been trained on data with strong ideological biases—its confidence in its own outputs would be automatically reduced.

The mathematics is simple: confidence is multiplied by (1 - 0.25 * CNI). If CNI is high, confidence is lowered. The 0.25 factor is illustrative, chosen for internal experiments; it is not a tuned or validated value. The AI becomes less certain, more cautious—at least in the simulated environment; its real‑world behaviour would depend on how the broader system is designed.
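
In code, the decay is a single expression. A minimal sketch, assuming both confidence and CNI are scaled to [0, 1]:

```python
DECAY_FACTOR = 0.25  # illustrative, not a tuned or validated value

def decayed_confidence(raw_confidence: float, cni: float) -> float:
    """Scale confidence down as the belief network looks more entrenched.

    With CNI near 1 (a tightly self-sealing network), reported confidence
    is cut by up to 25%; with CNI near 0 it passes through unchanged.
    """
    return raw_confidence * (1 - DECAY_FACTOR * cni)

# A fluent answer at 0.90 raw confidence, generated in a highly
# entrenched domain (CNI = 0.8), is reported at 0.72 instead.
print(decayed_confidence(0.90, 0.8))  # 0.72
```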

Again, this is a proposal. The CNI itself is a hypothesis; the weight structure hasn’t been field‑validated. But the direction is clear: an AI that knows when to be uncertain, because its own knowledge structure is uncertain.

Why This Matters

We’re building AIs that can talk, write, reason—and soon, maybe, act. The danger isn’t just that they’ll be wrong. It’s that they’ll be wrong confidently, and we’ll listen.

The ideas in the NPF/CNI framework—proto‑awareness, auto‑reject, CNI‑integrated confidence—are attempts to build something different. Not just a smarter system, but a more humble one. One that knows its limits. One that can say “I don’t know” when it should.

These are not yet mature. They are prototypes, sketches, hypotheses. But they point to a future where AI doesn’t just serve us answers—it serves us honesty. And sometimes, honesty looks like saying nothing.

What This Means for You

You might not be building AI. But you interact with it. And the principles here are principles you can carry with you:

  • When an AI speaks with certainty, ask yourself: does it have a reason to be certain? Many systems are designed to sound confident, not to be accurate. The absence of an “I don’t know” is not a sign of reliability.

  • Look for systems that express uncertainty. If an AI says “I’m not sure,” that’s a sign of a better design, not a flaw. It means the system is at least attempting to monitor its own limits. (Of course, systems can also fake humility; the deeper question is whether the uncertainty is grounded in genuine internal checks or just scripted language.)

  • Be wary of AI that never says “no.” If a system will answer any question, regardless of risk, that’s a red flag. A healthy system knows when to refuse.

And if you’re ever in a position to design or commission an AI, remember: the smartest system might be the one that knows when to be quiet.

Go Deeper

This essay draws from concepts in several papers. Those sections make clear that these mechanisms are currently design sketches within the ESA stack, not part of any deployed, audited system:

  • Proto‑awareness and auto‑reject thresholds – Paper 5, Section 2.1; Paper 6, Section 3

  • CNI‑integrated confidence decay – Paper 5, Section 2.1; Paper 6, Section 3

  • Status of these proposals (hypotheses, not validated) – Paper 5, Sections 1 and 2; Paper 6, Sections 3 and 5

For the full framework, see the canonical papers and bridge essays in the NPF/CNI series.

End of Essay

