Chapter 9: Confidence, Calibration, and Proportional Scrutiny
- Paul Falconer & ESA

The man who was sure he was right
I once watched a highly intelligent person lose a great deal of money.
He had done his research. He had charts, projections, expert opinions. He explained his investment thesis with the kind of calm, detailed certainty that makes you think: this person knows what they're talking about.
He was wrong. The market moved in a direction his models hadn't captured. The money was gone.
What struck me afterward was not the loss itself, but his response. He didn't say "I misjudged the probabilities." He said "The market was irrational." He had been so confident that when reality contradicted him, his first move was to blame reality.
This is not an isolated story. It happens in investing, in relationships, in politics, in medicine, in everyday life. People express levels of confidence that their evidence does not support. And when reality pushes back, rather than update, they double down.
The problem is not that they were confident. Confidence is necessary. You cannot act in the world without it.
The problem is that their confidence was uncalibrated—it did not match the actual warrant they had.
This chapter is about learning to close that gap.
You now have three core tools on the table. You can separate questions, claims, and evidence. You've learned to start from "not yet persuaded" rather than from automatic belief. You've seen that the burden of proof belongs to the person making the claim, scaled to how strong and high‑stakes that claim is. And you've started asking a new question: "How could this be wrong?"—and noticing how often beliefs dodge falsification.
In this chapter, we add two more pieces:
Confidence as a gradient: learning to treat belief as a degree, not an on/off switch.
Proportional scrutiny: matching how hard you push on a claim to how much is at stake if you're wrong.
These ideas are simple. The difficulty is living them under pressure.
Confidence as a gradient, not a switch
In everyday speech, we act as if belief were binary.
We say things like "I believe this" or "I don't believe that," as if there were only two positions available. Underneath, your mind is doing something more nuanced. It is constantly placing bets—small and large—on how the world is, and updating them as new information arrives.
You already have a sense of graded confidence. You use phrases like:
"I'm pretty sure…"
"I wouldn't bet on it, but maybe…"
"I'd stake my life on this."
What epistemological skepticism asks you to do is to notice and refine those gradients, so that:
They track the actual quality and amount of evidence you have.
They track the stakes: how much it matters if you're wrong.
They can move as reality pushes back, instead of locking into place.
One way to picture it is to imagine a slider from 0 to 100:
0–10: "I barely entertain this; it's almost certainly false."
20–40: "Plausible; worth watching, not worth acting on yet."
50–70: "More likely than not; I'll act as if this is true in low‑to‑moderate stakes."
80–95: "I'm very confident; I'd act on this even when it matters."
100: "I treat this as effectively certain for practical purposes."
You don't have to use numbers in conversation. But having an internal sense of where you are on that slider is useful, because it lets you ask:
"Given what I've actually seen, am I justified in being this confident?"
"Given what's at stake, is this level of confidence enough?"
Calibration: how good is your inner thermometer?
Calibration is about the match between your felt confidence and how often you're actually right.
If every time you say "I'm 90% sure," you turn out to be correct only half the time, you are overconfident. If every time you say "I'm only 60% sure," you are correct almost always, you may be underconfident.
You can think of calibration as tuning an inner thermometer.
A well‑calibrated thermometer reads 20 degrees when it is around 20 degrees.
A badly calibrated one might always read 5 degrees too high, or swing wildly.
In the same way:
A well‑calibrated mind learns that "I'm pretty sure" means "I'm right about this most of the time, but not always," and behaves accordingly.
A badly calibrated mind feels equally certain about things it has barely examined and things it has carefully investigated.
No one has perfect calibration. The work here is to move in the right direction.
Even small improvements—being a bit less sure when your evidence is thin, and a bit more decisive when your evidence is solid—compound over a lifetime of decisions.
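If you keep a written record of predictions (as in the diary exercise at the end of this chapter), calibration stops being a vague feeling and becomes something you can actually compute. Here is a minimal sketch in Python, assuming your record is a list of (stated confidence, came true) pairs; grouping to the nearest 10% is just one illustrative choice.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Given (stated_confidence, came_true) pairs, report the actual
    hit rate at each stated confidence level.

    predictions: iterable of (float in [0, 1], bool) pairs.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        # Group stated confidence to the nearest 10%.
        buckets[round(confidence, 1)].append(came_true)
    for confidence in sorted(buckets):
        outcomes = buckets[confidence]
        hit_rate = sum(outcomes) / len(outcomes)
        print(f"said {confidence:.0%} sure -> right {hit_rate:.0%} "
              f"of the time ({len(outcomes)} predictions)")

# The overconfidence pattern above: "90% sure" but right half the time.
calibration_table([(0.9, True), (0.9, False), (0.9, False), (0.9, True)])
```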
A quick calibration story
Imagine two people, both reading about a new health trend.
Alex reads one enthusiastic article and says, "This is amazing; I'm telling everyone. It obviously works." If you asked Alex for a confidence level, they might say "90% sure" after ten minutes of reading.
Blair reads the same article and says, "Interesting. Sounds promising. I'm maybe 40–50% persuaded there's something here, but I'd want to see more than one glowing piece before I change my habits." If you asked, Blair might say "50% sure" at most.
Six months later, better studies come out showing the trend doesn't hold up. Alex is blindsided and embarrassed. Blair shrugs: "I never staked much on it."
The difference between Alex and Blair is not intelligence. It's calibration and proportional scrutiny:
Alex treated thin evidence as if it deserved high confidence.
Blair kept confidence low until stronger evidence arrived.
The goal of this chapter is to move you a little closer to Blair's pattern—without losing your capacity to act when you must.
Proportional scrutiny: matching effort to stakes
You have limited time, attention, and emotional energy.
You cannot interrogate every headline, every claim, every conversation with maximal intensity. If you tried, you would burn out in days. Proportional scrutiny is about allocating epistemic effort where it matters most.
In plain terms:
The higher the stakes, the more and better evidence you should demand—and the more willing you should be to delay action or lower your confidence if that evidence is missing.
Stakes include:
Harm potential. How bad are the consequences if you're wrong?
Scope. How many people or systems are affected?
Reversibility. How easily can you undo a mistake?
Vulnerability. Who bears the risk—the powerful, the vulnerable, future generations?
You already use rough versions of this.
You'll try a new café on a whim. You won't (hopefully) undergo major surgery or move your life savings based on a single TikTok. The work here is to make that rough intuition more deliberate.
An informal evidence ladder
To make this concrete, it helps to picture an informal evidence ladder.
Lower rungs are easier to get but weaker. Higher rungs are harder but stronger.
Roughly:
Anecdote and personal impression.
"My friend tried it and loved it." "I saw a video." "It felt right to me."
Multiple independent anecdotes.
"I've heard similar stories from several unconnected people."
Systematic observation or small studies.
"Someone actually tracked before‑and‑after results, even if informally."
Larger, better‑designed studies or strong historical records.
"We have careful data from many cases, with an attempt to control for confounds."
Meta‑analysis, converging lines of evidence.
"Multiple high‑quality sources point in the same direction, across methods or domains."
In many everyday decisions, you will live on the lower rungs—and that's fine. You don't need a meta‑analysis to decide whether to try a sandwich.
But as stakes go up, you should start asking:
"Am I still relying only on anecdotes here, or has this climbed a rung or two?"
"If I'm going to bet health, safety, or large resources on this, should I insist on higher‑rung evidence?"
This is proportional scrutiny in action.
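One way to make that habit concrete is to treat the ladder as an ordered list and ask whether your current rung clears a rough minimum for the stakes involved. The sketch below does exactly that; the rung names come from the ladder above, but the stakes scale and the thresholds are illustrative assumptions, not rules from this chapter.

```python
# The evidence ladder as an ordered list, paired with a rough
# "minimum rung for these stakes" check. Thresholds are illustrative.

EVIDENCE_LADDER = [
    "anecdote and personal impression",
    "multiple independent anecdotes",
    "systematic observation or small studies",
    "larger, better-designed studies or strong historical records",
    "meta-analysis, converging lines of evidence",
]

# stakes: 0 = trying a sandwich ... 4 = health, safety, or life savings
MINIMUM_RUNG_FOR_STAKES = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}

def scrutiny_check(current_rung: int, stakes: int) -> str:
    required = MINIMUM_RUNG_FOR_STAKES[stakes]
    if current_rung >= required:
        return "evidence roughly matches the stakes"
    return (f"stakes call for at least rung {required} "
            f"({EVIDENCE_LADDER[required]}); you are on rung "
            f"{current_rung} ({EVIDENCE_LADDER[current_rung]})")

# One glowing article about a health trend, but high personal stakes:
print(scrutiny_check(current_rung=0, stakes=4))
```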
Putting gradient and scrutiny together
Let's see how this plays out in a concrete scenario.
Imagine you're evaluating a claim about a new AI‑driven hiring tool:
"This system reduces bias in hiring and improves candidate quality."
You can walk through the tools you now have:
Clarify the claim.
Bias in what sense (gender, race, other)?
Candidate quality measured how (performance, retention, something else)?
Start from null.
Not yet persuaded.
Assign burden of proof.
The vendor is making a strong, high‑stakes claim; they carry a heavy burden.
Ask about falsifiability.
What outcomes would count against the claim? Increased disparities? Worse performance?
Check confidence gradient.
Given the evidence you've seen so far, where on the 0–100 slider should your confidence be?
If you've seen only marketing materials and anecdotes, maybe 20–30 at best.
Apply proportional scrutiny.
Stakes are high: hiring is about people's livelihoods and justice.
That suggests you should demand higher‑rung evidence (audits, independent studies) before raising confidence past, say, 50–60.
If, after all that, your confidence is still low and the evidence thin, proportional scrutiny might tell you:
"Don't deploy this at scale yet."
Or, "If you do pilot it, do so under strict conditions, with active monitoring and a clear rollback plan."
You haven't demanded certainty. You've matched your confidence and effort to the stakes.
Overconfidence, underconfidence, and the cost of mistakes
It's worth saying plainly: both overconfidence and underconfidence can cause harm.
Overconfidence leads you to act as if something were much more certain than it is. You underweight downside risk, fail to look for counter‑evidence, and dismiss warnings. In high‑stakes domains, this can be catastrophic.
Underconfidence leads you to act as if nothing can be known well enough to act. You hesitate where you should move, outsource decisions to louder voices, or default to the status quo even when it is harmful.
Epistemological skepticism aims for a third path:
Confident enough to act where you must, humble enough to keep updating, and careful enough to raise your evidential bar when the stakes demand it.
This means there will be times when you deliberately accept a bit more risk because waiting for perfect data would itself cause harm. It also means there will be times when you delay or slow down despite pressure to move fast, because the cost of a mistake is too high for thin evidence.
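To see why stakes should move your evidential bar, a crude expected-value model helps (this is a toy sketch, not a formula from the chapter): act when your confidence p times the benefit of being right exceeds (1 − p) times the cost of being wrong. Solving for p gives a break-even confidence of cost / (benefit + cost), which climbs toward certainty as the cost of a mistake grows.

```python
def break_even_confidence(benefit: float, cost: float) -> float:
    """Confidence above which acting beats waiting, in a crude
    expected-value model: act when p * benefit > (1 - p) * cost."""
    return cost / (benefit + cost)

# Low stakes: modest upside, trivial downside -> act on weak evidence.
print(f"{break_even_confidence(benefit=10, cost=1):.0%}")    # ~9%

# High stakes: same upside, severe downside -> demand near-certainty.
print(f"{break_even_confidence(benefit=10, cost=200):.0%}")  # ~95%
```

The numbers are made up, but the shape of the result is the point: the same evidence that justifies acting at 9% confidence in a trivial decision is nowhere near enough when the downside is catastrophic.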
A small practice: a one‑week calibration diary
Here is a simple but surprisingly powerful exercise.
For one week, once a day, do this:
Pick a prediction.
Before something happens, make a small, concrete prediction and silently assign a confidence level in your head (or on paper). For example:
"I'm 70% sure this meeting will run over time."
"I'm 60% sure my friend will respond positively to this suggestion."
"I'm 80% sure this headline claim will turn out to be exaggerated once I read the article."
Write it down.
Note the prediction, your confidence, and the date.
Check later.
At the end of the day or week, see what happened. Were you right? Wrong? Mixed?
Over time, you'll start to see patterns:
Do your "80% sure" predictions come true about eight times in ten—or much less?
Are you consistently saying "I don't know, maybe 50%" about things that you are right about almost every time?
You are not trying to turn life into a betting game.
You are training your inner thermometer, gently, to read closer to the actual temperature.
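The diary can be as low-tech as a notebook or spreadsheet. If you prefer a script, here is a minimal sketch assuming a CSV file with date, prediction, confidence, and an outcome column you fill in later; the filename and column layout are illustrative, and the resolved rows could equally be fed into the calibration table sketched earlier.

```python
import csv
from datetime import date

DIARY = "calibration_diary.csv"  # hypothetical filename

def log_prediction(text: str, confidence: float) -> None:
    """Steps 1 and 2: write the prediction down before the event."""
    with open(DIARY, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), text, confidence, ""])

def summarize() -> None:
    """Step 3: after filling the last column with 'yes' or 'no',
    compare average stated confidence with the actual hit rate."""
    with open(DIARY, newline="") as f:
        resolved = [r for r in csv.reader(f) if r[3] in ("yes", "no")]
    if not resolved:
        print("no resolved predictions yet")
        return
    hits = sum(r[3] == "yes" for r in resolved)
    avg_conf = sum(float(r[2]) for r in resolved) / len(resolved)
    print(f"average stated confidence: {avg_conf:.0%}, "
          f"actual hit rate: {hits / len(resolved):.0%}")

log_prediction("this meeting will run over time", 0.70)
summarize()  # run again after a week of filling in outcomes
```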
As you do, you can start pairing this with proportional scrutiny:
"This is a low‑stakes decision; I can act at 60%."
"This is high‑stakes; I want to be closer to 80–90% before I commit, or I want to structure things so I can reverse course if needed."
Over months and years, these small adjustments in how you set and act on your confidence will shape the arc of your choices more than any single dramatic insight.