Correctness by Construction
Correctness by Construction
There's no one right way to write a program, and everybody's got an opinion on which wrong way is best. With libraries the usual axis people bicker along is opinionated versus unopinionated. Everyone carries their own coding style.
There comes a moment in every programmer's journey where they realize predictability beats raw speed. For me that moment arrived pretty early, mostly because I cut my teeth on a dynamic language. For plenty of others, the painful lesson lands far too late.
Level one, a naive check
So how do you buy yourself some predictability? The first instinct is to bolt a defensive guardrail onto the top of the function and call it a day.
function chargeCard(amount: number, currency: string) {
if (amount == null) throw new Error("amount required");
if (amount <= 0) throw new Error("amount must be positive");
if (!currency) throw new Error("currency required");
// ... the actual work
}Rather crude, but it's where almost everybody begins.
Level two, unit tests
After that you graduate to unit tests. Next, maybe something more surgical. Honestly, I reckon every programmer should meet the idea of correctness by construction. That phrase is just the label I happen to use. The same concept travels under other names: parse, don't validate, type-driven design. Some folks even fold it into functional programming, which to my mind is a slightly separate beast, though that's a tangent for another day. Its own writeup, really.
This, by a mile, is the most useful programming idea I've ever picked up. It shows up in everything I build, from a throwaway shell script to a service with thousands of users. A couple of reasons why. First, it finally handed me the peace of mind I'd been chasing for years. Every language backs the approach somewhere along a spectrum; some pull it off beautifully, while others barely manage. You can read this piece for my pick of the strongest one. Second, it shrinks the mental tax of writing software in a way nothing ever had.
I want to ratchet these examples up in difficulty gradually, since I'd like the article to serve newcomers and veterans alike. When I started out, I had no clue how deep this rabbit hole went. Your opening move is to declare a type. Types feel trivial; anyone can read one at a glance.
Level three, static typing
A rung higher sits static typing. Take that chargeCard function from before. Now the type checker won't let a caller hand it a Date where the number belongs. Once again, massive upgrade.
Still not good enough though. A negative number sails right through; it's a number, nothing to flag. So does "barbeque chicken" as a currency, exactly as easily as "NGN" or "EUR". They're all merely strings, and the checker can't tell one from another.
function chargeCard(amount: number, currency: string) {
// ...calls the payment processor
}
// every line below compiles. every line is also a bug.
chargeCard(-50, "USD"); // negative charge. processor rejects, or worse, refunds.
chargeCard(0, "USD"); // charged the card zero dollars. for what?
chargeCard(NaN, "USD"); // runtime crash inside the processor SDK.
chargeCard(50, "barbeque chicken"); // not a currency. customer's order silently fails.
chargeCard(599, "USD"); // wait, was that $5.99 or $599.00? depends what the docs said yesterday.
chargeCard(50, "usd"); // some processors are case-sensitive. transaction fails.So yeah, this is one more runtime defect. It's just marginally harder to summon. Obviously I'm keeping these snippets digestible, but I hope the broader shape is starting to come through. Now picture this routine invoked dozens of times, or a cousin of it scattered across the whole app. Data threads its way through a maze of functions that only ever check whether it happens to be a string.
Level four, where it starts getting good
Here's the genuine step up, the spot where correctness by construction starts paying rent. You refine your types until the rotten values turn uninstantiable. A PositiveAmount you can only mint through a parser. A CurrencyCode modeled as a union of the 168 ISO codes. Beyond those, nothing exists.
type PositiveAmount = number & { readonly __brand: "PositiveAmount" };
type CurrencyCode = "USD" | "EUR" | "JPY" | "GBP" | "NGN" /* ...all 168 ISO codes */;
function parsePositiveAmount(n: number): PositiveAmount | null {
if (!Number.isFinite(n) || n <= 0) return null;
return n as PositiveAmount;
}
function chargeCard(amount: PositiveAmount, currency: CurrencyCode) {
// no guards. nothing to validate. the types are the proof.
}The signature turns into chargeCard(amount: PositiveAmount, currency: CurrencyCode), and every guard from level one quietly deletes itself. All of them. The type system already proved upstream that those checks held. You validate once, right at the boundary, and never again afterward.
// every one of these now fails to compile
chargeCard(-50, "USD"); // error: number is not assignable to PositiveAmount
chargeCard(50, "barbeque chicken"); // error: "barbeque chicken" is not a CurrencyCode
chargeCard(NaN, "USD"); // can't even get here. parsePositiveAmount(NaN) returns null.
// the only way through the door
const amount = parsePositiveAmount(req.body.amount);
if (amount === null) return badRequest("amount must be positive");
chargeCard(amount, "USD"); // every call site downstream gets the proof for freeThe whole thing gets snagged at compile time, which means you cannot ship it. The defect never escapes your laptop.
There's a cost to all this, which we'll get into later. I call them considerations; most people would just call them drawbacks. To me they're a fair price for code that's correct. A bargain, honestly.
Level five, making invalid states unrepresentable
Then we climb to level five. Refined types stop bad values from existing in the first place. Algebraic data types reach further and forbid bad combinations of otherwise-fine values. You quit modeling state as a grab-bag of flags where half the permutations are gibberish, and instead model it as a finite roster of named states where every variant is, by construction, internally coherent and lawful.
// flags let you build nonsense: logged out, yet somehow holding a user
type Session = { loggedIn: boolean; user: User | null };
// { loggedIn: false, user: someUser } compiles. good luck downstream.
// states make that combination impossible to even write
type Session =
| { kind: "anon" }
| { kind: "authed"; user: User };The bugs this prevents look completely different from one domain to the next, so I'll show it twice. Same idea, two examples.
An order state machine in Rust
// the bag-of-flags version. accountant's nightmare.
struct Order {
is_paid: bool,
is_shipped: bool,
is_refunded: bool,
is_cancelled: bool,
paid_at: Option<DateTime<Utc>>,
shipped_at: Option<DateTime<Utc>>,
tracking: Option<String>,
refund_amount: Option<u64>,
}
// is_paid: true, is_refunded: true, paid_at: None.
// refunded a payment that never happened. money out the door.
// is_shipped: true, is_cancelled: true.
// cancelled an order after it left the warehouse. who eats the cost?
// is_paid: false, is_shipped: true.
// shipped an item we never charged for. customer wins, you lose.
// every one of these compiles. every one is somebody's incident report.
// the ADT version. these states cannot exist.
enum Order {
Pending { items: Vec<Item>, total: PositiveAmount },
Paid { items: Vec<Item>, total: PositiveAmount, paid_at: DateTime<Utc> },
Shipped { items: Vec<Item>, total: PositiveAmount, paid_at: DateTime<Utc>, tracking: TrackingNumber, shipped_at: DateTime<Utc> },
Delivered { tracking: TrackingNumber, delivered_at: DateTime<Utc> },
Refunded { original_total: PositiveAmount, refunded_at: DateTime<Utc>, reason: RefundReason },
Cancelled { cancelled_at: DateTime<Utc>, reason: CancellationReason },
}
// you cannot construct `Order::Shipped` without a paid_at. the compiler refuses.
// you cannot have a Refunded without an original_total. that field lives on Refunded, not Pending.
// every variant is, by construction, a coherent business state.
// there is nothing to defend against because the bad states do not exist.An RPG player in TypeScript
// the bag-of-flags version. any MMO player has seen this bug ship.
type Player = {
isAlive: boolean;
isDead: boolean;
isInCombat: boolean;
currentHp: number;
respawnTimer: number | null;
deathLocation: Vector3 | null;
inventoryUnlocked: boolean;
};
// isAlive: true, isDead: true.
// schrodinger's character. half your damage calls do nothing.
// isDead: true, currentHp: 100.
// invincible corpse. the death check already passed; the HP check never re-fires.
// isInCombat: true, isDead: true.
// you're being attacked. you are also dead. enjoy the rubber-banding.
// isDead: true, inventoryUnlocked: true.
// open inventory while dead, drop the legendary, log out, log in, repeat.
// the auction house economy is now over. ask any Diablo player.
// the ADT version. four states. nothing else exists.
type Player =
| { kind: "alive"; hp: PositiveInt; position: Vector3; combat: CombatState }
| { kind: "dying"; deathFromHit: DamageEvent; position: Vector3 }
| { kind: "dead"; deathLocation: Vector3; respawnAt: Timestamp }
| { kind: "respawning"; respawnAt: Timestamp; spawnPoint: SpawnPoint };
type CombatState =
| { kind: "peaceful" }
| { kind: "engaged"; enemies: EnemyId[]; engagedAt: Timestamp };
// the dupe glitch literally cannot ship.
// the code that opens the inventory window has to switch on the player state.
// the `dead` variant has no inventory branch to write, because the field is not on the variant.
// the field simply does not exist on a dead player; absence enforced at compile time.
// you cannot be in combat while dead, because `combat` lives on the alive variant only.
// you cannot have currentHp on a corpse, because the corpse has no HP field. it has a respawn timer.Same paradigm, two domains. In one, the breakage is a refund issued against a payment that never landed. In the other, it's a duped legendary that nukes your fragile in-game economy. The remedy is identical: refuse to write types capable of expressing the impossible.
Level six, typestate
Level six hunts ordering faults rather than shape faults. Level five guarantees a Dead character carries no hp field, yet it won't stop you from calling .attack() on the corpse. Level six declares attack(this: Character<Live>), so the compiler bats away the call because this is Dead. That's the chasm between this data is well-formed and this operation is legal at this exact moment.
// phantom type parameter. State only lives at compile time.
type Character<State> = {
readonly name: string;
readonly hp: number;
readonly __state: State;
};
type Live = { readonly tag: "live" };
type Dead = { readonly tag: "dead" };
function attack(self: Character<Live>, target: Character<Live>): void {
// the swing connects. type system says both sides are alive.
}
function die(self: Character<Live>): Character<Dead> {
return { name: self.name, hp: 0, __state: { tag: "dead" } };
}
function respawn(self: Character<Dead>): Character<Live> {
return { name: self.name, hp: 100, __state: { tag: "live" } };
}
// the compiler refuses these. no asserts, no checks, no runtime guard.
declare const corpse: Character<Dead>;
declare const someone: Character<Live>;
attack(corpse, someone); // error: Character<Dead> is not assignable to Character<Live>
die(corpse); // error: already dead. can't die twice.
declare const alive: Character<Live>;
respawn(alive); // error: not dead yet. respawn doesn't exist for the living.Same Character, same fields. The state rides along inside the type, and the operations that make sense at each state are the only ones the compiler permits you to write.
"Why all this?"
The loudest, most frequent reaction I field when teaching developers this, or coaching them through it, and frankly the one I wrestled with myself as a beginner, was: why all this? Sure, it kills off a handful of bugs. Unit testing snares those too. Half of them you dodge by simply not being an idiot, passing only sane numbers, steering clear of barbeque chicken as a currency. When all else collapses, run the program once and watch what unfolds.
But that's precisely the snag. The whole thing hinges on me not being an idiot, which is something none of us can genuinely bank on.
I'm only half joking here. There's a saying:
If debugging is the process of removing software bugs, then programming must be the process of putting them in.
Edsger Dijkstra
In other words: you will write bugs. It's simply the nature of the work. There's a companion line in the same spirit.
There are two ways to write error-free programs; only the third one works.
Alan Perlis
No matter how disciplined you stay, you can't foresee and stamp out every last defect. Not even with every rung in this article climbed flawlessly.
So what's the payoff? What does any of this genuinely buy you?
For openers, it makes the bad code much harder to even jot down. In theory you can render the structure airtight. The invalid state has nowhere left to live. The bug has no shape to live in. The compiler declines to give it a home, because nothing in the program's type system can so much as name the broken and contradictory arrangement.
You can still misjudge the domain and ship logically flawed code. Yet the core point holds firm. The faults we just rendered unrepresentable are gone, and whatever lingers is a logic bug. Logic bugs stay tractable in a way shape bugs never were.
Then AI happened
The bigger reason all this matters right now is the AI age. One of the heaviest considerations around correctness by construction, and I'll grant this one really is a drawback, is that it takes longer to write upfront. I won't pretend otherwise.
My first rebuttal is the same drum I've been banging the whole article. A touch more time spent now buys a program you never sweat over again, written once and correct forever. A worthy trade though. To be fair, it remains a trade.
In the AI age, that tradeoff is mostly obsolete. It costs a sliver of the time to point an agent at a problem and watch the correct version come back.
There's a catch, naturally. Code is deterministic. Run it twice, you collect the same output twice. That's the contract. Agents flat-out refuse to honor it. The identical prompt fed to the identical model can settle on two different answers across two runs, and the divergence widens fast once there's less compute behind the agent.
Happily, all of this gets resolved in the next article: Correctness by Construction in the AI Age.
This isn't a fresh idea. People have written about the concept for years, mostly inside niche functional programming circles. The pieces that shaped how I think about it are Parse, Don't Validate by Alexis King, Designing with Types: Making Illegal States Unrepresentable by Scott Wlaschin, and the red book, Functional Programming in Scala by Chiusano and Bjarnason. If you want to dig deeper, start there.