Engineering Philosophy: Leslie Lamport, Think Before You Code

Leslie Lamport, computer scientist and 2013 Turing Award laureate

Key Takeaways

  • Think before you code, and write the thinking down. A specification is the software equivalent of an architect’s blueprint – “not thinking guarantees that we will” make mistakes.
  • Time is not global; causality is what’s real. With no trustworthy shared clock, you stop asking when an event happened and ask what caused what.
  • Happened-before and logical clocks make causality formal. Lamport’s partial order – and the per-process counter that implements it – is the seed of vector clocks and modern conflict resolution.
  • Define correctness precisely, then prove it. Safety and liveness, Paxos consensus, Byzantine fault tolerance, and the TLA+ specification language all come from treating distributed computing as mathematics, not folklore.

The Principle

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” – Leslie Lamport1

He sent that line to a bulletin board at his lab in 1987, and it is the most quoted sentence in distributed computing because it names the thing intuition refuses to believe.1 You think you are running a program on your computer. You are not. You are running it across a population of machines whose existence you have not enumerated, whose clocks disagree, and any one of which can fail at the exact instant that matters. The system you can hold in your head and the system that actually runs are different systems, and the gap between them is where every distributed bug lives.

Lamport’s life’s work is the refusal to paper over that gap with optimism. His conviction is that concurrent and distributed systems are too subtle to reason about by intuition, so you reason about them with mathematics, on paper, before you write a line of code. “Thinking doesn’t guarantee that we won’t make mistakes,” he told Wired in 2013. “But not thinking guarantees that we will.”2 The thinking has to be written down, because writing is where sloppy thinking gets exposed: “To think, you have to write. If you’re thinking without writing, you only think you’re thinking.”3 Architects draw a blueprint before a brick is laid; the software equivalent is a specification, and Lamport spent the back half of his career arguing that the reason our systems break is that we skip it.4

The deeper move is about time itself. In the physical world there is no global clock you can trust to order events across machines – so you must stop asking when something happened and start asking what caused what. Causality, not the wall clock, is the real structure of a distributed computation.5 Almost everything Lamport built follows from taking that one fact seriously, and it is the same conviction that runs under the evidence gate: you do not get to assume the ordering you wish were true; you have to establish it.

Context

Leslie Lamport was born February 7, 1941, in New York City.6 He took a B.S. in mathematics from MIT in 1960, then an M.A. and – in 1972 – a PhD in mathematics from Brandeis, with a dissertation on singularities in partial differential equations.6 He did not train as a computer scientist. He trained as a mathematician, and he never stopped behaving like one. The defining feature of his career is that he brought the standards of mathematics – precise definitions, stated assumptions, proofs rather than tests – to a field that was mostly programming by feel.

The work happened across four institutions. At Massachusetts Computer Associates in the 1970s he did the foundational thinking on ordering events. At SRI International (1977–1985) he produced the Byzantine generals work. At Digital Equipment Corporation’s Systems Research Center (1985, continuing through the Compaq acquisition until 2001) he wrote Paxos and began the specification work. From 2001 until his retirement in early 2025 he was at Microsoft Research.6 Across all of it the method was constant: find a problem in distributed or concurrent computing that everyone treated as folklore, define it precisely enough to state a theorem, and then prove the theorem.

Leslie Lamport speaking

The Work

Time, Clocks, and the Happened-Before Relation

“Time, Clocks, and the Ordering of Events in a Distributed System” appeared in Communications of the ACM in 1978, and it is among the most cited papers in all of computer science – it won the Dijkstra Prize in 2000 for distributed computing’s most influential work.56 Its insight is deceptively small. In a distributed system there is no shared, trustworthy clock, so you cannot order two events by comparing timestamps. What you can know is whether one event could have caused another: event A happened-before event B if A and B are on the same process with A first, or if A is the sending of a message and B is its receipt. Chain those together and you get the happened-before relation – causality made formal.5

The crucial, counterintuitive consequence is that happened-before is only a partial order, not a total one. Two events on different processes with no chain of messages between them are genuinely concurrent: there is no fact of the matter about which came first, and any system that pretends otherwise is inventing an ordering that does not exist.5 Lamport then gave the algorithm now universally called the Lamport clock: each process keeps a counter, increments it on every event, and stamps every outgoing message with it; a receiver advances its counter to one more than the larger of its own value and the message’s timestamp.7 That simple rule produces timestamps consistent with causality, and it is the seed of vector clocks, version vectors, and the entire conflict-resolution machinery of modern distributed databases.

The reason this paper matters beyond its algorithm is that it changed the question. Before Lamport, “what time did this happen?” was assumed to be answerable in a distributed system. After Lamport, the honest answer was “wrong question – ask what caused it.” That reframing is why he is credited with making distributed computing a science rather than a craft.

Paxos, Consensus, and Replicated State Machines

If there is no global clock and machines fail, how do a group of computers ever agree on anything – which transaction committed, who holds the lock, what the next entry in the log is? That question is the consensus problem, and Lamport’s answer is Paxos, the algorithm at the heart of nearly every fault-tolerant system built since. He paired it with the replicated state machine approach: if every replica starts in the same state and applies the same sequence of commands in the same order, they stay identical – so the whole problem of keeping a distributed system consistent reduces to agreeing on the order of commands, which is exactly what Paxos provides.6

The story of how Paxos was published is the most famous cautionary tale in the field. Lamport wrote it up as “The Part-Time Parliament,” framing the algorithm as the legislative procedure of a fictional ancient Greek island called Paxos, complete with legislators bearing pseudo-Greek transliterations of his colleagues’ names.8 The joke was a catastrophe. Reviewers thought he was not serious; the paper was rejected as insignificant; people who attended his talks remembered the Indiana Jones framing and not the algorithm.8 It sat unpublished for years. Communications of the ACM’s editor eventually published it in ACM Transactions on Computer Systems in 1998 – still wrapped in the parable.8 When the field continued to ignore it, Lamport finally surrendered and wrote “Paxos Made Simple” in 2001, stripping the Greek away to reveal what he insisted was a genuinely simple algorithm underneath.8 The lesson he drew was not “people are dumb.” It was that even a correct, important idea fails if you obscure the thinking – presentation is part of the work, not separate from it.

The Byzantine Generals Problem

Leslie Lamport

In 1982, with Robert Shostak and Marshall Pease, Lamport published “The Byzantine Generals Problem,” which gave the field its language for the hardest kind of failure.6 A crashed component is easy compared to a Byzantine one – a component that does not stop but lies, sending different, contradictory information to different peers, whether through a bug, corruption, or malice. The paper’s framing: divisions of an army, each led by a general, must agree on a single plan – attack or retreat – by messenger, while some unknown subset of generals are traitors actively trying to prevent agreement. Lamport proved the precise threshold of how many traitors a system can tolerate and still reach agreement, turning “what if a node lies?” from a vague worry into a theorem with a stated bound.

That theorem is the intellectual root of every system that has to keep working while some of its participants are adversarial – and it is why, decades later, the word Byzantine sits at the center of blockchain consensus. Lamport did not set out to enable cryptocurrency; he set out to define, exactly, what fault tolerance means when “fault” includes betrayal.

TLA+ and Specification (and, Yes, LaTeX)

The throughline of Lamport’s career is the conviction that you must specify before you code, and TLA+ is that conviction built into a tool. The Temporal Logic of Actions is a language for writing a precise mathematical description of what a system is supposed to do – its states, the steps between them, and the properties it must always satisfy (safety: nothing bad ever happens) and eventually achieve (liveness: something good eventually happens), two concepts he formalized.6 You write the spec, and a model checker explores the state space to find the subtle interleaving that breaks your invariant – the bug you would never have found by testing, because it only appears when three machines do three things in an order you never imagined. His argument is the architect’s blueprint: you draw the building before you pour concrete, and a software specification is the same blueprint, written in math because math is the most precise language we have.4

And there is a quieter monument. In the early 1980s, building on Donald Knuth’s TeX, Lamport wrote LaTeX – the macro system that became the default way scientists and mathematicians typeset documents, the reason a generation of papers and theses look the way they do.6 It is a characteristic Lamport artifact: he needed to write his own work precisely, so he built the tool that let everyone do it. The instinct that produced TLA+ – a notation that forces precision – is the same instinct that produced LaTeX.

The Method

The method is one idea applied relentlessly for fifty years: the thinking is the work, and the thinking must be written down before the code.

Reason about causality, not time. There is no global clock you can trust, so stop ordering events by when they happened and order them by what caused what. The happened-before relation, not the wall clock, is the real structure – and accepting that two events are genuinely concurrent is more honest than inventing an order.5

Specify before you code. For anything without an established solution, stop and think first, and the thinking is independent of the coding. Write the blueprint – the specification – in math, because math is simple and precise where prose is vague.4 “Not thinking guarantees that we will” make mistakes.2

Write to find out whether you’re thinking. Writing is the test of whether a thought is real. “If you’re thinking without writing, you only think you’re thinking.”3 A specification you cannot write down is an idea you do not actually have yet.

Define the failure exactly. “What if a node lies?” is folklore until you state it as Byzantine generals and prove the tolerance bound. Precise definitions turn worries into theorems.6

Presentation is part of correctness. Paxos was right for years before anyone could use it, because the Greek parable buried the idea. A correct result that no one can understand has not finished being engineered.8

Influence Chain

Who Shaped Him

Mathematics itself. Lamport’s training was a PhD in math, not computer science, and the entire shape of his contribution is the importation of mathematical standards – definitions, axioms, proofs – into a field that had been programming by intuition.6 (Formative influence)

The fault-tolerance problems of 1970s real systems. SRI’s work on reliable systems for aircraft and the practical horror of machines that fail in the middle of a protocol gave him concrete problems – Byzantine faults, consensus – that demanded the formal treatment he was built to give.6 (Direct influence)

Edsger Dijkstra. The conviction that correctness is established by proof rather than testing, and that concurrent programs must be reasoned about formally, is Dijkstra’s before it is Lamport’s. The prize Lamport won for his clocks paper bears Dijkstra’s name. (Direct influence)

Who He Shaped

Every distributed database, cloud platform, and blockchain. Logical clocks underpin conflict resolution in distributed stores; Paxos (and its descendant Raft) is the consensus core of Google’s Chubby and Spanner, of ZooKeeper, etcd, and the coordination layer of essentially every cloud. The Byzantine generals result is the theoretical floor under Bitcoin and every fault-tolerant consensus protocol since.

Formal methods in industry. Amazon Web Services famously uses TLA+ to specify and check S3, DynamoDB, and other core services before shipping them – finding subtle bugs in the design, not in production. Lamport’s blueprint argument went from heresy to practice.

The vocabulary of the field. Happened-before, safety and liveness, replicated state machines, sequential consistency, Byzantine fault – these are not Lamport’s contributions to the language of distributed systems; they largely are that language.

The Throughline

Edsger Dijkstra argued that you reason a program correct before you run it – that testing shows the presence of bugs, never their absence – and Lamport is the direct heir, carrying proof-before-execution from sequential programs into the far harder world of concurrency, where the bug you cannot test for is the rule rather than the exception. Barbara Liskov made formal reasoning about program behavior a working primitive with data abstraction and substitutability; Lamport made formal reasoning about system behavior over time the standard, specifying not what an object guarantees but what a whole concurrent system must always and eventually do. And Donald Knuth brought mathematical rigor to the analysis of algorithms and, almost as an aside, built TeX so that work could be set down precisely – a near-exact rhyme with Lamport, who built LaTeX on top of it for the same reason and made specification itself a mathematical act. Three mathematicians who refused to let “it seems to work” count as knowing. (Series bridge)

What I Take From This

The lesson I keep is that the thinking is the work, and the thinking is not real until it is written down. Lamport’s blueprint argument is uncomfortable because it is correct: the reason a system breaks in production is almost never that the code was typed wrong – it is that nobody specified what the system was actually supposed to do under concurrency and failure, so there was no statement to check the code against. The discipline is to write the specification before the implementation, in language precise enough that its contradictions become visible. That is the same standard as quality being the only variable: the question is never “does this pass the happy path?” but “have I stated, exactly, what must always be true – and proven the design honors it?” It is why I lean so hard on writing the PRD before the code; a PRD is a Lamport specification wearing work clothes.

In the world I build in now – agents, tool loops, multi-agent systems – Lamport’s lesson is the one almost everyone skips, and pays for. An agent system is a distributed system: independent processes, no global clock, messages that arrive out of order or not at all, and components that fail silently or, worse, lie. The bugs are never in the prompt; they are in the ordering – two agents acting on stale state, a tool call whose result arrives after the decision that needed it, a “concurrent” pair of events the orchestrator pretended to order. Lamport’s discipline is to stop trusting wall-clock intuition, reason explicitly about causality and failure, and write down what must always be true before wiring anything together. The conviction that you specify before you build, that you reason about a system rather than poke at it – that is the throughline from a 1978 paper on clocks to a 2026 agent harness, and it is exactly why I treat the performance and correctness of a system as a property you design in, never one you hope to debug back in later.

FAQ

What is Leslie Lamport’s engineering philosophy?

Lamport’s conviction is that concurrent and distributed systems are too subtle to reason about by intuition, so you must think with mathematics, on paper, before you write code – the way an architect draws a blueprint before a brick is laid. The software blueprint is a specification, written in math because math is the most precise language available.4 Underneath that is a stance toward time: in a distributed system there is no global clock to trust, so the real structure of a computation is causality – what caused what – not when things happened.5 “Thinking doesn’t guarantee that we won’t make mistakes,” he said, “but not thinking guarantees that we will.”2

What is a Lamport clock and the happened-before relation?

The happened-before relation is Lamport’s formalization of causality in a distributed system: event A happened-before event B if they are on the same process with A first, or if A sends a message and B receives it, or by chaining those.5 It is only a partial order – two events on different processes with no message connecting them are genuinely concurrent, and no correct system can claim one came first.5 A Lamport clock implements this with a counter per process: increment it on every event, stamp every outgoing message with it, and on receipt advance your counter to one past the larger of your own value and the message’s timestamp.7 The result is a set of timestamps consistent with causality – the foundation of vector clocks and modern distributed conflict resolution.

What are Paxos and the Byzantine Generals Problem?

Paxos is Lamport’s algorithm for consensus – getting a group of unreliable, possibly failing machines to agree on a single value, such as the next command in a replicated log – and it underpins the coordination layer of nearly every modern distributed system. Famously, he first published it as a parable about a fictional Greek parliament (“The Part-Time Parliament,” ACM TOCS 1998); the joke so obscured the idea that he eventually rewrote it as “Paxos Made Simple” in 2001.8 The Byzantine Generals Problem (Lamport, Shostak, and Pease, 1982) defines the hardest failure mode – components that do not merely crash but lie, sending contradictory information – as generals trying to agree on a plan while traitors among them sabotage agreement, and proves how many traitors a system can tolerate.6 It is the theoretical root of blockchain consensus.

Why did Leslie Lamport win the Turing Award?

The ACM awarded Lamport the 2013 A.M. Turing Award “for fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency.”6 The citation captures the breadth of his work: he turned distributed computing from folklore into a science with precise definitions and proofs, gave the field much of its vocabulary, created the TLA+ specification language to make “specify before you code” a practical discipline, and – building on Knuth’s TeX – wrote LaTeX, the typesetting system that became standard across science and mathematics.6


Sources


  1. Leslie Lamport, “distributed-system.txt” (his own publications site). The quip “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable” was sent to a bulletin board at DEC’s Systems Research Center (SRC) on May 28, 1987. Collected at Wikiquote, “Leslie Lamport.” 

  2. Leslie Lamport, quoted in Klint Finley, “Why We Should Build Software Like We Build Houses,” Wired, January 25, 2013. “Thinking doesn’t guarantee that we won’t make mistakes. But not thinking guarantees that we will.” Also collected at Wikiquote. 

  3. “To think, you have to write. If you’re thinking without writing, you only think you’re thinking.” Attributed to Leslie Lamport (a line he popularized, adapting cartoonist Dick Guindon’s “Writing is nature’s way of letting you know how sloppy your thinking is”); widely cited in his talks and TLA+ materials. See the collected attribution at igvita quotes and discussion at Goodreads. 

  4. Leslie Lamport, “Thinking Above the Code,” Microsoft Research, 2014 Faculty Summit keynote. The blueprint/specification argument: architects draw detailed blueprints before construction; a software blueprint is a specification; for any task without an established solution “you need to stop and think before you start coding,” and math (sets, functions, simple logic) is the best language for being simple and precise. See also Quanta Magazine, “Computing Expert Says Programmers Need More Math.” 

  5. Leslie Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” Communications of the ACM 21(7), July 1978. Introduces the happened-before relation (a partial order capturing causality), the notion of concurrent events, and logical clocks. Winner of the 2000 Dijkstra Prize; among the most cited papers in computer science. 

  6. “Leslie Lamport,” Wikipedia, and the ACM A.M. Turing Award laureate page. Born February 7, 1941, New York City; B.S. mathematics MIT (1960); M.A. and PhD mathematics Brandeis (1972); Massachusetts Computer Associates, SRI International, DEC/Compaq Systems Research Center, Microsoft Research. 2013 Turing citation: “for fundamental contributions to the theory and practice of distributed and concurrent systems, notably the invention of concepts such as causality and logical clocks, safety and liveness, replicated state machines, and sequential consistency.” Covers Byzantine Generals (with Shostak and Pease, 1982), Paxos / “The Part-Time Parliament” (TOCS 1998), TLA+, and LaTeX (built on Knuth’s TeX, early 1980s). See also Britannica, “Leslie Lamport.” 

  7. “Lamport timestamp,” Wikipedia. The logical-clock algorithm from the 1978 paper: a per-process counter incremented before each event; outgoing messages carry the counter; on receipt the counter is set to the maximum of its current value and the received timestamp, then incremented. Produces timestamps consistent with the happened-before relation; the basis for vector clocks. 

  8. Leslie Lamport, “The Part-Time Parliament,” ACM Transactions on Computer Systems 16(2), 1998, and “Paxos Made Simple” (2001). The Greek-parliament parable obscured the algorithm; reviewers thought it a joke and it was initially rejected, prompting the plain-English rewrite. Story summarized at Microsoft Research, “The Part-Time Parliament” and “The Strange Story of the Paxos Algorithm,” Towards Data Science. 

相关文章

Engineering Philosophy: Barbara Liskov, The Contract Is the Type

Barbara Liskov made data abstraction a programming primitive: a type is the contract it keeps, and a subtype must honor …

20 分钟阅读

Engineering Philosophy: Grace Hopper, Make the Computer Speak Human

Grace Hopper built the first compiler so programs could be written in human language, made latency physical with a nanos…

20 分钟阅读

Swift 6.2 Concurrency in Practice: Default to MainActor, Escape on Purpose

Swift 6.2 inverts concurrency: default your module to MainActor (SE-0466), run nonisolated async on the caller (SE-0461)…

10 分钟阅读