Loop Engineering: Loops Win Where Verification Is Cheap

Blake Crosley · June 09, 2026 · 19 min read


From the guide: Claude Code Comprehensive Guide

Boris Cherny, the engineer who created Claude Code, keeps five to 10 sessions open with a few hundred agents running during the day and a few thousand running every night.² When a clip of him saying “I don’t prompt Claude anymore… my job is to write loops” spread across X this past week, most commentary treated the line as a prophecy about autonomous software development, and within days Addy Osmani had named a discipline after it: loop engineering.⁷ I pulled the full transcripts of the three talks behind the clip. The transcripts tell a quieter story, and the quiet version is the one worth building on: every loop Cherny actually names has a success condition a machine can check for free. Verification cost, not loop construction, decides what you can automate. I have been running loops in production since February, and my logs agree with the transcripts, including two incidents where they taught me the lesson the hard way.

TL;DR / Key Takeaways

The viral quote comes from Cherny’s Acquired Unplugged interview, where he frames loops as the next step on a continuum from punch cards to assembly to high-level languages to prompting.¹ He gives the transition a short shelf life: “the next few months and maybe through the rest of the year.”¹
The mechanics are mundane by design. In his Sequoia talk, Cherny describes /loop as Claude using cron to schedule a repeating job.² His named loops babysit pull requests, keep CI healthy, and cluster Twitter feedback every 30 minutes.²
Every one of those loops is janitorial. Each has a machine-checkable end state: CI green, PR rebased, feedback clustered. None of his named examples builds features unattended.
The loop-writing job is already dissolving into the model. Cherny reports that newer models start loops on their own initiative, and he calls user-side loop construction “a product design problem” that means “I’m not doing a good job.”⁵
The durable skill underneath the meme: deciding what is safe to automate unattended. That decision is a judgment about verifiability, and it stays with you after the loop syntax disappears.

What He Actually Said

The clip everyone shared came from a live Acquired interview. Cherny sets the line up with family history: his grandfather programmed with punch cards in the Soviet Union, and his father wrote assembly and “would have made fun of me for writing Python.” Then the part that traveled:

“This is sort of the nature of programming: the level of abstraction always goes up… The way that I coded a year ago was I wrote code with some kind of autocomplete in an IDE. In November, I uninstalled my IDE, because I wasn’t using it… At that point, I was running maybe five, ten Claudes in parallel, and my coding was prompting Claude to write code. Now it’s leveled up again… where I don’t prompt Claude anymore. I have loops that are running. They’re the ones that are prompting Claude and kind of figuring out what to do. My job is to write loops.”¹

Watch: Boris Cherny on Acquired Unplugged, "My job is to write loops" (11:14)

The full Acquired Unplugged interview; the loops passage starts at 11:14.

Two details from the full transcript never made the clips. First, Cherny dates the transition himself: loops are what “we’re going to see in the next few months and maybe through the rest of the year.”¹ He describes a phase, not a destination. Second, a sourcing note for anyone tracing the discourse: several viral posts attributed the loops material to his hour-long Y Combinator podcast appearance. I checked the full transcript of that episode. It contains zero mentions of loops; the episode covers the product’s origin story and sub-agents.⁶ The substance lives in the Acquired interview and a 24-minute Sequoia talk.

A Loop Is a Cron Job

The Sequoia talk supplies the mechanics, and they are deliberately boring:

“All it is is you have Claude use cron to schedule a job for some point in the future, and it’s a repeat job. And it can run every minute, every five minutes, every day.”²

Watch: Boris Cherny's Sequoia talk, /loop is Claude using cron (7:56)

The Sequoia talk supplies every mechanical detail the viral clips left out; the loop section starts at 7:56.

Cherny runs dozens of these. The ones he names: a loop babysitting his PRs (fixing CI, auto-rebasing), a loop keeping CI healthy (it repairs flaky tests), and a loop that pulls feedback from Twitter and clusters it every 30 minutes.² Routines, announced at Anthropic’s Code with Claude event in May, moves the same pattern server-side so the loop survives a closed laptop.² Simon Willison, live-blogging the keynote, recorded the framing Anthropic itself uses: Routines are “higher-order prompts.”¹² Cherny had been telling people for months that /loop and /schedule were two of the most powerful features in the product; his canonical starter is a five-minute loop that babysits PRs.¹¹

The pattern has a cruder ancestor. Geoffrey Huntley’s Ralph Wiggum technique is, in his own words, “Ralph is a Bash loop”: a while true that feeds an agent the same prompt file forever, with progress persisting in files and git history between iterations.⁹ Anthropic now ships Ralph as an official plugin that intercepts the agent’s exit attempts and re-feeds the prompt until a completion string appears or an iteration cap hits.⁹ The Register confirmed Cherny uses Ralph himself, and quoted Huntley’s warning about where the economics lead: startups will use the technique to clone existing SaaS businesses and undercut them, because agentic coding runs at roughly $10 an hour.¹⁰ The lineage matters because it shows the idea’s actual age: the loop is the oldest control structure in computing, and nothing about the construct itself is novel.¹⁵
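The control flow described above (same prompt every iteration, stop on a completion string or an iteration cap) fits in a few lines of POSIX shell. This is a minimal sketch, not the plugin's implementation: `ralph_loop` and `AGENT_CMD` are hypothetical names, and the sketch assumes the agent command takes the prompt as its final argument and prints its reply to stdout.

```shell
# Sketch of a Ralph-style loop. Hypothetical names: ralph_loop, AGENT_CMD.
# Assumption: AGENT_CMD accepts the prompt as its last argument and prints
# a reply on stdout. Override AGENT_CMD to use a different runner.
AGENT_CMD=${AGENT_CMD:-"claude -p"}

# ralph_loop PROMPT_FILE DONE_MARKER MAX_ITERATIONS
ralph_loop() {
  prompt_file=$1
  done_marker=$2
  max_iterations=$3
  i=1
  while [ "$i" -le "$max_iterations" ]; do
    # Same prompt every time; progress lives in files and git history,
    # not in the loop itself. A failed run counts as one iteration.
    output=$($AGENT_CMD "$(cat "$prompt_file")") || true
    case $output in
      *"$done_marker"*)
        echo "completed after $i iteration(s)"
        return 0
        ;;
    esac
    i=$((i + 1))
  done
  # The cap is the only thing standing between this loop and "forever".
  echo "stopped at cap of $max_iterations iteration(s)"
  return 1
}
```

A call such as `ralph_loop PROMPT.md TASK_COMPLETE 20` (file name and marker are placeholders) would run the agent at most 20 times. The cap matters more than the marker: the marker is a claim the agent makes about itself, while the cap is enforced from outside.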

Every Loop He Names Is Janitorial

Here is the observation the discourse skipped. List Cherny’s loops again and look at their end states:

Loop	Success condition	Who checks it
Babysit PRs	CI green, branch rebased	CI, git
Keep CI healthy	Test suite passes	The test suite
Cluster Twitter feedback	Report delivered on schedule	Nobody needs to; it informs, it doesn’t act

Each one either has a machine-checkable success condition or produces output where errors cost nothing. None of his named examples is “build the feature while I sleep.” The man who runs a few thousand agents every night describes his personal automation as CI repair, rebasing, and feedback triage.

The apparent counterexample proves the rule. Anthropic’s own engineering team ran 16 agents in infinite loops for two weeks and produced a 100,000-line C compiler in Rust, at a cost of roughly $20,000 in compute.⁸ A compiler is the single most verifiable artifact in software: each program in a giant test suite either compiles and runs correctly or does not. The team picked a target where verification is nearly free, and even then, Nicholas Carlini, who ran the experiment, wrote that programmers deploying software they have never personally verified is “a real concern.”⁸

Addy Osmani’s essay naming the discipline “loop engineering” lands on the same constraint from the design side: “A loop running unattended is also a loop making mistakes unattended.”⁷ His proposed architecture (separate verifier agents, so the maker never grades its own work) is an attempt to manufacture cheap verification where it does not occur naturally.⁷

What My Own Loops Taught Me

I wrote about my overnight agent system in February, back when the polite term for the technique was still a Simpsons reference.¹⁶ Since then the loops that survived in my setup all converged on the same shape, and the ones that failed taught me more than the ones that worked.

The survivors are check-shaped. A nightly loop reads the day’s commits, maps the touched files to live URLs, loads every affected page, and reports pass or fail with load times. A security loop watches endpoints overnight and writes a morning briefing. A crawl loop reads Googlebot and Bingbot activity across my properties and reports indexation drift. None of these creates anything. Each one observes, compares against an expected state, and reports. When one misfires, the cost is a stale report, not a broken product. The morning read takes minutes because the output is binary per item: pass, fail, look here.

The failures were loops that acted. An isolation feature that automatically created git worktrees for parallel agents twice deleted scratch directories it decided were disposable; the automation’s blast radius included files it did not create and did not understand. A scheduled cache purge once ran in the wrong order relative to a deploy, and search crawlers spent 11 hours receiving 404s for pages that existed, because the purge evicted correct cached responses before the origin served their replacements.¹⁶ Neither failure came from a bad model. Both came from me granting write access to a loop whose preconditions I had not pinned down. Each one now has a guard: the worktree automation is blocked outright, and purges only run after deploy verification passes. The general rule I extracted: a loop that only reads needs a schedule; a loop that writes needs an ordering proof and a blast-radius limit before it earns the schedule.
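The rule for write loops can be sketched as a wrapper that refuses to act unless both guards hold. This is an illustration under assumptions, not my production code: `guarded_write` and its arguments are hypothetical names, the precondition command stands in for whatever "deploy verification passed" means in a given setup, and the target cap is an arbitrary number chosen by the operator.

```shell
# Sketch of the two guards for a loop that writes. Hypothetical names.
# guarded_write VERIFY_CMD MAX_TARGETS ACTION_CMD TARGET...
guarded_write() {
  verify_cmd=$1
  max_targets=$2
  action_cmd=$3
  shift 3
  # Ordering guard: the precondition must pass now, immediately before
  # the write, not at some earlier point in the schedule.
  if ! $verify_cmd; then
    echo "SKIP: precondition failed, nothing written"
    return 1
  fi
  # Blast-radius guard: refuse any batch larger than the agreed cap.
  if [ "$#" -gt "$max_targets" ]; then
    echo "SKIP: $# targets exceeds limit of $max_targets"
    return 1
  fi
  for target in "$@"; do
    $action_cmd "$target"
  done
  echo "DONE: $# target(s) written"
}
```

Both refusals leave the system untouched and say why in one line, so a skipped run shows up in the morning report as a reason to look, not as silent damage.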

That rule also explains the part of Cherny’s setup people find hardest to believe. “Now actually most of my work I do from my phone,” he says, running sessions through the code tab in the Claude app.² A phone is a terrible place to review code and a fine place to read a pass/fail report. The phone claim is a verification claim in disguise: his loops emit output legible enough to accept or reject at a glance, which is only possible when the success conditions were designed before the loop started running.

The Verification-Cost Ladder

If the thesis holds, choosing what to automate reduces to one question: who verifies the result, and what does that verification cost? Here is the ladder I use before any task gets a schedule.

Task	The verifier	Cost to verify	Run unattended?
Monitoring and report generation	None needed; output informs, nothing acts on it	Free	Yes, tonight
Rebase a passing PR	CI re-runs the suite	Free	Yes, tonight
Repair a flaky test	The test suite itself	Free	Yes, tonight
Dependency bumps	CI plus a changelog read	Cheap	Yes, with a checker agent
Bug fix with a reproduction	The repro test, written first	Cheap	Yes, with a maker-checker split
New feature	A human reading the diff	Expensive	No; the loop queues work for review
Architecture change	Humans, over months	Prohibitive	Never

The column that decides is cost to verify, and task difficulty never appears in it. A hard task with a free verifier (the flaky test) automates before an easy task with an expensive one (a one-line copy change a human must approve). Cherny’s loops, Anthropic’s compiler, and my survivors all sit in the top half of the ladder; the viral commentary assumed the bottom half.
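Read as code, the ladder is a lookup keyed on verification cost alone. The sketch below is one way to encode the table; `schedule_decision` is a hypothetical name, and the four cost labels are simply the values from the table's cost column in lowercase.

```shell
# Sketch: the ladder as a lookup. The only input is verification cost;
# task difficulty is deliberately not a parameter.
# schedule_decision COST   (COST: free | cheap | expensive | prohibitive)
schedule_decision() {
  case $1 in
    free)        echo "run unattended" ;;
    cheap)       echo "run with a separate checker" ;;
    expensive)   echo "queue for human review" ;;
    prohibitive) echo "never schedule" ;;
    *)
      # An unclassified task is not ready for a schedule.
      echo "unknown cost: classify before scheduling"
      return 2
      ;;
  esac
}
```

The useful property is the fallthrough branch: a task whose verification cost nobody has named yet gets no schedule at all.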

The shape that survives the ladder looks like one supervisor pattern, regardless of scale:

Figure: Anatomy of a production agent loop. A schedule triggers the maker agent, a separate verifier checks the work, failures route back to the maker, and passing results reach a human as a glanceable report.

The anatomy of a loop that earns its schedule. The verifier is drawn heavier because it is the load-bearing box: remove it and the loop still runs, but nobody knows what it did.
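The same anatomy can be sketched as a small supervisor function. Everything here is illustrative: `supervise`, the maker and verifier commands, and the attempt cap are hypothetical, and the sketch assumes the verifier exits 0 on success and prints its objection on failure. The scheduler (cron, `/loop`, or a `while` loop) sits outside the function.

```shell
# Sketch of the maker/verifier split. Hypothetical names throughout.
# supervise MAKER_CMD VERIFIER_CMD MAX_ATTEMPTS TASK
supervise() {
  maker_cmd=$1
  verifier_cmd=$2
  max_attempts=$3
  task=$4
  attempt=1
  feedback=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # The maker sees the task plus the verifier's last objection.
    # Its chatter goes to stderr so stdout stays a one-line report.
    $maker_cmd "$task" "$feedback" >&2 || true
    # The verifier is a separate command: the maker never grades itself.
    if feedback=$($verifier_cmd "$task"); then
      echo "PASS $task (attempt $attempt)"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "FAIL $task after $max_attempts attempt(s): $feedback"
  return 1
}
```

The function's entire stdout is one line beginning with PASS or FAIL, which is the glanceable report the diagram ends on.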

The smallest loop worth running is read-only, self-verifying, and glanceable, which makes it safe to write today and boring to watch tomorrow:

# Nightly site check: observes and reports, never edits.
# Run it under a permission mode that blocks writes outside reports/.
while true; do
  claude -p "Read today's commits. For each changed file that maps to a live
    page, fetch the staging URL and confirm it returns 200 and renders its
    headline. Append PASS or FAIL per page, with the reason, to
    reports/site-check-$(date +%F).md. Write nothing outside reports/."
  sleep 86400
done

Inside Claude Code, the same loop is one line: /loop 24h followed by the instruction. Promotion comes later. After the report has been boring for a month, the loop has earned consideration for the next rung of the ladder, and not before.
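The report the nightly check appends to can itself be reduced to one line for the morning read. A minimal sketch, assuming each result line in the report starts with the literal word PASS or FAIL (the prompt above asks for that, but the agent's exact formatting is not guaranteed); `summarize_report` is a hypothetical name.

```shell
# Sketch: collapse a site-check report into a glanceable summary.
# Assumption: each result line begins with PASS or FAIL.
# summarize_report REPORT_FILE
summarize_report() {
  report=$1
  # grep -c still prints 0 when nothing matches but exits non-zero,
  # so the exit status is ignored here.
  pass=$(grep -c '^PASS' "$report" || true)
  fail=$(grep -c '^FAIL' "$report" || true)
  echo "$pass passed, $fail failed"
  # List only the failures: the "look here" part of the report.
  grep '^FAIL' "$report" || true
  # Exit status doubles as the verdict, so a caller can alert on it.
  [ "$fail" -eq 0 ]
}
```

Because the exit status carries the verdict, the summary can gate a notification: silence on a clean night, a short list of failures otherwise.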

The Job That Is Left

The strangest passage in the Sequoia talk undercuts the meme it spawned. Cherny says newer models began starting loops without being asked: he requests a data query, and the model notices the data changes over time, proposes a recurring report every 30 minutes, and wires the report into Slack on its own.⁵ His conclusion: “It’s not on users to figure out how to hold the tools better… it’s actually a product design problem and I’m not doing a good job.”⁵ The regress argument skeptics raised on X (if humans write loops today, models write the loops tomorrow) turns out to be Anthropic’s roadmap, conceded by the person the meme is about.

He goes further: “As the model’s gotten better, the harness kind of gets less important,” predicting that permission modes, prompt-injection defenses, and human-in-the-loop checkpoints fade as alignment improves.³ I will take the other side of half that trade. The safety scaffolding may shrink. The orchestration scaffolding is becoming the whole job, by his own account: Anthropic engineers’ agents coordinate with each other over Slack while their owners work, and “we have no more manually written code anywhere at the company. All of the SQL is written by models.”⁴ Somebody decides what those agents may touch, what counts as done, and what happens when two of them disagree. That deciding layer is the real agent interface, and cron is the easy part of it.

So the durable skill is not loop syntax, which the model is already absorbing, and not prompting, which the loops absorbed first. The durable skill is the judgment call sitting under both: deciding what is safe to automate unattended. That call is always a question about verification cost. CI repair automated first because the test suite was already the verifier. Feedback clustering automated because errors are free. Feature development resists because verification still costs a human reading the diff, and a senior engineer on Hacker News put the resulting trap plainly: the tool requires skilled judgment to steer, and using the tool erodes exactly that judgment.¹⁴

Casey Newton’s Platformer interview with Cherny ran under the headline “Claude Code’s creator on the end of the software engineer.” In it, Cherny predicts the “software engineer” title dissolves into something like “builder” by the end of the year while the number of people writing code through agents grows a hundredfold.¹³ Builders, in that forecast, are the people choosing the loops. Choosing well means knowing, before anything runs, how you will know it worked.

Key Takeaways

For engineers running coding agents:
- Audit your candidate automations by verification cost, not by excitement. CI repair, rebasing, monitoring, and report generation have free verifiers today; automate those first.
- Apply the read/write split: a read-only loop needs a schedule, while a loop with write access needs an explicit ordering constraint and a blast-radius limit before it runs unattended.
- Design the report before the loop. If you cannot reject the loop’s output from your phone in 10 seconds, the loop is not ready to run at night.

For team leads:
- Your review bandwidth, not your agent count, is the ceiling on useful parallelism. Adding agents past that ceiling produces unreviewed merges, not throughput.
- Separate makers from checkers. An agent verifying its own work claims correctness; a separate verifier with a different vantage point has at least a chance of catching the claim.

For tool builders:
- Cherny calls user-side loop construction a product design failure, and the models are already initiating loops themselves.⁵ Building loop-authoring UX means building for a layer the model vendor intends to absorb. The longer-lived surface is verification: evidence, traces, and accept/reject ergonomics.

FAQ

What is loop engineering?

Loop engineering is the practice of writing small scheduled programs that prompt coding agents, check the results, and decide whether to run again, instead of prompting the agent by hand. Addy Osmani named the discipline in June 2026 after Boris Cherny's "my job is to write loops" interview. The hard part is not the loop: it is choosing tasks whose results a machine can verify.

Did Boris Cherny really say engineers should stop prompting?

He said he personally no longer prompts because loops prompt Claude on his behalf, and he framed the shift as a transition arriving over months, not a permanent state. Every loop he names automates maintenance with a machine-checkable outcome (CI repair, rebasing, feedback clustering), not open-ended feature work.

What is the difference between a loop and an agent?

An agent is the worker: a model with tools attempting a task. A loop is the supervisor: a small scheduled program that starts the agent, checks the result against a condition, and either stops or goes again. Cherny's version uses cron as the scheduler and Claude as the worker.

Where should someone start with agent loops?

Start with a loop that cannot break anything: a scheduled check that compares the live state of something you own against what you expect and reports the difference. Promote a loop to write access only after its failure modes have names, it has an ordering constraint, and its blast radius has a limit.

References

1. Acquired, “Boris Cherny: Claude Code & the Future of Engineering | Acquired Unplugged presented by WorkOS,” YouTube. Source for the “my job is to write loops” quote (≈11:14), the November IDE uninstall, the punch-cards-to-assembly continuum framing, and the “next few months and maybe through the rest of the year” timeline. Quotes verified against the author’s Whisper (large-v3-turbo) transcription of the source audio.
2. Sequoia Capital, “Anthropic’s Boris Cherny: Why Coding Is Solved, and What Comes Next,” YouTube. Source for the phone-first workflow and Claude app code tab (≈7:20), session and agent counts (≈7:34), /loop as a cron-scheduled repeat job (≈7:56), the named PR-babysitting, CI-health, and Twitter-clustering loops (≈8:16), and Routines as the server-side version (≈8:42). Quotes verified against the author’s Whisper (large-v3-turbo) transcription of the source audio.
3. Sequoia Capital, “Anthropic’s Boris Cherny: Why Coding Is Solved, and What Comes Next,” YouTube, ≈14:14. Source for “as the model’s gotten better, the harness kind of gets less important” and the prediction that permission modes and human-in-the-loop mechanisms fade with alignment. Quote verified against the author’s Whisper transcription of the source audio.
4. Sequoia Capital, “Anthropic’s Boris Cherny: Why Coding Is Solved, and What Comes Next,” YouTube, ≈18:17. Source for agents coordinating over Slack and “we have no more manually written code anywhere at the company. All of the SQL is written by models.” Quote verified word-for-word against the author’s Whisper transcription of the source audio.
5. Sequoia Capital, “Anthropic’s Boris Cherny: Why Coding Is Solved, and What Comes Next,” YouTube, ≈19:59. Source for the model initiating a recurring data report on its own, wiring it to Slack over MCP, and “it’s actually a product design problem and I’m not doing a good job.” Quotes verified against the author’s Whisper transcription of the source audio.
6. Y Combinator, “Inside Claude Code With Its Creator Boris Cherny,” YouTube. Author reviewed the full auto-generated transcript on June 9, 2026; the episode contains no mentions of loops. Cited as a correction to viral posts attributing the loops material to that appearance.
7. Addy Osmani, “Loop Engineering,” addyosmani.com, June 8, 2026. Source for the quoted sentence “A loop running unattended is also a loop making mistakes unattended” (verified against the published text) and for the separate-verifier architecture.
8. Nicholas Carlini, “Building a C compiler with a team of parallel Claudes,” Anthropic Engineering, February 2026. Source for the 16-agent infinite-loop setup, the ~$20,000 cost, the 100,000-line Rust compiler result, and Carlini’s concern about deploying unverified software.
9. Anthropic, “Ralph Wiggum Plugin README,” anthropics/claude-code, GitHub. Source for Huntley’s “Ralph is a Bash loop” description, the Stop-hook mechanism, and the completion-promise and max-iterations termination options. Verified against the README text.
10. The Register, “‘Ralph Wiggum’ loop prompts Claude to vibe-clone software,” January 27, 2026. Source for “The creator of Claude Code, Boris Cherny, has said he uses Ralph” and Huntley’s expectation that startups will clone SaaS businesses at agentic-coding costs of roughly $10 an hour. Claims verified against the published text.
11. Boris Cherny (@bcherny), “Two of the most powerful features in Claude Code: /loop and /schedule,” X, March 30, 2026. Source for the five-minute PR-babysitting starter loop (/loop 5m /babysit). Post text and date verified against the live thread.
12. Simon Willison, “Code w/ Claude 2026,” simonwillison.net, May 6, 2026. Source for Routines described as “higher-order prompts” in the keynote.
13. Casey Newton, “Claude Code’s creator on the end of the software engineer,” Platformer, May 2026. Source for the “builder” title prediction, the “100 times more engineers” forecast, the full “coding is solved for the kinds of coding that I do” quote, and “every night I have hundreds, sometimes thousands of agents running 5, 10, 20 hours.” Quotes verified against the published text.
14. Hacker News, “Ask HN: How are you preserving your skills while using AI?” June 9, 2026. Source for the skill-erosion feedback trap raised by a senior engineer and the ensuing discussion.
15. LinearB, “Inventing the Ralph Wiggum Loop, with Geoffrey Huntley,” Dev Interrupted podcast. Source for the origin and intent of the Ralph technique.
16. Author’s production logs and incident notes, February–June 2026, summarized without infrastructure specifics. The February system is documented in “The Ralph Loop: How I Run Autonomous AI Agents Overnight”; the worktree-deletion and purge-ordering incidents come from the author’s session transcripts and Cloudflare crawl logs.