Long-Running AI Agents Need Durable Channels
OpenAI’s background mode docs now describe a normal agent problem: reasoning tasks can take minutes, developers can poll a response by ID, cancel the response, and resume a stream from a recorded sequence number.[1]
What Are the Key Takeaways?
- A long-running agent run needs an address. The client must be able to reconnect to the same work, stream from a known cursor, send a steering command, cancel the run, and inspect evidence.
- Polling alone gives a thin contract. Polling can report status, but serious agent work also needs commands, event history, resumable streams, artifacts, authorization, and checkpoints.
- Durable execution solves only part of the system. Temporal-style workflows preserve execution state and event history, but the user still needs a durable communication surface around the running work.[2][3]
- WebSockets help, but a socket is not the whole address. A dropped connection should not erase the user’s route back to the agent run.
- The product surface matters. The user should see one coherent run with evidence, decisions, and next actions, not scattered logs and optimistic status text.
Long-running AI agents do not fit the old request-response shape. A normal request has an endpoint, a response, and a timeout. A serious agent run has a duration, an event history, intermediate artifacts, user interruptions, model/tool state, cancellation rules, and a human who may leave and return.
The missing object is not another chat message. The missing object is a durable channel: a stable address for a running piece of work.
I already argued that managed agents are absorbing runtime infrastructure, and that agent execution traces are becoming the runtime contract. Durable channels sit between those two ideas. A trace proves what happened. A managed runtime keeps work alive. A durable channel lets the product and the user talk to the work while it lives.
What Breaks in the Old Request Model?
The old web model assumes compute finishes inside a request or moves into a background job. The database stores the durable state. The application server stays stateless. A client can refresh the page, hit another server, and read the same database row.
Agent work strains that model in three ways. It can run for minutes or hours. It carries process state that does not reduce cleanly to one database record. It needs bidirectional control: watch, interrupt, approve, redirect, cancel, and resume.
Zak Knill named the same pressure as a routing problem. His May 2026 post argues that long-running, stateful, interactive agent work needs a routable primitive that can address the process doing the work, not only the database that stores its outputs.[4] The useful part of that frame: the client wants to say, “deliver command Y to run X,” even if the original socket, worker, tab, or process disappeared.
Background jobs can still serve simple tasks. An image resize, invoice export, or nightly sync may only need queued, running, succeeded, or failed. Agent work crosses a line when the user needs to steer the work before it finishes.
Why Does Polling Fall Short?
Polling gives the client a way to ask, “are you done yet?” It does not give the client a complete interaction contract.
OpenAI’s background mode includes polling because polling solves the timeout problem. The docs tell developers to retrieve a background response while the status remains queued or in_progress, then stop when it reaches a terminal state.[1] The same page also exposes cancellation and stream resumption with a sequence_number cursor, which points beyond basic polling toward a richer run contract.[1]
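The polling loop described above can be sketched in a few lines. This is a minimal illustration, not the OpenAI SDK: `fetch_status` is a hypothetical callable standing in for whatever retrieves the run's status, `poll_until_terminal` and its parameters are invented for this example, and `completed` is a placeholder terminal status. Only `queued` and `in_progress` come from the text.

```python
import time
from typing import Callable

# Non-terminal statuses named in the text.
NON_TERMINAL = {"queued", "in_progress"}

def poll_until_terminal(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a status outside NON_TERMINAL."""
    for _ in range(max_polls):
        status = fetch_status()
        if status not in NON_TERMINAL:
            return status
        sleep(interval_s)
    raise TimeoutError("run did not reach a terminal status")

# Usage with a stand-in for a real status lookup (hypothetical data).
_fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
final_status = poll_until_terminal(
    lambda: next(_fake_statuses),
    sleep=lambda _seconds: None,  # skip real waiting in the example
)
```

The loop only ever learns one word per request. Everything else the user might want (events, artifacts, a way to steer) has to come from somewhere else.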
A product that stops at polling usually spreads agent state across too many places:
| Need | Thin polling answer | Durable-channel answer |
|---|---|---|
| See progress | status = in_progress | Append-only events with timestamps and types |
| Reconnect after a dropped tab | Poll latest row | Resume stream after cursor N |
| Redirect the work | Write a note somewhere | Send a typed signal to run X |
| Cancel safely | Flip a boolean | Idempotent cancel command with terminal event |
| Review evidence | Read final text | Inspect event history, artifacts, and checkpoints |
| Authorize control | Trust the page session | Check permissions per run and command |
Polling can remain one access path. The mistake is treating polling as the product contract.
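The "resume stream after cursor N" row is the easiest one to make concrete. The sketch below assumes an in-memory list as the event log and a hypothetical `events_after` helper; a real system would read from a table or stream, but the cursor logic would look much the same.

```python
from typing import Iterator

def events_after(log: list[dict], cursor: int) -> Iterator[dict]:
    """Yield events whose sequence number is greater than cursor, in order."""
    for event in log:
        if event["seq"] > cursor:
            yield event

log = [
    {"seq": 1, "type": "run.started"},
    {"seq": 2, "type": "tool.started"},
    {"seq": 3, "type": "tool.finished"},
]

# First connection sees two events, then the tab drops.
seen = [e["seq"] for e in events_after(log, 0)][:2]
cursor = seen[-1]

# The run keeps going while the client is away.
log.append({"seq": 4, "type": "artifact.created"})

# Reconnect: resume strictly after the last sequence number the client saw.
resumed = [e["seq"] for e in events_after(log, cursor)]
```

The client only has to remember one integer. Nothing is replayed twice and nothing is skipped, which a "poll latest row" approach cannot promise.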
What Should a Durable Channel Contain?
A durable channel is a named communication contract around a run. The implementation can use a workflow engine, queue, event table, WebSocket, SSE stream, pub/sub topic, managed agent session, or some mix of those pieces. The product contract matters more than the transport.
The minimum contract has nine parts:
| Field or endpoint | Purpose |
|---|---|
| run_id or workflow_id | Stable address for the work. |
| GET /runs/{id} | Current state, owner, timestamps, terminal status, and summary. |
| GET /runs/{id}/events?after=N | Ordered event history for reconnects and audits. |
| GET /runs/{id}/stream?after=N | Resumable live updates from a known cursor. |
| POST /runs/{id}/signals | Typed steering commands such as approve, revise, pause, or add context. |
| POST /runs/{id}/cancel | Idempotent cancellation with a recorded terminal event. |
| GET /runs/{id}/artifacts | Diffs, files, reports, screenshots, traces, and other proof. |
| checkpoint events | Human-readable state for handoff and resume. |
| authorization checks | Per-run read, stream, signal, artifact, and cancellation rights. |
Every event needs a type, sequence number, timestamp, actor, payload reference, and redaction policy. Without that structure, the event log becomes another chat transcript.
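One possible shape for that event record, as a sketch. The field names, the `RunEvent` class, the `public_view` helper, and the two policy values (`public`, `owner_only`) are all assumptions made for illustration; the text only names the six things each event should carry.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class RunEvent:
    seq: int                    # position in the run's ordered log
    type: str                   # e.g. "tool.finished"
    timestamp: str              # ISO 8601, UTC
    actor: str                  # "agent", "system", or a user ID
    payload_ref: Optional[str]  # pointer to the stored payload, not the payload itself
    redaction: str              # hypothetical policy names: "public" or "owner_only"

def public_view(event: RunEvent, viewer_is_owner: bool) -> dict:
    """Return the event as a dict, hiding the payload reference when policy requires it."""
    view = asdict(event)
    if event.redaction == "owner_only" and not viewer_is_owner:
        view["payload_ref"] = None
    return view

event = RunEvent(
    seq=7,
    type="tool.finished",
    timestamp=datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc).isoformat(),
    actor="agent",
    payload_ref="artifacts/run-123/tool-7.json",  # made-up path
    redaction="owner_only",
)
```

Keeping the payload behind a reference means the event row stays small and safe to stream, while the access check happens where the payload is actually fetched.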
The channel also needs taste. Do not stream every token when the user needs decisions. Do not hide tool failures behind a friendly spinner. Do not turn a running agent into a notification storm. A good channel shows the few events that help the user trust, steer, or stop the work.
How Do Existing Systems Point at the Pattern?
Temporal gives the execution side a mature vocabulary. A workflow execution has event history, replay, deterministic workflow code, and activities for outside-world work such as API calls, database queries, LLM calls, and file I/O.[2] Temporal’s TypeScript message-passing docs describe workflows as stateful services that receive queries, signals, and updates. Clients can retrieve a workflow handle by workflow ID, query state, send signals, and execute updates.[3]
That model maps neatly to agent work. Queries answer “what state does the run report?” Signals answer “please change course.” Updates answer “perform a tracked change and return a result.” Event history answers “what happened?” A team does not need Temporal to learn from the shape, but the shape gives agent products a better vocabulary than “background job plus chat.”
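To make the three verbs concrete without depending on any workflow engine, here is a plain-Python stand-in. It is not the Temporal SDK and borrows none of its API; the `AgentRun` class, its method names, and the command names are hypothetical and exist only to show how a query, a signal, and an update differ in what they return and what they change.

```python
class AgentRun:
    """Plain-Python stand-in for a stateful run; this is not the Temporal SDK."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.status = "running"
        self.notes: list[str] = []
        self.history: list[tuple[str, str]] = []  # (kind, name), in arrival order

    def query(self) -> dict:
        # Query: read reported state without changing it.
        return {"run_id": self.run_id, "status": self.status, "notes": len(self.notes)}

    def signal(self, name: str, detail: str = "") -> None:
        # Signal: request to change course; the sender gets no result back.
        self.history.append(("signal", name))
        if name == "pause":
            self.status = "paused"
        elif name == "add_context":
            self.notes.append(detail)

    def update(self, name: str) -> str:
        # Update: a tracked change whose outcome is returned to the caller.
        self.history.append(("update", name))
        if name == "approve" and self.status == "paused":
            self.status = "running"
            return "resumed"
        return "rejected"

run = AgentRun("run-123")
run.signal("add_context", "prefer the staging database")
run.signal("pause")
result = run.update("approve")
```

Every call lands in `history`, so the fourth question ("what happened?") can be answered from the same object that handled the commands.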
Cloudflare Durable Objects point at a different piece: addressable coordination. Cloudflare describes each Durable Object as a globally unique instance with storage, useful for stateful coordination across multiple clients.[5] Its WebSocket docs describe long-lived bidirectional connections and hibernation that keeps clients connected while the object sleeps, then wakes the object when a message arrives.[6] That does not make Durable Objects a universal agent runtime. It does show why an addressable coordination object feels natural for live agent surfaces.
Anthropic’s long-running agent write-up adds the human-work side. The post says agents still struggle across many context windows and describes a pattern where later sessions make incremental progress while leaving clear artifacts for the next session.[7] Durable channels should carry those artifacts into the product surface, not bury them in private logs.
What Would I Build First?
I would start with a small run service, not a grand orchestration platform.
Create a runs table with ownership, status, timestamps, and current summary. Create a run_events table or stream with monotonically increasing sequence numbers. Store large payloads and artifacts separately, then reference them from events. Add one resumable stream endpoint and one typed signal endpoint. Make cancellation idempotent. Put every state transition into the event log.
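A minimal sketch of that starting point, using SQLite from the Python standard library. The column names, the `append_event` and `cancel` helpers, and the status strings are assumptions for illustration, and the sketch assumes a single writer per run. A production version would need real concurrency handling and the stream and signal endpoints on top.

```python
import sqlite3

SCHEMA = """
CREATE TABLE runs (
    id TEXT PRIMARY KEY,
    owner TEXT NOT NULL,
    status TEXT NOT NULL,
    summary TEXT NOT NULL DEFAULT '',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE run_events (
    run_id TEXT NOT NULL REFERENCES runs(id),
    seq INTEGER NOT NULL,
    type TEXT NOT NULL,
    payload_ref TEXT,  -- large payloads live elsewhere; events only point at them
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (run_id, seq)  -- rejects a duplicate sequence number
);
"""

TERMINAL = {"cancelled", "completed", "failed"}

def append_event(db, run_id, type_, payload_ref=None):
    """Insert an event with the next per-run sequence number; the caller commits."""
    (last,) = db.execute(
        "SELECT COALESCE(MAX(seq), 0) FROM run_events WHERE run_id = ?", (run_id,)
    ).fetchone()
    db.execute(
        "INSERT INTO run_events (run_id, seq, type, payload_ref) VALUES (?, ?, ?, ?)",
        (run_id, last + 1, type_, payload_ref),
    )
    return last + 1

def cancel(db, run_id):
    """Idempotent cancel: only the first call on a live run writes a terminal event."""
    (status,) = db.execute("SELECT status FROM runs WHERE id = ?", (run_id,)).fetchone()
    if status in TERMINAL:
        return status
    with db:  # status change and terminal event commit together
        db.execute("UPDATE runs SET status = 'cancelled' WHERE id = ?", (run_id,))
        append_event(db, run_id, "run.cancelled")
    return "cancelled"

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
with db:
    db.execute("INSERT INTO runs (id, owner, status) VALUES ('run-123', 'user-1', 'running')")
    append_event(db, "run-123", "run.started")

first = cancel(db, "run-123")
second = cancel(db, "run-123")  # repeat call changes nothing
types = [t for (t,) in db.execute(
    "SELECT type FROM run_events WHERE run_id = 'run-123' ORDER BY seq"
)]
```

Calling `cancel` twice leaves exactly one `run.cancelled` event in the log, which is the property a retrying client or a double-clicked button relies on.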
Then constrain the event vocabulary:
| Event type | Meaning |
|---|---|
| run.started | The system accepted the work and assigned a stable ID. |
| agent.plan.updated | The agent changed the current plan or checkpoint. |
| tool.started | A tool or command began with redacted arguments. |
| tool.finished | A tool or command ended with status, duration, and proof reference. |
| artifact.created | A diff, file, screenshot, report, or trace became available. |
| human.signal.received | A user sent a typed steering command. |
| run.blocked | The run needs permission, input, or external state. |
| run.cancelled | The system accepted cancellation and stopped work. |
| run.completed | The work reached a terminal success state with evidence. |
| run.failed | The work reached a terminal failure state with evidence. |
The UI can now show one coherent run. The user can leave, return, review events, inspect artifacts, and steer from the same address. The agent can stop claiming success in prose and start attaching evidence to state transitions.
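One way to enforce that vocabulary is to make it a closed set and check every append against it. The sketch below encodes the ten event types from the table as an enum. The `validate_append` helper and its two rules (a run begins with `run.started`, nothing follows a terminal event) are assumptions about reasonable invariants, not something the table itself specifies.

```python
from enum import Enum

class EventType(str, Enum):
    RUN_STARTED = "run.started"
    PLAN_UPDATED = "agent.plan.updated"
    TOOL_STARTED = "tool.started"
    TOOL_FINISHED = "tool.finished"
    ARTIFACT_CREATED = "artifact.created"
    SIGNAL_RECEIVED = "human.signal.received"
    RUN_BLOCKED = "run.blocked"
    RUN_CANCELLED = "run.cancelled"
    RUN_COMPLETED = "run.completed"
    RUN_FAILED = "run.failed"

TERMINAL = {EventType.RUN_CANCELLED, EventType.RUN_COMPLETED, EventType.RUN_FAILED}

def validate_append(history: list[EventType], new_type: str) -> EventType:
    """Reject unknown event types and anything appended after a terminal event."""
    event_type = EventType(new_type)  # raises ValueError for names outside the vocabulary
    if not history and event_type is not EventType.RUN_STARTED:
        raise ValueError("a run must begin with run.started")
    if history and history[-1] in TERMINAL:
        raise ValueError("run already reached a terminal state")
    return event_type

history: list[EventType] = []
for name in ["run.started", "tool.started", "tool.finished", "run.completed"]:
    history.append(validate_append(history, name))
```

A closed enum keeps ad hoc event names from creeping in, and the terminal guard means a run can end only once.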
What Should Teams Avoid?
Avoid three shortcuts.
First, avoid a pure chat transcript. Chat can initiate work and collect clarifications. It should not serve as the only runtime object for a long-running task.
Second, avoid raw token streaming as the main progress surface. Token streams help a developer debug latency, but most users need milestones, blockers, artifacts, and decisions. A durable channel can still expose raw events for expert inspection.
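A small sketch of that split between the default view and expert inspection. The `model.token` event type, the `MILESTONE_TYPES` set, and the `user_timeline` helper are hypothetical; which events count as milestones is a product decision this example does not settle.

```python
# Hypothetical choice of which event types appear in the default user view.
MILESTONE_TYPES = {
    "agent.plan.updated",
    "artifact.created",
    "run.blocked",
    "run.cancelled",
    "run.completed",
    "run.failed",
}

def user_timeline(events: list[dict], expert: bool = False) -> list[dict]:
    """Return milestones by default; return every event when expert view is on."""
    if expert:
        return list(events)
    return [e for e in events if e["type"] in MILESTONE_TYPES]

events = [
    {"seq": 1, "type": "model.token", "text": "Let"},
    {"seq": 2, "type": "model.token", "text": " me"},
    {"seq": 3, "type": "agent.plan.updated"},
    {"seq": 4, "type": "tool.started"},
    {"seq": 5, "type": "run.blocked"},
]

default_view = [e["seq"] for e in user_timeline(events)]
expert_view = [e["seq"] for e in user_timeline(events, expert=True)]
```

Both views read from the same log, so the filter changes what is shown without changing what is recorded.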
Third, avoid private-process leakage. A public product surface should show evidence, not private prompts, hook bodies, local file paths, or internal scoring machinery. The user needs enough to trust the work. They do not need every internal trick that made the work possible.
That privacy line also applies to public writing about agent systems. Share the contract. Keep the private machinery private.
How Does a Durable Channel Change Evaluation?
A durable channel makes evaluation less theatrical.
Instead of asking whether the final answer sounds plausible, the evaluator can inspect the run:
- Did the run start with the right owner, permissions, and scope?
- Did the agent emit a plan before acting?
- Did every claimed artifact come from a recorded event?
- Did failures produce useful checkpoints?
- Did user signals change the run in the expected way?
- Did cancellation end with one terminal state?
- Did the final report cite evidence from the event log?
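Several of those questions reduce to simple checks over an ordered event log. The sketch below covers three of them, assuming events are dicts with a `type` field and that `artifact.created` events carry a `ref` field. The `evaluate_run` function and the result key names are hypothetical.

```python
TERMINAL = {"run.cancelled", "run.completed", "run.failed"}

def evaluate_run(events: list[dict], claimed_artifacts: set[str]) -> dict[str, bool]:
    """Run three of the checks above against an ordered event log."""
    types = [e["type"] for e in events]
    first_plan = types.index("agent.plan.updated") if "agent.plan.updated" in types else None
    first_tool = types.index("tool.started") if "tool.started" in types else None
    recorded = {e["ref"] for e in events if e["type"] == "artifact.created"}
    return {
        # A plan event exists and precedes the first tool call, if any.
        "plan_before_acting": first_plan is not None
        and (first_tool is None or first_plan < first_tool),
        # Every artifact the final report claims appears in a recorded event.
        "artifacts_recorded": claimed_artifacts <= recorded,
        # Exactly one terminal event, and it is the last event in the log.
        "one_terminal_event": sum(t in TERMINAL for t in types) == 1
        and types[-1] in TERMINAL,
    }

events = [
    {"type": "run.started"},
    {"type": "agent.plan.updated"},
    {"type": "tool.started"},
    {"type": "tool.finished"},
    {"type": "artifact.created", "ref": "report.md"},
    {"type": "run.completed"},
]

ok = evaluate_run(events, {"report.md"})
bad = evaluate_run(events, {"report.md", "diff.patch"})  # claims an unrecorded artifact
```

None of these checks read the final answer's prose. They only ask whether the log supports what the run claims.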
That list turns the Evidence Gate into something the runtime can support directly. It also connects to the cleanup layer: many agent products will win by making messy runs understandable, resumable, and reviewable.
Quick Summary
Long-running AI agents need durable channels because the user needs a stable route back to the work. Polling can report status, but it cannot carry the whole contract by itself. A good agent run needs a stable run or workflow ID, ordered events, resumable streams, typed signals, idempotent cancellation, artifact references, permissions, and human-readable checkpoints. Durable execution keeps the work alive; durable channels let the user and product interact with it.
FAQ: Long-Running AI Agents and Durable Channels
Do long-running AI agents require Temporal?
No. Temporal gives teams a strong workflow vocabulary and mature execution model, but the durable-channel contract can run on simpler infrastructure. Start with stable run IDs, ordered events, resumable streams, typed commands, and artifacts. Move to a workflow engine when retries, replay, timers, and operational scale justify it.
Are WebSockets enough for agent progress?
No. WebSockets give a live bidirectional connection. The product still needs a durable address, event history, reconnection cursor, permissions, and terminal states. A socket can carry a channel, but a socket should not define the whole channel.
Is polling always bad?
No. Polling works for simple status checks and can remain a fallback path. Problems start when polling becomes the only way to observe, steer, or recover a long-running agent run.
What should a small team build first?
Build a runs resource and an append-only run_events log. Add a resumable stream once the event log has sequence numbers. Add typed signals only for commands the product can honor safely: approve, pause, revise, add context, and cancel.
What belongs in agent run events?
Record state transitions, plans, tool starts and finishes, artifact creation, human signals, blockers, cancellations, failures, and completions. Keep sensitive payloads out of inline event text. Store private details behind redacted references and access checks.
References
1. OpenAI, “Background mode,” OpenAI API documentation, accessed May 18, 2026. Source for asynchronous background Responses, polling by response ID, terminal statuses, cancellation, sequence_number cursors, and stream resumption with starting_after.
2. Temporal, “Temporal Workflow,” Temporal documentation, accessed May 18, 2026. Source for Workflow Executions, event history, replay, deterministic workflow code, and activities for API calls, database queries, LLM invocations, and file I/O.
3. Temporal, “Workflow message passing - TypeScript SDK,” Temporal documentation, accessed May 18, 2026. Source for workflows acting as stateful services, queries, signals, updates, workflow handles, and workflow IDs.
4. Zak Knill, “LLMs are breaking 20 year old system design,” /dev/knill, May 13, 2026. Source for the routing-primitive frame, polling critique, WebSocket-as-connection distinction, and durable-channel argument.
5. Cloudflare, “Durable Objects,” Cloudflare Developers documentation, accessed May 18, 2026. Source for Durable Objects as globally unique, stateful coordination objects with storage.
6. Cloudflare, “Use WebSockets,” Cloudflare Developers documentation, accessed May 18, 2026. Source for Durable Objects as WebSocket endpoints, long-lived bidirectional connections, and WebSocket Hibernation behavior.
7. Anthropic, “Effective harnesses for long-running agents,” Anthropic Engineering, November 26, 2025. Source for long-running agents spanning many context windows, incremental progress across sessions, and clear artifacts for subsequent sessions.