Managed Agents vs Local Agent Harnesses: What to Keep
Anthropic and OpenAI are turning agent runtime infrastructure into product surface: hosted sessions, sandboxes, tracing, memory, handoffs, rubrics, and event streams now sit closer to the model provider than to a team’s private script folder.[1][2]
What Are the Key Takeaways?
- Managed agents are becoming the runtime layer. Sessions, sandboxes, traces, events, and async execution increasingly belong in managed infrastructure when the provider meets the team’s security bar.[1][2]
- Local harnesses still matter. Keep the parts that encode taste, evidence, public-writing integrity, privacy boundaries, source verification, and project memory.
- The migration unit is the job, not the command. A slash command, Codex skill, SDK handoff, MCP server, or managed outcome can all carry the same workflow if the acceptance criteria survive.
- Do not publish private machinery. Public posts should explain the pattern and acceptance criteria, not private prompts, exact hook bodies, account details, or internal scoring rules.
- Promotion needs proof. Start explicit, run one real task, record the result, and promote only when the user-visible path improves.
Managed agent platforms should absorb commodity runtime work: sandbox execution, stateful sessions, event streams, tracing, file execution, and async completion. Local harnesses still matter, but their job gets smaller and sharper. Keep the parts that encode product taste, evidence gates, public-writing integrity, privacy boundaries, source verification, and project-specific operating memory. Move the parts that only exist because nobody else had packaged the runtime yet.
The first bad migration is to delete your local harness because a provider shipped managed infrastructure. The second is to preserve every local command, hook, and prompt because it once solved a real problem. The right migration asks one question per component: does this encode my standards, or does this operate the machine?
For the broader architecture, read the AI Agent Architecture guide. For the live local migration pattern behind this post, read Claude Code to Codex Migration Guide, AGENTS.md Patterns, and Jiro Quality Philosophy.
What Changed With Managed Agents?
Claude Managed Agents gives developers a pre-built agent harness running in managed infrastructure. Anthropic describes it as a fit for long-running tasks and asynchronous work, with core concepts for agents, environments, sessions, and events.[1] The same docs describe a managed environment where Claude can read files, run commands, browse, execute code, use MCP servers, and persist event history server-side.[1]
Anthropic’s engineering write-up makes the architectural point more clearly than the product docs. The Managed Agents team separated the session log, harness, and sandbox so each part can fail or change independently.[3] That separation matters because it turns a fragile one-container agent loop into a system with recoverable session state, replaceable execution environments, and a narrower security boundary around credentials.[3]
OpenAI is moving in the same direction through the Agents SDK. Its April 15, 2026 update added a model-native harness, native sandbox execution, a manifest abstraction for workspaces, and support for common primitives such as MCP, skills, AGENTS.md, shell execution, and patch application.[2] The SDK docs also expose sessions for multi-run memory, tracing for LLM generations, tool calls, handoffs, guardrails, and custom events, and handoffs for transferring work between specialist agents.[4][5][6]
That is the news. The strategy question is different: once platforms ship the agent runtime, what should your local harness still do?
What Is the Split Between Runtime and Judgment?
Most local agent harnesses mix two jobs that should not always live together.
The first job is runtime infrastructure. A runtime starts sessions, grants tools, prepares a workspace, executes commands, stores events, handles interruptions, resumes work, streams status, and records traces. That job benefits from standardization. It also benefits from security engineering that most individual teams should not rebuild unless they have a strong reason.
The second job is judgment. Judgment says what good work looks like, which public claims need primary sources, when a guide is too stale to publish, when a hook is too noisy to enforce, when a source scan should become a note instead of a post, and when an agent should refuse a technically correct but unworthy output. That job stays local because it comes from the product, the team, and the reader.
Managed infrastructure can run a better loop. It cannot decide what your taste should be.
What Should Move to Managed Agents?
Move the components that do not encode your product standards.
| Local component | Better home when the platform supports it | Why |
|---|---|---|
| Sandbox setup | Managed environment or SDK sandbox | Providers can maintain isolation, setup, network rules, and provider adapters. |
| Session persistence | Managed session log or SDK session store | Long-running work needs state that survives context windows and worker failures. |
| Event streams and webhooks | Managed events or app-level job queue | The application should observe status without polling private shell state. |
| Tracing | Provider tracing or your tracing processor | Agent debugging needs structured spans for model calls, tools, guardrails, and handoffs. |
| Tool execution glue | Managed tools, MCP, or SDK tool adapters | Tool calling belongs behind stable interfaces, not brittle prompt conventions. |
| Multi-agent fanout | Managed orchestration or SDK handoffs | Delegation needs visibility, input filters, and clear handoff contracts. |
Anthropic’s Outcomes feature shows where this trend goes next. The developer defines a rubric, the managed harness provisions a separate grader, and the agent iterates against the grader’s feedback.[7] That does not remove local standards. It gives those standards a runtime slot.
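That loop can be sketched in a few lines of plain Python. Everything here is illustrative: the rubric shape, the substring-based grader, and the retry loop are stand-ins for the managed Outcomes pattern, not Anthropic's API.

```python
# Sketch of rubric-as-runtime-slot: a grader scores output against the
# developer's rubric, and the producer retries against that feedback.

def grade(output: str, rubric: list[str]) -> list[str]:
    """Return the rubric criteria the output fails (substring check as a toy grader)."""
    return [c for c in rubric if c.lower() not in output.lower()]

def iterate(produce, rubric: list[str], max_rounds: int = 3) -> tuple[str, list[str]]:
    """Let the producer retry against grader feedback until the rubric passes."""
    feedback: list[str] = []
    output = ""
    for _ in range(max_rounds):
        output = produce(feedback)
        feedback = grade(output, rubric)
        if not feedback:
            break
    return output, feedback

# A toy producer that folds grader feedback into its next attempt.
def produce(feedback: list[str]) -> str:
    return "draft with " + " and ".join(feedback) if feedback else "draft"

rubric = ["citations", "user-path verification"]
output, failures = iterate(produce, rubric)
```

The rubric stays a local artifact; only its execution moves to the managed grader.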
The same pattern applies to OpenAI tracing. The SDK traces the run, agent spans, generations, function tool calls, guardrails, and handoffs by default, with controls for disabling tracing and processors for other destinations.[5] A local script can approximate that. A production system should usually prefer the standardized trace and send it where the team already debugs work.
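The "send it where the team already debugs" idea reduces to a small processor. The class and method names below are illustrative, not the Agents SDK's exact processor interface.

```python
# A minimal trace-processor sketch: collect finished spans and serialize
# them as JSON lines for whatever log pipeline the team already uses.
import json

class JsonlSpanProcessor:
    def __init__(self):
        self.lines: list[str] = []

    def on_span_end(self, span: dict) -> None:
        """Append one finished span as a JSON line."""
        self.lines.append(json.dumps(span, sort_keys=True))

proc = JsonlSpanProcessor()
proc.on_span_end({"type": "generation", "model": "m", "ms": 412})
proc.on_span_end({"type": "tool_call", "name": "fetch", "ms": 88})
```

In a real system the processor would forward lines to an existing log sink instead of a list.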
What Should Stay Local?
Keep the components that define your standards, your reader, or your private operating context.
Product taste. A platform can execute a task; it cannot know whether the result improves the whole product. Keep the taste rules that reject busy, generic, or low-dignity output.
Evidence gates. Keep rules that demand current-session evidence, user-path verification, named gaps, and root-cause analysis. Managed traces tell you what happened. Your standard decides whether the evidence is enough.
Public-writing integrity. Keep citation rules, source-tier rules, private-boundary checks, SEO/AIO checks, and publication gates close to the site. A model provider should not decide which private workflow details are safe to publish.
Project memory. Keep concise project doctrine, style decisions, known hazards, release boundaries, and operating logs where the team can inspect them. Move only the storage layer when a managed session store genuinely improves durability.
Source intelligence. Keep the editorial routing layer. A scanner can find 14 good items and still produce zero posts if the right move is monitoring, guide maintenance, or a private note.
Promotion policy. Keep staging rules. A skill can start explicit-only, a hook can run in shadow, and a plugin can sit in install-pilot until real work proves that it helps more than it distracts.
That list is the real harness. The files and commands are only one implementation of it.
What Migration Mistake Should Teams Avoid?
The easiest way to botch this migration is to preserve the shape instead of the job.
Claude Code slash commands, Codex skills, SDK tools, managed outcomes, and MCP servers are not interchangeable syntax for the same thing. They are different activation surfaces. A slash command may become a skill. A skill may become a managed outcome rubric. A hook may become a trace processor. A local script may become unnecessary once the platform exposes sessions or webhooks.
Anthropic’s long-running agents write-up makes the same point from the opposite direction: compaction alone did not produce production-quality work, so the effective pattern added feature lists, progress artifacts, clean handoff state, and end-to-end testing.[8] Those are not UI conventions. They are proof obligations.
The migration should not ask, “Where do I put /scan-intel?” It should ask, “What job did the source-intelligence workflow perform?”
For a source scanner, the job is not “run a command.” The job is to scan configured sources, prove source reachability, score candidates, refuse broad low-signal writes, preserve useful notes privately, and route public opportunities to editorial review. The exact activation phrase can change without losing the workflow.
The same rule applies to quality doctrine. Do not publish a private prompt pack. Convert the doctrine into observable completion gates: evidence, user-path verification, private-boundary review, and the right to refuse work that weakens the product.
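Converting doctrine into observable gates can look like this. The gate names and task fields are examples invented for illustration, not a published standard.

```python
# Doctrine expressed as observable completion gates rather than a private
# prompt pack: each gate is a check over the task's recorded evidence.

GATES = {
    "evidence": lambda task: bool(task.get("evidence")),
    "user_path_verified": lambda task: task.get("user_path_verified") is True,
    "private_boundary_reviewed": lambda task: task.get("boundary_review") == "pass",
}

def completion_failures(task: dict) -> list[str]:
    """Return the gates a task still fails; an empty list means it may ship."""
    return [name for name, check in GATES.items() if not check(task)]

task = {"evidence": ["trace-123"], "user_path_verified": True, "boundary_review": "pass"}
```

The gates are publishable; the private prompts that satisfy them are not.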
How Does This Apply to a Source-Intelligence Scanner?
A source-intelligence scanner makes the split concrete.
The runtime side can move. A managed platform can run the scheduled job, store the session, execute browser or feed-fetching tools, emit events, and preserve traces. If a scan times out, the managed session should know what ran, which sources failed, and where the next run should resume.
The judgment side should stay local. The scanner still needs a private source map, score thresholds, duplicate checks, write-volume limits, and an editorial route. A scan that finds 14 candidates should not automatically publish 14 notes or one article. The correct action may be a private note, a guide-maintenance task, a monitoring queue, or a refusal to write anything public.
That distinction turns a noisy automation into a useful workflow:
| Scanner step | Managed layer | Local harness layer |
|---|---|---|
| Fetch sources | Browser, feed, search, or MCP tools | Source map and trust tiers |
| Persist run state | Session log, events, traces | Topic ledger and prior-coverage memory |
| Score candidates | Optional model/tool pass | Editorial thresholds and taste rules |
| Write outputs | File or note tool | Write-volume gate and private-boundary check |
| Route next action | Event, webhook, or handoff | Publish, update guide, monitor, or no-op decision |
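The local judgment column of that table can be sketched as one routing function. The thresholds, action names, and candidate shape below are invented for illustration; the real editorial rules stay private.

```python
# Local routing layer for a source-intelligence scanner: the managed layer
# supplies scored candidates, the local layer decides what deserves to exist.

def route(candidates: list[dict], seen_topics: set[str],
          publish_threshold: float = 0.8, max_writes: int = 3) -> list[tuple[str, str]]:
    """Map scored candidates to editorial actions without exceeding the write-volume gate."""
    actions, writes = [], 0
    for c in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if c["topic"] in seen_topics:
            actions.append((c["topic"], "no-op"))             # duplicate coverage
        elif c["score"] >= publish_threshold and writes < max_writes:
            actions.append((c["topic"], "editorial-review"))  # public opportunity
            writes += 1
        elif c["score"] >= 0.5:
            actions.append((c["topic"], "monitor"))           # worth watching, not writing
        else:
            actions.append((c["topic"], "private-note"))      # keep signal, refuse public write
    return actions

plan = route(
    [{"topic": "a", "score": 0.9}, {"topic": "b", "score": 0.6}, {"topic": "c", "score": 0.2}],
    seen_topics={"c"},
)
```

Note that a high score alone never forces a publish: the write-volume gate and prior-coverage memory can still veto it.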
The same logic applies to coding, guide maintenance, translation, and public-writing workflows. Move execution mechanics when a platform does them better. Keep the standard that decides whether the output deserves to exist.
What Checklist Should Teams Use Before Moving a Harness?
Use this checklist before moving any local harness component to a managed agent platform.
| Question | If yes | If no |
|---|---|---|
| Does the component only operate runtime infrastructure? | Move it toward managed sessions, sandboxes, tracing, or events. | Keep it local or project-owned. |
| Does the component encode taste, trust, or editorial standards? | Keep the standard local; expose only a safe rubric or acceptance criteria. | Consider retiring it. |
| Does the component touch secrets, account state, or private prompts? | Keep the private details out of public packages and articles. | It may be publishable as a generic pattern. |
| Can the platform express the same gate as a rubric, trace, hook, or processor? | Pilot the platform-native version. | Keep the local version explicit-only. |
| Has real work proven the behavior? | Promote from explicit-only to pilot or enforced. | Keep it staged. |
| Does the component create noise? | Simplify, shadow, or remove it. | Keep measuring it against real outcomes. |
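The checklist collapses into a small decision function. The answer keys are shorthand for the table's questions, not a formal schema.

```python
# The migration checklist as a decision function: secrets and taste are
# checked before any runtime question, mirroring the table's priorities.

def placement(answers: dict) -> str:
    """Decide where a harness component belongs from checklist answers."""
    if answers.get("touches_secrets"):
        return "keep local, never publish details"
    if answers.get("encodes_taste"):
        return "keep standard local; expose only rubric or acceptance criteria"
    if answers.get("runtime_only"):
        if answers.get("platform_can_express") and answers.get("proven_on_real_work"):
            return "move to managed / platform-native"
        return "pilot platform-native, keep local version explicit-only"
    return "consider retiring"
```

A component that is neither sensitive, nor standard-bearing, nor runtime work probably should not survive the migration at all.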
The promotion path should stay boring:
- Inventory the component.
- Name the job it performs.
- Classify it as runtime, judgment, memory, publishing, source intelligence, or safety.
- Port the smallest useful version.
- Run it on one real task.
- Record what happened.
- Promote, revise, or remove it.
Anything more elaborate usually hides uncertainty.
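The promotion path above can be sketched as an ordered state machine. The stage names follow the article; the result labels are illustrative.

```python
# Boring promotion: one stage at a time, and only when a recorded real-task
# result shows the user-visible path improved.

STAGES = ["explicit-only", "pilot", "enforced"]

def promote(stage: str, real_task_result: str) -> str:
    """Advance one stage on proof, remove on regression, otherwise hold."""
    if real_task_result == "improved":
        i = STAGES.index(stage)
        return STAGES[min(i + 1, len(STAGES) - 1)]
    if real_task_result == "regressed":
        return "removed"
    return stage  # unclear result: stay staged, keep measuring
```

There is deliberately no path that skips a stage.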
How Should Teams Split a Real Harness Today?
For a serious coding and writing setup, I would make this split.
Provider or managed layer:
- sandbox creation
- file execution
- persistent sessions
- event streams
- webhooks
- traces and spans
- long-running worker recovery
- basic multi-agent delegation
- rubric execution when the provider supports it
Local or project layer:
- AGENTS.md or equivalent project policy
- public-writing standards
- citation and source-tier rules
- product-quality doctrine
- private operating memory
- site-specific SEO/AIO checks
- source-intelligence routing
- final publication gates
- release-boundary policy for plugins and shared packages
The dividing line is not “managed versus self-hosted.” The dividing line is “commodity runtime versus product judgment.”
Where Do Managed Agents Still Need Caution?
Managed agent platforms do not remove the hard parts. They move them.
You still need a security model for tools, files, network access, and credentials. Anthropic’s architecture explicitly separates credentials from the sandbox where generated code runs, which is the right direction, but teams still need to configure resources, vaults, and access boundaries correctly.[3]
You still need observability. A trace can show the call graph; it cannot tell you whether the work deserved to ship. A grader can evaluate a rubric; it cannot know whether the rubric expresses the right taste.
You still need content boundaries. A public migration article can describe the pattern, but it should not dump private prompts, exact hook internals, private file paths, source lists, account details, or proprietary editorial scoring.
You still need staging. Anthropic notes that Managed Agents remains beta, with all endpoints requiring the managed-agents-2026-04-01 beta header, and some features requiring preview access.[1] A beta runtime can be useful without becoming the default path for every workflow.
What Should Teams Take Away?
For engineering leaders:
- Move runtime work toward managed sessions, sandboxes, events, and traces when the platform meets your security bar.
- Keep local standards for evidence, source quality, product taste, and release boundaries.
- Treat managed rubrics as execution slots for your standards, not as a replacement for them.
For agent builders:
- Do not port commands one-to-one. Port jobs-to-be-done.
- Start explicit-only, then promote after a real task proves value.
- Prefer traces, session logs, and public artifacts over private prompt archaeology.
For public writers:
- Turn private process into public acceptance criteria.
- Cite official product docs for current behavior.
- Refuse the recap when the better article is the decision framework.
What Is the Quick Summary?
Managed agent platforms make the local harness smaller, not irrelevant. Move runtime work into managed sessions, sandboxes, traces, events, and orchestration when the platform earns that trust. Keep the local standards that define quality, evidence, privacy, public-writing integrity, and what work deserves to ship.
FAQ: Managed Agents and Local Harnesses
Do Managed Agents replace a local AI agent harness?
No. Managed platforms replace more of the runtime layer: sessions, sandboxes, event streams, tracing, and tool execution. Local harnesses still matter when they encode product standards, evidence gates, public-writing rules, privacy boundaries, source intelligence, and project-specific memory.
What should stay in AGENTS.md or CLAUDE.md?
Keep durable project rules there: what the product values, how completion gets verified, which private details cannot be published, how public writing gets checked, and which user-visible paths must work before a task counts as done. Do not stuff transient tool output or private prompt bodies into permanent instruction files.
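A minimal sketch of what such a file might contain; the section names and rules are illustrative, not a published standard.

```markdown
# AGENTS.md (sketch)

## Product standards
- Reject busy, generic, or low-dignity output, even when technically correct.

## Completion gates
- A task is done only when the user-visible path is verified in this session.
- Name the gaps that were not closed.

## Publishing boundaries
- Never publish private prompts, hook bodies, source lists, or account details.
- Public posts explain the pattern and the acceptance criteria only.
```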
When should a team use a managed agent platform?
Use managed infrastructure when the work needs long-running execution, secure containers, durable sessions, event streams, async completion, tracing, or managed multi-agent orchestration, and when the provider’s security, cost, and data controls fit the use case.[1][2]
What should not move into a public harness package?
Do not publish private prompts, exact hook bodies, sensitive file paths, account identifiers, token handling, private source lists, proprietary scoring rules, or anything that lets a stranger reconstruct your internal operating system. Publish the pattern and the acceptance criteria.
References
1. Anthropic, “Claude Managed Agents overview”. Accessed May 7, 2026.
2. OpenAI, “The next evolution of the Agents SDK”, April 15, 2026.
3. Anthropic Engineering, “Scaling Managed Agents: Decoupling the brain from the hands”, April 8, 2026.
4. OpenAI Agents SDK, “Sessions”. Accessed May 7, 2026.
5. OpenAI Agents SDK, “Tracing”. Accessed May 7, 2026.
6. OpenAI Agents SDK, “Handoffs”. Accessed May 7, 2026.
7. Anthropic, “Define outcomes”. Accessed May 7, 2026.
8. Anthropic Engineering, “Effective harnesses for long-running agents”, November 26, 2025.