Codex Hooks Make the Harness Real

May 17, 2026 · 11 min read


From the guide: Codex CLI Comprehensive Guide

OpenAI put Codex into the ChatGPT mobile app on May 14, 2026. The more consequential news came further down the announcement: Remote SSH and hooks reached general availability, and programmatic access tokens arrived for Business and Enterprise plans.¹

That changes what teams are managing. Codex no longer looks like a coding assistant waiting inside one terminal. It looks like an operating layer that follows work across machines, approvals, threads, diffs, tests, screenshots, plugins, credentials, and local tools.²

Codex hooks make the harness real. Once the agent can work from a phone, reach remote development environments, and run lifecycle hooks, teams need a control system around the model: evidence, approvals, git custody, source discipline, and taste.

TL;DR

Codex now supports the workflow shape that agent teams have been building privately: long-running work, remote execution, mobile steering, approvals, hooks, scoped credentials, and audit signals.¹²³ The prompt still matters, but the operating layer matters more.

The practical question is not “how do we prompt Codex?” The practical question is “what must Codex prove before we trust the result?” Teams should use hooks and configuration to encode review gates, security boundaries, public-writing standards, and release discipline. They should keep private machinery private and publish only the pattern, the acceptance criteria, and the verified outcome.

Key Takeaways

For engineering teams:
- Treat Codex hooks as process infrastructure, not decoration.
- Start with evidence, approvals, git custody, and release checks before adding clever automation.

For agent-tool builders:
- Build around Codex’s real surfaces: mobile control, Remote SSH, sandbox modes, approval policies, project instructions, hooks, telemetry, and version control.
- Port jobs-to-be-done, not old slash-command shapes.

For public writers:
- Use official OpenAI docs for current Codex behavior.
- Describe private practice as author analysis, and leave private prompts, hook bodies, file paths, source lists, credentials, and scoring internals out of public copy.

What Changed On May 14?

OpenAI’s May 14 announcement moved Codex closer to a persistent work surface. Codex in the ChatGPT mobile app can connect to machines where Codex runs, load live state from that environment, and let the user review outputs, approve commands, change models, start work, and follow diffs, terminal output, test results, approvals, and screenshots from a phone.¹

The same announcement says Remote SSH reached general availability. Codex can connect into remote environments, detect hosts from SSH configuration, create projects, and run threads on remote machines.¹ The developer docs frame remote connections more concretely: remote access uses the connected host’s projects, threads, files, credentials, permissions, plugins, Computer Use, browser setup, and local tools.²

OpenAI also moved hooks into general availability. The announcement names concrete uses: scanning prompts for secrets, running validators, logging conversations, creating memories, and customizing Codex behavior for repositories and directories.¹ The hooks documentation defines hooks as an extensibility framework for injecting scripts into the Codex loop,⁷ and the configuration reference exposes `features.hooks` for lifecycle hooks loaded from `hooks.json` or inline configuration.⁶
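A minimal sketch of the first use the announcement names, scanning prompts for secrets, might look like the Python below. It shows only the decision logic. The pattern list and function names are hypothetical, and the way a hook script actually receives a prompt and reports a block is defined by OpenAI’s Hooks docs, not by this example.

```python
import re

# Hypothetical, illustrative patterns; a production gate should use a maintained secret scanner.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\s*[:=]\s*['\"]?[^\s'\"]{12,}"
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the sorted names of secret patterns that match the text."""
    return sorted(name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text))


def review_prompt(prompt: str) -> tuple[bool, str]:
    """Hypothetical prompt gate: allow the prompt, or block it with a reason."""
    hits = find_secrets(prompt)
    if hits:
        return False, "blocked: possible secret(s): " + ", ".join(hits)
    return True, "ok"
```

A real gate would lean on a maintained scanner rather than three regexes. The point is placement: once wired to a prompt-submission hook, the rule runs on every prompt instead of living in the model’s memory.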

Those details matter because they turn agent work from a chat exchange into governed operations.

Why Hooks Matter More Than Mobile

Mobile access changes where the human can intervene. Hooks change what the system can enforce.

A phone lets an operator answer a question while away from the desk. A hook can catch the agent before a risky action, after a file edit, before completion, or during a release check. The phone solves latency. The hook solves standards.

Codex already has first-party control surfaces around sandboxing and approvals. OpenAI’s safety docs say Codex combines sandbox mode, which defines what the agent can technically do, with approval policy, which defines when Codex must stop and ask before acting.³ The same docs say network access is off by default: the default local workspace-write mode keeps it disabled unless the user enables it.³

Hooks sit next to those controls. The current hook events include SessionStart, PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop; PreToolUse can intercept supported Bash calls, file edits through apply_patch, and MCP tool calls, but OpenAI’s docs warn that it does not intercept every shell path, WebSearch, or other non-shell, non-MCP tool calls.⁷ That makes hooks a review and steering layer, not a replacement for sandboxing.
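The coverage caveat is easiest to see in code. This sketch assumes a hypothetical `ToolCall` record whose `kind` labels mirror the categories in the docs (`bash`, `apply_patch`, `mcp`, and everything else); it is not Codex’s hook payload. A PreToolUse-style check can deny a destructive command it sees, but calls outside its coverage come back as `unobserved`, which is exactly where the sandbox has to hold.

```python
from dataclasses import dataclass

# Categories the hooks docs say PreToolUse can intercept; these string labels are our own.
INTERCEPTED_KINDS = {"bash", "apply_patch", "mcp"}

# Hypothetical team rule: command fragments that should never run unreviewed.
DESTRUCTIVE_FRAGMENTS = ("rm -rf", "git push --force", "git reset --hard")


@dataclass
class ToolCall:
    kind: str          # hypothetical label, e.g. "bash", "apply_patch", "mcp", "web_search"
    payload: str = ""  # command text, patch body, or tool arguments


def pre_tool_use(call: ToolCall) -> str:
    """Return "deny", "allow", or "unobserved" for a proposed tool call."""
    if call.kind not in INTERCEPTED_KINDS:
        # The hook never sees this call; only sandboxing and approvals apply.
        return "unobserved"
    if call.kind == "bash" and any(f in call.payload for f in DESTRUCTIVE_FRAGMENTS):
        return "deny"
    return "allow"
```

Substring matching is also easy to evade (`rm -r -f` slips past `rm -rf`), which is one more reason to treat the check as steering and keep the sandbox as the boundary.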

Hooks can make local standards executable:

Standard	Hook-shaped enforcement
Do not leak secrets	Scan prompts and tool inputs before risky actions
Do not fake completion	Stop completion when evidence is missing
Do not publish stale writing	Require source checks and rendered-route checks
Do not leave dirty state	Require exact-path git status and commit intent
Do not weaken quality	Run focused review gates before release
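One way to keep that table honest is to register each check under the standard it protects and the hook event where it runs. The registry, the `on` decorator, and the context keys below are hypothetical; the only names taken from the docs are the event names.

```python
from typing import Callable

# A check inspects a hypothetical context dict and returns problems; empty means pass.
Check = Callable[[dict], list[str]]

# Hypothetical registry: hook event name -> (standard, check) pairs, in registration order.
REGISTRY: dict[str, list[tuple[str, Check]]] = {}


def on(event: str, standard: str):
    """Register a check to run at a hook event under the standard it protects."""
    def decorator(check: Check) -> Check:
        REGISTRY.setdefault(event, []).append((standard, check))
        return check
    return decorator


@on("Stop", "Do not fake completion")
def require_commands(ctx: dict) -> list[str]:
    return [] if ctx.get("commands_run") else ["no commands recorded"]


@on("Stop", "Do not leave dirty state")
def require_clean_tree(ctx: dict) -> list[str]:
    return [f"unexplained change: {path}" for path in ctx.get("dirty_paths", [])]


def run_event(event: str, ctx: dict) -> list[str]:
    """Run every check registered for an event; prefix each problem with its standard."""
    return [
        f"{standard}: {problem}"
        for standard, check in REGISTRY.get(event, [])
        for problem in check(ctx)
    ]
```

Prefixing each failure with its standard keeps a blocked completion readable: the operator sees which rule fired, not just that something failed.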

The model can forget a rule. A hook can re-run the rule at the moment the rule matters.

The Harness Is The Operating Layer

An agent harness is the operating layer around a model: permissions, memory, tools, hooks, source checks, release gates, review packets, and rollback discipline. The term can sound private or ornate, but the job is plain. The layer turns intent into accountable work.

Codex now exposes enough official surface to make that layer explicit. Remote connections carry the host environment. Sandbox modes and approval policies define action boundaries. Configuration files define models, projects, permissions, MCP servers, skills, hooks, telemetry, and features.⁶ OpenTelemetry can record events such as user prompts, approval decisions, tool execution results, MCP usage, and network proxy decisions.³⁴

That set of surfaces creates a useful split:

Provider surface	Team-owned standard
Remote connection	Which hosts and accounts can carry work
Sandbox and approvals	Which actions deserve friction
Hooks	Which standards run at decision points
Telemetry	Which events become audit evidence
Git workflow	Which changes become save points
Project instructions	Which durable norms guide the agent

The provider should keep improving the runtime. The team still owns judgment.

What Should Teams Encode First?

Start with four gates. They pay rent immediately.

Evidence Gate

Codex’s original launch post emphasized verifiable evidence: terminal logs, test outputs, and traceable steps during task completion.⁵ Make that expectation non-negotiable. A meaningful completion should name the files changed, commands run, observed behavior, failed checks, and remaining gaps.

For public work, evidence includes source links and claim-source alignment. For web releases, evidence includes rendered routes, metadata, schema, discovery files, deployment state, cache freshness, and live changed markers. For translations, evidence includes locale coverage, quality gates, storage rows or cache files, and native-review status when required.
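The evidence gate reduces to a completeness check on the completion report. This sketch assumes a hypothetical `CompletionReport` whose fields follow the list above, plus per-kind extras paraphrased from the public-work, release, and translation examples. It distinguishes an empty list of failed checks from a report that never mentioned failed checks (`None`), because only the first is evidence.

```python
from dataclasses import dataclass, field
from typing import Optional

# Extra evidence by kind of work, paraphrasing the article; keys are hypothetical.
EXTRA_EVIDENCE = {
    "public_writing": ["source_links", "claim_source_alignment"],
    "web_release": ["rendered_routes", "metadata", "schema", "deployment_state"],
    "translation": ["locale_coverage", "quality_gates"],
}


@dataclass
class CompletionReport:
    kind: str
    files_changed: list[str] = field(default_factory=list)
    commands_run: list[str] = field(default_factory=list)
    observed_behavior: str = ""
    failed_checks: Optional[list[str]] = None   # None = not reported; [] = none failed
    remaining_gaps: Optional[list[str]] = None  # None = not reported; [] = no known gaps
    extra: dict[str, str] = field(default_factory=dict)


def missing_evidence(report: CompletionReport) -> list[str]:
    """List the evidence a stop gate would demand before accepting completion."""
    missing = []
    if not report.files_changed:
        missing.append("files_changed")
    if not report.commands_run:
        missing.append("commands_run")
    if not report.observed_behavior.strip():
        missing.append("observed_behavior")
    if report.failed_checks is None:
        missing.append("failed_checks")
    if report.remaining_gaps is None:
        missing.append("remaining_gaps")
    for key in EXTRA_EVIDENCE.get(report.kind, []):
        if not report.extra.get(key):
            missing.append(key)
    return missing
```

An empty result means the report is complete, not that the work is correct; the gate only refuses completions that skip the evidence.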

Approval Gate

Do not use one approval posture for every action. OpenAI’s approvals docs distinguish safe read-only browsing, workspace editing, approval-required network access, untrusted commands, auto-review mode, and dangerous full access.³ A strong local policy should keep the same shape: low-risk reads pass quietly, side-effecting work gets review, and destructive or externally visible work gets explicit evidence.
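The three-tier shape can be written as a small local policy function. This is a team-side sketch, not Codex’s approval configuration: the `Action` flags and `Tier` names are hypothetical, and Codex’s own sandbox modes and approval policies still apply underneath.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PASS = "pass quietly"
    REVIEW = "ask for review"
    EVIDENCE = "require explicit evidence and approval"


@dataclass(frozen=True)
class Action:
    description: str
    side_effects: bool = False        # writes files, installs packages, uses the network
    destructive: bool = False         # deletes data or rewrites history
    externally_visible: bool = False  # pushes, publishes, posts, or deploys


def approval_tier(action: Action) -> Tier:
    """Map an action to a hypothetical local approval tier, strictest rule first."""
    if action.destructive or action.externally_visible:
        return Tier.EVIDENCE
    if action.side_effects:
        return Tier.REVIEW
    return Tier.PASS
```

The ordering matters: an action that is both side-effecting and externally visible lands in the strictest tier, so a push never slips through as a routine edit.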

Git Custody Gate

Agent work needs rollback handles. Codex’s own security docs say Codex works best with version control: keep status clean before delegating, commit frequently, run targeted verification, review diffs, and document decisions in commit messages.³

That advice should become process. Commit after coherent, verified save points. Stage exact paths. Split commits by independently revertible concern. Ask before push unless the release flow already grants publishing authority. Do not sweep unrelated dirty files into a commit because the agent happened to see them.
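Exact-path staging is mechanical enough to script. This sketch parses simplified `git status --porcelain` (v1) output, splits dirty paths into the ones a task intended to change and the ones it should leave alone, and builds an explicit `git add -- <paths>` command. The function names are hypothetical, and the parser ignores quoted paths; real tooling should prefer `git status --porcelain -z`.

```python
def parse_porcelain(status: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Simplified: each line is a two-character status, a space, then the path.
    Quoted paths (unusual characters) are not unescaped here.
    """
    paths = []
    for line in status.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]
        if " -> " in path:  # rename or copy: keep the destination path
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def plan_commit(status: str, intended: set[str]) -> tuple[list[str], list[str]]:
    """Split dirty paths into ones to stage and unrelated ones to leave alone."""
    dirty = parse_porcelain(status)
    stage = sorted(p for p in dirty if p in intended)
    leave = sorted(p for p in dirty if p not in intended)
    return stage, leave


def git_add_args(paths: list[str]) -> list[str]:
    """Build an exact-path staging command; '--' keeps paths from being read as options."""
    return ["git", "add", "--", *paths]
```

Feed it the raw, unstripped stdout of `git status --porcelain`, because the first status column is often a space. Anything in the leave-alone list is a question for the operator, not something to sweep into the commit.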

Taste Gate

AI coding makes implementation cheaper. Cheaper implementation raises the value of taste.

Taste does not mean decorative preference. It means the work improves the whole product. It means the agent can refuse a technically possible path that weakens the result. It means public writing avoids private machinery, unsupported claims, and filler. It means a correct local patch can still fail if the user-visible path remains broken.

A taste gate should ask:

Question	Purpose
Who is the real user?	Prevent local artifact worship
What proves the outcome?	Separate evidence from confidence
What did we remove or refuse?	Preserve coherence
What remains unverified?	Avoid false completion
Why does the work deserve to exist?	Keep volume from replacing judgment

Mozilla Shows The Same Pattern

Mozilla’s May 7 post about hardening Firefox with Claude Mythos Preview makes the same point from a different stack. The team says early LLM code-audit attempts showed promise but had too many false positives to scale. Agentic harnesses changed the economics because they could create and run reproducible test cases to dynamically test bug hypotheses.⁸

Mozilla’s key point is not about the model alone. The team says discovery was necessary but not sufficient. The useful system had to integrate with the full security bug lifecycle: targets, deduplication, bug tracking, triage, fixes, and release.⁸ The authors also say the pipeline reflected Firefox’s codebase semantics, tooling, and processes.⁸

That is the lesson for Codex. Better models matter. The operational system around the model decides whether the work becomes trusted output.

What Not To Publish

A public Codex article should not dump the private working system.

Keep these out of public copy:

private prompts and hook bodies;
sensitive local paths;
exact source maps and scoring internals;
account identifiers and credential handling;
private workflow shortcuts;
unreleased plugin behavior;
anything that helps a stranger reconstruct internal operations.

Publish the pattern instead: what the gate protects, what evidence it requires, what failure it catches, and how a team can implement the idea using official Codex surfaces.
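The boundary above can be partly automated. A minimal sketch, assuming a team maintains its own list of private terms (hook names, internal project names): the check flags home-directory paths, hosts under an `.internal` name, and any listed term before a draft ships. The patterns and function name are illustrative, and a clean result does not prove a draft is safe; it only catches obvious leaks.

```python
import re

# Illustrative patterns for private machinery; extend with the team's own rules.
PRIVATE_PATTERNS = {
    "home_directory_path": re.compile(r"(/Users/|/home/)[\w.-]+/|[A-Za-z]:\\Users\\"),
    "internal_hostname": re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.internal\b"),
}


def public_copy_findings(draft: str, private_terms: set[str]) -> list[str]:
    """Flag private paths, internal hosts, and team-listed private terms in a draft."""
    findings = [name for name, pattern in PRIVATE_PATTERNS.items() if pattern.search(draft)]
    lowered = draft.lower()
    findings += sorted(
        f"private term: {term}" for term in private_terms if term.lower() in lowered
    )
    return findings
```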

That boundary protects trust. It also improves the writing. Private machinery usually reads like folklore to outsiders, while public acceptance criteria help other teams reason about their own systems.

A Practical Codex Harness Map

Build the smallest control map that proves useful work.

Layer	First useful version
Project policy	`AGENTS.md` with durable norms and verification commands
Permissions	Workspace-write by default, explicit network and external writes
Hooks	Secret scan, evidence stop gate, git custody, public-writing checks
Source discipline	Primary-source verification for current tool behavior
Review packet	Goal, changed files, commands, results, sources, gaps
Git custody	Exact-path commits after verified save points
Release gate	Rendered route, metadata, schema, translations, live markers
Telemetry	Approval, tool, and network events routed to trusted collectors

Start explicit. Run one real task. Record where the gate helped and where it got in the way. Promote only the parts that improve the user-visible outcome.

Quick Summary

Codex hooks, Remote SSH, mobile control, sandboxing, approvals, configuration, telemetry, and version control point in the same direction: coding agents need operating systems around them.¹²³⁴⁶ The agent can write code. The harness decides what counts as work.

The best teams will not win by producing the most agent output. They will win by making agent work inspectable, reversible, sourced, tasteful, and worthy of release.

FAQ

What are Codex hooks?

Codex hooks are scripts that Codex runs at lifecycle events in its agent loop, loaded from `hooks.json` or inline configuration. OpenAI’s announcement says hooks can scan prompts for secrets, run validators, log conversations, create memories, and customize Codex behavior for specific repositories and directories; the hooks docs list events such as SessionStart, PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop.¹⁷

Why do Codex hooks matter?

Hooks let teams put standards at decision points instead of relying only on prompts. A hook can check evidence, source quality, git state, or release readiness when the agent acts or tries to finish.

Does Codex mobile replace local agent workflow?

No. Mobile control lets users steer work away from the desk, but the connected host still supplies projects, files, credentials, permissions, plugins, and local tools.² Teams still need local policy, safe credentials, version control, and verification.

What should a Codex harness include first?

Start with project instructions, sandbox and approval posture, a secret boundary, an evidence stop gate, exact-path git custody, source verification for public claims, and a release gate for user-visible work.

Should teams publish their Codex hooks?

Publish patterns and acceptance criteria, not private hook bodies or sensitive workflow details. A useful public post can explain the job of a hook without exposing private paths, source maps, prompts, credentials, or scoring rules.

References

1. OpenAI, “Work with Codex from anywhere,” OpenAI, May 14, 2026.
2. OpenAI Developer Docs, “Remote connections,” accessed May 17, 2026.
3. OpenAI Developer Docs, “Agent approvals & security,” accessed May 17, 2026.
4. OpenAI, “Running Codex safely at OpenAI,” OpenAI, May 8, 2026.
5. OpenAI, “Introducing Codex,” OpenAI, May 16, 2025.
6. OpenAI Developer Docs, “Configuration Reference,” accessed May 17, 2026.
7. OpenAI Developer Docs, “Hooks,” accessed May 17, 2026.
8. Brian Grinstead, Christian Holler, and Frederik Braun, “Behind the Scenes Hardening Firefox with Claude Mythos Preview,” Mozilla Hacks, May 7, 2026.