Codex Hooks Make the Harness Real
OpenAI put Codex into the ChatGPT mobile app on May 14, 2026. The sharper line came lower in the announcement: Remote SSH and hooks reached general availability, while programmatic access tokens arrived for Business and Enterprise plans.[1]
That changes the job. Codex no longer looks like a coding assistant that waits inside one terminal. It looks like an operating layer that follows work across machines, approvals, threads, diffs, tests, screenshots, plugins, credentials, and local tools.[2]
Codex hooks make the harness real. Once the agent can work from a phone, reach remote development environments, and run lifecycle hooks, teams need a control system around the model: evidence, approvals, git custody, source discipline, and taste.
TL;DR
Codex now supports the workflow shape that agent teams have been building privately: long-running work, remote execution, mobile steering, approvals, hooks, scoped credentials, and audit signals.[1][2][3] The prompt still matters, but the operating layer matters more.
The practical question is not “how do we prompt Codex?” The practical question is “what must Codex prove before we trust the result?” Teams should use hooks and configuration to encode review gates, security boundaries, public-writing standards, and release discipline. They should keep private machinery private and publish only the pattern, the acceptance criteria, and the verified outcome.
Key Takeaways
For engineering teams:
- Treat Codex hooks as process infrastructure, not decoration.
- Start with evidence, approvals, git custody, and release checks before adding clever automation.
For agent-tool builders:
- Build around Codex’s real surfaces: mobile control, Remote SSH, sandbox modes, approval policies, project instructions, hooks, telemetry, and version control.
- Port jobs-to-be-done, not old slash-command shapes.
For public writers:
- Use official OpenAI docs for current Codex behavior.
- Describe private practice as author analysis, and leave private prompts, hook bodies, file paths, source lists, credentials, and scoring internals out of public copy.
What Changed On May 14?
OpenAI’s May 14 announcement moved Codex closer to a persistent work surface. Codex in the ChatGPT mobile app can connect to machines where Codex runs, load live state from that environment, and let the user review outputs, approve commands, change models, start work, and follow diffs, terminal output, test results, approvals, and screenshots from a phone.[1]
The same announcement says Remote SSH reached general availability. Codex can connect to remote environments, detect hosts from SSH configuration, create projects, and run threads on remote machines.[1] The developer docs frame remote connections more concretely: remote access uses the connected host’s projects, threads, files, credentials, permissions, plugins, Computer Use, browser setup, and local tools.[2]
OpenAI also moved hooks into general availability. The announcement names concrete uses: scanning prompts for secrets, running validators, logging conversations, creating memories, and customizing Codex behavior for repositories and directories.[1] The hooks documentation defines hooks as an extensibility framework for injecting scripts into the Codex loop, and the configuration reference exposes features.hooks for lifecycle hooks loaded from hooks.json or inline configuration.[7][6]
Those details matter because they turn agent work from a chat exchange into governed operations.
Why Hooks Matter More Than Mobile
Mobile access changes where the human can intervene. Hooks change what the system can enforce.
A phone lets an operator answer a question while away from the desk. A hook can catch the agent before a risky action, after a file edit, before completion, or during a release check. The phone solves latency. The hook solves standards.
Codex already has first-party control surfaces around sandboxing and approvals. OpenAI’s safety docs say Codex combines sandbox mode, which defines what the agent can technically do, with approval policy, which defines when Codex must stop and ask before acting.[3] The same docs say network access is off by default: the default local workspace-write mode keeps it disabled unless the user enables it.[3]
Hooks sit next to those controls. The current hook events include SessionStart, PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop. PreToolUse can intercept supported Bash calls, file edits through apply_patch, and MCP tool calls, but OpenAI’s docs warn that it does not intercept every shell path, WebSearch, or other non-shell, non-MCP tool calls.[7] That makes hooks a review and steering layer, not a replacement for sandboxing.
Hooks can make local standards executable:
| Standard | Hook-shaped enforcement |
|---|---|
| Do not leak secrets | Scan prompts and tool inputs before risky actions |
| Do not fake completion | Stop completion when evidence is missing |
| Do not publish stale writing | Require source checks and rendered-route checks |
| Do not leave dirty state | Require exact-path git status and commit intent |
| Do not weaken quality | Run focused review gates before release |
The model can forget a rule. A hook can re-run the rule at the moment the rule matters.
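As an illustration of the first table row, here is a minimal sketch of the decision logic a secret-scanning hook might run. It assumes the hook receives a JSON payload with a free-form "text" field; that field name, the `decide` function, and the three patterns are hypothetical, and the real payload shape is whatever OpenAI's hooks documentation specifies.

```python
import json
import re

# Illustrative patterns only; a real deployment would rely on a maintained scanner.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-/+]{16,}"
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the name of every pattern that matches the given text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def decide(payload_json: str) -> dict:
    """Decide whether to block, given a hypothetical hook payload.

    Assumption: the payload is a JSON object with a "text" field holding the
    prompt or tool input. This is not the documented Codex payload format.
    """
    payload = json.loads(payload_json)
    hits = find_secrets(payload.get("text", ""))
    return {"block": bool(hits), "reasons": hits}
```

The point of the sketch is the placement, not the patterns: the same check runs every time the agent reaches the decision point, whether or not the model remembers the rule.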
The Harness Is The Operating Layer
An agent harness is the operating layer around a model: permissions, memory, tools, hooks, source checks, release gates, review packets, and rollback discipline. The term can sound private or ornate, but the job is plain. The layer turns intent into accountable work.
Codex now exposes enough official surface to make that layer explicit. Remote connections carry the host environment. Sandbox modes and approval policies define action boundaries. Configuration files define models, projects, permissions, MCP servers, skills, hooks, telemetry, and features.[6] OpenTelemetry can record events such as user prompts, approval decisions, tool execution results, MCP usage, and network proxy decisions.[3][4]
That set of surfaces creates a useful split:
| Provider surface | Team-owned standard |
|---|---|
| Remote connection | Which hosts and accounts can carry work |
| Sandbox and approvals | Which actions deserve friction |
| Hooks | Which standards run at decision points |
| Telemetry | Which events become audit evidence |
| Git workflow | Which changes become save points |
| Project instructions | Which durable norms guide the agent |
The provider should keep improving the runtime. The team still owns judgment.
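The telemetry row of the table can be made concrete with a small filter. The sketch below assumes events arrive as plain dictionaries with "kind", "thread_id", and "summary" keys; those keys and the event kind names are hypothetical stand-ins, not the attribute names Codex actually emits over OpenTelemetry.

```python
from dataclasses import dataclass

# Hypothetical event kinds, loosely named after the categories in the text.
AUDIT_KINDS = {"approval_decision", "tool_result", "network_proxy_decision"}


@dataclass(frozen=True)
class AuditRecord:
    kind: str
    thread_id: str
    summary: str


def to_audit_records(events: list[dict]) -> list[AuditRecord]:
    """Keep only the event kinds the team has agreed to treat as audit evidence."""
    records = []
    for event in events:
        if event.get("kind") in AUDIT_KINDS:
            records.append(
                AuditRecord(
                    kind=event["kind"],
                    thread_id=event.get("thread_id", "unknown"),
                    summary=event.get("summary", ""),
                )
            )
    return records
```

The provider decides which events exist. The team-owned part is the allow-list: which of them count as evidence worth keeping.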
What Should Teams Encode First?
Start with four gates. They pay rent immediately.
Evidence Gate
Codex’s original launch post emphasized verifiable evidence: terminal logs, test outputs, and traceable steps during task completion.[5] Make that expectation non-negotiable. A meaningful completion should name the files changed, commands run, observed behavior, failed checks, and remaining gaps.
For public work, evidence includes source links and claim-source alignment. For web releases, evidence includes rendered routes, metadata, schema, discovery files, deployment state, cache freshness, and live changed markers. For translations, evidence includes locale coverage, quality gates, storage rows or cache files, and native-review status when required.
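One way to make the five completion fields checkable is a small report type plus a function that lists what is missing. This is a sketch under assumptions: `CompletionReport`, `missing_evidence`, and `may_complete` are hypothetical names, and the rule that every field must be reported is a team choice, not something Codex enforces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CompletionReport:
    files_changed: list[str] = field(default_factory=list)
    commands_run: list[str] = field(default_factory=list)
    observed_behavior: str = ""
    # None means "not reported"; an empty list means "reported, nothing to list".
    failed_checks: Optional[list[str]] = None
    remaining_gaps: Optional[list[str]] = None


def missing_evidence(report: CompletionReport) -> list[str]:
    """Return the names of evidence fields the report leaves out."""
    missing = []
    if not report.files_changed:
        missing.append("files_changed")
    if not report.commands_run:
        missing.append("commands_run")
    if not report.observed_behavior.strip():
        missing.append("observed_behavior")
    if report.failed_checks is None:
        missing.append("failed_checks")
    if report.remaining_gaps is None:
        missing.append("remaining_gaps")
    return missing


def may_complete(report: CompletionReport) -> bool:
    """A stop gate would allow completion only when nothing is missing."""
    return not missing_evidence(report)
```

Distinguishing "not reported" from "reported as empty" matters here: an agent that says nothing about failed checks has not provided the same evidence as one that states there were none.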
Approval Gate
Do not use one approval posture for every action. OpenAI’s approvals docs distinguish safe read-only browsing, workspace editing, approval-required network access, untrusted commands, auto-review mode, and dangerous full access.[3] A strong local policy should keep the same shape: low-risk reads pass quietly, side-effecting work gets review, and destructive or externally visible work gets explicit evidence.
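The three-tier shape described above can be sketched as a classifier. The `Action` fields and `Decision` names are hypothetical and describe a local policy only; they do not correspond to Codex's own sandbox modes or approval policy settings.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                        # low-risk reads pass quietly
    REVIEW = "review"                      # side-effecting work gets review
    REQUIRE_EVIDENCE = "require_evidence"  # destructive or externally visible work


@dataclass(frozen=True)
class Action:
    reads_only: bool
    destructive: bool = False
    externally_visible: bool = False


def classify(action: Action) -> Decision:
    """Map an action to a friction tier, checking the highest-risk traits first."""
    if action.destructive or action.externally_visible:
        return Decision.REQUIRE_EVIDENCE
    if action.reads_only:
        return Decision.ALLOW
    return Decision.REVIEW
```

Checking the high-risk traits first means an action flagged as both read-only and externally visible still lands in the strictest tier.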
Git Custody Gate
Agent work needs rollback handles. Codex’s own security docs say Codex works best with version control: keep status clean before delegating, commit frequently, run targeted verification, review diffs, and document decisions in commit messages.[3]
That advice should become process. Commit after coherent, verified save points. Stage exact paths. Split commits by independently revertible concern. Ask before push unless the release flow already grants publishing authority. Do not sweep unrelated dirty files into a commit because the agent happened to see them.
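Exact-path staging can be checked against the output of `git status --porcelain`. The sketch below parses that output as text and separates intended paths from unrelated dirty files. The function names are hypothetical, and the parser assumes the version 1 porcelain format with unquoted paths, so filenames containing spaces or special characters would need extra handling.

```python
def dirty_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each line is two status characters, a space, then the path.
    Renames appear as "old -> new"; the new path is kept.
    """
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def custody_check(porcelain: str, intended: set[str]) -> dict[str, list[str]]:
    """Split dirty files into those to stage and those to leave alone."""
    dirty = set(dirty_paths(porcelain))
    return {
        "stage": sorted(dirty & intended),        # changed and part of this commit
        "leave_alone": sorted(dirty - intended),  # changed, but not this concern
        "not_dirty": sorted(intended - dirty),    # expected a change, found none
    }
```

A non-empty "leave_alone" list is the signal to stage by exact path rather than with a blanket add, and a non-empty "not_dirty" list suggests the commit intent and the working tree disagree.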
Taste Gate
AI coding makes implementation cheaper. Cheaper implementation raises the value of taste.
Taste does not mean decorative preference. It means the work improves the whole product. It means the agent can refuse a technically possible path that weakens the result. It means public writing avoids private machinery, unsupported claims, and filler. It means a correct local patch can still fail if the user-visible path remains broken.
A taste gate should ask:
| Question | Purpose |
|---|---|
| Who is the real user? | Prevent local artifact worship |
| What proves the outcome? | Separate evidence from confidence |
| What did we remove or refuse? | Preserve coherence |
| What remains unverified? | Avoid false completion |
| Why does the work deserve to exist? | Keep volume from replacing judgment |
Mozilla Shows The Same Pattern
Mozilla’s May 7 post about hardening Firefox with Claude Mythos Preview makes the same point from a different stack. The team says early LLM code-audit attempts showed promise but had too many false positives to scale. Agentic harnesses changed the economics because they could create and run reproducible test cases to dynamically test bug hypotheses.[8]
Mozilla’s important sentence is not about the model alone. The team says discovery was necessary but not sufficient. The useful system had to integrate with the full security bug lifecycle: targets, deduplication, bug tracking, triage, fixes, and release.[8] The authors also say the pipeline reflected Firefox’s codebase semantics, tooling, and processes.[8]
That is the lesson for Codex. Better models matter. The operational system around the model decides whether the work becomes trusted output.
What Not To Publish
A public Codex article should not dump the private working system.
Keep these out of public copy:
- private prompts and hook bodies;
- sensitive local paths;
- exact source maps and scoring internals;
- account identifiers and credential handling;
- private workflow shortcuts;
- unreleased plugin behavior;
- anything that helps a stranger reconstruct internal operations.
Publish the pattern instead: what the gate protects, what evidence it requires, what failure it catches, and how a team can implement the idea using official Codex surfaces.
That line protects trust. It also improves the writing. Private machinery usually reads like folklore. Public acceptance criteria help other teams reason about their own systems.
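Part of the exclusion list can be checked mechanically before a draft goes out. The following is a rough sketch, not a complete redaction tool: the two patterns (home-directory paths and email-style identifiers) and the `lint_public_copy` name are illustrative assumptions, and most items on the list still need human review.

```python
import re

# Illustrative markers of private detail; extend per team policy.
PRIVATE_MARKERS = {
    "local_path": re.compile(r"(?:/Users/|/home/|[A-Za-z]:\\Users\\)[^\s]+"),
    "email_identifier": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def lint_public_copy(draft: str) -> list[tuple[int, str]]:
    """Return (line number, category) for each line that matches a marker."""
    findings = []
    for number, line in enumerate(draft.splitlines(), start=1):
        for category, pattern in PRIVATE_MARKERS.items():
            if pattern.search(line):
                findings.append((number, category))
    return findings
```

A check like this catches accidental leaks such as a pasted path. It cannot judge whether a paragraph helps a stranger reconstruct internal operations, which remains an editorial call.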
A Practical Codex Harness Map
Build the smallest control map that proves useful work.
| Layer | First useful version |
|---|---|
| Project policy | AGENTS.md with durable norms and verification commands |
| Permissions | Workspace-write by default, explicit network and external writes |
| Hooks | Secret scan, evidence stop gate, git custody, public-writing checks |
| Source discipline | Primary-source verification for current tool behavior |
| Review packet | Goal, changed files, commands, results, sources, gaps |
| Git custody | Exact-path commits after verified save points |
| Release gate | Rendered route, metadata, schema, translations, live markers |
| Telemetry | Approval, tool, and network events routed to trusted collectors |
Start explicit. Run one real task. Record where the gate helped and where it got in the way. Promote only the parts that improve the user-visible outcome.
Quick Summary
Codex hooks, Remote SSH, mobile control, sandboxing, approvals, configuration, telemetry, and version control point in the same direction: coding agents need an operating layer around them.[1][2][3][4][6] The agent can write code. The harness decides what counts as work.
The best teams will not win by producing the most agent output. They will win by making agent work inspectable, reversible, sourced, tasteful, and worthy of release.
FAQ
What are Codex hooks?
Codex hooks are lifecycle hooks that Codex loads from hooks.json or inline configuration. OpenAI’s announcement says hooks can scan prompts for secrets, run validators, log conversations, create memories, and customize Codex behavior for specific repositories and directories; the hooks docs list events such as PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop.[1][7]
Why do Codex hooks matter?
Hooks let teams put standards at decision points instead of relying only on prompts. A hook can check evidence, source quality, git state, or release readiness when the agent acts or tries to finish.
Does Codex mobile replace local agent workflow?
No. Mobile control lets users steer work away from the desk, but the connected host still supplies projects, files, credentials, permissions, plugins, and local tools.[2] Teams still need local policy, safe credentials, version control, and verification.
What should a Codex harness include first?
Start with project instructions, sandbox and approval posture, a secret boundary, an evidence stop gate, exact-path git custody, source verification for public claims, and a release gate for user-visible work.
Should teams publish their Codex hooks?
Publish patterns and acceptance criteria, not private hook bodies or sensitive workflow details. A useful public post can explain the job of a hook without exposing private paths, source maps, prompts, credentials, or scoring rules.
References
[1] OpenAI, “Work with Codex from anywhere,” OpenAI, May 14, 2026.
[2] OpenAI Developer Docs, “Remote connections,” accessed May 17, 2026.
[3] OpenAI Developer Docs, “Agent approvals & security,” accessed May 17, 2026.
[4] OpenAI, “Running Codex safely at OpenAI,” OpenAI, May 8, 2026.
[5] OpenAI, “Introducing Codex,” OpenAI, May 16, 2025.
[6] OpenAI Developer Docs, “Configuration Reference,” accessed May 17, 2026.
[8] Brian Grinstead, Christian Holler, and Frederik Braun, “Behind the Scenes Hardening Firefox with Claude Mythos Preview,” Mozilla Hacks, May 7, 2026.