Codex CLI vs Claude Code 2026: Architecture, Pricing, and China Access

From the guides: Claude Code & Codex CLI

Both Codex CLI and Claude Code ship as terminal-native agentic tools, yet they enforce safety through fundamentally different mechanisms: kernel-level sandboxing versus application-layer hooks. That single design decision cascades into how each tool handles configuration, permissions, multi-agent workflows, and team governance. The following comparison maps those differences with concrete decision criteria, extending the AI engineering territory I’ve been building across this site.

I use Claude Code as my primary tool. I state that bias upfront. The observations here come from daily use of both tools across production tasks, blind evaluations, and dual-tool workflows.

TL;DR: Codex enforces safety at the OS kernel layer (Seatbelt, Landlock, seccomp)1 with coarse-grained control. Claude Code enforces safety at the application layer through 26 programmable hook events2 with fine-grained control. Both tools now run at large context: Claude Code on Opus 4.7 exposes 1M tokens at standard pricing5; Codex CLI on GPT-5.4 (the current OpenAI frontier model, released March 5, 2026, which incorporates GPT-5.3-Codex’s coding capabilities) exposes up to 1.05M context with 128K max output, though the default context is 272K unless you explicitly enable the long-context mode4. Use Codex for cloud-sandboxed task delegation and kernel-level isolation. Use Claude Code for programmable governance, long-horizon refactoring, and security-focused code review. Best results come from using both.

Key Takeaways

  • Solo developers: Start with whichever tool matches your primary language ecosystem. Both tools coexist in the same repo with no conflicts (CLAUDE.md and AGENTS.md are independent).
  • Team leads: Codex profiles offer explicit, auditable configuration switching. Claude Code’s layered hierarchy applies context-sensitive rules automatically. Choose based on whether your team prefers explicit control or automatic adaptation.
  • Security engineers: Codex’s kernel sandbox prevents the agent from bypassing restrictions at the OS level. Claude Code’s hooks share a process boundary with the agent but allow arbitrary validation logic. Match the tool to your threat model.

Which Tool Should You Pick? (Persona Decision Paths)

The answer depends on who you are. Here are four paths, one for each of the most common readers of this page.

Solo developer on personal or small-team projects

Default: Claude Code. The 1M token context on Opus 4.7 at standard pricing, the 26-hook governance system, and the plugin marketplace cover the cases solo developers hit daily (large codebase refactors, session continuity, format-on-save automation). Pro at $20/month or Max at $100-200/month is predictable and generous.

Bring in Codex CLI when: you need kernel-level sandboxing for a one-off untrusted-code review, or when ChatGPT Pro/Plus already covers your primary AI spend and adding Claude feels redundant. Both tools coexist cleanly; CLAUDE.md and AGENTS.md live side by side.

Team lead at a 10-50 person engineering org

Default: Claude Code. Programmable hooks (linting gates, security scans, forbidden-command blocks) encode team standards deterministically rather than hoping the model follows prompt instructions. Managed settings let the lead set org-wide policy that individual devs can’t override. The claude agents CLI and Agent Teams primitives match the patterns teams actually use for review workflows.

Bring in Codex CLI when: security-sensitive reviews need kernel-hard isolation (e.g., reviewing external contractor code, open-source PRs from unknown authors), or when the team is already committed to OpenAI tooling through Azure OpenAI / Microsoft Foundry. Run it as a focused review tool, not the daily driver.

Security-focused reviewer or red-team researcher

Default: Codex CLI (for adversarial inputs) + Claude Code (for governed execution). Codex’s kernel sandbox on macOS Seatbelt / Linux Landlock+seccomp denies syscalls below the application layer, so a hostile agent literally cannot touch filesystem areas you didn’t allow. Claude Code’s hook system is powerful but shares the process boundary. Use the tool that matches the threat.

Bring in Claude Code when: you want programmable post-review actions (triage hooks, audit logging, automated report generation). The typical workflow: Codex inspects under sandbox constraint, Claude Code handles the triage and policy-enforcement layer.

Chinese / mainland-China-based developer

Both tools work, but connectivity and cost shape the choice more than features do. Skip to Accessing Codex and Claude Code from China before committing.


The Core Architecture Split

The deepest difference between Codex and Claude Code is where governance happens. Codex enforces safety at the kernel layer via Seatbelt on macOS, Landlock and seccomp on Linux1. The OS restricts filesystem access, network calls, and process spawning before those operations reach the application. The model cannot bypass these restrictions because the operating system denies the syscall before it executes.

Claude Code enforces safety at the application layer through hooks, programs that intercept actions at 26 lifecycle points2. A PreToolUse hook on Bash can inspect every command, validate it against arbitrary logic, and block it with exit code 2. The hook system delivers programmable governance: encode business rules, run linters, scan for credentials. The tradeoff is that application-layer enforcement shares a process boundary with the agent. Kernel-level enforcement does not.
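To make the hook mechanism concrete, here is a minimal sketch of the pattern described above: a PreToolUse hook registered for the Bash tool that blocks a short denylist of destructive commands by exiting with code 2. The settings shape follows Anthropic's hooks documentation as I understand it; the script path, payload field names (tool_name, tool_input.command), and denylist are illustrative assumptions, so verify the exact schema against the hooks docs for your Claude Code version.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/block_dangerous.py" }
        ]
      }
    ]
  }
}
```

The referenced script reads the tool payload from stdin and decides whether to block:

```python
#!/usr/bin/env python3
# Minimal PreToolUse hook sketch. Field names (tool_input.command) are
# assumptions based on the hooks docs; confirm the payload schema for your
# Claude Code version before relying on this.
import json
import re
import sys

payload = json.load(sys.stdin)                     # hook input arrives as JSON on stdin
command = payload.get("tool_input", {}).get("command", "")

DENYLIST = [
    r"\brm\s+-rf\s+/",            # recursive delete from root
    r"\bgit\s+push\s+--force\b",  # force-push to shared branches
    r"\bcurl\b.*\|\s*(ba)?sh\b",  # pipe-to-shell installs
]

for pattern in DENYLIST:
    if re.search(pattern, command):
        # Exit code 2 blocks the tool call; stderr is surfaced back to the agent.
        print(f"Blocked by policy: matched {pattern}", file=sys.stderr)
        sys.exit(2)

sys.exit(0)                                        # anything else proceeds normally
```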

Every safety architecture trades expressiveness for boundary strength. These two tools sit at opposite ends of that spectrum, and that positioning is intentional. Kernel sandboxing makes sense when the threat model includes a potentially adversarial agent (reviewing malicious code, running untrusted scripts). Application-layer hooks make sense when the threat model is an overconfident but well-intentioned agent (your own code, your own team, your own conventions). Most developers need both threat models at different times.

Configuration Philosophy

Codex uses TOML for configuration. Claude Code uses JSON. The format difference is cosmetic. The philosophy difference is not.

Codex organizes configuration around profiles, named presets you switch between explicitly with --profile. A careful profile sets approval_policy = "untrusted" and sandboxes aggressively9. A deep-review profile switches to a more capable model. You always know which configuration is active because you selected it by name. The instruction layer uses AGENTS.md, an open standard under the Linux Foundation’s Agentic AI Foundation3, readable by Codex, Cursor, Copilot, Amp, Windsurf, and Gemini CLI.
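To make the profile idea concrete, here is a sketch of what a careful profile and a deep-review profile might look like in config.toml. The approval-policy values and sandbox modes are the ones named in this post; the exact key names (sandbox_mode in particular) and the model identifier are assumptions to check against the Codex configuration docs.

```toml
# ~/.codex/config.toml -- illustrative sketch, not a verified reference config.

[profiles.careful]
# Untrusted or unfamiliar code: ask before anything non-trivial, no writes.
approval_policy = "untrusted"
sandbox_mode    = "read-only"

[profiles.deep-review]
# Heavier review passes: more capable model, writes confined to the workspace.
model           = "gpt-5.4"          # model alias used elsewhere in this post
approval_policy = "on-request"
sandbox_mode    = "workspace-write"
```

Switching is then explicit: codex --profile careful for a contractor's PR, codex --profile deep-review for an architecture pass. The active configuration is always the one you named.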

Claude Code organizes configuration around a layered hierarchy: five layers cascading from managed settings (highest priority) through command line, local project, shared project, and user defaults. CLAUDE.md files scope at user, project, and local levels. Skills, hooks, and rules directories add further layers. Context-appropriate configuration applies automatically, but the active configuration is not visible from any single file. You reconstruct it by reading the hierarchy.
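As a rough map of those five layers (file names per Anthropic's settings docs as I understand them; confirm the managed-settings location for your OS), resolution runs top to bottom:

```
1. Managed settings   managed-settings.json in the OS-level system location (admin-deployed, cannot be overridden)
2. Command line       per-invocation flags such as --model
3. Local project      .claude/settings.local.json (per-developer, typically gitignored)
4. Shared project     .claude/settings.json (checked in, team-wide)
5. User defaults      ~/.claude/settings.json (lowest priority)
```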

Profiles favor explicitness and auditability. You can answer “what configuration was active?” by checking which --profile flag was passed. Layered hierarchy favors automation and context-sensitivity. The right context applies automatically, but answering “what configuration is active?” requires reading up to five layers and understanding their merge order. The tradeoff is real: I have occasionally been surprised by a user-level CLAUDE.md override that conflicted with a project-level instruction, which would not happen with explicit profiles.

Safety Models Compared

| Dimension | Codex CLI | Claude Code |
|---|---|---|
| Sandbox approach | Kernel-level (Seatbelt on macOS, Landlock + seccomp on Linux) | Application-level hooks (26 lifecycle event types) |
| Permission levels | Three sandbox modes: read-only, workspace-write, danger-full-access | Granular pattern-based allow/deny lists per tool |
| Escape resistance | High: OS denies syscalls below the application boundary | Moderate: hooks share process boundary with agent |
| Programmability | Low: binary allow/deny per sandbox mode | High: arbitrary code in hook scripts (bash, Python, etc.) |
| Approval policies | Three levels: untrusted, on-request, never | Per-tool permission patterns with regex matching |
| Network restrictions | Sandbox controls outbound network access | Hooks can inspect but not kernel-block network calls |
| Known vulnerability class | Sandbox escape (theoretical; no public CVE reported as of March 2026) | Malicious hooks in project config (mitigated via project trust prompts) |

The pattern: Codex provides stronger boundaries with coarser control. Claude Code provides weaker boundaries with finer control11. The right choice depends on your threat model. Reviewing untrusted external code? Kernel sandboxing. Enforcing organizational coding standards on trusted code? Programmable hooks.

Context and Models

As of April 2026, Codex CLI defaults to GPT-5.4 (released March 5, 2026, snapshot gpt-5.4-2026-03-05)4. GPT-5.4 is OpenAI’s current frontier general-purpose model and, per OpenAI’s launch post, incorporates GPT-5.3-Codex’s coding capabilities while adding native Computer Use and broader agentic workflow support. Context is 272K by default with a 1.05M-token experimental long-context mode you enable via model_context_window / model_auto_compact_token_limit configuration. Output caps at 128K.4 Long-context prompts over 272K input tokens are billed at 2× input / 1.5× output for that session.4 GPT-5.3-Codex is not deprecated and remains available for teams that prefer the coding-optimized cost/speed profile.
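Opting into the long-context tier is a configuration change rather than a flag. A sketch using the config keys named above; the token values are assumptions derived from the limits quoted in this post, so confirm the supported maximums in the GPT-5.4 model docs:

```toml
# config.toml snippet -- enables the experimental long-context mode.
model                          = "gpt-5.4"
model_context_window           = 1050000   # opt in to the ~1.05M-token window
model_auto_compact_token_limit = 900000    # illustrative: compact well before the ceiling
```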

Claude Code’s default model depends on plan tier per Anthropic’s model config docs5: Max and Team Premium default to Opus 4.7 (released April 16, 2026); Pro, Team Standard, Enterprise, and pay-per-token Anthropic API default to Sonnet 4.6, with Enterprise and API moving to Opus 4.7 on April 23, 2026. When Opus 4.7 is used, it exposes a 1M-token context window at standard pricing (no long-context premium). Both vendors’ model defaults and context limits change between releases; check each vendor’s page for current values.

Both tools now handle large context well. Claude Code reaches 1M on Opus 4.7 at standard pricing, no premium. Codex CLI on GPT-5.4 reaches 1.05M with long-context mode enabled, billed at the 2×/1.5× multiplier when you cross 272K input. For monorepo ingestion, the practical difference has narrowed; retrieval quality (how well each tool finds relevant code) matters more than raw window size for most projects.

On public benchmarks as of April 2026, Opus 4.7 leads on SWE-bench Verified (87.6% vs GPT-5-Codex’s 74.9% baseline), SWE-bench Pro (64.3% vs GPT-5.4’s official 57.7% and GPT-5.3-Codex’s 56.8%), and CursorBench (70% vs Opus 4.6’s 58%)12. On Terminal-Bench 2.0, Opus 4.7 comes in at 69.4%; GPT-5.4 at 75.1% and GPT-5.3-Codex at 77.3% lead there12. GPT-5.4’s SWE-bench Verified score is not published on the official model or launch pages at the time of writing; third-party coverage reports a figure around 80%, but treat unpublished vendor numbers cautiously. Benchmark leadership swings between releases; check vendor pages before committing. In my blind evaluations with an earlier version of Opus, it outperformed on review and security tasks even at smaller context, and the same pattern holds at 1M.

Both tools support model routing. Codex selects models per profile9. Claude Code’s default depends on the plan tier described above (Opus 4.7 on Max and Team Premium; Sonnet 4.6 on Pro, Team Standard, Enterprise, and API, with the latter two moving to Opus 4.7 on April 23, 2026), and every invocation can override via --model or settings-level configuration.

Pricing Deep-Dive

Pricing splits into three patterns: per-token API billing, subscriptions that include agentic-CLI usage, and cloud-provider billing through AWS / GCP / Azure. The cheapest path depends on daily token volume, not sticker price.

Claude Code pricing (April 2026)

Per-token (Anthropic API):13

| Model | Input ($/MTok) | Output ($/MTok) | Cache read ($/MTok) | 5-min cache write ($/MTok) | 1-hour cache write ($/MTok) |
|---|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | $0.50 | $6.25 | $10.00 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 | $6.25 | $10.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $3.75 | $6.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | $1.25 | $2.00 |

No long-context premium: Opus 4.7’s 1M token window is priced at the standard rate. Batch API delivers a 50% discount on input and output.13

Subscriptions that include Claude Code:8

| Plan | Monthly | Claude Code usage profile |
|---|---|---|
| Pro | $20 | Generous daily limits; hits extra-usage gating under sustained heavy agentic work |
| Max 5x | $100 | 5× Pro’s Claude usage; typical daily-driver limit for solo developers |
| Max 20x | $200 | 20× Pro’s usage; covers most single-dev heavy-refactor days |
| Team Standard | $30/user | Per-seat with shared admin controls |
| Team Premium | $150/user | Includes full Opus 4.7 default across all seats |
| Enterprise | custom | Per-seat with managed policy, SSO, and audit |

Cloud-provider pricing follows AWS Bedrock / Google Vertex AI / Microsoft Foundry list rates, which closely track Anthropic’s direct API but with regional availability and data-residency differences.

Codex CLI pricing (April 2026)

Per-token (OpenAI API):14

Pricing changes as OpenAI rotates model variants; these are the rates verified as of April 19, 2026.

| Model | Input ($/MTok) | Cached input ($/MTok) | Output ($/MTok) | Context / Max output |
|---|---|---|---|---|
| GPT-5.4 (current default) | $2.50 | $0.25 | $15.00 | 1,050,000 ctx / 128K output |
| GPT-5.3-Codex | see OpenAI pricing | N/A | see OpenAI pricing | 400K input / 128K output |
| GPT-5.2-Codex | see OpenAI pricing | N/A | see OpenAI pricing | 400K input / 128K output |
| GPT-5 | varies by tier | N/A | varies | up to 400K input |

Long-context prompts on GPT-5.4 (over 272K input tokens) bill at 2× input and 1.5× output for that session, across standard, batch, and flex tiers.4

Subscriptions that include Codex:

ChatGPT Plus ($20/month), Pro ($100/month for 5×, $200/month for 20×), and Business (pay-as-you-go Codex-only seats, or standard ChatGPT Business seats with Codex usage limits) all include Codex-family usage with plan-specific caps. Pro 5× gets a temporary usage boost to 10× Plus through May 31, 2026; Pro 20×’s 5-hour Codex limits run at 25× Plus during the same promo window. GPT-5.4, GPT-5.3-Codex, and GPT-5.2-Codex are all available via the OpenAI API with published per-token pricing and rate limits for supported API tiers (free tier unsupported).14 API-only teams skip the subscription entirely; use ChatGPT subscriptions when the bundled Codex usage plus the broader chat surface is the better value for the team.

What Opus 4.7’s 1M context actually costs

The practical question: “If I feed Opus 4.7 a 1M-token codebase, what’s the bill?”

One full-context pass with a 10K-token response:

  • Input: 1,000,000 tokens × $5.00/MTok = $5.00
  • Output: 10,000 tokens × $25.00/MTok = $0.25
  • Total (no caching): $5.25 per pass

With 5-minute prompt caching on the 1M-token codebase (assumed single cache write, repeated reads for follow-ups):

  • First write: 1,000,000 × $6.25/MTok = $6.25 (one-time)
  • Each subsequent read within 5 min: 1,000,000 × $0.50/MTok + 10,000 output × $25/MTok = $0.75
  • Five reads in a session: $6.25 + (5 × $0.75) = $10.00 for five full-context passes

CNY example using a reference rate of 1 USD ≈ 6.82 CNY (PBOC central parity clustered in the 6.82-6.90 range around April 2026): ~¥68.20 for five full-context Opus 4.7 sessions on a 1M-token codebase. FX moves; verify the current rate before citing in procurement. The calculation, not the exact CNY figure, is what matters for budgeting.

The equivalent math on GPT-5.4’s long-context mode:

  • Input: 1,000,000 tokens × ($2.50 base × 2 long-context multiplier) = $5.00
  • Output: 10,000 tokens × ($15.00 base × 1.5 long-context multiplier) = $0.225
  • Total (no caching): $5.23 per pass, within 1% of Opus 4.7’s uncached price at full 1M context

On GPT-5.2-Codex (400K input ceiling), you’d need at least three passes to ingest the same 1M codebase, which changes the session-level cost profile. Most Chinese developer teams don’t need full 1M context daily, so the realistic comparison runs through typical session sizes (50K-200K tokens) where both tools cost under $1 per session.
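The same arithmetic as a small script, so you can plug in your own session sizes. Rates are the April 2026 figures from the tables above and will drift; treat it as a budgeting sketch, not a billing calculator.

```python
# Rough per-session cost estimator using the rates quoted in this post.

def session_cost(input_tokens, output_tokens, in_rate, out_rate,
                 in_multiplier=1.0, out_multiplier=1.0):
    """Cost in USD for one pass; rates are $/MTok."""
    return (input_tokens / 1e6) * in_rate * in_multiplier \
         + (output_tokens / 1e6) * out_rate * out_multiplier

# Opus 4.7 at standard pricing: full 1M-token context, 10K-token response.
opus_full = session_cost(1_000_000, 10_000, in_rate=5.00, out_rate=25.00)

# GPT-5.4 long-context mode: 2x input / 1.5x output once input exceeds 272K.
gpt54_full = session_cost(1_000_000, 10_000, in_rate=2.50, out_rate=15.00,
                          in_multiplier=2.0, out_multiplier=1.5)

# A more typical 100K-token session, no long-context multiplier on either tool.
opus_typical  = session_cost(100_000, 10_000, 5.00, 25.00)
gpt54_typical = session_cost(100_000, 10_000, 2.50, 15.00)

print(f"Opus 4.7, full 1M pass:   ${opus_full:.2f}")      # ~$5.25
print(f"GPT-5.4, full 1M pass:    ${gpt54_full:.2f}")     # ~$5.23
print(f"Opus 4.7, 100K session:   ${opus_typical:.2f}")   # ~$0.75
print(f"GPT-5.4, 100K session:    ${gpt54_typical:.2f}")  # ~$0.40
```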

When subscriptions beat per-token

Rough heuristic (not a published token quota, since Anthropic doesn’t publish one): light interactive use fits Pro comfortably; heavier daily agentic workflows on Opus 4.7 push into Max 5x or Max 20x territory; sustained full-context ($5+/session) workloads may be cheaper on pay-per-token with aggressive prompt caching than on a capped subscription. Run a representative week on Pro, check your Claude usage dashboard, and step up tiers as needed rather than guessing from a formula. Teams do the same per-user math, plus the admin, policy, and SSO overhead the Enterprise tier absorbs.

Accessing Codex and Claude Code from China

First-party OpenAI and Anthropic API access is not officially supported from mainland China per each provider’s published supported-country lists.18 Developers sometimes route through non-mainland networks and accounts to work around this, but doing so carries account-suspension and compliance risk that you need to weigh against whatever productivity case you’re making. The CLI binaries install and run locally once downloaded; day-to-day agent-loop behavior is the same everywhere. Cloud-provider routing is where the legitimate paths live.

AWS Bedrock regional availability

Anthropic’s Claude models are served through Amazon Bedrock in specific AWS regions. As of April 2026, public Bedrock runtime endpoints cover APAC regions including Tokyo, Seoul, Singapore, Mumbai, and Sydney, but no Bedrock runtime endpoint currently operates in mainland China or Hong Kong.15 Chinese customers routing through AWS typically use Singapore or Tokyo with the associated latency cost.

Google Vertex AI regional availability

Google Cloud offers Vertex AI generative-AI endpoints in Asia-Pacific regions.16 Specific Claude-model availability varies by region, and asia-east2 (Hong Kong) has historically offered lower latency for users in southern China. Verify Claude model availability in your chosen Vertex region before committing; coverage expands over time but is not uniform across APAC.

Microsoft Foundry

Claude is available through Microsoft Foundry on Azure’s global standard deployment, typically requiring eligible Enterprise / MCA-E subscriptions. Claude is not publicly documented as available in Azure China (operated by 21Vianet), which is a separate sovereign cloud with a distinct service catalog. Chinese customers using Foundry route through the global Azure footprint rather than Azure China.17

OpenAI Codex from China

OpenAI’s supported-countries list does not include mainland China; OpenAI warns that access from unsupported regions may cause account blocking or suspension.18 Azure OpenAI is available in specific global regions (not Azure China), and Chinese enterprises pursuing compliant access typically route through Azure OpenAI in an allowed region with appropriate contractual terms rather than trying to use the direct OpenAI API.

Model alternatives from Chinese providers

DeepSeek, Qwen (Alibaba), and Kimi (Moonshot) are model-level alternatives that Chinese teams evaluate for cost and latency reasons. These are models, not agentic CLIs. Pairing them with Claude Code requires an Anthropic-API-compatible adapter or gateway (Claude Code expects the Anthropic request/response shape; ANTHROPIC_BASE_URL points at Anthropic-compatible endpoints, not OpenAI-compatible ones). Codex supports profile-level model routing but similarly expects OpenAI-compatible responses. Neither tool exposes first-class DeepSeek/Qwen/Kimi support; the path is an adapter layer that translates between the provider’s API shape and what the CLI expects. These models answer procurement, latency, and data-residency questions well; agent-loop correctness and tool-calling maturity are still best served by the frontier Claude and GPT models these CLIs are tuned for.
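In practice the adapter pattern means pointing Claude Code’s base URL at a translation gateway rather than at the provider directly. A hypothetical sketch; the gateway, port, and key are illustrative, and neither vendor ships this component:

```bash
# Hypothetical: a locally running gateway that accepts Anthropic-shaped requests
# from Claude Code and translates them to a provider's native API (DeepSeek,
# Qwen, Kimi). ANTHROPIC_BASE_URL only redirects Anthropic-shaped traffic; it
# does not make an OpenAI-compatible endpoint work.
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_API_KEY="key-understood-by-the-gateway"
claude   # multi-turn tool calling may still break if the gateway's dialect drifts
```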

Multi-Agent Capabilities

Codex offers cloud task delegation via codex cloud exec6. You describe a task, Codex spins up a cloud environment, runs the agent against your codebase, and returns a diff. You do not monitor the agent’s reasoning in real time; you define the task upfront and collect results later. Cloud delegation maps naturally to CI/CD pipelines and batch processing. Internally, Codex supports concurrent agent threads for parallel subtask execution7 (up to 6 in the current release, though this limit may change).

Claude Code offers explicit subagent spawning via the Task tool10. The parent agent spawns subagents with specific tasks and isolated context, coordinates results, and synthesizes outputs. Subagent spawning enables interactive orchestration: you see the reasoning and can intervene. Combined with deliberation patterns where multiple agents critique each other’s outputs, interactive orchestration catches issues that fire-and-forget models miss.

Cloud tasks suit workflows where you define the task upfront and want results later. Subagent coordination suits workflows where the task evolves through reasoning and requires real-time synthesis.

The Trust Spectrum

Before looking at the decision matrix, consider where your task falls on the trust spectrum. Every agentic coding task involves an implicit trust decision: how much do you trust the agent’s judgment on this specific task?

Low trust (use Codex): You are reviewing code you did not write, running scripts from external sources, or delegating work to a cloud environment you cannot monitor in real time. The agent might encounter adversarial input. You want the OS to enforce boundaries regardless of what the model decides.

Medium trust (use either): You are working on your own codebase with known patterns. The agent might make mistakes, but they are mistakes of overconfidence, not malice. You want to review changes before they land but do not need kernel-level isolation.

High trust (use Claude Code): You have built guardrails through hooks, CLAUDE.md instructions, and allowlisted permissions. The agent operates within a governed environment you designed. You trust the governance layer enough to approve actions selectively rather than blanket-restricting them.

Most developers operate at medium trust most of the time, which is why the dual-tool workflow works: Codex handles the low-trust tasks where its sandbox shines, and Claude Code handles the medium-to-high trust tasks where programmable hooks add more value than kernel restrictions.

Decision Framework

A concrete decision matrix based on specific needs:

| If you need… | Best choice | Why |
|---|---|---|
| Kernel-level sandboxing | Codex | OS-level enforcement cannot be bypassed by the agent |
| Programmable governance hooks | Claude Code | 26 lifecycle events with arbitrary code execution |
| Cross-tool portability (AGENTS.md) | Codex | Open standard works in Codex, Cursor, Copilot, Amp, Windsurf |
| Deep multi-file refactoring | Claude Code | Opus excels at holding architectural context across long sessions |
| Fire-and-forget cloud tasks | Codex | codex cloud exec delegates to cloud infrastructure and returns diffs |
| Real-time interactive reasoning | Claude Code | Extended thinking + subagent coordination with live visibility |
| Reviewing untrusted external code | Codex | --sandbox read-only prevents all filesystem mutations |
| Enforcing team coding standards | Claude Code | Hooks encode and enforce business logic deterministically |
| Large monorepo ingestion | Roughly tied | Opus 4.7 brings Claude Code to 1M at standard pricing; Codex CLI on GPT-5.4 reaches 1.05M with long-context mode (billed 2×/1.5× over 272K input), so both now handle monorepos |
| Security-focused code review | Claude Code | Opus outperformed in my blind evaluation series on review tasks |

No single tool dominates this matrix. The underlying pattern is simpler than ten rows suggest: Codex excels when you need hard boundaries, and Claude Code excels when you need programmable logic. If you are running untrusted code, reviewing external contributions, or delegating to a cloud environment you cannot monitor, hard boundaries matter more. If you are enforcing team conventions, orchestrating multi-step workflows, or building guardrails that encode business rules, programmable logic matters more. If more than three of your needs point to one tool, start there. If the split is even, consider the dual-tool workflow.

My Recommendation

Use both. I ran identical code review tasks through both tools across 12 task categories (documented in my blind evaluation series) and found that neither tool alone caught everything. A concrete example: during a FastAPI authentication review, Opus flagged a timing side-channel in the password comparison function. The comparison used Python’s == operator instead of hmac.compare_digest(), creating a timing oracle11. Codex missed that issue entirely. On the same codebase, Codex’s sandbox caught an SSRF vector in a URL-fetching endpoint where user-supplied URLs could reach internal services. Opus had approved the endpoint because the input validation looked correct at the application level, but the kernel sandbox flagged the outbound network request to an internal IP range. Different models trained on different data catch different vulnerability classes. Running both costs roughly 2x per review but catches meaningfully more issues on security-sensitive code.

My daily workflow splits by task type:

  • Claude Code handles feature implementation, code review, and multi-file refactors. Hooks enforce formatting, block dangerous commands, and run tests after every edit. The interactive subagent model works well for tasks that evolve through reasoning.
  • Codex handles untrusted code review with --sandbox read-only (I review external PRs and dependencies in the kernel sandbox), cloud-delegated batch tasks via codex cloud exec, and architecture second opinions where a different model perspective catches blind spots.
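In shell terms, one pass of that split looks roughly like the sketch below. The --sandbox flag and the codex cloud exec subcommand are the ones used earlier in this post; the prompts, branch name, file name, and the claude -p invocation are illustrative, so check each CLI’s help output for the exact syntax your installed versions accept.

```bash
# Low-trust review: inspect an external PR inside Codex's read-only kernel sandbox.
codex --sandbox read-only "Review branch pr-1482 for injection, SSRF, and authorization issues"

# Fire-and-forget batch work: delegate to Codex's cloud environment and collect a diff later.
codex cloud exec "Migrate the test suite to pytest 8 and return a diff"

# Governed follow-up: Claude Code triages findings under the project's hooks and CLAUDE.md rules.
claude -p "Read review-notes.md, open one issue per confirmed finding, and draft fixes for the top two"
```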

CLAUDE.md and AGENTS.md coexist in the same repository with no conflicts. Maintenance overhead stays minimal because both files share most content. I keep a shared section of conventions and copy it into both.

When not to use either tool. Neither Codex nor Claude Code is the right choice when you need guaranteed determinism. Both tools are probabilistic: the same prompt can produce different outputs across runs. If your workflow requires exact reproducibility (e.g., generating configuration files that must match a schema byte-for-byte), use a template engine or code generator instead. Agentic tools are strongest when the task requires judgment, and weakest when the task requires precision without judgment.

For the full comparison with blind evaluation methodology and results across 12 task categories, see Claude Code vs Codex: When to Use Which. For getting started individually, see the Claude Code guide or the Codex guide. For a practical walkthrough of the hook system that powers Claude Code’s governance layer, see the hooks tutorial.

FAQ

Can I use both Codex and Claude Code on the same project?

Yes. CLAUDE.md and AGENTS.md are separate files that each tool reads independently. Neither tool parses the other’s instruction file. Configuration files do not conflict. I maintain both in every active project. The only consideration is keeping shared content synchronized between instruction files, which takes minutes since the formats are similar.

Which is cheaper for daily use?

See the full Pricing Deep-Dive section above. Quick version: Claude Code has per-token Anthropic API pricing plus a subscription ladder (Pro $20, Max 5x $100, Max 20x $200, Team $30/user, Team Premium $150/user). Codex CLI has per-token OpenAI API pricing for GPT-5.4 ($2.50 input / $15 output per MTok, 2×/1.5× multipliers over 272K input) and the GPT-5.3-Codex / GPT-5.2-Codex family, plus ChatGPT Plus/Pro inclusions. Token efficiency varies by task type; for budget-sensitive work, run a representative task through both and compare actual charges. Per-token pricing differs between providers, so raw token counts do not map directly to cost.

Which handles larger codebases better?

Both handle large repositories well. After the April 2026 Opus 4.7 launch, Claude Code reaches 1M tokens at standard pricing. Codex CLI on GPT-5.4 reaches 1.05M tokens with long-context mode enabled (2×/1.5× input/output multipliers over 272K input); default context is 272K unless you opt into the long-context tier. Neither tool reads your entire codebase at once; both lean on retrieval for everyday work (codebase search in Claude Code, layered CLAUDE.md front-loading context; embedding-based file discovery in Codex). Raw window size matters most when reasoning about relationships across many files in a single turn, and for that both tools now deliver.

Does Codex CLI run locally or in the cloud?

Both, but not in the same mode. Codex CLI runs locally by default, the same pattern as any terminal tool.1 Cloud delegation is a separate flow via codex cloud exec or Codex Cloud, which runs your task in a container under OpenAI-hosted infrastructure and returns a diff. Codex Cloud is what people usually mean when they say “Codex sandbox”; Codex CLI’s local sandboxing is the kernel-level Seatbelt / Landlock path described in the Safety Models section above.

Can I access Claude Code and Codex from mainland China?

First-party OpenAI and Anthropic API access is not officially supported from mainland China. The CLI binaries install and run locally, but routing traffic to the first-party APIs from mainland China may cause account-suspension or compliance issues. The legitimate paths run through Azure OpenAI (specific non-China regions), AWS Bedrock (nearest public APAC regions including Tokyo, Seoul, Singapore, Mumbai, and Sydney; no mainland-China or Hong Kong runtime endpoint), Google Vertex AI (asia-east2 Hong Kong and other APAC regions with per-model availability caveats), and Microsoft Foundry on global Azure (not Azure China) for Claude. See Accessing Codex and Claude Code from China above for specifics.

How do Chinese-language comments or code affect token usage?

Chinese characters tokenize differently from English. Claude’s tokenizer treats most Chinese characters as one token each, which means Chinese source code is often more token-efficient than the equivalent English per-line but less efficient per-character (one token covers one character rather than a 4-6 character English word). Codex (GPT family) uses a similar approach. The practical effect: expect roughly comparable token counts for equivalent comment / docstring content in either language, with per-token behavior dominated by code structure rather than natural-language ratio.

Can I use Claude Code or Codex CLI with DeepSeek, Qwen, or Kimi as the backing model?

Only via an adapter or gateway. Claude Code expects the Anthropic API request/response shape (ANTHROPIC_BASE_URL points at Anthropic-compatible endpoints); Codex expects the OpenAI shape. DeepSeek / Qwen / Kimi all publish their own APIs that need translation before a Claude Code or Codex CLI session can drive them. Community adapter projects exist but are not first-class, and the tool-calling and prompt-caching dialects each provider uses differ enough that multi-turn agentic loops often break. DeepSeek / Qwen / Kimi are credible options for single-shot code generation through a separate shell harness, and for single-file review at their native price points. Full agentic-loop correctness and tool-calling reliability still come from the frontier Claude and GPT models these CLIs were tuned for.

What’s the difference between Codex CLI and ChatGPT’s Codex features?

Codex CLI is the terminal tool at github.com/openai/codex. “Codex” inside ChatGPT refers to the same model family surfaced through ChatGPT’s web/desktop/mobile apps with different UI affordances (cloud task delegation, async results, ChatGPT history integration). CLI and ChatGPT share the underlying models; the workflow and context-management differ. If your question is “which tool should I install on my laptop?”, you mean Codex CLI.

Do I need a ChatGPT subscription to use Codex CLI?

No, though it helps with cost. Codex CLI works with a standalone OpenAI API key billed per token. ChatGPT Plus or Pro bundles some Codex usage (check the current ChatGPT subscription page for caps).14 For Chinese developers, direct API billing via an OpenAI account is typically the cleaner path than ChatGPT subscription routing through mainland-China payment rails.

What’s the actual hook count in Claude Code?

26 lifecycle events as of v2.1.116 (April 2026).2 The count grew over time, so February posts that cite 17 events are stale. Major additions through 2026: PostToolUseFailure, SubagentStart, TeammateIdle, TaskCompleted, PermissionRequest, PermissionDenied, PreCompact / PostCompact, Elicitation / ElicitationResult, StopFailure, TaskCreated, CwdChanged, FileChanged, InstructionsLoaded, ConfigChange, WorktreeCreate / WorktreeRemove, and Setup.

When did Opus 4.7 ship and how does it change this comparison?

April 16, 2026. It’s Anthropic’s first post-Glasswing GA Opus release and ships with explicit cyber safeguards. The practical comparison changes: Claude Code now reaches 1M tokens at standard pricing (Opus 4.7 included, no long-context premium), SWE-bench Verified leadership shifts to Opus 4.7 at 87.6% over GPT-5-Codex’s 74.9% baseline, and Terminal-Bench 2.0 leadership swings the other direction, with GPT-5.4 (75.1%) and GPT-5.3-Codex (77.3%) both ahead of Opus 4.7’s 69.4%. Benchmark leadership is fluid; treat any single result as a point-in-time measurement. See the Context and Models section above for the full numbers.


References

  1. OpenAI, “Codex CLI: Sandbox Architecture.” Seatbelt (macOS), Landlock and seccomp (Linux). GitHub: openai/codex

  2. Anthropic, “Claude Code Hooks.” 26 lifecycle event types (as of v2.1.116, April 2026). docs.anthropic.com/en/docs/claude-code/hooks 

  3. Linux Foundation, “AGENTS.md Open Standard.” Agentic AI Foundation. GitHub: anthropics/agent-instructions 

  4. OpenAI, GPT-5.4 model docs. Snapshot gpt-5.4-2026-03-05. Default context 272K; experimental long-context mode up to 1,050,000 tokens when model_context_window and model_auto_compact_token_limit are set. Max output 128K. Knowledge cutoff Aug 31, 2025. Long-context pricing multiplier: 2× input / 1.5× output per session when input exceeds 272K, across standard / batch / flex tiers. See also Introducing GPT-5.4 for the launch post (positions GPT-5.4 as incorporating GPT-5.3-Codex’s coding capabilities and adding native Computer Use), and the historical GPT-5.3-Codex and GPT-5.2-Codex model pages for the 400K/128K Codex-family variants still available. 

  5. Anthropic, “Claude Opus 4.7.” 1M token context at standard pricing. anthropic.com/claude/opus. See also Claude Code model configuration

  6. OpenAI, “Codex Cloud Tasks.” codex cloud exec delegation. platform.openai.com/docs/guides/codex 

  7. OpenAI, “Codex Agent Architecture.” Concurrent thread model. GitHub: openai/codex 

  8. Anthropic, “Pricing.” Claude Max plan. platform.claude.com/docs/en/about-claude/pricing 

  9. OpenAI, “Codex Profiles and Policies.” Configuration. GitHub: openai/codex 

  10. Anthropic, “Claude Code: Best practices for agentic coding.” anthropic.com/engineering/claude-code-best-practices 

  11. Simon Willison, “Codex, Claude Code, and the state of agentic coding tools.” simonwillison.net 

  12. Benchmark numbers (April 2026). Opus 4.7 from Anthropic launch page: 87.6% SWE-bench Verified, 64.3% SWE-bench Pro, 69.4% Terminal-Bench 2.0, 70% CursorBench. GPT-5.4 official coding evals from OpenAI: Introducing GPT-5.4: 57.7% SWE-bench Pro, 75.1% Terminal-Bench 2.0. GPT-5.4 SWE-bench Verified is NOT published on the official model page or the launch page; third-party coverage (e.g. NxCode’s GPT-5.4 writeup) reports ~80% SWE-bench Verified, which I cite as third-party until OpenAI publishes official numbers. GPT-5.3-Codex 56.8% SWE-bench Pro / 77.3% Terminal-Bench 2.0 from OpenAI: Introducing GPT-5.3-Codex; the 75.2% SWE-bench Verified figure often cited for GPT-5.3-Codex is not on the official launch page (third-party attribution). GPT-5.2-Codex 56.4% SWE-bench Pro / 64.0% Terminal-Bench 2.0 from the same source. GPT-5-Codex 74.9% SWE-bench Verified is the widely-cited baseline from OpenAI’s original Codex launch (also referenced on OpenAI’s GPT-5 developer page); treat this as a floor for the Codex family rather than a current measurement. 

  13. Anthropic Pricing. Official per-token rates for Opus 4.7 ($5/$25 per MTok), Opus 4.6 ($5/$25), Sonnet 4.6 ($3/$15), Haiku 4.5 ($1/$5). Prompt caching multipliers: 5-min cache write 1.25×, 1-hour cache write 2×, cache hit 0.1× base input. 1M context on Opus 4.7 included at standard pricing (no long-context premium). Batch API: 50% discount. 

  14. OpenAI API Pricing for per-token rates and OpenAI Codex Pricing for plan tiers and 5-hour rate limits. GPT-5.4 per-token: $2.50 input / $0.25 cached input / $15 output per MTok; 2×/1.5× long-context multiplier over 272K input. Codex plans as of April 2026: Plus $20/mo, Pro 5× $100/mo, Pro 20× $200/mo (with May 31, 2026 promo boosts noted above), Business pay-as-you-go for Codex-only seats, Enterprise/Edu contact-sales. See also the GPT-5.4 model docs, GPT-5.3-Codex model docs, and GPT-5.2-Codex model docs for per-model context windows, rate limits, and API tier availability. Pricing is revised periodically as OpenAI rotates model variants; this post’s numbers reflect the rate card as of April 19, 2026. 

  15. AWS Bedrock runtime endpoints. Public Bedrock runtime endpoints cover APAC regions (Tokyo, Seoul, Singapore, Mumbai, Sydney among others) but list no mainland-China or Hong Kong runtime endpoint as of April 2026. Verify current coverage before relying on any specific region. 

  16. Google Vertex AI generative-AI locations. Asia-Pacific regions including asia-east2 (Hong Kong) serve generative-AI endpoints; specific model availability varies by region and expands over time. Check the locations page for the target region and model before committing. 

  17. Claude in Microsoft Foundry. Claude is deployed through global standard Foundry regions. Azure China (21Vianet) is a separate sovereign cloud with a distinct feature catalog; Claude is not listed as an Azure China model at the time of writing. 

  18. OpenAI supported countries does not include mainland China; OpenAI warns that access from unsupported countries may cause account blocking or suspension. Anthropic supported countries similarly lists officially supported markets; mainland China is not among them at the time of writing. Readers routing through non-mainland networks should review both providers’ terms and their own compliance posture before relying on that path. 
