
AGENTS.md Patterns: What Actually Changes Agent Behavior

From the guides: Claude Code & Codex CLI

My first AGENTS.md was a 200-line paste of our team’s style guide. It included naming conventions, code review checklists, deployment procedures, and architectural principles. The agent ignored most of it. Not because the instructions were wrong — because they were documentation, not operations.

The distinction matters more than any specific pattern in this post. AGENTS.md is operational policy for an AI agent, not a README for humans. The agent doesn’t need to understand why you use conventional commits. It needs to know the exact command to run and what “done” looks like.

TL;DR

Most AGENTS.md problems come from writing human documentation instead of agent operations. Effective files are command-first (exact invocations, not descriptions), task-organized (coding, review, release sections), and closure-defined (explicit “done” criteria). Anti-patterns that reliably get ignored: prose paragraphs, ambiguous directives (“be careful”), and contradictory priorities. AGENTS.md works across 12+ tools — write it once, get consistent behavior in Codex, Cursor, Copilot, and more.


Context: AGENTS.md is an open standard governed by the Linux Foundation’s Agentic AI Foundation. This post covers practical patterns. For Codex-specific configuration, see the Codex guide. For Claude Code’s equivalent (CLAUDE.md), see the Claude Code guide.

What Gets Ignored

These patterns reliably produce no observable change in agent behavior. I tested each by measuring task completion accuracy with and without the instruction.

Prose paragraphs without commands

<!-- BAD: Agent skips this -->
We value clean, well-tested code. Our team follows TDD principles
and believes in comprehensive test coverage. Please ensure all
changes are properly tested before submitting.

The agent reads this, represents it as a vague preference, and proceeds to write code without tests. There’s no actionable instruction — no command to run, no threshold to meet, no definition of “properly tested.”
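For contrast, here is one way to restate the same policy as operations. This is a sketch, not a canonical form; the commands mirror the Python tooling used elsewhere in this post:

```markdown
<!-- BETTER: the same intent, as commands the agent can run and verify -->
## Testing Policy
- Write the test first for any new feature (TDD)
- Run: `pytest -v` — must exit 0 before any commit
- Coverage gate: `pytest --cov=app --cov-fail-under=80`
```

Each line now has a pass/fail condition the agent can check on its own.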

Ambiguous directives

<!-- BAD: "Careful" means nothing to an agent -->
- Be careful with database migrations
- Optimize queries where possible
- Handle errors gracefully

“Careful” isn’t a constraint. “Where possible” isn’t a trigger condition. “Gracefully” isn’t a behavior specification. These read as human-to-human guidance, not agent instructions.
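A possible rewrite that attaches a trigger and a command to each directive — tool names like `alembic` are illustrative assumptions, not part of the original:

```markdown
<!-- BETTER: each directive has a trigger condition and a verifiable action -->
- Before any migration: run `alembic upgrade head --sql` and review the generated SQL
- If a query runs inside a loop: batch it or replace it with a single `JOIN`
- On caught exceptions: log with `logger.exception(...)` and re-raise; never `pass` silently
```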

Contradictory priorities

<!-- BAD: Which one wins? -->
- Move fast and ship quickly
- Ensure comprehensive test coverage
- Keep the runtime budget under 5 minutes
- Run the full integration test suite before every commit

The agent can’t satisfy all four simultaneously. When instructions conflict without explicit priority ordering, the model picks whichever aligns with its default behavior — typically the path that requires the fewest tool calls.
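One way to resolve this is an explicit priority order, so the agent has a tiebreaker instead of a default. A sketch:

```markdown
<!-- BETTER: ranked priorities, conflicts resolved in advance -->
Priorities, highest first:
1. Never skip tests — `pytest -v` must exit 0 before any commit
2. Keep the runtime budget under 5 minutes — use `pytest -k <pattern>` while iterating
3. Ship quickly — small commits, no speculative refactors
Run the full integration suite before merge, not before every commit.
```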

Style guides without enforcement

<!-- BAD: No way to verify compliance -->
Follow the Google Python Style Guide for all code.
Use numpy-style docstrings for public functions.

Unless you include the exact linting command that enforces the style (`ruff check --select D` or `pylint --rcfile=.pylintrc`), the agent has no mechanism to verify its own compliance.
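A minimal fix is to pair each style rule with its enforcement command:

```markdown
<!-- BETTER: style rules paired with the commands that enforce them -->
- Docstring style: `ruff check . --select D` must exit 0
- Formatting: `ruff format --check .` must exit 0
```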

What Works

These patterns produce consistent, measurable changes in agent behavior.

Command-first instructions

## Build and Test Commands
- Install: `pip install -r requirements.txt`
- Lint: `ruff check . --fix`
- Format: `ruff format .`
- Test: `pytest -v --tb=short`
- Type check: `mypy app/ --strict`
- Full verify: `ruff check . && ruff format --check . && pytest -v`

Commands are unambiguous. The agent knows exactly what to run, what arguments to pass, and can verify success by checking the exit code.

Closure definitions

## Definition of Done
A task is complete when ALL of the following pass:
1. `ruff check .` exits 0
2. `pytest -v` exits 0 with no failures
3. `mypy app/ --strict` exits 0
4. Changed files have been staged and committed
5. Commit message follows conventional format: `type(scope): description`

This eliminates the most common failure mode: the agent reports “done” without verifying. When “done” is defined as specific exit codes, the agent runs each check before reporting completion.
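The same gate can be scripted so the agent (or CI) runs one entry point instead of five separate commands. A minimal sketch, assuming the check commands above; `run_check` wraps one command and reports PASS/FAIL based on its exit code:

```shell
#!/bin/sh
# verify.sh -- hypothetical "Definition of Done" gate (command names from this post).
# run_check runs one command, prints PASS or FAIL, and propagates the exit code.
run_check() {
  name="$1"; shift
  if "$@"; then
    echo "PASS: $name"
  else
    rc=$?
    echo "FAIL: $name (exit $rc)"
    return "$rc"
  fi
}

# Chain checks with && so the first failure stops the gate, e.g.:
#   run_check "lint"  ruff check . &&
#   run_check "tests" pytest -v &&
#   run_check "types" mypy app/ --strict &&
#   echo "DONE: all checks passed"
```

The AGENTS.md entry then shrinks to a single line: `./verify.sh` must exit 0.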

Task-organized sections

## When Writing Code
- Run `ruff check .` after every file change
- Add type hints to all new functions
- Test command: `pytest tests/ -v -k "test_<module>"`

## When Reviewing Code
- Check for security issues: `bandit -r app/`
- Verify test coverage: `pytest --cov=app --cov-fail-under=80`
- List changed files: `git diff --name-only HEAD~1`

## When Releasing
- Update version in `pyproject.toml`
- Run full suite: `pytest -v && ruff check . && mypy app/`
- Tag: `git tag -a v<version> -m "Release v<version>"`

Task-organized files let the agent select relevant instructions based on what it’s currently doing. Flat lists force the agent to parse every instruction regardless of context.

Escalation rules

## When Blocked
- If tests fail after 3 attempts: stop and report the failing test with full output
- If a dependency is missing: check `requirements.txt` first, then ask
- If you encounter merge conflicts: stop and show the conflicting files
- Never: delete files to resolve errors, force push, or skip tests

Without escalation rules, agents default to increasingly creative workarounds when blocked — deleting lock files, bypassing checks, or silently ignoring failures.

Directory Scoping for Monorepos

AGENTS.md supports hierarchical scoping. Files closer to the working directory take precedence:

/repo/AGENTS.md                        ← Project-wide rules
  └─ /repo/services/AGENTS.md          ← Service defaults
      ├─ /repo/services/api/AGENTS.md  ← API-specific rules
      └─ /repo/services/web/AGENTS.md  ← Frontend-specific rules

Root-level instructions concatenate with deeper files. Use AGENTS.override.md at any level to replace (not extend) parent instructions:

<!-- /repo/services/payments/AGENTS.override.md -->
# Payment Service Rules (OVERRIDE)

This service has additional security requirements.
All changes require: `bandit -r . -ll` passing with zero findings.
No dependency updates without explicit approval.
Test with: `pytest -v --tb=long -x` (fail fast, full tracebacks)

When to use override: Release freezes, incident mode, or any service with security constraints that supersede project-wide defaults.

Cross-Tool Compatibility

AGENTS.md works in 12+ tools. Here’s how the same file behaves across ecosystems:

| Tool | Instruction File | Reads AGENTS.md? | Notes |
|---|---|---|---|
| Codex | AGENTS.md | Yes (native) | Full hierarchy, override support |
| Cursor | .cursor/rules | Yes (fallback) | Also reads .cursorrules |
| Copilot | .github/copilot-instructions.md | Yes (via config) | Set project_doc_fallback_filenames |
| Claude Code | CLAUDE.md | No | Separate format, similar patterns |
| Amp | AGENTS.md | Yes | Native support |
| Gemini CLI | AGENTS.md | Yes | Native support |
| Windsurf | .windsurfrules | Yes (fallback) | Also reads AGENTS.md |
| Aider | CONVENTIONS.md | Partial | Reads if configured |

If your team uses multiple tools: Write AGENTS.md as the canonical source. Add tool-specific files (CLAUDE.md, .cursorrules) that either import or mirror the relevant sections. Don’t maintain parallel instruction sets that drift apart.
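One common way to mirror without drift is symlinking the tool-specific filenames to the canonical file — an approach sketch, not something the AGENTS.md spec mandates:

```shell
# One canonical file; tool-specific names are symlinks so copies can't drift.
# Run from the repo root where AGENTS.md lives.
ln -sf AGENTS.md CLAUDE.md        # Claude Code reads CLAUDE.md
ln -sf AGENTS.md CONVENTIONS.md   # Aider, if configured to read conventions
```

Commit the symlinks; any edit to AGENTS.md is immediately visible to every tool.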

Testing Your AGENTS.md

Verify the agent actually reads and follows your instructions:

# Codex: Show the full instruction chain
codex --ask-for-approval never "Summarize your current instructions"

# Codex: Generate a scaffold from your repo
codex /init

# Claude Code: Check active instructions
claude --print "What instructions are you following for this project?"

# Verify specific rules are active
codex --ask-for-approval never "What is your definition of done?"

The acid test: Ask the agent to explain your build commands. If it can’t reproduce them verbatim, the instructions aren’t being read or are too verbose to retain in context.

Key Takeaways

For individual developers:

  • Replace prose with commands. Every instruction should be verifiable by running something.
  • Define closure explicitly. “Done” means specific exit codes, not feelings.
  • Test your AGENTS.md by asking the agent to recite it. What it can’t recite, it won’t follow.

For teams:

  • Use AGENTS.md as the single source of truth. Mirror to tool-specific files, don’t maintain parallel copies.
  • Organize by task (coding, review, release), not by category (style, testing, deployment).
  • Include escalation rules. Without them, blocked agents improvise in ways you won’t like.
  • Scope per directory in monorepos. Service-specific rules shouldn’t pollute global instructions.
