AI Engineering Series
57 posts on building with AI agents in production. Claude Code, Codex CLI, hooks, skills, memory, context, and autonomous workflows.
All 57 Articles
Taste Is a Technical System
Taste decomposes into constraints, evaluation criteria, pattern recognition, and coherence checks. Each component maps to engineering infrastructure.
Cybersecurity Is Proof of Work
Claude Mythos completed a 32-step corporate network attack simulation in 3 of 10 tries. Each attempt cost $12,500 in tokens. Security is now a spendin...
Runtime Defense for Tool-Augmented Agents
ClawGuard demonstrates that deterministic tool-call interception works. The Vercel telemetry incident shows why. Runtime defense is the enforceable layer.
The Dark Factory Verification Layer
When humans stop reading code, what does the verification layer look like? Mapping the infrastructure required for fully autonomous AI coding.
Static Skills Are Dead Skills
A new paper on cross-user skill evolution frames a problem I've been living: the skills you ship to your agent stack decay the minute nobody is watchi...
Your Agent Has a Middleman You Didn't Vet
Researchers bought 28 LLM API routers and collected 400 more. 17 touched AWS canary credentials. One drained ETH from a private key. The router layer ...
Your Agent Has Memory You Didn't Write
A new ACL 2026 paper measures a kind of LLM memory that existing evals overlook — unconscious behavioral adaptation. Top models score under 66%. The a...
MCP Servers Are the New Attack Surface
50 MCP vulnerabilities. 30 CVEs in 60 days. 13 critical. The attack surface nobody is auditing.
Project Glasswing: What Happens When a Model Is Too Good at Finding Bugs
Anthropic built a model that finds thousands of zero-days, then restricted it to 12 partners. What Project Glasswing means for agent-assisted security...
When Your Agent Finds a Vulnerability
An Anthropic researcher found a 23-year-old Linux kernel vulnerability using Claude Code and a 10-line bash script. 22 Firefox CVEs followed. What thi...
What the Claude Code Source Leak Reveals
A practitioner's analysis of the Claude Code source leak. 11 findings that explain how auto mode, bash security, prompt caching, and multi-agent coord...
Every Hook Is a Scar
84 hooks, 15 event types. Each one traces back to a specific failure. Institutional memory in shell scripts.
The Evidence Gate
"I believe" and "it should" are not evidence. Every completion report needs a file path, test output, or specific code. The discipline of proof in an age ...
Seventeen Thousand Signals
My vault has 17,900 signals from arXiv, Semantic Scholar, HN, NVD, and 9 other sources. Most are noise. The noise taught me what signal looks like.
Overnight
Between midnight and 6am, Googlebot crawls 21,000 pages, Bingbot crawls 10,000, and the comprehensive check grinds through 15,000. The site is more ac...
What I Run Before I Sleep
Every night: 15,000 pages checked, TTFB measured, cache verified, sitemaps crawled. The goodnight routine is where operational discipline lives.
The Fork Bomb Saved Us
The LiteLLM attacker made one implementation mistake. That mistake was the only reason 47,000 installs got caught in 46 minutes.
Taste Is Infrastructure
As agents generate more of what ships, the quality ceiling is set by how well you encode aesthetic judgment into systems. Taste scales when it becomes...
The Handoff Document
A diagnosis that survived three code review corrections, two priority reorderings, and guided the correct implementation four days later. The most und...
The Agent Didn't Get Smarter
The model is the same between session 1 and session 500. The project changed. This reframes the entire AI productivity conversation.
Quality Is the Only Variable
Time, cost, resources, and effort are not constraints. The question is what's right, not what's efficient. A philosophy for building with AI agents.
Compound Context: Why AI Projects Get Better the Longer You Stay With Them
Every problem you solve with an AI agent deposits context that the next session withdraws with interest. This is context compounding.
AI Agent Research: Claude Beat 33 Attack Methods
Claude Code autonomously discovered adversarial attacks with 100% success rate against Meta's SecAlign-70B, beating all 33 published methods in 96 ite...
The Supply Chain Is the Attack Surface
Trivy got compromised. Then LiteLLM. Then 47,000 installs in 46 minutes. The AI supply chain worked exactly as designed.
AI Agent Memory Architecture That Actually Works
Hybrid BM25+vector retrieval, skills as markdown, drift detection. Five March 2026 papers validate the same architecture built from production failure...
AI Agent Security: The Deploy-and-Defend Trust Paradox
1 in 8 enterprise AI breaches involve autonomous agents. Runtime hooks, OS-level sandboxes, and drift detection break the deploy-and-defend cycle.
Every Iteration Makes Your Code Less Secure
43.7% of LLM iteration chains introduce more vulnerabilities than baseline. Adding SAST scanners makes it worse. SCAFFOLD-CEGIS cuts degradation to 2....
Codex CLI vs Claude Code in 2026: Architecture Deep Dive
Kernel-level sandboxing vs application-layer hooks, AGENTS.md vs CLAUDE.md, cloud tasks vs subagents. A technical comparison with clear decision crite...
Claude Code Setup Guide: Install to First Session
Set up Claude Code in 5 minutes: install via npm, configure CLAUDE.md project context, set permissions, and add your first hook. Practical walkthrough...
Claude Code Hooks Tutorial: 5 Production Hooks From Scratch
Build 5 production Claude Code hooks from scratch with full JSON configs: auto-formatting, security gates, test runners, notifications, and quality ch...
Your Agent Sandbox Is a Suggestion
An attacker opened a GitHub issue and shipped malware in Cline's next release. Agent sandboxes fail at three levels. Here is what actually works.
The Session Is the Commit Message
Git captures what changed. Agent sessions capture why. When agents write code, the session transcript is the real design document — and we discard it.
AI Agent Observability: Monitoring What You Can't See
AI agents consume disk, CPU, and network with zero operator visibility. Three observability layers close the gap before damage is irreversible.
Silent Egress: The Attack Surface You Didn't Build
A malicious web page injected instructions into URL metadata. The agent fetched it, read the poison, and exfiltrated the API key. No error. No log.
Building a Hybrid Retriever for 16,894 Obsidian Files
49,746 chunks, 83 MB, zero API calls. How BM25 + vector search + RRF fusion in one SQLite file turns 16,894 Obsidian files into a queryable knowledge ...
AGENTS.md Patterns: What Actually Changes Agent Behavior
Which AGENTS.md patterns actually change agent behavior? Anti-patterns to avoid, patterns that work, and a cross-tool compatibility matrix for 8 tools...
The Performance Blind Spot: AI Agents Write Slow Code
118 functions with slowdowns from 3x to 446x in two Claude Code PRs. AI agents optimize for correctness, not performance — here's the data.
Claude Code vs Codex CLI: When to Use Which
Architecture, safety, and extensibility compared side-by-side. Includes a decision framework based on 36 blind duels and daily production use of both.
The Protege Pattern
A 7B model with sparse expert access matches agents 50x its size. Route routine work to small models and judgment calls to frontier models.
The CLI Thesis
Three top HN Claude Code threads converge on one conclusion: CLI-first architecture is cheaper, faster, and more composable than IDE agent workflows.
Context Is the New Memory
Context engineering is the highest-impact skill in agent development. Three compression layers turn a 200K token window from liability into advantage.
What Actually Breaks When You Run AI Agents Unsupervised
Seven named failure modes from 500+ autonomous agent sessions. Each has a detection signal, a real example, and a concrete fix. The taxonomy HN asked ...
Anthropic Measured What Works. My Hooks Enforce It.
Anthropic analyzed 9,830 conversations. Iterative refinement doubles fluency markers. Polished outputs suppress evaluation. Quality hooks force iterat...
Claude Code as Infrastructure
Claude Code is not an IDE feature. It is infrastructure. 84 hooks, 48 skills, 19 agents, and 15,000 lines of orchestration prove the point.
The Blind Judge: Scoring Claude Code vs Codex in 36 Duels
Claude Code vs Codex CLI, scored blind on 5 dimensions across 36 duels. The winner matters less than the synthesis combining both agents' strongest id...
Thinking With Ten Brains: How I Use Agent Deliberation as a Decision Tool
You cannot debias yourself by trying harder. 10 AI agents debating each other is a structural intervention for better decisions.
The 10% Wall: Why AI Productivity Plateaus
121,000 developers surveyed, 92.6% using AI tools, productivity stuck at 10%. The wall is infrastructure, not intelligence. Three root causes and fixe...
What I Told NIST About AI Agent Security
Production evidence submitted to NIST: AI agent threats are behavioral. 7 failure modes, 3-layer defense, and framework gaps from 60 daily sessions.
The Fabrication Firewall: When Your Agent Publishes Lies
An autonomous agent published fabricated claims to 8 platforms over 72 hours. Training-phase safety failed at the publication boundary. Here is the fi...
Anatomy of a Claw: 84 Hooks as an Orchestration Layer
What 84 hooks, 43 skills, and 19 agents look like as a production agent orchestration layer. Three patterns that transfer to any agent harness.
Runtime Constitutions for AI Agents: A Governance Framework
Runtime constitutions enforce AI agent governance where training-phase alignment fails. Competence checks, output gates, and four subsystems keep agen...
How LLMs See Text: What My i18n System Taught Me About Token Economics
Translating my site into 6 languages revealed that Korean costs 2.8x more tokens than English for identical content. An interactive visualizer shows w...
Vibe Coding vs. Engineering: Where I Draw the Line
I use Claude Code daily with 86 hooks and a full quality gate system. Here's where I vibe code, where I engineer, and why the boundary matters.
AI Theater: Why 90% of Companies 'Use AI' But Only 23% Create Value
McKinsey found 90% of companies claim AI adoption but only 23% scale beyond pilots. I've witnessed three flavors of AI theater and practiced one mysel...
The Design Career That Survives AI
After 12 years as VP of Product Design, I watched three paradigm shifts. The skills that survived every one are the same skills AI can't replace.
Building AI Systems: From RAG to Agents
I built a 3,500-line agent system with 86 hooks and consensus validation. Here's what I learned about RAG, fine-tuning, and agent orchestration.
Compounding Engineering: How My Codebase Accelerates Instead of Decaying
Most codebases slow down as they grow. Mine accelerates. 95 hooks, 44 skills, and 14 configs make each feature cheaper than the last.