AI Engineering Series

57 posts on building with AI agents in production. Claude Code, Codex CLI, hooks, skills, memory, context, and autonomous workflows.

All 57 Articles

Taste Is a Technical System

Taste decomposes into constraints, evaluation criteria, pattern recognition, and coherence checks. Each component maps to engineering infrastructure.

2026-04-15 12m read

Cybersecurity Is Proof of Work

Claude Mythos completed a 32-step corporate network attack simulation in 3 of 10 tries. Each attempt cost $12,500 in tokens. Security is now a spendin...

2026-04-14 11m read

Runtime Defense for Tool-Augmented Agents

ClawGuard demonstrates deterministic tool-call interception works. The Vercel telemetry incident shows why. Runtime defense is the enforceable layer.

2026-04-14 12m read

The Dark Factory Verification Layer

When humans stop reading code, what does the verification layer look like? Mapping the infrastructure required for fully autonomous AI coding.

2026-04-14 13m read

Static Skills Are Dead Skills

A new paper on cross-user skill evolution frames a problem I've been living: the skills you ship to your agent stack decay the minute nobody is watchi...

2026-04-10 16m read

Your Agent Has a Middleman You Didn't Vet

Researchers bought 28 LLM API routers and collected 400 more. 17 touched AWS canary credentials. One drained ETH from a private key. The router layer ...

2026-04-10 15m read

Your Agent Has Memory You Didn't Write

A new ACL 2026 paper measures a kind of LLM memory that existing evals overlook — unconscious behavioral adaptation. Top models score under 66%. The a...

2026-04-10 25m read

MCP Servers Are the New Attack Surface

50 MCP vulnerabilities. 30 CVEs in 60 days. 13 critical. The attack surface nobody is auditing.

2026-04-08 8m read

Project Glasswing: What Happens When a Model Is Too Good at Finding Bugs

Anthropic built a model that finds thousands of zero-days, then restricted it to 12 partners. What Project Glasswing means for agent-assisted security...

2026-04-07 7m read

When Your Agent Finds a Vulnerability

An Anthropic researcher found a 23-year-old Linux kernel vulnerability using Claude Code and a 10-line bash script. 22 Firefox CVEs followed. What thi...

2026-04-05 9m read

What the Claude Code Source Leak Reveals

A practitioner's analysis of the Claude Code source leak. 11 findings that explain how auto mode, bash security, prompt caching, and multi-agent coord...

2026-04-02 12m read

Every Hook Is a Scar

84 hooks, 15 event types. Each one traces back to a specific failure. Institutional memory in shell scripts.

2026-03-29 15m read

The Evidence Gate

"I believe" and "it should" are not evidence. Every completion report needs a file path, test output, or specific code. The discipline of proof in an age ...

2026-03-28 9m read

Seventeen Thousand Signals

My vault has 17,900 signals from arXiv, Semantic Scholar, HN, NVD, and 9 other sources. Most are noise. The noise taught me what signal looks like.

2026-03-28 9m read

Overnight

Between midnight and 6am, Googlebot crawls 21,000 pages, Bingbot crawls 10,000, and the comprehensive check grinds through 15,000. The site is more ac...

2026-03-28 8m read

What I Run Before I Sleep

Every night: 15,000 pages checked, TTFB measured, cache verified, sitemaps crawled. The goodnight routine is where operational discipline lives.

2026-03-28 8m read

The Fork Bomb Saved Us

The LiteLLM attacker made one implementation mistake. That mistake was the only reason 47,000 installs got caught in 46 minutes.

2026-03-28 7m read

Taste Is Infrastructure

As agents generate more of what ships, the quality ceiling is set by how well you encode aesthetic judgment into systems. Taste scales when it becomes...

2026-03-28 7m read

The Handoff Document

A diagnosis that survived three code review corrections, two priority reorderings, and guided the correct implementation four days later. The most und...

2026-03-28 8m read

The Agent Didn't Get Smarter

The model is the same between session 1 and session 500. The project changed. This reframes the entire AI productivity conversation.

2026-03-28 7m read

Quality Is the Only Variable

Time, cost, resources, and effort are not constraints. The question is what's right, not what's efficient. A philosophy for building with AI agents.

2026-03-28 8m read

Compound Context: Why AI Projects Get Better the Longer You Stay With Them

Every problem you solve with an AI agent deposits context that the next session withdraws with interest. This is context compounding.

2026-03-26 12m read

AI Agent Research: Claude Beat 33 Attack Methods

Claude Code autonomously discovered adversarial attacks with 100% success rate against Meta's SecAlign-70B, beating all 33 published methods in 96 ite...

2026-03-26 14m read

The Supply Chain Is the Attack Surface

Trivy got compromised. Then LiteLLM. Then 47,000 installs in 46 minutes. The AI supply chain worked exactly as designed.

2026-03-25 16m read

AI Agent Memory Architecture That Actually Works

Hybrid BM25+vector retrieval, skills as markdown, drift detection. Five March 2026 papers validate the same architecture built from production failure...

2026-03-21 12m read

AI Agent Security: The Deploy-and-Defend Trust Paradox

1 in 8 enterprise AI breaches involve autonomous agents. Runtime hooks, OS-level sandboxes, and drift detection break the deploy-and-defend cycle.

2026-03-20 19m read

Every Iteration Makes Your Code Less Secure

43.7% of LLM iteration chains introduce more vulnerabilities than baseline. Adding SAST scanners makes it worse. SCAFFOLD-CEGIS cuts degradation to 2....

2026-03-12 11m read

Codex CLI vs Claude Code in 2026: Architecture Deep Dive

Kernel-level sandboxing vs application-layer hooks, AGENTS.md vs CLAUDE.md, cloud tasks vs subagents. A technical comparison with clear decision crite...

2026-03-10 13m read

Claude Code Setup Guide: Install to First Session

Set up Claude Code in 5 minutes: install via npm, configure CLAUDE.md project context, set permissions, and add your first hook. Practical walkthrough...

2026-03-10 14m read

Claude Code Hooks Tutorial: 5 Production Hooks From Scratch

Build 5 production Claude Code hooks from scratch with full JSON configs: auto-formatting, security gates, test runners, notifications, and quality ch...

2026-03-10 13m read

Your Agent Sandbox Is a Suggestion

An attacker opened a GitHub issue and shipped malware in Cline's next release. Agent sandboxes fail at three levels. Here is what actually works.

2026-03-05 18m read

The Session Is the Commit Message

Git captures what changed. Agent sessions capture why. When agents write code, the session transcript is the real design document — and we discard it.

2026-03-02 18m read

AI Agent Observability: Monitoring What You Can't See

AI agents consume disk, CPU, and network with zero operator visibility. Three observability layers close the gap before damage is irreversible.

2026-03-02 22m read

Silent Egress: The Attack Surface You Didn't Build

A malicious web page injected instructions into URL metadata. The agent fetched it, read the poison, and exfiltrated the API key. No error. No log.

2026-03-02 18m read

Building a Hybrid Retriever for 16,894 Obsidian Files

49,746 chunks, 83 MB, zero API calls. How BM25 + vector search + RRF fusion in one SQLite file turns 16,894 Obsidian files into a queryable knowledge ...

2026-03-01 27m read

AGENTS.md Patterns: What Actually Changes Agent Behavior

Which AGENTS.md patterns actually change agent behavior? Anti-patterns to avoid, patterns that work, and a cross-tool compatibility matrix for 8 tools...

2026-02-28 12m read

The Performance Blind Spot: AI Agents Write Slow Code

118 functions with slowdowns from 3x to 446x in two Claude Code PRs. AI agents optimize for correctness, not performance — here's the data.

2026-02-28 16m read

Claude Code vs Codex CLI: When to Use Which

Architecture, safety, and extensibility compared side-by-side. Includes a decision framework based on 36 blind duels and daily production use of both.

2026-02-27 14m read

The Protege Pattern

A 7B model with sparse expert access matches agents 50x its size. Route routine work to small models and judgment calls to frontier models.

2026-02-27 12m read

The CLI Thesis

Three top HN Claude Code threads converge on one conclusion: CLI-first architecture is cheaper, faster, and more composable than IDE agent workflows.

2026-02-27 18m read

Context Is the New Memory

Context engineering is the highest-impact skill in agent development. Three compression layers turn a 200K token window from liability into advantage.

2026-02-27 18m read

What Actually Breaks When You Run AI Agents Unsupervised

Seven named failure modes from 500+ autonomous agent sessions. Each has a detection signal, a real example, and a concrete fix. The taxonomy HN asked ...

2026-02-27 19m read

Anthropic Measured What Works. My Hooks Enforce It.

Anthropic analyzed 9,830 conversations. Iterative refinement doubles fluency markers. Polished outputs suppress evaluation. Quality hooks force iterat...

2026-02-27 17m read

Claude Code as Infrastructure

Claude Code is not an IDE feature. It is infrastructure. 84 hooks, 48 skills, 19 agents, and 15,000 lines of orchestration prove the point.

2026-02-26 15m read

The Blind Judge: Scoring Claude Code vs Codex in 36 Duels

Claude Code vs Codex CLI, scored blind on 5 dimensions across 36 duels. The winner matters less than the synthesis combining both agents' strongest id...

2026-02-25 17m read

Thinking With Ten Brains: How I Use Agent Deliberation as a Decision Tool

You cannot debias yourself by trying harder. 10 AI agents debating each other is a structural intervention for better decisions.

2026-02-25 18m read

The 10% Wall: Why AI Productivity Plateaus

121,000 developers surveyed, 92.6% using AI tools, productivity stuck at 10%. The wall is infrastructure, not intelligence. Three root causes and fixe...

2026-02-24 20m read

What I Told NIST About AI Agent Security

Production evidence submitted to NIST: AI agent threats are behavioral. 7 failure modes, 3-layer defense, and framework gaps from 60 daily sessions.

2026-02-24 14m read

The Fabrication Firewall: When Your Agent Publishes Lies

An autonomous agent published fabricated claims to 8 platforms over 72 hours. Training-phase safety failed at the publication boundary. Here is the fi...

2026-02-23 16m read

Anatomy of a Claw: 84 Hooks as an Orchestration Layer

What 84 hooks, 43 skills, and 19 agents look like as a production agent orchestration layer. Three patterns that transfer to any agent harness.

2026-02-23 22m read

Runtime Constitutions for AI Agents: A Governance Framework

Runtime constitutions enforce AI agent governance where training-phase alignment fails. Competence checks, output gates, and four subsystems keep agen...

2026-02-22 17m read

How LLMs See Text: What My i18n System Taught Me About Token Economics

Translating my site into 6 languages revealed that Korean costs 2.8x more tokens than English for identical content. An interactive visualizer shows w...

2026-02-08 9m read

Vibe Coding vs. Engineering: Where I Draw the Line

I use Claude Code daily with 86 hooks and a full quality gate system. Here's where I vibe code, where I engineer, and why the boundary matters.

2026-02-08 9m read

AI Theater: Why 90% of Companies 'Use AI' But Only 23% Create Value

McKinsey found 90% of companies claim AI adoption but only 23% scale beyond pilots. I've witnessed three flavors of AI theater and practiced one mysel...

2026-02-08 10m read

The Design Career That Survives AI

After 12 years as VP of Product Design, I watched three paradigm shifts. The skills that survived every one are the same skills AI can't replace.

2026-02-08 11m read

Building AI Systems: From RAG to Agents

I built a 3,500-line agent system with 86 hooks and consensus validation. Here's what I learned about RAG, fine-tuning, and agent orchestration.

2026-02-08 13m read

Compounding Engineering: How My Codebase Accelerates Instead of Decaying

Most codebases slow down as they grow. Mine accelerates. 95 hooks, 44 skills, and 14 configs make each feature cheaper than the last.

2026-02-08 11m read