AI Engineering

Building with AI agents in production. Claude Code, Codex CLI, hooks, skills, memory, context engineering, and the patterns that make autonomous systems reliable.

64 articles

Cybersecurity Is Proof of Work

Claude Mythos completed a 32-step corporate network attack simulation in 3 of 10 tries. Each attempt cost $12,500 in tokens. Security is now a...

2026-04-14

Runtime Defense for Tool-Augmented Agents

ClawGuard demonstrates deterministic tool-call interception works. The Vercel telemetry incident shows why. Runtime defense is the enforceable layer.

2026-04-14

The Dark Factory Verification Layer

When humans stop reading code, what does the verification layer look like? Mapping the infrastructure required for fully autonomous AI coding.

2026-04-14

Your Agent Has a Middleman You Didn't Vet

Researchers bought 28 LLM API routers and collected 400 more. 17 touched AWS canary credentials. One drained ETH from a private key. The router...

2026-04-10

Your Agent Has Memory You Didn't Write

A new ACL 2026 paper measures a kind of LLM memory that existing evals overlook — unconscious behavioral adaptation. Top models score under 66%....

2026-04-10

Static Skills Are Dead Skills

A new paper on cross-user skill evolution frames a problem I've been living: the skills you ship to your agent stack decay the minute nobody is...

2026-04-10

MCP Servers Are the New Attack Surface

50 MCP vulnerabilities. 30 CVEs in 60 days. 13 critical. The attack surface nobody is auditing.

2026-04-08

Project Glasswing: What Happens When a Model Is Too Good at Finding Bugs

Anthropic built a model that finds thousands of zero-days, then restricted it to 12 partners. What Project Glasswing means for agent-assisted security.

2026-04-07

When Your Agent Finds a Vulnerability

An Anthropic researcher found a 23-year-old Linux kernel vulnerability using Claude Code and a 10-line bash script. 22 Firefox CVEs followed. What...

2026-04-05

What the Claude Code Source Leak Reveals

A practitioner's analysis of the Claude Code source leak. 11 findings that explain how auto mode, bash security, prompt caching, and multi-agent...

2026-04-02

Every Hook Is a Scar

84 hooks, 15 event types. Each one traces back to a specific failure. Institutional memory in shell scripts.

2026-03-29

The Agent Didn't Get Smarter

The model is the same between session 1 and session 500. The project changed. This reframes the entire AI productivity conversation.

2026-03-28

The Handoff Document

A diagnosis that survived three code review corrections, two priority reorderings, and guided the correct implementation four days later. The most...

2026-03-28

The Fork Bomb Saved Us

The LiteLLM attacker made one implementation mistake. That mistake was the only reason 47,000 installs got caught in 46 minutes.

2026-03-28

Taste Is Infrastructure

As agents generate more of what ships, the quality ceiling is set by how well you encode aesthetic judgment into systems. Taste scales when it...

2026-03-28

The Evidence Gate

"I believe" and "it should" are not evidence. Every completion report needs a file path, test output, or specific code. The discipline of proof in an...

2026-03-28

AI Agent Research: Claude Beat 33 Attack Methods

Claude Code autonomously discovered adversarial attacks with 100% success rate against Meta's SecAlign-70B, beating all 33 published methods in 96...

2026-03-26

Compound Context: Why AI Projects Get Better the Longer You Stay With Them

Every problem you solve with an AI agent deposits context that the next session withdraws with interest. This is context compounding.

2026-03-26

The Supply Chain Is the Attack Surface

Trivy got compromised. Then LiteLLM. Then 47,000 installs in 46 minutes. The AI supply chain worked exactly as designed.

2026-03-25

AI Agent Memory Architecture That Actually Works

Hybrid BM25+vector retrieval, skills as markdown, drift detection. Five March 2026 papers validate the same architecture built from production failures.

2026-03-21

AI Agent Security: The Deploy-and-Defend Trust Paradox

1 in 8 enterprise AI breaches involves autonomous agents. Runtime hooks, OS-level sandboxes, and drift detection break the deploy-and-defend cycle.

2026-03-20

Every Iteration Makes Your Code Less Secure

43.7% of LLM iteration chains introduce more vulnerabilities than baseline. Adding SAST scanners makes it worse. SCAFFOLD-CEGIS cuts degradation to 2.1%.

2026-03-12

Claude Code Hooks Tutorial: 5 Production Hooks From Scratch

Build 5 production Claude Code hooks from scratch with full JSON configs: auto-formatting, security gates, test runners, notifications, and quality checks.

2026-03-10

Claude Code Setup Guide: Install to First Session

Set up Claude Code in 5 minutes: install via npm, configure CLAUDE.md project context, set permissions, and add your first hook. Practical...

2026-03-10

Codex CLI vs Claude Code in 2026: Architecture Deep Dive

Kernel-level sandboxing vs application-layer hooks, AGENTS.md vs CLAUDE.md, cloud tasks vs subagents. A technical comparison with clear decision criteria.

2026-03-10

Your Agent Sandbox Is a Suggestion

An attacker opened a GitHub issue and shipped malware in Cline's next release. Agent sandboxes fail at three levels. Here is what actually works.

2026-03-05

AI Agent Observability: Monitoring What You Can't See

AI agents consume disk, CPU, and network with zero operator visibility. Three observability layers close the gap before damage is irreversible.

2026-03-02

Silent Egress: The Attack Surface You Didn't Build

A malicious web page injected instructions into URL metadata. The agent fetched it, read the poison, and exfiltrated the API key. No error. No log.

2026-03-02

The Session Is the Commit Message

Git captures what changed. Agent sessions capture why. When agents write code, the session transcript is the real design document — and we discard it.

2026-03-02

Building a Hybrid Retriever for 16,894 Obsidian Files

49,746 chunks, 83 MB, zero API calls. How BM25 + vector search + RRF fusion in one SQLite file turns 16,894 Obsidian files into a queryable knowledge base.

2026-03-01

The Performance Blind Spot: AI Agents Write Slow Code

118 functions with slowdowns from 3x to 446x in two Claude Code PRs. AI agents optimize for correctness, not performance — here's the data.

2026-02-28

Claude Code Skills: Build Custom Auto-Activating Extensions

Build custom Claude Code skills that auto-activate based on context. Step-by-step tutorial covering SKILL.md structure, frontmatter, LLM-based...

2026-02-28

AGENTS.md Patterns: What Actually Changes Agent Behavior

Which AGENTS.md patterns actually change agent behavior? Anti-patterns to avoid, patterns that work, and a cross-tool compatibility matrix for 8 tools.

2026-02-28

What Actually Breaks When You Run AI Agents Unsupervised

Seven named failure modes from 500+ autonomous agent sessions. Each has a detection signal, a real example, and a concrete fix. The taxonomy HN asked for.

2026-02-27

Anthropic Measured What Works. My Hooks Enforce It.

Anthropic analyzed 9,830 conversations. Iterative refinement doubles fluency markers. Polished outputs suppress evaluation. Quality hooks force iteration.

2026-02-27

Context Is the New Memory

Context engineering is the highest-impact skill in agent development. Three compression layers turn a 200K token window from liability into advantage.

2026-02-27

The CLI Thesis

Three top HN Claude Code threads converge on one conclusion: CLI-first architecture is cheaper, faster, and more composable than IDE agent workflows.

2026-02-27

Claude Code vs Codex CLI: When to Use Which

Architecture, safety, and extensibility compared side-by-side. Includes a decision framework based on 36 blind duels and daily production use of both.

2026-02-27

The Protege Pattern

A 7B model with sparse expert access matches agents 50x its size. Route routine work to small models and judgment calls to frontier models.

2026-02-27

Claude Code as Infrastructure

Claude Code is not an IDE feature. It is infrastructure. 84 hooks, 48 skills, 19 agents, and 15,000 lines of orchestration prove the point.

2026-02-26

Thinking With Ten Brains: How I Use Agent Deliberation as a Decision Tool

You cannot debias yourself by trying harder. 10 AI agents debating each other is a structural intervention for better decisions.

2026-02-25

The Blind Judge: Scoring Claude Code vs Codex in 36 Duels

Claude Code vs Codex CLI, scored blind on 5 dimensions across 36 duels. The winner matters less than the synthesis combining both agents' strongest ideas.

2026-02-25

The 10% Wall: Why AI Productivity Plateaus

121,000 developers surveyed, 92.6% using AI tools, productivity stuck at 10%. The wall is infrastructure, not intelligence. Three root causes and fixes.

2026-02-24

What I Told NIST About AI Agent Security

Production evidence submitted to NIST: AI agent threats are behavioral. 7 failure modes, 3-layer defense, and framework gaps from 60 daily sessions.

2026-02-24

Anatomy of a Claw: 84 Hooks as an Orchestration Layer

What 84 hooks, 43 skills, and 19 agents look like as a production agent orchestration layer. Three patterns that transfer to any agent harness.

2026-02-23

The Fabrication Firewall: When Your Agent Publishes Lies

An autonomous agent published fabricated claims to 8 platforms over 72 hours. Training-phase safety failed at the publication boundary. Here is the fix.

2026-02-23

Runtime Constitutions for AI Agents: A Governance Framework

Runtime constitutions enforce AI agent governance where training-phase alignment fails. Competence checks, output gates, and four subsystems keep...

2026-02-22

AI Agent Memory Degradation: Why Multi-Turn LLMs Collapse

LLMs lose 39% accuracy across 200K+ multi-turn sessions. Three mechanisms drive collapse and longer context windows fix none of them.

2026-02-22

Your Agent Writes Faster Than You Can Read

Five research groups published about the same problem this week: AI agents produce code faster than developers can understand it. The debt is in your head.

2026-02-21

Context Engineering Is Architecture: 650 Files Later

Context engineering for AI agents across a 650-file, seven-layer hierarchy. Three production failures, real token budgets, and the system that survived.

2026-02-19

Boids to Agents: Flocking Rules for AI Systems

Craig Reynolds' 1986 boids algorithm produces flocking from three local rules. The same principles and failure modes appear in multi-agent AI systems.

2026-02-19

Metacognitive AI: Teaching Your Agent Self-Evaluation

Most agent instructions define behavior. The missing layer teaches self-evaluation. False evidence gates, seven named failure modes, and hedging detection.

2026-02-19

Multi-Agent Deliberation: When Agreement Is the Bug

Multi-agent deliberation catches failures that single-agent systems miss. Here is the architecture, the dead ends, and what is actually worth building.

2026-02-13

Why My AI Agent Has a Quality Philosophy

My Claude Code agent inherited every sloppy human habit at machine speed. I built 3 philosophies, 150+ quality gates, and 95 hooks. Here's what worked.

2026-02-10

Two MCP Servers Made Claude Code an iOS Build System

XcodeBuildMCP and Apple's Xcode MCP give Claude Code structured access to iOS builds, tests, and debugging. Setup, real-world results, and honest lessons.

2026-02-09

Claude Code Hooks: Why Each of My 95 Hooks Exists

I built 95 hooks for Claude Code. Each one exists because something went wrong. Here are the origin stories and the architecture that emerged.

2026-02-08

The Ralph Loop: How I Run Autonomous AI Agents Overnight

I built an autonomous agent system with stop hooks, spawn budgets, and filesystem memory. Here are the failures and what actually ships code.

2026-02-08

PRD-Driven Development: How I Use 30+ PRDs to Ship with AI Agents

I've written 30+ PRDs for AI agent tasks. Here's where PRD-driven development works, where it fails, and how my template evolved over 6 months.

2026-02-08

Vibe Coding vs. Engineering: Where I Draw the Line

I use Claude Code daily with 86 hooks and a full quality gate system. Here's where I vibe code, where I engineer, and why the boundary matters.

2026-02-08

Context Window Management: What 50 Sessions Taught Me About AI Development

I measured token consumption across 50 Claude Code sessions. Context exhaustion degrades output before you notice. Here are the patterns that fix it.

2026-02-08

Building AI Systems: From RAG to Agents

I built a 3,500-line agent system with 86 hooks and consensus validation. Here's what I learned about RAG, fine-tuning, and agent orchestration.

2026-02-08

Critical Yet Kind: How I Encoded Feedback Principles into 86 Hooks

Google's Project Aristotle found psychological safety predicts team performance. I encoded the same principles into automated code review hooks.

2026-02-08

Commands, Skills, Subagents, Rules: What I Learned Organizing 139 Extensions

Claude Code offers four extension types. After building 95 hooks, 44 skills, and dozens of commands, I learned which abstraction fits which problem.

2026-02-08

Claude Code + Cursor: What 30 Sessions of Combined Usage Taught Me

I tracked 30 development sessions using Claude Code and Cursor together. The data shows where each tool wins and where the combination fails.

2026-02-08