AI Engineering

The Robots Are Taking Exams in My Search Console

First-party GSC data: 91% of 3.8M impressions fail a human-query filter. Exam questions, pasted errors, and agent sweeps are rewriting what Search...

2026-07-14

The Assistants Are the Audience Now

First-party edge data: AI assistants request my pages ~66x more often than humans visit, and most of it is live user-directed fetching, not...

2026-07-10

Context Compaction Is Becoming a Training Objective

Context compaction is moving from an inference-time patch operators babysit to a training-time objective models optimize. CompactionRL is the signal.

2026-07-08

The Agent Stack Has a 1998 Problem

A cluster of agent-tool CVEs in mid-2026 is not bad luck. It is the structural signature of an AI agent ecosystem shipping at 1998-era security maturity.

2026-07-06

Agents Want to Compile

The agent stack is separating deciding from executing: models compile intent into replayable workflow scripts and DAGs, and deterministic runtimes...

2026-07-05

The Check Becomes the Spec

Coding agents building to the test: near-perfect oracle scores, dead library shipped. What your checks omit stops being the job.

2026-07-02

Claude Code Hooks Explained: The Deterministic Layer Around Your Agent

Claude Code hooks run shell commands at lifecycle events — guaranteed. Every event, exit-code semantics, and five patterns from Prettier to a Stop gate.

2026-07-01

Agents Supersede the Reviewer, Not the Review

A 2026 paper argues coding agents have ended human code review. I run the pipeline it prescribes: the reviewer role is dying, the review is relocating.

2026-06-24

Context Compaction Is a Decision, Not a Threshold

Coding agents compact context when a counter trips, not at a safe stopping point. A 2026 paper shows model-decided compaction cuts cost 30-70%.

2026-06-23

Computer-Use Agents Overshare by Default

A 2026 benchmark tested 15 frontier computer-use agents for data leakage across contexts. Eleven leaked on over half the scenarios. No attacker needed.

2026-06-23

Apple's First-Party Answer to Prompt Injection

WWDC 2026: Apple cites the lethal trifecta, ships deterministic guardrail APIs in Foundation Models and App Intents, and moves PCC onto Google Cloud.

2026-06-12

Apple Is Open-Sourcing the Foundation Models Framework

WWDC 2026: the Foundation Models framework goes open source this summer, so the same Swift API runs server-side, plus a new Skills package live on GitHub.

2026-06-10

Xcode 27 Ships Agent Skills You Can Export Anywhere

Xcode 27 bundles two SwiftUI agent skills and a one-command exporter. xcrun agent skills export moves Apple's best practices into Claude Code or Codex.

2026-06-10

Game Porting Toolkit 4: Agentic Game Ports on Mac

Game Porting Toolkit 4 ships agentic porting skills as a Claude Code plugin plus gpucapture and gpudebug CLI tools. Cyberpunk's manual port set the bar.

2026-06-09

Loop Engineering: Loops Win Where Verification Is Cheap

Loop engineering, checked against Boris Cherny's full transcripts: every loop he names has cheap verification. That constraint decides what to automate.

2026-06-09

Xcode 27 Went Agentic

Xcode 27 builds coding agents into the IDE: plan-mode feature work, UI prototyping via previews, agent-driven localization, and Xcode Cloud automation.

2026-06-08

Your Agent Has Two Untrusted Inputs

AI agents have two untrusted inputs: code the model writes and tool output it reads. One now has a real WASM sandbox; the other, MCP tool...

2026-06-06

When the Maintainer Is the Attacker: jqwik 1.10.0

jqwik 1.10.0 emits a destructive prompt-injection string in Maven output. ANSI escapes hide it from humans. The maintainer added it on purpose.

2026-05-29

Loopback Is Not a Trust Boundary: CVE-2026-2611

MLflow 3.9.0's Assistant exposed a local AI agent on /ajax-api with no CORS check. Any webpage could take over Claude Code. The bug is older than MLflow.

2026-05-28

AI Malware Analysis Needs Evidence Packets

AI malware analysis needs evidence packets: hashes, commands, indicators, and claim-to-evidence trails matter more than confident agent summaries.

2026-05-18

Agents.txt Is Not Access Control

Agents.txt is not access control. Use robots.txt, llms.txt, bot verification, logs, and server-side policy to manage AI crawlers without false confidence.

2026-05-18

Deep Research Agents Need Evidence Graphs

Deep research agents need evidence graphs to track missing pieces, reduce duplicate searches, and produce source-traced answers reviewers can inspect.

2026-05-18

AI Agent Ownership Is the Trust Primitive

AI agent ownership links every autonomous action to the account, session, scope, and operator who can stop it, review it, and accept responsibility.

2026-05-18

AI Agent Skills Need Behavioral Audits, Not Pass Rates

AI agent skills can change behavior while pass rates stay flat. Behavioral audits compare traces, declared capabilities, and side effects before trust.

2026-05-18

Long-Running AI Agents Need Durable Channels

Long-running AI agents need durable channels: workflow IDs, event logs, resumable streams, typed signals, safe cancellation, and user-visible checkpoints.

2026-05-18

AI Agents Should Call Models

AI agents should call trained machine-learning models as tools instead of asking an LLM to guess prices, risk scores, forecasts, or classifications.

2026-05-18

AI Agent Monitoring Needs Runtime Intervention

AI agent monitoring should catch decisive errors during a run, not after failure. Runtime intervention turns traces, policies, and alerts into safe pauses.

2026-05-18

AI Agent Approval Prompts Are Not Authorization

AI agent approval prompts need scoped authority, risk lanes, audit logs, expiry, and revocation so humans approve concrete actions, not fluent requests.

2026-05-18

AI Coding Agents Need Smaller Review Surfaces

AI coding agents overwhelm reviewers with giant diffs. Smaller review surfaces keep engineers engaged, verification-focused, and accountable before merge.

2026-05-18

MCP Tools Need Action-Level Authorization

MCP tools need action-level authorization: bearer-token validation must lead to per-tool, per-role, and per-action capability checks before agents act.

2026-05-18

AI Agent Config Security Is Supply Chain Security

AI agent config security belongs in supply-chain review: hooks, editor tasks, install scripts, MCP files, and plugins can execute code before you notice.

2026-05-18

AI Agents Need Exploration Checkpoints

Exploration checkpoints let AI agents prove what they discovered before acting, reducing premature exploitation, brittle plans, and generic world models.

2026-05-18

Agent Keys Need Risk Budgets

Shuriken's Agent Kit shows why AI agent tools that can act need scoped keys, server-side limits, activity logs, revocation, and conservative defaults.

2026-05-18

Research Papers Need Agent-Readable Claim Files

Agent-readable claim files let papers expose claims, scope limits, definitions, and figure commands so research agents cite, test, and reuse them safely.

2026-05-18

AI Agent Safety Starts With Small Software

AI agent safety starts with small software: smaller tools, plain files, narrow permissions, and faster tests give coding agents fewer places to hide bugs.

2026-05-18

AI Code Review Needs Dissent, Not Consensus

AI code review needs independent agents that preserve dissent, validate findings, route uncertainty to humans, and re-review fixes before teams merge PRs.

2026-05-18

Agent Skills Need Package Managers

Agent skills, MCP servers, prompts, hooks, and commands now behave like dependencies. Teams need manifests, lockfiles, policy gates, review, and rollback.

2026-05-17

Open Source Is Not a Security Boundary

GDS guidance on AI vulnerability discovery gets open-source security right: hide less by default, fix faster, and make exceptions explicit with evidence.

2026-05-17

Rust's Draft LLM Policy Draws the Right Line

Rust's draft LLM usage policy allows AI for learning, review, and experiments while banning generated comments, docs, and human-review shortcuts in the Rust project.

2026-05-17

Codex Hooks Make the Harness Real

Codex hooks, Remote SSH, and mobile control make agent work operational. Evidence, approvals, git custody, release gates, and taste now decide quality.

2026-05-17

Agent Code Search Has a Token Budget

Semble turns code search into a context-budget problem: hybrid retrieval, ranked snippets, and token savings beat grep-and-read loops for coding agents.

2026-05-17

Agent Search Is a Runtime Problem

A new arXiv study compares grep and vector retrieval across Chronos, Claude Code, Codex, and Gemini CLI. Agent search quality lives in the runtime layer.

2026-05-15

HTML Is the Format AI Agents Want

Thariq Shihipar's HTML examples show why agent output format matters: spatial structure, interaction, and visual evidence beat flattened Markdown.

2026-05-15

AI Agent Review Packets Are the New Final Answer

AI agent review packets bundle claims, traces, approvals, tests, deployment proof, human review state, and unresolved gaps so agent work earns real trust.

2026-05-15

Agentic Design Is Control Surface Design

Agentic design is not a prettier chat box. It is the control surface that makes autonomous software visible, interruptible, auditable, and worthy of trust.

2026-05-15

Agents Need Supervision Surfaces

Agent supervision surfaces turn autonomous AI work into inspectable operations: approvals, traces, evidence, recovery, and review queues beat better chat.

2026-05-15

The Agent Interface Is the Harness

Agent interface design is the operating layer: permissions, memory, traces, evidence, recovery, and taste decide whether autonomous AI agents earn trust.

2026-05-15

Agent Execution Traces Are the Runtime Contract

Shepherd, AI Workflow Store, and WildClawBench point to the same agent reliability layer: typed traces, reusable workflows, and native-runtime evaluation.

2026-05-12

Managed Agents vs Local Agent Harnesses: What to Keep

Managed agents now handle sessions, sandboxes, tracing, and events. Keep local harness rules for taste, evidence, privacy, and safe publishing.

2026-05-07

Code with Claude SF 2026: What Anthropic Actually Shipped

Recap of Code with Claude SF 2026: doubled Claude Code rate limits, the SpaceX Colossus 1 deal, 10 finance agent templates, and Vercept's acquisition.

2026-05-06

Claude Code to Codex Migration Guide 2026

Claude Code to Codex migration guide: move AGENTS.md, skills, hooks, profiles, MCP, public-writing gates, and verified CLI notes from real local data.

2026-05-03

Single Source Of Truth: SwiftData, MCP, iCloud

Three callers can write to the same shopping list: a human, Apple Intelligence, and an external agent. Truth has to live somewhere. Pick the substrate.

2026-05-01

Three Surfaces: Human, Apple Intelligence, Agent

Every iOS app capability faces three surfaces: human, Apple Intelligence, agent. Each has different obligations, rendering, latency, and trust posture.

2026-05-01

Hooks For Apple Development: Patterns That Save The Project

An iOS agent can reach the developer's signing keys, builds, and project file. Hooks bound the blast radius. Four shipping patterns.

2026-05-01

Foundation Models Agentic Workflow: In-App Vs Tooling LLM

Two LLMs touch a Swift app: the on-device model that ships with the app and the agent that wrote the code. Different stacks, different obligations.

2026-05-01

Foundation Models On-Device LLM: The Tool Protocol

iOS 26's Foundation Models framework puts a 3B-parameter LLM on every Apple Intelligence device. The Tool protocol is the surface that makes the...

2026-04-30

App Intents vs MCP: The Routing Question

Two protocols, one app. App Intents expose your app to Apple Intelligence. MCP exposes the same domain to Claude, ChatGPT, and the rest. The...

2026-04-30

Claude Code Mac Desktop + Remote Control: A CLI User's Guide

What changes when you move from `claude` in a terminal to the Mac desktop app, and how /remote-control lets you steer a local session from your...

2026-04-29

MCP Server Alongside an iOS App: Two Agent Ecosystems, One List

Get Bananas runs on iOS, macOS, watchOS, and visionOS. It also lives inside Claude Desktop as an MCP server. Bridge: iCloud Drive plus a JSON file.

2026-04-29

The Cleanup Layer Is the Real AI Agent Market

Charlie Labs pivoted from building agents to cleaning up after them. The AI agent market is moving from generation to proof. Cleanup is the durable layer.

2026-04-25

The Repo Shouldn't Get to Vote on Its Own Trust

Two Claude Code trust dialog bypass CVEs in 37 days reveal a load-order failure. One invariant fixes it: interpret no workspace byte until the...

2026-04-24

Reward the Tool Before the Answer

AI agents fail when answers claim tool work that never happened. Four failure modes and the rule that catches them, with a tool-supervised RL parallel.

2026-04-24

The Workbench I Carry

Steve Jobs's philosophy of invisible craft, operationalized: whole-widget integrity, refusal, and care inside an AI harness built on Claude Code.

2026-04-17

Chat Is the Wrong Interface for AI Agents

Chat works for prompting but fails for agent operations. Six interface patterns replace the scrolling text window with real control surfaces.

2026-04-15

The Design Engineer's Agent Stack

Design engineers need agent infrastructure that enforces visual consistency, typography discipline, color compliance, and taste. Here are the six...

2026-04-15

The Agent Operator's Handbook: Supervising What You Can't See

Operating autonomous AI agents is a new discipline. Five responsibilities, a supervision stack, and an intervention framework define what operators do.

2026-04-15

Dark Factory Verification: When No Human Reads the Code

When humans stop reading code, what does the verification layer look like? Mapping the infrastructure required for fully autonomous AI coding.

2026-04-14

Cybersecurity Is Proof of Work: AI Attacks at $12,500 a Run

Claude Mythos completed a 32-step corporate network attack simulation in 3 of 10 tries. Each attempt cost $12,500 in tokens. Security is now a...

2026-04-14

Runtime Defense for Tool-Augmented Agents

ClawGuard demonstrates that deterministic tool-call interception works. The Vercel telemetry incident shows why. Runtime defense is the enforceable layer.

2026-04-14

Static Skills Are Dead Skills

Agent skills decay the minute nobody watches the trajectories. A new paper on cross-user skill evolution frames the problem and the fix.

2026-04-10

Your Agent Has a Middleman You Didn't Vet

Researchers tested 28 LLM API routers. 17 touched AWS canary credentials. One drained ETH from a private key. The router layer is the new attack surface.

2026-04-10

Your Agent Has Memory You Didn't Write

ACL 2026 paper measures LLM memory that existing evals miss: unconscious behavioral adaptation. Top models score under 66%. The asymmetry matters.

2026-04-10

MCP Servers Are the New Attack Surface

50 MCP vulnerabilities, 30 CVEs in 60 days, 13 critical. Tool-use protocols are the attack surface nobody is auditing — here's the taxonomy and the fixes.

2026-04-08

Project Glasswing: When a Model Finds Too Many Bugs

Project Glasswing shows Anthropic restricting Claude Mythos after it found thousands of zero-days. What the rollout means for AI-assisted security.

2026-04-07

When Your Agent Finds a Vulnerability

An Anthropic researcher found a 23-year-old Linux kernel vulnerability using Claude Code and a 10-line bash script. 22 Firefox CVEs followed.

2026-04-05

What the Claude Code Source Leak Reveals

11 findings from the Claude Code source leak: how auto mode, bash security, prompt caching, and multi-agent coordination actually work.

2026-04-02

Every Hook Is a Scar: 84 Agent Failures Encoded in Code

84 hooks intercept 15 of the 26 lifecycle event types Claude Code exposes. Each one traces back to a specific production failure: wiped caches,...

2026-03-29

The Fork Bomb Saved Us

The LiteLLM attacker made one implementation mistake. That mistake was the only reason 47,000 installs got caught in 46 minutes.

2026-03-28

The Handoff Document: Agent Memory Across Sessions

A diagnosis survived three corrections over four days and guided a fix that cut page load from 14s to 108ms. Handoffs carry context agents cannot.

2026-03-28

Taste Is Infrastructure: Encoding Aesthetic Judgment for AI

Agents have capability without opinion. The quality ceiling depends on how well you encode aesthetic judgment into hooks, gates, and constraints.

2026-03-28

The Evidence Gate: Proof Over Plausibility in AI Output

"I believe" and "it should" are not evidence. Every agent completion report needs a file path, test output, or specific code before marking work done.

2026-03-28

The Agent Didn't Get Smarter — The Project Did

The model is the same between session 1 and session 500. The project changed. This reframes the entire AI productivity conversation.

2026-03-28

Compound Context: Why AI Projects Improve Over Time

Every problem you solve with an AI agent deposits context that the next session withdraws with interest. This is context compounding.

2026-03-26

AI Agent Research: Claude Beat 33 Attack Methods

Claude Code autonomously discovered adversarial attacks with 100% success rate against Meta's SecAlign-70B, beating all 33 published methods in 96...

2026-03-26

AI Supply Chain Attacks: The Supply Chain Is the Surface

Trivy got compromised via tag hijacking, then LiteLLM on PyPI, then 47,000 installs in 46 minutes. The AI supply chain worked exactly as designed.

2026-03-25

AI Agent Memory Architecture That Actually Works

Hybrid BM25+vector retrieval, skills as markdown, drift detection. Five March 2026 papers validate the same architecture built from production failures.

2026-03-21

AI Agent Security: The Deploy-and-Defend Trust Paradox

1 in 8 enterprise AI breaches involve autonomous agents. Runtime hooks, OS-level sandboxes, and drift detection break the deploy-and-defend cycle.

2026-03-20

Every Iteration Makes Your Code Less Secure

43.7% of LLM iteration chains introduce more vulnerabilities than baseline. Adding SAST scanners makes it worse. SCAFFOLD-CEGIS cuts degradation to 2.1%.

2026-03-12

Codex CLI vs Claude Code 2026: Architecture, Pricing, and China Access

Codex CLI vs Claude Code in 2026: kernel sandboxing, hook governance, model context, pricing, China cloud access, and when to use each tool.

2026-03-10

Install Claude Code CLI: 5-Minute Setup Guide (2026)

Install the Claude Code CLI in one npm command, authenticate, then set up CLAUDE.md, permissions, and hooks. A working setup in under 5 minutes.

2026-03-10

Claude Code Hooks Tutorial: 5 Production Hooks From Scratch

Build 5 production Claude Code hooks from scratch with full JSON configs: auto-formatting, security gates, test runners, notifications, and quality checks.

2026-03-10

Agent Sandbox Security Is a Suggestion: Three Failure Levels

An attacker opened a GitHub issue and shipped malware in Cline's next release. Agent sandboxes fail at three levels. Here is what actually works.

2026-03-05

Silent Egress: The Attack Surface You Didn't Build

A malicious web page injected instructions into URL metadata. The agent fetched it, read the poison, and exfiltrated the API key. No error. No log.

2026-03-02

AI Agent Observability: Monitoring What You Can't See

AI agents consume disk, CPU, and network with zero operator visibility. Three observability layers close the gap before damage is irreversible.

2026-03-02

Agent Sessions Are the Real Commit Messages We Discard

Git captures what changed. Agent sessions capture why. When agents write code, the session transcript is the real design document — and we discard it.

2026-03-02

Building a Hybrid Retriever for 16,894 Obsidian Files

49,746 chunks, 83 MB, zero API calls. How BM25 + vector search + RRF fusion in one SQLite file turns 16,894 Obsidian files into a queryable knowledge base.

2026-03-01

AGENTS.md Patterns: What Actually Changes Agent Behavior

Which AGENTS.md patterns actually change agent behavior? Anti-patterns to avoid, patterns that work, and a cross-tool compatibility matrix for 8 tools.

2026-02-28

Claude Code Skills: Build Custom Auto-Activating Extensions

Build custom Claude Code skills that auto-activate based on context. Step-by-step tutorial covering SKILL.md structure, frontmatter, LLM-based...

2026-02-28

The Performance Blind Spot: AI Agents Write Slow Code

118 functions with slowdowns from 3x to 446x in two Claude Code PRs. AI agents optimize for correctness, not performance — here's the data.

2026-02-28

Context Is the New Memory

Context engineering is the highest-impact skill in agent development. Three compression layers turn a 200K token window from liability into advantage.

2026-02-27

The CLI Thesis: Why Agent Architecture Beats IDE Plugins

Three top HN Claude Code threads converge on one conclusion: CLI-first architecture is cheaper, faster, and more composable than IDE agent workflows.

2026-02-27

What Actually Breaks When You Run AI Agents Unsupervised

Seven named failure modes from 500+ autonomous agent sessions. Each has a detection signal, a real example, and a concrete fix. The taxonomy HN asked for.

2026-02-27

Anthropic Measured What Works. My Hooks Enforce It.

Anthropic analyzed 9,830 conversations. Iterative refinement doubles fluency markers. Polished outputs suppress evaluation. Quality hooks force iteration.

2026-02-27

Claude Code vs Codex CLI 2026: Decision Reference

Claude Code vs Codex CLI, current to June 2026: Opus 4.8 vs GPT-5.5, hooks vs kernel sandboxing, AGENTS.md portability, and 36 blind duel results.

2026-02-27

The Protege Pattern: Small Models That Know When to Ask

A 7B model with sparse expert access matches agents 50x its size. Route routine work to small models and judgment calls to frontier models.

2026-02-27

Claude Code as Infrastructure

Claude Code is not an IDE feature. It is infrastructure. 84 hooks, 48 skills, 19 agents, and 15,000 lines of orchestration prove the point.

2026-02-26

The Blind Judge: Scoring Claude Code vs Codex in 36 Duels

Claude Code vs Codex CLI, scored blind on 5 dimensions across 36 duels. The winner matters less than the synthesis combining both agents' strongest ideas.

2026-02-25

Agent Deliberation: Thinking With Ten Brains

You cannot debias yourself by trying harder. 10 AI agents debating each other is a structural intervention for better decisions.

2026-02-25

What I Told NIST About AI Agent Security

Production evidence submitted to NIST: AI agent threats are behavioral. 7 failure modes, 3-layer defense, and framework gaps from 60 daily sessions.

2026-02-24

The 10% Wall: Why AI Productivity Plateaus

121,000 developers surveyed, 92.6% using AI tools, productivity stuck at 10%. The wall is infrastructure, not intelligence. Three root causes and fixes.

2026-02-24

Anatomy of a Claw: 84 Hooks as an Orchestration Layer

What 84 hooks, 43 skills, and 19 agents look like as a production agent orchestration layer. Three patterns that transfer to any agent harness.

2026-02-23

The Fabrication Firewall: When Your Agent Publishes Lies

An autonomous agent published fabricated claims to 8 platforms over 72 hours. Training-phase safety failed at the publication boundary. Here is the fix.

2026-02-23

AI Agent Memory Degradation: Why Multi-Turn LLMs Collapse

LLMs lose 39% accuracy across 200K+ multi-turn sessions. Three mechanisms drive collapse and longer context windows fix none of them.

2026-02-22

Runtime Constitutions for AI Agents: A Governance Framework

Runtime constitutions enforce AI agent governance where training-phase alignment fails. Competence checks, output gates, and four subsystems keep...

2026-02-22

Your Agent Writes Faster Than You Can Read

Five research groups published about the same problem this week: AI agents produce code faster than developers can understand it. The debt is in your head.

2026-02-21

Metacognitive AI: Teaching Your Agent Self-Evaluation

Most agent instructions define behavior. The missing layer teaches self-evaluation. False evidence gates, seven named failure modes, and hedging detection.

2026-02-19

Context Engineering Is Architecture: 650 Files Later

Context engineering for AI agents across a 650-file, seven-layer hierarchy. Three production failures, real token budgets, and the system that survived.

2026-02-19

Boids to Agents: Flocking Rules for AI Systems

Craig Reynolds' 1986 boids algorithm produces flocking from three local rules. The same principles and failure modes appear in multi-agent AI systems.

2026-02-19

Multi-Agent Deliberation: When Agreement Is the Bug

Multi-agent deliberation catches failures that single-agent systems miss. Here is the architecture, the dead ends, and what is actually worth building.

2026-02-13

Why My AI Agent Has a Quality Philosophy

My Claude Code agent inherited every sloppy human habit at machine speed. I built 3 philosophies, 150+ quality gates, and 95 hooks. Here's what worked.

2026-02-10

Two MCP Servers Made Claude Code an iOS Build System

XcodeBuildMCP and Apple's Xcode MCP give Claude Code structured access to iOS builds, tests, and debugging. Setup, real-world results, and honest lessons.

2026-02-09

The Ralph Loop: How I Run Autonomous AI Agents Overnight

I built an autonomous agent system with stop hooks, spawn budgets, and filesystem memory. Here are the failures and what actually ships code.

2026-02-08

Vibe Coding vs. Engineering: Where I Draw the Line

I use Claude Code daily with 86 hooks and a full quality gate system. Here's where I vibe code, where I engineer, and why the boundary matters.

2026-02-08

Context Window Management: 50 Sessions of Data

I measured token consumption across 50 Claude Code sessions. Context exhaustion degrades output before you notice. Here are the patterns that fix it.

2026-02-08

Building AI Systems: From RAG to Agents

I built a 3,500-line agent system with 86 hooks and consensus validation. Here's what I learned about RAG, fine-tuning, and agent orchestration.

2026-02-08

Claude Code + Cursor: 30 Sessions of Combined Use

I tracked 30 development sessions using Claude Code and Cursor together. The data shows where each tool wins and where the combination fails.

2026-02-08

PRD-Driven Development: How I Use 30+ PRDs to Ship with AI Agents

I've written 30+ PRDs for AI agent tasks. Here's where PRD-driven development works, where it fails, and how my template evolved over 6 months.

2026-02-08

Claude Code Extensions: Organizing 139 Skills and Hooks

Claude Code offers four extension types. After building 95 hooks, 44 skills, and dozens of commands, I learned which abstraction fits which problem.

2026-02-08

Claude Code Hooks: Why Each of My 95 Hooks Exists

I built 95 hooks for Claude Code. Each one exists because something went wrong. Here are the origin stories and the architecture that emerged.

2026-02-08

Critical Yet Kind: Feedback Principles Encoded in 86 Hooks

Google's Project Aristotle found psychological safety predicts team performance. I encoded the same principles into automated code review hooks.

2026-02-08
