Vibe Coding vs. Engineering: Where I Draw the Line


Andrej Karpathy coined the term “vibe coding” in February 2025, describing a development style where the programmer “fully gives in to the vibes, embraces exponentials, and forgets that the code even exists.”1

The line between vibe coding and engineering falls at blast radius, not complexity. Prototypes and disposable scripts get vibes: describe what you want, accept the output, iterate by feel. Anything that touches users, persists data, or compounds over time gets full engineering treatment with automated quality gates, hook-based safety checks, and deterministic verification.

I read Karpathy’s description and thought: that’s half of my workflow. The other half is the opposite.

TL;DR

I build with Claude Code every day. Some of that work is pure vibe coding: describe what I want, accept the output, move on. Some of it runs through 86 hooks, a git safety guardian, a recursion guard, and a quality gate system that blocks commits with AI tells or passive voice. The line between the two isn’t arbitrary. Prototypes get vibes. Production gets engineering. The hard part is knowing when a prototype crosses the line.


My Actual Workflow

The Vibe Side

When I’m exploring an idea, I vibe code without apology. My Ace Citizenship iOS app started as a weekend experiment: “Build a spaced repetition quiz for USCIS civics questions.” Claude Code generated the initial SwiftUI views, the question bank, and the scheduling algorithm. I didn’t read every line. I ran it, tested it manually, and iterated by describing what felt wrong.

The interactive components on this blog (the RAG decision tree, the compound interest calculator) started the same way. “Build a decision tree that walks users through RAG vs fine-tuning with animated transitions.” Accept, test, adjust. The blast radius of a bug in a blog widget is contained to one page.

The Engineering Side

My Claude Code hook architecture is the opposite of vibe coding. Every hook exists because something went wrong.

git-safety-guardian.sh exists because Claude force-pushed to main during an early session. The hook now intercepts every git command, pattern-matches against a severity table (CRITICAL: force push to main; HIGH: adding .env files; MEDIUM: --no-verify), and injects a warning before execution.
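A minimal sketch of what a severity table like this could look like. The real hook is a shell script whose contents aren't shown here, so this is Python for readability, and every name in it (classify_git_command, SEVERITY_TABLE, the protected branch list, the .env pattern) is an assumption made for illustration. It covers only the classification step, not how the warning gets injected.

```python
import re
import shlex
from typing import List, Optional, Tuple

# Assumption: these branch names and this .env pattern are illustrative only.
PROTECTED_BRANCHES = {"main", "master"}
ENV_FILE = re.compile(r"(^|/)\.env(\..+)?$")


def _is_force_push(sub: str, args: List[str]) -> bool:
    # Simplified: does not catch refspecs such as "+main" or "HEAD:main".
    forced = any(a in ("--force", "-f") for a in args)
    return sub == "push" and forced and any(a in PROTECTED_BRANCHES for a in args)


def _adds_env_file(sub: str, args: List[str]) -> bool:
    return sub == "add" and any(ENV_FILE.search(a) for a in args)


def _skips_hooks(sub: str, args: List[str]) -> bool:
    return "--no-verify" in args


# Ordered from most to least severe; the first matching row wins.
SEVERITY_TABLE = [
    ("CRITICAL", "force push to main", _is_force_push),
    ("HIGH", "adding .env files", _adds_env_file),
    ("MEDIUM", "--no-verify", _skips_hooks),
]


def classify_git_command(command: str) -> Optional[Tuple[str, str]]:
    """Return (severity, label) for a risky git command, or None if it looks safe."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "git":
        return None
    sub, args = tokens[1], tokens[2:]
    for severity, label, predicate in SEVERITY_TABLE:
        if predicate(sub, args):
            return severity, label
    return None
```

Keeping the rules in one ordered table means a new failure mode becomes one new row rather than another branch in a growing conditional.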

recursion-guard.sh exists because a subagent spawned infinite children. The hook tracks agent lineage in a JSON file, enforces depth limits, and manages a spawn budget model that prevents runaway agent chains while allowing legitimate parallel work.
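One way to picture lineage tracking with a depth limit and a spawn budget, sketched in Python rather than shell. The JSON layout, the request_spawn function, and the limits (max_depth=2, budget=4) are all hypothetical. The budget here is a single counter shared by the whole agent tree, which is only one possible reading of "spawn budget model."

```python
import json
from pathlib import Path


def _load_state(path: Path) -> dict:
    """Read the lineage file, or start a fresh tree with only a root agent."""
    if path.exists():
        return json.loads(path.read_text())
    return {"agents": {"root": {"parent": None, "depth": 0}}, "spawned": 0}


def request_spawn(path: Path, parent_id: str, child_id: str,
                  max_depth: int = 2, budget: int = 4) -> bool:
    """Record a child agent and return True, or return False if the spawn is denied."""
    state = _load_state(path)
    parent = state["agents"].get(parent_id)
    if parent is None:
        return False  # unknown lineage: refuse rather than guess
    if parent["depth"] + 1 > max_depth:
        return False  # chain is too deep
    if state["spawned"] >= budget:
        return False  # shared budget for the whole tree is spent
    state["agents"][child_id] = {"parent": parent_id, "depth": parent["depth"] + 1}
    state["spawned"] += 1
    path.write_text(json.dumps(state, indent=2))
    return True
```

The depth check stops runaway chains, while the shared budget still lets one parent fan out to several siblings, which is the parallel case the hook is meant to allow.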

blog-quality-gate.sh exists because AI-generated prose sounds like AI-generated prose. The hook blocks commits to content/blog/ if it detects em dashes, passive voice, or words like “delve,” “crucial,” or “landscape.”
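A rough sketch of the kind of check such a gate might run, again in Python rather than shell. The function name find_violations is hypothetical, and the passive-voice pattern is a crude heuristic I'm assuming for illustration; the actual hook's detection logic isn't described here. The word list comes from the three examples above.

```python
import re

BANNED_WORDS = ("delve", "crucial", "landscape")
EM_DASH = "\u2014"
# Crude heuristic: a form of "to be" followed by a word ending in "ed".
# It misses irregular participles ("was written") and can false-positive.
PASSIVE_HINT = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE
)


def find_violations(text: str) -> list:
    """Return (line_number, reason) pairs for every rule a line breaks."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            violations.append((lineno, "em dash"))
        for word in BANNED_WORDS:
            if re.search(rf"\b{word}\b", line, re.IGNORECASE):
                violations.append((lineno, f"banned word: {word}"))
        if PASSIVE_HINT.search(line):
            violations.append((lineno, "possible passive voice"))
    return violations
```

A commit hook built on this would exit non-zero whenever the returned list is non-empty for a file under content/blog/.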

None of these hooks were vibe coded. Each one was written line by line, tested against real failure scenarios, and reviewed before deployment. The 86 hooks collectively represent the boundary between vibing and engineering.


Where the Line Actually Falls

Vibe: Disposable Prototypes

I vibe code anything I might throw away. A data transformation script I’ll run once. A CLI tool for personal use. A proof-of-concept to test whether an API does what the docs claim. The cost of a bug in disposable code is my own time, and the speed gain outweighs the debugging risk.

Vibe: Creative Exploration

When I’m exploring a design direction, vibe coding lets me test interaction patterns faster than Figma. “Build a search modal with keyboard navigation, result highlighting, and Cmd+K activation” produces a working prototype in minutes. I evaluate the feel, not the code.2

Engineer: Anything That Touches Users

The moment code serves someone other than me, it crosses the line. My blog runs through a 12-module linter that checks citations, heading hierarchy, readability grade, image alt text, internal link density, and content depth. The linter has 77 tests. The blog has 29 posts. The linter has more tests than the blog has posts.

Engineer: Anything That Persists

Database schemas, API contracts, hook configurations, and deployment manifests get full engineering treatment. These decisions compound. A schema designed in a vibe session becomes a migration nightmare when three years of data accumulates on top of it.3

Engineer: Anything Security-Adjacent

AI-generated code reflects the security posture of its training data, which includes tutorials and Stack Overflow answers that routinely omit authentication, input validation, and error handling for brevity.4 My hooks catch some of this (the git safety guardian flags .env additions, credential files, and force pushes), but security-critical code gets manual review regardless.


The Comprehension Gap Problem

The most dangerous pattern in vibe coding isn’t bad code. It’s code that works until it doesn’t.

I generated a caching layer for my i18n translation system. It worked perfectly for English content. When I added Korean and Traditional Chinese, the cache key generation silently produced collisions for certain Unicode code points. Debugging took four hours because I’d never read the cache key function. The code was correct for ASCII, which is all the training data emphasized.5

The lesson: vibe-coded systems fail at the edges that training data underrepresents. If your users operate at those edges (non-Latin scripts, high concurrency, unusual network conditions), vibe-coded implementations carry hidden risk.
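To make the failure mode concrete, here is a hypothetical cache key function that behaves well for ASCII and collides for other scripts. This is not the function from my translation system (the exact cause isn't reproduced here). It is an assumed example of the same class of bug, followed by one possible fix that normalizes to NFC and hashes the UTF-8 bytes.

```python
import hashlib
import unicodedata


def naive_cache_key(locale: str, text: str) -> str:
    """Hypothetical buggy key: non-ASCII characters are silently dropped."""
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    return f"{locale}:{ascii_only}"


def safe_cache_key(locale: str, text: str) -> str:
    """Key built from every code point, after canonical normalization."""
    # NFC makes composed and decomposed forms of the same text share a key.
    normalized = unicodedata.normalize("NFC", text)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{locale}:{digest}"
```

With the naive version, any two Korean strings collapse to the same key ("ko:"), so the second lookup returns the first string's translation. English strings never trigger it, which is why manual testing in English passes.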


The Review Gate

Every piece of production-bound code in my system passes through a review gate, whether I wrote it or Claude Code did:

  1. Read every line. Generated code is a pull request from an untrusted contributor. Review accordingly.
  2. Verify error handling. Check that error paths reflect domain requirements, not generic try-catch patterns.
  3. Audit dependencies. AI resolves each prompt in isolation, importing whatever library solves the immediate request. After 50 prompts, you might have three date libraries and two HTTP clients.
  4. Add tests. Generated code rarely covers edge cases specific to your domain.
  5. Check security. Run static analysis. Verify authentication, authorization, and input validation.6
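The dependency audit in step 3 is easy to automate at a basic level. The sketch below assumes a hand-maintained map from package name to purpose; the CATEGORIES table and find_overlaps are illustrative names, and the package list is a small sample rather than a complete catalog.

```python
from collections import defaultdict

# Assumption: a hand-maintained map of packages that solve the same problem.
CATEGORIES = {
    "arrow": "date/time",
    "pendulum": "date/time",
    "python-dateutil": "date/time",
    "requests": "http client",
    "httpx": "http client",
    "aiohttp": "http client",
}


def find_overlaps(dependencies: list) -> dict:
    """Return each category that has more than one installed package."""
    by_category = defaultdict(list)
    for name in dependencies:
        category = CATEGORIES.get(name.lower())
        if category:
            by_category[category].append(name)
    return {cat: sorted(names) for cat, names in by_category.items() if len(names) > 1}
```

Feeding it the names from a requirements file flags the "three date libraries and two HTTP clients" situation before it spreads further through the codebase.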

The review gate isn’t optional. It’s the difference between using AI as a force multiplier and using AI as a crutch.


The Industry Split

Software engineering is splitting into two tiers. The first uses AI as a force multiplier: generating boilerplate, exploring solution spaces, and accelerating the implementation of well-understood patterns while maintaining comprehension and quality standards. The second generates entire applications without understanding the output, trading short-term velocity for long-term fragility.7

The split mirrors early web development. Template builders like Squarespace democratized web publishing and produced millions of functional websites. Professional web development persists because production systems require quality, security, and maintainability that templates can’t provide.

I operate in both tiers deliberately. My hook system and quality gates exist specifically to catch the moment when tier-two work needs to graduate to tier-one standards. The 86 hooks aren’t bureaucracy. They’re the immune system that lets me vibe code freely while preventing vibe-coded work from reaching production without review.


Key Takeaways

For engineers who use AI daily:

  - Draw an explicit line between exploration and production; vibe code disposable work freely, but enforce review gates before anything touches users or persists
  - Build automated guardrails (hooks, linters, quality gates) instead of relying on discipline alone; discipline fails at 2 AM, hooks don’t

For engineering managers:

  - Establish clear boundaries between prototype-quality and production-quality code; vibe-coded prototypes that slip into production create a new category of technical debt
  - Evaluate productivity by outcomes (features shipped, bugs per feature, user satisfaction) rather than velocity metrics; vibe coding inflates line counts without proportionally improving outcomes


FAQ

What is vibe coding?

Vibe coding is a development style coined by Andrej Karpathy in February 2025, where the programmer “fully gives in to the vibes, embraces exponentials, and forgets that the code even exists.”1 In practice, it means describing what you want to an AI agent, accepting the output without reading every line, and iterating by describing what feels wrong rather than debugging the code directly.

When is vibe coding appropriate versus engineering?

Vibe code anything disposable: one-off scripts, personal CLI tools, proofs-of-concept, and creative exploration like testing interaction patterns. Engineer anything that touches users, persists as data or configuration, or handles security-adjacent concerns. The line is not about code complexity but about blast radius. A bug in a blog widget affects one page. A bug in a database schema or API contract compounds across years of accumulated data.3

What are the risks of vibe coding?

The most dangerous risk is code that works until it does not. Vibe-coded systems fail at edges that training data underrepresents: non-Latin scripts, high concurrency, unusual network conditions. AI-generated code also reflects the security posture of tutorials and Stack Overflow answers, which routinely omit authentication and input validation for brevity.4 The comprehension gap means you cannot debug failures in code you never read. My i18n cache key collision took four hours to debug because I had never read the cache key function.5

How do I prevent vibe-coded prototypes from reaching production?

Build automated guardrails instead of relying on discipline alone. My system uses 86 Claude Code hooks that enforce review gates automatically: a git safety guardian that flags force pushes and credential files, a blog quality gate that blocks AI-sounding prose, and a 12-module linter with 77 tests. Discipline fails at 2 AM. Hooks do not. The hooks collectively form the boundary that lets you vibe code freely while preventing vibe-coded work from shipping without review.6

Is vibe coding making software worse?

It depends on the infrastructure around it. Software engineering is splitting into two tiers: one that uses AI as a force multiplier while maintaining comprehension and quality standards, and one that generates entire applications without understanding the output.7 Organizations with review gates, automated testing, and quality enforcement absorb vibe-coded exploration without degradation. Organizations without them accumulate technical debt that compounds invisibly.


References


  1. Karpathy, Andrej, “Vibe Coding,” X/Twitter, February 2025. Original definition of the term. 

  2. Author’s workflow building interactive components and design prototypes with Claude Code, 2025-2026. 

  3. Author’s analysis of database migration costs across three production systems. Migration cost grew 15x over three years. 

  4. Pearce, Hammond et al., “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions,” IEEE S&P 2022.

  5. Author’s experience debugging i18n cache key collisions in the blakecrosley.com translation system, February 2026. 

  6. Anthropic, “Claude Code Documentation: Best Practices,” docs.anthropic.com, 2025. 

  7. Author’s analysis of the emerging developer tier system, observed across Hacker News, X/Twitter, and developer conferences, 2025-2026. 
