Vibe Coding vs. Engineering: Where I Draw the Line

Andrej Karpathy coined the term “vibe coding” in February 2025, describing a development style where you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.”¹

I read that and thought: that’s half of my workflow. The other half is the opposite.

TL;DR

I build with Claude Code every day. Some of that work is pure vibe coding: describe what I want, accept the output, move on. Some of it runs through 86 hooks, a git safety guardian, a recursion guard, and a quality gate system that blocks commits with AI tells or passive voice. The line between the two isn’t arbitrary. Prototypes get vibes. Production gets engineering. The hard part is knowing when a prototype crosses the line.

My Actual Workflow

The Vibe Side

When I’m exploring an idea, I vibe code without apology. My Ace Citizenship iOS app started as a weekend experiment: “Build a spaced repetition quiz for USCIS civics questions.” Claude Code generated the initial SwiftUI views, the question bank, and the scheduling algorithm. I didn’t read every line. I ran it, tested it manually, and iterated by describing what felt wrong.

The interactive components on this blog (the RAG decision tree, the compound interest calculator) started the same way. “Build a decision tree that walks users through RAG vs fine-tuning with animated transitions.” Accept, test, adjust. The blast radius of a bug in a blog widget is contained to one page.

The Engineering Side

My Claude Code hook architecture is the opposite of vibe coding. Every hook exists because something went wrong.

git-safety-guardian.sh exists because Claude force-pushed to main during an early session. The hook now intercepts every git command, pattern-matches against a severity table (CRITICAL: force push to main; HIGH: adding .env files; MEDIUM: --no-verify), and injects a warning before execution.
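The severity-table idea can be sketched in a few lines. This is a rough Python illustration, not the contents of git-safety-guardian.sh (which is a shell script); the regular expressions, the `SEVERITY_TABLE` name, and the `classify` function are all assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical severity table: (severity, pattern, reason).
# The three tiers come from the post; the regexes are illustrative guesses.
SEVERITY_TABLE = [
    (
        "CRITICAL",
        # "git push" followed somewhere by --force or -f, and by the word "main".
        # Deliberately conservative: --force-with-lease also matches.
        re.compile(r"\bgit\s+push\b(?=.*\s(?:--force|-f)\b)(?=.*\bmain\b)"),
        "force push to main",
    ),
    (
        "HIGH",
        # "git add" with a path whose final component starts with ".env".
        re.compile(r"\bgit\s+add\b.*[\s/]\.env\b"),
        "adding a .env file",
    ),
    (
        "MEDIUM",
        re.compile(r"\bgit\b.*\s--no-verify\b"),
        "skipping hooks with --no-verify",
    ),
]


def classify(command: str) -> Optional[Tuple[str, str]]:
    """Return (severity, reason) for the first matching rule, else None."""
    for severity, pattern, reason in SEVERITY_TABLE:
        if pattern.search(command):
            return severity, reason
    return None
```

Ordering the table from most to least severe means a command that trips several rules reports the worst one first. A real hook would then turn the result into a warning or a block; that wiring is left out here.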

recursion-guard.sh exists because a subagent spawned infinite children. The hook tracks agent lineage in a JSON file, enforces depth limits, and manages a spawn budget model that prevents runaway agent chains while allowing legitimate parallel work.
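A minimal sketch of how lineage tracking, a depth limit, and a spawn budget might fit together. It is written in Python for readability and is not the real recursion-guard.sh; the JSON layout, the default limits, and the `request_spawn` name are assumptions.

```python
import json
from pathlib import Path
from typing import Optional


def load_state(state_file: Path) -> dict:
    """Read the lineage file, or start fresh if it does not exist yet."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {"agents": {}, "spawned": 0}


def request_spawn(
    state_file: Path,
    parent_id: Optional[str],
    child_id: str,
    max_depth: int = 3,   # assumed limit, not the author's value
    budget: int = 8,      # assumed limit; counts every agent, root included
) -> bool:
    """Record the child and return True if the spawn is allowed."""
    state = load_state(state_file)
    agents = state["agents"]

    if parent_id is None:
        depth = 0                               # a root agent
    elif parent_id in agents:
        depth = agents[parent_id]["depth"] + 1  # one level below its parent
    else:
        return False                            # unknown parent: refuse

    # The depth limit stops long chains; the shared budget caps total fan-out.
    if depth > max_depth or state["spawned"] >= budget:
        return False

    agents[child_id] = {"parent": parent_id, "depth": depth}
    state["spawned"] += 1
    state_file.write_text(json.dumps(state, indent=2))
    return True
```

Under these assumptions, several shallow siblings can run in parallel as long as the budget holds, while a chain of children spawning children hits the depth limit quickly.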

blog-quality-gate.sh exists because AI-generated prose sounds like AI-generated prose. The hook blocks commits to content/blog/ if it detects em dashes, passive voice, or words like “delve,” “crucial,” or “landscape.”
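The checks described above could look roughly like this. It is a Python sketch under stated assumptions, not the real blog-quality-gate.sh: the banned words are the three named in the post, while the passive-voice regex is a crude heuristic invented for the example and will produce both false positives and misses.

```python
import re
from typing import List

EM_DASH = "\u2014"
BANNED_WORDS = ("delve", "crucial", "landscape")  # the three words named in the post

BANNED = re.compile(r"\b(?:" + "|".join(BANNED_WORDS) + r")\b", re.IGNORECASE)

# Assumed heuristic: a form of "to be" followed by a word ending in "ed".
# Irregular participles ("was written") slip through.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)


def find_tells(text: str) -> List[str]:
    """Return one message per problem found; an empty list means the text passes."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            problems.append(f"line {lineno}: em dash")
        for match in BANNED.finditer(line):
            problems.append(f"line {lineno}: banned word '{match.group(0).lower()}'")
        if PASSIVE.search(line):
            problems.append(f"line {lineno}: possible passive voice")
    return problems
```

A commit hook built on this would run `find_tells` over the staged files under content/blog/ and exit non-zero when the list is not empty.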

None of these hooks were vibe coded. Each one was written line by line, tested against real failure scenarios, and reviewed before deployment. The 86 hooks collectively represent the boundary between vibing and engineering.

Where the Line Actually Falls

Vibe: Disposable Prototypes

I vibe code anything I might throw away. A data transformation script I’ll run once. A CLI tool for personal use. A proof-of-concept to test whether an API does what the docs claim. The cost of a bug in disposable code is my own time, and the speed gain outweighs the debugging risk.

Vibe: Creative Exploration

When I’m exploring a design direction, vibe coding lets me test interaction patterns faster than Figma. “Build a search modal with keyboard navigation, result highlighting, and Cmd+K activation” produces a working prototype in minutes. I evaluate the feel, not the code.²

Engineer: Anything That Touches Users

The moment code serves someone other than me, it crosses the line. My blog runs through a 12-module linter that checks citations, heading hierarchy, readability grade, image alt text, internal link density, and content depth. The linter has 77 tests. The blog has 29 posts. The linter has more tests than the blog has posts.
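To make two of those checks concrete, here is a small sketch of a heading-hierarchy check and an alt-text check. It assumes the posts are Markdown with `#` headings and `![alt](src)` images, which the post does not state; the `lint` function and its messages are illustrative and not taken from the actual linter.

```python
import re
from typing import List

HEADING = re.compile(r"^(#{1,6})\s+\S")    # "# Title" through "###### Title"
IMAGE = re.compile(r"!\[(.*?)\]\([^)]*\)")  # ![alt text](path)


def lint(markdown: str) -> List[str]:
    """Flag headings that skip a level and images with empty alt text."""
    problems = []
    previous_level = 0
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            # Going deeper by more than one level (h1 -> h3) breaks the outline.
            if previous_level and level > previous_level + 1:
                problems.append(
                    f"line {lineno}: heading jumps from h{previous_level} to h{level}"
                )
            previous_level = level
        for image in IMAGE.finditer(line):
            if not image.group(1).strip():
                problems.append(f"line {lineno}: image missing alt text")
    return problems
```

The sketch does not skip fenced code blocks, so a `#` comment inside a code sample would be misread as a heading; a production linter would need to handle that case.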

Engineer: Anything That Persists

Database schemas, API contracts, hook configurations, and deployment manifests get full engineering treatment. These decisions compound. A schema designed in a vibe session becomes a migration nightmare when three years of data accumulates on top of it.³

Engineer: Anything Security-Adjacent

AI-generated code reflects the security posture of its training data, which includes tutorials and Stack Overflow answers that routinely omit authentication, input validation, and error handling for brevity.⁴ My hooks catch some of this (the git safety guardian flags .env additions, credential files, and force pushes), but security-critical code gets manual review regardless.

The Comprehension Gap Problem

The most dangerous pattern in vibe coding isn’t bad code. It’s code that works until it doesn’t.

I generated a caching layer for my i18n translation system. It worked perfectly for English content. When I added Korean and Traditional Chinese, the cache key generation silently produced collisions for certain Unicode code points. Debugging took four hours because I’d never read the cache key function. The code was correct for ASCII, which is all the training data emphasized.⁵
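One way this class of bug can arise, shown as a hypothetical: a key function that drops non-ASCII characters before hashing. This is not the actual cache key function from the translation system (the post does not show it); both functions below are invented to illustrate how a key can be correct for English and collide for Korean.

```python
import hashlib


def ascii_only_key(locale: str, text: str) -> str:
    """Hypothetical buggy key: non-ASCII characters are silently discarded."""
    normalized = text.encode("ascii", errors="ignore").decode("ascii")
    digest = hashlib.sha256(normalized.encode("ascii")).hexdigest()[:16]
    return f"{locale}:{digest}"


def utf8_key(locale: str, text: str) -> str:
    """Safer variant: hash the full UTF-8 encoding of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{locale}:{digest}"


# English strings get distinct keys either way.
assert ascii_only_key("en", "Hello") != ascii_only_key("en", "Thanks")

# Two different Korean strings both reduce to "" and share one key.
assert ascii_only_key("ko", "안녕하세요") == ascii_only_key("ko", "감사합니다")

# Hashing the UTF-8 bytes keeps them apart.
assert utf8_key("ko", "안녕하세요") != utf8_key("ko", "감사합니다")
```

A test suite that only exercises English strings passes against the buggy version, which is why this kind of failure stays hidden until non-Latin content arrives.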

The lesson: vibe-coded systems fail at the edges that training data underrepresents. If your users operate at those edges (non-Latin scripts, high concurrency, unusual network conditions), vibe-coded implementations carry hidden risk.

The Review Gate

Every piece of production-bound code in my system passes through a review gate, whether I wrote it or Claude Code did:

Read every line. Generated code is a pull request from an untrusted contributor. Review accordingly.
Verify error handling. Check that error paths reflect domain requirements, not generic try-catch patterns.
Audit dependencies. AI resolves each prompt in isolation, importing whatever library solves the immediate request. After 50 prompts, you might have three date libraries and two HTTP clients.
Add tests. Generated code rarely covers edge cases specific to your domain.
Check security. Run static analysis. Verify authentication, authorization, and input validation.⁶
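The dependency audit step is easy to automate in part. The sketch below scans a requirements-style file for packages that serve the same purpose. The grouping table is an assumption chosen for illustration (the package names are real PyPI projects, but which ones count as overlapping is a judgment call), and the `find_overlaps` helper is hypothetical.

```python
import re
from typing import Dict, List, Optional

# Assumed groups of packages that usually solve the same problem.
OVERLAP_GROUPS = {
    "date handling": {"arrow", "pendulum", "python-dateutil"},
    "http client": {"requests", "httpx", "aiohttp"},
}


def package_name(line: str) -> Optional[str]:
    """Extract the package name from one requirements line, if there is one."""
    line = line.split("#", 1)[0].strip()  # drop comments
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower() if match else None


def find_overlaps(requirements: str) -> Dict[str, List[str]]:
    """Map each purpose to the packages present, when more than one is present."""
    names = {name for name in map(package_name, requirements.splitlines()) if name}
    overlaps = {}
    for purpose, group in OVERLAP_GROUPS.items():
        present = sorted(names & group)
        if len(present) > 1:
            overlaps[purpose] = present
    return overlaps
```

For example, a file listing `requests==2.31.0`, `httpx>=0.27`, and `arrow` reports the two HTTP clients and stays quiet about the single date library. The check only flags candidates; deciding which library to keep is still a manual call.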

The review gate isn’t optional. It’s the difference between using AI as a force multiplier and using AI as a crutch.

The Industry Split

Software engineering is splitting into two tiers. The first uses AI as a force multiplier: generating boilerplate, exploring solution spaces, and accelerating the implementation of well-understood patterns while maintaining comprehension and quality standards. The second generates entire applications without understanding the output, trading short-term velocity for long-term fragility.⁷

The split mirrors early web development. Template builders like Squarespace democratized web publishing and produced millions of functional websites. Professional web development persists because production systems require quality, security, and maintainability that templates can’t provide.

I operate in both tiers deliberately. My hook system and quality gates exist specifically to catch the moment when tier-two work needs to graduate to tier-one standards. The 86 hooks aren’t bureaucracy. They’re the immune system that lets me vibe code freely while preventing vibe-coded work from reaching production without review.

Key Takeaways

For engineers who use AI daily:
- Draw an explicit line between exploration and production; vibe code disposable work freely, but enforce review gates before anything touches users or persists
- Build automated guardrails (hooks, linters, quality gates) instead of relying on discipline alone; discipline fails at 2 AM, hooks don’t

For engineering managers:
- Establish clear boundaries between prototype-quality and production-quality code; vibe-coded prototypes that slip into production create a new category of technical debt
- Evaluate productivity by outcomes (features shipped, bugs per feature, user satisfaction) rather than velocity metrics; vibe coding inflates line counts without proportionally improving outcomes

References

1. Karpathy, Andrej, “Vibe Coding,” X/Twitter, February 2025. Original definition of the term.
2. Author’s workflow building interactive components and design prototypes with Claude Code, 2025-2026.
3. Author’s analysis of database migration costs across three production systems. Migration cost grew 15x over three years.
4. Pearce, Hammond et al., “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions,” IEEE S&P 2022.
5. Author’s experience debugging i18n cache key collisions in the blakecrosley.com translation system, February 2026.
6. Anthropic, “Claude Code Documentation: Best Practices,” docs.anthropic.com, 2025.
7. Author’s analysis of the emerging developer tier system, observed across Hacker News, X/Twitter, and developer conferences, 2025-2026.