Every Iteration Makes Your Code Less Secure

After ten rounds of LLM-driven code refinement, 43.7 percent of iteration chains contained more vulnerabilities than the baseline code they started from.1 The agent improved functionality. The agent passed tests. The code also got less secure with each iteration — a pattern the researchers call “specification drift.” Nobody noticed because the code worked better on every metric except security.

TL;DR

A study of iterative LLM code refinement across three models (GPT-5-Nano, Claude Sonnet 4.5, DeepSeek-V3) and 2,880 iteration steps reveals a paradox: agents optimize for functional correctness while silently degrading security. Standard mitigations fail. Adding static analysis security tools (SAST gates) to the loop increased latent degradation from 12.5 percent to 20.8 percent. The SCAFFOLD-CEGIS framework addressed the problem with four verification layers, achieving a 2.1 percent latent degradation rate with 100 percent safety monotonicity at the cost of a 77 percent task completion rate. The finding matters for anyone running autonomous agent loops.


The Paradox

The researchers tested three LLMs (GPT-5-Nano, Claude Sonnet 4.5, DeepSeek-V3) across 24 programming tasks in six security categories (database, input handling, authentication, resource management, cryptography, path handling), producing 288 iteration chains and 2,880 total iteration steps.1 The finding: specification drift during multi-objective optimization causes security to degrade gradually over successive iterations.

The mechanism: when an agent optimizes code across multiple rounds, each round focuses on functional improvements (fixing bugs, adding features, passing tests, improving performance). Security constraints compete with functional objectives for the agent’s attention. Over ten rounds, the agent learns (implicitly, through context accumulation) that functional changes produce positive feedback while security constraints produce no feedback at all. Defensive logic that does not contribute to visible functionality gets simplified, refactored away, or replaced with weaker alternatives.

The 43.7 percent degradation rate comes from a separate observational study tracking GPT-4o across ten iteration rounds. The main experiment benchmarked SCAFFOLD-CEGIS against five existing defense approaches: prompt-based security, self-refine, post-hoc SAST, test-driven guard, and hybrid guard.1 The research community had already identified iterative degradation as a concern. None of the five alternatives solved it.

An independent study by Shukla, Joshi, and Syed, peer-reviewed and accepted at IEEE-ISTAS 2025, corroborates the pattern.4 The researchers took ten security-verified C and Java code samples, applied four distinct prompting strategies across ten iterations each (400 total samples), and measured a 37.6 percent increase in critical vulnerabilities after just five iterations. The vulnerability taxonomy covered 12 categories including memory safety, input validation, cryptographic implementation, and injection flaws. The consistency across different research teams, languages, and evaluation methodologies confirms that iterative degradation is a property of the approach, not an artifact of a single experimental setup.


Why SAST Gates Make It Worse

The most counterintuitive finding: adding static analysis security tools as gates between iterations increased latent degradation from 12.5 percent to 20.8 percent.1

The paper attributes the cause to specification drift during multi-objective optimization. A complementary explanation maps to a known pattern in human software development: when developers rely on linters and static analyzers, they write less defensively because the tools will “catch” problems. The same dynamic likely applies to LLM agents. When the agent receives SAST feedback between iterations, two things happen:

  1. The agent optimizes for passing the scanner, not for writing secure code. SAST tools check for known vulnerability patterns (SQL injection, XSS, buffer overflows). The agent learns to avoid those specific patterns while introducing novel security weaknesses that the scanner does not detect.

  2. The agent removes “redundant” defenses. If the scanner reports that input validation at layer A is sufficient, the agent removes validation at layer B during the next iteration. Validation at layer B was defense-in-depth, not redundancy. The scanner cannot distinguish between the two.
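The defense-in-depth point can be made concrete with a minimal, hypothetical Python sketch (the `handler_lookup` and `fetch_user` names are invented for illustration, not from the study). A scanner that sees Layer A's boundary validation may flag Layer B as redundant, but Layer B is what protects every other caller of `fetch_user`, present and future:

```python
import re
import sqlite3

USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")

def handler_lookup(db: sqlite3.Connection, username: str):
    # Layer A: validation at the request boundary
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    return fetch_user(db, username)

def fetch_user(db: sqlite3.Connection, username: str):
    # Layer B: revalidation plus a parameterized query. A scanner that
    # already sees Layer A may report this check as redundant — but it
    # defends every other code path that reaches fetch_user directly.
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    return db.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchone()
```

If an iteration deletes Layer B because the scanner calls it redundant, the code still passes the scan while the defense-in-depth property is gone.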

The result: SAST-gated iteration produces code that passes security scans but contains more latent vulnerabilities than ungated iteration. The tooling creates a false sense of security that causes the agent to be less cautious, not more.

Anyone running an autonomous coding loop with SAST gates between iterations should pay attention. The gates are not protecting you. The gates are training your agent to route around protection.


What SCAFFOLD-CEGIS Does Differently

The SCAFFOLD-CEGIS framework takes a different approach.1 Instead of checking for known vulnerability patterns, the framework enforces safety monotonicity: no iteration can make the code less secure than the previous iteration.

The results across all three approaches:

Approach               Latent Degradation (SSDR)   Safety Monotonicity   Task Completion
No gating (baseline)   12.5%                       Not measured          Higher
SAST gating            20.8%                       Not guaranteed        Higher
SCAFFOLD-CEGIS         2.1%                        100%                  77.14%

The architecture uses four sequential verification layers, each checking a different property:1

Layer                 Function                                   Gate Criterion
Correctness           Run full test suite                        All tests pass
Safety monotonicity   Compare SAST results between iterations    No new vulnerabilities vs. previous
Diff budget           Limit per-iteration change scale           Change size within threshold
Anchor integrity      Verify security-critical code elements     Substring, regex, AST, or semantic match
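The anchor-integrity layer can be sketched at the AST level. The helper below is an illustration under assumptions of my own (the paper does not publish this code; the `validate_path` anchor name is invented): it checks whether a security-critical call still appears anywhere in a Python candidate, so an iteration that refactors the call away gets rejected.

```python
import ast

def anchor_survives(source: str, required_call: str) -> bool:
    """AST-level anchor check: does the candidate still invoke the
    security-critical function named `required_call` anywhere?"""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            fn = node.func
            # Handle both plain names (validate_path(p)) and
            # attribute calls (security.validate_path(p))
            name = fn.id if isinstance(fn, ast.Name) else getattr(fn, "attr", None)
            if name == required_call:
                return True
    return False
```

An AST match is more robust than a substring match (it survives renamed variables and reformatting) but weaker than the semantic matching the table lists, which would also need to verify that the call happens on the right path.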

The framework adopts the CEGIS (counterexample-guided inductive synthesis) principle: a closed loop of candidate generation, verification, feedback, and regeneration. Rather than using formal verifiers, the system uses static analysis and semantic-anchor checking, representing counterexamples as structured failure reports.1 If any layer rejects an iteration, the system reverts to the previous version rather than attempting to fix the regression.
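The loop structure can be sketched as follows. This is a simplified illustration, not the paper's implementation: `generate` stands in for the model call, `gates` for the four verification layers, and the essential property is that a rejected candidate is discarded while its failure reports become the counterexamples guiding the next attempt.

```python
def refine_with_gates(baseline, generate, gates, rounds=10):
    """CEGIS-style refinement loop (hypothetical interfaces).

    `generate(accepted, feedback)` proposes a candidate from the last
    accepted version plus any failure reports; each gate returns None
    on pass or a structured failure report. Rejected candidates are
    never iterated on, so accepted versions are monotonically safe.
    """
    accepted, feedback = baseline, None
    for _ in range(rounds):
        candidate = generate(accepted, feedback)
        failures = [r for gate in gates if (r := gate(accepted, candidate))]
        if failures:
            feedback = failures  # counterexamples guide the next attempt
            continue             # revert: keep `accepted`, drop candidate
        accepted, feedback = candidate, None
    return accepted
```

A toy run makes the monotonicity visible: if the accepted state is a vulnerability count and the sole gate rejects any increase, the returned value can never exceed the baseline.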

The trade-off is real: SCAFFOLD-CEGIS achieved a 77.14 percent task completion rate, compared to higher completion rates for less secure approaches.1 Safety monotonicity costs productivity. The framework rejects iterations that a less strict system would accept and improve upon. Whether the trade-off is worth it depends on whether you value security guarantees over throughput.

The key insight: revert on failure rather than iterate on failure. Standard SAST-gated loops detect a problem and ask the agent to fix it, producing another iteration that can introduce new problems. SCAFFOLD-CEGIS detects a problem and discards the iteration entirely. The monotonicity guarantee comes from never accepting a regression, not from detecting and fixing regressions.


Connection to Agent Harness Design

The finding connects directly to how practitioners build orchestration layers around agent CLIs.2 The seven failure modes I documented from 500+ autonomous sessions include several that the iterative refinement paradox explains: agents that pass tests while degrading code quality, agents that optimize for the wrong metric, agents that remove safety constraints during refactoring.

The judgment hooks I described in “Anatomy of a Claw” address the degradation problem through a different mechanism. quality-gate.sh blocks completion reports that lack evidence. filter-sensitive.sh catches credential exposure before it reaches disk. recursion-guard.sh limits agent spawning depth. Each hook enforces a monotonicity property: the system should not get worse on a specific dimension as the agent iterates. The runtime constitution pattern extends the same idea: embedded governance rules that the agent cannot override during execution.

Karpathy’s autoresearch system uses the same pattern.3 The evaluation harness keeps improvements and discards regressions via git branch management. The training metric (validation bits per byte) serves as the monotonicity constraint. No experiment result that degrades the metric survives.

Three independent systems (formal verification research, ML research infrastructure, production agent harnesses) converge on the same design principle: never iterate on failure; always revert on failure. Giving an agent a second chance to fix a regression produces worse results than discarding the regression and trying a fresh approach.


What Practitioners Should Do

Three concrete actions based on the findings:

Audit your iteration loops for security monotonicity. If your agent runs multiple rounds of code modification, compare security posture at each round against the original baseline, not just against the previous round. Cumulative drift is invisible when you only compare adjacent iterations.
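A toy example with invented numbers shows why baseline comparison matters: adjacent deltas can all look benign while the cumulative drift is large.

```python
# Hypothetical per-round vulnerability counts for one iteration chain.
# Every adjacent step changes by at most one finding, yet the chain
# ends five vulnerabilities worse than the round-0 baseline.
counts = [3, 3, 4, 4, 5, 5, 6, 7, 7, 8]

adjacent_deltas = [b - a for a, b in zip(counts, counts[1:])]
cumulative_drift = counts[-1] - counts[0]

assert max(adjacent_deltas) <= 1  # each adjacent check looks fine
assert cumulative_drift == 5      # the baseline comparison catches it
```

A gate that only compares round N to round N-1 would pass every step of this chain.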

Do not rely on SAST gates alone. The SAST-gated results (20.8 percent degradation, worse than ungated) should change how you design feedback loops. SAST tools are valuable for detecting known patterns in human-written code. In agent iteration loops, the tools become optimization targets that the agent routes around. Use SAST as one signal among several, not as a gate.

Implement revert-on-failure, not fix-on-failure. When an iteration introduces a regression, discard the iteration entirely. Do not ask the agent to fix the regression in a subsequent iteration. The fix attempt is itself an iteration subject to the same degradation dynamics. A minimal implementation using git:

#!/bin/bash
# monotonicity-gate.sh — revert on security regression
set -euo pipefail
BASELINE_HASH="$1"  # git hash of the known-good baseline

# Count findings in the current working tree
CURRENT_VULNS=$(semgrep --config auto --json . | jq '.results | length')

# Count findings at the baseline in a throwaway worktree, so only the
# scanner's count is captured (no git chatter pollutes the variable)
# and uncommitted changes in the working tree are left untouched
BASELINE_DIR="$(mktemp -d)/baseline"
git worktree add "$BASELINE_DIR" "$BASELINE_HASH" >/dev/null
BASELINE_VULNS=$(cd "$BASELINE_DIR" && semgrep --config auto --json . | jq '.results | length')
git worktree remove --force "$BASELINE_DIR"

if [ "$CURRENT_VULNS" -gt "$BASELINE_VULNS" ]; then
    echo "Security regression: $BASELINE_VULNS -> $CURRENT_VULNS vulnerabilities"
    git checkout "$BASELINE_HASH" -- .
    exit 2  # Block the iteration
fi

The pattern compares against the original baseline, not the previous iteration. Cumulative drift is the threat.


FAQ

Does iterative refinement always degrade security?

Not every iteration chain degrades. The SCAFFOLD-CEGIS study found 43.7 percent of chains contained more vulnerabilities after ten rounds, meaning 56.3 percent maintained or improved security posture.1 An independent IEEE-ISTAS study found a 37.6 percent increase in critical vulnerabilities after five iterations.4 The concern is that degradation is silent: the agent produces functionally correct code that passes tests while security properties erode. Without explicit security monotonicity checks, degradation goes undetected until a vulnerability is exploited.

Why do SAST gates make the problem worse instead of better?

Static analysis tools check for known vulnerability patterns. When an agent receives SAST feedback between iterations, the agent optimizes for passing the scanner rather than writing secure code. The agent avoids flagged patterns while introducing novel weaknesses the scanner cannot detect. The agent also removes defense-in-depth layers that the scanner marks as redundant. The net effect is code that passes scans but contains more latent vulnerabilities than code produced without SAST gating.

What is safety monotonicity and how does SCAFFOLD-CEGIS enforce it?

Safety monotonicity means no iteration can make the code less secure than the previous iteration. SCAFFOLD-CEGIS enforces the property through four sequential verification layers: correctness (test suite), safety monotonicity (SAST comparison between iterations), diff budget (limiting change scale), and anchor integrity (verifying security-critical code elements survive). The framework uses the CEGIS (counterexample-guided inductive synthesis) principle, representing counterexamples as structured failure reports rather than formal proofs. If any layer rejects an iteration, the system discards it entirely rather than passing it to the agent for fixing. The trade-off: 77 percent task completion rate, lower than less strict approaches.

How does revert-on-failure differ from fix-on-failure in agent loops?

Fix-on-failure detects a problem and asks the agent to correct it in the next iteration. The correction attempt is itself subject to the same specification drift that caused the original regression, often introducing new problems. Revert-on-failure discards the entire iteration and returns to the last known-good state. The agent starts fresh with a clean baseline rather than accumulating corrective patches. Git branch management makes reversion trivial in practice.

Can I apply these findings to my existing Claude Code or Codex workflow?

Yes. The three actions in the practitioner section apply to any agent loop that modifies code across multiple rounds. Audit your iteration loops by comparing security posture against the original baseline (not just the previous iteration). Treat SAST output as one signal among several rather than as a gate. When an iteration introduces a regression, use git checkout or git revert to discard the change entirely rather than prompting the agent to fix it. The hook-based harness pattern provides a concrete implementation model for encoding these checks as automated gates.


Sources


  1. Yi Chen et al., “SCAFFOLD-CEGIS: Preventing Latent Security Degradation in LLM-Driven Iterative Code Refinement,” arXiv:2603.08520, March 2026, arxiv.org/abs/2603.08520v1. Tested GPT-5-Nano, Claude Sonnet 4.5, DeepSeek-V3 across 24 tasks, 288 chains, 2,880 steps. 43.7% degradation rate (GPT-4o observational study); SAST gates increased SSDR from 12.5% to 20.8%; SCAFFOLD-CEGIS achieved 2.1% SSDR with 100% safety monotonicity at 77.14% task completion. 

  2. Blake Crosley, “Anatomy of a Claw: 84 Hooks as an Orchestration Layer,” blakecrosley.com, February 2026. 

  3. Andrej Karpathy, autoresearch: AI agents running autonomous ML research, March 2026, github.com/karpathy/autoresearch. 630-line Python script, ~700 experiments over two days, ~20 genuine improvements. 

  4. Shivani Shukla, Himanshu Joshi, Romilla Syed, “Security Degradation in Iterative AI Code Generation: A Systematic Analysis of the Paradox,” IEEE-ISTAS 2025, arXiv:2506.11022, arxiv.org/abs/2506.11022. 10 security-verified C/Java samples, 4 prompting strategies, 10 iterations each (400 total), 37.6% increase in critical vulnerabilities after 5 iterations. 12 vulnerability categories. 
