The Vibe Coding Hangover:
What the Research Actually Says

AI coding tools were supposed to make developers dramatically faster and code dramatically better. Controlled studies say the opposite is happening. Here's what the research actually shows — and why it matters when someone is building software for your business.

First: what is vibe coding?

The term was coined in February 2025 by Andrej Karpathy, a co-founder of OpenAI. He described it as a style of programming where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists." In practice: you describe what you want in plain English, an AI tool generates the code, you check whether it seems to run, and you move on. No deep review. No real understanding of what was produced. Just vibes.

It sounds liberating — and it has gone thoroughly mainstream. Collins English Dictionary named "vibe coding" its Word of the Year for 2025, and Y Combinator reported that 25% of startups in its Winter 2025 batch had codebases that were 95% AI-generated.

The productivity pitch is compelling: build in hours what used to take days, bring in developers who don't know a given language, move fast and ship things. And for throwaway weekend projects or quick prototypes, this can work fine.

But for professional software — the kind that runs a business, holds customer data, processes payments, or needs to be maintained and extended over years — the research is telling a very different story.

The productivity myth: AI is making experienced developers slower

The most striking piece of research came from METR (Model Evaluation and Threat Research) in July 2025. They ran a proper randomised controlled trial — the gold standard for this kind of research — with 16 experienced open-source developers completing 246 real tasks in mature codebases they knew well.

Before the study, developers predicted that using AI tools would make them 24% faster. After completing it, they estimated they'd been 20% faster.

The measured reality: AI tools made them 19% slower. [1]

The developers were using frontier tools — Cursor Pro with Claude 3.5/3.7 Sonnet. These weren't hobbyists using a free tier. And they were working in codebases they had an average of five years of prior experience in. The overhead of prompting, waiting, reviewing AI output, and debugging AI-introduced problems ate up more time than the AI saved on the actual coding.

Separately, Uplevel surveyed 800 developers and found no significant productivity gains on objective measures such as cycle time or pull request throughput. Their data also showed that developers using GitHub Copilot introduced 41% more bugs than those who didn't. [2]

Faros AI analysed telemetry from over 10,000 developers across 1,255 teams and found that while developers using AI were writing more code and completing more individual tasks, companies were seeing no measurable improvement in delivery velocity or business outcomes. [3] Developers feel faster. The software isn't shipping faster.

Stack Overflow's 2025 Developer Survey captured the gap between perception and reality neatly: 84% of developers are now using or planning to use AI tools — yet only 16.3% reported AI made them significantly more productive, while 41.4% said it had little to no effect. Positive sentiment toward AI tools fell from over 70% in 2023–24 to just 60% in 2025, as developers gained enough real-world experience to see past the initial enthusiasm. [4]

The code quality problem is worse than the productivity problem

Slower development you can manage. Insecure code shipped to production is a different matter.

Veracode tested over 100 large language models on security-sensitive coding tasks and found that 45% of AI-generated code samples introduced vulnerabilities from the OWASP Top 10 — the well-established list of the most critical web application security risks, including SQL injection, cross-site scripting, and broken authentication. This rate did not improve across multiple testing cycles from 2025 through early 2026, despite vendor claims to the contrary. [5]

A December 2025 analysis by CodeRabbit of 470 open-source GitHub pull requests found that AI co-authored code contained approximately 1.7 times more "major" issues than human-written code — including 2.74 times more security vulnerabilities and 75% more misconfigurations. [6]

Apiiro deployed its analysis engine across tens of thousands of repositories at Fortune 50 enterprises between December 2024 and June 2025. AI-assisted developers were committing code at three to four times the rate of their non-AI peers. Monthly security findings rose from roughly 1,000 to more than 10,000 over six months — a tenfold increase. The specific flaws driving that surge were architectural: privilege escalation paths rose by 322%, and architectural design flaws rose by 153%. [5]

To put that in plain language: AI tools are helping developers write syntactically correct code faster. The bugs that are increasing aren't the obvious ones — they're the subtle architectural ones that are hardest to find, most expensive to fix, and most likely to be exploited.

Real vulnerabilities in the wild — not hypotheticals

Georgia Tech's Systems Software and Security Lab launched the Vibe Security Radar in May 2025 specifically to track real CVEs (Common Vulnerabilities and Exposures) directly attributable to AI coding tools. By March 2026 they had confirmed 74 cases — and estimated the true count was five to ten times higher, since many AI tool signatures get stripped from commit history. [7]

Some concrete examples from 2025 alone:

  • Lovable (May 2025): 170 out of 1,645 web applications built with the vibe coding platform had a vulnerability allowing anyone to access other users' personal information.
  • Moltbook: A social network built entirely through vibe coding had a misconfigured database exposing 1.5 million API keys and 35,000 user email addresses directly to the public internet. Not a sophisticated hack — just code that was never properly reviewed.
  • CurXecute (CVE-2025-54135): A vulnerability in Cursor, one of the most popular AI code editors, allowed attackers to execute arbitrary commands on a developer's machine through a connected MCP server.
  • Replit agent: An autonomous AI coding agent deleted the primary production database of a project it was developing, deciding the database "required a cleanup" — in direct violation of an explicit instruction prohibiting modifications. [8]

These aren't edge cases from naive users. They're the kinds of things that happen when code is generated fast, accepted because it runs, and deployed without deep review.

The "code churn" and technical debt problem

GitClear published a longitudinal analysis of 211 million lines of code changes from 2020–2024. They found that code churn — the percentage of code reverted or rewritten within two weeks of being written — was on track to double in 2024. Refactoring dropped from 25% of changed lines in 2021 to under 10% by 2024. Code duplication increased approximately fourfold. [9]

GitClear described the composition of AI-generated code as similar to "a short-term developer that doesn't thoughtfully integrate their work into the broader project." It gets the immediate task done. It doesn't think about the architecture it's sitting inside, the functions that already exist, or the developer who will have to maintain this in two years.

By September 2025, Fast Company was reporting that senior software engineers were describing a "vibe coding hangover" — arriving into codebases that were rapidly built with AI assistance and finding them in "development hell": duplicated logic scattered across files, no consistent patterns, security checks missing in non-obvious places, and no human who actually understood how the pieces fit together.

Why this happens — and why it's not going to fix itself quickly

The core problem is structural, not a bug that will be patched in the next model release.

LLMs are trained on publicly available code from the internet. That includes enormous quantities of insecure, outdated, and example-only code that was never meant for production. When a model generates a database query, it may produce a string concatenation pattern vulnerable to SQL injection rather than a parameterised query — not because it's broken, but because that pattern appeared frequently in its training data.
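The difference between the two patterns is easy to see in a few lines. This is a minimal sketch using Python's built-in sqlite3 module; the table, column, and payload are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

user_input = "alice' OR '1'='1"  # a classic injection payload

# Vulnerable: the string-concatenation pattern LLMs often reproduce.
# The payload escapes the quotes and the WHERE clause matches every row.
vulnerable = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'"
).fetchall()

# Safe: a parameterised query treats the input as data, never as SQL.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(vulnerable))  # 1 — the injection matched a row it shouldn't
print(len(safe))        # 0 — no user is literally named "alice' OR '1'='1"
```

Both queries run without errors, which is exactly the problem: to someone who only checks whether the code "seems to run", the vulnerable version looks identical to the safe one.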

More fundamentally: LLMs don't understand code. They predict tokens based on patterns. They don't know why a security check exists. They don't know that removing it creates a vulnerability. To a language model, a guard clause is just an obstacle between the current state and the state that makes the test pass. An AI agent fixing a bug in one file may create a security leak in three other files that reference it — because it didn't trace the dependency.
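Here is a hypothetical sketch of that failure mode — every name is invented for illustration. The first function encodes an authorisation rule; the second shows what the same function looks like after a pattern-matching "fix" removes the check to make a failing call succeed:

```python
# Hypothetical example: function and role names are invented.

def delete_record(record_id: int, user_role: str) -> str:
    # This guard encodes a business rule that no test may spell out:
    # only admins can delete. To a model chasing a failing test for a
    # non-admin caller, the guard is just the line making the test fail.
    if user_role != "admin":
        raise PermissionError("only admins may delete records")
    return f"record {record_id} deleted"

def delete_record_after_naive_fix(record_id: int, user_role: str) -> str:
    # The same function after the guard is removed: the failing call
    # now succeeds — and so does every unauthorised one.
    return f"record {record_id} deleted"
```

The "fixed" version passes more tests than the original. Without a human who knows why the guard was there, nothing in the tooling flags the privilege escalation it just created.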

Approximately 20% of AI-generated code also references packages that don't exist — a hallucination pattern attackers are now actively exploiting. They register the hallucinated names on public package registries as malicious packages, then wait for developers to install them. The attack is called "slopsquatting." [5]

So is AI useless for coding?

No — and I'd be misleading you if I said otherwise. There are genuine use cases where AI coding tools add real value:

  • Writing boilerplate and scaffolding that a developer then reviews and adapts
  • Generating first drafts of tests that are then manually verified
  • Autocompleting well-understood, low-risk patterns in familiar code
  • Onboarding — helping new developers get up to speed with an unfamiliar codebase
  • Prototyping and throwaway experiments where no real data is involved

The problem is the gap between those legitimate uses and what "vibe coding" actually means in practice: generating code you don't read, accepting it because it seems to run, and deploying it because the deadline is tomorrow. That's where the bugs live, and where the security vulnerabilities accumulate quietly until something goes wrong.

What this means if you're hiring someone to build your software

The web development market is now full of developers and agencies offering to build things remarkably quickly and cheaply — because they're letting AI write the majority of the code. Sometimes the result is fine. Sometimes you get a site that works until it doesn't, at which point no human being who worked on it can tell you why it broke or how to fix it properly.

When I build something, I write the code. I understand every line of it. I can explain any decision, trace any bug, and maintain it years later because the architecture is deliberate and documented. I use AI tools where they genuinely help — generating boilerplate, checking my thinking, drafting documentation — but the code that goes into your project has been read and understood by a human with 30 years of experience, not accepted because it seemed to run.

That's not a romantic attachment to doing things the hard way. It's the difference between software you can trust and software that's one edge case away from a data breach, a crashed server, or a maintenance nightmare that costs more to untangle than it did to build.

The questions worth asking anyone building your software:

  • Do you understand every part of the code you're delivering?
  • What's your process for reviewing AI-generated code before it ships?
  • Has the code been reviewed for the OWASP Top 10 vulnerabilities?
  • If something breaks in two years, will you be able to debug it?

The answers will tell you a lot.

References

  1. METR (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. Randomised controlled trial, 16 developers, 246 tasks. metr.org / arXiv:2507.09089
  2. Uplevel (2025). Cited in Docker Engineering Blog: AI Productivity Divide: Are Some Devs 5x Faster? docker.com
  3. Faros AI (2025). The AI Productivity Paradox Research Report. Analysis of 10,000+ developers across 1,255 teams. faros.ai
  4. Stack Overflow (2025). 2025 Developer Survey — AI section. 84% adoption, sentiment decline to 60%. survey.stackoverflow.co
  5. Cloud Security Alliance / Apiiro / Veracode (2026). AI-Generated Code Vulnerability Surge. Includes Apiiro Fortune 50 enterprise data, Veracode LLM security testing. cloudsecurityalliance.org
  6. CodeRabbit (December 2025). Analysis of 470 open-source GitHub pull requests. Cited in: Wikipedia, Vibe coding. wikipedia.org
  7. Georgia Tech SSLab (2025–2026). Vibe Security Radar. Tracking CVEs attributable to AI coding tools. Reported in Infosecurity Magazine. infosecurity-magazine.com
  8. Kaspersky (2025). Security risks of vibe coding and LLM assistants for developers. Includes CurXecute CVE-2025-54135, Replit database deletion incident. kaspersky.com
  9. GitClear (2025). Cited in DevOps.com: AI in Software Development: Productivity at the Cost of Code Quality? Longitudinal analysis of 211 million lines of code changes, 2020–2024. devops.com

Want software that a human actually understands?

I use AI tools where they help. I review everything that ships. 30 years of experience means I know the difference.

Book a Free 15-Minute Consultation

Or email rolf@lampdatabase.com — average reply time: under 30 minutes.