In the last few months you've no doubt seen headlines about unsafe, vibe-coded products that leaked personal data. Or stories about apps that were easily hacked because they lacked the most basic safety protocols. Does this mean that most AI-generated software is bad? Maybe, maybe not. But for perspective, let's remember:
The world has always had terrible software.
That's certainly not an excuse to let terrible software sit. Buggy code represents a clear risk exposure for anyone who uses it, and by extension, for those providing it. The fact that we've seen this before, interestingly, works in our favor:
Getting to the root of terrible human-created software can shed light on how to handle the AI-borne variety.
Not only does that shift vibe coding's risk/reward tradeoff in a more favorable direction; in some cases it means we can have our cake and eat it too. Professional software dev shops and vibe-coding hopefuls, take note.
I've worked in the tech space since the Dot-Com era. My early days as a contract software developer gave me a view of terrible code from the inside: I earned most of my money by cleaning up messes that others had left behind. I've also encountered plenty of buggy, poorly-written software as an end-user. (To be honest, a run-in with such a website is what prompted me to write this.) It's an ugly situation no matter where you sit.
It's important to distinguish between terrible software and good software that's having a bad day. Good software has a few bugs here and there. Once in a while, as with the 2012 Knight Capital disaster, those bugs collide in an unfortunate manner. It's unexpected, but it happens.
Terrible software is a different animal. It's riddled with preventable problems, most of which stem from lack of care and lack of experience. Most of the time it's still usable, but frustrating as hell to work with. Doubly so because the problems feel like they stem from a lack of attention and professionalism.
We've seen larger problems, too. Crypto fans may remember the infamous 2014 Mt. Gox hack, in which thieves ran off with $460 million:
Mt. Gox, he says, didn't use any type of version control software -- a standard tool in any professional software development environment. This meant that any coder could accidentally overwrite a colleague's code if they happened to be working on the same file. According to this developer, the world's largest bitcoin exchange had only recently introduced a test environment, meaning that, previously, untested software changes were pushed out to the exchanges customers -- not the kind of thing you'd see on a professionally run financial services website. And, he says, there was only one person who could approve changes to the site's source code: Mark Karpeles. That meant that some bug fixes -- even security fixes -- could languish for weeks, waiting for Karpeles to get to the code. "The source code was a complete mess," says one insider.
Mt. Gox represents a special flavor of terrible software: it's destined to fail, but doesn't do so straight away. This lulls dev teams into a false sense of security. They're surprised when the system eventually crashes, when instead they should be thankful that it ran as long as it did.
As I often say: just because it runs, doesn't mean it works.
From my experience, terrible software grows from three poison seeds:
1/ Ill-qualified teams and their individual team members – This includes inexperienced developers who don't receive proper mentorship, as well as seasoned developers who are not a fit for that particular project. Both groups make poor decisions that ultimately play out as frustrating bugs, hacks, and crashes.
2/ Good developers in bad situations – At times even the best developers may exhibit the flaws of an ill-qualified team member. When company leadership sets needlessly aggressive deadlines, for example, steady hands get sloppy.
3/ Weak or nonexistent support structures – Developers usually work hand-in-hand with other teams – product management, infrastructure, security, and so on. These groups' checks and balances create a safety net for the software product; without them, code problems are more likely to slip through.
These same factors will play out in AI-generated apps. In professional circles, experienced developers and testers will slip as they face a flood of generated code. This will only get worse as company leadership cuts developer headcount in favor of bots.
Hobbyist developers who have no software experience are close analogs of the entry-level developer who has no oversight, no mentoring, and no infrastructure team. They can get an app all the way to production – serving real people and collecting real personal data – without following any industry best practices. Troublesome UIs? Mysterious errors? Security lapses that leave the doors open for infiltrators? We've already seen that. And since genAI systems churn out code at machine-level speeds, expect to see a lot more of it.
It's not as though vibe coders are the first group of amateur developers to cause trouble, though.
Let's remember that Apple created legions of developers when it opened up its app ecosystem to anyone willing to register. Many of them sat in a grey overlap of "pure hobbyist" and "professional," since they lacked software dev experience but were also selling products to the public.
And before that, the Dot-Com tech craze coincided with the rise of open-source tooling. Anyone with a computer could install developer tools on an open-source operating system, crank away at source code until it ran, and then release it to an unsuspecting world. Some developers, heavy on ego but light on experience, created companies to trade their products for money. Others hung a shingle for professional services to write software for other businesses.
(Try not to think about how many times you plugged your personal info and payment details into a site built by such a crew. Instead, ask yourself how many self-trained hobbyist developers became today's CTOs.)
AI code-generation tools, like open-source tools and Apple's iOS developer ecosystem before them, represent the latest iteration of this phenomenon.
This history also holds the key to a solution.
These problems aren't a big deal for a weekend project that runs inside your own environment and is only used by you. You may as well vibe up that calendar app or school lunch tracker! Hobby projects are fun and a great way to learn.
Vibed-up code in a professional, commercial setting is held to a higher standard. It's also not going away anytime soon. How can we reap the benefits of code generation while protecting ourselves from its flaws? We can turn to industry best practices, and double down on what good teams have been doing all along:
Expert review is key. Professional software teams perform code reviews because they recognize that even the best developer will make a mistake now and then. Getting extra eyes on the code improves the chances of finding problems before they make it to production.
Test, test, test. The reason test-driven development (TDD) has had such staying power is that it works. A robust test suite is a safety net for every code change. When teams are creating code at machine speeds and machine volumes, those tests are more important than ever.
Deployments as gating functions. A professional software shop pushes code through different stage gates – dev/test, QA, production – as yet another way to surface problems early. You'll be tempted to fully automate your deployments to get those vibed-up changes out there faster. This is one case where you want a human to stay in the loop and approve releases, if for no other reason than to be aware of when a change was made.
Monitoring. If automated tests catch problems during development, monitors will alert you to problems in production – either due to internal bugs, or external changes in operating conditions. If you haven't already implemented a robust set of monitors, now's the time.
Equipping the rest of the organization. AI-enhanced developers can create code a lot faster than their unassisted counterparts. This speed boost is of little value unless a company also provides extra support for the product and infrastructure teams that work alongside the developers.
Realistic expectations. Even with the best test suites and monitors, developers who are under needless pressure will make mistakes.
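To make a few of these practices concrete (tests as a safety net, stage gates, and a human approving releases), here is a minimal Python sketch. It is an illustration only: the `Release` class, the `promote` function, and the stage names are hypothetical, not taken from any particular deployment tool, and a real pipeline would lean on your CI/CD system rather than hand-rolled code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage gates, in the order a release moves through them.
STAGES = ["dev", "qa", "production"]


@dataclass
class Release:
    version: str
    tests_passed: bool = False         # set by the automated test suite
    approved_by: Optional[str] = None  # name of the human who signed off
    stage: str = "dev"


def promote(release: Release) -> str:
    """Move a release forward one stage, or raise if a gate is not satisfied."""
    current = STAGES.index(release.stage)
    if current == len(STAGES) - 1:
        raise ValueError("release is already in production")
    if not release.tests_passed:
        raise PermissionError("test suite has not passed")
    next_stage = STAGES[current + 1]
    # The human stays in the loop: production needs a named approver.
    if next_stage == "production" and not release.approved_by:
        raise PermissionError("production release needs a named human approver")
    release.stage = next_stage
    return next_stage
```

In this sketch, a release with passing tests can move from dev to QA on its own, but the final step to production fails until someone sets `approved_by`. That keeps the automation fast while guaranteeing a person knows when a change went live.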
Over time, we'll no doubt uncover other, AI-specific software practices. But these should be a great start.
AI-generated code has proven a rare bright spot in the landscape of genAI use cases. Not only is it here to stay, it will probably grow. We'll need to improve how we manage the AI bots, and how we manage their place in the software development lifecycle, in order to reap the benefits of this technology while avoiding its pitfalls.
Software development is thankfully a mature field that has built a catalog of best practices. Many of those apply just as well to AI-generated code; we'd do well to take notice.