The top failure modes of an ML/AI modeling project (Part 1)

Posted by Q McCallum on 2023-01-11

(Photo by Pawel Janiak on Unsplash)

The good part about machine learning (ML): you can build a model to automate document classification, pricing decisions, and other business tasks that are too nuanced to capture in hand-written software rules.

The not-so-good part: a model can also drain your time, money, and effort. And you won’t know until it happens.

That’s why I have my machine learning “anti-sales pitch,” where I tell a consulting prospect about some of the ways a modeling project can go awry. I want them to know what they’re getting into before we start, so they can evaluate the risks and decide whether to proceed.

After a recent discussion here on LinkedIn, I figured it was time to share this anti-sales pitch more widely.

Things that can go wrong

Consider that building a model involves just three steps:

  1. You gather some training data, which you run through an algorithm.
  2. That algorithm looks for patterns in the data.
  3. The algorithm then saves those patterns as a model, which you can use for making additional predictions.
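
In code, those three steps can look deceptively simple. Here’s a minimal sketch using scikit-learn; the built-in dataset and the choice of algorithm are placeholders for illustration, not a recommendation:

```python
# Step 1: gather some training data (a built-in toy dataset stands in here)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 2: run the data through an algorithm, which looks for patterns
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Step 3: the fitted model captures those patterns; use it for new predictions
predictions = model.predict(X_test)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```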

What could possibly go wrong?

Quite a bit, actually:

Not having enough training data. The algorithm needs a good amount of data in order to find patterns. Without enough training data, it will instead find non-patterns and the model will make bad predictions.

(Every model is wrong some of the time; but without enough training data, it will be wrong most of the time.)

This is like a person who draws sweeping conclusions from just one or two occurrences. The problem is especially troublesome for an ML model, since it has no way of knowing that it’s operating out of its depth.
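To make that concrete, here’s a small synthetic experiment (all of the data and numbers are invented): train the same algorithm on 20 rows and on 2,000 rows, then compare how each model fares on data it hasn’t seen.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# A synthetic classification problem with some genuinely informative features
X, y = make_classification(n_samples=10_000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

for n in (20, 2_000):
    model = RandomForestClassifier(random_state=0)
    model.fit(X_train[:n], y_train[:n])   # train on a slice of the data
    print(f"trained on {n:>5} rows -> "
          f"held-out accuracy: {model.score(X_test, y_test):.2f}")

# Typically the 20-row model scores far worse: it has "learned" noise,
# and it has no way of telling you that it's out of its depth.
```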

There’s little “signal” in the training data. Feeding the algorithm lots of data won’t necessarily lead to a better model. If that data doesn’t align (“correlate”) with what you’re trying to predict, the algorithm still won’t find any patterns, and the model will emit bad predictions.

If you’re trying to predict churn, for example, you’ll probably want to use customers’ purchase history in the training data. The names of their pets? Not as helpful.
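You can see the effect with another synthetic sketch (again, the data and the “churn” framing are invented for illustration): one model trains on features that genuinely drive the outcome, the other on pure noise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5_000

# "Purchase history"-style features that actually drive the outcome
signal = rng.normal(size=(n, 5))
churned = (signal.sum(axis=1) + rng.normal(scale=0.5, size=n)) > 0

# "Pet name"-style features: random values with no relationship to churn
noise = rng.normal(size=(n, 5))

for name, X in [("informative features", signal),
                ("uninformative features", noise)]:
    score = cross_val_score(LogisticRegression(max_iter=1000), X, churned,
                            cv=5).mean()
    print(f"{name}: cross-validated accuracy ~ {score:.2f}")

# Expect a high score for the informative features and roughly 0.5
# (a coin flip) for the noise: no signal in, no patterns out.
```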

Exceeding your budget (of time, money, or anything else). We use the phrase “build a model,” but in reality your company’s data scientists and ML engineers are building a lot of models in search of the best performance. Along the way, they test a variety of techniques and tuning parameters.

Even a very talented team can run out of time or money before they find the model that performs well.
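Some quick arithmetic shows how the model count piles up during that search. The parameter grid below is hypothetical, but it’s on the small side of what teams actually explore:

```python
from itertools import product

# A hypothetical search space: a few algorithms, each with a few knobs to tune
search_space = {
    "algorithm":      ["logistic regression", "random forest", "gradient boosting"],
    "features":       ["raw", "scaled", "scaled + interactions"],
    "regularization": [0.01, 0.1, 1.0, 10.0],
    "cv_folds":       [5],
}

combos = list(product(*search_space.values()))
runs_per_combo = 5          # cross-validation means each combo trains 5 models
total_models = len(combos) * runs_per_combo

print(f"{len(combos)} parameter combinations -> {total_models} models trained")
# 36 combinations -> 180 models, and that's a small grid. Multiply by
# retraining after every data fix or feature idea, and the budget goes fast.
```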

The world changes. A model’s entire “experience” is based on its training dataset. When the world changes, that training data no longer matches the present-day reality, and your model will make bad predictions.

Sometimes this is a subtle change and your model’s performance will slowly degrade. Other cases are more extreme. Consider the onset of the Covid-19 pandemic in early 2020, when people’s habits suddenly and drastically shifted due to stay-at-home orders. Any model that relied on, say, purchasing behavior or travel patterns was thrown for a loop.
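Here’s a deliberately simplistic, synthetic illustration of the idea: a model learns what “typical” customer spending looks like, and then the world’s definition of typical moves out from under it. (Every number here is made up.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def toy_world(n, typical_spend):
    """Deliberately simplistic: customers stay if they spend more than
    whatever counts as 'typical' in the current world."""
    spend = rng.normal(loc=typical_spend, scale=15, size=n)
    stays = spend > typical_spend
    return spend.reshape(-1, 1), stays

# Train in the old world, where typical weekly spend is around 100
X_train, y_train = toy_world(5_000, typical_spend=100)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy in the old world:", model.score(*toy_world(1_000, 100)))

# Then habits shift: typical spend drops to around 60, but the model still
# "thinks" 100 is the dividing line it learned from its training data
print("accuracy after the shift: ", model.score(*toy_world(1_000, 60)))
```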

Freak correlations. This one’s a particularly nasty problem. Your model performs very well during training and yet falls apart in the real world. This isn’t a case of the world changing, as with the previous item. It happens because features in your training data have no real connection to what you’re trying to predict, yet they coincidentally line up with it in that particular sample.

Let’s say you’ve built a model that uses height and hair color to evaluate data scientist job candidates. It just so happens that your best data scientists are tall with brown hair, so the model makes stellar predictions when you test it on your current team. Since height and hair color have nothing to do with a person’s data skills, the model will give terrible predictions when you run it on new candidates. And you’ll have no idea why! You’ll think there are gremlins in the model, but really it’s due to an unfortunate connection in your training data.

It’s easy to see the problem with a toy example like this, but much harder to spot in a real-world scenario.
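For the curious, here’s roughly what that toy example looks like in code. Everything below is fabricated, which is exactly the point: the features have nothing to do with the outcome, yet the model looks brilliant on the team it was trained on.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Your current team: by sheer coincidence, the strong performers
# happen to be tall with brown hair.
team = pd.DataFrame({
    "height_cm":   [185, 188, 182, 165, 170, 168],
    "brown_hair":  [1,   1,   1,   0,   0,   0],
    "strong_hire": [1,   1,   1,   0,   0,   0],
})

model = DecisionTreeClassifier(random_state=0)
model.fit(team[["height_cm", "brown_hair"]], team["strong_hire"])
print("accuracy on current team:",
      model.score(team[["height_cm", "brown_hair"]], team["strong_hire"]))  # 1.0

# New candidates: height and hair color tell you nothing about skill,
# so these "predictions" are noise dressed up as confidence.
candidates = pd.DataFrame({"height_cm": [160, 190], "brown_hair": [1, 0]})
print("predictions for new candidates:", model.predict(candidates))
```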

The grab-bag of non-technical issues. Maybe your model’s poor performance impacts your business (it mis-prices some purchases), you over-sell its capabilities to prospective customers, or you determine after the fact that an ML model was a poor fit for the problem you’re trying to solve. Whatever the case, your model is causing you trouble out in the real world.

One non-technical issue I’ll mention here is the PR meltdown, like what insurance startup Lemonade experienced in 2021. People felt that Lemonade was telling investors and customers two very different stories about how it was using AI. Did the models get the final say on whether to deny a claim? Was AI really being used to track “non-verbal cues” in an attempt to detect fraud? Lemonade’s messaging here was muddled, at best, and the incident brought them a lot of unwanted media attention.

What next?

The very act of developing an ML model exposes you to the risks I’ve listed above. (And quite a few more risks, frankly.) You have to go into this with eyes wide open, aware of the problems that may crop up.

Still, there are some steps you can take to mitigate the risks and reduce the pain should something go wrong. I’ll cover those next week.