Weekly recap: 2024-04-11

Posted by Q McCallum on 2024-02-11

2024/02/05: What happens when the model is wrong?

You’ve finally built that AI-driven product and it’s ready to go to market. Great! Just one question:

What happens when the underlying AI model is wrong?

Every model is wrong now and then. Some turn out to be wrong a lot of the time once they are released to production. So you want to be ready for that.

The thing is:

  • Some companies don’t think about this at all. (Cue the hand-waving and positive vibes.)
  • A few will focus on what happens to them when the model is wrong. (Cue the calls to legal and PR departments.)
  • Hardly any companies wonder what happens to other people when the model goofs. (Not unless they are required to do so by law, in which case they treat this as an extension of “what happens to us.”)

Why am I bringing this up now?

No reason. But here’s an article about UK supermarkets employing facial recognition – a technology with a well-known track record of being wrong – for age verification purposes.

And who better than to supply those machines than … Fujitsu. The company behind the UK Post Office’s Horizon IT system. A buggy system which, over the course of several years, led to wrongful accusations against (and then convictions of) more than 500 people.

Post Office scandal company to sell supermarket face-scanners” (The Telegraph)

2024/02/06: Minding the machines

I’ve noted before that AI models are akin to factory machines: they don’t know when they are operating out of their depth, so they need constant supervision.

This also holds when the AI is inside an actual factory machine:

Companies Brought in Robots. Now They Need Human ‘Robot Wranglers.’” (WSJ)

One weekend when Cusack was working overtime, he set up a robot to sand the material, then retreated to a nearby room to watch. But as the robot began moving, Cusack realized he had forgotten to set a critical control. The bot was on an unalterable path directly into an expensive fiberglass panel.

The robot slowly crawled forward until it “put in a giant circular hole in the side of the panel,” he said. He had to explain to his team the hundreds of thousands of dollars worth of damage was his fault for not properly instructing the bot.

2024/02/07: Being proactive about model safeguards

An AI model has no built-in protections against misuse. It’s up to you to build the guardrails and padding around that model. And a key part of that is to “red-team” your system (pretend you’re a bad actor and try to uncover exploits).

Red-teaming and similar risk-management exercises are especially important when your AI model drives a public-facing LLM chatbot. When billions of people are able to poke at your model, it’s quite likely that they will uncover corner cases and unintended uses.

All of this is to say: OpenAI recently tested whether their latest GPT models could be used to create bioweapons.

Building an early warning system for LLM-aided biological threat creation” (OpenAI)

(Credit where it’s due: I originally spotted this in Der Spiegel, which pointed me to the OpenAI writeup: “GPT-4 erleichtert den Bau einer Biowaffe nur ein ganz klein wenig”)

(For additional thoughts on this topic, I recommend my Radar article “Risk Management for AI Chatbots”)

2024/02/08: Putting too much faith in the machines

Companies are in a mad dash to implement and/or buy AI-driven systems. I can see the appeal in automating away mundane tasks. But sometimes AI systems emit wrong answers or even outright nonsense.

To abdicate responsibility to an AI model – especially for sensitive tasks – is simply irresponsible.

Anyway, here’s another story about companies putting too much faith in AI. (At least this one’s not about facial recognition…)

The AI tools that might stop you getting hired” (The Guardian)

But when she re-did her interview not in English but in her native German, she was surprised to find that instead of an error message she also scored decently (73%) – and this time she hadn’t even attempted to answer the questions but read a Wikipedia entry. The transcript the tool had concocted out of her German was gibberish. When the company told her its tool knew she wasn’t speaking English so had scored her primarily on her intonation, she got a robot voice generator to read in her English answers. Again she scored well (79%), leaving Schellmann scratching her head.

What you see here is the last week’s worth of links and quips I have shared on LinkedIn, from Monday through Sunday.

For now I’ll post the notes as they appeared on LinkedIn, including hashtags and sentence fragments. Over time I might expand on these thoughts as they land here on my blog.