This post is part of a series on approaching data ethics through the lens of risk: exposure, assessment, consequences, and mitigation.
- Part 1 – exploring the core ideas of risk, and how they apply to data ethics
- Part 2 and Part 3 – questions to kick off a data ethics risk assessment related to company-wide matters
- Part 4 – questions to assess individual data projects
- Part 5 – risk mitigation and otherwise staying out of trouble
You want to use data science, machine learning, and AI to your company’s competitive advantage. At the same time, you want to stay out of trouble as you do this. A damaged reputation, PR cleanup, and legal action will all cut into those new data-driven revenue streams.
One way to stay out of trouble is to perform a proactive risk assessment around your data efforts. This means that you consider what else can happen, beyond your intended outcomes. You do this by working through questions such as those we’ll cover today.
Take the time to think them through and to provide honest answers. Understand that there are few empirically “right” or “wrong” answers here. This is about understanding what your company is doing with data and evaluating your risk tolerance around those activities.
Finally, to repeat something I covered last time: a risk assessment isn’t about shutting down and saying “no” to every opportunity. Since every “no” closes a door on a chance to monetize your data, you have to decide whether a shot at your intended outcome is worth the potential to get burned.
What data do we collect, and why?
Sometimes a company will attempt to collect any and all data it can reach, and hold onto it forever, in the hopes that something in there will eventually provide value.
There is, technically, some truth in there: the more datasets you collect, the greater the chance that you’ll eventually uncover some useful nugget.
Then again, the old adage “idle hands are the devil’s workshop” also holds true for data hoarding and aimless analyses. People in your company may poke at the data out of boredom or curiosity and uncover novel – but also unsettling and unsanctioned – uses for it without clearing those efforts with company leadership. Or, if you never figure out how to use the data yourself, you might sell it to someone else who then turns out to be an unscrupulous player. Expect this news to leak to the public in either case.
Companies that start with a plan – companies that collect data with specific projects in mind – stand a smaller chance of hitting these accidental ethics problems.
When do we delete our data?
Data that’s not tied to a revenue stream – either because it’s out of date, or because you never figured out how to monetize it – is a prime candidate for removal. This runs contrary to the popular claim that data is inherently valuable, so why would you ever throw it out?
Consider: any data you hold may be subject to an (accidental) leak or an (intentional) breach. It’s bad enough to wind up in the news because some data has left your walls; it’s even worse if that data wasn’t tied to any revenue stream.
Worse still is holding data that people reasonably believe you have deleted. Let’s say that someone unsubscribes from your marketing communications but you still hold their details – which could include name, e-mail address, mailing address, and who knows what other demographic info – “just in case.” When that data leaks, you now have two stories to explain to the public: how the leak occurred, and why you’ve impacted people who aren’t even your customers.
Practice Datensparsamkeit to reduce your risk of a data ethics problem here. This is a term from German privacy law that loosely translates to “data sparingness.” Delete data you don’t need, and don’t collect any new data you don’t need. You can’t get in trouble for data you don’t have.
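One way to put this into practice is a standing retention policy: anything that hasn’t earned its keep within a set window gets purged automatically. Here’s a minimal sketch in Python; the record fields, the one-year window, and the function name are all illustrative assumptions, not prescriptions from any particular law or toolset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window -- pick one that matches your
# actual business and legal requirements.
RETENTION = timedelta(days=365)

def purge_stale(records, now=None):
    """Keep only records used within the retention window.

    In the spirit of Datensparsamkeit: data you no longer need
    is data you no longer hold. Each record is assumed to carry
    a "last_used" timestamp (an illustrative field name).
    """
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["last_used"] <= RETENTION]
```

In a real system the purge would run on a schedule against your data stores, but the core decision is the same: define “need” up front, then let stale data go.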
How do we store and protect our data? Who can access it, and how?
Assuming you’re comfortable with all of the data you collect, then there’s the question of how you protect it from unintended use.
Companies generally understand what it means to protect their data from outside problems – hackers infiltrating your systems and walking away with your data – but protection from inside problems is a little murkier.
You hold a variety of data across an array of systems: HR details, customer web traffic, orders, and support requests are just a few. You’ve probably established a “data lake” to make it easier to combine these datasets, to unlock interesting and useful insights.
Part of combining datasets is asking who can do so, and when. Should your web developers be able to see that HR data? Does your sales team really need to see the full customer service requests, or can they work with summaries or partially obfuscated data?
It’s unlikely that everyone needs to access all of the data. As part of a risk assessment, you should determine when to let datasets mix and when to maintain walls between them. You’ll also want to figure out what access controls and audit trails to put in place.
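The access-control and audit-trail idea can be sketched in a few lines. This is a toy illustration, not a real security system: the role names, dataset names, and policy table below are all hypothetical, and in practice you’d lean on your database’s or cloud provider’s built-in access controls and logging.

```python
# Hypothetical role-to-dataset policy; every name here is illustrative.
POLICY = {
    "web_dev": {"web_traffic"},
    "sales": {"orders", "support_summaries"},
    "hr": {"hr_records"},
}

def can_access(role, dataset, audit_log):
    """Check a role against the policy and record the attempt.

    Every check -- allowed or denied -- is appended to the audit
    log, so you can later answer "who tried to touch what?"
    """
    allowed = dataset in POLICY.get(role, set())
    audit_log.append((role, dataset, allowed))
    return allowed
```

The point of the sketch is the shape of the decision: an explicit policy table (the walls between datasets) plus a record of every access attempt (the audit trail), rather than an everyone-sees-everything data lake.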
Exploring these questions can help you spot problems before they grow too large for you to handle.
In Part 3, we’ll continue to explore questions to uncover company-wide data ethics risks.
(This post is based on materials for my workshop, Data Ethics for Leaders: A Risk Approach. Please contact me to deliver this workshop in your company.)