Common Mistakes in Data Science Hiring: Part 1

Posted by Q McCallum on 2018-01-23

Are you having trouble hiring data scientists? Or, once you hire them, do they not stick around? You may be tripping over your own feet.

Hiring and retention are common sources of frustration in the world of data science. (That includes machine learning (ML), artificial intelligence (AI), and big data.) It’s easy to blame this on candidates being fickle because data science is a hot, in-demand field. There’s certainly some truth to that. There’s also another truth: you’re unwittingly sabotaging your efforts to hire and retain data talent.

I’ve met with a number of companies that were having trouble hiring, and compared their experiences to data scientists’ job search grievances. Based on that, I’ve identified ten common problems in data science hiring, along with a solution to each. I’ll cover five of those problems here and the remaining five in a post next week.

Some of these are subtle fixes, while others will require larger changes. As you read, keep track of the problems you face in your company and then take action to address them.

Problem 1: not having a purpose

Data science gets a lot of media attention these days, and with good reason: companies that effectively collect and analyze data may open new revenue streams and improve decision-making.

That said, some companies are diving into data science without really understanding what it is or how it can improve their business. I call this paying the data science vanity tax because they’re doing data science for the sake of saying they’re doing data science.

How do you know whether your data science efforts have purpose? Ask yourself and your fellow executives some pointed questions around what return on investment (ROI) they expect and how applying data science will improve the company. Vague answers or grandiose, unsubstantiated claims – “We’re going to solve this (complex, thorny) issue with machine learning!” – are a sign that something’s amiss.

Solution: figure out what you want to do, and why.

Ignore the media hype and vendor marketing materials. Focus on your business model and your mission. Ask yourself why data science interests you and figure out what it can do for you. Most importantly, do your homework on what “data science” and “machine learning” really mean, so you can determine how to employ them in your business.

Problem 2: not having a plan

Your purpose guides what you want to do, at a high level. It’s the vision. Vision is just the first step. What does data science mean for you on the strategic and tactical levels? Have you sorted out how to execute on this grand vision?

If you don’t have a detailed plan, you’re wandering blind: you’re wasting time, energy, morale, and money. Not having a plan is worse than not having a purpose, because you have the potential to do something great but you’re not really going anywhere.

Solution: develop a road map (data strategy).

Connect the “what we want to do” to the “how will we do it.” I’ve written a more detailed piece on developing your data strategy but the gist is: develop a set of questions you’d want to answer, translate those into specific data projects, and map out the data sources and skills you’d need to set this in motion. [1] The word “specific” is the key. Hand-waving about ideas and oversimplifying problems will only postpone your disappointment.

Problem 3: trying to dive into advanced data science as a first step

Business press and vendor marketing literature often speak of data science like it’s an island unto itself. This leads people to assume they can dive right into advanced data science projects (deep learning is a popular one these days) from scratch.

In reality, a successful data science effort stands on several layers of foundation. You’ll need clean data and a solid technology infrastructure before you can do even first-stage data science.

Solution: start with the basic data work, then work your way up.

Develop mechanisms to collect the data you’ll need, and deploy storage systems so you can hold and retrieve it. Review the data you already have and confirm that it’s suitable for analysis. (If your response is, “we don’t know what data we have or need,” then you skipped the step about developing your data strategy. See above.)

If you already have that data and infrastructure, by the way, do yourself a favor and first try some business intelligence (BI) techniques such as counting, summaries, and roll-ups. Besides being generally useful to a business – who wouldn’t want, say, breakdowns of sales by region? – BI techniques are a fast and cheap way to test your data: summaries and roll-ups can quickly reveal inconsistent or otherwise problematic data. If the BI techniques fail, there’s no way the higher-end data science techniques will work.
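To make the roll-up idea concrete, here is a minimal sketch in plain Python. The sales records, field names, and the `roll_up` helper are all hypothetical and invented for illustration; in practice you would more likely run the equivalent query in your database or BI tool. The point is only that a simple count-and-sum by region surfaces questionable data right away.

```python
from collections import defaultdict

# Hypothetical sales records: the fields and values are made up for illustration.
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "north", "amount": 80.0},   # same region, different capitalization
    {"region": "South", "amount": 200.0},
    {"region": "", "amount": 50.0},        # region is missing
    {"region": "South", "amount": -30.0},  # negative amount: refund or bad data?
]


def roll_up(records, key, value):
    """Return {group: (row count, total)} for the given key and value fields."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in records:
        counts[row[key]] += 1
        totals[row[key]] += row[value]
    return {group: (counts[group], totals[group]) for group in counts}


# The basic BI summary: sales by region.
summary = roll_up(sales, "region", "amount")
for region, (count, total) in sorted(summary.items()):
    print(f"{region!r:10} rows={count} total={total:.2f}")

# Group the region labels case-insensitively to spot spelling variants.
variants = defaultdict(set)
for region in summary:
    variants[region.strip().lower()].add(region)
inconsistent = {k: sorted(v) for k, v in variants.items() if len(v) > 1}

# Count rows with no region and rows with a negative amount.
missing_region_rows = summary.get("", (0, 0.0))[0]
negative_rows = [row for row in sales if row["amount"] < 0]

print("inconsistent labels:", inconsistent)
print("rows with no region:", missing_region_rows)
print("rows with negative amounts:", len(negative_rows))
```

If a breakdown like this shows the same region under two spellings, or rows with no region at all, that is a signal to clean up the data before attempting anything more advanced.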

Problem 4: making your (unprepared) recruiters the first step in outreach

You want job applicants to be prepared, right? You should do the same. The most common complaint I hear from experienced data scientists is that they get a cold phone call or e-mail from someone who boasts of an amazing job opening in a great company … but they don’t have any detailed information about the role.

Yes, you should tell a prospect about the vacation policy and office environment. You should also go into the specifics of what they’ll be doing. Experienced practitioners (and even some sharp entry-level folks) want to know what tools you use, what sort of projects they’ll take on, and how their work will help the company.

Some recruiters fake it here by dropping vague answers such as, “the team uses Python” or “we’ll sort out your role once we’ve hired you.” [2] This is a great way to get candidates to lose interest. Don’t expect them to take your calls after that.

Solution: hiring managers, you can do your own outreach.

There are so many opportunities for you to meet a prospective hire in advance of (or instead of!) the formal interview. As someone with hiring authority, you have the ability and responsibility to get out of the office and do your own sourcing. Attend meetups and conferences. Offer to host one of the local data-related meetups in your office space. Be proactive about getting in front of possible candidates, in a relaxed environment.

When you’re at an event, don’t greet people with “we’re hiring” and send them straight to HR. Instead, talk to them about the role(s) you have in mind and ask them what sorts of projects they enjoy. [3] Assess their interest, technical skills, and team fit, and then loop in HR to formalize the deal.

Problem 5: waiting on perfection / artificial requirements

Do you have a data science role that’s been open for several months? No one’s biting at the job posting? Double points if this is happening with your first data science hire.

There’s a chance the problem isn’t the candidates, but your job posting and expectations. Many data science job postings ask for a real titan: someone with tons of experience, across a variety of toolsets – scikit-learn, TensorFlow, neural networks, text analysis, and churn predictions – and throw in a requirement of a PhD in statistics for good measure. Oh, and can they also write code to integrate their models into the production website? And manage the databases while they’re at it?

I get it. I firmly believe the old saying: if you don’t ask, you don’t get. But there’s another piece of the puzzle: just because you ask, doesn’t mean it will work out.

If you wait for the one candidate who matches every bullet point in the job posting, you’ll be waiting a long time. The handful of people who fit that bill were snatched up ages ago (often, into some form of self-employment), which means that they’re out of your reach. Worse still, the people who meet most of the requirements will read between the lines: they know that a company with such a job posting is lost, so it’s no place for a knowledgeable, in-demand data professional.

Solution: be realistic about the job requirements.

This problem usually surfaces when you haven’t developed a data strategy. Without one, you don’t have a clear picture of what you need, so you’re more inclined to ask for every possible skill just in case.

That amazing deity of a data scientist might just show up… but they probably won’t. Split that one job into separate roles, and/or determine which skills are must-haves versus can-learns. Qualified applicants will see that you have realistic expectations, so they will be more likely to apply.

Get ready for Part 2 …

Thanks for reading this first set of tips on data science hiring. The next post in the series is now available.

Update 2018/02/05: My colleague Greg Reda has just released his tips on hiring data scientists. I encourage you to read that in addition to my posts here. Greg is a sharp thinker with good experience, and I trust what he has to say on this topic.

In the meantime: are you having trouble building or growing your data science team? I want to help. Please [contact me](/contact/) to start the discussion.

  1. Some companies try to side-step developing a data strategy by using data scientist interviews as a way to learn about data science and piecemeal their way into a plan. This is both unethical and ineffective.

  2. Some folks who have interviewed at Certain Well-Known Companies have complained of this. The more experienced the candidate, the more details they’ll want, and the idea of going through a hellish interview process just to be dumped into a pool of new hires does not sit well.

  3. Asking people what they want to do is such an easy step, and it speaks volumes, but hardly anyone does it. If you want to stand out, start asking people what they want instead of pigeon-holing them into a role you had in mind. Their answers may surprise and enlighten you.