Lately I’ve read a lot of attempts at defining ‘data scientist’ and differentiating it from other data-centric roles. The terms ‘data scientist’, ‘data analyst’, and ‘data engineer’ are obviously interrelated. But recently I’ve seen some weird definitions of them.
Let me make clear that this isn’t just a silly semantic quibble with no practical significance (though it certainly is partially, maybe largely, that). This issue often comes up when people are giving career advice. A recent blog post defined a data analyst as someone who interrogates data using SQL and Excel to produce reports, and a data scientist as someone who delivers software. This was a lead-in to advice that a data analyst position is not good preparation for a data scientist position, because data scientists are basically software engineers and data analyst positions don’t give you that experience. The premise, however, rests on an extremely narrow definition of both roles that might only hold in certain companies.
Another example of silly definitions for these roles came from a Reddit thread I read, where someone claimed that anyone who regularly uses pandas/scikit-learn must be a data engineer. This bizarre claim seemed to stem from the idea that legitimate data scientists are at the forefront of machine learning research and therefore must be programming completely new machine learning algorithms from scratch.
Here’s the thing: if anyone claims a very clear, straightforward definition of any of these roles that sharply delineates them, they are probably extrapolating based on experience of how these roles are defined in one company (or industry). This is because these jobs, by their very nature, have a lot of overlap.
I would give the high-level, fuzzy (and hopefully not controversial) definitions of these roles as:
Data engineer: a data professional who focuses on building data pipelines and managing how data gets from point A to point B.
Data analyst: a data professional who focuses on producing reports describing trends/insights in data.
Data scientist: a data professional who focuses on producing insights and predictions from data.
Note my use of the weasel phrase ‘focuses on’ to avoid making any hard statements. That’s because, at bottom, each of these roles overlaps significantly with the others. It’s typical for any data professional to need to extract data or move it around. It’s not uncommon for data engineers to produce dashboards or reports, or for data analysts to set up data pipelines. Data scientists may sometimes produce reports, and some data analysts deliver code that goes into production.
For any hard line you can think of (data scientists use machine learning, data analysts use Excel, etc.), there are going to be lots of counter-examples.
At the end of the day, there is a difference between these roles. It’s a fuzzy yet important one. Typically data scientists are more specialized and have more experience/skills than data analysts. Data engineers might know more about databases but less about statistics. But the field is not at a point where you can easily determine the full extent of a role just from its title.
The bottom line is, if you’re looking to become a data scientist and want to know what path to take, getting experience as a data analyst (or data engineer) might not be a bad way to go about it. However, it depends on the specifics of the particular position you get. If you’re a data analyst who sits with business people and only uses Excel to produce simple reports, you’re not going to get the kind of experience you need to move on to a data science position. But if you work closely with data scientists, or are expected to learn stats/machine learning to perform your duties, it might be good experience. I think the blog post I mentioned above does give good advice on this: be concerned about who you’ll be sitting with, since that probably says a lot about what skills you’ll learn.