Up-and-coming data engineers complement entrenched data scientists

Even before the hot title of data scientist was fully defined, a complementary role began to bubble up: data engineer. Here's how they differ and why companies may need one, the other or both.

Despite the buzz around analytics and the data scientist, a new data management-oriented job title, data engineer, is appearing with increasing frequency.

The data engineering position is showing up in job postings, and, while sometimes confused with the data scientist job, it seems to be becoming a central part of many big data undertakings.

It can be easy to confuse the two roles based on their names, but data scientists and data engineers are actually quite different, according to Bob Melk, president of IT careers site Dice.com.

"The skills related to data science are much more focused on mathematical methods and data analytics, while data engineering is focused on data wrangling, cloud and programming skills," he said.

Data engineers are deep in the weeds of today's open source big data management. The trend is noted by Curt Monash, a long-time database industry observer. Monash said that on a recent journey to Silicon Valley, the hotbed of big data technologies, he encountered greater use of the term data engineer.

As he noted in a recent blog post, the data engineer is in part a response to the great number of abilities that were expected of the data scientist.

The reality is that the job description of the data scientist probably comprises too wide a set of skills. Running data processing clusters, coding to the latest open source data APIs -- these skills and more were also part of the original data scientist mandate, but they are coming to be associated with the data engineer.

Dice's Melk listed Apache Hadoop, distributed computing and NoSQL among important skills for data engineering. For data science, top skills he cites include statistics, statistical modeling, predictive modeling and machine learning.

The data engineering ranks appear to be growing. In April, there were 891 data engineer jobs posted on Dice. That number may not be impressive by itself, but the rise from the previous year is: listings for data engineering jobs were up 88% compared to April 2015.
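For readers who want to put the 88% figure in context, the arithmetic can be run backward to estimate the prior-year count. The short sketch below uses only the two numbers quoted above (891 postings and 88% growth). The function name is made up for illustration, and the result is an implied estimate, not a figure reported by Dice.

```python
# Figures quoted in the article; nothing else is assumed.
postings_this_april = 891
growth_rate = 0.88  # "up 88%" versus April 2015


def implied_prior_count(current: int, rate: float) -> float:
    """Back out the earlier count from the current count and the growth rate.

    Hypothetical helper for illustration: current = prior * (1 + rate).
    """
    return current / (1 + rate)


prior_estimate = implied_prior_count(postings_this_april, growth_rate)
print(round(prior_estimate))  # roughly 474 postings in April 2015
```

In other words, the listings went from an estimated 470 or so to 891 in a year, which is a little short of doubling.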

Up from data science

Some early data scientists had to do it all -- they had to build their Hadoop clusters and create predictive analytics models, too. Now, in some organizations, data engineers are beginning to take on some of those assignments, according to consultant Rick Sherman, founder of Athena IT Solutions, a consulting firm based in Maynard, Mass.

Sherman, like others, looks to data engineers to, in effect, free up data scientists to do what they were hired for: advanced statistical and other analytics that uncover new business opportunities.

"The bigger the data science group, the more that work can be offloaded from a data scientist to a data engineer," he said. That is especially important, he added, because companies pay a lot for data science skills.

Data engineers wanted

A quick look at some available jobs shows that "data engineer" covers a wide swath of skills:

  • A leading financial data service firm is looking for a senior data engineer experienced in cloud infrastructure, Scala, Apache Spark and Python.
  • An online travel service seeks data engineers with the ability to coordinate the work of domain experts -- in particular, the ability to work effectively with a machine learning team.
  • A national bank is looking for data engineers who can program and run various open source frameworks. It is seeking people with skills in things like Akka, Cassandra, Accumulo, HBase, Hadoop/HDFS, Avro, MongoDB and Mesos -- and maybe some data processing frameworks that haven't been invented yet.

It seems that, by and large, the data engineer does not get compensated at quite the rate of a data scientist. For example, the national average for data engineer salaries is estimated by job and recruiting site Glassdoor at $95,526. The national average for data scientist salaries is $113,436.

Still, there are cases where the data engineer's pay can break the $100,000 barrier. Although Dice does not claim enough sample data to confidently estimate data engineer salaries, Melk does point out that yearly salaries for adjacent skills -- such as Cassandra ($147,811), Pig ($132,850) and MapReduce ($131,563) programming -- can easily surpass the $100,000 level.

Delivering the data

Data engineers deliver the data -- their role is to prepare the data or define its structure. In a way, the data engineer is taking on aspects of traditional extract, transform and load (ETL) and data integration jobs, which are seeing drastic retooling as new big data processing frameworks proliferate.
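To make the extract, transform and load idea concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the sample records, the column names, the `signups` table and the function names are invented, and an in-memory SQLite database stands in for whatever store a real team would use. It is not drawn from any company mentioned in this article.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; columns and values are invented for illustration.
RAW_CSV = """user_id,signup_date,country
1,2016-04-01,us
2,2016-04-03,DE
,2016-04-05,fr
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no user_id and normalize country codes."""
    cleaned = []
    for row in rows:
        if not row["user_id"]:
            continue  # skip records that cannot be keyed
        cleaned.append(
            (int(row["user_id"]), row["signup_date"], row["country"].strip().upper())
        )
    return cleaned


def load(records, conn):
    """Load: define the table structure and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signups "
        "(user_id INTEGER PRIMARY KEY, signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO signups VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three steps in order and report how many records were loaded."""
    records = transform(extract(text))
    load(records, conn)
    return len(records)


conn = sqlite3.connect(":memory:")  # stand-in for a real data store
loaded = run_pipeline(RAW_CSV, conn)
print(loaded)  # 2 of the 3 sample rows survive cleaning
```

The point of the sketch is the division of labor: the engineer owns the parsing, cleaning and table definition, so that an analyst can query `signups` without touching the raw files.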

A Hadoop data lake being built at a major defense contractor requires the efforts of some individuals who work in "more of a data engineer role" than a data science role, according to Kathy Sonderer, principal database technologist at Raytheon Co., responding to questions earlier this year at Enterprise Data World in San Diego.

"Their job is to integrate the data and provide it to the scientist," she said. "It is a role at the integration level."

Sonderer said managers don't necessarily have to go outside to find such talent. She emphasized the need to form teams for new data science initiatives by repurposing the skills of people already on hand.

Overlap and morphing

Data engineer and data science jobs today display some blending and overlap of skills. That comes with any new field, as does an element of trendiness.

The industry has seen morphing job titles before. The underlying work required doesn't always change too significantly, but job postings sometimes coalesce around the hot new titles.

We saw this with the software engineer, whose skills, in many cases, were not a whole lot different from those of a programmer. The software engineer title arose in some organizations simply because it conveyed a bit more gravitas, a bit more professionalism.

We also saw it with the software architect -- a title that became very useful when managers needed to rationalize better salaries for their best software engineers -- the ones who had made themselves indispensable.

The true utility and potential longevity of the data engineer will be proved out in months to come. But, for now, the ramp seems to lead upward. What may be beyond question is that the rise of the data engineer is emblematic of many sea shifts going on in data management today.
