data engineer

Contributor(s): Jack Vaughan

A data engineer is a worker whose primary job is to prepare data for analytical or operational uses. The specific tasks vary from organization to organization, but they typically include building data pipelines that pull together information from different source systems; integrating, consolidating and cleansing that data; and structuring it for use in individual analytics applications.

The data engineer often works as part of an analytics team, providing data in a ready-to-use form to data scientists who are looking to run queries and algorithms against the information for predictive analytics, machine learning and data mining purposes. In many cases, data engineers also work with business units and departments to deliver data aggregations to executives, business analysts and other end users for more basic types of analysis to aid in ongoing operations.

[Figure: Data scientist vs. data engineer. A comparison of data scientist and data engineer roles.]

Data engineers commonly deal with both structured and unstructured data sets. As a result, they must be versed in different approaches to data architecture and applications. A variety of big data technologies, including an ever-growing assortment of open source data ingestion and processing frameworks, are also part of the data engineer's tool kit.

To carry out their duties, data engineers can be expected to have skills in such programming languages as C#, Java, Python, Ruby, Scala and SQL. They also need a good understanding of extract, transform and load tools and REST-oriented APIs for creating and managing data integration jobs, and providing data analysts and business users with simplified access to prepared data sets.
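As a rough illustration of the extract, transform and load pattern mentioned above, the following Python sketch uses only the standard library. The sample data, the `sales` table and the function names are hypothetical, chosen for this example; a production pipeline would typically rely on a dedicated ETL tool or framework rather than hand-written code like this.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (assumed for illustration).
RAW_CSV = """customer_id,country,amount
1, us ,10.50
2,DE,7.25
2,DE,7.25
3,,4.00
1,US,2.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize fields, drop incomplete rows and exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        country = row["country"].strip().upper()
        if not country:
            continue  # cleansing: skip rows missing a required field
        record = (int(row["customer_id"]), country, float(row["amount"]))
        if record in seen:
            continue  # consolidation: skip exact duplicate records
        seen.add(record)
        clean.append(record)
    return clean


def load(records, conn):
    """Load: write the cleaned records into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run all three stages, then return a simple per-country aggregation."""
    load(transform(extract(text)), conn)
    query = (
        "SELECT country, SUM(amount) FROM sales "
        "GROUP BY country ORDER BY country"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

The final SQL query stands in for the kind of simplified access to prepared data that analysts and business users would be given.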

A chief area of application for data engineers supporting big data analytics efforts has been Hadoop data lakes, which offload some of the processing and storage work of established enterprise data warehouses. NoSQL databases and Apache Spark systems are also becoming increasingly common components of the data workflows set up by data engineers. Another area of focus is Lambda architecture, which supports unified data pipelines for both batch and real-time processing.
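The idea behind Lambda architecture can be sketched in a few lines of Python. This is a deliberately simplified, single-process toy: the `LambdaPipeline` class and its method names are hypothetical, and in a real deployment the batch and real-time layers would usually be separate distributed systems. It only shows how a query combines a periodically recomputed batch view with an incrementally updated real-time view.

```python
from collections import Counter


class LambdaPipeline:
    """Toy event counter with a batch layer, a speed layer and a merged query."""

    def __init__(self):
        self.master_log = []             # append-only master data set
        self.batch_view = Counter()      # recomputed from scratch by the batch layer
        self.realtime_view = Counter()   # updated per event by the speed layer

    def ingest(self, key):
        """Record a new event in the master log and in the real-time view."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from the full log, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view = Counter()

    def query(self, key):
        """Serving layer: merge the batch and real-time views for one key."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    pipeline = LambdaPipeline()
    for event in ["click", "click", "view"]:
        pipeline.ingest(event)
    pipeline.run_batch()          # batch view now covers the first three events
    pipeline.ingest("click")      # arrives after the batch run
    print(pipeline.query("click"))
```

Events that arrive after a batch run are visible immediately through the real-time view, and the next batch run folds them into the batch view.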

As the data engineer job has gained more definition, IBM, Hadoop vendor Cloudera Inc. and other organizations have begun offering certifications for data engineering professionals.

This was last updated in September 2016

What role do data engineers play in your organization?
The definition varies from project to project and with the data requirements. In my organisation, data engineers may at times have to simulate data streaming in order to create an end-to-end ingestion and transfer workflow. They are mainly involved in creating the data flow diagram and in defining the processes for the movement, administration and DR of the data. Recently we have also required them to enrich the data up to the staging layer, from where the warehousing team can transform it or create marts or warehouses as required. In certain projects there has been a significant distinction between the data engineering team and the warehouse team. This may not always hold true; it depends on the requirements, the project and the organisational structure.