Definition

data engineer

This definition is part of our Essential Guide: Using big data platforms for data management, access and analytics
Contributor(s): Jack Vaughan

A data engineer is a worker whose primary responsibility is preparing data for analytical or operational uses. The specific tasks handled by data engineers vary from organization to organization but typically include building data pipelines to pull together information from different source systems; integrating, consolidating and cleansing that data; and structuring it for use in individual analytics applications.
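As a rough illustration of those tasks, the following Python sketch pulls records from two made-up source systems, cleanses them and consolidates them into one structure. The source names, field names and sample values (`CRM_CSV`, `BILLING_JSON`, `total_billed` and so on) are assumptions invented for this example, and a production pipeline would normally rely on dedicated ingestion and processing tools rather than in-memory strings.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export from a CRM system (note the stray
# whitespace, inconsistent email casing and the duplicated row).
CRM_CSV = """customer_id,name,email
1, Alice Smith ,ALICE@EXAMPLE.COM
2,Bob Jones,bob@example.com
2,Bob Jones,bob@example.com
"""

# Hypothetical source 2: JSON records from a billing system.
BILLING_JSON = (
    '[{"customer_id": "1", "total": "120.50"},'
    ' {"customer_id": "2", "total": null},'
    ' {"customer_id": "3", "total": "15.00"}]'
)


def extract_crm(text):
    """Read CRM rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_billing(text):
    """Parse billing records from JSON text."""
    return json.loads(text)


def cleanse_customers(rows):
    """Trim whitespace, lowercase emails and drop duplicate customer IDs."""
    customers = {}
    for row in rows:
        cid = int(row["customer_id"])
        if cid in customers:
            continue  # keep the first occurrence only
        customers[cid] = {
            "customer_id": cid,
            "name": row["name"].strip(),
            "email": row["email"].strip().lower(),
        }
    return customers


def integrate(customers, billing):
    """Attach billing totals to known customers; missing totals become 0.0."""
    totals = {}
    for rec in billing:
        cid = int(rec["customer_id"])
        amount = float(rec["total"]) if rec["total"] is not None else 0.0
        totals[cid] = totals.get(cid, 0.0) + amount
    # Billing records with no matching customer are left out of the result.
    return [
        dict(cust, total_billed=totals.get(cid, 0.0))
        for cid, cust in sorted(customers.items())
    ]


def run_pipeline():
    """Extract from both sources, cleanse, then consolidate."""
    customers = cleanse_customers(extract_crm(CRM_CSV))
    return integrate(customers, extract_billing(BILLING_JSON))
```

Calling `run_pipeline()` returns one consolidated record per customer, which is the kind of ready-to-use structure an analytics application could consume directly.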

The data engineer often works as part of an analytics team, providing data in a ready-to-use form to data scientists who are looking to run queries and algorithms against the information for predictive analytics, machine learning and data mining purposes. In many cases, data engineers also work with business units and departments to deliver data aggregations to executives, business analysts and other end users for more basic types of analysis to aid in ongoing operations.

Data engineers commonly deal with both structured and unstructured data sets; as a result, they must be well versed in different approaches to data architecture and applications. A variety of big data technologies, including an ever-growing assortment of open source data ingestion and processing frameworks, are also part of the data engineer's tool kit.

To carry out their duties, data engineers are expected to have skills in programming languages such as C#, Java, Python, Ruby, Scala and SQL. They also need a good understanding of extract, transform and load (ETL) tools and REST-oriented APIs for creating and managing data integration jobs, and for providing data analysts and business users with simplified access to prepared data sets.
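A small sketch of the ETL idea, using Python's built-in `sqlite3` module as a stand-in for a real data store: raw rows are transformed, loaded into a table and then exposed through a SQL view so that analysts can query a prepared data set instead of the raw one. The table, view and column names (`orders`, `sales_by_region` and so on) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: (order_date, region, amount as text).
RAW_ORDERS = [
    ("2016-09-01", "east", "19.99"),
    ("2016-09-01", "west", "5.00"),
    ("2016-09-02", "east", "30.01"),
    ("2016-09-02", "east", "bad-value"),  # unparseable amount
]


def to_amount(text):
    """Transform step: parse an amount, returning None if it is invalid."""
    try:
        return float(text)
    except ValueError:
        return None


def load_orders(conn, raw_rows):
    """Transform raw rows, load the valid ones and create a reporting view.

    Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)"
    )
    parsed = [(d, r, to_amount(a)) for d, r, a in raw_rows]
    clean = [row for row in parsed if row[2] is not None]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
    # The view gives business users a simplified, pre-aggregated data set.
    conn.execute(
        """
        CREATE VIEW sales_by_region AS
        SELECT region,
               COUNT(*) AS order_count,
               ROUND(SUM(amount), 2) AS total_sales
        FROM orders
        GROUP BY region
        """
    )
    conn.commit()
    return len(clean)


def sales_by_region(conn):
    """Query the prepared view, as an analyst or reporting tool might."""
    return conn.execute(
        "SELECT region, order_count, total_sales "
        "FROM sales_by_region ORDER BY region"
    ).fetchall()
```

To try it, open an in-memory database with `sqlite3.connect(":memory:")`, pass the connection to `load_orders` and then call `sales_by_region`. In practice the same pattern would more likely be built with an ETL tool or exposed through a REST API, but the division of labor is similar.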

Hadoop data lakes, which offload some of the processing and storage work of established enterprise data warehouses, have been a chief area of application for data engineers supporting big data analytics efforts. NoSQL databases and Apache Spark systems are also becoming increasingly common components of the data workflows set up by data engineers. Another area of focus is Lambda architecture, which supports unified data pipelines for both batch and real-time processing.
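The Lambda architecture idea can be sketched in a few lines of Python. This is a deliberately simplified, single-process toy, not how Hadoop or Spark implement it: an append-only log feeds a batch layer that periodically recomputes a complete view, a speed layer counts events that have arrived since the last batch run, and queries merge the two. The class and method names (`LambdaCounter`, `ingest`, `run_batch`, `query`) are invented for this example.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter organized along Lambda architecture lines."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # recomputed from the full log
        self.speed_view = Counter()  # events since the last batch run

    def ingest(self, page):
        """Record a new event in the log and in the real-time view."""
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        """Batch layer: rebuild the view from the full log, then reset
        the speed layer because its events are now covered by the batch."""
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, page):
        """Serving layer: merge the batch and real-time views."""
        return self.batch_view[page] + self.speed_view[page]
```

The design choice worth noting is that a query always combines both views, so results stay current between batch runs while the batch layer remains the authoritative recomputation from the complete log.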

As the data engineer job has gained more definition, IBM, Hadoop vendor Cloudera Inc. and other organizations have begun offering certifications for data engineering professionals.

This was last updated in September 2016
