DataOps (data operations)

Contributor(s): Jack Vaughan

DataOps (data operations) is an approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.

A DataOps strategy, which is inspired by the DevOps movement, strives to speed the production of applications running on big data processing frameworks. Like DevOps, DataOps seeks to break down silos across IT operations and software development teams, encouraging line-of-business stakeholders to also work with data engineers, data scientists and analysts so that the organization’s data can be used in the most flexible, effective manner possible to achieve positive business outcomes.

As with DevOps, there are no “DataOps” software tools as such; there are only frameworks and related tool sets that support a DataOps approach to collaboration and increased agility. Such tools include ETL/ELT tools, data curation and cataloging tools, and log analyzers and systems monitors. Tools that support microservices architectures, as well as open source software that lets applications blend structured and unstructured data, are also associated with the DataOps movement. Such software can include MapReduce, HDFS, Kafka, Hive and Spark.

Because it incorporates so many elements of the data lifecycle, DataOps spans a number of information technology disciplines, including data development, data transformation, data extraction, data quality, data governance, data access control, computation and capacity planning, and system operations. As of this writing, DataOps teams are often managed by an organization’s Chief Data Scientist or Chief Analytics Officer, and job titles like “Data Ops Engineer” or “Data Ops Analyst” are still rare.

This was last updated in November 2016


Join the conversation

1 comment


[bias alert, I work for DataKitchen, a DataOps Company] We think that inflexibility, poor quality, and other obstacles hinder the successful production of analytics for data-driven organizations. Other types of organizations have faced similar challenges and the lessons learned in these other domains can be applied in data analytics. In software development, both Agile Development and DevOps have led to a major transformation in the speed and quality of code creation. In manufacturing, statistical process controls (SPC) assure quality and provide early feedback on non-conformances. Applying these methods to data analytics is called DataOps. DataOps is a combination of tools and process improvements that enable rapid-response data analytics at a high level of quality. DataOps adapts more easily to user requirements, even as they evolve, and ultimately supports improved data-driven decision-making.
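
As a minimal sketch of the statistical process control idea mentioned in the comment above, the Python snippet below derives three-sigma control limits from a baseline of daily row counts and flags new loads that fall outside them. The row counts, the function names and the choice of three sigmas are all illustrative assumptions, not part of any particular DataOps product.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits computed from a baseline sample."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(values, limits):
    """Return (index, value) pairs for values outside the control limits."""
    lower, upper = limits
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily row counts from a pipeline that is behaving normally.
baseline_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
limits = control_limits(baseline_counts)

# Hypothetical new loads; the second one is far below the baseline.
new_counts = [1002, 640, 1015]
flagged = out_of_control(new_counts, limits)
print(flagged)
```

In practice a check like this would run as one step in a pipeline, so that a non-conforming load is reported before it reaches downstream analytics.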

