DataOps (data operations) is an approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production.
A DataOps strategy, which is inspired by the DevOps movement, strives to speed the production of applications running on big data processing frameworks. Like DevOps, DataOps seeks to break down silos across IT operations and software development teams. It also encourages line-of-business stakeholders to work with data engineers, data scientists and analysts so that the organization’s data can be used in the most flexible, effective manner possible to achieve positive business outcomes.
As with DevOps, there are no “DataOps” software tools as such; there are only frameworks and related tool sets that support a DataOps approach to collaboration and increased agility. Such tools include ETL/ELT tools, data curation and cataloging tools, and log analyzers and systems monitors. Tools that support microservices architectures, as well as open source software that lets applications blend structured and unstructured data, are also associated with the DataOps movement. Such software can include MapReduce, HDFS, Kafka, Hive and Spark.
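To make the ETL piece more concrete, the following is a minimal sketch of an extract, validate and transform step of the kind a DataOps team might automate. It is illustrative only: the sample data, the function names (`extract`, `validate`, `transform`, `run_pipeline`) and the quality rule are all assumptions made for this example, not part of any particular DataOps tool.

```python
import csv
import io

# Hypothetical sample input; in practice this would come from a source system.
RAW_CSV = """order_id,customer,amount
1001,alice,25.50
1002,bob,
1003,carol,-4.00
1004,dave,12.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(row):
    """Quality rule assumed for this sketch: amount is present and non-negative."""
    try:
        return float(row["amount"]) >= 0
    except ValueError:
        # An empty or non-numeric amount fails the check.
        return False


def transform(row):
    """Transform: cast fields to proper types and normalize the customer name."""
    return {
        "order_id": int(row["order_id"]),
        "customer": row["customer"].title(),
        "amount": float(row["amount"]),
    }


def run_pipeline(text):
    """Run extract -> validate -> transform, keeping rejected rows for review."""
    good, rejected = [], []
    for row in extract(text):
        if validate(row):
            good.append(transform(row))
        else:
            rejected.append(row)
    return good, rejected


if __name__ == "__main__":
    good, rejected = run_pipeline(RAW_CSV)
    print(f"loaded {len(good)} rows, rejected {len(rejected)}")
```

Keeping the rejected rows instead of silently dropping them is a deliberate choice in this sketch: it gives analysts and engineers a shared record of what failed the check, which fits the collaborative aim described above.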
Because it incorporates so many elements of the data lifecycle, DataOps spans a number of information technology disciplines, including data development, data transformation, data extraction, data quality, data governance, data access control, computation and capacity planning, and system operations. As of this writing, DataOps teams are often managed by an organization’s Chief Data Scientist or Chief Analytics Officer, and job titles like “Data Ops Engineer” or “Data Ops Analyst” are still rare.