Data Management/Data Warehousing Definitions
This glossary explains the meaning of key words and phrases that information technology (IT) and business professionals use when discussing data management and related software products. You can find additional definitions by visiting WhatIs.com.
-
A
Apache Falcon
Apache Falcon is a data management tool for overseeing data pipelines in Hadoop clusters, with a goal of ensuring consistent and dependable performance on complex processing jobs.
-
Apache Flink
Apache Flink is an in-memory and disk-based distributed data processing platform for use in big data streaming applications.
-
Apache Giraph
Apache Giraph is iterative graph processing software, built to run on Hadoop, that is mostly used to analyze social media data. Giraph was developed by Yahoo! and given to the Apache Software Foundation for future management.
-
Apache Hadoop YARN
Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework.
-
Apache HBase
Apache HBase is a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS).
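As a rough illustration of the column-oriented key/value model described above, the sketch below uses plain Python dictionaries to mimic a table addressed by row key, then column family, then column qualifier. It does not use the HBase client API; the `put` and `get` helpers, the row keys and the `info` and `stats` column families are all hypothetical names chosen for the example.

```python
from collections import defaultdict

# Toy in-memory model: row key -> column family -> qualifier -> value.
# Illustration only; this is not the HBase API.
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, qualifier, value):
    """Store one cell under (row key, column family, qualifier)."""
    table[row_key][family][qualifier] = value

def get(row_key, family, qualifier, default=None):
    """Look up one cell; a missing row, family or qualifier yields the default."""
    return table.get(row_key, {}).get(family, {}).get(qualifier, default)

put("user#1", "info", "name", "Ada")
put("user#1", "stats", "logins", 3)
put("user#2", "info", "name", "Grace")  # this row has no "stats" family

print(get("user#1", "stats", "logins"))
print(get("user#2", "stats", "logins"))
```

The point of the sketch is that rows are sparse: each row can carry a different set of columns, and every value is reached through its row key.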
-
Apache Hive
Apache Hive is an open source data warehouse system for querying and analyzing large data sets that are principally stored in Hadoop files.
-
Apache Incubator
Apache Incubator is the starting point for projects and software seeking to become part of the Apache Software Foundation’s efforts. The ASF is a non-profit organization that oversees the development of Apache software.
-
Apache Pig
Apache Pig is an open source technology that offers a high-level mechanism for parallel programming of MapReduce jobs to be executed on Hadoop clusters.
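To give a sense of the kind of MapReduce job that a high-level tool such as Pig spares developers from writing by hand, the following is a minimal single-process sketch of the map, shuffle and reduce steps for a word count. It is plain Python, not Pig Latin or the Hadoop API, and the function names and sample lines are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (word, 1) pairs, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input lines.
lines = ["big data big clusters", "data pipelines"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel across many nodes; here they run in sequence only to show the data flow.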
-
Apache Spark
Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.
-
B
big data
Big data is an evolving term that describes a large volume of structured, semi-structured and unstructured data that has the potential to be mined for information and used in machine learning projects and other advanced analytics applications.
-
big data management
Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.
-
C
column database management system (CDBMS)
A column database management system (CDBMS) is a database management system that stores data by column (or column family) instead of by row. There are different types of CDBMS offerings, but that storage layout is their common defining feature.
-
columnar database
A columnar database is a database management system (DBMS) that stores data in columns instead of rows.
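The difference between the two layouts can be sketched in a few lines of Python. This is an in-memory illustration under simplifying assumptions, not how any particular DBMS stores data on disk; the sample records and variable names are hypothetical.

```python
# Row-oriented layout: each record is kept together.
rows = [
    {"id": 1, "region": "EU", "sales": 120.0},
    {"id": 2, "region": "US", "sales": 80.0},
    {"id": 3, "region": "EU", "sales": 50.0},
]

# Column-oriented layout: one list per column, built from the same records.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Summing one column: the row layout visits every whole record,
# while the column layout reads a single list.
total_from_rows = sum(row["sales"] for row in rows)
total_from_columns = sum(columns["sales"])

print(columns["region"])
print(total_from_rows, total_from_columns)
```

Both layouts hold the same data. The column layout favors analytical queries that scan a few columns across many records.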
-
compliance
Compliance is the act of being in alignment with guidelines, regulations and/or legislation. Organizations must ensure that they are in compliance with software licensing terms set by vendors, for example, as well as regulatory mandates.
-
conformed dimension
In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates.
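A small sketch may help make the idea concrete. Below, two fact tables share one date dimension, so "quarter" means the same thing whichever fact is being summarized. The table contents, the `date_key` values and the `total_by_quarter` helper are hypothetical and exist only for this illustration.

```python
# One conformed date dimension shared by both fact tables below.
date_dim = {
    20240101: {"date": "2024-01-01", "quarter": "Q1"},
    20240401: {"date": "2024-04-01", "quarter": "Q2"},
}

# Two different fact tables that reference the same dimension keys.
sales_facts = [
    {"date_key": 20240101, "amount": 100.0},
    {"date_key": 20240401, "amount": 40.0},
    {"date_key": 20240101, "amount": 60.0},
]
shipment_facts = [
    {"date_key": 20240101, "units": 5},
    {"date_key": 20240401, "units": 2},
]

def total_by_quarter(facts, measure):
    """Roll up a measure by quarter using the shared date dimension."""
    totals = {}
    for fact in facts:
        quarter = date_dim[fact["date_key"]]["quarter"]
        totals[quarter] = totals.get(quarter, 0) + fact[measure]
    return totals

sales_by_quarter = total_by_quarter(sales_facts, "amount")
units_by_quarter = total_by_quarter(shipment_facts, "units")
print(sales_by_quarter, units_by_quarter)
```

Because both roll-ups go through the same dimension, the quarterly sales and shipment figures can be compared side by side without reconciling two definitions of a quarter.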