Data Management/Data Warehousing Definitions

This glossary defines key terms and phrases that information technology (IT) and business professionals use when discussing data management and related software products. Additional definitions are available at WhatIs.com.

A

  • Apache Falcon

    Apache Falcon is a data management tool for overseeing data pipelines in Hadoop clusters, with the goal of ensuring consistent and dependable performance on complex processing jobs.

  • Apache Flink

    Apache Flink is an in-memory and disk-based distributed data processing platform for use in big data streaming applications.

  • Apache Giraph

    Apache Giraph is iterative graph processing software, built to run on Hadoop, that is mostly used to analyze social media data. Giraph was developed by Yahoo! and donated to the Apache Software Foundation, which now manages the project.

  • Apache Hadoop YARN

    Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework.

  • Apache HBase

    Apache HBase is a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS).
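
    The column-oriented key/value model described above can be pictured with a small in-memory stand-in. This is only a sketch in plain Python, not the HBase API: the class name TinyColumnStore, its methods, and the sample row keys are all hypothetical, and real HBase features such as cell timestamps and HDFS-backed storage are left out.

```python
from collections import defaultdict


class TinyColumnStore:
    """Hypothetical stand-in: row key -> (column family, qualifier) -> value."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        # Each cell is addressed by row key plus a (family, qualifier) pair.
        self._rows[row_key][(family, qualifier)] = value

    def get(self, row_key, family, qualifier, default=None):
        # Using .get avoids creating an empty row for a missing key.
        return self._rows.get(row_key, {}).get((family, qualifier), default)

    def scan_family(self, family):
        """Yield (row_key, qualifier, value) for one family, in row-key order."""
        for row_key in sorted(self._rows):
            for (fam, qualifier), value in sorted(self._rows[row_key].items()):
                if fam == family:
                    yield row_key, qualifier, value


store = TinyColumnStore()
store.put("user#1", "profile", "name", "Ada")
store.put("user#1", "stats", "logins", 3)
store.put("user#2", "profile", "name", "Grace")

# Rows do not need to share the same columns: user#2 has no "stats" cells.
profile_cells = list(store.scan_family("profile"))
```

    Grouping cells by column family is what lets a store of this kind read one family without touching the others.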

  • Apache Hive

    Apache Hive is an open source data warehouse system for querying and analyzing large datasets stored in Hadoop files. Hadoop is a framework for handling large datasets in a distributed computing environment.

  • Apache Incubator

    Apache Incubator is the starting point for projects and software seeking to become part of the Apache Software Foundation’s efforts. The ASF is a non-profit organization that oversees the development of Apache software.

  • Apache Pig

    Apache Pig is an open source technology that offers a high-level mechanism for parallel programming of MapReduce jobs to be executed on Hadoop clusters.

  • Apache Spark

    Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.

B

  • big data

    Big data is an evolving term that describes a large volume of structured, semistructured and unstructured data that has the potential to be mined for information.

  • big data management

    Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.

C

  • column database management system (CDBMS)

    A column database management system (CDBMS) is a database management system that stores data by column (or column family) instead of by row. There are different types of CDBMS offerings, and this storage layout is their common defining feature.

  • columnar database

    A columnar database is a database management system (DBMS) that stores data in columns instead of rows.
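
    The difference between the two layouts can be shown with a few records in plain Python. This is a minimal sketch rather than how any particular product stores data; the field names and the to_columns helper are hypothetical.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field walks through every whole record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same aggregate reads a single list of values.
total_from_columns = sum(columns["amount"])
```

    Both totals are the same; the column layout simply lets an aggregate over one field skip the other fields, which is why this layout tends to suit analytical queries.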

  • compliance

    Compliance is the act of being in alignment with guidelines, regulations and/or legislation. Organizations must ensure that they are in compliance with software licensing terms set by vendors, for example, as well as regulatory mandates.

  • conformed dimension

    In data warehousing, a conformed dimension is a dimension that has the same meaning for every fact to which it relates.
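
    As a rough illustration, the sketch below uses one product dimension shared by two fact tables. All table names, keys and values are hypothetical, and plain Python dictionaries stand in for warehouse tables.

```python
# One shared (conformed) product dimension; keys and attributes are hypothetical.
product_dim = {
    101: {"name": "widget", "category": "hardware"},
    102: {"name": "manual", "category": "books"},
}

# Two fact tables that reference the same dimension keys.
sales_facts = [
    {"product_key": 101, "units_sold": 10},
    {"product_key": 102, "units_sold": 4},
    {"product_key": 101, "units_sold": 6},
]
return_facts = [
    {"product_key": 101, "units_returned": 2},
]


def total_by_category(facts, measure):
    """Sum one measure per product category using the shared dimension."""
    totals = {}
    for fact in facts:
        category = product_dim[fact["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + fact[measure]
    return totals


sold = total_by_category(sales_facts, "units_sold")
returned = total_by_category(return_facts, "units_returned")

# Both results are keyed by the same category values, so they line up directly.
net_units = {cat: sold[cat] - returned.get(cat, 0) for cat in sold}
```

    Because "category" means the same thing for sales and for returns, the two sets of totals can be combined without any translation step.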
