Data Warehouse Definitions

  • A

    Apache Flink

    Apache Flink is an in-memory and disk-based distributed data processing platform for use in big data streaming applications.

  • Apache Hadoop YARN

    Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework.

  • Apache HBase

    Apache HBase is a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS).

  • Apache Hive

    Apache Hive is an open source data warehouse system for querying and analyzing large data sets that are principally stored in Hadoop files.

  • Apache Incubator

    Apache Incubator is the starting point for projects and software seeking to become part of the Apache Software Foundation (ASF). The ASF is a nonprofit organization that oversees the development of Apache software.

  • Apache Pig

    Apache Pig is an open source technology that offers a high-level mechanism for parallel programming of MapReduce jobs to be executed on Hadoop clusters.

  • Apache Spark

    Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.

  • B

    Big data

    Big data is a combination of structured, semistructured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.

  • big data management

    Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.

  • C

    conformed dimension

    In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates.
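
    As a minimal sketch of this idea, the Python snippet below lets two hypothetical fact tables (sales and returns) reference the same date dimension, so a date key carries the same meaning in both. All table names, keys and values are invented for illustration.

```python
# Hypothetical shared (conformed) date dimension, keyed by a surrogate date key.
date_dim = {
    20240101: {"date": "2024-01-01", "quarter": "Q1"},
    20240401: {"date": "2024-04-01", "quarter": "Q2"},
}

# Two hypothetical fact tables that both reference date_dim through date_key.
sales_facts = [
    {"date_key": 20240101, "amount": 100.0},
    {"date_key": 20240401, "amount": 250.0},
]
returns_facts = [
    {"date_key": 20240101, "amount": 20.0},
]


def total_by_quarter(facts):
    """Sum the amount measure per quarter using the shared date dimension."""
    totals = {}
    for row in facts:
        quarter = date_dim[row["date_key"]]["quarter"]
        totals[quarter] = totals.get(quarter, 0.0) + row["amount"]
    return totals


sales_by_quarter = total_by_quarter(sales_facts)
returns_by_quarter = total_by_quarter(returns_facts)
print(sales_by_quarter)
print(returns_by_quarter)
```

    Because both fact tables resolve quarters through the same dimension, their totals can be compared side by side without reconciling two different calendars.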

  • D

    dark data

    Dark data is digital information that is not being used. Consulting and market research company Gartner Inc. describes dark data as "information assets that an organization collects, processes and stores in the course of its regular business activity, but generally fails to use for other purposes."

  • data analytics (DA)

    Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information.

  • Data as a Service (DaaS)

    Data as a Service (DaaS) is an information provision and distribution model in which data files (including text, images, sounds, and videos) are made available to customers over a network, typically the Internet.

  • data engineer

    A data engineer is a worker whose primary job responsibilities involve preparing data for analytical or operational uses.

  • data mart (datamart)

    A data mart is a repository of data that is designed to serve a particular community of knowledge workers.

  • data modeling

    Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow.

  • data silo

    A data silo exists when an organization's departments and systems cannot, or do not, communicate freely with one another and encourage the sharing of business-relevant data.

  • data warehouse

    A data warehouse is a federated repository for all the data collected by an enterprise's various operational systems, be they physical or logical.

  • data warehouse as a service (DWaaS)

    Data warehouse as a service (DWaaS) is an outsourcing model in which a service provider configures and manages the hardware and software resources a data warehouse requires, and the customer provides the data and pays for the managed service.

  • dimension

    In data warehousing, a dimension is a collection of reference information about a measurable event (fact).

  • data management

    Data management is the process of ingesting, storing, organizing and maintaining the data created and collected by an organization.

  • E

    Entity Relationship Diagram (ERD)

    An entity relationship diagram (ERD), also known as an entity relationship model, is a graphical representation that depicts relationships among people, objects, places, concepts or events within an information technology (IT) system.

  • Extract, Load, Transform (ELT)

    Extract, Load, Transform (ELT) is a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.
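
    The load-then-transform ordering can be sketched in Python with the standard library's sqlite3 module, where an in-memory database stands in for the target system. The table names, columns and sample rows are assumptions made up for this example.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
raw_rows = [("1", " Alice ", "10.50"), ("2", "bob", "3.25")]

conn = sqlite3.connect(":memory:")  # stands in for the target data system

# Extract + Load: copy the raw data into the target without cleaning it.
conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: prepare the data inside the target, here with SQL.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(id AS INTEGER) AS id,
           LOWER(TRIM(customer)) AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    """
)

orders = conn.execute("SELECT id, customer, amount FROM orders ORDER BY id").fetchall()
print(orders)
conn.close()
```

    The raw table remains available in the target after the transformation, so other downstream uses could prepare the same data differently.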

  • extract, transform, load (ETL)

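    In managing databases, extract, transform, load (ETL) refers to three separate functions combined into a single programming tool: extracting data from a source, transforming it into the required form and loading it into a target database.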
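
    A minimal Python sketch of the three functions follows. The CSV text, the field names and the list used as a target are all hypothetical stand-ins; a real ETL tool would read from and write to actual systems.

```python
import csv
import io

# Hypothetical source data; a real pipeline would read from a file or database.
SOURCE_CSV = "id,customer,amount\n1, Alice ,10.50\n2,bob,3.25\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean and type the rows before they reach the target."""
    return [
        {
            "id": int(row["id"]),
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


def load(rows, target):
    """Load: write the prepared rows into the target store (a list here)."""
    target.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract(SOURCE_CSV)), warehouse)
print(loaded, warehouse)
```

    Here the cleaning happens before the data reaches the target, which is the main difference from the load-first ordering of ELT.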

  • G

    Google BigQuery

    Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets. BigQuery was designed for analyzing data on the order of billions of rows, using a SQL-like syntax.

  • Google Bigtable

    Google Bigtable is a distributed, column-oriented data store created by Google Inc. to handle very large amounts of structured data associated with the company's Internet search and Web services operations.

  • Google Cloud Dataflow

    Google Cloud Dataflow is a cloud-based data processing service for both batch and real-time data streaming applications.

  • Google Cloud Spanner

    Google Cloud Spanner is a distributed relational database service that runs on Google Cloud.

  • H

    Hadoop

    Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems.

  • Hadoop 2

    Apache Hadoop 2 is the second iteration of the Hadoop framework for distributed data processing. Hadoop 2 adds support for running non-batch applications as well as new features to improve system availability.

  • Hadoop data lake

    A Hadoop data lake is a data management platform comprising one or more Hadoop clusters.

  • Hadoop Distributed File System (HDFS)

    The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.

  • J

    JAQL (JSON query language)

    JAQL is a query language for the JavaScript Object Notation (JSON) data interchange format. Pronounced "jackal," JAQL is a functional, declarative programming language designed especially for working with large volumes of structured, semi-structured and unstructured data.

  • M

    MongoDB

    MongoDB is an open source database management system (DBMS) that uses a document-oriented database model which supports various forms of data.

  • multimodel database

    A multimodel database is a data processing platform that supports multiple data models, which define the parameters for how the information in a database is organized and arranged.

  • N

    NewSQL

    NewSQL is a term coined by the analyst firm The 451 Group as shorthand to describe vendors of new, scalable, high performance SQL databases.

  • O

    OLAP (online analytical processing)

    OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points of view.
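
    Viewing the same data from different points of view can be illustrated with a small Python sketch that rolls up and slices a handful of records. The records, dimension names and helper functions are hypothetical; a real OLAP system would run such operations over precomputed cubes or a database engine.

```python
from collections import defaultdict

# Hypothetical sales records with three dimensions and one measure.
sales = [
    {"region": "East", "product": "Widget", "year": 2023, "revenue": 100},
    {"region": "East", "product": "Gadget", "year": 2023, "revenue": 150},
    {"region": "West", "product": "Widget", "year": 2023, "revenue": 200},
    {"region": "West", "product": "Widget", "year": 2024, "revenue": 50},
]


def roll_up(rows, *dims):
    """Aggregate revenue along the chosen dimensions (one point of view)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["revenue"]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[dim] == value]


by_region = roll_up(sales, "region")
by_product_year = roll_up(sales, "product", "year")
widgets_by_region = roll_up(slice_rows(sales, "product", "Widget"), "region")
print(by_region)
print(by_product_year)
print(widgets_by_region)
```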

  • S

    semantic technology

    Semantic technology is a set of methods and tools that provide advanced means for categorizing and processing data, as well as for discovering relationships within varied data sets.

  • snowflaking (snowflake schema)

    In data warehousing, snowflaking is a form of dimensional modeling where dimensions are stored in multiple related dimension tables. 
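
    As a rough sketch, the Python example below uses the standard library's sqlite3 module to build a tiny snowflaked product dimension, with the category split into its own related table. The schema, table names and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The product dimension is normalized: category lives in its own table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    INSERT INTO dim_category VALUES (1, 'Tools'), (2, 'Toys');
    INSERT INTO dim_product VALUES (10, 'Hammer', 1), (11, 'Wrench', 1), (12, 'Kite', 2);
    INSERT INTO fact_sales VALUES (10, 20.0), (11, 15.0), (12, 8.0), (10, 5.0);
    """
)

# Reaching the category takes two joins: fact -> product -> category.
sales_by_category = conn.execute(
    """
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(sales_by_category)
conn.close()
```

    The extra join is the usual trade-off of snowflaking: less repeated dimension data in exchange for longer query paths.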

  • SQL-on-Hadoop

    SQL-on-Hadoop is a class of analytical application tools that combine established SQL-style querying with newer Hadoop data framework elements.

  • star schema

    In data warehousing, a star schema is the simplest form of dimensional model, with data organized into facts and dimensions. 
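
    The facts-and-dimensions layout can be sketched in Python with the standard library's sqlite3 module: one fact table in the center, with a key into each flat dimension table. The table names, columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes, one flat table each.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    -- The fact table holds the measure plus a key into each dimension.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        amount REAL
    );
    INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
    INSERT INTO dim_store VALUES (1, 'Boston'), (2, 'Denver');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 30.0);
    """
)

# Each dimension is exactly one join away from the fact table.
sales_by_year_city = conn.execute(
    """
    SELECT d.year, s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY d.year, s.city
    ORDER BY d.year, s.city
    """
).fetchall()
print(sales_by_year_city)
conn.close()
```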

  • T

    TensorFlow

    TensorFlow is an open source framework developed by Google researchers to run machine learning, deep learning and other statistical and predictive analytics workloads.

  • tree structure

    A tree structure is an algorithm for placing and locating files (called records or keys) in a database. The algorithm finds data by repeatedly making choices at decision points called nodes. A node can have as few as two branches (also called children)...
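
    The idea of finding data by making a choice at each node can be sketched in Python with a binary search tree, in which every node has at most two children. This is only one simple kind of tree structure, chosen here for brevity; database indexes often use wider trees. The class and function names are hypothetical.

```python
class Node:
    """One decision point: a key plus optional left and right children."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Place a key by choosing left or right at each node; returns the subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node  # duplicates are ignored


def contains(node, key):
    """Locate a key by repeating the same left/right choice at each node."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


root = None
for k in [50, 30, 70, 20, 40]:
    root = insert(root, k)

print(contains(root, 40), contains(root, 60))
```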
