Big data management definitions

  • A

    Apache Hadoop YARN

    Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework.

  • Apache Hive

    Apache Hive is an open source data warehouse system for querying and analyzing large data sets that are principally stored in Hadoop files.

  • Apache Incubator

    Apache Incubator is the entry point for projects and codebases seeking to become part of the Apache Software Foundation (ASF). The ASF is a nonprofit organization that oversees the development of Apache software.

  • Apache Pig

    Apache Pig is an open source technology that offers a high-level mechanism for the parallel programming of MapReduce jobs to be executed on Hadoop clusters.

  • Apache Spark

    Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.

  • B

    Big data

    Big data is a combination of structured, semistructured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.

  • big data management

    Big data management is the organization, administration and governance of large volumes of both structured and unstructured data.

  • D

    dark data

    Dark data is digital information that is not being used. Consulting and market research company Gartner Inc. describes dark data as "information assets that an organization collects, processes and stores in the course of its regular business activity, but generally fails to use for other purposes."

  • data engineer

    A data engineer is a worker whose primary job responsibilities involve preparing data for analytical or operational uses.

  • data management

    Data management is the process of ingesting, storing, organizing and maintaining the data created and collected by an organization.

  • E

    Entity Relationship Diagram (ERD)

    An entity relationship diagram (ERD), also known as an entity relationship model, is a graphical representation that depicts relationships among people, objects, places, concepts or events within an information technology (IT) system.

  • G

    Google Bigtable

    Google Bigtable is a distributed, column-oriented data store created by Google Inc. to handle very large amounts of structured data associated with the company's Internet search and Web services operations.

  • Google Cloud Spanner

    Google Cloud Spanner is a distributed relational database service that runs on Google Cloud.

  • H

    Hadoop

    Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems.

  • Hadoop Distributed File System (HDFS)

    The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.
