Get started
Bring yourself up to speed with our introductory content.
Graph database vs. relational database: Key differences
Relational databases and graph databases both focus on the relationships between data but not in the same ways. Here are some key differences between the two. Continue Reading
feature engineering
Feature engineering is the process that takes raw data and transforms it into features that can be used to create a predictive model using machine learning or statistical modeling, such as deep learning. Continue Reading
DBMS keys: Primary, super, foreign and candidate keys with examples
Here's a guide to primary, super, foreign and candidate keys, what they're used for in relational database management systems and the differences among them. Continue Reading
NoSQL (Not Only SQL database)
NoSQL is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats. Continue Reading
Top 5 U.S. open data use cases from federal data sets
The U.S. government has made data sets from many federal agencies available for public access to use and analyze. Check out some of the ways that data is being used. Continue Reading
Quiz on MongoDB 4 new features and database updates
Check out this excerpt from the new book Learn MongoDB 4.x from Packt Publishing, then quiz yourself on new updates and features to the database.Continue Reading
Google BigQuery
Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets.Continue Reading
Why understanding data structures is so important to coders
Jay Wengrow talks about his new book on data structures and algorithms and considerations for making your choices as efficient as possible. Continue Reading
Key steps in the feature engineering process
Feature engineering is key to machine learning algorithms. Read on to learn how those features are created and chosen to increase the accuracy of those models.Continue Reading
data analytics (DA)
Data analytics (DA) is the process of examining data sets in order to find trends and draw conclusions about the information they contain.Continue Reading
Apache Hadoop YARN
Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework.Continue Reading
MongoDB
MongoDB is an open source NoSQL database management program. NoSQL is used as an alternative to traditional relational databases.Continue Reading
How to ensure your data lake security
Your data lake is full of sensitive information and securing that data is a top priority. These are the best practices to keep that information safe from hackers.Continue Reading
When a DIY database management system design is the best fit
Learn how a combination of homegrown, off-the-shelf and open source tools, plus proper motivation, can yield a DIY DBMS that meets corporate expectations, needs and ROI.Continue Reading
Building a database application the DIY way
Business users experience the trials, tribulations and exultations of building a DIY DBMS, especially when IT expertise is not readily available or costs are too high.Continue Reading
relational database
A relational database is a collection of information that organizes data points with defined relationships for easy access.Continue Reading
Developing an enterprise data strategy: 10 steps to take
Consultants detail 10 to-do items for data management teams looking to create a data strategy to help their organization use data more effectively in business operations.Continue Reading
database replication
Database replication is the frequent electronic copying of data from a database in one computer or server to a database in another -- so that all users share the same level of information.Continue Reading
What is data governance and why does it matter?
Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage.Continue Reading
What steps are key to building a data catalog?
An enterprise data catalog can help data stewards and other users in an organization manage metadata and explore data assets. Here are 10 key steps for creating a data catalog.Continue Reading
data stewardship
Data stewardship is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner.Continue Reading
consumer privacy (customer privacy)
Consumer privacy, also known as customer privacy, involves the handling and protection of the sensitive personal information provided by customers in the course of everyday transactions.Continue Reading
corporate performance management (CPM)
Corporate performance management (CPM) is a term used to describe the various processes and methodologies involved in aligning an organization's strategies and goals to its plans and executions in order to control the success of the company.Continue Reading
Extract, Load, Transform (ELT)
Extract, Load, Transform (ELT) is a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.Continue Reading
How data lineage tools boost data governance policies
Organizations can bolster data governance efforts by tracking the lineage of data in their systems. Get advice on how to do so and key features in data lineage tools.Continue Reading
Data warehousing design and value change with the times
Big data, the cloud and analytics profoundly shape data warehouse purpose and design. Learn how companies derive value from a repository that at times needs definition.Continue Reading
database as a service (DBaaS)
Database as a service (DBaaS) is a cloud computing managed service offering that provides access to a database without requiring the setup of physical hardware, the installation of software or the need to configure the database.Continue Reading
data integration
Data integration is the process of combining data from multiple source systems to create unified sets of information for both operational and analytical uses.Continue Reading
RDBMS (relational database management system)
A relational database management system (RDBMS) is a collection of programs and capabilities that enable IT teams and others to create, update, administer and otherwise interact with a relational database.Continue Reading
master data management (MDM)
Master data management (MDM) is a process that creates a uniform set of data on customers, products, suppliers and other business entities from different IT systems.Continue Reading
data quality
Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and whether it's up to date.Continue Reading
data classification
Data classification is the process of organizing data into categories that make it easy to retrieve, sort and store for future use. Continue Reading
What is data management and why is it important?
Data management is the process of ingesting, storing, organizing and maintaining the data created and collected by an organization, as explained in this in-depth look at the process.Continue Reading
big data
Big data is a combination of structured, semistructured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modeling and other advanced analytics applications.Continue Reading
Hadoop
Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems.Continue Reading
Third-party database tools boast attractive alternatives
For companies considering third-party database tools, this handbook provides expert advice on evaluating and deploying on-premises and cloud options from third parties.Continue Reading
Entity Relationship Diagram (ERD)
An entity relationship diagram (ERD), also known as an entity relationship model, is a graphical representation that depicts relationships among people, objects, places, concepts or events within an information technology (IT) system.Continue Reading
data
In computing, data is information that has been translated into a form that is efficient for movement or processing.Continue Reading
data warehouse
A data warehouse is a federated repository for all the data collected by an enterprise's various operational systems, be they physical or logical.Continue Reading
Big data containers gain wider appeal in system deployments
This handbook examines the use of Docker containers in Kubernetes clusters to run big data systems and offers insight on container deployment and management issues.Continue Reading
Data virtualization tools promote anywhere, anytime data access
This online handbook examines data virtualization software and how organizations are deploying and using the technology as part of their data integration processes.Continue Reading
Data as a Service (DaaS)
Data as a Service (DaaS) is an information provision and distribution model in which data files (including text, images, sounds, and videos) are made available to customers over a network, typically the Internet.Continue Reading
Advice on enterprise data cleansing from an SAP VP
SAP's Kristin McMahon details data cleansing best practices and explains why a good data cleanse needs continual communication, collaboration and oversight.Continue Reading
DataOps (data operations)
DataOps (data operations) is an Agile approach to designing, implementing and maintaining a distributed data architecture that will support a wide range of open source tools and frameworks in production. The goal of DataOps is to create business ...Continue Reading
data virtualization
Data virtualization is an umbrella term used to describe any approach to data management that allows an application to retrieve and manipulate data without needing to know any technical details about the data such as how it is formatted or where it ...Continue Reading
Data model design tips to help standardize business data
Data models should be understandable to business users and kept to a reasonable scope, say the leaders of a data modeling initiative at England's Environment Agency.Continue Reading
USA Patriot Act
The USA Patriot Act is a law enacted in 2001, granting new and extended data-collection abilities to federal agencies in an effort to combat terrorism after the September 11 attacks.Continue Reading
USAA adds data engineering skills to speed data science work
When the United Services Automobile Association's data science team wasn't getting data in the right format, the team lead realized the USAA needed more data engineers.Continue Reading
data profiling
Data profiling is the process of examining, analyzing and reviewing data to collect statistics surrounding the quality and hygiene of the dataset.Continue Reading
5 things to know about deploying big data systems in data containers
Planning for security and container APIs, and watching out for infrastructure sprawls are some issues to be aware of before deploying big data in containers.Continue Reading
DataOps is more than DevOps for data, Delphix CTO says
Data operations is young compared to DevOps, but it is increasingly used as part of projects that put data at the center of development. Here, Delphix CTO Eric Schrock makes observations about the trend.Continue Reading
What is the difference between DBMS and RDBMS?
A relational database management system is the most popular type of database management system for business uses. Find out how RDBMS software differs from DBMS technology in general.Continue Reading
data modeling
Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow.Continue Reading
SQL vs. NoSQL: What do you know about the database designs?
The decision to use a SQL database or a NoSQL database can be made wisely only if the ins and outs of both are understood. See how well you know the database architectures.Continue Reading
data mart (datamart)
A data mart is a repository of data that is designed to serve a particular community of knowledge workers.Continue Reading
11 features to look for in data quality management tools
As the need for quality data has increased, so have the capabilities of data quality tools. Learn how collaboration, data lineage and other features enable data quality.Continue Reading
What is an enterprise data strategy?
Defining a data strategy can help focus an organization's data management initiatives -- but it isn't the same as data governance. Expert Anne Marie Smith explains why.Continue Reading
customer data integration (CDI)
Customer data integration (CDI) is the process of defining, consolidating and managing customer information across an organization's business units and systems to achieve a "single version of the truth" for customer data.Continue Reading
5 to-dos for your GDPR compliance checklist
It's never too late to fine-tune your GDPR strategy. Expert Anne Marie Smith suggests a current state analysis of your PII protections, drafting a data privacy policy and more.Continue Reading
Cloud vs. legacy ERP systems: Tug of war intensifies for SMBs
Aging legacy ERP systems at SMBs seem to be getting plenty of scrutiny these days. Heightened consumer demands, shifting technology landscapes and relentless market disruptions, not to mention maintenance costs, technical support and obsolescence, ...Continue Reading
Apache Hive
Apache Hive is an open source data warehouse system for querying and analyzing large data sets that are principally stored in Hadoop files.Continue Reading
Good data quality for machine learning is an analytics must
As companies add machine learning applications, they need to really understand -- and be able to improve -- their data. That's where data quality initiatives come in.Continue Reading
The benefits of columnar storage and the Parquet file format
What's behind Apache Parquet's growing popularity? It may be the file format's columnar storage orientation, which leads to benefits including improved query performance.Continue Reading
Four first steps for customer data management
Forrester's Mike Gualtieri details how to develop a unified plan to manage customer data that gives business users what they need to manage CRM programs.Continue Reading
Three factors for protecting sensitive data in the GDPR era
Data privacy is a hot topic nowadays thanks to GDPR and the Facebook data scandal. But how do data security, access control and data protection differ?Continue Reading
What's the difference between DDL and DML?
What's the difference between DDL and DML? Get the answer and see examples of data manipulation language and data definition language commands for SQL databases.Continue Reading
What goes into a customer analytics data integration framework
Customer data integration is a minefield for IT teams to navigate. But incorporating a set of core technical functions into an integration architecture can ease the process.Continue Reading
Google Cloud data lake fuels cloud payment processing flow
To create a cloud payment processing system, Global Payments first had to deploy a data lake in the Google Cloud. Getting quick user feedback was another early step.Continue Reading
Develop smart AI in CRM strategies to win and keep customers
Of the three words that comprise customer relationship management, one word binds the other two. As necessity and competition dictate that CRM upgrade itself with artificial intelligence and flights to the cloud, what counts most in ...Continue Reading
GDPR compliance requirements drive new winds of data privacy
Hello, GDPR. May 25 is the witching hour for enforcement of the EU's much-discussed GDPR compliance requirements -- and it's a harbinger of more changes to come.Continue Reading
What does the GDPR definition of personal data include?
The definition of personal data in the EU's GDPR data protection rules is broad enough to include any type of data that can be used to directly or indirectly identify a person.Continue Reading
Data expert: GDPR deadline is an opportunity, not a burden
There is stress as the EU's General Data Protection Regulation compliance deadline nears, but the GDPR privacy movement is a good thing for data policies, advises consultant Daragh O Brien.Continue Reading
Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.Continue Reading
TensorFlow
TensorFlow is an open source framework developed by Google researchers to run machine learning, deep learning and other statistical and predictive analytics workloads.Continue Reading
Hyperledger Fabric offers path to enterprise blockchain future
Blockchain arose from bitcoin, but it's looking to find a place in the enterprise. Frameworks like Hyperledger Fabric could smooth out the technology's rough side for business uses.Continue Reading
Data lake concept needs firm hand to pay big data dividends
Data lakes pose technology deployment and data management challenges that can leave analytics users high and dry if the implementation process isn't handled properly.Continue Reading
Slow to gain traction, AI apps on the verge of explosion
From chatbots ("Can I help you?") to killer bots ("I'll be back."), artificial intelligence runs the gamut of applications and emotions like no other technology. It's been nearly 70 years since AI first came into consciousness with humankind, yet ...Continue Reading
Three ways to turn old files into Hadoop data sets in a data lake
Hadoop data lakes offer a new home for legacy data that still has analytical value. But there are different ways to convert the data for use in Hadoop depending on your analytics needs.Continue Reading
Hadoop data lake
A Hadoop data lake is a data management platform comprising one or more Hadoop clusters.Continue Reading
How AI and IoT will influence data management in 2018
AI and IoT will alter the data management landscape in 2018, according to analyst James Kobielus. AI will need regular updates, and DevOps will become more prevalent as a result.Continue Reading
MariaDB
MariaDB is an open source relational database management system (RDBMS) that is a compatible drop-in replacement for the widely used MySQL database technology. Continue Reading
dark data
Dark data is digital information that is not being used. Consulting and market research company Gartner Inc. describes dark data as "information assets that an organization collects, processes and stores in the course of its regular business ...Continue Reading
Database architecture design has to guard against DBMS chaos
The proliferation of database technologies gives organizations more options to meet data processing needs. However, a strong architecture strategy is a must to avoid a DBMS free-for-all.Continue Reading
Business benefits of DevOps empower citizen developers
The mysteries once associated with coding and application development are gradually giving way to the forces of market demand for speed and simplicity. No sooner did we get somewhat comfortable with DevOps and what it means that a new wrinkle...Continue Reading
Good data quality for analytics becomes an IT imperative
High-quality data is a must for analytics applications. That's driving more demand for data quality tools, but quality initiatives are still maturing in many companies.Continue Reading
Cloud big data clusters test users on migration, management
There are good reasons to move big data systems to the cloud, but doing so also poses challenges for IT teams on migrating workloads and then managing clusters and system instances.Continue Reading
Database protection methods expand to shield data from attackers
Database vendors have beefed up the security tools in their software -- and that's a good thing because attackers are increasingly targeting database systems to steal sensitive data.Continue Reading
Google Cloud Spanner
Google Cloud Spanner is a distributed relational database service that runs on Google Cloud.Continue Reading
denormalization
In a relational database, denormalization is an approach to optimizing performance in which the administrator selectively adds back specific instances of duplicate data after the data structure has been normalized.Continue Reading
semantic technology
Semantic technology is a set of methods and tools that provide advanced means for categorizing and processing data, as well as for discovering relationships within varied data sets.Continue Reading
Apache Spark
Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.Continue Reading
Business value key to data governance software selection
There are several different types of data governance software, and choosing the right product -- or products -- involves business considerations in addition to technology ones.Continue Reading
Greater variety of database platforms increases IT options
The days when relational databases were the answer to almost every data management question have given way to a more varied environment, which has pros and cons for IT teams.Continue Reading
Using big data platforms for data management, access and analytics
Big data architectures typically involve multiple processing platforms. In this essential guide, you'll find information and advice on managing Hadoop, Spark and other big data technologies.Continue Reading
multimodel database
A multimodel database is a data processing platform that supports multiple data models, which define the parameters for how the information in a database is organized and arranged.Continue Reading
How to select the best DBMS software: A buyer's guide
Learn how to evaluate and buy the best DBMS software for your organization.Continue Reading
What are key features for choosing the best ETL tools for your needs?
Choosing the right ETL tool for your data integration requirements can be a challenge. Here's a rundown on what to look for in ETL software and potential vendors to consider.Continue Reading
data engineer
A data engineer is a worker whose primary job responsibilities involve preparing data for analytical or operational uses.Continue Reading