New data preparation tools open up more info to analysts

Consultant David Loshin says self-service data preparation software helps give data analysts broader access to business information than they typically can get in a data warehouse.

A growing number of business analysts are sharpening their skills at writing ad hoc queries and analytical algorithms to uncover useful information in corporate data stores -- and help their organizations become more data-driven in making business decisions. Yet as these workers become more sophisticated in their use of analytics tools, many find that conventional data warehouse architectures impede their ability to analyze the data they want to look at.

There are three core reasons for that -- all things that potentially can be addressed by an emerging class of self-service data preparation tools designed to enable business analysts, data scientists and other end users to bypass the data warehouse and carry out key pieces of the data integration and preparation process themselves.

First, the traditional data warehouse typically is a repository for data sets that have been extracted from internal transaction processing or operational systems for use in reporting on business performance. This limits the scope and types of analyses that can be performed against the data.

Second, the extracted data sets are integrated and standardized, using a monolithic set of business rules, to align with a predefined data model designed for dimensional slicing and dicing. Doing so filters out information that may be relevant to particular analytics applications.

Third, the IT group is usually responsible for developing the rules and processes for transforming the data going into a data warehouse -- an approach that similarly may not meet the information needs of the analysts who are ultimately expected to use that data.

Obviously, conventional data warehousing processes can work for companies, but the data landscape is rapidly changing. Organizations increasingly are looking to blend their transaction data with information coming from a variety of other sources, including website clickstream and activity logs, sensors on manufacturing equipment and other devices, customer emails, social networks and streaming data feeds from customers, data aggregators, and third-party information services providers.

New data types, new data platforms

Exploiting these often external data sources can boost efforts to generate actionable intelligence that, when paired with changes in business processes, provides the means to make a company truly data-driven. In many cases, though, the added data is better suited to being processed and stored in a big data platform -- a Hadoop cluster, NoSQL database or Spark system, for example -- than in a data warehouse. Or it may be accessible through an external Web portal.

In addition, business analysts, as well as data scientists and other analytics professionals, often want to access different combinations of the available data -- sometimes in its raw form.

For example, the marketing team at a consumer products maker may want to analyze a mix of customer profile records, news feeds and social media data to look for patterns that can help in planning an online marketing campaign. Meanwhile, the customer experience team may want to monitor social media feeds and product reviews from various websites to identify potential product issues, so it can take action to placate dissatisfied customers. And so on for other departments. Because each has different requirements and goals, it's virtually impossible for a homogenized data warehouse to enable all of their analytics objectives to be met.

Empowering analysts to work with the data that best meets their individual needs can be a more fruitful approach. Doing so shifts work onto the analysts across the various aspects of data integration, including data discovery, ingestion, profiling, validation and quality. But the new self-service data preparation tools developed by a variety of vendors offer a potential helping hand.

Logical separation on data preparation

The technologies create a sensible segregation of duties between analytics users, and IT and data management teams. Business analysts and data scientists can use the data preparation tools to find relevant data in different systems, pull it together, profile and cleanse the data for consistency, and define the business rules that govern their use of the information. With the data prep software at their disposal, they're able to get more comprehensive and customized views of the data they're interested in than they typically could from a data warehouse.
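To make those steps concrete, here is a minimal sketch in plain Python of what profiling, cleansing, blending and applying an analyst-defined rule might look like. It does not represent any vendor's product: the record layouts, field names, function names and the "dissatisfied customer" rule are all hypothetical, chosen only to illustrate the workflow.

```python
from collections import Counter

# Hypothetical raw records pulled from two separate systems.
crm_records = [
    {"customer_id": "C1", "email": "Ann@Example.com ", "region": "east"},
    {"customer_id": "C2", "email": None, "region": "West"},
    {"customer_id": "C3", "email": "raj@example.com", "region": "EAST"},
]
review_records = [
    {"customer_id": "C1", "rating": 2},
    {"customer_id": "C3", "rating": 5},
    {"customer_id": "C3", "rating": 1},
]


def profile(records, field):
    """Count missing values and distinct raw values for one field."""
    values = [r.get(field) for r in records]
    missing = sum(1 for v in values if v in (None, ""))
    distinct = Counter(v for v in values if v not in (None, ""))
    return {"missing": missing, "distinct": dict(distinct)}


def cleanse(record):
    """Standardize casing and whitespace so records compare consistently."""
    cleaned = dict(record)
    if cleaned.get("email"):
        cleaned["email"] = cleaned["email"].strip().lower()
    if cleaned.get("region"):
        cleaned["region"] = cleaned["region"].strip().lower()
    return cleaned


def blend(customers, reviews):
    """Attach each customer's review ratings to the customer record."""
    ratings = {}
    for review in reviews:
        ratings.setdefault(review["customer_id"], []).append(review["rating"])
    return [dict(c, ratings=ratings.get(c["customer_id"], [])) for c in customers]


def is_dissatisfied(customer, threshold=2):
    """Analyst-defined rule (an assumption): any rating at or below the threshold."""
    return any(r <= threshold for r in customer["ratings"])


# Profile the raw data first, then cleanse, blend and apply the rule.
region_profile = profile(crm_records, "region")
prepared = blend([cleanse(r) for r in crm_records], review_records)
flagged = [c["customer_id"] for c in prepared if is_dissatisfied(c)]
```

Profiling the raw `region` field before cleansing exposes the inconsistent casing that the cleansing step then standardizes. The rule at the end lives with the analyst rather than in a warehouse load process, which is the separation of duties described above.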

Ideally, the analysts also become more accountable for how the data is used. That means they should be tasked with understanding and adhering to high-level governance policies on data usage and collaborating with others to ensure that data, and how it's interpreted, remains consistent across the enterprise.

Because data sets are being captured and maintained in their original formats, the IT department is freed from having to implement integration and transformation rules dictating what data is available for analysis. Instead, IT's responsibility transitions into managing the overall infrastructure supporting data discovery, integration and analysis, and providing control mechanisms to monitor for inconsistencies in data definitions and noncompliance with defined governance directives on using business data.
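One way to picture such a control mechanism is a simple check that flags business terms defined differently by different teams. The sketch below is illustrative only; the registry format, team names, terms and definitions are assumptions, not a description of any particular governance tool.

```python
from collections import defaultdict

# Hypothetical registry of (team, term, definition) entries.
definitions = [
    ("marketing", "active_customer", "purchase in last 90 days"),
    ("customer_experience", "active_customer", "login in last 30 days"),
    ("marketing", "churn", "no purchase in 12 months"),
    ("finance", "churn", "No purchase in 12 months"),
]


def find_conflicts(entries):
    """Return the terms whose definitions differ between teams."""
    by_term = defaultdict(dict)
    for team, term, definition in entries:
        # Normalize trivial differences so only real conflicts surface.
        by_term[term][team] = definition.strip().lower()
    return {
        term: teams
        for term, teams in by_term.items()
        if len(set(teams.values())) > 1
    }


conflicts = find_conflicts(definitions)
```

Here `conflicts` contains only `active_customer`, because the two `churn` definitions match once casing is normalized. A report like this gives IT and data stewards a starting point for the cross-team collaboration the governance policies call for.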

Data warehouses likely aren't going away in most organizations that have deployed them. And self-service data preparation software is a relatively new and still-maturing technology, sold primarily by startup vendors. But the blossoming of these data preparation tools points the way to increased analytical flexibility and effectiveness in companies that are looking to get more out of their data.

About the author:
David Loshin is president of Knowledge Integrity Inc., a consulting and development services company.