Dimensional modeling can bring new capabilities for ad hoc reporting, but it also may entail hidden development overhead, according to a data practitioner helping to create business reports at a large insurance company.
The dimensional model, based on star schemas, is a key part of many modern data warehousing efforts. Reasons why the dimensional path is selected often center on business reporting. Notable here is the notion that data processes using dimensional modeling show benefits in supporting extensibility of queries and schema.
But working with dimensional models every day presents challenges, as David Blackfield told fellow data professionals attending a session about living with dimensional modeling at DAMA International's recent Enterprise Data World 2014 conference in Austin, Texas.
In dimensional models, you seek to bring context, or dimension, to facts, or measures. Blackfield, who is an information analyst with Allstate Insurance Co.'s Information Services Group, said his group created an integrated model that allowed use of common dimensions (such as names, codes and descriptions) across multiple facts (such as data, counts and amounts) to further the organization's ability to learn from data.
From Blackfield's perspective, dimensional models can help fulfill the basic mission behind the data warehouse, which is to provide information that the business can use. To this end, an underlying dimensional model can better enable users to ask ad hoc questions, he said.
The set of challenges that confront people implementing the dimensional model, in his estimation, is led by overhead. Said Blackfield: "There is no such thing as an easy fact table."
What he means is that there is complexity in expanding integrated dimensional models. "If we want to add a new fact table, we have to consider 20 common dimensions to be edited," he said.
For more data management tips
Joshua Greenbaum on new database technology
Forrester's Mike Gualtieri on data management strategy
David Loshin on the data steward role
The overhead incurred is on the development side of the data life cycle. The dimensional environment is more than just modelers doing their thing, Blackfield emphasized. "Somewhere behind that, somebody has to write code," he said. And that can get complicated.
Before, a team could build a new table with less consideration for other outcomes. Now, that table has to be linked, in the Allstate case, to 20 common dimensions. These in turn are connected to an even greater number of sources. Job scheduling and other elements constitute additional risk factors that adding fact tables can uncover.
The added software development time, or overhead, that Blackfield refers to can range from 20% to 50% over entity-relationship (ER) modeling jobs. While noting that this overhead is being reduced as teams gain more experience working with dimensional models, he told new dimensional modelers to anticipate added work along the way.
From a modeling or design standpoint, Blackfield said, adding new fact tables can be easy: "You just draw a line -- done and done." But, he reminds, ETL and other coding activities create overhead you should anticipate when entering the dimensional realm.