Home > Ask the Data management / BI Experts > DBMS and data warehousing Questions & Answers > Defining dimensional vs. normalized database design, dimension vs. fact tables
Ask The Data Management Expert: Questions & Answers
EMAIL THIS

Defining dimensional vs. normalized database design, dimension vs. fact tables

Mark Whitehorn EXPERT RESPONSE FROM: Mark Whitehorn

Pose a Question
Other Data Management Categories
Meet all Data Management Experts
Become an Expert for this site


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


>
QUESTION POSED ON: 14 April 2009
When faced with an existing database implementation, is there a way to determine if it is in fact-dimension style or a normalized style? Is it shades of grey, or is there a unique features of one or the other? What is the difference between a dimension table and a fact table?


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   



RELATED CONTENT
DBMS and data warehousing
Definition of primary, super, foreign and candidate key in the DBMS
What is the difference between a logical and physical warehouse design?
What are some emerging data warehouse and DBMS trends?
How to get data/database independence with a three-tier architecture
How to select an MPP database: DB2 vs. Teradata
What comes first — the data mart or the data warehouse?
What are the top database management systems (DBMS)?
What is the role of DBMS in RDBMS?
Is an Inmon-modeled BI system, like Madison, the future of data warehousing?
What are the benefits and disadvantages of a RDBMS?

Database management systems (DBMS) architecture and design
Definition of primary, super, foreign and candidate key in the DBMS
What is the difference between a logical and physical warehouse design?
What are some emerging data warehouse and DBMS trends?
Data Warehouse Platforms Product Directory
Designing for performance: Strategic database application deployments
An introduction to database transaction management
Database access security: network authentication or data encryption?
Executing SQL statements using prepared statements and statement pooling
Static SQL vs. dynamic SQL for database application performance
How to get data/database independence with a three-tier architecture

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
data classification  (SearchDataManagement.com)
OLAP  (SearchDataManagement.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary


There is a great deal of overlap between these two questions, so we'll answer them together.

Let's start with normalized data. This is how we usually structure data for transactional systems such as an order entry system, an HR system and so on -- in other words, traditional, operational databases.

In essence, normalized data is held in a very simple structure. The data is stored in tables. Each table has a primary key and should contain data relating to one entity — so a normalized customer table contains only data about customers. We need to make logical connections between the entities (for example, this customer placed these orders). To do this we use a foreign key in the orders table to point to the primary key in the customer table.

To get a little more technical, there are three levels of normalization — first, second and third normal form and we can test a table to see if it meets these criteria. (To get even more precise, there are more than three levels but mostly we are only interested in the first three).

One advantage of normalized data is that there is very little data redundancy — each piece of data is stored once and once only.

Now, in a dimensional design (fact-dimensional style) all of these rules go out of the window. The fact table holds the measures (usually numerical) that we want to analyze and the dimension tables contain data/information about the ways in which we want to analyze the data.

The dimension tables in particular are often highly de-normalized. For example, a customer table might store the zip code of the customer, their town and state. If you have fifty customers in Walla Walla, then the customer dimension table will store the fact that Walla Walla in is Washington a total of fifty times. So there is often massive data duplication. Take this Time dimension table for example (sorry, it's in UK date format, but the sense remains the same):

Call DateCalYearQuarterCalMonthCalDay
01/08/20082008Q381
02/08/20082008Q382
03/08/20082008Q383
04/08/20082008Q384
05/08/20082008Q385
06/08/20082008Q386
07/08/20082008Q387
08/08/20082008Q388
09/08/20082008Q389
10/08/20082008Q3810
11/08/20082008Q3811
12/08/20082008Q3812
13/08/20082008Q3813
14/08/20082008Q3814
15/08/20082008Q3815
16/08/20082008Q3816
17/08/20082008Q3817
18/08/20082008Q3818
19/08/20082008Q3819
20/08/20082008Q3820
21/08/20082008Q3821
22/08/20082008Q3822
23/08/20082008Q3823
24/08/20082008Q3824
25/08/20082008Q3825
26/08/20082008Q3826
27/08/20082008Q3828
28/08/20082008Q3829
29/08/20082008Q3830
30/08/20082008Q3831
01/09/20082008Q381

It stores the information that calendar month eight (August) is in quarter three a total of 31 times for every year that is held in the table. But that is fine, because these structures are not built to manage transactions; they are built to help us to analyze the data as easily as possible.

Dimension tables have a column containing unique values (in this case CalDate). The data in this column is used to link the dimension table to the fact table (see below).

Each of the remaining columns is there because the users have said that they want to analyze by that attribute. So, in this case, the users wanted to analyze by Year, Quarter, Month and Day. Dimension tables effectively spell out the analysis that the users want to perform.

Fact tables, on the other hand, contain the numerical measures that the users want to analyze. So, a good question at this point is, "Why aren't they called measure tables?". And the answer to that is that fact tables contain more than just measures, they contain measures placed in some kind of context.

Think about it this way — a measure is simply a numerical value. A fact is that same value with context applied to it. We could also say that a fact is a measure anchored in multidimensional space, which sounds wonderfully exotic but is in essence perfectly true.

So, if I tell you we sold three items; that is simply a measure. If I tell you we sold three copies of a CD called "Escape to Wapping" on the 23/08/2008 to Lilly Murray, you now have some context around that measure; so what you now have is a fact.

A fact table might look something like this:

Item NoCustomer IDEmployee CodeOrderDateDispatchDateQuantity
2858511923/10/200829/10/20082
4058511923/10/200829/10/20084
13058511923/10/200829/10/20084
6847031623/10/200823/10/20082
7347031623/10/200823/10/20082
3135816223/10/200810/11/20082
4135816223/10/200810/11/20082
5735816223/10/200810/11/20083
12235816223/10/200810/11/20082

We can hook both OrderDate and DispatchDate back to the time dimension so that we can analyze these facts over time. So, for example, if we wanted to know how many items had been sold during August 2008 we could scan down the time dimension table looking for the value eight in Cal month and the value 2008 in Cal year. That would return 31 rows with 31 unique date values. We could then query the OrderDate column in the fact table, identify all the appropriate rows, and sum the values in the Quantity column.

This may sound a little long-winded. You might argue that we could simply look for the dates in the fact table and find the appropriate ones. There are two reasons why we tend not to do this.

The first is the fact tables can hold quite literally billions of rows; which makes them slow to query. The second is that we would potentially have to perform a very large number of date function calls in order to do this. Of course, the intelligent use of indexing would mean we would not have to perform a full table scan on the fact table and billions of function calls; nevertheless it would be a relatively slow operation.

You'll notice that another way of thinking about the date dimension table is that it is a materialized store of the results of data function calls. This table is also a tiny compared with the fact table — 10 years worth of data results in a table of only approximately 3650 rows.

And, going back to illustrating how the fact table works, if the Customer dimension table looks like this:

Customer IDTitleFirst NameLast NameGender
352MsMaryJohnstonF
353MsLilyDoraF
354MsMoiraMcLeodF
355MrsMargaretWinterbottomF
356MrJamesGallM
357MrsMayKeithF
358MrsLilyMurrayF
359MrJasonFergussonM
360MissMaureenMansonF
361MsAngesGaryF
362MrsHelenHealyF
363MsCharlotteMorrisF
364MrEdwardKnightM
365MrsDavinaTurnbullF

You can see that it was, indeed, Lily Murray who bought item 31. And if the Product dimension table is like this:

Item NumClassificationCD ROM Title
19TravelThe Canal
20OtherMother smiles
21HobbyNeckless
22HobbyCollecting lorgnettes
23CarsBentley, the man and his cars
24TravelAqueuos Amsterdam
25OtherWelcome to the party
26TravelSeven go placid in Bangor
27HobbyEcclesiastical embroidery
28WorkAre you following me?
29OtherA chip on the shoulder
30OtherBasso profundo
31TravelEscape to Wapping
32TravelRoss' bush
33WorkShipping for profit
34OtherAn orchard in May
35CookingEating well
36ComputingDatabase vol. 3
37HobbyDentistry for laypersons

You can see that she did indeed buy the CD called "Escape to Wapping."

As to why anyone would want to buy such a CD, sadly history does not relate.




Search and Browse the Expert Answer Center
Search and browse more than 25,000 question and answer pairs from more than 250 TechTarget industry experts.
Browse our Expert Advice

About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides technology professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective purchase decisions and managing their organizations' technology projects - with its network of technology-specific websites, events and online magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Site Map




All Rights Reserved, Copyright 2005 - 2009, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts