To continue reading for free, register below or login
To read more you must become a member of SearchDataManagement.com
');
// -->

This is an excellent question because, as we all know, testing is vital in any development project. The bigger the project is, the more important the testing becomes -- and data warehouses are usually large projects.
So the answer is "No, I don't really have any specific information about data warehouse testing."
"What!? It's very important, but you don't know anything about it?! And you claim to be an expert!?"
Ah no, I didn't say I didn't know; I said I didn't have any specific information about it. The reason is that while the design of a data warehouse is not the same as other applications, the testing part is pretty similar. So everything that you already know about software (and hardware) testing is applicable here.
So, let's take a specific example, say, an ETL routine. You write it and then you test it. You send it a small block of typical data. Then a bigger block, then a huge block, then a much bigger block than it would normally see in operational terms. You time all the runs and plot a scalability graph. You look for errors in the output.
Then you feed in a block of data that examines all the edge and boundary effects. What happens if a value which is normally a positive integer is zero? Or negative, or has a decimal? What happens if it is 255 or 1000 or
1000000 or 1023 or 1024 or 65535?
And on and on and on. I do realize that I'm not telling you anything new here, this is all part of standard testing. But that's the point; a data warehouse is simply a mix of large database and large application. So we can test it as such.
So the question is still an excellent one because it highlights the fact that we have to test our data warehouses very rigorously. The good news is that we already have most of the skills inhouse and they can be transferred to the data warehouse. This is why most of the classic books on the subject don't have a huge section on testing. For example, Kimball's classic "The data warehouse lifecycle toolkit" has one page on the subject, and that's in
the chapter on "Building End User Applications".
Incidentally, if you think that those binary boundaries (like 1024, 1024) don't matter anymore in modern software, check out the recent stories about the bug in Excel 2007 with 65,535.....
More on data warehouses
The data warehouse toolkit
Demystifying data warehouse appliances
Data warehouse development: Four strategic steps
Gartner data warehouse DBMS Magic Quadrant 2007: New tools, old mantras
|