So the answer is "No, I don't really have any specific information about data warehouse testing."
"What!? It's very important, but you don't know anything about it?! And you claim to be an expert!?"
Ah no, I didn't say I didn't know; I said I didn't have any specific information about it. The reason is that while the design of a data warehouse is not the same as other applications, the testing part is pretty similar. So everything that you already know about software (and hardware) testing is applicable here.
So, let's take a specific example, say, an ETL routine. You write it and then you test it. You send it a small block of typical data. Then a bigger block, then a huge block, then a much bigger block than it would normally see in operational terms. You time all the runs and plot a scalability graph. You look for errors in the output.
Then you feed in a block of data that examines all the edge and boundary effects. What happens if a value which is normally a positive integer is zero? Or negative, or has a decimal? What happens if it is 255 or 1000 or 1000000 or 1023 or 1024 or 65535?
And on and on and on. I do realize that I'm not telling you anything new here, this is all part of standard testing. But that's the point; a data warehouse is simply a mix of large database and large application. So we can test it as such.
So the question is still an excellent one because it highlights the fact that we have to test our data warehouses very rigorously. The good news is that we already have most of the skills inhouse and they can be transferred to the data warehouse. This is why most of the classic books on the subject don't have a huge section on testing. For example, Kimball's classic "The data warehouse lifecycle toolkit" has one page on the subject, and that's in the chapter on "Building End User Applications".
Incidentally, if you think that those binary boundaries (like 1024, 1024) don't matter anymore in modern software, check out the recent stories about the bug in Excel 2007 with 65,535.....
More on data warehouses
Dig Deeper on Data warehouse software
Related Q&A from Mark Whitehorn
Here's a guide to primary, super, foreign and candidate keys, what they're used for in relational database management systems and the differences ... Continue Reading
The unstructured data types common in big data systems are often better managed by a NoSQL database than relational software, Mark Whitehorn says. Continue Reading
IT managers should ask cloud providers some pointed questions about the security of data stored in cloud databases, says expert Mark Whitehorn. Continue Reading