Bill Inmon: Kimball methodology ignores the value of textual data

In an interview, data warehousing pioneer Bill Inmon states his case on the merits of his namesake methodology vs. Ralph Kimball's and discusses the "DW 2.0" version of the Inmon architecture.

Talk to Bill Inmon and it quickly becomes clear that the "Father of the Data Warehouse" has grown a bit weary of the longstanding debate over which is better: his pioneering enterprise data warehouse architecture or the one created by consultant Ralph Kimball, the founder of the Kimball Group. Oh sure, Inmon will talk about it -- because that's what reporters and conference attendees want to hear. But he'd much rather discuss how his methodology has evolved to incorporate things like textual data warehousing capabilities and more. We decided to ask Inmon about both topics. In this interview, Inmon, the author of DW 2.0: The Architecture for the Next Generation of Data Warehousing, talks about why he thinks “Kimballites” -- particularly those who work for Microsoft -- can be a rather myopic bunch. He also discusses the history of the Inmon vs. Kimball debate and explains why he thinks textual data warehousing is the new frontier.

I read an article you wrote which stated that the Ralph Kimball methodology has evolved into something more closely resembling the Bill Inmon methodology. Could you explain what you meant by that?

Bill Inmon: Certainly. When Kimball started out back in the early 1990s, he talked about building data marts. The whole Kimball architecture centered on building data marts. In his books, [Kimball wrote that] a data warehouse is a union of data marts. Kimball now talks about an enterprise data warehouse and integrating data -- something that in the early days was really a taboo subject with [Kimball and his supporters]. Today, they've taken a 180-degree position to where they were 20 years ago.

What do you think is the key differentiator between the two approaches today?

Inmon: When you take a look at what Kimball is talking about today, he talks about building the enterprise data warehouse as an integrated data warehouse, and that is where I started out in 1990. They're now where we were 20 years ago. I have continued to advance my understanding of architecture and the most recent addition to the Inmon architecture has been textual unstructured data. We now have a whole body of thought on how to get textual data into a data warehouse. I predict that 10 years from now the Kimballites are going to discover textual data. They tend to lag 10 to 15 years behind my architecture. I suppose what ticks me off is that when you talk to the Kimballites, they will not acknowledge the existence of the Inmon-style architecture. I never talked with a group of people that are as closed-minded as the Kimballites are. I sure hope that the Inmonites out there are a lot more open-minded.

Could you give an example of a time when you encountered this closed-mindedness?

Inmon: I was invited to go on a [worldwide] tour from Microsoft last year, and Microsoft is the original 'house that Kimball built.' That makes sense from Microsoft's standpoint because for years and years, Microsoft was building data marts. [Now they're] trying to do real data warehouses and they're trying to take the techniques from the past and say that they fit for the future. I was appreciative of the fact that Microsoft invited me to do a tour because I like Microsoft as a company, but man, the people in there. The thing that I found to be the most frustrating is that when you try to use reasoning and rationale and talk to people, for whatever reason, their minds are closed. [I started] to tell the Microsoft people that there is this really important stuff out there called textual data. If you're going to be building databases and data warehouses, you need to be able to start to address the issue of textual data. But of course, Kimball doesn't have a thing to say about textual data, and the people at Microsoft said, 'Oh well, textual data is not important.' But I have news for you: That's the new frontier.

Why do you believe that the Inmon approach has historically been a tougher sell than the Kimball methodology?

Inmon: In the early days, selling data marts was unquestionably an easier sell than an enterprise data warehouse. I'm the first person to point out that when you go about building a data warehouse environment, it's not short and fast and it's not a fast return on investment. However, it is a tremendous long-term corporate return on investment. The Inmon approach has always been a tougher sell because we're selling long-term architecture, not short-term reports.

Could you explain how the latest iteration of your data warehouse architecture, DW 2.0, incorporates unstructured or textual data?

Inmon: If you're familiar with the Inmon architecture, it started out as Data Warehouse, then it morphed into something called The Corporate Information Factory [CIF], and then it morphed into something called DW 2.0. A big part -- not the only part -- of DW 2.0 is the notion that we need to be including textual information in our data warehouse. And I have to tell you, my phone is ringing off the hook in terms of people that are discovering that you can indeed take text and start to do significant and important things with it.

How might this play out in the real world?

Inmon: Corporate contracts. Every corporation has contracts. If you ask an executive, do you have your corporate contracts under control, the executive always says 'yes' because executives are paid to be in control. Now what the executive means is that if it comes down to looking at three or four contracts, then the executive can look at those three or four contracts and can get a lawyer to read them. The problem is when you talk about contracts collectively -- a thousand contracts, 10 thousand contracts or a million contracts. It's about being able to take that body of knowledge and [determine] what the corporation has committed to contractually in terms of expiration dates, liabilities, products and in terms of price points. What you can do with DW 2.0 and something called Textual ETL [is read] the contracts, put them into a relational database and [run queries]. As fast as a SQL query can execute, which of course is in seconds, you can get your answer. That's just one example.
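
To make the contract example concrete, here is a minimal sketch of the general idea Inmon describes: read contract text, pull out a few terms, load them into a relational table and query it. It is not Inmon's Textual ETL product or method. The sample contracts, the regular expressions, the `contract_terms` table and the function names are all hypothetical, and real contract language would need far more robust parsing than two simple patterns.

```python
import re
import sqlite3

# Hypothetical sample contracts, written so that simple patterns can find the terms.
CONTRACTS = {
    "C-001": "Supplier delivers widgets at $12.50 per unit. This agreement expires on 2025-06-30.",
    "C-002": "Licensee shall pay $4,000.00 per year. This agreement expires on 2024-12-31.",
    "C-003": "Services are billed at $95.00 per hour. This agreement expires on 2026-01-15.",
}

# Assumed phrasing: an ISO date after "expires on", and a dollar amount with cents.
EXPIRY_RE = re.compile(r"expires on (\d{4}-\d{2}-\d{2})")
PRICE_RE = re.compile(r"\$([\d,]+\.\d{2})")


def extract_terms(text):
    """Return (expiration date, price) found in one contract, or None where absent."""
    expiry = EXPIRY_RE.search(text)
    price = PRICE_RE.search(text)
    return (
        expiry.group(1) if expiry else None,
        float(price.group(1).replace(",", "")) if price else None,
    )


def load_contracts(conn, contracts):
    """Create the (hypothetical) contract_terms table and load one row per contract."""
    conn.execute(
        "CREATE TABLE contract_terms ("
        "contract_id TEXT PRIMARY KEY, expires_on TEXT, price REAL)"
    )
    rows = [(cid, *extract_terms(text)) for cid, text in contracts.items()]
    conn.executemany("INSERT INTO contract_terms VALUES (?, ?, ?)", rows)
    conn.commit()


def expiring_before(conn, cutoff):
    """List contract IDs expiring before the cutoff date, soonest first."""
    cur = conn.execute(
        "SELECT contract_id FROM contract_terms "
        "WHERE expires_on < ? ORDER BY expires_on",
        (cutoff,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_contracts(conn, CONTRACTS)
print(expiring_before(conn, "2025-12-31"))
```

Once the terms sit in a table, a question such as "which contracts expire before the end of 2025?" becomes a one-line query, whether there are three contracts or a million. ISO-formatted dates are stored as text here because they sort correctly as strings.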

What is one more example?

Inmon: Another example is email. Every corporation out there has email, and in email there is a lot of very important information that is flowing. But how is email used in corporate decision making today? The truth is that email isn't. And email gets to be read once and then that email goes into a pile and that pile effectively is a garbage can. The problem is that there is a lot of really important information that comes through email that ought to be quite useful to the corporation. But it's not just emails. It's not just contracts. There is a huge amount of textual information in the corporation that ought to be used. [People are] discovering that once you put text into a database format, a whole new world opens up to you. I find that world to be really exciting.
