Mark Madsen's perspective is informed by his stints as a Unix programmer and extraction-transform-load scripter, yes, but also just as much by his time as a student working a supermarket register, or as a data center janitor -- or as a CIO. ("Being a janitor is better than being a CIO," he joked.) Now, as president of consultancy Third Nature Inc., in Portland, Ore., he is in a unique place to view the movement of big data analytics applications. SearchDataManagement spoke with Madsen recently just before he addressed a Boston Chapter meeting of TDWI. This is part one in a two-part interview. Read part two about new models of processing.
I have heard you say that, in the face of big data analytics, our traditional notions of what information is have become outdated. Could you expand on that?
Mark Madsen: Everything that we have been doing is an outgrowth of the old mainframe reporting mentality. When I say those ideas are outdated, I invite you to look across the board. If you look at BI tools, you see that where they got their start was with early decision support environments that are sort of like the dashboards we have today. Things moved toward reporting.
Today, a BI tool still works like a decision support tool. You specify queries. You get the data back. You make it look pretty on the screen. But it is not substantively different from what we were doing 20 years ago.
Over the years, [business intelligence] has moved from a capability to being framed as a tool. You say, 'I have a BI problem,' rather than 'I have a decision-making problem,' or 'an information problem.'
It's the hammer-and-nails situation, where you have a hammer and you start looking at everything as something that needs hammering. So, we start to take the BI mentality of building queries that generate reports that you display on a screen.
BI became kind of synonymous with dashboards and querying data -- kind of static. But then you moved into analytics, which is not a read-only reporting problem. Analytics was being applied to the collection of statistics, operations research and machine learning -- what I call 'mathy stuff.'
In the last couple of years as the term was catching on in the business community and the tech community, vendors started applying the name 'analytics' to their products. Everybody's doing the same thing. But analytics, to me, always meant the 'mathy' pieces.
Analysis is not a read-only problem; I think that is one of the fundamental, core misunderstandings about data today, and why I say our ideas are outdated. Our idea is from the old [Bill] Inmon definition of about 1990. It is about a read-only repository. Well, that's great when you are doing reporting.
But that's not analysis. Analysis is starting with a hypothesis, seeking information and then finding relationships that take you to other information. Maybe you link things. Maybe you change them. Maybe you summarize them. Then you take that and link it to something else. It is an exploratory problem. It's not just, 'I have a question.' It's, 'I have a hypothesis of what I think is going on, and I think I know what I need to build out a picture of that.'
You've said that much of big data analytics applications is not new conceptually.
Madsen: In the late 1980s, there was the introduction of UPC codes. I worked in a supermarket at the time and remember some people considered [the codes] the mark of the devil. But it did away with laborious nighttime inventory counts. When [the] industry developed market-basket analysis, that was big data. And it drove interest in data warehousing. It was not really the size of the data that was the problem then. That's true now, too. We've been through shifts similar to the big data analytics shift before.