You don't have to explain data mart proliferation to Clint Johnson. As vice president of data warehousing and business intelligence at Zion Bancorporation, Johnson is all too familiar with the concept.
When so-called power users at Zion wanted to do complex data analysis that wasn't supported by the bank's business intelligence (BI) applications – like predictive modeling for customer retention – Johnson would extract the data from source systems and databases and let power users create their own data marts.
The result was departmental data marts everywhere, largely out of the control of IT, Johnson said. Not only were the data marts and the hardware required to run them difficult to manage, but the marts were disconnected from one another, meaning different groups and departments were often working with different sets of data.
But rather than trying to stop or even slow data mart sprawl, a task he conceded was almost impossible, Johnson decided to take a different approach with a new private cloud computing initiative. Zion is now in the process of implementing Greenplum's Enterprise Data Cloud (EDC), essentially a virtual infrastructure of commodity hardware upon which the bank will deploy and run Greenplum's data warehouse platform.
Once the system is operational, Zion workers will be able to create and dismantle data marts on an internal, or private, cloud via a self-service portal as demand dictates. Irrespective of which department they work in, they will be able to tap into the same consistent data sources on the internal cloud, where IT will also have a one-stop shop for managing the platform. No more data marts living in the shadows, as Johnson puts it.
The more conventional method of data warehouse deployment and management – deploying one large data warehouse backed up by weeks and months of data modeling and cleansing – is too rigid for today's workers, who want fast and flexible access to data for BI and other analytics, according to Werther. In that environment, rather than waiting for IT to create a data mart, workers often take the initiative and create their own, he said. That leads to data mart proliferation, an unmanageable situation for IT.
With EDC, customers get the best of the public cloud model – easy provisioning of data marts, unfettered access to data sources, and centralized management of the cloud itself – without the risk of letting sensitive corporate data outside the firewall, Werther said. And it is economically possible, he said, because, unlike many competitors' offerings, Greenplum's database runs on cheaper commodity hardware rather than proprietary hardware.
"Hardware is very cheap now. You can buy 1,000 cores of servers for under $1 million, much less than buying a Teradata machine," Werther said. Eventually, data warehousing on internal clouds "is going to become, we think, the way of doing data warehousing."
James Kobielus, an analyst with Cambridge, Mass.-based Forrester Research, tends to agree with that assessment. He thinks data warehousing in the cloud, or virtual data warehousing, as he also calls it, is the wave of the future.
"Data warehousing is increasingly moving away from being a discipline with a focus on centralized analytic databases or a single physical node [or enterprise data warehouse] to a more virtualized data warehousing ecosystem, or cloud," Kobielus said. "It's a highly, massively parallel grid of nodes that collectively manage multiple data analytics instances."
Some nodes can focus on data integration functions like extract, transform and load (ETL), others on data cleansing, others on provisioning new data marts, he added. "The idea is that it is very flexible."
Greenplum is in a particularly good position to push the cloud deployment model, Kobielus said. In addition to running on commodity hardware – unlike competitors such as HP's NeoView and Vertica's Analytic Database, for example – Greenplum's database uses massively parallel processing to simultaneously query large data sets, a prerequisite for a virtualized, distributed environment like the cloud.
But EDC will not have the immediate effect of giving IT one internal cloud to manage, he cautioned, because customers will have to standardize on Greenplum. Most organizations operate in heterogeneous environments, with data warehouses and data marts from multiple vendors. "That's just a reality," Kobielus said.
And while Greenplum may have a head start on the field, other data warehouse vendors are likely to join it in the market for data warehousing in the cloud. Just this week, for example, IBM announced its Smart Business cloud portfolio, which will let customers run integrated software and applications in either a public or private cloud, both supported by Big Blue.
While IBM's new cloud portfolio does not currently support data warehouse deployments, "we are definitely looking at the opportunity to deliver those types of services in the cloud" as customer demand increases, said Dennis Quan, director of autonomic computing at IBM.
Microsoft may also be a candidate to eventually begin offering data warehousing on internal clouds. Already, for example, it offers data warehousing in the public cloud built on its own cloud platform, Azure, which debuted last year, and SQL Data Services.
Nevertheless, "the cloud model is still in an embryonic stage" when it comes to data warehousing, Forrester's Kobielus said, and it could be a year or more before it truly begins to mature. "Few vendors have put together a coherent story going forward to help the industry and help users jump to the next plateau of development of data warehousing in a purely cloud environment."
But that assessment hasn't stopped Zion Bancorporation, where Johnson is counting on EDC to reduce maintenance and support costs caused by proliferating data marts and to break down barriers between groups and departments, giving all workers a unified way to find, access and analyze corporate data.
With implementation under way, the bank hopes to have EDC fully deployed by the end of the year. Then, Johnson said, as many as 50 of Zion's "most seasoned analysts" will begin accessing around 4 terabytes of data in the internal cloud and creating manageable data marts of their own.
"We're going to give direct database access to end users and the ability to upload their own data and create their own data warehouses," Johnson said. "It's to give them a place [the private cloud] to work where they can do complicated things without having to remove the data."