This article originally appeared on the BeyeNETWORK
I absolutely love this time of year – summer is over, football weather is upon us, leaves are changing color (yes, even here in Northern California), and the kids are back in school. One of my favorite things about starting a new school year when I was a kid was reviewing the basics that we’d learned the year before to ease us into the harder materials that loomed before us. My other favorite thing was hearing about what all of my classmates did over the summer. It turns out that a favorite summer vacation for families here in California is a trip to Disneyland – a trip I’ve taken many times with my own family.
By now, you may be wondering: What do going back to school and Disneyland have in common with data warehousing? Join me as I review several basic tenets of data warehousing by taking you on a trip to Disneyland. And in the immortal words of Walt Disney, who in 1955 learned that all our dreams can come true if we have the courage to pursue them, “To all who come to this happy place: Welcome.”
Many of you may not be aware that bank financing for Walt Disney’s dream to build a 330-acre theme park in Anaheim, California, called “Disneyland” was turned down numerous times. However, his passionate determination to obtain financing finally paid off, and Disneyland became a reality in 1955. An even more ambitious plan to purchase 47 square miles of Florida swampland and build Walt Disney World was laughed at by the critics. Again, due to Disney’s determination, the 28,000-acre Walt Disney World entertainment center became a reality in 1971.
Like both Disneyland and Walt Disney World, your data warehouse initiative will be merely a vision without adequate financing and sponsorship. However, achieving the necessary sponsorship and funding can be harder than you imagine, so take a lesson from Walt Disney and find the courage to pursue your goal of turning data into information and knowledge. One of the best means of achieving sponsorship and funding is to educate executive management on the benefits of data warehousing (or enterprise information management in a broader scope). An effective way to do this is by creating a strategy alignment and business case scenario, which should:
- Establish a clear charter, value statement and strategy map for the enterprise information management (EIM) program
- Introduce and integrate best practices to the strategy and plan
- Align the EIM strategy to business goals and objectives
- Align the EIM strategy to technology architectures, platforms, tools and transactions systems
- Produce one or more compelling business cases that quantify increases in revenues or profits and/or decreases in operating costs
What if, upon visiting Disneyland, you found it was just one big rollercoaster – and the line of people waiting to get on seemed miles long? You’d probably turn around and go home. Disneyland thrills and entertains its customers because it is an integrated entertainment center made up of multiple theme parks, with an appropriate infrastructure and architecture, rules and regulations, policies and processes, security, support and maintenance. As with Disneyland, you can’t just build a data warehouse and expect your customers to be thrilled and entertained. To be truly successful, you need the entire theme park, not just one attraction.
Just as the Matterhorn is one attraction within Disneyland, the data warehouse is one component of an overarching framework for enterprise information management. The EIM framework includes the following eight foundational elements, all of which must be in place to allow for an effective, efficient and long-term data warehouse/business intelligence solution that will satisfy and maybe even thrill your customers:
- Strategy/vision – goals and objectives
- Stakeholder and customer communication management program
- Capability blueprint and road map defining end-state requirements and phased plan
- Information and data quality standards, certification and measurement
- Governance – organization, resources, process, policies and controls
- Methodology – processes, workflows and artifacts
- Architecture – technical and data components, structures and layers
- Operations – service levels and support processes
Have you ever noticed that each time you visit Disneyland something new and different is offered? In 2006, for example, the Pirates of the Caribbean ride was taken offline so the characters for Captain Jack Sparrow and Captain Barbossa from the upcoming movie could be added to the attraction’s storyline. In fact, a read through the Disneyland timeline from 1955 to the present shows that something new has been added to the park in almost every single year of its operation over the past fifty years.
Just like Disneyland, your data warehouse is never “done.” Successful data warehouses attract both new users, who drive expanded data breadth, and new uses, which lead to applications dramatically different from those for which the architecture was originally designed. Properly designed data architectures, metadata and transformation layers maximize flexibility. Parameter- and metadata-driven applications enable transformation logic to be reused for a variety of applications, including reading new sources and creating new data marts, aggregations and reports. It is important to set and manage the expectation with executives and business users that the data warehouse will continue to grow and change over time; that steady stream of additions is also what will keep them coming back again and again.
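To make the idea of parameter- and metadata-driven transformation logic a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the transform names, the `SALES_MAPPING` metadata and the column names are all hypothetical, and a real warehouse would normally keep this metadata in a repository rather than in code.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of reusable transformation functions.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "identity": lambda v: v,
    "strip_upper": lambda v: str(v).strip().upper(),
    "to_cents": lambda v: round(float(v) * 100),
}

# Hypothetical metadata: one entry per target column.
# Supporting a new source means adding a mapping like this one, not new code.
SALES_MAPPING: List[Dict[str, str]] = [
    {"source": "cust_nm", "target": "customer_name", "transform": "strip_upper"},
    {"source": "amt", "target": "amount_cents", "transform": "to_cents"},
]


def apply_mapping(row: Dict[str, Any], mapping: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build one target row by applying each mapping entry to the source row."""
    return {
        entry["target"]: TRANSFORMS[entry["transform"]](row[entry["source"]])
        for entry in mapping
    }


def load(rows: List[Dict[str, Any]], mapping: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Apply the same generic logic to every row, driven only by the mapping."""
    return [apply_mapping(row, mapping) for row in rows]
```

Because `load` knows nothing about any particular feed, the same function could serve a second source or a new data mart simply by passing it a different mapping.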
If you’ve ever been to Disneyland on a weekday during spring break, or pretty much any day during the summer months, then you know what it is like on the planet Gideon from Star Trek: The Original Series episode #72 – very crowded (my apologies to you non-Trekkies). And the basics of queuing theory at Disneyland dictate that the more crowded it gets, the longer you wait… and wait… and wait.
Your data warehouse can become a lot like Disneyland on a mid-summer’s day, as the volume of data increases substantially over time and the number of end users grows. To mitigate issues around performance and scalability, enterprise-class data warehouses should be built and tested to “speed up” in proportion to added resources in order to hit shrinking batch windows and/or to “scale up” to handle more data in existing batch windows. Specific techniques can be deployed across hardware, application and database layers to ensure linear scalability is achieved through simple configuration changes. Additional techniques include, but are not limited to, three parallelism types (process, data and pipelining), data reduction (compression, aggregation, variable-length records, etc.), data-sensitive partitioning schemes and indexing.
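As a rough illustration of two of these techniques working together, the Python sketch below hash-partitions rows on a key (a simple data-sensitive partitioning scheme) and then aggregates each partition independently (data parallelism). The function names and the sample column names are hypothetical. Threads are used here only to keep the example self-contained; they stand in for the separate processes, nodes or database partitions that a real warehouse platform would use.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(rows, key, n_partitions):
    """Assign each row to a partition using a stable hash of its key value."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % n_partitions
        parts[idx].append(row)
    return parts


def aggregate_partition(rows, key, measure):
    """Sum the measure per key value within a single partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


def parallel_totals(rows, key, measure, n_partitions=4):
    """Partition the rows, aggregate the partitions concurrently, then merge."""
    parts = partition_by_key(rows, key, n_partitions)
    merged = {}
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = pool.map(lambda p: aggregate_partition(p, key, measure), parts)
        for partial in partials:
            # A given key value always hashes to the same partition,
            # so partial results never overlap and can simply be combined.
            merged.update(partial)
    return merged
```

In this sketch the number of partitions is a single parameter, which is the spirit of scaling through configuration changes rather than redesign.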
Most of you have heard about rides breaking down or accidents occurring from time to time at Disneyland. However, given the sheer magnitude of computerized operations, heavy mechanical equipment and tens of thousands of humans converging every day, it is amazing that breakdowns and accidents are not more prevalent within the theme park. To provide such a high level of service to its guests, Disneyland has engineered its infrastructure and operating procedures to minimize the duration and impact of outages, breakdowns and accidents.
Again, lessons from Disneyland can be applied to the world of large-scale data warehousing. In the arena of terabytes of data, millions or even billions of rows may be processed daily, with 24x7 operations. Supporting hundreds of users while processing feeds around the clock demands stiff service levels, and catching up after prolonged outages can be complicated, if not impractical. In environments of this class, exceptions happen every day, if not every hour (e.g., missing/late feeds, bad data, corrupt disks, processor faults). The necessary response is to engineer for automated fault tolerance, repair and recovery at all levels: hardware, system software, applications and operations. Reliability features that should be implemented in the data warehouse environment include: embedded applications for audit, balance and control; automated job execution and recovery; standardized error-handling routines; automated system monitoring and alarming functions; intelligent placement of redundant layers; and fully decoupled architectures.
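Two of these reliability features, audit, balance and control and automated job recovery, can be sketched in a few lines of Python. This is a simplified illustration, not a production design: `balance_check`, `run_with_retry` and `BalanceError` are hypothetical names, and a real environment would typically rely on its scheduler and ETL tooling for these functions.

```python
import time


class BalanceError(Exception):
    """Raised when the target does not balance back to the source."""


def balance_check(source_rows, target_rows, measure):
    """Compare row counts and a control total between source and target."""
    src = (len(source_rows), sum(r[measure] for r in source_rows))
    tgt = (len(target_rows), sum(r[measure] for r in target_rows))
    if src != tgt:
        raise BalanceError(f"out of balance: source={src}, target={tgt}")
    return {"rows": src[0], "total": src[1]}


def run_with_retry(job, max_attempts=3, delay_seconds=0.0):
    """Run a job, retrying automatically on failure up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # standardized handling: record, wait, retry
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

A load job wrapped in `run_with_retry` and followed by `balance_check` would recover from a transient fault on its own and would refuse to report success if rows or totals went missing along the way.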
Have you ever wondered what goes on “backstage” at Disneyland? Every attraction at Disneyland contains hidden walkways, service areas, control rooms and other behind-the-scenes operations. The only way for a guest to see these areas is to be evacuated from an attraction in the event of a breakdown. While these rare breakdowns can be wearisome, they can also offer illuminating views of the attractions – this I know from firsthand experience of having to walk through most of the Roger Rabbit ride in Toontown when the automated cars broke down. By definition, “backstage” areas are generally off-limits to park guests. This prevents guests from seeing the industrial areas that violate the “magic” of onstage and allows cast members some solace while they work or rest.
To the customers or end users of a data warehouse, a best practice is to make the data warehouse itself a “backstage” environment. End users interact with the data warehouse through a data presentation layer or through business intelligence software that should shield the end user from the complexities of the underlying data warehouse architecture. For example, a business end user should not be concerned with the physical data model, joins, indexes, primary keys or surrogate keys of the data warehouse. Nor should the business end user be expected to understand the transformations that are taking place during data movement from source tables to target tables in the warehouse. These behind-the-scenes operations are the responsibility of the IT department tasked with developing an easy-to-use information access data environment.
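One common way to keep the warehouse “backstage” is a database view that exposes business-friendly names and hides the joins and surrogate keys. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database purely for illustration; the table, column and view names (`dim_store`, `fact_sales`, `sales_by_store`) are hypothetical, and a real presentation layer would more likely live in the warehouse database or the business intelligence tool.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny star schema plus a presentation view on top of it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_store (
            store_sk   INTEGER PRIMARY KEY,  -- surrogate key, hidden from users
            store_name TEXT
        );
        CREATE TABLE fact_sales (
            store_sk     INTEGER REFERENCES dim_store (store_sk),
            amount_cents INTEGER
        );
        INSERT INTO dim_store VALUES (1, 'Anaheim'), (2, 'Orlando');
        INSERT INTO fact_sales VALUES (1, 1250), (1, 750), (2, 500);

        -- The presentation layer: no keys, no joins, business names only.
        CREATE VIEW sales_by_store AS
        SELECT d.store_name AS store,
               SUM(f.amount_cents) / 100.0 AS total_sales
        FROM fact_sales AS f
        JOIN dim_store AS d ON d.store_sk = f.store_sk
        GROUP BY d.store_name;
        """
    )
    return conn


def sales_report(conn):
    """What the end user sees: a simple query against the view."""
    query = "SELECT store, total_sales FROM sales_by_store ORDER BY store"
    return conn.execute(query).fetchall()
```

The end user’s query never mentions `store_sk` or the join, so IT remains free to change the physical model behind the view without disturbing the reports built on it.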
If you’ve ever been to Disneyland, I am sure you have noticed how incredibly clean the park is. It’s fun to watch a little kid chuck his half-eaten ice cream cone at his big sister, only to miss and have it splatter on the ground at the base of Tarzan’s Treehouse. You can practically count the seconds until a Disney “cast member” comes scurrying by with a little broom and dustpan to clean up the mess! Can you imagine how unfortunate it would be if there were no cast members scurrying around sweeping up all of the litter and trash carelessly tossed by thousands of sweet little kids (and some not-so-sweet adults) every day?
Similar to Disneyland, the lack of cleanliness in your data warehouse can make for a very unpleasant experience for your customers. One consistent driver for the information management discipline is achieving and maintaining high levels of data quality or data cleanliness. The costs of bad or “dirty” data are frequently documented and can be quite large. Yet, the corresponding benefits of clean data are also large. Best practices for data quality in your warehouse environment suggest the following:
- Moving to a metadata-driven environment with an awareness of data quality
- Assuring formal, effective data governance processes
- Measuring data quality and understanding the costs of defects
- Building data quality into all ETL and data warehousing efforts
- Applying the right tools for profiling and managing data quality
- Implementing a closed-loop workflow capability to support data quality
- Working to assure consistent reference data (conforming dimensions) across all repositories
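Measuring data quality, one of the practices listed above, can start very simply. The following Python sketch counts rule failures and computes a pass rate per rule; the function name `measure_quality`, the rule names and the sample columns are hypothetical, and dedicated profiling tools go much further than this.

```python
def measure_quality(rows, rules):
    """Return failure counts and pass rates for each named data quality rule.

    `rules` maps a rule name to a function that takes a row and returns
    True when the row passes the rule.
    """
    total = len(rows)
    report = {}
    for name, rule in rules.items():
        failures = sum(1 for row in rows if not rule(row))
        pass_rate = (total - failures) / total if total else 1.0
        report[name] = {"failures": failures, "pass_rate": pass_rate}
    return report


# Hypothetical rules for a customer feed.
CUSTOMER_RULES = {
    "customer_name_present": lambda r: bool(r.get("customer_name")),
    "amount_non_negative": lambda r: r.get("amount_cents", 0) >= 0,
}
```

Tracking these pass rates from load to load gives a first, modest measure of whether data quality is improving and where defects are concentrated.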
Top Ten Ways Data Warehousing is Like Disneyland
In closing, I hope Walt Disney and his Magic Kingdom can serve as sources of inspiration as you embark on your own data warehousing adventures. Now let’s take a look at the top 10 ways data warehousing is like Disneyland – the two have more in common than you might think.
#10: It can take numerous attempts to get the sponsorship and funding necessary to move from vision to reality. Don’t give up on your dreams and goals.
#9: Every time you visit it, there is something new.
#8: The more crowded it gets, the longer you wait.
#7: Building a data warehouse can be like Mr. Toad’s Wild Ride – many scary twists and turns, but in the end, a lot of fun.
#6: What happens behind the scenes should stay hidden from your guests.
#5: The data warehouse/business intelligence solution can serve visitors from many different geographies – “It’s a small world after all.”
#4: Keeping it clean is critical to visitors’ enjoyment.
#3: If you don’t plan well, you can end up spending way more than you budgeted.
#2: If you have a bad experience, you may not go back.
#1: A successful data warehouse can make your company the “Happiest Place on Earth.”