This article originally appeared on the BeyeNETWORK.
The Florida K-20 Education Data Warehouse (EDW) was created by legislative appropriation in July 2000. A consulting firm was engaged and the EDW was completed in July 2003 at a cost of $5.73 million. Like many data warehouse projects, it encountered the common pains of scope creep, fuzzy objectives and unclear sponsorship. Shortly after going live, design and architectural problems began to emerge. Data storage volumes were greater than anticipated, expansion of certain subject areas was difficult, unexpected data volatility made answers hard to interpret and users were unable to access data marts or queries as the front end tools collapsed under the quantity of data retrieved. In December 2003, work began on version two of the EDW.
The first step in the EDW redesign was to address the architectural issues. Although there was an information technology (IT) architecture in place, it primarily addressed technical issues such as ETL (extract-transform-load) processes and database structure. A more robust architecture is required. In order to accomplish this, the architecture must meet certain requirements:
- Provide a long-term vision that is responsive to change. In order to allow long-term implementation, the architecture must be capable of adapting to future changes without major redesign. Individual components may change but the architecture needs to maintain a stable structure.
- Allow for phased implementation over time. The architecture cannot use a "Big Bang" approach. It must be an evolutionary approach allowing building components as time, resources and funds permit. It must allow for measurable progress towards goals.
- Affordable. An architecture that can never be built is useless. Care must be taken in spending public funds wisely.
- Use standard technologies. It must rely on components that are available and stable.
- Support multiple types of processing. It must support transactional, DSS (decision support system), exploratory, mining and future types of processing.
- Support multiple types and levels of security. While corporations almost never design their information architecture to support data sharing, government agencies must. In doing so, the government must be concerned with maintaining privacy and confidentiality while also sharing information that may reveal security threats.
- Provide for integration and reusability of data and components. Government systems are well recognized as highly stovepiped. This institutionalized redundancy comes at a very high price. Although few agencies would consider themselves individually at great risk, the events of Sept. 11, 2001, highlight just how high the cost of data fragmentation can be. As long as stovepipe data and systems exist, government information will never be as efficient as it could be.
- Vendor neutral. Certain classes and types of technology are specified, but for an architecture to survive over time, it must not rely on any single vendor.
The only data warehouse architecture that meets these requirements is Bill Inmon’s Government Information Factory (GIF). Using GIF as a basis for design, the EDW Architecture is shown in Figure 1.
There are five key components to the EDW Architecture: the operational environment (yellow), the information environment (green), the volatile reporting environment (blue), the communications environment (light brown) and the quality assurance environment (red).
The operational environment consists of information systems located throughout the state of Florida at local education agencies (LEA) such as school districts as well as the systems located at the Department of Education (DOE) in Tallahassee. The data at the LEAs is consolidated by the DOE. This data is extracted into a staging area from more than 25 systems at DOE and represents 67 school districts, 28 community colleges and 11 state universities, as well as central systems controlling such things as state-wide facility management and teacher certifications.
The information environment contains the core components of a data warehouse: normalized database, reporting facilities, data mining and exploration environments and ETL. Not all of these components are currently available, but inclusion of them in our architecture ensures that they are considered in long-range planning and budgeting. The GIF framework allows for this kind of flexibility and completeness.
Volatile Reporting Environment
The kinds of data being tracked in the EDW create numerous interdependencies that affect the load schedule. For example, we cannot add a student’s course schedule until we have loaded the master schedule of course offerings. However, certain high-stakes information requires analysis before it is loaded into the EDW. To accommodate these issues, we have included a volatile reporting environment. This component consists of an operational data store (ODS) in which the volatile information is held and a set of reporting marts, which draw data from the ODS and EDW to provide current and longitudinal/trend analyses.
Education information both draws from and provides vital information to other agencies. We must also be accountable to the citizens of Florida. The communications environment provides those mechanisms for inter-agency, policy maker and public access to education information.
Quality Assurance Environment
The EDW Architecture is concerned with two aspects of quality assurance: the stored information is accurate; and the reported information is interpreted correctly. The first entails a trace-ability and inspection process. The second is controlled through careful design and access to reports and data marts.
We view the EDW Architecture as a master plan for the next five years. Even then, we don’t foresee more than incremental adjustments to keep it functional for many years to come. At the current time, roughly 40 percent of the EDW Architecture has been implemented. Work is actively proceeding on another 10 percent. However, having a roadmap of where we want to go helps keep us focused on how the pieces fit together and lets us evaluate new technologies as they become available. It also helps us define requirements for new components because we have a “big picture” of where everything has to fit and what components it must interact with. We have become firm believers in the power and utility of the GIF as a planning and management tool. Just as you would never build a house without a set of blueprints, you should never build a data warehouse without an information architecture.