This article originally appeared on the BeyeNETWORK.
First there were applications. Then the data warehouse was created, and following its development the corporate information factory (CIF) emerged. Following the events of September 11, 2001, the government information factory (GIF) was created.
The GIF is in many ways similar to the CIF. Its basic architecture closely follows the CIF architecture, and there is no question that the two are related. First came the CIF, then came the GIF. There are, however, significant differences between the two architectures.
The first significant difference between the architectures is the need for wide integration of data in government systems. The corporate motivation for a data warehouse extends only to the corporation itself. A corporation builds a system only for its own use.
The government has a different set of problems. Since September 11, 2001, there has been a mandate from Congress and the president to share data among government departments. Within the boundaries of the law, data must be shared among the FBI, Immigration and Naturalization, the CIA and many others. If the United States is to be serious about defeating terrorism, data sharing must become a reality. But over the years the different government organizations have created their own fiefdoms of data. Indeed, politicians as widely diverse as Dick Durbin and George W. Bush have noted the systemic reluctance of government organizations to share data.
The GIF is an architecture designed to address the infrastructure technology of sharing data across the government. The political issues of data sharing remain with politicians.
So the first difference between the CIF and the GIF is the scope of data sharing and integration.
The second difference appears to be mundane: data is retained longer in government systems than in corporate systems. In the corporation, the argument is made that the business was so different five years ago that any data reflecting earlier business conditions may actually be harmful, because old data may be misleading. In practice, a few commercial organizations collect and manage data more than five years old. But most corporations do not.
However, in the government, data has a long life. There are many reasons for the longevity of data in governmental circles. Sometimes keeping the data is mandated by law. I worked on a project for the Army collecting data that went back to the Civil War. But even the smallest agency collects data and holds on to it for a long time. It simply is not true that in government circles data lives for five years or less. It lives longer, and often for more years than anyone could anticipate.
Because data lives longer in the government environment, there are greater volumes of data to be managed than are found in the commercial sector. This means that archival storage and bulk processing of data are very important in government, while these topics may be of lesser importance in the commercial world.
The third difference between architectures for the government and for the commercial world is that of security. In the commercial world, security for data warehouses is not a priority. This is probably an understatement. In truth, very little emphasis is put on the security of a commercial warehouse. The commercial impetus is to get the warehouse up and running and to start using it. Most organizations think of securing the warehouse as an afterthought.
In government circles security is paramount. Because of the nature of the users of the warehouse and relevant legislation, government agencies cannot afford to take a blasé attitude toward security. Security must be built into the architecture from the very beginning when it comes to government data warehouses.
These are the main differences between the GIF and the CIF. There are undoubtedly more differences, but the ones listed here are the most important:
- The need for widespread integration and data sharing well beyond one agency or user;
- The need to accommodate data for very long periods of time to preserve the history; and
- The need for security from the outset of design.
One of the obstacles the GIF faces is the “NIH” syndrome, the “not invented here” syndrome. The government never sponsored the GIF. There is no mandate in any agency’s charter to build systems according to the GIF. The difference between the CIF and the GIF is not insurmountable. Systems integrators recognize that the GIF is protected intellectual property and is not in the public domain.
However, systems integrators should recognize that the GIF does not compete with other architectural approaches found in the government. The other architectural approaches are more “paper and pencil” exercises to make sure that requirements have been gathered completely and properly. When it comes to the nuts and bolts of implementation, that is, how architecture meets technology, there is only the GIF. In this regard, the GIF is complementary to the other government-sponsored architectures. By aligning with the GIF, other government architectures have a roadmap to achieve implementation.
Despite these obstacles, the GIF is seeping into the government consciousness. Large contracts are being awarded that specify alignment with the GIF. If government contractors are resistant to the GIF, their stubbornness melts when contracts with those specifications are awarded.