This article originally appeared on the BeyeNETWORK.
A sure way to go broke is to sell archival systems. Or so it was twenty years ago. Instead of buying, prospects would nod off and daydream about their golf game or their upcoming vacation rather than listen to a sales pitch about archival data and processing.
Twenty years ago, most organizations were young and really didn’t have much data that needed to be archived. But fast forward to today and you find organizations paying a lot of money to carry old data on high-performance disk storage. This is an expensive practice, and it hurts performance dramatically. Carrying old archival data on high-performance systems weighs them down, much as high cholesterol invites a heart attack.
So today, smart organizations are starting to look at archiving systems with a seriousness not found a decade or two ago. And it is predictable that archiving needs will only grow as systems and organizations move into the future and accumulate more data that needs to be managed.
There are many reasons to archive information. In some cases, laws mandate that data be archived. In other cases, organizations do competitive analysis and research that make extensive use of the archival data they have gathered. And in yet other organizations, the attitude is that the data was so difficult to gather and structure in the first place that they are loath to let it disappear down a rathole. If archival information is ever needed, recreating or recalculating it would be a very onerous exercise. Preemptively saving it is simply easier.
Archiving of data is in everyone’s future.
There are many facets to archiving data. The standard practices of information management that sufficed in an earlier day and age often do not apply to archival data. Indeed, a new book is being written on data management for archival data.
This article addresses just one aspect of the world of archiving data: access to archival data.
Years ago, in my first job out of college, I worked for a large public utility. Being a curious type, I was intrigued by this door that people occasionally went into and came out of. One day, one of my coworkers asked me if I would like to go into this room. I of course said yes.
The room was the archival room. In the room were rows and rows of plastic containers of magnetic tape. It was explained to me that the purpose of the room was to pass the annual audit that was required by the government. Every year, the auditor asked if there was an archival facility. Every year, the auditor was led to this room. And every year, upon surveying the room, the auditor made a check mark on the audit report.
I asked my friend if the data could be used. My friend laughed and said, “Let me show you something.” He opened one of the plastic canisters and out fell what looked like dust. Only it wasn’t dust – it was oxide from the tape. The tape was useless.
So there it was. A wonderful archival facility of data that was a one-way street. Data went in, but nothing ever came out. The archival facility was a joke.
Creating such an archive today is a waste of time and money (as it was then). If you are going to archive data, it needs to be a two-way archive. You need to be able to access and use the data in the archive; otherwise, the archive is a colossal waste.
Now what happens in an archive? Not much. The data just sits there. And what is the greatest fear of a data analyst when it comes to archival data? The greatest fear is having to do a sequential search through all of the archives. With these thoughts in mind, it makes sense to create for the archival environment what can be termed “passive indexes.”
A passive index is an index that is created for archival data that has no known requirements. In standard information processing, indexes are created for known requirements. But in the case of archival data, there often are no known requirements. Thus, passive indexes are created for “just-in-case” requirements.
The idea is to have the passive indexes ready and waiting should someone want to find something in archives. Rather than face the dreaded massive sequential search, the analyst will have passive indexes that will be much easier to search. The passive indexes can point the analyst in the right direction, circumventing the need for a massive sequential search.
And since the archival data is just sitting there all day long doing nothing, creating a passive index essentially costs nothing, but it can save a lot when it comes time to access and use the archival data.
Passive indexes are created according to the perceived most likely path to the data. The designer uses his/her intuition and guesses at the ways the archival data may need to be accessed.
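To make the idea concrete, here is a minimal sketch of a passive index in Python. It is only an illustration under stated assumptions: the record layout, the field names (`customer_id`, `year`, `amount`) and the helper functions `build_passive_index` and `lookup` are hypothetical, and a real archive would sit on tape or disk rather than in an in-memory list. The index maps each value of a guessed-at field to the positions of the records that hold it, so a later search does not have to read the whole archive.

```python
from collections import defaultdict


def build_passive_index(records, fields):
    """Map each value of each chosen field to the positions of matching records.

    `fields` is the designer's guess at the most likely access paths.
    """
    index = {field: defaultdict(list) for field in fields}
    for position, record in enumerate(records):
        for field in fields:
            if field in record:
                index[field][record[field]].append(position)
    return index


def lookup(index, field, value):
    """Return the positions of matching records without a sequential search."""
    return index.get(field, {}).get(value, [])


# Hypothetical archived records, standing in for data held in the archive.
archive = [
    {"customer_id": "C100", "year": 2004, "amount": 120.0},
    {"customer_id": "C200", "year": 2004, "amount": 75.5},
    {"customer_id": "C100", "year": 2005, "amount": 310.0},
]

# Built once, ahead of any known requirement.
index = build_passive_index(archive, ["customer_id", "year"])

# Later, an analyst asks for everything about customer C100.
hits = lookup(index, "customer_id", "C100")
matching_records = [archive[position] for position in hits]
```

If the guess turns out to be wrong and nobody ever searches by `customer_id` or `year`, nothing is lost beyond the idle processing time spent building the index. If the guess is right, the analyst reads only the records the index points to.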
In most shops, there are old computers that are ready to be retired. These computers have had most of the life squeezed out of them and the accountants are through depreciating them. They have long ago been paid for. Rather than discard these machines, what better way to use them than to produce indexes for tomorrow?
Passive indexes, then, are just one way the organization can prepare itself for the future – the world of archival processing.