This article originally appeared on the BeyeNETWORK.
You have built your archival environment and you have done everything right. You have built many “data vaults.” You have freed the archival data from the originating technology. You have tucked metadata into the data vaults along with the content. So everything is just fine.
Well, sort of.
One day someone comes along and wants help looking for some archival data, saying, “Please find me all the transactions for the past twenty years that have been conducted in Arkansas for less than $10,000.” While this query can certainly be accommodated by your archival data, you realize that a lot of data is going to have to be searched, probably in a sequential manner. The length of time this will take and the amount of resources this will consume send a shiver down your spine.
You formulate your query and set it into motion. You come back at the end of the week and the query appears to be about half finished. In the meantime, management and the end user are beating down your door. You hear the following from them: “We gave you all this money to build a modern archival environment, and now that we want to use it, it takes forever to find anything. I really question whether we are better off today with the ‘modern’ archiving environment than we were yesterday with an inexpensive one. We can’t find anything in either case. So why have we spent all this money?”
Do you take this challenge lying down? Isn’t there a better solution?
After your archival data has been loaded, it usually just sits there for a long while. Then, one day, someone wants to use it and you are caught flat-footed. You should have considered the creation of passive indexes.
A passive index is an index that is built while no one else is using the archival data, often without any particular requirement shaping its creation. For archival data, there are usually large blocks of time when no one is using, or is even interested in, the archival data.
You build a passive index by finding one or two old servers that aren’t being used. Perhaps your accountant has fully depreciated the machines and they are waiting for the junk pile. You rescue these machines and bring them into your archival environment. Then, build indexes on anything that looks likely for a future search – anything that looks like it remotely might be used as search criteria. With a passive index, there may be no requirements that are known today for the index – but you build the passive index anyway.
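As a rough illustration of building indexes on anything that might be searched later, the sketch below makes one pass over a set of archived records and notes, for every field and value, which records contain it. The record layout, the field names and the `build_passive_index` function are assumptions made up for this example; they do not come from any particular archival product.

```python
from collections import defaultdict


def build_passive_index(records):
    """Index every field of every record: (field, value) -> record positions.

    `records` is a hypothetical in-memory list of dicts standing in for
    archive files; a real archive would be read from storage in batches.
    """
    index = defaultdict(list)
    for position, record in enumerate(records):
        for field, value in record.items():
            # No known requirement drives this: every field is indexed.
            index[(field, value)].append(position)
    return dict(index)


# Hypothetical sample of archived transactions.
archive = [
    {"state": "Arkansas", "year": 1998, "amount": 7500},
    {"state": "Texas", "year": 1998, "amount": 12000},
    {"state": "Arkansas", "year": 2004, "amount": 25000},
]

passive_index = build_passive_index(archive)
print(passive_index[("state", "Arkansas")])  # positions of the Arkansas records
```

Because nothing is known about future queries, the sketch indexes every field rather than guessing which ones matter. That costs time and space, which is acceptable here because the work runs on idle machines while the archive is unused.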
No one misses the discarded machines – management has already mentally discarded them. No one knows that you are building a set of indexes for your archival data since no one is doing anything with the archival data anyway. You have a lot of time to build these passive indexes.
But now, when someone wants you to find something in the archival environment, you stand a good chance of being able to find it quickly with your passive index. It is much more efficient to search the passive index than it is to do a sequential search of your archive files.
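To make the difference concrete, the sketch below answers a request like “transactions in Arkansas for less than $10,000” in two ways: by touching every record, and by consulting indexes prepared ahead of time. The sample data, the index structures (`by_state`, `by_amount`) and the function names are all hypothetical, chosen only to illustrate the idea.

```python
import bisect

# Hypothetical archived transactions: (state, amount in dollars).
archive = [
    ("Arkansas", 7500),
    ("Texas", 12000),
    ("Arkansas", 25000),
    ("Arkansas", 950),
    ("Ohio", 4000),
]


def sequential_search(records, state, limit):
    """Touch every record: the search a passive index is meant to avoid."""
    return [i for i, (s, amount) in enumerate(records)
            if s == state and amount < limit]


# Passive indexes, assumed to have been built while the archive sat idle.
by_state = {}
for i, (s, _) in enumerate(archive):
    by_state.setdefault(s, set()).add(i)
by_amount = sorted((amount, i) for i, (_, amount) in enumerate(archive))


def indexed_search(state, limit):
    """Combine the state index with a binary search on the amount index."""
    # (limit,) sorts before any (limit, i), so the cut excludes amount == limit.
    cut = bisect.bisect_left(by_amount, (limit,))
    under_limit = {i for _, i in by_amount[:cut]}
    return sorted(by_state.get(state, set()) & under_limit)


print(sequential_search(archive, "Arkansas", 10000))
print(indexed_search("Arkansas", 10000))
```

Both calls return the same record positions. The indexed version reads only the index entries that can match instead of the whole archive, which is where the savings come from when the archive holds twenty years of data.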
There is no guarantee that you will be able to satisfy every request, but the ability to satisfy many requests for archival data is a big step forward in the end user’s satisfaction with the archival environment.
The worst-case scenario for the analyst is to have to face a sequential search of the archival environment. Passive indexes at least give you a chance of avoiding those kinds of searches.