This article originally appeared on the BeyeNETWORK.
Everyone knows that there are different forms of storage: online disk storage and sequential storage. Everyone knows that sequential storage can hold vastly more data than disk. And everyone knows that disk storage allows direct access to data and can therefore support online systems.
Now – knowing these things – when does it make sense to place data on disk storage and when does it make sense to place data on sequential storage? In other words, where do you draw the line between the placement of data on the two different forms of storage?
The simple formula that a lot of organizations use is to say that all data that is more than five years old goes on sequential storage, and all data less than five years old goes on disk storage.
For many organizations, drawing the dividing line between the two types of storage this way works just fine. But even when it works, the approach is crude for the following reasons.
First, much of the data found on disk storage will not be used. The organization is paying a premium price for the data on disk storage that is not being used. Second, some of the data found in sequential storage needs to be used. When this is the case, finding, accessing and sending the needed data to disk storage can be clumsy and is certainly not efficient.
For these two very real reasons, simply picking an arbitrary date to decide where data is stored is fraught with difficulties.
An alternative to drawing an arbitrary line based on date is to use a monitor. Data tracking monitors tell what data has been used. These monitors give the systems analyst a very precise picture of what data is and is not being used. Once the systems analyst understands that, the dividing line of what data is to be placed where can be drawn very precisely. On the one hand, very current data that is not being used does not need to be placed on disk storage. On the other hand, data that is accessed frequently can be placed on disk storage whether the data is one year old or 10 years old. With a data monitor, there can be a very high degree of precision as to where the line needs to be drawn for the placement of data.
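The difference between the two placement rules can be made concrete with a small sketch. It is illustrative only: the `Partition` record, the 90-day access count, the partition names and the thresholds are all assumptions made for the example, not the output format of any particular monitor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Partition:
    name: str
    created: date            # when the data was written
    accesses_last_90d: int   # hypothetical count reported by a data monitor


def place_by_age(p: Partition, today: date, cutoff_years: int = 5) -> str:
    """Arbitrary-date rule: data older than the cutoff goes to sequential storage."""
    age_days = (today - p.created).days
    return "sequential" if age_days > cutoff_years * 365 else "disk"


def place_by_usage(p: Partition, min_accesses: int = 1) -> str:
    """Monitor-driven rule: data that is actually read stays on disk, whatever its age."""
    return "disk" if p.accesses_last_90d >= min_accesses else "sequential"


today = date(2024, 1, 1)
partitions = [
    Partition("orders_2023", date(2023, 6, 1), accesses_last_90d=0),
    Partition("orders_2014", date(2014, 6, 1), accesses_last_90d=420),
    Partition("orders_2022", date(2022, 6, 1), accesses_last_90d=75),
]

for p in partitions:
    # Print both decisions side by side to see where the rules disagree.
    print(p.name, place_by_age(p, today), place_by_usage(p))
```

In this made-up data the two rules disagree on the first two partitions: the recent but unused data would leave disk under the usage rule, and the ten-year-old but heavily read data would stay on it.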
In addition, with a data tracker, the systems analyst can look not only at what rows of data are being accessed, but at what columns of data are being accessed. It is not unusual to find that many columns are simply not being accessed at all, ever. The placement of data on disk storage can exclude these columns that are not being accessed and save huge amounts of disk storage.
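Column-level tracking can be sketched in the same spirit. The column names and the access log below are invented for illustration, and the sketch assumes the tracker can report the set of columns touched by each query.

```python
from collections import Counter

# Hypothetical table definition and monitor output (one set of columns per query).
table_columns = [
    "order_id", "customer_id", "order_date", "amount", "fax_number", "legacy_code",
]
access_log = [
    {"order_id", "amount"},
    {"order_id", "customer_id", "order_date"},
    {"customer_id", "amount"},
]


def column_usage(columns, log):
    """Count how many logged queries touched each column of the table."""
    counts = Counter()
    for accessed in log:
        counts.update(accessed)
    # Columns never seen in the log get a count of zero.
    return {c: counts[c] for c in columns}


usage = column_usage(table_columns, access_log)
unused = [c for c in table_columns if usage[c] == 0]
print(unused)
```

Columns that end up in `unused` are the candidates for exclusion from disk storage. How long a column must go unread before it counts as unused is a judgment call the sketch does not try to make.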
There are different kinds of data trackers. One kind is the tracker that comes with the database management system (DBMS). In most cases, these trackers do an adequate job of answering the question of what data is and is not being used. However, there is a problem with the trackers provided by the DBMS vendors: they require massive amounts of system resources when turned on. In fact, the DBMS vendors recommend that the data trackers not be turned on during peak periods of processing. This is, of course, the precise moment when you want them to be turned on. Thus, the DBMS-supplied trackers are somewhat self-defeating.
The other type of data tracker is the third-party vendor-supplied tracker. The third-party data tracker is built and maintained by the independent community. These trackers work by employing “sniffing” technology. The sniffing technology intercepts the bit-stream passing between the server and the client. This sniffing requires an absolute minimum of resources. The system is truly unaware that anything unusual is going on. There is simply no drag on the system to speak of. Once the bit-stream has been intercepted, it is shuffled off, out of harm’s way. Once out of harm’s way, the bit-stream is reconstructed to show what data is and is not being accessed.
By using sniffing techniques, the third-party vendors have managed to greatly reduce the resources required for data tracking. As a result, the organization is perfectly free to employ the sniffing approach 100% of the time, which is not the case with the DBMS vendors' trackers.
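The reconstruction step can be illustrated with a deliberately simplified sketch. It assumes the intercepted traffic has already been decoded into plain SQL text, and it recognizes only simple single-table SELECT statements with a regular expression. A real tracker would have to decode the database's wire protocol and parse SQL properly, so this is not a description of how any vendor's product works. The statements and names are invented.

```python
import re
from collections import defaultdict

# Matches only the simplest form: SELECT <column list> FROM <table> ...
SELECT_RE = re.compile(
    r"select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def record_access(statements):
    """Build a table -> set-of-columns map from captured SQL text."""
    accessed = defaultdict(set)
    for sql in statements:
        match = SELECT_RE.search(sql)
        if match is None:
            continue  # not a simple SELECT; this sketch ignores it
        table = match.group("table").lower()
        for col in match.group("cols").split(","):
            accessed[table].add(col.strip().lower())
    return dict(accessed)


# Hypothetical statements, standing in for already-decoded captured traffic.
captured = [
    "SELECT order_id, amount FROM orders WHERE order_date > '2020-01-01'",
    "select customer_id from orders",
    "SELECT name FROM customers",
    "UPDATE orders SET amount = 0",
]

seen = record_access(captured)
print(seen)
```

Because the analysis runs on a copy of the traffic, away from the database server, none of this work adds to the load on the DBMS itself.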
It is true that the DBMS vendor-supplied trackers are free (in that they are automatically included with the DBMS software). The problem is that they cannot practically be left on when you need them most. The third-party vendor software is not free, but it performs as needed and has no problem operating when it is needed the most.
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.