2. If I have 1/2 terabyte of data, can I take the approach of moving first to VSAM, and then move to DB2?
Well, the best advice I can give you is to bypass VSAM altogether if you are going to go to DB2. VSAM and DB2 are very different technologies. VSAM is a file access method and DB2 is a database management system (DBMS).
VSAM, or Virtual Sequential Access Method, is a methodology for the indexed or sequential processing of records on direct access devices. There are three ways to access data in a VSAM file: random (or direct), sequential, and skip-sequential. Random access is enabled using a search argument to directly access data and sequential access is accomplished by processing one record at a time. Skip-sequential access is a combination of random and sequential: the first record is obtained randomly and from there on each subsequent record is processing one after the other. With VSAM though, direct access to data can only be provided using a pre-defined key: there must be a primary key (there can be multiple, alternate keys).
Of course, this simple introduction to VSAM just skims the basics. There is much more that you will need to know if you are using VSAM or considering whether to convert from VSAM to DB2. Sometimes the only motivating factor driving a conversion effort is that VSAM is old technology and DB2 is newer. That is not a very useful criterion for converting. Of course, if you are having trouble trying to hire folks who are knowledgeable about the older technology, then the reasoning makes a little more sense. But I have never heard that line of reasoning with regard to VSAM; it is more common with regard to converting from IMS to DB2.
A better motivating factor for converting from VSAM to DB2 is to take advantage of the facilities and capabilities of a DBMS. Let's examine the core benefits of using a DBMS. Being a DBMS comes with advantages (and overhead). A DBMS is a software package designed to create, store, and manage databases. The DBMS software enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs also are responsible for data integrity, data access control, and automated rollback, restart and recovery.
The main advantage of using a DBMS is to impose a logical, structured organization on the data. A DBMS delivers economy of scale for processing large amounts of data because it is optimized for such operations. Using a DBMS provides a central store of data that can be accessed by multiple users, from multiple locations. Data can be shared among multiple applications, instead of new iterations of the same data being propagated and stored in new files for every new application. Central storage and management of data within the DBMS provides:
- Data abstraction and independence
- Data security
- A locking mechanism for concurrent access with ACID properties (ACID is an acronym for atomicity, consistency, isolation, and durability)
- An efficient handler to balance the needs of multiple applications using the same data
- The ability to swiftly recover from crashes and errors
- Robust data integrity capabilities
- Simple access using a standard API
- Uniform administration procedures for data
Furthermore, a DBMS offers the ability to provide many views of a single database schema. A view defines what data the user sees and how that user sees the data. The DBMS provides a level of abstraction between the conceptual schema that defines the logical structure of the database and the physical schema that describes the files, indexes, and other physical mechanisms used by the database. Users function at the conceptual level — for example, by querying columns within rows of tables — instead of having to figure out how to access data using the many different types of physical structures used by the DBMS to store the data.
When a DBMS is used, systems can be modified much more easily when business requirements change. New categories of data can be added to the database without disruption to the existing system. With DB2, for example, adding a new "field" is as simple as issuing an ALTER statement to add the new column to the table. Performing a similar task in VSAM is much more difficult — especially if the file does not have any unused "filler" area at the end.
A DBMS provides a layer of independence between the data and the applications that use the data. In other words, applications are insulated from how data is structured and stored. The DBMS provides two types of data independence: logical data independence; that is, protection from changes to the logical structure of data –and– physical data independence, meaning protection from changes to the physical structure of data.
As long as programs use the API (application programming interface) to the database as provided by the DBMS, developers can avoid changing programs because of database changes. With DB2, this API is SQL and there is no other approved way of accessing DB2 data. Furthermore, ad hoc access to your data is very, very difficult — perhaps nigh on impossible – when using VSAM alone. Ad hoc access to data in a DBMS is simple, just code up some SQL. This offers better data availability and access to your end users.
Of course, a DBMS must perform additional work to provide these advantages, thereby bringing with it the overhead. A DBMS will use more memory and more CPU than a simple file storage system. But remember, you are accomplishing a lot more with the DBMS.
Now, as to converting your existing flat file system to DB2, any application programs that are already written will require re-write in order for them to work effectively and efficiently with DB2. Any conversion you attempt will disrupt the apple cart, at least somewhat. Flat files are processed record by record and DB2 uses SQL – which operates on data a set-at-a-time. You will need to convert "master file processing" to SELECTs and joins – and that can be difficult for programmers to adapt to.
If you understand DB2 and use it appropriately, its performance will be excellent. If you use DB2 like flat files (or even like VSAM for that matter), application performance will probably stink. Think about it this way: compare a DB2 SELECT of four columns in a clustering index against the application code needed to access the same data by reading the entire flat file (or VSAM file). In such a scenario, properly coded DB2 will undoubtedly outperform properly coded VSAM requests.
Flexibility is another important concern. DB2 is flexible, whereas flat files and VSAM are not. If you do not believe that, then think about what it would take to add an index to existing data. With DB2, you add the index, rebind the program, and DB2 will take advantage of it without having to change any application code. With VSAM and flat you would have to explicitly code requests to use the new index – not very flexible, is it?
Hopefully this short synopsis of VSAM and DB2, along with the advantages of the database approach over the flat file approach, has been helpful. But it is just an overview. Your organization will need to gain expertise in DB2 before you attempt any conversion.
This was first published in May 2006