Backups are central to any data protection strategy, but by some estimates more than half of all backups fail, either in whole or in part. When you look at why they fail, the same issues come up again and again. Below are the most common problems that cause backup failure, in decreasing order of frequency.
The first problem is failure of the backup media itself. With tape, guarding against it means following the vendor's directions for handling and storage, replacing tapes regularly and cleaning the drives according to the manufacturer's schedule. It also means discarding any suspect tapes.
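Rules like "replace tapes regularly" and "discard suspect tapes" are easy to apply consistently once each tape's age and usage are tracked. The sketch below is hypothetical. The `Tape` record, the `tapes_to_retire` helper and the `MAX_PASSES` and `MAX_AGE_DAYS` limits are illustrative placeholders, not vendor figures. Substitute the numbers from your own tape vendor's documentation.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical limits: replace with the figures in your tape vendor's documentation.
MAX_PASSES = 200
MAX_AGE_DAYS = 5 * 365


@dataclass
class Tape:
    label: str
    first_used: date
    passes: int            # read/write passes recorded for this cartridge
    suspect: bool = False  # flagged after drive errors or mishandling


def tapes_to_retire(tapes: list[Tape], today: date) -> list[str]:
    """Return the labels of tapes that should be pulled from rotation."""
    retire = []
    for t in tapes:
        too_old = (today - t.first_used).days > MAX_AGE_DAYS
        worn_out = t.passes >= MAX_PASSES
        if t.suspect or too_old or worn_out:
            retire.append(t.label)
    return retire


pool = [
    Tape("A00001", date(2004, 1, 5), passes=40),
    Tape("A00002", date(2005, 6, 1), passes=210),
    Tape("A00003", date(2006, 3, 1), passes=12, suspect=True),
]
print(tapes_to_retire(pool, date(2006, 7, 27)))  # ['A00002', 'A00003']
```

A suspect tape is retired immediately, whatever its age or pass count. This matches the advice to discard any tape you have doubts about.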
Don't assume disk-based backup protects you from media-related failures. While media-related failures are considerably less common with disk than with tape, they still occur.
For example, SATA disk arrays are often used for backup because they cost less, and backups can usually get by with lower-performance systems. However, it's a mistake to equate "lower performance" with "lower reliability." Saving money with backup arrays that lack features like redundant power supplies and hot spare disks leaves data at risk.
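On disk, one way to catch media-related corruption before a restore depends on it is to record checksums when the backup is written and re-check them periodically. This is a minimal sketch that assumes the backup set is a directory of ordinary files. The function names and the JSON manifest format are illustrative assumptions, not features of any particular backup product.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup images don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """Record a checksum for every file in the backup set.

    Keep the manifest outside backup_dir so it is not hashed as part of the set.
    """
    sums = {
        p.relative_to(backup_dir).as_posix(): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(sums, indent=2))


def verify_manifest(backup_dir: Path, manifest: Path) -> list[str]:
    """Return a description of each file that is missing or no longer matches."""
    expected = json.loads(manifest.read_text())
    problems = []
    for rel, digest in expected.items():
        p = backup_dir / rel
        if not p.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(p) != digest:
            problems.append(f"corrupt: {rel}")
    return problems
```

A check like this only detects damage; it does not prevent it. Redundancy features such as hot spares are still what keep a single failed disk from taking the backup with it.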
The best safeguard against human error in backups is to train those involved to follow best practices. Make sure that the people performing backups and restores understand exactly what they need to do -- and what not to do.
It is also a good idea to take people out of the loop as much as possible. Ideally, backups should not require any human action. Be especially cautious of situations where backup isn't part of someone's main duties, for instance someone in a branch office who's been asked to make a backup tape every night.
When the backup software is at fault, the problem is more often misconfiguration than a defect in the software itself. Modern backup software is extremely flexible, which means there are many options to choose from, and choosing the wrong ones can result in incomplete backups or backups that fail outright.
A related problem is that backup configurations are no more static than anything else in a modern storage environment. As resources are added and shifted and priorities change, the list of files to be backed up needs to change as well.
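One way to notice when the backup configuration has fallen behind the environment is to compare what exists on disk with what the backup job selects. This is a minimal sketch that assumes the selection list can be exported as a list of paths. The `uncovered_dirs` helper is hypothetical and is not part of any backup product's API.

```python
from pathlib import Path


def uncovered_dirs(data_roots: list[Path], selection: list[Path]) -> list[Path]:
    """Return top-level directories under data_roots that no selection entry covers.

    A directory counts as covered only if it, or one of its ancestors, is in the
    selection list; a directory with only some subdirectories selected is reported.
    """
    selected = [s.resolve() for s in selection]
    gaps = []
    for root in data_roots:
        for d in sorted(p.resolve() for p in root.iterdir() if p.is_dir()):
            # Path.is_relative_to (Python 3.9+) is True for d itself or anything under s.
            if not any(d.is_relative_to(s) for s in selected):
                gaps.append(d)
    return gaps
```

Run on a schedule, a non-empty result flags new or moved data that the backup job is silently skipping.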
Hardware can also fail gradually. Head drift, for example, produces a particularly nasty kind of failure in tape drives. As a drive ages, its heads slowly wander out of alignment. As a result, other drives can't read its tapes, and the drive itself can't read a tape it wrote some time ago. The nasty part is that the drive can almost always read a tape it has just written, so the tape passes the immediate verification step of the backup process without complaint.
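A drive can read back its own fresh tapes even as its heads drift. One way to expose the problem is to record which drives have successfully read each tape, and to treat a tape as unproven until some other drive has read it. This is a minimal sketch that assumes you already log drive IDs for each write and each successful read. The `TapeHistory` record and the `only_self_verified` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TapeHistory:
    label: str
    written_by: str  # ID of the drive that wrote the current backup
    read_ok_by: set[str] = field(default_factory=set)  # drives that later read it back cleanly


def only_self_verified(history: list[TapeHistory]) -> list[str]:
    """Return labels of tapes never read successfully by a drive other than the writer.

    A clean read by the writing drive alone can hide head drift, so these tapes
    have not yet shown that another drive can restore from them.
    """
    return [t.label for t in history if not (t.read_ok_by - {t.written_by})]


history = [
    TapeHistory("T1", "drive-A", {"drive-A"}),
    TapeHistory("T2", "drive-A", {"drive-A", "drive-B"}),
    TapeHistory("T3", "drive-B"),
]
print(only_self_verified(history))  # ['T1', 'T3']
```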
Network problems are a less prolific source of backup failures. The network, whether LAN or SAN, is used for much more than backup, so its problems tend to become obvious before they can hurt your backups.
About the author: Rick Cook has been writing about mass storage since the days when the term meant an 80 K floppy disk. The computers he learned on used ferrite cores and magnetic drums. For the last 20 years, he has been a freelance writer specializing in storage and other computer issues.
27 Jul 2006