Backup Protocols - Best Practices

This paper discusses best practices for backing up data for the purpose of disaster recovery. To begin, two terms must be clearly understood in order to see what a backup protocol is and what it is not.

The term backup means simply to make a copy of data for the purpose of increasing the data's survivability in the face of disaster, either man-made or natural; or to make a copy of data for the purposes of historical record, or archives. When data is replicated across more than one medium (whether the media are of the same type or of different types is irrelevant), its statistical probability of survival naturally increases. This is because, to compromise the data, every medium on which the data exists must be compromised nearly simultaneously.

The more media that the data sits on, the more difficult, and thus unlikely, the loss of the data becomes. Additionally, geographical separation of the media increases the survivability of the data, because no single agent, be it man or nature, can readily reach all of the media to compromise them simultaneously. Backup does not equate to "tape", but to "replicate in numbers" for the purpose of survivability. Tape is just one medium that has been deployed in numbers in the past.
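To make the statistical argument concrete, here is a minimal sketch. It assumes that each copy is lost independently and with the same probability over some period, which is an idealization: copies stored side by side do not fail independently, which is exactly why geographical separation matters. The function name and the sample figures are illustrative only.

```python
def loss_probability(per_copy_failure: float, copies: int) -> float:
    """Chance that every copy is lost, assuming independent, equal failure odds."""
    if not 0.0 <= per_copy_failure <= 1.0:
        raise ValueError("per_copy_failure must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies must fail together, so the individual odds multiply.
    return per_copy_failure ** copies


# Hypothetical figure: a 5% chance of losing any single copy in a given period.
for n in (1, 2, 3):
    print(n, "copies:", loss_probability(0.05, n))
```

Under these assumptions, each additional copy multiplies the chance of total loss by the per-copy failure probability, so the odds of losing everything shrink quickly as copies are added.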

Additionally, backup is a process by which snapshots "freeze" the data in time as a historical record to be examined or put back into service at a later date. With an archive of regular snapshots of working data, one can go back to the point in time of any snapshot to examine old data for any number of reasons.

The term mirror has two primary meanings in the computer world today. First, a mirror can be a web site or FTP site that holds the same data as another site at a remote location, often elsewhere in the world. This distributes servers for people who download data from the Internet, reducing the number of hops a user requires to reach a server. The replication process between one server and another can be automated or manual.

Second, a mirror can be a hard drive subsystem that ensures that when data is written to a drive, it is written to at least one additional drive in the mirror configuration in real time. What this accomplishes is not data survivability, but data availability. You couldn't call it data survivability, because if someone maliciously deleted a file, it would be deleted simultaneously from all drives in the mirror configuration. Such mirror systems offer no protection against malicious or erroneous loss of data. The mirror simply ensures that more than one drive is always available to deliver data; should one of the drives suddenly fail for any reason, another is immediately ready to take over the role of delivering data in real time.

This last point is very important to keep in mind. Data survivability is accomplished by replicating snapshots of data, whereas data availability is accomplished by replicating the real-time flow of data. This paper discusses only backup of data for the purposes of survivability, not the mirroring of data for the purposes of availability.

In this context, mirrors may support, relax, or facilitate backup processes in special cases, but they have no role in the processes themselves. It is in this context that The Hanalei Company offers CloneThat as the primary backup tool for any personal or business use. CloneThat is not a mirror, but a complete Windows application backup tool that makes snapshots of data, freezing and replicating them for the purposes of survival and archival storage.

Tapes, Hard Disks, and DVDs

Backup protocols have centered on tape drives for many years. Tape drives still fill an important need when thousands of gigabytes must be stored offsite on a regular basis, or more than a few gigabytes require regular archiving. However, with the price of hard disks falling, and the ubiquitous DVD writer becoming ever more popular, tape drives are becoming less attractive as a replication medium.

The primary problem with tape systems is their proprietary nature, which springs from their specialized purpose. While many tape drives may use similar media, one can never be sure that a tape will read back on some other manufacturer's tape system. This creates the need to always have a working tape system in house.

But what if the tape drive meets with the same disaster that the data encounters? Can the tape drive be replaced with another system that can read the tapes that hold the critical data? Even if it can, how long will it take to get access to the data? That is, how long will it take to rebuild an equivalent tape system? And does your company have in house the expertise to assemble a working and compatible tape drive system in little time?

Let’s take our favorite scenario: your worst nightmare. You come to work and find every server destroyed. Not only are the hard disks gone, but the servers won’t boot either. What do you do?

If you had backed up your data onto a tape system, you would have to buy a PC, then the same tape system (if it isn’t obsolete), get the tape drive software (if it is still available), and then go through the unique and sometimes difficult process of installing both the operating system and the tape drive software. If you did each step correctly, you would then be able to get your data off the tape. Otherwise, you would have to keep at it until you got it right.

If you had backed up your data onto a hard disk, you would simply need to buy a PC, install the hard disk, and power up the machine to get to your data.

DVDs are similar to hard disks in that it is relatively painless to buy a PC with a DVD reader, and you are back in business in minutes. You don't even have to install a DVD as you do a hard disk. And like tapes, DVDs offer permanent, historical records of data snapshots for long term archiving. And with the proper software, a block of data can be written across multiple DVDs, making the 4.7 GB limit on DVDs a non-issue.
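As a rough illustration of spanning a block of data across several discs, the sketch below works out how many DVDs a backup needs and which byte range lands on each one. It assumes the nominal 4.7 GB single-layer capacity counted in decimal bytes; usable space on a real disc is somewhat lower, and the function names are hypothetical rather than taken from any product.

```python
DVD_CAPACITY_BYTES = 4_700_000_000  # nominal 4.7 GB; real usable space is lower


def discs_needed(total_bytes: int, capacity: int = DVD_CAPACITY_BYTES) -> int:
    """Number of discs required to hold total_bytes (rounded up)."""
    if total_bytes < 0 or capacity <= 0:
        raise ValueError("total_bytes must be >= 0 and capacity must be > 0")
    return (total_bytes + capacity - 1) // capacity


def split_plan(total_bytes: int, capacity: int = DVD_CAPACITY_BYTES):
    """Return one (offset, length) pair per disc, covering the whole block."""
    discs_needed(total_bytes, capacity)  # reuse the argument validation
    plan = []
    offset = 0
    while offset < total_bytes:
        length = min(capacity, total_bytes - offset)
        plan.append((offset, length))
        offset += length
    return plan
```

For example, a 10 GB block of data would be split into three ranges under these assumptions, the last disc being only partly filled.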

The choice of backup media depends upon the amount of data and the underlying needs of the business to protect the data. For example, only tapes are a practical backup medium for thousands of gigabytes of data. But for one or two hundred gigabytes, a hard disk may serve as a more efficient and reliable rotational replication medium. For just a few gigabytes, a DVD would be the best of all worlds: a reliable, ubiquitous medium that offers immediate access to data with off the shelf PCs and no hardware alterations.

Remote Storage

If a natural or man-made disaster were to hit your place of business and take out your servers (e.g., flood, fire, tornado, or theft), and you had stored your backup media near your servers, you would run the risk of your backup media meeting the same fate as your servers. Even in a fireproof safe, the data could still be damaged or compromised by someone intent on destroying the company’s jewels.

Keeping your data in a geographically disparate environment helps ensure that the backup data will not meet the same disaster as your servers. By putting distance between copies of the same data, you increase the survivability of your company's critical data. One practice is to have a copy of the critical data kept geographically far from the origin of the data – the servers from which they come – in an offsite server's hard disk or in a tape vault.

CloneThat provides the means to simultaneously write any number of source files to multiple destinations, some of which can be on very remote servers through the use of UNC paths. Thus, a single backup task (a snapshot) can be replicated to a local server over the company LAN and to a remote server at the same time.

Local Storage

While remote storage increases the probability that critical data survives a disaster, local copies reduce the time needed to recover from one. In a small network, several servers can hold replicas of the data while being disparately located in different buildings across the company's campus. If a disaster were localized and affected only one server, the other servers holding the mission critical data could immediately take over for the downed server. The down time for some of these scenarios is measured in seconds or minutes, rather than hours or days.

As mentioned before, CloneThat provides simultaneous replication to multiple destination drives across a LAN for each backup task.

Chain of Trust

Best practices do not trust any one person with the company’s jewels (unless the company is so small that the owner, or someone with a similarly strong interest in the well being of the company, takes charge of the data). No single IT person should have exclusive access to and control of data whose loss can destroy a business; instead, responsibility for the well being of such data is shared among several employees. This safeguards against malicious intent and also reduces the risk of accidental failures.

One way to eliminate exclusive access to mission critical data by any one employee is to back up the data to a remote location out of reach of the employee. A remote backup service provider introduces neutral parties into the equation, preventing any one employee from dominating access and control of mission critical data. But these services can be expensive.

CloneThat provides the means to replicate data simultaneously to multiple remote servers, each of which can be managed by or placed under the care of a different individual, so that the data itself is kept safe by the sheer number of trusted and independent individuals.

Fully Automated And Self Testing

Best practices for a backup protocol include the following:
  • A backup protocol must be automated wherever possible, in backup, in test recovery, and in the reporting of all anomalies, to minimize the occasional human error.
  • When encryption or compression is applied to the data, the process must be reversed to verify that the recovered data decrypts or decompresses properly. A byte-for-byte comparison against the original is required to ensure that the data did indeed recover as expected.
  • After data is written to the media, it should be read back to ensure that it is perfectly accurate and can in fact be read from the media. Here too, a byte-for-byte comparison is required.
  • The process should immediately alert the operator(s) to the loss of any file that was present in the previous backup.


The last of these points is absolutely key to any backup protocol. Imagine that a file is missing from your hard disk. Imagine that this file is critical, but seldom used. Imagine that you have been backing up this hard disk for four or five months now without realizing it was missing. All your efforts are down the tubes, because you won't be able to recover the data, as it wasn't around to be backed up. And you never knew it went missing!

Since the backup process does not know what files should correctly exist, it must assume the worst. Yet, how many backup applications or automated tools will clue you in on a file that has gone AWOL? A backup operator can run a backup protocol without any awareness that a file has been missing for months and that every backup image since its deletion has failed to include it, because it wasn’t present to be copied! This type of damage is a silent disaster that may go unnoticed for months before someone happens to discover the problem by mere chance. The result of this damage is a collection of compromised backups that cannot, on their own, fully restore the original data in the event of a disaster.
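The checks described in this section lend themselves to automation. The following is a minimal sketch, not a description of how CloneThat or any other product works, of three of them: reading a replica back and comparing it byte for byte, reversing compression before comparing, and reporting files that were present at the previous backup but are missing now. The function names are hypothetical, gzip merely stands in for whatever compression a backup tool actually uses, and the manifest is assumed to be a plain set of relative paths.

```python
import gzip
import os

CHUNK = 1024 * 1024  # compare one mebibyte at a time


def streams_identical(a, b) -> bool:
    """Byte-for-byte comparison of two binary streams opened for reading."""
    while True:
        block_a = a.read(CHUNK)
        block_b = b.read(CHUNK)
        if block_a != block_b:
            return False
        if not block_a:  # both streams ended at the same point
            return True


def replica_matches(source: str, replica: str) -> bool:
    """Read the replica back from its media and compare it with the source."""
    with open(source, "rb") as a, open(replica, "rb") as b:
        return streams_identical(a, b)


def compressed_replica_matches(source: str, gz_replica: str) -> bool:
    """Reverse the compression, then compare the result with the source."""
    with open(source, "rb") as a, gzip.open(gz_replica, "rb") as b:
        return streams_identical(a, b)


def list_files(root: str) -> set:
    """Relative paths of every file under root: the manifest for one backup."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_since(previous: set, current: set) -> list:
    """Files recorded at the previous backup that are absent now."""
    return sorted(set(previous) - set(current))
```

A backup run would save the result of list_files after each snapshot and, on the next run, alert the operator if missing_since returns anything. Not every missing file is a problem, since files are deleted on purpose too, but the operator should be the one to decide that.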

Securing Data Properly

NTFS volumes offer access control - a form of security - for files and folders through the NTFS security subsystem. This subsystem will only grant access to files or folders to those accounts that have been given permission to access them, and then only those types of access that their permissions allow.

For example, an HR employee may have read, write, and delete access to employee records, while management may have only read access, but the IT person performing the backup of the employee data has no access to the data at all. In fact, the security can be set so that the IT person cannot even detect the presence of those files on the server!

This restriction against the IT backup operator must be enforced at all times, even during the backup process. A backup protocol must never require that access controls be weakened in order for it to be performed properly.

NTFS volumes also provide encryption based upon user accounts. Encryption is used to maintain the confidentiality of data, and confidentiality of data includes the confidentiality of the encryption key. Thus, the backup process must never have access to the key, and so should never be able to acquire the data from the encrypted file. A proper backup protocol maintains the encrypted state of the data and restores it in exactly the same form in which it was retrieved.

(CloneThat version 1 does not properly maintain NTFS access control or NTFS encryption for folders and files. If such features are required, then another solution should be sought. However, CloneThat version 2 does maintain both NTFS access control and NTFS encryption, will be freely available to licensed users of CloneThat version 1, and is due to be released Q2 2007.)

Snapshots

A snapshot is a copy of a file at a moment in time. However, the copy process, like that of taking a picture with a camera, actually takes time, even if that amount of time is measured in milliseconds. Like a camera shutter, the process of copying a file has a window of vulnerability where the data could be changed during the backup process.

With a camera, a child may be in motion, compromising the picture by introducing blurring into the image. Likewise, a file can be changing during the snapshot process, making the data unpredictable and unstable, compromising the backup process. There are several scenarios involving changing, unstable data that we need to address.

First, a file could be changed between the time that the file indicators are measured and the time that the file is actually opened for reading. In this case, the contents are even more up to date for backup, but the file is not necessarily scheduled to be copied if the indicators show that an old and already copied image exists. While CloneThat does not have a feature to handle this scenario, it is not considered significant, for the data is newer than the replicas at only one given point in time. (No backup software is omniscient; each has a “point in time” at which it measures any given file. This is typical. In the end, the file will be replicated the next time the snapshot is run. To update a replica by events requires driver level mirroring software. What is important to understand is that the data is stable when the file is open for replicating.)

Second, a file could be changed during the process of reading it for backup purposes. In this case, a simple choice to lock the file for backup will prevent another application from sharing it with the backup software. CloneThat has such a feature on the source point dialog pane called Don’t Share File Access.

It is reasonable to ask, "Why have an option to lock at all? Why not just lock every time?" The answer is that in some special cases, the backup software may have no other choice but to share a file with another process, one that keeps an important file open. This is a constraint placed upon the backup protocol by the prevailing needs of the system, and is no reflection on the backup process in the general sense.

Third, a file could be changed between the time it is locked for replication to one destination point and the time it is locked for replication to another destination point. In this case, the file actually is replicated twice, but the two replicas are different, the latter reflecting the most up to date image. CloneThat does not suffer from this issue within a single task, as each source file is opened once and replicated simultaneously to any number of destination points.
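The idea of opening each source file once and feeding every destination from that single read can be sketched as follows. This is an illustration of the general technique only, not CloneThat's actual implementation; the function name is hypothetical, and the destinations are assumed to be ordinary file paths (on Windows they could equally be UNC paths to remote servers).

```python
from contextlib import ExitStack


def replicate_once(source, destinations, chunk_size=1024 * 1024):
    """Read the source a single time and write each chunk to every destination.

    Because every replica is fed from the same read, all replicas hold the
    same image even if the source changes after the copy has finished.
    Returns the number of bytes read from the source.
    """
    total = 0
    with ExitStack() as stack:
        src = stack.enter_context(open(source, "rb"))
        outs = [stack.enter_context(open(d, "wb")) for d in destinations]
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            for out in outs:
                out.write(chunk)
            total += len(chunk)
    return total
```

The alternative, copying the source to each destination in turn, reopens the source for every replica and so reintroduces the window in which the replicas can diverge.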

Finally, a file could change between the time that the file is replicated and its indicators are cached. However, CloneThat eliminates this problem by recording the indicators when the file is opened for copying. If files are locked for copying, then the indicators will always reflect the actual file being copied.
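Recording the indicators from the open file, rather than from its path before or after the copy, can be sketched as below. The paper does not say which indicators CloneThat uses, so this example assumes file size and modification time; the function names are hypothetical. Note also that this sketch does not lock the file, so the guarantee described above holds only if nothing writes to the file while it is being copied.

```python
import os
import shutil


def copy_with_indicators(source, destination):
    """Copy a file and return the (size, mtime) taken from the open handle."""
    with open(source, "rb") as src, open(destination, "wb") as dst:
        # fstat queries the file we actually have open, not whatever the
        # path happens to refer to at some earlier or later moment.
        info = os.fstat(src.fileno())
        indicators = (info.st_size, info.st_mtime_ns)
        shutil.copyfileobj(src, dst)
    return indicators


def needs_copy(source, cached_indicators):
    """True if the file's current indicators differ from the cached ones."""
    info = os.stat(source)
    return (info.st_size, info.st_mtime_ns) != cached_indicators
```

On the next run, a file is copied again only when needs_copy reports that its indicators no longer match those cached from the previous copy.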

Best practices would include scheduling a snapshot to occur when the data is stable and unchanging. If the data is never stable long enough for a snapshot, then a true mirroring solution may be necessary to provide a frozen copy to make a snapshot from. If the mirror can be disabled while yielding a stable and valid copy to clone from, then CloneThat or any other "above driver" solution can be used to clone the stable copy. But if disabling the mirror leaves the copy in an invalid state (e.g., a file may be only partially updated, or the state of the file system may literally be invalid), then the copy is useless, even though it is stable.

In general, for snapshots, the data must be stable and entirely valid during the replication process. Anything less will produce a backup image from which a valid or meaningful restoration is not possible.

Snapshots are precisely what the classical tape backup performs. When a tape backup software application is run, it copies files on a disk to the tape. That data must be stable during the copying process for the image on the tape to be meaningful. Once again, it does not matter if the copy is made to a tape, another hard drive, a DVD, a CD, or any other medium. What matters is that the data is replicated, increasing the survivability of the data itself.

Rotational Versus Archival Media

Tapes are used as both rotational media and archival media. DVDs are exclusively used as archival media, while hard disks are nearly always used as rotational media.

In a typical backup protocol, some tapes are assigned as archival tapes and some as rotational tapes. For example, a backup is made every week, but the backup for the last week of the month is made to a tape that is taken out of the rotation and stored for permanent archival purposes. All the other tapes used throughout the month rotate to the next month, maintaining a high degree of recent information (data no more than one week stale) while reducing the total cost of tapes.
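The weekly scheme just described can be expressed in a few lines. This is a sketch under stated assumptions: backups are taken on the same weekday every week, the last such backup in each month goes to an archival tape, and the earlier ones go to reusable rotational tapes. The function names and label formats are invented for the example.

```python
from datetime import date, timedelta


def is_archival_backup(backup_day: date) -> bool:
    """True if this is the last weekly backup that falls in its month."""
    return (backup_day + timedelta(days=7)).month != backup_day.month


def tape_label(backup_day: date) -> str:
    """Pick the tape for a weekly backup taken on backup_day."""
    if is_archival_backup(backup_day):
        # Archival tapes leave the rotation, so each gets a unique label.
        return f"ARCHIVE-{backup_day:%Y-%m}"
    # A month holds at most five weekly backups and the last one is
    # archival, so four rotational tapes are enough; they are reused
    # in the same order the following month.
    week_in_month = (backup_day.day - 1) // 7
    return f"ROTATE-{week_in_month + 1}"
```

With this scheme a shop buys four rotational tapes once, plus one new archival tape per month.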

Using large disks in backup servers, daily rotations can be made automatically using CloneThat. Using DVDs, archival replications of data can be made once a week, once a month, or at whatever interval your company decides is appropriate for its needs. That said, tapes have very little value as a backup medium these days apart from the sheer quantity of data they support.

Due to this important distinction between the various choices of media, CloneThat is targeted specifically at personal use and at small to medium businesses: specifically, where data quantities are small enough that DVDs and hard disks remain a viable alternative to tapes. For archiving, CloneThat can be set up to make a snapshot or a series of snapshots of source points into a single destination point that is then copied to one or more DVDs. For rotational backup, CloneThat can be set up to make several copies over a period (say a month) to several destination points on hard drives across the network.

Of course, you can set up CloneThat in any configuration and use hard disks for archival purposes. CloneThat allows you to add the date and time to the snapshot's destination path, thus making it possible to form a collection of snapshots on a hard disk for archival purposes. (In this case, the archive collection can be pruned to make room for new archives, or the hard disk can be put in storage for an indefinite period.)
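A dated destination path and a simple pruning rule might look like the sketch below. The folder naming format and function names are assumptions made for illustration and are not CloneThat's own; the pruning function also assumes that the archive root contains nothing but snapshot folders.

```python
import shutil
from datetime import datetime
from pathlib import Path

STAMP_FORMAT = "%Y-%m-%d_%H%M%S"  # sorts alphabetically in date order


def snapshot_dir(root, when: datetime) -> Path:
    """Destination folder for a snapshot taken at the given date and time."""
    return Path(root) / when.strftime(STAMP_FORMAT)


def prune_snapshots(root, keep: int) -> list:
    """Delete all but the newest `keep` snapshot folders; return removed names."""
    if keep < 0:
        raise ValueError("keep must be >= 0")
    # Assumes every directory under root is a snapshot named by STAMP_FORMAT.
    snapshots = sorted(p for p in Path(root).iterdir() if p.is_dir())
    doomed = snapshots[:-keep] if keep else snapshots
    for folder in doomed:
        shutil.rmtree(folder)
    return [folder.name for folder in doomed]
```

Because pruning is destructive, a cautious setup would run it only after the newest snapshot has been verified.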

Summary

As we have seen, the backup process itself has the very specific goal of replicating data in order to increase the statistical probability of data survival in the face of disaster, and can include options such as archiving or rotational replication to achieve variations of the replication goal. In each case, CloneThat provides the necessary features in a flexible, user friendly fashion to suit your choices and needs.

Take a test drive of CloneThat today and see for yourself how easy, flexible, and powerful the backup process can become.