Backup Protocols - Best Practices
This paper discusses best practices for backing up data for the purpose of
disaster recovery. To begin, two terms must be clearly understood in order to
see what a backup protocol is and what it is not.
The term "backup" simply means making a copy of data to increase the data's
survivability in the face of disaster, whether man-made or natural, or making a
copy of data as a historical record, or archive. When data is replicated across
more than one medium (whether those media are of the same type or of different
types is irrelevant), its statistical probability of survival naturally
increases. This is because, to destroy the data, every medium that holds it
must be compromised at nearly the same time.
The more media the data sits on, the more difficult, and thus less likely, its
loss becomes. Additionally, geographical separation of the media further
increases the survivability of the data, because no single agent, be it man or
nature, can reach all of the media to compromise them simultaneously. Backup
does not equate to "tape", but to "replicate in numbers" for the purpose of
survivability. Tape is just one medium that has historically been deployed in
numbers.
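To make the statistical argument concrete, the sketch below assumes that each copy is lost independently, with the same probability p, over some period. Under that assumption the data is lost only if all n copies are lost, so the probability of loss is p to the power n. The independence assumption is exactly what geographic separation tries to protect: copies kept in the same room can be destroyed by one fire, so their losses are not independent. The 5% figure is purely illustrative.

```python
def probability_of_loss(p_copy_loss: float, copies: int) -> float:
    """Probability that every copy is lost, assuming each copy is lost
    independently with the same probability p_copy_loss (a simplifying
    assumption that co-located copies do not satisfy)."""
    if not 0.0 <= p_copy_loss <= 1.0:
        raise ValueError("p_copy_loss must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_copy_loss ** copies


def probability_of_survival(p_copy_loss: float, copies: int) -> float:
    """The data survives as long as at least one copy survives."""
    return 1.0 - probability_of_loss(p_copy_loss, copies)


# Illustrative only: an assumed 5% chance of losing any single copy.
for n in (1, 2, 3):
    print(f"{n} copies: survival probability {probability_of_survival(0.05, n):.6f}")
```

With these assumed numbers, going from one copy to three raises the survival probability from 95% to about 99.99%.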
Additionally, backup is a process by which snapshots "freeze" data in time as a
historical record, to be examined or put back into service at a later date.
With an archive of regular snapshots of working data, one can go back to any of
those points in time to examine old data for any number of reasons.
The term "mirror" has two primary meanings in the computer world today. First,
a mirror can be a web site or FTP site that holds the same data as another site
at a remote location, often on the other side of the world. This distributes
servers so that people can download data from the Internet while reducing the
number of network hops between the user and the server. The replication between
one server and another can be automated or manual.
Second, a mirror can be a hard drive subsystem that ensures that, when data is
written to a drive, it is also written in real time to at least one additional
drive in the mirror configuration. What this accomplishes is not data
survivability, but data availability. It cannot be called data survivability,
because if someone maliciously deletes a file, it is deleted simultaneously
from all drives in the mirror configuration. Such mirror systems offer no
protection against malicious or erroneous loss of data. The mirror simply
ensures that more than one drive is always available to deliver data; should
one of the drives suddenly fail for any reason, another is immediately ready to
take over the role of delivering data in real time.
This last point is very important to keep in mind. Data survivability is
accomplished by replicating snapshots of data, whereas data availability is
accomplished by replicating the real-time flow of data. This paper discusses
only the backup of data for the purpose of survivability, not the mirroring of
data for the purpose of availability.
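The distinction can be illustrated with a toy model. The sketch below represents each drive as a Python dictionary of file names to contents; the `Mirror` class and `take_snapshot` function are hypothetical names for illustration, not any real RAID or CloneThat interface. A drive failure leaves the mirror serving data (availability), but a deletion reaches every mirrored drive at once, while a snapshot taken earlier still holds the file (survivability).

```python
class Mirror:
    """Toy model of a drive mirror: every write and every delete is applied
    to all member drives at the same time (each drive is a dict of files)."""

    def __init__(self, drive_count: int = 2) -> None:
        self.drives = [dict() for _ in range(drive_count)]

    def write(self, name: str, data: bytes) -> None:
        for drive in self.drives:
            drive[name] = data

    def delete(self, name: str) -> None:
        # A deletion, erroneous or malicious, reaches every drive at once.
        for drive in self.drives:
            drive.pop(name, None)

    def fail_drive(self, index: int) -> None:
        # Availability: losing one drive still leaves the others serving data.
        del self.drives[index]


def take_snapshot(drive: dict) -> dict:
    """A snapshot is an independent copy frozen at one point in time."""
    return dict(drive)


mirror = Mirror()
mirror.write("payroll.xls", b"Q1 figures")
snapshot = take_snapshot(mirror.drives[0])

mirror.fail_drive(0)             # a drive failure: the data is still available
still_available = "payroll.xls" in mirror.drives[0]

mirror.delete("payroll.xls")     # an erroneous delete: gone from every drive
print(any("payroll.xls" in d for d in mirror.drives))  # the mirror lost it
print("payroll.xls" in snapshot)                       # the snapshot kept it
```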
In this context, mirrors may support, ease, or facilitate backup processes in
special cases, but they have no role in the processes themselves. It is in this
context that The Hanalei Company offers CloneThat as the primary backup tool for
any personal or business use. CloneThat is not a mirror, but a complete backup
application for Windows that makes snapshots of data, freezing and replicating
them for the purposes of survival and archival storage.
Tapes, Hard Disks, and DVDs
Backup protocols have centered on tape drives for many years. Tape drives still
fill an important need when thousands of gigabytes must be stored offsite on a
regular basis, or when more than a few gigabytes require regular archiving.
However, with the price of hard disks falling and DVD writers becoming
ubiquitous, tape drives are becoming less attractive as the replication medium.
The primary problem with tape systems is their proprietary nature, which stems
from their specialty purpose. While many tape drives may use similar media, one
can never be sure that a tape will read back on another manufacturer's tape
system. This creates the need to always have a working tape system in-house.
But what if the tape drive meets with the same disaster as the data? Can the
tape drive be replaced with another system that can read the tapes holding the
critical data? Even if it can, how long will it take to get access to the data,
that is, how long will it take to rebuild an equivalent tape system? And does
your company have the in-house expertise to assemble a working, compatible tape
drive system quickly?
Let’s take our favorite scenario: your worst nightmare. You come to work and
find every server destroyed. Not only are the hard disks gone, but the servers
won’t even boot. What do you do?
If you had backed up your data onto a tape system, you would have to buy a PC,
then the same tape system (if it isn’t obsolete), get the tape drive software
(if it is still available), and then go through the unique and sometimes
difficult process of installing both the operating system and the tape drive
software. If you did each step correctly, you would then be able to get your
data off the tape. Otherwise, you would have to keep at it until you got it
right.
If you had backed up your data onto a hard disk, you would simply need to buy a
PC, install the hard disk, and power up the machine to get to your data.
DVDs are similar to hard disks in that it is relatively painless to buy a PC
with a DVD reader, and you are back in business in minutes. You don't even have
to install a DVD the way you install a drive. And like tapes, DVDs offer
permanent, historical records of data snapshots for long-term archiving. With
the proper software, a block of data can be written across multiple DVDs,
making the 4.7 GB capacity of a single-layer DVD a non-issue.
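Writing a block of data across several discs amounts to splitting it into volumes no larger than one disc and concatenating them again on restore. The sketch below shows that idea on ordinary files; the function names and the `.001`, `.002` volume naming are hypothetical, not the format used by CloneThat or any particular burning software, and `DVD_CAPACITY` is an assumed round figure for the nominal single-layer capacity.

```python
import math
import os
import shutil
import tempfile

# Assumed nominal single-layer DVD capacity in bytes ("4.7 GB" in decimal
# units). A real tool would leave headroom for file system overhead.
DVD_CAPACITY = 4_700_000_000


def discs_needed(total_bytes: int, capacity: int = DVD_CAPACITY) -> int:
    """Number of volumes needed to hold total_bytes (at least one)."""
    return max(1, math.ceil(total_bytes / capacity))


def split_to_volumes(src: str, out_dir: str, capacity: int = DVD_CAPACITY,
                     chunk: int = 1 << 20) -> list[str]:
    """Split src into numbered volume files of at most capacity bytes each."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src)
    volumes: list[str] = []
    with open(src, "rb") as f:
        while True:
            path = os.path.join(out_dir, f"{base}.{len(volumes) + 1:03d}")
            written = 0
            with open(path, "wb") as out:
                while written < capacity:
                    data = f.read(min(chunk, capacity - written))
                    if not data:
                        break
                    out.write(data)
                    written += len(data)
            if written == 0 and volumes:
                os.remove(path)  # source ended exactly on a volume boundary
                break
            volumes.append(path)
            if written < capacity:
                break  # source exhausted
    return volumes


def join_volumes(volumes: list[str], dest: str) -> None:
    """Restore the original file by concatenating the volumes in order."""
    with open(dest, "wb") as out:
        for path in volumes:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)


# Usage with a tiny, purely illustrative capacity of 1,000 bytes.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "archive.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(2500))
    vols = split_to_volumes(src, os.path.join(tmp, "volumes"), capacity=1000)
    restored = os.path.join(tmp, "restored.bin")
    join_volumes(vols, restored)
    volume_count = len(vols)
    with open(src, "rb") as a, open(restored, "rb") as b:
        round_trip_ok = a.read() == b.read()
```

Each volume file would then be written to its own disc; on restore, the volumes are copied back in order and joined.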
The choice of backup media depends upon the amount of data and the underlying
needs of the business to protect it. For example, only tapes are a practical
backup medium for thousands of gigabytes of data. But for one or two hundred
gigabytes, a hard disk may serve as a more efficient and reliable rotational
replication medium. For just a few gigabytes, a DVD offers the best of all
worlds: a reliable, ubiquitous medium that offers immediate access to data with
off-the-shelf PCs and no hardware alterations.
Remote Storage
If a natural or man-made disaster (e.g., flood, fire, tornado, or theft) were
to hit your place of business and take out your servers, and you stored your
backup media near your servers, you would run the risk of your backup media
meeting the same fate as your servers. Even in a fireproof safe, the data could
still be damaged or compromised by someone intent on destroying the company’s
jewels.
Keeping your data in a geographically separate location helps ensure that the
backup data will not meet the same disaster as your servers. By putting
distance between copies of the same data, you increase the survivability of
your company's critical data. One practice is to keep a copy of the critical
data geographically far from its origin (the servers it comes from), on an
offsite server's hard disk or in a tape vault.
CloneThat provides the means to simultaneously write any number of source files
to multiple destinations, some of which can be on very remote servers reached
through UNC paths. Thus, a single backup task (a snapshot) can be replicated to
a local server over the company LAN and to a remote server at the same time.
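The idea of writing one snapshot to several destinations at once can be sketched as follows: the source is opened and read once, and every chunk read is written to all destinations before the next chunk is read. This is a minimal sketch of the general technique, not CloneThat's implementation; the function name is hypothetical, and the writes are interleaved in one loop rather than run in parallel.

```python
import os
import tempfile
from contextlib import ExitStack


def replicate(src: str, destinations: list[str], chunk: int = 1 << 20) -> None:
    """Read the source file once and write each chunk to every destination."""
    # On Windows, destinations may be UNC paths such as
    # \\backupsrv\share\payroll.xls (a hypothetical server and share).
    with ExitStack() as stack:
        outs = []
        for dest in destinations:
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            outs.append(stack.enter_context(open(dest, "wb")))
        with open(src, "rb") as f:
            while True:
                data = f.read(chunk)
                if not data:
                    break
                for out in outs:  # the same bytes go to every destination
                    out.write(data)


# Usage: local folders stand in for a LAN server and a remote server.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "payroll.xls")
    with open(src, "wb") as f:
        f.write(b"critical data" * 1000)
    local_copy = os.path.join(tmp, "lan_server", "payroll.xls")
    remote_copy = os.path.join(tmp, "remote_server", "payroll.xls")
    replicate(src, [local_copy, remote_copy], chunk=4096)
    with open(src, "rb") as a, open(local_copy, "rb") as b, open(remote_copy, "rb") as c:
        original, local_data, remote_data = a.read(), b.read(), c.read()
```

Because every destination receives the bytes from the same read, all replicas of a file within one task are identical, which is the property the Snapshots section relies on.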
Local Storage
While remote storage increases the probability that critical data survives a
disaster, local copies reduce the time needed to recover from one. In a small
network, several servers can hold replicas of the data while being located in
different buildings across the company's campus. If a disaster were localized
and affected only one server, the other servers holding the mission-critical
data could immediately take over for the downed server. The downtime in some of
these scenarios is measured in seconds or minutes, rather than hours or days.
As mentioned before, CloneThat provides simultaneous replication to multiple
destination drives across a LAN for each backup task.
Chain of Trust
Best practices do not entrust the company’s jewels to any single person (except
when the company is so small that the owner, or someone with a similarly strong
interest in the well-being of the company, takes charge of the data). No single
IT person should have exclusive access to, and control of, data whose loss
could destroy a business; instead, responsibility for such data is shared among
several employees. This not only guards against malicious intent, but also
reduces the risk of accidental failures.
One way to eliminate exclusive access to mission-critical data by any one
employee is to back up the data to a remote location out of that employee's
reach. A remote backup service provider introduces neutral parties into the
equation, preventing any one employee from dominating access to and control of
mission-critical data. But these services can be expensive.
CloneThat provides the means to replicate data simultaneously to multiple
remote servers, each of which can be managed by a different individual, so that
the data is kept safe by the sheer number of trusted, independent caretakers.
Fully Automated and Self-Testing
Best practices for a backup protocol include the following:
- A backup protocol must be automated whenever possible, in backup, in test
  recovery, and in reporting all anomalies, to minimize occasional human error.
- When the data is encrypted or compressed, the process must be reversed so
  that the recovered data can be verified to decrypt or decompress properly. A
  byte-by-byte comparison of the data is required to ensure that the data did
  indeed recover as expected.
- After writing data to the media, the data should be read back to ensure that
  it can in fact be read from the media and is perfectly accurate. A
  byte-by-byte comparison of the data is required to ensure that the data did
  indeed recover as expected.
- The process should immediately alert the operator(s) to the loss of any file
  that was present in the previous backup.
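The read-back and reversal checks in the list above can be sketched as follows. This is an illustration under stated assumptions, not a specific product's verification routine: gzip stands in for whatever compression is used (encryption would follow the same pattern: decrypt, then compare), the function names are hypothetical, and a read-back performed immediately after writing may be served from the operating system's file cache rather than from the physical media, which this sketch does not bypass.

```python
import gzip
import os
import shutil
import tempfile
from typing import BinaryIO


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read up to n bytes, looping over short reads; fewer only at end of stream."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            break
        buf += part
    return bytes(buf)


def streams_equal(a: BinaryIO, b: BinaryIO, chunk: int = 1 << 20) -> bool:
    """Byte-by-byte comparison of two streams."""
    while True:
        x, y = read_exact(a, chunk), read_exact(b, chunk)
        if x != y:
            return False
        if not x:  # both streams ended together
            return True


def copy_and_verify(src: str, dest: str) -> bool:
    """Copy, then read the copy back and compare it with the source."""
    shutil.copyfile(src, dest)
    with open(src, "rb") as a, open(dest, "rb") as b:
        return streams_equal(a, b)


def compress_and_verify(src: str, dest_gz: str) -> bool:
    """Compress, then reverse the process and compare with the source."""
    with open(src, "rb") as f, gzip.open(dest_gz, "wb") as out:
        shutil.copyfileobj(f, out)
    with open(src, "rb") as original, gzip.open(dest_gz, "rb") as restored:
        return streams_equal(original, restored)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "records.db")
    with open(src, "wb") as f:
        f.write(os.urandom(100_000))
    copy_ok = copy_and_verify(src, os.path.join(tmp, "records.copy"))
    gzip_ok = compress_and_verify(src, os.path.join(tmp, "records.db.gz"))
    # Simulate a damaged replica: a single flipped byte must be detected.
    damaged = os.path.join(tmp, "records.bad")
    shutil.copyfile(src, damaged)
    with open(damaged, "r+b") as f:
        first = f.read(1)
        f.seek(0)
        f.write(bytes([first[0] ^ 0xFF]))
    with open(src, "rb") as a, open(damaged, "rb") as b:
        damage_detected = not streams_equal(a, b)
```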
This last point is absolutely key to any backup protocol. Imagine that a file
is missing from your hard disk. Imagine that this file is critical, but seldom
used. Imagine that you have been backing up this hard disk for four or five
months without realizing the file was missing. All your efforts are down the
tubes, because you won't be able to recover the data: it wasn't around to be
backed up. And you never knew it went missing!
Since the backup process does not know which files should exist, it must assume
the worst. Yet how many backup applications or automated tools will clue you in
on a file that has gone AWOL? A backup operator can run a backup protocol
without realizing that a file has been missing for months and that every backup
image since its deletion has failed to include it, because it wasn’t present to
be copied! This type of damage is a silent disaster that may go unnoticed for
months until someone discovers the problem by mere chance. The result is a
collection of compromised backups that cannot, on their own, fully restore the
original data in the event of a disaster.
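One way to catch a file that has gone AWOL is to keep a manifest of every file seen in previous backups and compare it against the current file list on each run. The sketch below assumes a hypothetical JSON manifest stored outside the backed-up folder; the function names are illustrative. As a design choice, missing names stay in the manifest, so the alert repeats on every run until an operator deliberately removes them, rather than being reported once and forgotten.

```python
import json
import os
import tempfile


def scan(root: str) -> set[str]:
    """Relative paths of every file currently under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_since_last_backup(root: str, manifest_path: str) -> list[str]:
    """Return files listed in the previous manifest that no longer exist."""
    current = scan(root)
    try:
        with open(manifest_path, "r", encoding="utf-8") as f:
            previous = set(json.load(f))
    except FileNotFoundError:
        previous = set()  # first run: nothing to compare against yet
    missing = sorted(previous - current)
    # Keep missing names in the manifest so the alert repeats on every run
    # until an operator deliberately removes them from the manifest.
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(sorted(previous | current), f)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data")
    os.makedirs(data)
    for name in ("contracts.doc", "rarely_used.xls"):
        with open(os.path.join(data, name), "w") as f:
            f.write("content")
    manifest = os.path.join(tmp, "manifest.json")
    first_run = missing_since_last_backup(data, manifest)
    os.remove(os.path.join(data, "rarely_used.xls"))  # the silent loss
    second_run = missing_since_last_backup(data, manifest)
    third_run = missing_since_last_backup(data, manifest)
    for name in second_run:
        print(f"ALERT: {name} was in the previous backup but is now missing")
```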
Securing Data Properly
NTFS volumes offer access control, a form of security, for files and folders
through the NTFS security subsystem. This subsystem grants access to files or
folders only to those accounts that have been given permission to access them,
and then only the types of access that their permissions allow.
For example, an HR employee may have read, write, and delete access to employee
records, while management may have only read access, and the IT person
performing the backup of the employee data has no access to the data at all. In
fact, the security can be set so that the IT person cannot even see that those
files exist on the server!
This restriction against the IT backup operator must be enforced at all times,
even during the backup process. A backup protocol must never require that
access controls be weakened in order to run properly.
NTFS volumes also provide encryption based upon user accounts. Encryption is
used to maintain the confidentiality of data, and that confidentiality includes
the confidentiality of the encryption key. The backup process must therefore
never have access to the key, and so should never be able to read the data in
an encrypted file. A proper backup protocol preserves the encrypted state of the
data and restores it in exactly the same form in which it was retrieved.
(CloneThat version 1 does not properly maintain NTFS access control or NTFS
encryption for folders and files. If such features are required, another
solution should be sought. However, CloneThat version 2 does maintain both NTFS
access control and NTFS encryption, will be freely available to licensed users
of CloneThat version 1, and is due to be released in Q2 2007.)
Snapshots
A snapshot is a copy of a file at a moment in time. However, the copy process,
like that of taking a picture with a camera, actually takes time, even if that
amount of time is measured in milliseconds. Like a camera shutter, the process
of copying a file has a window of vulnerability where the data could be changed
during the backup process.
When photographing a child in motion, the picture can be compromised by blur.
Likewise, a file can be changing during the snapshot process, making the data
unpredictable and unstable and compromising the backup. There are several
scenarios involving changing, unstable data that we need to address.
First, a file could be changed between the time that its indicators are
measured and the time that it is actually opened for reading. In this case, the
contents are even more up to date for backup, but the file will not necessarily
be scheduled for copying if the indicators still reflect an old, already copied
image. While CloneThat does not have a feature to handle this scenario, it is
not considered significant, since the data is newer than the replicas at only
one given point in time. (No backup software is omniscient; each has a “point
in time” at which it measures any given file. This is typical. In the end, the
file will be replicated the next time the snapshot is run. Updating a replica in
response to every change requires driver-level mirroring software. What is
important to understand is that the data is stable while the file is open for
replicating.)
Second, a file could be changed while it is being read for backup. In this
case, simply choosing to lock the file for backup prevents other applications
from sharing it with the backup software while it is read. CloneThat has such a
feature on the source point dialog pane, called Don’t Share File Access.
It is reasonable to ask, "Why have an option to lock at all? Why not just lock
every time?" The answer is that in some special cases, the backup software may
have no choice but to share a file with another process that keeps an important
file open. This is a constraint placed upon the backup protocol by the
prevailing needs of the system and does not reflect on the backup process in the
general sense.
Third, a file could be changed between the time it is locked for replication to
one destination point and the time it is locked for replication to another. In
this case, the file is actually replicated twice, but the two replicas differ,
the latter reflecting the more up-to-date image. CloneThat does not suffer from
this issue within a single task, as each source file is opened once and
replicated simultaneously to any number of destination points.
Finally, a file could change between the time it is replicated and the time its
indicators are cached. However, CloneThat eliminates this problem by recording
the indicators when the file is opened for copying. If files are locked for
copying, the indicators will always reflect the actual file being copied.
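The idea of recording indicators when the file is opened can be sketched by taking them from the open handle itself with `os.fstat`, so the cached values describe the file that was actually read. The choice of size and modification time as the "indicators" is an assumption for illustration; the paper does not say which indicators CloneThat uses. Note also that Python's built-in `open()` does not expose Windows share modes, so this sketch does not lock the file; without the Don’t Share File Access behavior, the file could still change during the copy.

```python
import os
import shutil
import tempfile


def indicators_of(fileobj) -> tuple[int, int]:
    """Assumed indicators: size and modification time of the open handle."""
    st = os.fstat(fileobj.fileno())
    return st.st_size, st.st_mtime_ns


def copy_if_changed(src: str, dest: str, cache: dict) -> bool:
    """Copy src only if its indicators differ from those cached last time.

    The indicators come from the same open handle that is copied, so the
    cache describes the file that was actually read. (No lock is taken.)
    """
    with open(src, "rb") as f:
        current = indicators_of(f)
        if cache.get(src) == current and os.path.exists(dest):
            return False  # unchanged since the last snapshot
        with open(dest, "wb") as out:
            shutil.copyfileobj(f, out)
    cache[src] = current
    return True


with tempfile.TemporaryDirectory() as tmp:
    src, dest = os.path.join(tmp, "ledger.txt"), os.path.join(tmp, "ledger.bak")
    cache: dict = {}
    with open(src, "w") as f:
        f.write("v1")
    first = copy_if_changed(src, dest, cache)   # new file: copied
    second = copy_if_changed(src, dest, cache)  # unchanged: skipped
    with open(src, "a") as f:
        f.write(" and v2")                      # the size changes
    third = copy_if_changed(src, dest, cache)   # changed: copied again
```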
Best practices include scheduling a snapshot to occur when the data is stable
and unchanging. If the data is never stable long enough for a snapshot, then a
true mirroring solution may be necessary to provide a frozen copy to take a
snapshot from. If the mirror can be disabled while yielding a stable and valid
copy to clone, then CloneThat or any other "above driver" solution could be used
to clone the stable copy. But if disabling the mirror leaves the copy in an
invalid state (e.g., a file may be only partially updated, or the file system
itself may be in an invalid state), then the copy is useless, even though it is
stable.
In general, for snapshots, the data must be stable and entirely valid during
the replication process. Anything less will produce a backup image from which a
valid or meaningful restoration is not possible.
Snapshots are precisely what a classical tape backup performs. When tape backup
software is run, it copies files on a disk to the tape. That data must be
stable during the copying process for the image on the tape to be meaningful.
Once again, it does not matter whether the copy is made to a tape, another hard
drive, a DVD, a CD, or any other medium. What matters is that the data is
replicated, increasing the survivability of the data itself.
Rotational Versus Archival Media
Tapes are used as both rotational media and archival media. DVDs are
exclusively used as archival media, while hard disks are nearly always used as
rotational media.
In a typical backup protocol, some tapes are assigned as archival tapes and
some as rotational tapes. For example, a backup is made every week, but the
last weekly backup of the month is made to a tape that is taken out of the
rotation and stored permanently for archival purposes. All the other tapes used
throughout the month rotate into the next month, keeping recent data on hand
(no more than one week stale) while reducing the total cost of tapes.
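The weekly scheme just described can be expressed as a small scheduling rule: a weekly backup is the month's last one if the next backup, seven days later, falls in a new month, and that backup goes to an archival tape; every other week goes to a reusable rotational tape. The tape labels and function names below are hypothetical.

```python
import datetime


def is_last_weekly_backup_of_month(day: datetime.date) -> bool:
    """True if the next weekly backup (seven days later) falls in a new month."""
    return (day + datetime.timedelta(days=7)).month != day.month


def tape_for(day: datetime.date) -> str:
    """Hypothetical tape labels: one archival tape per month, plus up to four
    rotational tapes that are reused every month."""
    if is_last_weekly_backup_of_month(day):
        return f"ARCHIVE-{day:%Y-%m}"  # pulled from rotation and stored
    week_of_month = (day.day - 1) // 7 + 1  # 1 to 4 for non-final backups
    return f"ROTATION-{week_of_month}"


# Weekly backups every Friday in March 2024 (March 1, 2024 was a Friday).
first_friday = datetime.date(2024, 3, 1)
for week in range(5):
    day = first_friday + datetime.timedelta(weeks=week)
    print(day, tape_for(day))
```

In a month whose last backup day falls on or before the 28th, only three rotational tapes are used before the archival tape.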
Using large disks in backup servers, daily rotations can be made automatically
using CloneThat. Using DVDs, archival replications of data can be made once a
week, once a month, or at whatever interval your company decides is appropriate
for its needs. That said, tapes have very little value as a backup medium these
days apart from the sheer quantity of data they support.
Due to this important distinction between the various choices of media,
CloneThat is targeted specifically at personal use and small to medium
businesses, where data quantities are not so large that DVDs and hard disks
cease to be viable alternatives to tapes. For archiving, CloneThat can be set up
to make a snapshot or a series of snapshots of source points into a single
destination point that is then copied to one or more DVDs. For rotational
backup, CloneThat can be set up to make several copies over a period (say, a
month) to several destination points on hard drives across the network.
Of course, you can set up CloneThat in any configuration and use hard disks for
archival purposes. CloneThat allows you to add the date and time to the
snapshot's destination path, making it possible to build a collection of
snapshots on a hard disk for archival purposes. (In this case, the archive
collection can be pruned to make room for new archives, or the hard disk can be
put in storage for an indefinite period.)
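A date-and-time-stamped archive folder and the pruning mentioned above can be sketched as follows. The naming format is an assumption, not CloneThat's actual date/time token; it uses zero-padded fields so that folder names sort chronologically. The pruning function only considers folders whose names parse as timestamps, so unrelated folders in the archive directory are never deleted.

```python
import datetime
import os
import shutil
import tempfile
from typing import Optional

# Hypothetical naming scheme; zero-padded fields make names sort chronologically.
STAMP_FORMAT = "%Y-%m-%d_%H%M%S"


def timestamped_destination(base_dir: str,
                            when: Optional[datetime.datetime] = None) -> str:
    """Destination folder for a snapshot taken at `when` (default: now)."""
    when = when or datetime.datetime.now()
    return os.path.join(base_dir, when.strftime(STAMP_FORMAT))


def _is_snapshot_name(name: str) -> bool:
    try:
        datetime.datetime.strptime(name, STAMP_FORMAT)
        return True
    except ValueError:
        return False


def prune_archives(base_dir: str, keep: int) -> list[str]:
    """Delete all but the newest `keep` snapshot folders; return removed names."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    snapshots = sorted(
        name for name in os.listdir(base_dir)
        if _is_snapshot_name(name) and os.path.isdir(os.path.join(base_dir, name))
    )
    removed = snapshots[:-keep]
    for name in removed:
        shutil.rmtree(os.path.join(base_dir, name))
    return removed


with tempfile.TemporaryDirectory() as archive:
    for day in (1, 2, 3):
        os.makedirs(timestamped_destination(archive, datetime.datetime(2007, 1, day, 23, 0, 0)))
    os.makedirs(os.path.join(archive, "notes"))  # not a snapshot: never pruned
    removed = prune_archives(archive, keep=2)
    remaining = sorted(os.listdir(archive))
```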
Summary
As we have seen, the backup process has the very specific goal of replicating
data to increase the statistical probability of its survival in the face of
disaster, and can include options such as archiving or rotational replication
to achieve variations of that goal. In each case, CloneThat provides the
necessary features in a flexible, user-friendly fashion to meet your particular
choices and needs.
Take a test drive of CloneThat today and see for yourself how easy, flexible,
and powerful the backup process can become.