Working at AvePoint, I have had a lot of conversations about backups and archives. AvePoint makes solutions for both but, often as not, these conversations have a detour that runs something like this:
“Senior management doesn’t see a need for a records solution, because we’re backing up every day.”
“For now we’re not bothering with backups, we just declare everything to be a record.”
Oy. It’s on occasions like these that I reach for a simple explanation or analogy that helps draw the distinction between the two. No surprise, others have written about this as well, such as this pitch page from SolarWinds MSP, which puts it very well:
An archive is a collection of historical records that are kept for long-term retention and used for future reference. Typically, archives contain data that is not actively used.
A backup is a copy of a set of data, while an archive holds original data that has been removed from its original location. (Source: https://www.solarwindsmsp.com/content/backup-archive)
Archives and Backups are not the same thing.
A key element of records is “immutability”. The content was not, and could not be, changed once it was archived. It is a record: a defined representation of the data at the point in time it was declared to be a record. Saying “it’s a record until I modify it” does not count.
This distinction is blurred at the consumer level because, for most of us, the content we are backing up isn’t going to change. I have thousands of photos, tens of thousands of songs, and probably hundreds of documents on my computer at home. They do not change.
Because I work primarily with enterprise customers, and primarily in the world of SharePoint and SharePoint Online, I’ll add a couple of more concrete examples from those spaces.
In SharePoint, I might have a site collection that has several Word and Excel documents that my team works on. Let’s say we’re a finance team working on quarter-end reports, using realtime co-authoring to consolidate a lot of information from multiple documents.
A backup would be called for if, let’s say, someone accidentally deleted one or more of these files, or more likely, accidentally deleted large swaths of data or formulas or worksheets within those files. We need that data back so that we can continue working on it.
In SharePoint, I would probably rely on the recycle bin, but that has limits, especially in SharePoint Online. I might use a third party solution as well, something that copies files and then makes incremental backups on a schedule. In any case, I have a recurring job that makes a copy of these files, so that if they are inadvertently mucked with, I can reproduce a copy.
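As a rough illustration of that kind of recurring copy job, here is a minimal Python sketch. It is not how SharePoint or any particular backup product works; the function names, folder names, and file contents are all hypothetical, and it simply mirrors a local folder, copying a file only when it is new or its contents have changed since the last run.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def incremental_backup(source: Path, dest: Path) -> list:
    """Copy files from source to dest only when new or changed.

    Returns the names of the files copied on this run.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_file in sorted(source.iterdir()):
        if not src_file.is_file():
            continue
        target = dest / src_file.name
        # Copy when there is no backup yet, or the backup no longer matches.
        if not target.exists() or file_digest(target) != file_digest(src_file):
            shutil.copy2(src_file, target)
            copied.append(src_file.name)
    return copied


# Hypothetical demo: a stand-in "team site" folder with two working files.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "team_site"
    backup = Path(tmp) / "backup"
    source.mkdir()
    (source / "q3_report.txt").write_text("revenue: 100")
    (source / "notes.txt").write_text("draft")

    first_run = incremental_backup(source, backup)   # both files are new
    second_run = incremental_backup(source, backup)  # nothing has changed

    (source / "q3_report.txt").write_text("revenue: 120")
    third_run = incremental_backup(source, backup)   # only the edited file

print(first_run, second_run, third_run)
```

Notice that the backup copy of the report is overwritten as soon as the working file changes. That is exactly what I want from a backup, and exactly why it cannot double as a record.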
An archive would be called for if, let’s say, after all that work is finished, we need to keep it on hand for reference later – say in our year-end reporting, or perhaps for audit purposes. There might be an industry or company requirement that financial records must be retained for a defined period, but they cannot be altered. Particular requirements might be: it must be placed somewhere with limited access, or defined access to a specific records management team.
In SharePoint I have a couple of options. Either way, I declare these documents as records (making them immutable); the choice is whether to move them to a records center or to declare them as records in place. In the former case, I move them to a different location, which might have different permissions, so they’re not even visible to the team that created them; in the latter case, they’re still immutable records, but in a location that the team can access for reference.
Immutability may not be an explicit requirement for records, but if the point of a record is to act as a reference, it has to be implied. I can’t look up what I invoiced a customer three years ago if the invoice was editable. I can’t look up what I was paid on my W-2 last year if it was in Word and something I could edit. I am trusting that the record is the same as it was when it was declared to be a record.
Backups replace my burned manuscript for me to keep working on. Archives present a copy that I can reference, perhaps if I’m comparing what I wrote with what was published.
Why aren’t Archives Backups, or Backups Archives?
So far I’ve made an affirmative case for why Archives and Backups are not the same. They serve different purposes. But why can’t one serve as a poor man’s version of the other?
An archive is a historical record. There are two issues in using a Backup for the purpose of an archive. First, because backups are potentially making multiple copies of a file over time, how would you know which was the “historical record”? The last copy made? The copy someone marked as “true and official”? Second, archives lean heavily on the element of integrity: this file has not been altered in any way since a given date, and this is the true, unadulterated copy of that file. A backup makes no such promise.
Think of real-life examples such as death certificates, deeds to property, and tax records. Running the original through a photocopier hundreds of times doesn’t make the original any more or less true; it creates additional copies about which I have to wonder, “Have they been marked up or altered since that original copy? How would I know, if I don’t have the original?” Without a way to know that a document has remained unchanged, and when that status was achieved, I cannot use my backups as an archive.
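For the digital version of “how would I know?”, one common approach is to record a cryptographic fingerprint of the content at the moment it is declared a record, and compare against it later. The Python sketch below is only an illustration under that assumption; the function names, the record layout, and the invoice text are all hypothetical and not taken from any records management product. It also only detects a change. It does nothing to prevent one, which is the job of the permissions and controls around the archive.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def declare_record(content: bytes) -> dict:
    """Capture the digest and the time of declaration (hypothetical layout)."""
    return {
        "sha256": fingerprint(content),
        "declared_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unaltered(content: bytes, record: dict) -> bool:
    """True only if the content still matches the digest taken at declaration."""
    return fingerprint(content) == record["sha256"]


# Hypothetical invoice text, declared as a record and then checked twice.
original = b"Invoice 1001: total due 250.00"
record = declare_record(original)

same_copy = is_unaltered(original, record)
edited_copy = is_unaltered(b"Invoice 1001: total due 150.00", record)

print(same_copy, edited_copy)
```

The stored digest and declaration date answer both halves of the question: whether the document has remained unchanged, and since when.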
A backup is a “just in case” copy of a document. I might be backing up something that was only downloaded or created once, or I might rely on backing up this copy of the Great American Novel that I’ve been writing out by hand in my desk. My desk catches on fire, and the manuscript is destroyed – where is my backup?
An archive isn’t going to help, because it wasn’t making copies multiple times a day (or week, or hour, or whatever). If it were, we’d run into the problem above where, at best, every version of the book would be its own record.
If we think of the pre-digital versions of archives versus backups, we get a final analogy that, hopefully, will help clear up the distinction.
An archive is the document that, after all the discussion, proofing, and edits are done, is placed in a safe, secure place with just the right humidity to keep the paper from falling apart. The monks have rolled it up in a tube and placed it on a rack.
A backup would be a couple of scribes brought in to write out copies and send them to other castles, households, or what have you. Perhaps later in human history, we’ll consider publishing to be “backups”, with a copy of the same book in every house. But would we rely on those as immutable copies of the original? What about “second editions” that fix printing errors?
Archives and backups are two important types of data retention. They serve different purposes, and neither process is well suited to meet the purpose of the other. A robust data governance program will include processes for both archiving and backing up content.