Two failed backup restoration attempts

skunark

« on: 22 Mar 2013, 03:51 am »
A few weeks ago, an aging external hard disk drive containing music and videos finally failed with a corrupted file system. With Time Machine and Backblaze backups running, plus key files cloned via rsync to a Bryston BDP-1 player's HDD, I wasn't worried about restoring any of the content. The worst case would be re-ripping the CDs, re-downloading content from iTunes, or retrieving my HD Tracks songs from SD cards. Basically, I'm pretty well covered in the event of this HDD failure.

With a new drive ordered and en route, I decided to reformat the failed drive on the off chance that this was just a random error, and went ahead and restored the drive using Time Machine. I have performed Time Machine restorations several times before, as a way to copy files to a larger drive and to build confidence in Apple's solution, so I expected everything to go as smoothly as before. Time Machine did its magic and started restoring files to the drive; of course, with a USB 2.0 external drive in the mix, this was going to run overnight. The next morning all seemed well, but I quickly noticed that Time Machine had restored only about half the files, which was easy to check by running rsync with the --dry-run switch against the copy on the BDP-1. Luckily, Time Machine still had every file available, so I requested a more selective restore, and apart from that hiccup everything worked out. Eventually the drive failed again in the same way, so I decided it was time to wait for the replacement drive.
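The rsync --dry-run check works because rsync lists every file it would copy, which is every file that is missing or different on the restored drive. rsync's default quick check compares size and modification time; the Python sketch below is a simplified version that looks only at relative paths and sizes, assuming the reference copy (here, the BDP-1 drive) and the restored drive are both mounted. The function names are illustrative, not from any tool.

```python
import os
from pathlib import Path


def list_files(root):
    """Map each file's path (relative to root, '/'-separated) to its size in bytes."""
    root = Path(root)
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            files[full.relative_to(root).as_posix()] = full.stat().st_size
    return files


def missing_or_different(reference, restored):
    """Return (missing, size_mismatch) lists of relative paths.

    Like the file list from `rsync --dry-run`, this is a quick check on
    names and sizes only; it does not prove the contents are identical.
    Extra files that exist only in `restored` are ignored.
    """
    ref = list_files(reference)
    got = list_files(restored)
    missing = sorted(p for p in ref if p not in got)
    mismatched = sorted(p for p in ref if p in got and ref[p] != got[p])
    return missing, mismatched
```

Usage would look something like `missing, mismatched = missing_or_different("/Volumes/BDP1/Music", "/Volumes/Restored/Music")`, where both mount points are hypothetical placeholders for your own drives.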

Once the replacement drive arrived, I started the Time Machine restore again, but the same partial restore failure occurred, on the same artist and album, and again without warning. I repeated the same steps as before to selectively restore the remaining data; I was annoyed with Time Machine, but it was still working. With everything falling into place, I re-enabled Time Machine and was about to walk away when I noticed it had immediately started a backup. Oops. This seemed a bit odd since a restoration was still in progress, and I was later prompted with a warning that the backup drive was full and older backups needed to be removed. I hastily clicked through this prompt and immediately realized my mistake: Time Machine would now delete older backups, including the copies of files that didn't yet exist on the new drive. That was my first and second user error in this restoration campaign. Luckily I still had the rsync drive for the bulk of the files, and Backblaze for the remaining DRM files that the BDP won't play. Worst case, I also had an old Time Machine drive with everything but the last year of data. The only positive is that this gave me a chance to try out Backblaze's restoration process.

After recovering the key files with rsync, Backblaze was up next for the remaining 3 GB of data. Their website is tedious to navigate with a large number of files and folders, but it took only a few minutes to select the files for restore. Their online process is rather simple: you select the files, folders, or drives to be restored, and a few minutes later they send you an email to download a zip file. There are other options, such as having them send a USB drive, but at a cost, and since it was only a few gigabytes, I opted to just download the zip file.

After uncompressing the zip file to restore the remaining items, I noticed a single file was missing from Backblaze's zip file, this time an iTunes file, and again without a warning from Backblaze. I did another restore selecting just that single file, and a few minutes later an email arrived indicating the file restoration had failed. Since Backblaze keeps weekly revisions for the past 4 weeks, I tried two more times with different weeks, all of which failed. I contacted their customer service, and a day later they responded that the file is indeed not available on their end, and that out of ten thousand support tickets there have been only three incidents like this.

Now I'm curious whether there are any more issues with my files backed up at Backblaze. I asked Backblaze that question, and their first response was just to restart the backup, which doesn't even answer the question. I pressed further, and they admitted that they have no way to rescan my files to see if anything needs to be backed up. It appears they rely solely on the OS to indicate when files are modified, which is unfortunate. The other open question was whether the missing file was corrupted on my end, so I plugged the failed drive I had restored back in: the file was there with the correct size and even played fine.

With all files accounted for, I'm still not exactly happy with Time Machine or Backblaze. At the end of the day, nothing I needed to restore here was critical, just a time saver and possibly a money saver, but if this were personal data like photos, it would be a whole different story.



A few observations on Time Machine and Backblaze (and other snapshot-style backup solutions):

-If you restore a drive, partition, or file, Time Machine will consider it a new file and back it up again even though it's identical, which is unfortunate. The backup drive then probably needs to be more than 2.5 times the size of your total storage, which seems excessive to me if you also expect to retain any revision history of files. This requirement is probably true for any snapshot-style backup solution.

-The best course of action during a restoration is to turn off Time Machine until all files are restored, though of course this carries the risk that some files won't get backed up in the meantime. Having a second drive for Time Machine could alleviate that risk. Backblaze recommends that you immediately create a restore point for the drive that failed, which is kept for 7 days.

-Keep your old retired music and backup drives; don't repurpose them. While I made a conscious decision to back up music I purchased from HD Tracks, I didn't follow that same pattern for iTunes content I purchased. In this case the stakes were an entire $1.29, but it could have been much worse, e.g., a photo.

-Don't trust that a restoration completed successfully. I'm not sure what to advise here; I know most folks won't know how to use rsync (with the -c option), nor want to spend the money on a local backup solution that keeps a checksum manifest of all files and revisions.

-Don't consider an online backup your key backup; clearly there are issues, and a quick Google search suggests that every consumer solution has them.
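The rsync -c suggestion boils down to comparing file contents by checksum rather than by size and modification time. For anyone more comfortable with a script than with rsync flags, here is a minimal Python sketch of that idea, assuming a known-good reference copy (like the BDP-1 drive) and the restored tree are mounted side by side. SHA-256 and the function names are my own choices for illustration, not what rsync itself uses.

```python
import hashlib
import os
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large media files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(reference, restored):
    """Compare every file under `reference` with its counterpart under `restored`.

    Returns a dict mapping relative path -> "missing" or "content differs".
    An empty dict means every reference file exists in the restored tree
    with identical bytes (extra files in `restored` are ignored).
    """
    reference, restored = Path(reference), Path(restored)
    problems = {}
    for dirpath, _dirnames, filenames in os.walk(reference):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(reference)
            dst = restored / rel
            if not dst.is_file():
                problems[rel.as_posix()] = "missing"
            elif sha256_of_file(src) != sha256_of_file(dst):
                problems[rel.as_posix()] = "content differs"
    return problems
```

Hashing every byte on both sides is slow over USB 2.0, which is the trade-off rsync -c makes as well: it reads everything instead of trusting sizes and timestamps.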


This brings me to another issue I've experienced over the last two years: I've noticed a few songs with a zero-byte file size that didn't match what iTunes expected, and both Time Machine and Backblaze also reported zero bytes for those files. The silent data corruption could have been caused by a drive failure, another hardware failure, or even an application failure, and sadly software backup solutions won't check for this. This silent data corruption occurred on the same drive that eventually failed, so perhaps that was the telltale sign, but the disk utility checks reported no issues.
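The zero-byte symptom, at least, is easy to scan for. A minimal Python sketch, assuming the library is a plain folder tree; the extension list is a hypothetical example to adjust for your own library. It only catches files that have collapsed to zero bytes: a file with the right size but damaged contents would still slip through.

```python
import os
from pathlib import Path

# Hypothetical set of extensions to treat as media; adjust for your library.
MEDIA_EXTENSIONS = {".mp3", ".m4a", ".flac", ".aiff", ".wav", ".m4v", ".mp4"}


def zero_byte_media(root, extensions=MEDIA_EXTENSIONS):
    """Return sorted relative paths of media files whose size is 0 bytes."""
    root = Path(root)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Compare extensions case-insensitively (".MP3" counts as ".mp3").
            if path.suffix.lower() in extensions and path.stat().st_size == 0:
                hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```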

Since Backblaze did notice the failure when restoring just that one file, I'm curious whether they have also had this silent data corruption issue. Because the file was intact on the now twice-failed drive after the first restoration, this really is a problem on their end.

Filesystems like BTRFS and ZFS have features to detect silent data corruption, which is a huge improvement, but neither can address corruption caused by an application (like iTunes, backup software, etc.). Since they will catch hardware and even file-system errors, this is something I will be using once it's released for either my Mac or my Synology NAS. RAID 5/6 and the numerous other RAID-like solutions also won't fully address the issue, but they are still an improvement over a single drive. I've run RAID 5 in the past, and it's rather expensive to maintain just for detecting a hardware failure. Drive-cloning backup solutions won't exactly help either, and you lose out on the snapshot recovery window.


So what can I do to check that all files were correctly recovered from a backup restoration, and to detect whether a rare silent data corruption occurred? Clearly I can't trust Time Machine or Backblaze to do this.

A good option seems to be an open-source command-line tool called hashdeep, which creates a manifest of all the files in a directory tree. The manifest contains each file's size, MD5 hash, SHA-256 hash, and relative path and filename, and can be used to check existing files. Revision control software like git and Subversion also uses hashes to track files and check consistency, with a great track record of finding modified files. Each time I change a tag, rip a CD, or import a file, I will have to run hashdeep to report the differences and, if they are acceptable, update the file manifest. I plan to just add this as an extra step each time I kick off rsync to synchronize music files to the Bryston BDP-1, but I'm not exactly sure how I will handle photos and other personal files. I do use Aperture's vault to back up to network shares and a dedicated external drive, all in addition to Time Machine and Backblaze. Ideally, iTunes and other media managers would provide a file consistency check that is updated behind the scenes on known operations, but for now hashdeep seems like an easy solution.
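hashdeep itself is the tool to use here; the Python sketch below only illustrates what the manifest workflow does. It assumes a simplified CSV layout of my own (size, md5, sha256, relative path), not hashdeep's actual file format, and the function names are hypothetical. It builds a manifest once, then audits the tree later and reports missing, changed, and new files.

```python
import csv
import hashlib
import os
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return (size, md5, sha256) for one file, read in chunks."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            size += len(chunk)
            md5.update(chunk)
            sha256.update(chunk)
    return size, md5.hexdigest(), sha256.hexdigest()


def build_manifest(root):
    """Map relative path -> (size, md5, sha256) for every file under root."""
    root = Path(root)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            manifest[path.relative_to(root).as_posix()] = hash_file(path)
    return manifest


def save_manifest(manifest, out_path):
    """Write one CSV row per file: size, md5, sha256, relative path."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for rel, (size, md5, sha) in sorted(manifest.items()):
            writer.writerow([size, md5, sha, rel])


def load_manifest(in_path):
    """Read a manifest written by save_manifest back into a dict."""
    with open(in_path, newline="") as f:
        return {row[3]: (int(row[0]), row[1], row[2]) for row in csv.reader(f)}


def audit(root, saved):
    """Compare the files under root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(p for p in saved if p not in current),
        "changed": sorted(p for p in saved if p in current and current[p] != saved[p]),
        "new": sorted(p for p in current if p not in saved),
    }
```

Keep the manifest file outside the folder being hashed; otherwise it shows up as a "new" file on every audit. After an intentional change (a retag or a new rip), reviewing the audit and then rebuilding and saving the manifest corresponds to the "update the manifest if the differences are acceptable" step described above.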



I'm sure this failure is rare for most users, but as our media libraries grow, this could become more and more of an issue. I'm curious what experiences others have had and the steps they took.

Jim

P.S. Sorry for the long post.

Copy of my feedback to Backblaze customer support, marking the ticket as unresolved and unsatisfied:
"With at least one file missing from your backups and with no resolution or assurance that my other files are correctly backed up, how can I be satisfied? Initially <name> had some very canned responses, but follow-up conversations were more on point. In so many words, the gist from <name> is that Backblaze has no way to confirm that the files on my computer match those on Backblaze's servers, and that I probably should look elsewhere for a better solution. <name> also pointed out that each update of the software addresses problems that might have corrupted files on previous backup attempts, [but] won't actually self-correct [on future backups]. I'm still researching my options. If CrashPlan weren't Java-based, I would be asking for a full refund and filing a report with the BBB."
« Last Edit: 24 Mar 2013, 06:14 am by skunark »