in GIS

Long Term Storage of Spatial Data

Following on with yesterday’s blog post, I’m also concerned about where I’m storing the data. Until this month I stored the data in Dropbox. I can’t recall when I signed up for Dropbox, but I’ve probably paid them over $1,000 for the privilege of using their service. As with most SaaS products, they start trying to help consumers and then they pivot to enterprise. That’s what Dropbox is doing and I’m tired of it. Their client software is just a hack and there are too many other solutions that better fit with my budget needs than a stand along cloud storage solution.

So as of May 2020, I no longer pay Dropbox $99/year. I’ve moved all my data to iCloud because I do pay for 2TB of storage there (Family plan) and it integrates better with my workflows. I could have put it in Google Drive too, but I’ve never liked how it works which is a shame because it is easy to share with other users. But this isn’t archival by any means. All I’m doing is putting data on a hard drive, though a virtual hard drive in the cloud. It gets backed up sure, but there isn’t any check to make sure my daughter doesn’t drag the data to the trash and click empty. A true archival service is one that makes the data much safer than just storing it in a folder.

Now back in the old days, we used to archive off to DLT tapes and then send those offsite to a place like Iron Mountain. Eventually you’d realize you needed a restoration and the IT guy would request the tape/tapes come back from offsite and restore them to a folder that you could access. Hopefully they were in a format you could read, but generally that wasn’t too much of a problem, there is a reason though we kept a Sun workstation around in case we needed to restore data from ARC/INFO on Solaris. The good thing about this is that that data was always a copy, sure the tape could get damaged, but it was offsite and not prone to being messed with. If I needed data from October 2016, I could get it. Of course, eventually, old tapes were destroyed because of space needs but generally it was a great system.

I’m doing the math in my head as to the cost of those DLT tapes

Now I’m not thinking I need to get a DLT tape drive and pay Iron Mountain for this privilege, but I do need to get data off site and by offsite I mean off my easy to access cloud services (iCloud, Google Drive, AWS S3, etc). I have been working with Amazon S3 Glacier and it has been a really great service. I’ve been moving a ton of data there to not only clean up my local drives and iCloud storage, but ensure that that data is backed up and stored in a way that makes it much safer than having it readily available. Not Glacier is easy enough to use, especially if you are familiar with S3, but you don’t want to throw data in there that you need very often because of how it is costed. Uploading is free, and they charge you $0.004 per GB/mo which is insanely low. Retrieval is 3 cents per GB which is reasonable and after 90 days you can delete data for free.

Glacier isn’t new by any means, I had been using it to archive my hard drives using Arq but not this specifically using projects. I’ve just started doing this over the weekend so we’ll see how it goes but I like that the data is in a deep freeze, ready to be retrieved if needed but not taking of space where it isn’t needed. I’ve also set a reminder in 2 years to evaluate the data storage formats to ensure that they are still the best method moving forward. If I do decide to change formats, I’ll continue to keep the original files in there just in case the archival formats are a bad decision down the road. Storing this all in Glacier means that space is cheap, and I can keep two copies of the data without problems.

Write a Comment

Comment

  1. AWS Glacier is *great* for the price. I’ve been using it for years and recently have been syncing to it from a local ZFS-based FreeNAS system. Pretty much has all the features I would want at a really good price-point.