Decentralize Your Damn Data

Tagged: data & files & storage & decentralized

This article got a bit long so I've broken it up into 4 parts. This is part 1.

You and me, we're all generating data at a faster rate than a previous generation would've imagined possible. Some of it is even worth saving, or at least it's valued in the eyes of some beholder. Only some of that data is personally of interest to you rather than to advertisers. But have you ever tried retrieving files from a generation (= 30 years) ago? Back then we were storing a megabyte of files, five and a quarter floppy inches at a time. Kids today wouldn't even know how or where to insert a diskette, while those of us who do remember don't have access to any working hardware. It's not like a floppy drive can just connect to your phone.

How are we saving those precious memories? Most people are probably thinking: YOLO, just give it to Apple|Google and call it a day.

But what if you don't want to trust hundreds of thousands of Goopple employees (taken together, the population of Gooppleville would be among the 100 largest American cities with a GDP more akin to a sizeable nation) and the global surveillance state with your mother's maiden name, Bitcoin wallet, disturbing thoughts, and very interesting cat photos? As attractive and convenient as these massively centralized cloud storage platforms are, you really should think twice.

Solution: Decentralize it!

Caveat: Decentralize here means moving away from centralized, without necessarily being distributed or peer-to-peer.

Decentralized systems are "reliable & resilient". A network such as a blockchain can achieve 100% uptime even though every one of its components (nodes) could, and eventually will, go down at some point, just not all at once. Such a network is resilient to sniper attacks, liars, attrition, and future technology. But this resilience can come at a cost, and that cost may be too high for our use case. It's technically feasible to store lots of data on existing public blockchains, and at some point we'll cover that topic and why it may not make sense. For now, let's focus on practical commodity solutions.

But why decentralize your data? Privacy is one good reason. (And putting your private data on a public blockchain is counter to the goal of privacy.) Trustlessness, meaning you don't have to trust or rely on a third party, can be another reason. And the goal is to have a true and reliable copy of your data whenever you need it, whether tomorrow or after you're dead.

Decentralize, in this case, simply means you store data using something other than a public cloud. Use something under your full control, something you can physically touch. So something like a floppy. But obviously, not a floppy.

We want our information storage to be "reliable & resilient" even if it's not part of a whole system with that property.

Hard drives & SSDs

So the first option, for those who use a computer with a hard drive (or SSD), is to go with what-chu got: put your files onto a hard drive. In the larger scheme of things, today's generation of hard drives provides good storage value (capacity) for the money. The drive could be in an external enclosure connected by USB. Drives can last a few years but probably not a few decades. Backblaze has extensive statistics on various brands of hard drives they've used in their facilities over the years (many Seagates). Keep in mind those numbers are for drives that are online 24/7. On a multiyear timescale, drive failures are common. Drive connector standards (SATA, IDE), however, change slowly, on a timescale of decades, so you'll probably be able to connect a drive to a future computer for the foreseeable future. But the general recommendation, if a drive is kept offline, is to spin it up every once in a while and check for damage and corruption.
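One way to do that periodic check is to record a checksum for every file when you first write the archive, then recompute and compare each time you spin the drive up. Here's a minimal sketch in Python using only the standard library. The function names (file_sha256, build_manifest, verify_manifest) are made up for illustration, and SHA-256 is just one reasonable choice of hash, not a recommendation from any particular tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or file_sha256(target) != expected:
            problems.append(rel)
    return problems
```

You'd call build_manifest once after copying your files, save the result somewhere other than the drive itself (a JSON file on a second device would do), and run verify_manifest on each checkup. An empty list means every recorded file is still there and byte-for-byte unchanged. Note that this only detects corruption; it can't repair it, which is one more reason to keep a second copy.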

Speaking specifically of SSDs, they infamously have a write limit: a cap on how many times you can write to them before they degrade. But for a long-term archive this isn't an issue because you might be writing only once. Also, unlike a hard disk, an SSD has no spinning motor or moving parts, although a hard disk used for this purpose would be kept turned off most of the time anyway. SSDs tend to last longer. But a major point for us is that, while an SSD uses less power than a hard drive, an SSD that goes without power for too long can start to lose data, unlike a hard drive. How long is too long depends on the type of SSD, how worn it is, and how warm it's stored: the "weeks" figure you sometimes see is closer to a worst case, and newer drives might not be affected as badly.

To be continued...