Question: Why would anyone create dozens of copies of a sensitive database such as voter records for 190 million US citizens and then deploy them on customer sites in a totally unsupervised manner?
Answer: It’s an artefact of the pre-internet era: businesses that once sold data by the DVD simply started selling it by FTP download instead. But data formats are standardizing on either XML or JSON, which makes it really easy to join disparate data sets, and the same ubiquitous internet access that rendered DVDs obsolete also makes mass theft possible. In that world, having an unknown number of duplicate copies of a data set lying around under wildly differing security regimes is a recipe for trouble.
Let’s look at a historical analogy: why did people invent the check? Well, carrying, minding and handing over gold safely proved to be something of a challenge for consumers. So they kept their gold in banks, where people who actually knew how to mind gold were in charge. In time, instead of physically going to the bank to fetch gold to buy something, they would write a letter asking the bank to hand part of their gold to someone else. Thus was born the check. Checks were really useful, because if the recipient used the same bank, wealth could be transferred without any clinking sounds at all. Eventually people wanted to use a check they’d been given to pay someone else, and found that this was much easier if the check wasn’t payable to a specific person, which is where the widespread use of paper money comes from.
So what can we learn from this?
Well, nobody keeps their gold at home, because it’s hard to keep safe and acts as a magnet for trouble. Surely the same can be said about minding a lot of data? You’re either in the data security business (running a bank) or you’re not. If you’re not, the wise thing to do is to get someone else to mind it for you.
The other lesson is that the bank has clearly defined APIs for accessing gold, even your own gold, and in particular it doesn’t offer one that lets you take everyone else’s gold home for the night. You can do pretty much all the banking you’d want via these APIs without ever seeing or touching cash. And if you insist on minding your own gold, you’ll find not only that the bank isn’t responsible for what happens, but also that all your friends will laugh at you when you try to blame it for your inevitable misfortune.
What’s needed here is a shift away from ‘publishing’ databases as if they were DVDs and towards REST API-driven services hosted and managed in central, controlled locations.
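To make the ‘bank’ side of the analogy concrete, here is a minimal Python sketch of such a service, assuming a provider that keeps the records itself and issues API keys to its customers. The record store, the key list, the `/voters/{id}` route, the daily quota and the `Gatekeeper` class are all hypothetical illustrations, not any real vendor’s API, and HTTP routing is simulated with a plain method so the idea can be seen without a web framework.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical record store: it stays with the provider and is never
# shipped to customer sites.
RECORDS = {
    "1001": {"id": "1001", "name": "A. Voter", "state": "OH", "registered": True},
    "1002": {"id": "1002", "name": "B. Voter", "state": "TX", "registered": False},
}

# Hypothetical API keys issued to paying customers.
API_KEYS = {"key-alpha", "key-beta"}

# The only data route: one record at a time, e.g. GET /voters/1001.
RECORD_PATH = re.compile(r"^/voters/(?P<voter_id>\d+)$")


@dataclass
class Gatekeeper:
    """Answers narrow per-record queries; deliberately offers no bulk export."""
    daily_quota: int = 1000
    usage: dict = field(default_factory=dict)  # api_key -> lookups made today

    def handle(self, method: str, path: str, api_key: str) -> tuple[int, str]:
        # Every request must carry a key the provider issued.
        if api_key not in API_KEYS:
            return 401, json.dumps({"error": "unknown API key"})
        # Cap how many lookups any single caller can make per day.
        used = self.usage.get(api_key, 0)
        if used >= self.daily_quota:
            return 429, json.dumps({"error": "quota exceeded"})
        # Read-only: customers can look records up but never change them.
        if method != "GET":
            return 405, json.dumps({"error": "read-only service"})
        match = RECORD_PATH.match(path)
        if match is None:
            # There is intentionally no route such as GET /voters that
            # returns the whole collection.
            return 404, json.dumps({"error": "no such endpoint"})
        # Count the lookup against the caller's quota, found or not.
        self.usage[api_key] = used + 1
        record = RECORDS.get(match.group("voter_id"))
        if record is None:
            return 404, json.dumps({"error": "no such voter"})
        return 200, json.dumps(record)
```

The design choice lies in what is missing: there is no route that returns the whole collection, and the quota means even a valid key can only pull records one at a time at a bounded rate. That is roughly the difference between cashing a check and carrying the vault home.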