Life Used to be Simple
Picking a database used to be a simple job. By 2010 the market had consolidated into an oligarchy of three major players: DB2, SQL Server and Oracle were the default choices depending on whether you were an IBM, Microsoft or Unix shop. Of course there were other databases, but in terms of mindshare and corporate respectability you couldn’t go far wrong if you started with these.
Then NoSQL and NewSQL Showed Up
The new database technologies are a reaction to a number of failings that had become apparent in the mainstream databases over time:
Jack of all trades, Master of none – The incumbent database products are stuffed with features, but the net result of making all of these features work at the same time is that nothing is done optimally.
Feature Bloat – The incumbent oligarchy of databases had vanquished their original competitors by having more features than them, and they responded to emerging threats by adding even more.
Pricing Arrogance – When you can buy the database hardware for the price of the sales tax on your database licence, people notice.
Advances in computer science – Architectural decisions which made sense in 1984 no longer make sense now.
Failure to handle ‘Big Data’ – The amount of data we have to handle has consistently grown faster than our ability to manage it. This doesn’t show up as an absolute limit; instead we see diseconomies of scale emerging.
Faster. For a Reason
We now have about 100 different products and technologies to choose from, and all of these new technologies claim to be better than the legacy databases, with some claiming speeds one to two orders of magnitude faster. Many of these claims are true, but before we all rejoice we need to understand how this was done and contemplate the implications.
Let’s start by dismissing two explanations. The new databases weren’t written by smarter people than the legacy ones. Neither are they fast just because of radical advances in architecture.
In 2008, Michael Stonebraker and others published a paper in which they took an OLTP system running against an open source RDBMS and systematically removed functionality that wasn’t absolutely relevant to their application. By the time they were finished it was about 20 times faster, but it still worked, albeit with limitations. I would argue that this paper is as important as Ted Codd’s – it shows that trying to make one database that does everything quickly isn’t practical, and that if you want speed you need to make clear decisions at design time about the use cases you want to favour.
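To get a feel for how much of a conventional database’s time goes into general-purpose guarantees, here is a rough sketch – not the paper’s benchmark, just a toy illustration – that times single-row commits into SQLite with its normal durability settings and then with journaling and fsync switched off. The file names and row count are arbitrary choices of mine, and the exact ratio will vary by machine, but the gap shows why dropping features an application doesn’t need buys so much speed.

```python
# Toy illustration only -- not the methodology from the 2008 paper.
# We time per-row commits into SQLite twice: once with its default
# durability guarantees, and once with journaling and fsync disabled,
# to show how much time a general-purpose database spends on features
# a given application may not need.
import os
import sqlite3
import time

ROWS = 5_000  # arbitrary; just enough to make the difference visible

def timed_inserts(db_file, pragmas):
    if os.path.exists(db_file):
        os.remove(db_file)
    conn = sqlite3.connect(db_file)
    cur = conn.cursor()
    for pragma in pragmas:
        cur.execute(pragma)
    cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    start = time.perf_counter()
    for i in range(ROWS):
        cur.execute("INSERT INTO t VALUES (?, ?)", (i, "x" * 32))
        conn.commit()  # commit per row, as a naive OLTP workload might
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

full = timed_inserts("full.db", ["PRAGMA journal_mode = DELETE",
                                 "PRAGMA synchronous = FULL"])
bare = timed_inserts("bare.db", ["PRAGMA journal_mode = OFF",
                                 "PRAGMA synchronous = OFF"])
print(f"full durability: {full:.2f}s  stripped down: {bare:.2f}s  "
      f"(~{full / bare:.0f}x difference)")
```

The stripped-down run is, of course, no longer crash-safe – which is exactly the point: the speed comes from deciding which guarantees your use case can live without.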
The Bottom Line
NewSQL and NoSQL databases are not fast because they are ‘better’. They are fast because they simply don’t do many of the things a legacy database does: they are usually designed for a specific use case and implement just what is required to support it. It is therefore no surprise that they are fast, but their speed is only useful if you can live with the limited set of features you get with NewSQL/NoSQL.