Thoughts on Postrelational Databases

July 18, 2010 at 04:51 PM | Uncategorized

Everyone and their dog has strong feelings about postrelational databases, so I'll keep mine short: postrelational databases are not "general purpose" in the same sense as relational databases, and postrelational databases encourage optimizations which are, for the majority of applications, premature. So, while they are really cool in theory, I don't believe that they have as much practical value as the hype would suggest.

First, none of the existing postrelational databases can be considered "general purpose" in the same way modern SQL databases are "general purpose": once data is in an SQL database, there are well-known limits to how it can be queried… And queries which are conceptually simple (e.g., see below) often translate into simple SQL. However, this is not necessarily true of postrelational databases: because they offer a lower-level interface to the data, the programmer has to do more work to query them.

For example, consider the problem of "finding all items tagged with A or B": a straightforward query in SQL land, but one which becomes much less straightforward with CouchDB*. Or take a look at the SQL to MongoDB translator.
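To make the SQL side of that comparison concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `item_tags` table, its columns, and the sample rows are hypothetical, made up purely for illustration; a real schema would likely differ.

```python
import sqlite3


def items_tagged_with_either(conn, tag_a, tag_b):
    """Return the sorted, de-duplicated items carrying tag_a or tag_b."""
    rows = conn.execute(
        "SELECT DISTINCT item FROM item_tags "
        "WHERE tag = ? OR tag = ? "
        "ORDER BY item",
        (tag_a, tag_b),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema and data: one row per (item, tag) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_tags (item TEXT NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO item_tags (item, tag) VALUES (?, ?)",
    [
        ("photo-1", "a"),
        ("photo-1", "b"),
        ("photo-2", "b"),
        ("photo-3", "c"),
    ],
)

print(items_tagged_with_either(conn, "a", "b"))
```

The whole query is a single `WHERE` clause; `DISTINCT` keeps an item tagged with both A and B from being listed twice.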

Additionally, the kind of denormalization that postrelational databases encourage** is, by definition, an optimization. And for some applications, this optimization is necessary. But for everything else, it seems remarkably similar to the kind of optimization that makes Knuth cry: premature optimization.

For example, consider a site like Flickr, which allows people to comment on photos, and imagine that comments show the commenter's full name:

    David Wolever says: Wow, nice shot!

When you're a site which serves 12,000 photos every second, denormalizing the commenter's full name and storing it along with the comment is a sensible tradeoff… But the added JOINs that would be needed to normalize the data probably won't kill the rest of us.
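As a rough sketch of that tradeoff (again with Python's built-in `sqlite3`), the example below stores the same comment twice: once normalized, where the name is looked up with a JOIN, and once denormalized, where the name is copied into the comment row. All table and column names are assumptions invented for this illustration, not anything Flickr actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized: the commenter's name lives only in users.
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
    -- Denormalized: the name is copied into every comment row.
    CREATE TABLE comments_denormalized (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        commenter_name TEXT NOT NULL,
        body TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'David Wolever')")
conn.execute("INSERT INTO comments (user_id, body) VALUES (1, 'Wow, nice shot!')")
conn.execute(
    "INSERT INTO comments_denormalized (user_id, commenter_name, body) "
    "VALUES (1, 'David Wolever', 'Wow, nice shot!')"
)


def normalized_comments(conn):
    """Render comments, fetching each commenter's name with a JOIN."""
    rows = conn.execute(
        "SELECT u.full_name, c.body FROM comments AS c "
        "JOIN users AS u ON u.id = c.user_id ORDER BY c.id"
    )
    return [f"{name} says: {body}" for name, body in rows]


def denormalized_comments(conn):
    """Render comments from the stored copy of the name; no JOIN needed."""
    rows = conn.execute(
        "SELECT commenter_name, body FROM comments_denormalized ORDER BY id"
    )
    return [f"{name} says: {body}" for name, body in rows]


# A rename touches one row in the normalized schema; the denormalized
# copies stay stale until something rewrites them.
conn.execute("UPDATE users SET full_name = 'Dave Wolever' WHERE id = 1")

print(normalized_comments(conn))
print(denormalized_comments(conn))
```

The denormalized read skips the JOIN, which is the whole point of the optimization, but every copy of the name now has to be kept in sync by hand.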

Just like any other kind of optimization, postrelational databases are really cool and there are situations in which they are invaluable. But, contrary to the hype, they aren't right for everyone… And I don't think they are right for me. Which is why I'm going to be living in safe, boring, SQL land for the foreseeable future.

*: it can be done with a one-off map/reduce (resulting in a table scan), or by sending off two queries then performing client-side filtering… But either option is more difficult than tag='a' or tag='b'. Did I get this wrong? See my question on StackOverflow.

**: require