Daily Archives: February 13, 2008

Is Stonebraker Right? Time to Reinvent the Database?

At HPTS this past October, Michael Stonebraker delivered a presentation called “It’s Time for a Complete Rewrite.”

The main point seems to be that the general purpose relational database management system has outlived its usefulness after 30-40 years, and the market needs specialized database management systems to meet current application requirements.

A lot of these topics were covered in an interview published in the Sept/Oct issue of ACM Queue. However, Mike stops short of describing some of his new proposals for these specialist databases. Last Monday a lot of this was discussed at the New England Database Day session at MIT, where Michael now teaches.

It looked to me as if about 100 people showed up, and I believe they said a majority were from industry. The presentations were very interesting. A good summary can be found here.

A highlight was certainly Dave DeWitt’s presentation on Clustera. Although I’ve been taking an interest in what Google and others are doing around the new “scale out” architectures, I had missed Dave’s blog on why MapReduce isn’t so great. He included some of the points in his presentation, but to me it was more of a defense of the current RDBMS than a realistic criticism of MapReduce in its own context. I am oversimplifying, I’m sure, but a lot of it sounded like “you can do this with a relational database if you wanted to, so why are you bothering with something new?”

Personally, I think this misses the main point, which is to consider what advantages can be gained by doing more in memory and decoupling persistence from the update. Put another way, the industry has focused for years on the fastest way to persist data, in order to ensure the best possible reliability and consistency and to do as much as possible automatically, without human intervention. That is the kind of work I had to do before transactions were widely available in database systems: go in by hand, find any partial results that occurred due to machine failure, and back them out.

But if we were to break that assumption and say that manual intervention might be OK in some cases, and that not everything has to be done automatically, we could gain some advantages in performance and in the overall ability to handle large amounts of data more efficiently.

I definitely agree it’s time for some new thinking in the database and transaction processing world. The specialized database ideas have a lot of merit (column-oriented databases to improve data warehousing and decision support, streaming databases for event processing, embedded databases for higher performance and better flexibility, and in-memory databases for improved update latency), but the most interesting point for me is changing some of the key underlying assumptions.

For example, using memory-based systems and treating persistence as a second-level operation, disconnected from or asynchronous to the memory update. Or assuming failure and designing systems that expect to fail often, in order to take advantage of lower-priced hardware. After all, even the biggest and most expensive systems fail, including software, of course. So why not assume it? Then there is the idea of a central controlling authority, such as a central database system of record, a root transaction manager, or a common system administration domain. Web-based designs are clearly showing the need to rethink that assumption too, and to redesign and rearchitect not only our system software, such as databases, TP monitors, and application servers, but also our enterprise applications.
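
To make the "persistence as a second-level operation" idea a little more concrete, here is a minimal sketch in Python of a write-behind store: updates land in memory right away, and a background thread appends them to a log file later. The names (WriteBehindStore, recover, the JSON-lines log format) are my own inventions for illustration, not anything from the talks, and a real system would need fsync, batching, and a proper recovery story.

```python
import json
import os
import queue
import tempfile
import threading


class WriteBehindStore:
    """Hypothetical in-memory key/value store with asynchronous persistence."""

    def __init__(self, log_path):
        self._data = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        # Append-only log; opened here so the file exists before any write.
        self._log = open(log_path, "a", encoding="utf-8")
        writer = threading.Thread(target=self._drain, daemon=True)
        writer.start()

    def put(self, key, value):
        # The memory update and the enqueue happen under one lock, so the
        # log order always matches the order of updates in memory.
        with self._lock:
            self._data[key] = value
            self._pending.put((key, value))

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def flush(self):
        # Block until every queued update has been written to the log.
        self._pending.join()

    def _drain(self):
        # Background writer: persistence trails the in-memory update.
        while True:
            key, value = self._pending.get()
            self._log.write(json.dumps({"key": key, "value": value}) + "\n")
            # flush() hands the data to the OS; it is not a durable fsync.
            self._log.flush()
            self._pending.task_done()


def recover(log_path):
    """Rebuild state by replaying the log; the last write for a key wins."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            state[record["key"]] = record["value"]
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.log")
    store = WriteBehindStore(path)
    store.put("a", 1)
    store.put("a", 2)
    print(store.get("a"))   # readable before anything reaches disk
    store.flush()
    print(recover(path))
```

The tradeoff is exactly the one discussed above: a crash between put and the background write loses the queued updates, so the application (or a person) has to tolerate or repair that gap in exchange for not waiting on the disk for every update.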