Is Stonebraker Right? Time to Reinvent the Database?

At HPTS this past October, Michael Stonebraker delivered a presentation called “It’s Time for a Complete Rewrite.”

The main point seems to be that the general purpose relational database management system has outlived its usefulness after 30-40 years, and the market needs specialized database management systems to meet current application requirements.

A lot of these topics were covered in an interview published in the Sept/Oct issue of ACM Queue. However, Mike stops short of describing some of his new proposals for these specialist databases. Last Monday a lot of this was discussed at the New England Database Day session at MIT, where Michael now teaches.

It looked to me as if about 100 people showed up, and I believe they said a majority were from industry. The presentations were very interesting. A good summary can be found here.

A highlight was certainly Dave DeWitt’s presentation on Clustera. Despite the fact that I’ve been taking an interest in what Google and others are doing around the new “scale out” architectures, I had missed Dave’s blog post on why MapReduce isn’t so great. He included some of those points in his presentation, but to me it was more of a defense of the current RDBMS than a realistic criticism of MapReduce in its own context. I am oversimplifying, I’m sure, but a lot of it sounded like “you can do this with a relational database if you wanted to, so why are you bothering with something new?”
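
To make that point concrete, here is a minimal sketch of the canonical map/reduce word count, in plain Python with made-up data (nothing here is from Clustera or any real framework). The RDBMS camp’s observation is that this whole program collapses to something like SELECT word, COUNT(*) FROM words GROUP BY word; the map/reduce camp’s observation is that the two small functions below are all you have to write, and the framework handles spreading them across machines.

```python
from collections import defaultdict

# Illustrative input: one line of text per "record".
records = [
    "the quick brown fox",
    "the lazy dog",
    "the quick fox",
]

def map_fn(record):
    # Map step: emit a (word, 1) pair for every word in the record.
    for word in record.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce step: sum the partial counts for a single word.
    return word, sum(counts)

# In a real framework the shuffle/group step runs across many machines;
# here it is just a dictionary of lists.
groups = defaultdict(list)
for record in records:
    for word, count in map_fn(record):
        groups[word].append(count)

result = dict(reduce_fn(word, counts) for word, counts in groups.items())
print(result)  # {'the': 3, 'quick': 2, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```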

Personally, I think this kind of misses the main point, which is to consider what advantages can be gained by doing more in memory and decoupling persistence from the update path. Another way to put it: the industry has focused for years on the fastest way to persist data in order to ensure the best possible reliability and consistency, doing as much as possible automatically and avoiding human intervention – the kind of thing I had to do before transactions were widely available in database systems, i.e. go in by hand, find any partial results that occurred due to a machine failure, and back them out.

But if we were to break that assumption – accepting that manual intervention might be OK in some cases, and that not everything has to be done automatically – we could gain some advantages in performance and in our overall ability to handle large amounts of data more efficiently.
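
As a rough sketch of what “disconnecting persistence” could look like, here is a toy write-behind scheme in Python; the queue, the append-only log file, and the function names are all illustrative, not any particular product’s design. The caller sees the in-memory update immediately, durability happens later on a background thread, and a crash between the two steps is exactly the window where the manual clean-up described above comes back into play.

```python
import queue
import threading

store = {}                     # in-memory copy: what callers read and update
write_behind = queue.Queue()   # changes waiting to be made durable

def persist_worker():
    # Hypothetical durable sink: an append-only log file.
    with open("changes.log", "a") as log:
        while True:
            key, value = write_behind.get()
            log.write(f"{key}={value}\n")
            log.flush()
            write_behind.task_done()

threading.Thread(target=persist_worker, daemon=True).start()

def update(key, value):
    store[key] = value              # the caller is done here, at memory speed
    write_behind.put((key, value))  # durability happens later, off the critical path

update("account:42", 100)
write_behind.join()  # only so this example finishes; a real system would not wait
```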

I definitely agree it’s time for some new thinking in the database and transaction processing world. The specialized database ideas have a lot of merit – column-oriented databases to improve data warehousing and decision support, streaming databases for event processing, embedded databases for higher performance and better flexibility, and in-memory databases for improved update latency – but the most interesting point for me is changing some of the key underlying assumptions.
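
As a rough illustration of why the column-oriented idea pays off for data warehousing, here is a toy comparison of the two layouts (illustrative data, not any real engine): an aggregate over a single column only has to scan that column’s array in a column store, instead of pulling every full row through memory.

```python
# Row-oriented layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "region": "east", "amount": 120.0},
    {"id": 2, "region": "west", "amount": 75.5},
    {"id": 3, "region": "east", "amount": 42.0},
]

# Column-oriented layout: each column is stored (and scanned) on its own.
columns = {
    "id":     [1, 2, 3],
    "region": ["east", "west", "east"],
    "amount": [120.0, 75.5, 42.0],
}

# An analytical query like SUM(amount) only needs the "amount" column,
# so a column store reads just that array instead of every full row.
total_from_rows = sum(r["amount"] for r in rows)
total_from_columns = sum(columns["amount"])
assert total_from_rows == total_from_columns
```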

For example, using memory-based systems and treating persistence as a second-level operation, disconnected from or asynchronous to the memory update. Assuming failure, and designing systems that fail often in order to take advantage of lower-priced hardware. After all, even the biggest and most expensive systems fail – including software, of course – so why not assume it? And rethinking the idea of a central controlling authority, such as a central database system of record, a root transaction manager, or a common system administration domain. The Web-based designs are clearly showing the need to redesign and rearchitect not only our system software – databases, TP monitors, and application servers – but also our enterprise applications.


6 responses to “Is Stonebraker Right? Time to Reinvent the Database?”

  1. Oh my, you’re really hitting the good topics this time!
    One does wonder why databases, after well over 30 years of research and development, still require a team of “rocket scientists”, high-paid developers and an army of people to maintain them?
    I think if one were to compare database technology today with that of middleware – e.g., Web Services (whatever flavour), standards-based interoperability, queuing, communication – databases would come off quite badly; just try getting one database to “talk” to another, for instance!
    Not to mention the advances in hardware such as disks, connection speeds (fiber), storage arrays and so forth.
    I recall one well-known DB person once saying: “…it’s all about sorting. If you can do it quickly enough…”. Oversimplifying the matter, I’m sure, but with machines capable of offering 2TB of memory, and solid-state disks being a viable reality, how come one still needs all these specialists for a fairly simple DB?
    Of course, the question of mapping a DB to objects is one all in its own right, including the intricacies of that venerable old language, SQL. I don’t think any vendor has fully implemented the latest SQL standard yet!
    Cheers, John

  2. Yes, SQL has been starting to receive its own share of abuse lately as well.
    Ruby on Rails is one of the things Stonebraker mentions to illustrate the problem: in Rails the data you declare is mapped automatically to the database so you don’t have to use SQL.

  3. Hi Eric,
    Consistency, availability or partition tolerance… pick two. AKA, Brewer’s conjecture:
    http://citeseer.ist.psu.edu/544596.html
    Sometimes asynchronous update is a feature, sometimes it’s a problem. For this and other reasons, a single solution to rule them all is not practical.
    Maybe it is time to consider a rewrite, but it would be foolish to throw the baby out with the bathwater. Map/reduce doesn’t have to be perfect to be perfectly useful.
    Darach.

  4. Hi Darach,
    I guess a better way to describe what’s happening here is a proposal for special-purpose databases to replace the RDBMS and/or SQL for specialized applications, but not for everything.
    There’s plenty that RDBMSs do well, but it is time to stop trying to use them for every application that needs a database.
    Eric

  5. Hi Eric,
    Yup, that is a much better way of saying it! 🙂
    There are some fairly good rebuttals of Stonebraker and DeWitt’s arguments too, given that map/reduce is not a DBMS, but a family of algorithms:
    http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html
    http://scienceblogs.com/goodmath/2008/01/databases_are_hammers_mapreduc.php
    Darach.

  6. Great article and I completely agree with Stonebraker and John’s comments further up.
    Personally, I think one of the major deficiencies of current RDBMSs as a general rule is the contorted support for structured data, such as XML and object data, in a way that enables reasonable change management. A number of OR mapping layers have been created, Hibernate as an example, to deal with the problem. These really help with the initial development approach, but they don’t help or simplify the problem when it becomes necessary to modify the structure of the database. While most of the major databases now support XML in some way, this support is almost always stapled on at the edge, over the top of the same old RDBMS approach.
    RDBMSs seem very orthogonal, and not in a good way, to the way most developers work with their data. Even Oracle seems to recognize the problem, as they’ve been buying up many of the database companies, even open source ones, that best demonstrate some of the new approaches discussed by Stonebraker. In the past few years they’ve purchased TimesTen, Berkeley DB, and Tangosol (and more, I’m sure). The question is – will they bring this technology to market or bury it? Personally, I’m very sad to see many of these innovative new DBs fall under the umbrella of the “other” software evil empire.
    Cheers, Lee
