
Is Stonebraker Right? Time to Reinvent the Database?

At HPTS this past October, Michael Stonebraker delivered a presentation called It’s Time for a Complete Rewrite.

The main point seems to be that the general-purpose relational database management system has outlived its usefulness after 30-40 years, and that the market needs specialized database management systems to meet current application requirements.

Many of these topics were covered in an interview published in the Sept/Oct issue of ACM Queue, although Mike stops short there of describing some of his new proposals for these specialist databases. Last Monday much of this was discussed at the New England Database Day session at MIT, where Michael now teaches.

It looked to me as if about 100 people showed up, and I believe they said a majority were from industry. The presentations were very interesting. A good summary can be found here.

A highlight was certainly Dave DeWitt’s presentation on Clustera. Despite the fact that I’ve been taking an interest in what Google and others are doing around the new “scale-out” architectures, I had missed Dave’s blog post on why MapReduce isn’t so great. He included some of those points in his presentation, but to me it read more as a defense of the current RDBMS than as a realistic criticism of MapReduce in its own context. I’m oversimplifying, I’m sure, but a lot of it sounded like “you could do this with a relational database if you wanted to, so why are you bothering with something new?”

Personally I think this misses the main point, which is to consider what advantages can be gained by doing more in memory and decoupling persistence from the update itself. Another way to put it: the industry has spent years focused on the fastest way to persist data in order to ensure the best possible reliability and consistency, automating as much as possible to avoid human intervention – the kind of work I had to do before transactions were widely available in database systems, i.e. going in by hand to find any partial results that occurred due to a machine failure and backing them out.

But if we were to break that assumption, and say that manual intervention might be OK in some cases and that not everything has to be done automatically, we could gain advantages in performance and in the ability to handle large amounts of data more efficiently.
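To make that concrete, here is a rough sketch of my own (not anything Mike or Dave presented) of what treating persistence as a deferred, asynchronous step might look like in Java. The WriteBehindStore name and the empty persist() method are invented for illustration; the point is that the caller sees the in-memory update immediately, durability lags behind, and a crash in between is exactly the case where manual intervention would be needed.

```java
import java.util.concurrent.*;

// Sketch: apply updates in memory first, persist asynchronously later.
// Names (WriteBehindStore, persist) are illustrative, not a real API.
public class WriteBehindStore {
    private final ConcurrentMap<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> dirtyKeys = new LinkedBlockingQueue<>();
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    public WriteBehindStore() {
        // Background task drains dirty keys and writes them to durable storage.
        flusher.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String key = dirtyKeys.take();
                    persist(key, memory.get(key)); // may lag the in-memory state
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // The caller sees the update as soon as memory is changed.
    public void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.offer(key); // durability is deferred
    }

    public String get(String key) {
        return memory.get(key);
    }

    private void persist(String key, String value) {
        // Placeholder for the slow, durable write (file, log, remote store).
        // If the process dies before this runs, the update is lost – the case
        // where the "manual intervention might be OK" assumption kicks in.
    }

    public void shutdown() {
        flusher.shutdownNow();
    }
}
```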

I definitely agree it’s time for some new thinking in the database and transaction processing world. The specialized database ideas have a lot of merit – column-oriented databases to improve data warehousing and decision support, streaming databases for event processing, embedded databases for higher performance and better flexibility, and in-memory databases for improved update latency – but the most interesting point for me is changing some of the key underlying assumptions.
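As a rough illustration of the column-oriented point (my own sketch, nothing from the talks): when one attribute’s values are stored contiguously, an aggregate query over that attribute reads only the data it needs instead of dragging whole rows through memory. The class and field names below are invented.

```java
// Sketch contrasting row-oriented and column-oriented layouts for one query:
//   SELECT SUM(amount) FROM orders
// Class and field names are illustrative only.
public class ColumnVsRow {

    // Row-oriented: each record carries every attribute.
    static class OrderRow {
        final int id;
        final String customer;
        final double amount;
        OrderRow(int id, String customer, double amount) {
            this.id = id; this.customer = customer; this.amount = amount;
        }
    }

    static double sumRowStore(OrderRow[] rows) {
        double total = 0;
        for (OrderRow r : rows) {
            total += r.amount;   // the scan pulls id and customer along too
        }
        return total;
    }

    // Column-oriented: each attribute is a separate, contiguous array.
    static double sumColumnStore(double[] amountColumn) {
        double total = 0;
        for (double a : amountColumn) {
            total += a;          // only the column the query needs is read
        }
        return total;
    }

    public static void main(String[] args) {
        OrderRow[] rows = {
            new OrderRow(1, "acme", 10.0),
            new OrderRow(2, "globex", 25.5)
        };
        double[] amounts = {10.0, 25.5};
        System.out.println(sumRowStore(rows));       // 35.5
        System.out.println(sumColumnStore(amounts)); // 35.5
    }
}
```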

Some examples of the assumptions I mean: using memory-based systems and treating persistence as a second-level operation, disconnected or asynchronous from the memory update. Assuming failure, and designing systems that fail often, to take advantage of lower-priced hardware – after all, even the biggest and most expensive systems fail, including the software, of course, so why not assume it? And questioning the idea of a central controlling authority, such as a central database system of record, a root transaction manager, or a common system administration domain. The Web-based designs are clearly showing the need to redesign and rearchitect not only our system software – databases, TP monitors, and application servers – but also our enterprise applications.

Have We Got It All Backwards with Software Assembly?

I am as guilty of this as anyone else. Back in the 90s I was on a big project to standardize enterprise software. We wrote a few papers about it, and a chapter in a book. We often used the “Henry Ford” analogy, which relates to the impact that standards for interchangeable parts had on hard goods manufacturing.

The Henry Ford analogy says that the hard job in mass assembly is getting the interchangeable parts standardized – thereafter, creating the moving assembly line is the easy job. Ford pulled it off with the significant market success of the Model T and changed the world.

In the original story (which the link directly above summarizes), the crucial quote for us was: “The key to mass production wasn’t the continuously moving assembly line, as many people believe, but rather the complete and consistent interchangeability of parts and the simplicity of attaching them to each other.”

But of course, as the updated book describes, Toyota changed the world again, moving it from craft production to mass production (i.e. Ford’s achievement) and then on to lean production. In software, however, we are still struggling to achieve mass production, never mind lean production.

The application of the Ford analogy to software is that if you can standardize application programming APIs and communications protocols, you can meet requirements for application portability and interoperability. If all applications could run on any operating system and easily share data with all other applications, IT costs (which are still mainly labor) would be significantly reduced and mass production would be achievable.

The industry has seen many efforts in this direction fail, or only partially succeed. Today’s environment is better than the early 90s, but we still have incompatibilities across the various editions of Java, enough differences among J2EE application server implementations to inhibit easy migration among them, and of course a significant level of incompatibility between enterprise Java, Microsoft .NET, and IBM mainframe environments. Applications that want to leverage the best of breed across these environments typically have to do a lot of craft work, i.e. hand coding.

Seven years ago I remember thinking Web services and XML might finally solve the problem, but perhaps because of the way the specifications were implemented (basically by adapting to existing technologies), in the end only a partial solution was achieved. Yes, interoperability is improved compared to what it had been, but it still requires too much hand coding.

Even though I’ve been working towards the “Henry Ford” analogy for more than a decade, recent exposure to inversion of control concepts (e.g. Spring and Guice) and OSGi makes me think the mass production analogy is backwards for software after all.

The Ford analogy has typically played out in software by positioning the assembly problem as the easy part of the job and creating reusable services for assembly as the hard part. I can’t tell you how many times I’ve heard business process modeling and orchestration tools pitched at “business analysts” only to discover that proper use of the tool requires someone who can actually code.

The easy part should be developing the reusable services. The hard part should be their composition and assembly.
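Here is a small sketch of what I mean, in plain Java rather than the Spring or Guice APIs, with invented names (TaxCalculator, InvoiceService, Assembly). The services themselves are trivial to write; the interesting decisions are all in the assembly step at the bottom, which is exactly the part the “business analyst” tools try to wave away.

```java
// Sketch: the reusable pieces are simple; the assembly is the interesting part.
// All names here are invented for illustration.

interface TaxCalculator {
    double taxOn(double amount);
}

// A trivial, reusable service – the "easy part".
class FlatRateTaxCalculator implements TaxCalculator {
    private final double rate;
    FlatRateTaxCalculator(double rate) { this.rate = rate; }
    public double taxOn(double amount) { return amount * rate; }
}

// Another simple service, which declares its dependency in the constructor
// instead of constructing it itself – inversion of control by hand.
class InvoiceService {
    private final TaxCalculator taxCalculator;
    InvoiceService(TaxCalculator taxCalculator) {
        this.taxCalculator = taxCalculator;
    }
    double totalFor(double subtotal) {
        return subtotal + taxCalculator.taxOn(subtotal);
    }
}

// The assembly step: deciding which implementations to wire together.
// In Spring or Guice this wiring moves into configuration or modules,
// but the decisions themselves still need an architect, not a wizard.
public class Assembly {
    public static void main(String[] args) {
        TaxCalculator tax = new FlatRateTaxCalculator(0.07);
        InvoiceService invoices = new InvoiceService(tax);
        System.out.println(invoices.totalFor(100.0)); // 107.0
    }
}
```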

Corporations around the world are squeezing IT budgets, which means looking to reduce labor costs. Many are turning to outsourcing to China and India, and others are looking to hire college graduates in place of highly paid (and more highly skilled) coders.

But almost by definition the Ford analogy can’t work. You cannot really get lower-skilled, untrained developers to tackle sophisticated problems such as component reuse. They can create simple objects incorporating business logic, and to use one description, the plainer the old Java object (POJO) the better.
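For instance, this is the sort of thing I mean by a plain old Java object – no framework base class, no annotations, just a little state and one business rule (the Discount name and the 10% threshold are invented for illustration):

```java
// Sketch of the kind of POJO a less experienced developer can write:
// no framework base class, no annotations, just data and one business rule.
public class Discount {
    private final double orderTotal;

    public Discount(double orderTotal) {
        this.orderTotal = orderTotal;
    }

    // A single piece of business logic, easy to write and to test.
    public double discountedTotal() {
        return orderTotal >= 1000.0 ? orderTotal * 0.90 : orderTotal;
    }
}
```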

What we need are not simple tools for business analysts to compose services into flows. We need sophisticated tools for architects and designers to import POJOs and plain old anything else, check them for conformance to the plan, and fix them up for deployment. What’s the right analogy here? Farming?