Category Archives: Software Evolution

Do You Need to Develop for the Cloud?

And other notes from last week’s IASA NE cloud computing roundtable (more about the meeting format later – also note the Web site currently does not have up to date information – we are volunteers after all ;-).

We had good attendance – I counted 26 during the session, including folks from Pegasystems, Microsoft, Lattix, Kronos, SAP, The Hartford, Juniper, Tufts, Citizens Bank, Reliable Software, Waters, and Progress Software, among others (I am sure I missed some), providing an interesting cross-section of views.

The major points were that no one is really thinking about the cloud as a place for accessing hosted functionality, and everyone is wondering whether or not they should be thinking about developing applications specifically for the cloud.

We touched on the landscape of various cloud computing offerings, highlighting the differences among Google, Salesforce.com, Microsoft, and Amazon.com. Cloud vendors often seem to have started by trying to sell what they already had – Google has developed an extensive (and proprietary) infrastructure for highly available and scalable computing that they offer as Google App Engine (the idea is that someone developing their own Web app can plug into the Google infrastructure and achieve immediate “web scale”).

And Salesforce.com had developed a complex database and functionality infrastructure for supporting multiple tenants for their hosted application, including their own Java-like language, which they offer to potential cloud customers as Force.com.

Microsoft’s Azure offering seems to be aiming for a sort of middle ground – MSDN for years has operated a Web site of comparable size and complexity to any of the others, but Microsoft also supplies one of the industry’s leading application development environments (the .NET Framework). The goal of Azure is to supply the services you need to develop applications that run in the cloud.

However, the people in the room seemed most interested in the idea of being able to set up internal and external clouds of generic compute capacity (like Amazon EC2) that could be related, perhaps using virtual machines, and having the “elasticity” to add and remove capacity as needed. This seemed to be the most attractive aspect of the various approaches to cloud computing out there. VMware was mentioned a few times since some of the attendees were already using VMware for internal provisioning and could easily imagine an “elastic” scenario if VMware were also available in the cloud in a way that would allow application provisioning to be seamless across internal and external hosts.

This brought the discussion back to the programming model, as in what would you have to do (if anything) to your applications to enable this kind of elasticity in deployment?
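One way to picture the elasticity question is as a sizing rule the application has to tolerate: the number of instances follows the load, so the code cannot assume a fixed set of hosts. Here is a minimal sketch in Python. The function name `desired_instances`, the per-instance capacity, and the floor and ceiling are all hypothetical values picked for illustration; none of this is any vendor's API.

```python
import math


def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return how many identical instances the current load calls for.

    Every parameter is an illustrative assumption: a real provisioning
    layer would read load from monitoring and apply its own policy.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (50, 450, 5000):
        print(load, "req/s ->", desired_instances(load, capacity_per_instance=100))
```

A rule like this is only safe if the application keeps its session and transaction state somewhere other than the instance that might be removed, which is typically where the programming model question starts to bite.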

Cloud sites offering various bits of “functionality” typically also offer a specific programming model for that functionality (again Google App Engine and Force.com are examples, as is Microsoft’s Azure). The Microsoft folks in the room said that a future version of Azure would include the entire SQL Server, to help support the goal of reusing existing applications (which a lot of customers apparently have been asking about).

The fact that cloud computing services may constrain what an application can do raises the question of whether we should be thinking about developing applications specifically for the cloud.

The controversy about cloud computing standards was noted, but we did not spend much time on it. The usual comments were made about it being too early for standards, and about the various proposals lacking major vendor backing, and we moved on.

We did spend some time talking about security, and service level agreements, and it was suggested that certain types of applications might be better suited to deployment in the cloud than others, especially as these issues get sorted out. For example, company phonebook applications don’t typically have the same availability and security requirements that a stock trading or medical records processing application might have.

Certification would be another significant sign of cloud computing maturity, meaning certification for some of the service level agreements companies look for in transactional applications.

And who does the data in the cloud belong to? What if the cloud is physically hosted in a different country? Legal issues may dictate that data belonging to citizens of a given country be physically stored within the geographical boundaries of that country.

And what about proximity of data to its processing? Jim Gray’s research was cited to say that it’s always cheaper to compute where the data is than to move the data around in order to process it.
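A back-of-the-envelope way to check the proximity argument for a given workload is to compare transfer time against compute time. The sketch below uses elapsed time as a stand-in for cost, which is a simplification, and it ignores latency and protocol overhead. The function names and the figures in the usage lines are hypothetical; they were not numbers from the meeting.

```python
def transfer_seconds(num_bytes: int, link_bits_per_sec: float) -> float:
    """Time to push num_bytes over a link, ignoring latency and overhead."""
    if link_bits_per_sec <= 0:
        raise ValueError("link_bits_per_sec must be positive")
    return num_bytes * 8 / link_bits_per_sec


def cheaper_to_move_data(num_bytes: int,
                         link_bits_per_sec: float,
                         local_compute_seconds: float,
                         remote_compute_seconds: float) -> bool:
    """True only if shipping the data and computing remotely beats computing in place."""
    remote_total = transfer_seconds(num_bytes, link_bits_per_sec) + remote_compute_seconds
    return remote_total < local_compute_seconds


if __name__ == "__main__":
    one_terabyte = 10**12
    hundred_megabit = 100 * 10**6
    # 8 * 10**12 bits / 10**8 bits per second = 80,000 seconds, roughly 22 hours.
    print(transfer_seconds(one_terabyte, hundred_megabit))
```

With these made-up numbers the transfer alone takes most of a day, so the remote side would have to be dramatically faster before moving the data pays off.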

Speaking of sending data around, however, what’s the real difference between moving data between the cloud and a local data center, and moving data between a company’s remote data centers?

And finally, this meeting was my first experience with a fishbowl style session. We used four chairs, and it seemed to work well. This is also sometimes called the “anti-meeting” style of meeting, and seems a little like a “user-generated content” style of meeting. No formal PPT. At the end everyone said they had learned a lot and enjoyed the discussion. So apparently it worked!

Stay tuned for news of our next IASA NE meeting.

Second Edition of TP Book out Today

It’s hard to believe, but the second edition of Principles of Transaction Processing is finally available. The simple table of contents is here, and the entire front section, including the dedication, contents, introduction and details of what’s changed, is here. The first chapter is available here as a free sample.

Photo of an early press run copy

It definitely turned out to be a lot more work than we expected when we created the proposal for the second edition almost four years ago. And of course we originally expected to finish the project much sooner, about two years sooner.

But the benefit of the delay is that we were able to include more new products and technologies, such as EJB3, JPA, Spring, the .NET Framework’s WCF and System.Transactions API, SOA, AJAX, REST/HTTP, and ADO.NET, even though it meant a lot more writing and rewriting.

The first edition was basically organized around the three-tier TP application architecture widely adopted at the time, using TP monitor products for examples of the implementations of the principles and concepts covered in the book. Then as now, we make sure what we describe is based on practical, real-world techniques, although we do mention a few topics more of academic interest.

The value of this book is that it explains how the world’s largest TP applications work – how they use techniques such as caching, remote communications (synchronous as well as asynchronous), replication, partitioning, persistence, queuing, database recovery, ACID transactions, long running transactions, performance and scalability techniques, locking, threading, business process management, and state management to process up to tens of thousands of transactions per second with high levels of reliability and availability. We explain the techniques in detail and show how they are programmed.

These techniques are used in airline reservation systems, stock trading systems, large Web sites, and in operational computing supporting virtually every sector of the economy. We primarily use Java EE-compliant application servers and Microsoft’s .NET Framework for product and code examples, but we also cover popular persistence abstraction mechanisms, Web services and REST/HTTP based SOA, Spring, integration with legacy TP monitors (the ones still in use), and popular TP standards.

We also took the opportunity to look forward and include a few words about the potential impact on TP applications of current trends toward cloud computing, solid state memory, streams and event processing, and the changing design assumptions in the software systems used to power large Web sites.

Personally this has been a great project to work on, despite its challenges, complexities, and pressures. It could not have been done without the really exceptional assistance from 35 reviewers who so generously contributed their expertise on a wide variety of topics. And it has been really great to work with Phil again.

Finally, the book is dedicated to Jim Gray, who was so helpful to us in the first edition, reviewed the second edition proposal, and still generally serves as an inspiration to all of us who work in the field of transaction processing.

IASA NE Roundtable June 23 on Cloud Computing

I just wanted to pass along the notice for the June 23 meeting of the IASA NE Chapter, which will be a roundtable on “cloud computing” hosted by Microsoft but chaired by Michael Stiefel of Reliable Software.

Details and registration

(No, you do not need to be a member of IASA although of course we encourage that. Basic membership is free, and full membership is only $35.)

What should be interesting this time is that everyone seems to be doing something slightly different around cloud computing, whether it’s Amazon, Microsoft, Google, VMware, Salesforce, etc. Cloud computing is definitely an exciting new area, but like many other new and exciting areas it is subject to considerable hype and exaggeration. Good questions include:

  • What exactly can you do in the cloud?
  • What are the restrictions, if any, on the kind of programs and data you can put “in the cloud”?
  • Can you set up a private cloud if you want to?

I think the trend toward cloud computing is closely related to the trend toward commodity data centers, since you kind of need one of those to offer a cloud service in the first place, like the ones James Hamilton describes in this PowerPoint presentation, which I heard him present at HPTS 2007. (Looks like James has left Microsoft and joined Amazon BTW.)

I would expect a lot of heated discussion from the folks who usually attend the IASA meetings. Attendance has been steadily increasing since we founded the local chapter about a year ago, so I would hope for and expect a very lively discussion.

As usual, the event includes networking time and food & drinks (not usually beer though – have to work on that I guess 😉).

Please be sure to register in advance so Microsoft knows how much food to buy.

Thanks & hope to see you there!

Eric

More on the Software Assembly Question – Do Patterns Help?

Since I posted the initial entry questioning the validity of the Henry Ford analogy for improving software productivity through interface standardization, there have been some good posts by Hal and Richard, and some good feedback to the Sys Con site that syndicated the entry.

While I have to say I think the posts and comments make excellent points about the value of design, and the differences between mass producing hard goods and creating individual applications, I am not sure any clear recommendation is emerging for how to improve the software development process. So now I am wondering whether we can get at this problem through patterns.

One aspect of the debate over software productivity and assembly is whether or not visual tools can help. I think that they do – visual abstractions can be very meaningful – but I do not know of any visual system that actually solves the complete problem (i.e. none have solved the customization/round trip problem). UML tools are furthermore too object oriented for some applications, such as services and REST, although of course I will get an argument from the UML (and MDA?) folks that models are the way to go anyway, and that UML and MDA are being changed to be more data and document oriented (i.e. sequence diagrams could be improved in this direction).

I admit I am not up to date with the latest in UML and MDA. But I also don’t know of any reason to change my view that they do not provide the answer. I have yet to see any graphical system entirely able to replace any human oriented language, and I do not think programming languages are any different. People still need text, even when the graphics and icons are superb.

So noting the growing adoption of software patterns, including integration patterns and SOA patterns, and observing that software systems such as Apache Camel are starting to be built around them, I can’t help wondering whether the solution might be found there.

The fundamental issue seems to be identifying the right abstractions. Software is the way people have of telling computers what to do, and it is still too hard, requiring way too much work.

In the Henry Ford analogy, the API (or interface) is seen as the right abstraction. As long as the interface to a program is standardized, its implementation can contain any code. With a standardized interface, programs can be assembled with predictable results (i.e. other programs know what to expect when invoking the interface). This led to the idea of reuse, of libraries of components, objects, and services that someone could sell and others could use in building applications. And this has happened to some extent, but there are also many unfulfilled promises in this area (as David Chappell, among others, has pointed out).
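To make the “interface as interchangeable part” idea concrete, here is a minimal sketch in Python. The names `PaymentService`, `InMemoryPayments`, `AlwaysApprove`, and `checkout` are hypothetical, invented purely for illustration. The point is only that the caller is written against the interface, so either implementation can be swapped in with predictable results.

```python
from typing import Dict, Protocol


class PaymentService(Protocol):
    """The standardized interface: callers depend only on this shape."""

    def charge(self, account: str, cents: int) -> bool:
        ...


class InMemoryPayments:
    """One implementation: keeps balances in a dictionary."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self._balances = dict(balances)

    def charge(self, account: str, cents: int) -> bool:
        if self._balances.get(account, 0) < cents:
            return False
        self._balances[account] -= cents
        return True


class AlwaysApprove:
    """A second, unrelated implementation behind the same interface."""

    def charge(self, account: str, cents: int) -> bool:
        return True


def checkout(payments: PaymentService, account: str, cents: int) -> str:
    # The caller is assembled against the interface, not an implementation.
    return "paid" if payments.charge(account, cents) else "declined"
```

Note what the sketch does not show: it says nothing about whether the two implementations agree on semantics such as overdrafts or retries, which is arguably where many of the unfulfilled promises of reuse come from.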

Now if we look at patterns, and how Camel is representing them in software, we see a different type of abstraction being used – basically a variation on the theme of domain specific languages. The domain in this case being integration, and the realization of integration patterns in particular.

One of the challenges of DSLs is integration in fact – that is, how do you join together programs written using different DSLs into a system or application? It sounds like a crazy idea, but what if we were to use integration techniques, such as patterns, themselves implemented using DSLs, to join together programs written using other DSLs?
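As a rough sketch of what “patterns expressed in a DSL” can look like, here is a toy fluent route builder in Python. It is not Camel’s API. The `Route` class and its `filter`, `transform`, `to`, and `send` methods are hypothetical names, loosely modeled on the Message Filter and Message Translator integration patterns.

```python
from typing import Any, Callable, List

_DROPPED = object()  # sentinel marking a message removed by a filter


class Route:
    """A tiny fluent DSL for chaining message-processing steps."""

    def __init__(self) -> None:
        self._steps: List[Callable[[Any], Any]] = []

    def filter(self, predicate: Callable[[Any], bool]) -> "Route":
        # Message Filter: drop messages that fail the predicate.
        self._steps.append(lambda msg: msg if predicate(msg) else _DROPPED)
        return self

    def transform(self, fn: Callable[[Any], Any]) -> "Route":
        # Message Translator: replace the message with a converted one.
        self._steps.append(fn)
        return self

    def to(self, endpoint: Callable[[Any], Any]) -> "Route":
        # Deliver to an endpoint, then pass the message along unchanged.
        def deliver(msg: Any) -> Any:
            endpoint(msg)
            return msg

        self._steps.append(deliver)
        return self

    def send(self, msg: Any) -> bool:
        """Run one message through the route; False means it was filtered out."""
        for step in self._steps:
            msg = step(msg)
            if msg is _DROPPED:
                return False
        return True


if __name__ == "__main__":
    orders: list = []
    route = (Route()
             .filter(lambda m: m["qty"] > 0)
             .transform(lambda m: {**m, "total": m["qty"] * m["price"]})
             .to(orders.append))
    route.send({"qty": 2, "price": 5})
    route.send({"qty": 0, "price": 5})
    print(orders)
```

The endpoints here are plain callables, so a program written in some other DSL only has to expose a function to be joined into the route. Whether that scales beyond a toy is exactly the open question.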

Would we have the abstractions right? I.e. in the language instead of in pictures or interfaces? And would we be able to assemble programs together quickly and easily? Maybe we need some patterns specifically for application assembly?

Is Stonebraker Right? Time to Reinvent the Database?

At HPTS this past October, Michael Stonebraker delivered a presentation called It’s Time for a Complete Rewrite.

The main point seems to be that the general purpose relational database management system has outlived its usefulness after 30-40 years, and the market needs specialized database management systems to meet current application requirements.

A lot of these topics were covered in an interview published in the Sept/Oct issue of ACM Queue. However, Mike stops short of describing some of his new proposals for these specialist databases. Last Monday a lot of this was discussed at the New England Database Day session at MIT, where Michael now teaches.

It looked to me as if about 100 people showed up, and I believe they said a majority were from industry. The presentations were very interesting. A good summary can be found here.

A highlight was certainly Dave DeWitt’s presentation on Clustera. Despite the fact I’ve been taking an interest in what Google and others are doing around the new “scale out” architectures, I had missed Dave’s blog on why Map Reduce isn’t so great. He included some of the points in his presentation, but to me it was more of a defense of the current RDBMS than a realistic criticism of Map Reduce in its own context. I am oversimplifying, I’m sure, but a lot of it sounded like “you can do this with a relational database if you wanted to, so why are you bothering with something new?”

Personally I think this kind of misses the main point, which is to consider what advantages can be gained by doing more in memory and disconnecting the whole persistence thing. Another way to put it is that the industry has been focused for years on the fastest way to persist data in order to ensure the best reliability and consistency possible, and to do as much automatically as possible and avoid human intervention. That is the kind of stuff I had to do before transactions were widely available in database systems, i.e. go in by hand and find any partial results that occurred due to machine failure and back them out.

But if we were to break that assumption, and say that manual intervention might be ok in some cases, and everything does not have to be done automatically, we could gain some advantages in performance and overall capabilities to handle large amounts of data more efficiently.

I definitely agree it’s time for some new thinking in the database and transaction processing world. The specialized database ideas have a lot of merit – column-oriented databases to improve data warehousing and decision support, streaming databases for event processing, embedded databases for higher performance and better flexibility, and in memory databases for improved update latency – but the most interesting point for me is changing some of the key underlying assumptions.

For example, using memory based systems and thinking about persistence as a second level operation, disconnected or asynchronous from the memory update. Assuming failure and designing systems that fail often to take advantage of lower priced hardware. After all, even the biggest and most expensive systems fail – including software, of course. So why not assume it? And then there is the idea of a central controlling authority, such as a central database system of record, root transaction manager, or common system administration domain. The Web based designs are clearly showing the need to redesign and rearchitect not only our system software, such as databases, TP monitors, and application servers, but also our enterprise applications.
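The idea of persistence as a second level operation can be sketched as a write-behind store: the memory update returns immediately and a background thread persists later. This is a minimal illustration, not a description of any product mentioned at the session. `WriteBehindStore`, its `persist` callback, and the `failed` list are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, List, Tuple


class WriteBehindStore:
    """In-memory key/value store that persists asynchronously."""

    def __init__(self, persist: Callable[[str, Any], None]) -> None:
        self._data: dict = {}
        self._lock = threading.Lock()
        self._pending: "queue.Queue[Tuple[str, Any]]" = queue.Queue()
        self._persist = persist
        # Writes that could not be persisted, kept for manual follow-up.
        self.failed: List[Tuple[str, Any]] = []
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = value          # the memory update completes first
        self._pending.put((key, value))      # persistence is queued, not awaited

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data.get(key)

    def flush(self) -> None:
        """Block until everything queued so far has been attempted."""
        self._pending.join()

    def _drain(self) -> None:
        while True:
            key, value = self._pending.get()
            try:
                self._persist(key, value)
            except Exception:
                # Assume failure happens: record it rather than block writers.
                self.failed.append((key, value))
            finally:
                self._pending.task_done()
```

The trade-off is visible in the `failed` list. A crash between `put` and the background write loses data that callers already believe is stored, so someone, or some later process, has to reconcile it, which is the kind of manual intervention the relaxed assumption accepts.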

Have We Got it all Backwards with Software Assembly?

I am as guilty of this as anyone else. Back in the 90s I was on a big project to standardize enterprise software. We wrote a few papers about it, and a chapter in a book. We often used the “Henry Ford” analogy, which relates to the impact standards for interchangeable parts had on hard goods manufacturing.

The Henry Ford analogy says that the hard job in mass assembly is getting the interchangeable parts standardized – thereafter creating the moving assembly line is the easy job. Ford pulled it off with the significant market success of the Model-T and changed the world.

In the original story (which the link directly above summarizes), the crucial quote for us was: “The key to mass production wasn’t the continuously moving assembly line, as many people believe, but rather the complete and consistent interchangeability of parts and the simplicity of attaching them to each other.”

But of course in the updated book, Toyota further changed the world from craft to mass production (i.e. Ford’s achievement) to lean production. In software however we are still struggling to achieve mass production, never mind lean production.

The application of the Ford analogy to software is that if you can standardize application programming APIs and communications protocols, you can meet requirements for application portability and interoperability. If all applications could work on any operating system, and easily share data with all other applications, IT costs (which are still mainly labor) would be significantly reduced and mass production achievable.

The industry has seen many efforts in this direction fail, or only partially succeed. Today’s environment is better than the early 90s, but we still have incompatibilities across various editions of Java, enough differences among J2EE application server implementations to inhibit easy migration among them, and of course a significant level of incompatibility between enterprise Java, Microsoft .NET, and IBM mainframe environments. Applications that want to leverage the best of breed across these environments typically have to do a lot of craft, i.e. hand coding.

Seven years ago I remember thinking Web services and XML might finally solve the problem, but perhaps because of the way the specifications were implemented (basically adapting to existing technologies) in the end only a partial solution was achieved. Yes, interoperability is improved compared to what it had been, but it still requires too much hand coding.

Even though I’ve been working towards the “Henry Ford” analogy for more than a decade, recent exposure to inversion of control concepts (e.g. Spring and Guice) and OSGi makes me think the mass production analogy is backwards for software after all.

The Ford analogy has played out in software typically by positioning the assembly problem as the easy part of the job and creating reusable services for assembly as the hard part of the job. I can’t tell you how many times I’ve heard business process modeling and orchestration tools pitched at “business analysts” only to discover the proper use of the tool requires someone who can actually code.

The easy part should be developing the reusable services. The hard part should be their composition and assembly.

Corporations around the world are squeezing IT budgets, which means looking to reduce labor costs. Many are turning to outsourcing to China and India, and others are looking to hire college graduates in place of highly paid (and more highly skilled) coders.

But almost by definition the Ford analogy can’t work. You cannot really get lower skilled, untrained developers to tackle sophisticated problems such as component reuse. They can create simple objects incorporating business logic, and to use one description, the plainer the old Java object (POJO) the better.

What we need are not simple tools for business analysts to compose services into flows. We need sophisticated tools for architects and designers to import POJOs and plain old anything else, check them for conformance to the plan, and fix them up for deployment. What’s the right analogy here? Farming?
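As a rough illustration of the kind of assembly tool described here, the sketch below registers plain objects against a plan, checks each one for the methods the plan requires, and wires constructor arguments by name. It is not Spring, Guice, or OSGi. The `Assembler` class and the `TaxRates` and `Invoice` examples are hypothetical, and matching dependencies by parameter name is an assumption made to keep the example short.

```python
import inspect
from typing import Any, Dict, Iterable


class Assembler:
    """Checks plain classes against a plan and wires them by constructor injection."""

    def __init__(self) -> None:
        self._plan: Dict[str, type] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, cls: type, must_provide: Iterable[str] = ()) -> None:
        # Conformance check: the class must offer every method the plan requires.
        missing = [m for m in must_provide if not callable(getattr(cls, m, None))]
        if missing:
            raise TypeError(f"{cls.__name__} does not conform: missing {missing}")
        self._plan[name] = cls

    def get(self, name: str) -> Any:
        # Build each component once, resolving constructor parameters by name.
        if name not in self._instances:
            cls = self._plan[name]
            params = inspect.signature(cls).parameters
            deps = {p: self.get(p) for p in params}
            self._instances[name] = cls(**deps)
        return self._instances[name]


class TaxRates:
    """A plain object with no knowledge of the assembler."""

    def __init__(self) -> None:
        self._percent = {"east": 5}

    def percent_for(self, region: str) -> int:
        return self._percent.get(region, 0)


class Invoice:
    """Another plain object; its dependency arrives through the constructor."""

    def __init__(self, rates: TaxRates) -> None:
        self.rates = rates

    def total(self, net_cents: int, region: str) -> int:
        return net_cents + net_cents * self.rates.percent_for(region) // 100


if __name__ == "__main__":
    assembler = Assembler()
    assembler.register("rates", TaxRates, must_provide=["percent_for"])
    assembler.register("invoice", Invoice, must_provide=["total"])
    print(assembler.get("invoice").total(1000, "east"))
```

The business classes stay simple while the checking and wiring logic lives in one place, owned by whoever maintains the plan. That division is the inversion of the usual pitch.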

The Second Wave

The IT industry is in the middle of a significant transition. At least one article has already labeled it IT 2.0. I give Dr. Elizabeth Joyce (a former IONAian BTW if I’m not mistaken) some credit for noticing the change in IT, but the industry grokked the change in emphasis from data to information about 20 years ago. That’s why it is now called “Information Technology” instead of “Data Processing.”

It is also true, as she points out, that IT is becoming more business focused. But this is a symptom of change, not the cause. The cause of all the changes is the result of the first wave of adoption drawing to a close and the second wave starting up.

The first wave was about adopting computers in business – automating previously manual operations. This activity was based on a strong ROI that justified a heavy initial investment in hardware, software, and application development.

The second wave is about improving upon what was created during the initial adoption phase, and is based on a very different ROI. Automating a previously manual business function is a very different activity than improving upon that automation.

Let’s look at some of the clear signs of change in the industry:

  • SOA adoption
  • Vendor consolidation
  • Open source commoditization
  • Offshoring and outsourcing

SOA Adoption

Depending on who you ask, SOA has been around for 10-15 years. For sure IONA has customers who have been actively developing SOA-based applications for 4, 5, and even 8 years. So we know it has been around for a while, but it has been on the periphery until now.
Why now?

Gartner says the term first appeared in a 1996 research paper by Yefim Natis. I also know that the Credit Suisse SOA has been in production since 1998. So it’s clear SOA has been around for a while, but why is it so popular now, as opposed to 10 or more years ago when it was introduced? Don’t IT trends typically follow the reverse pattern?

I believe the answer is that the industry has finally reached the point of maturity at which it makes sense to adopt it. Instead of looking forward to the next function to automate, enterprises are looking back at the heterogeneous spaghetti-like mess resulting from the heads-down rush to computerize. It’s only now that the end is in sight that we can start to take a look back and assess what was done – and try to find a way to improve it. I.e. adopt SOA and reuse it.

Consolidation

Larry Ellison gave his take on this to Business Week in a 2003 interview. I think he was right about some things, but obviously wrong that software innovation would primarily come from large companies and that the startup wave is over.

But it is interesting that other industries have also gone through an initial adoption wave and then a consolidation stage, which makes consolidation a clear sign the first wave is drawing to a close. No doubt many of the transition changes are painful in starting the second wave. But unlike Larry Ellison, I believe the established enterprise software companies are not immune from that pain. In fact they may bear the brunt of it in the end. They are not changing their approach – they are stuck promoting their old products, and in their old business models. Consolidation is at least in part a defensive move. Industry disruption does not come from the established players.

Commoditization

Another clear sign of change is in the commoditization happening in open source software. We don’t need more new features and functions in our operating systems, database management systems, programming languages, and middleware. We need cheaper and better implementations of them. This is a clear sign we have reached a critical mass of core features and functions, and the industry does not need a major new feature or function in enterprise software any more than an office worker needs a new feature in Excel or PowerPoint.

To continue the office automation analogy, at one time new features were definitely needed in Word so that companies could justify its use in place of special purpose word processing machines, but no longer. That war is over.

Similarly, the open source phenomenon is all about commoditizing well known and well established software features and functions, and increasing their overall value. During the initial adoption phase with a strong ROI over replacing manual processes, it made sense for vendors to compete based on their IP – someone was, after all, going to come up with a better way to implement a transaction. Now that we are done with the first wave we can focus on how to produce software that accomplishes the same things more cheaply and efficiently.

Outsourcing

Here the correlation between the first and second waves is pretty clear. During the initial adoption phase the cost of labor, still the largest cost of IT by far, was not a significant issue since we were replacing manual processes with computers. As we start on the second wave, reducing labor costs will continue to be important. Companies are struggling to reconcile the costs of IT with their business strategies – it’s very hard for many companies to understand exactly how their IT spend influences the bottom line.

A better way of doing IT, such as a division of labor through the adoption of service interfaces (for example), and the trend toward configuration based development (e.g. Spring and OSGi) are both clear responses to the need to do more with less – to increase the proportion of the contribution from lower cost IT labor, in other words.

Summary

Remember when all we cared about was finishing one application so we could move on to the next? Should we automate order entry next? Inventory management? Manufacturing planning? How about removing those old office automation machines and replacing them with PCs? We are so done with all that.

Now it’s all about how to improve what we already have, since what we already have is pretty much enough. I mean we have automated pretty much all of the previously manual operations that can be automated. Ok, there are still a few left, but what’s left is of marginal impact.

The major point is the difference between a foundational investment and an incremental investment – the difference between the first and second waves.

A foundational investment establishes the industry – in this case creates the initial applications – based on the initial wave of adoption of computers. The second wave of adoption, in which the balance of the investment shifts to improvement, is well underway, with clear evidence in the major changes now taking place.

Consequently new and different approaches to IT are emerging – such as SOA – to focus on improving what’s already there. And a new kind of enterprise software is also emerging, one that is better aligned with the changes of the second wave – a lightweight, configurable container – that gives the right set of features and functions for the right price point.