Friday, March 12, 2010

Outside the box ...

A good friend of mine who runs a construction business received a set of photos from contractors who work with him. The photos, actually taken at the work sites, depict extraordinary examples of poor decision-making. Imagine nailing wood to a perimeter fence with nails 3” too long, so that rows of sharp-pointed nails protruded from the supporting rails surrounding a play-area for children, or mounting a light fitting, with a regular light bulb, in a bathroom and placing it six inches above a functioning shower-head! But perhaps the best picture was from someone upset over regular power outages from circuit-breakers that kept tripping. Pulling all the wires out from behind each breaker and bypassing the fuses seems to have done the trick – and the photo above is of the newly re-configured fuse box.

There’s definitely a strong case to be made for those who take the initiative and can do things themselves. I am in awe of neighbors who can fix leaking faucets and can wire a house for speakers. I also look favorably on anyone who can get a lawnmower (or snow blower) working after months of neglect, and I hold in high regard anyone who can fix a water heater or furnace. At the time these skills were being handed out, I must have been elsewhere and I have had to live a poorer life as a result. I am now getting into it, very slowly I have to admit, and I am enjoying every opportunity of late to roll a jack under the car and change all four wheels.

The bypassed fuse box, however, reminds me of how frustrating it can be when services fail and we can no longer do what we had planned on doing. This week, Californians once again had to live through a Department of Motor Vehicles (DMV) computer crash – the second such crash this year. Newspapers and television stations quickly picked up on the developing story as long lines began to form outside many DMV branches. While there is no immediate news about what caused the current problem, the incident earlier this year was reported to have been caused by a major router failure.

I do not know what servers are installed at California DMV, and I am not certain about the network equipment deployed, but all I can visualize is someone inside the data center, agitated by a troublesome router, pulling out all the lines and twisting them together. See, it works! Just as the fuse box above most likely succumbed to the heat and caught fire, I can imagine a whole bundle of communications lines just shorting out! Probably didn’t happen, but the results had to be the same – no power, again, in the home, and long lines outside the DMV. Availability, or in this case, the lack of availability, remains as big an issue as it ever has and it’s not going to be fixed by someone who likes to do things by himself!

I have just finished writing a column for the March, ’10 issue of the eNewsletter, Tandemworld. It’s been six months since I began working for myself to develop white papers and articles for other companies. As I looked at the value that comes with active participation in social networks, and the head-start this gave me as I launched my company, I thanked the companies that really helped me out. While I sounded a bit like a NASCAR driver after winning a big race, as I thanked all my sponsors, I am encouraged by the work that continues to come my way from the NonStop community. I am starting to diversify but even as I work with companies on platforms other than NonStop, I often see the potential for more business if only the product ran on NonStop!

The value proposition that comes from running solutions on NonStop continues to center on availability. It is a byproduct of the architecture supporting continuous availability that brings with it the levels of scalability we see today. There are many within our community who would like to see more emphasis given to the security attributes of NonStop and for sure, they make a strong case; but it is the superior levels of availability that separate NonStop from all other computer architectures. In working with NonStop vendors, there’s never any backing away from, or discounting the value of, the continuous availability inherent in running NonStop!

I frequently engage with other commentators on the merits of the NonStop architecture, starting out as it did in support of transaction processing where the typical user was the company’s customer. Nothing flusters any customer more than not being able to complete a transaction they had waited some time to perform. No question about it, as the TV crews began interviewing the public, lined up as they were outside the DMV offices this week, it was the frustration from not being able to complete the simple transaction of renewing a license, registering a car, obtaining titles, and so forth that was so easily recognized.

Of late, IBM is pursuing a strategy of downplaying the importance of availability. When HP began talking about the number of “nines” that could be reached when using NonStop, one IBM executive dismissed such categorization out of hand with the remark “leave the number nine to the Beatles,” a reference to their Revolution 9 on the White Album. More recently, analysts working closely with IBM have acknowledged that “7 nines” is achievable (3.15 seconds of downtime per year) using NonStop, and that no System z configuration can reach this level of continuous availability, only to suggest that commercial IT systems rarely achieve 7 nines, or need it.

However, many users of NonStop will readily testify to reaching these levels – and are doing it routinely. Just look at recent winners of NED’s availability awards, handed out at the yearly show! For them, the value of having such servers underpinning critical business applications is that it allows them to differentiate their “products” from those of competitors with less reliable deployments. It never ceases to amaze me how so many users think that by doubling up on hardware and duplicating processes and designing abstraction layers that hide where the work is actually done, improvements can be made to just how many 9s of availability can be achieved.
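For anyone who wants to check the figure quoted for 7 nines, here is a minimal Python sketch of the arithmetic. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N; the function name is my own, purely for illustration.

```python
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of 'nines' nines.

    Example: nines=3 means 99.9% availability, i.e. 0.1% unavailability.
    """
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    # Print the downtime budget for 3 through 7 nines.
    for n in range(3, 8):
        print(f"{n} nines: {downtime_seconds_per_year(n):,.2f} seconds per year")
```

At 7 nines the budget works out to roughly 3.15 seconds a year, which matches the number the analysts quoted.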

Many years ago, an engineering friend of mine reminded me that a Mean Time Between Failures (MTBF) of five years per unit helps little should you have 60 units. In a modern house these days, when you add up all the kitchen appliances, audio / video equipment and their remote controllers, heaters and air conditioning units, garage door openers and basic security system components, it’s pretty easy to accumulate 60 units. With a five-year MTBF, I should expect something to fail every month – and I should be planning accordingly.
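My friend's point can be sketched in a few lines of Python. This is a rough back-of-the-envelope model, assuming every unit fails independently and at a constant average rate of one failure per MTBF; the function name is hypothetical.

```python
def expected_failures_per_month(units: int, mtbf_years: float) -> float:
    """Expected number of failures per month across a set of identical units.

    Assumption: each unit fails independently at a constant average rate of
    one failure per MTBF, so the rates of all the units simply add up.
    """
    mtbf_months = mtbf_years * 12
    return units / mtbf_months


if __name__ == "__main__":
    # 60 household units, each with a five-year MTBF.
    rate = expected_failures_per_month(units=60, mtbf_years=5)
    print(f"Expected failures per month: {rate}")
```

Five years is 60 months, so 60 units average out to one failure a month, which is exactly why I should be planning accordingly.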

I only raise this as, on so many occasions, I hear perfectly reasonable CIOs talk of how adding components to their IT environment is improving the overall availability when really, it only improves when you strip things away! The beauty of NonStop is in its simplicity. For decades, NonStop engineers have catalogued all the problems that bedevil computer systems and the product we see today reflects the recognition of all that had to be considered. The simple reason why NonStop has little competition and why it alone sits atop the table of 9s is that the bar has been set so high, and the entry price for new participants is so prohibitively expensive, that no one is prepared to take the risk.
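To put a number on the "strip things away" argument, here is a small Python sketch. It assumes the simplest case only: every component sits in the critical path (a series arrangement), components fail independently, and the service is up only when all of them are up. The 99.9% figure per component is an illustrative value of mine, not a measurement from any real system.

```python
import math
from typing import Sequence


def series_availability(component_availabilities: Sequence[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumption: components fail independently, so the overall availability
    is the product of the individual availabilities.
    """
    return math.prod(component_availabilities)


if __name__ == "__main__":
    # Illustrative only: each component is assumed to be 99.9% available.
    for count in (1, 5, 10, 20):
        overall = series_availability([0.999] * count)
        print(f"{count:2d} components in the path: {overall:.5f}")
```

Under these assumptions every extra component in the path drags the overall number down, and removing one pulls it back up.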

In talking with one of my clients who has just ported his solution to NonStop, one of the reasons it was done was to reduce the costs – and this took me by surprise. But in reality, this too is a byproduct of a good architecture. “When it comes to the IBM mainframe, the solutions IBM provides to improve availability are extremely expensive and require substantial investments in fostering and retaining in-house expertise. Implementing comparative levels of availability to that of NonStop Server requires a Parallel Sysplex configuration with additional mainframe systems,” I was advised.

“Achieving uptime comparable to NonStop is just as problematic for users of Unix systems as well. In order to support the highly-available system financial institutions demand, the costs not only include the UNIX hardware and related operating system, but also that of data base, clustering software, and transaction processing middleware, and the related annual product support fees over a 5 year period can amount to as much as the cost of the acquisition of the payment solution,” my client then explained.

It is observations like this from vendors investing in ports to NonStop that I find so encouraging. There will never be a competitive architecture matching the attributes of NonStop. In today’s intensely competitive marketplace, I am sure HP recognizes the product they now have and the satisfaction so many NonStop users have in never having to face the press or be captured on TV following the unavailability of a service.

The contractor who found the fuse box, shown above, must have been really alarmed with what he saw, but I can sympathize with the client. Up to a point! In this instance, the results were pretty scary – who can say for sure how long it would have been before the surrounding structure caught fire. I have to admit though – he was definitely thinking outside the box! Just like this fuse box, however, thinking outside the box will prove of little value when it comes to availability, that’s for sure!

1 comment:

Alan said...

More recently, analysts working closely with IBM have acknowledged that “7 nines” is achievable (3.15 seconds of downtime per year) using NonStop, and that no System z configuration can reach this level of continuous availability, only to suggest that commercial IT systems rarely achieve 7 nines, or need it.

To make a statement like that IBM clearly does not understand the business.
If a major stock exchange system, and remember they do not run 24x7, fails during trading hours, it can have a major impact on the market and the country's economy.