Wednesday, June 2, 2010

What’s in your garage?

Over coffee with neighbors the other morning, the discussion turned to innovation and to how ideas move from sketches on napkins to viable products. Our neighbor, Kevin, has seen several of his ideas turned into products, and when he invited some of us to come and take a look at what he was currently developing, I jumped at the opportunity. Sure enough, inside his garage was a laboratory, with prototypes and testing equipment of all types spread out along a number of benches. The picture above is of us gathered in his garage, sections of the roll-up garage door clearly visible.

In case this looks unimpressive, what he was developing, and had already taken to the prototype stage, proved to be pretty sophisticated. Kevin was working on a way to measure potential impurities that could find their way into the fuel onboard satellites, where, after reaching orbit, there’s very little opportunity to perform any maintenance. By shining light through the propellant as it passes through his measuring device, and measuring disruptions in that light, microscopic particles can be detected. Kevin has developed a reputation within military circles with previous inventions, and this latest device looks likely to be snapped up as well.

Why are such space-age creations in the hands of “garage inventors” like Kevin? And why would government agencies within the United States expect such inventions to come from suburban garages? Time and time again, when something innovative is called for, it’s the cadre of small inventors who routinely sort through the technological possibilities, often far removed from mainstream consideration, and yet come up with affordable products. It’s no surprise, then, that so much of the technology we have come to rely upon has had its origins in small garages along the Californian coastline.

It’s hard for any of us to ignore the history of Apple, with the fabrication of their first PC in a Silicon Valley garage, just as it’s hard to ignore perhaps the most famous garage of all, the one used by the founders of HP that is now listed on the National Register of Historic Places. Even though I knew of these very famous garages, it wasn’t until I stepped into Kevin’s garage and listened to his passionate description of the research he was doing that I had a real sense of how literal garage-research and garage-prototyping could be, and of how much could be engineered from basic items you could buy at your local store. It’s hard to imagine that the device to be used to measure particles in satellite fuel started out with an outdoor low-wattage light bulb, a couple of discs made from aluminum foil, and some round mirrors from a beauty salon!

Recently I had an email exchange with Jimmy Treybig, founder of Tandem Computers back in the mid ‘70s. Those who have seen the recent comments posted to the discussion “Scale? Not a fishy subject ...” in the Real Time View group on LinkedIn, a complementary social channel to this blog, will have seen that I was revisiting the “Tandem Fundamentals”. This discussion started following the remark that “a few days ago, a comment posted elsewhere by Nigel Baker has had me thinking - scalability, the oft-forgotten attribute of NonStop.” After all, with the emphasis on availability, what about scalability? And, just as importantly, with all the discussions about virtualization and cloud computing, is scalability becoming even more important than availability? After thirty-five years, should we rethink the attributes that first surfaced when Tandem Computers was little more than sketches on beer coasters?

In the email exchange with Jimmy, he explained that “scalability is the same as on-line repair which was there in the beginning (with Tandem). If some part fails, you must be able to repair it on line and then the system must expand while it is running to reincorporate the failed part.” In other words, in providing a truly fault tolerant computer, where a failed part (including a complete processor) could be taken offline, worked upon, and returned to service without disrupting the application that was running, scalability played an integral part in ensuring Tandem was fault tolerant!

In the article I posted on April 29th, 2010 “Adding tow hooks?” that covered the news release on HP’s mission-critical “converged infrastructure”, where HP had made the decision to beef-up the redundancy and resiliency of the cross-bar fabric, I suggested that “NonStop users will recognize that this is exactly what ServerNet provides today. However, improving redundancy and reliability doesn’t create a fault tolerant system!” In that post I went on to add that “the difference between redundancy and resilience, to the fault tolerance NonStop provides, is similar to comparing a tow hook to the electronic aids of a modern car.”

From my earliest times at Tandem Computers, I have known that at the core of the Tandem Fundamentals there had always been Fault Tolerance, Scalability, and Data Integrity. Applications developers were quick to exploit these capabilities despite the lack of tools and infrastructure. Dr Michael Rossbach told me of how he “was approached in 1978 by a friend … (as) at the time, there was a lack of skills about Tandem among software vendors – they were all looking for resources to be trained in Guardian / TAL; there was no Pathway at that time … so I started training in the early spring of 1979 and started my own business in July 1979!” Dr Rossbach wasn’t alone and over the next three decades, solutions leveraging the availability attributes of Tandem appeared from every part of the planet!

However, while very few within the industry question NonStop’s availability properties, even as competitors continue to hype how they are bridging the gap between their server offerings and the HP NonStop server, is it also time to look more closely at what really separates the NonStop server from all other server offerings, particularly as it plays such a significant role in the support of today’s mission-critical applications? Perhaps it was time to check in with Martin Fink, Senior VP and General Manager, HP Business Critical Systems, and get his take on the key attributes of NonStop today!

In his response, and somewhat of a surprise, Martin was quick to list scalability first, stating “there are two general types of scale: Scale-up and Scale-out. Nonstop excels at Scale-out. Why is that important? Because when customers (like banks) need to deploy tens of thousands of ATM machines, they need to know that ATM machine #1 and #50,000 will perform the same way and deliver the same customer experience. That’s what NonStop does. Extreme scale, with consistent performance across the scale spectrum. Nothing else can do it as well as NonStop.”
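The core idea behind scale-out, as Martin describes it, can be sketched in a few lines of ordinary Python. This is not NonStop code, just a conceptual illustration under my own assumptions: work is deterministically partitioned across independent nodes (here by hashing an ATM id), so adding nodes adds capacity while every request sees the same per-node workload, which is where the consistent performance comes from. The names `Node` and `route` are purely illustrative.

```python
# Conceptual sketch of scale-out partitioning - not actual NonStop code.
# Requests are spread across independent nodes by hashing a key, so ATM #1
# and ATM #50,000 are handled the same way, just possibly by different nodes.

import hashlib

class Node:
    def __init__(self, name):
        self.name = name
        self.requests = 0  # how many requests this node has served

    def handle(self, atm_id):
        self.requests += 1
        return f"{self.name} served ATM {atm_id}"

def route(atm_id, nodes):
    """Deterministically map an ATM to a node by hashing its id."""
    digest = hashlib.sha256(str(atm_id).encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = [Node(f"node-{i}") for i in range(4)]
for atm_id in range(50_000):
    route(atm_id, nodes).handle(atm_id)

# With a uniform hash, the load spreads evenly - roughly 12,500 per node.
print(sorted(n.requests for n in nodes))
```

The point of the sketch is that no single node's workload grows with the total fleet size; capacity and consistency come from the partitioning, not from any one machine getting faster.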

Martin then added real-time performance as an attribute, pointing out to me “when a cellular operator needs to decide in less than a second that a subscriber is authorized to make a call, NonStop delivers that. But, the real point is that NonStop does it in real-time when millions of subscribers are trying to connect calls all at the same time. I don’t know of anything else out there that can deliver that kind of real-time results on the scale of millions of transactions the way NonStop does.”

Having read the blog post already referenced here, Martin agreed with me, adding “as you point out in your article, there’s more to fault-tolerance than redundancy. While most systems out there (including Unix, Linux, Windows) operate under the concept of ‘Fail-Over’, Nonstop combines a shared-nothing hardware infrastructure with a software ‘Take-Over’. The Nonstop take-over system operates at the process level and is near instantaneous. The point here is that not only does NonStop deliver extreme resiliency, it does it in a transparent way, and with the simplest of configurations.”
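The “Take-Over” idea Martin contrasts with fail-over can also be sketched conceptually. Again, this is not NonStop or NSK code, just a minimal Python illustration of the process-pair pattern as I understand it: a primary checkpoints its state to a backup after each unit of work, so when the primary fails the backup resumes from the last checkpoint rather than restarting cold. The class and method names are my own invention for the example.

```python
# Conceptual sketch of process-pair take-over - not actual NonStop code.
# The primary checkpoints its state to a backup after every transaction;
# on primary failure the backup takes over from the last checkpoint,
# instead of a cluster "failing over" and restarting the application cold.

class ProcessPair:
    def __init__(self):
        self.primary_state = {"balance": 0}
        self.backup_state = {"balance": 0}   # backup's checkpointed copy
        self.primary_alive = True

    def apply(self, amount):
        """Apply a transaction on the primary, then checkpoint to the backup."""
        if not self.primary_alive:
            raise RuntimeError("backup has already taken over")
        self.primary_state["balance"] += amount
        self.backup_state = dict(self.primary_state)  # checkpoint to backup

    def fail_primary(self):
        """Simulate a primary failure: the backup takes over near-instantly,
        resuming from the last checkpoint with no committed work lost."""
        self.primary_alive = False
        return self.backup_state

pair = ProcessPair()
pair.apply(100)
pair.apply(-30)
survivor = pair.fail_primary()
print(survivor["balance"])  # → 70: the backup holds the last committed state
```

The transparency Martin mentions comes from the take-over happening at the process level: the surviving half of the pair already holds current state, so nothing has to be rebuilt before service continues.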

Finally, rounding out the list of key attributes, Martin didn’t miss the chance to talk about open standards, and finished with “that was the point of bringing NonStop to the blades world. NonStop now uses standard blades (the same ones used in the Integrity portfolio). Where others develop fault-tolerant systems thinking proprietary from the ground up, we think about standards from the outset and focus our innovation on things that really matter to customers. Things like NSK take-over, extreme scale-out, shared nothing, etc.”

As Kevin guided us around his garage laboratory, new projects were already starting. Kevin’s enthusiasm never missed a beat and was certainly contagious. In my exchange with Jimmy I asked him whether Tandem Computers had its origins in a garage as well. Unfortunately, when it came to a full-fledged computer system such as a Tandem, starting in a garage was not an option. As Jimmy explained, “a garage start-up was not possible (as it) took too much money ($3 million), and there was not a product that could generate revenue before the total was finished.” Tandem Computers gave us the Friday beer-bust, First Friday reviews, and the TOPS club, but no, there wasn’t a garage.

And yet, I have to believe there were many people, like Kevin, every bit as enthusiastic about what they were building. That there is a readership today still interested in commenting on the attributes of NonStop and on the Tandem Fundamentals is testament to the material impact the technology continues to have on the way we support applications. Garage or not, the innovation that surfaced with Tandem, and that still intrigues so many of us in the industry, is as relevant today as it was thirty-five years ago!

6 comments:

Gerhard Schwartz said...

Good to see a summary of the NonStop fundamentals - but it might be worthwhile to remind ourselves why these are important ...

Unlike 35 years ago when NonStop was born, we now live in an entirely different era - we do live in the Internet age.

And it turns out that in the Internet age, those fundamentals - continuous availability around the clock and around the calendar, linear scalability far beyond any perceivable limit, and rock-solid data integrity no matter what happens - are key requirements for providing dependable and successful services via the Internet. Those NonStop fundamentals are now more important than ever before!

And in order to deliver such great services at a reasonable price, we need self-managing and self-healing systems that don't require armies of system administrators and other IT specialists to keep them running.

So much for the nice theory, but do we get that kind of service via the Internet today? Nope - we all know the various deficiencies that come with those complex server farms running Internet-based services today. Availability is less than ideal, lack of scalability brings slow response times in high-load situations - and sometimes databases do get corrupted, resulting in all kinds of difficulties.

But by far the worst deficiency is the lack of Internet security. This causes vast spending (many billions of dollars per year) on all kinds of humble products and services that cannot solve the problem. In fact, the money spent in this area is badly missed when it comes to fixing the other problems mentioned before.

In the pre-PC and pre-Internet age, IT systems were very expensive, and so there was the common understanding that those high-priced systems had to work flawlessly. Users and their management would not accept poor service levels.

Today, those people all have their own experience with PCs. People have become more "fault tolerant": they figure that today's IT is essentially based on cheap PC technology and are much more willing to put up with poor service. And nobody really keeps track of the losses caused by poor service levels ...

Instead, those people are asking for further price reductions in the investment phase. For instance, they ask for server virtualization to squeeze more out of fewer hardware boxes. They fail to realize that this introduces even more complexity into their IT infrastructure, reducing reliability and raising system management costs.

But we can't blame those people for doing stupid things unless we tell them about better alternatives. We need to tell them that there are alternatives to that cheap PC technology - which may be great in many application areas, but not-so-great in some others, especially when delivering Internet-based services to millions of people.

We also need to make NonStop Internet-ready. In fact, the inherent security (like not requiring costly security patching and being immune against buffer overflow attacks) makes NonStop the ideal platform for Internet access. But we still need some functional improvements to play in that role.

And we need to complete the list of NonStop fundamentals. These are:

+ continuous availability
+ extreme data integrity
+ linear scalability

and

+ top notch platform security

In combination, these NonStop fundamentals deliver mainframe service levels at standard server cost. The higher initial price tag is quickly compensated for by the savings resulting from lower system management costs and avoided downtime costs.

Keith Dick said...

I generally agree with most of what Gerhard wrote. I especially agree about the point that people have become conditioned over the years that computers are, by nature, unreliable. They see the frequent failures of the computer systems they deal with at work, in their daily retail transactions, and at home. They think it has to be that way.

But there is one statement he made that I have to question. He said:

"In fact, the inherent security (like not requiring costly security patching and being immune against buffer overflow attacks) makes NonStop the ideal platform for Internet access."

Really? I have seen claims that there have been no breaches of security on NonStop systems. Maybe that is true so far. But, if true, I wonder whether that is more due to lack of effort by attackers than to the inherent security of the system.

The iTP WebServer product is a slightly modified Apache. I know Apache has some vulnerabilities that have been exploited from time to time. I would expect that at least some of those vulnerabilities apply to the iTP WebServer product as well.

I think the Java virtual machine used on NonStop is based on a commonly-available Java virtual machine (I don't recall which one). I would expect that that commonly-available JVM has some vulnerabilities, and that at least some of them would also be present in the NonStop Java virtual machine.

And even code implemented from scratch by Tandem, now HP, developers probably is not immune to design or implementation errors that would make the system vulnerable to some attacks.

I'm not saying the NonStop system is any worse than the others. But I don't believe we have enough evidence to say that it is any better. If I'm wrong and there is strong evidence that the NonStop system is less vulnerable than the others, then that would be a good selling point and it would be good to add that to the marketing efforts.

Gerhard Schwartz said...

Interestingly, some NonStop folks (who are of course working with very critical systems) also seem to be very critical by their own nature - and are always looking for arguments against that unique platform which is ultimately paying their checks ... (;-))

So when NonStop hasn't been hacked from the outside so far, this can only be because hackers are not interested ... Really? NonStop systems are controlling most of the world's cash dispensers - what would be a more convenient way for a little criminal hacker to steal money? And how about a top-notch criminal hacker - wouldn't he very much like to divert some millions from a high-value payments system to some far-away place?

The reality is that NonStop is indeed much more secure than those usual platforms; it is on par with the IBM mainframe. Ever heard of an IBM mainframe being hacked from the outside? No, and there are many more of those around than there are NonStop systems. So a much better level of IT security is indeed possible - why should it not be possible on NonStop as well?

But it is true that there are several factors contributing to that extremely high level of security.

It starts with the hardware - the commonly used X86 architecture has lots of weak points, and is actually to blame for many problems that Microsoft usually gets blamed for. Itanium is a different kind of beast.

The NonStop OS was built for security from day one, e.g. it won't allow those buffer overflow attacks often used to insert malware into mainstream systems.

X86 hackers have very easy access to related HW and SW. Not so for NonStop HW and SW - and if they had it, they would have a very hard time figuring out how it works. I'm aware that "security by obscurity" is an insult amongst IT security folks, but everybody knows it works very well ... (;-))

So I'd believe that apart from just counting incidents (very quickly finished on NonStop), there are also a number of other good reasons why NonStop is more secure - even if you run some Java apps on it.

And even if we dreamed that ten years from now we would have a million NonStop servers on this planet, and every once in a while one of them would indeed get hacked - wouldn't the world still be a better place for us? (;-))

Richard Buckle said...

I suspect that today there is definitely an element (of our society) well aware of the role NonStop plays and, unlike in the past, there's a solid sprinkling of PhDs among them. Create a prize and someone will have a go - I am a little pessimistic that one day the security of NonStop will be breached and we may never hear about it. Someone will siphon off something too big for a country to want to talk about ...

But this is not the place where I want to encourage such thinking. What I do think is that NonStop is often overlooked today for Internet “integration”, either as a front-end transaction processing engine or a back-end database engine. At HPTF last week I heard (for the first time) of the importance of NonStop for “continuity-critical” solutions. Maybe this term has a home elsewhere – if so, I haven’t heard it before. But I like it – it puts yardage between other cluster solutions and NonStop!

And if, as Gerhard suggests, “the reality is that NonStop is indeed much more secure than those usual platforms, it is on par with the IBM mainframe”, then this only adds to the compelling case for deploying NonStop where such continuity-critical solutions run! After all, continuity has something to do with no hacker causing a disruption, right?

Wil Marshman said...

It turns out that iTP WebServer is not Apache based. It originated from code that Tandem acquired from a third party, Open Market, and was significantly modified/enhanced to work efficiently on the NonStop architecture.

Regarding the security of NonStop systems, we are continuously working to improve it (we do not rest on our "laurels"). Could some expert hacker break in? Probably. But as Gerhard says, they would have to spend a lot of time/effort hunting for (obscure) vulnerabilities.

Keith Dick said...

Wil: Thanks for the correction. After looking over a couple of the manuals, I think my memory mixed up the fact that NSJSP is based on Tomcat from Apache with the origin of the web server. Definitely my mistake, and I'm glad you mentioned it.

As for the security of the NonStop systems, I think we have similar opinions about it. I agree it is very good, and I did not mean to imply otherwise. My point was that no system is safe against every attack, so it would be unwise to claim complete invulnerability. Someday, some attacker is bound to find a weakness, and the damage to the NonStop image from that incident will be smaller if the marketing messages prior to that point have been realistic, not claiming invulnerability.

And I'd think that if the marketing claimed invulnerability, that would increase the chances of a successful attack by leading customers and application developers not to be quite as careful about security as they otherwise would be.