Thursday, August 11, 2011

NonStop revels in Clouds!

Perhaps the biggest surprise for many at HP Discover was the demonstration of GuardianAngel – NonStop, capitalizing on cloud resources – and for Enterprise users, a whole new way to view cloud computing!

Business took me into Omaha last week, and then I headed off to Minneapolis. Rather than driving an American car, I drove an import on this trip and though this has little to do with the immediate story, when the return trip took us through Sturgis, South Dakota, during the annual motorcycle festival, there were moments when I had second thoughts about the wisdom of what I had done.

The picture above shows me standing in the center of the street surrounded by motorcycles stretched out as far as the eye could see. And the overwhelming choice of the rally participants was American-made bikes, and the older the better. Surprisingly, there was even an exhibit by the motorcycle vendor, Indian, who I thought had long since left the scene following a brief resurrection a few years back.

And of course, there’s no escaping the unique looks of the very modern Victory motorcycle, with its distinctive V-shaped tail lights! Looking at these bikes that are representative of what had been developed in the past was pretty cool, but still, there wasn’t anything I was anxious to own any time soon.

Picking up a copy of Road and Track last night I turned to the editorial only to see the headline stating “The shape of things to come” where editor, Matt DeLorenzo, quoted current Renault chief designer, Laurens Van der Acker, as having said “cars should be a symbol of progress!” The background for this story had been a car show at Lake Como’s Villa d’Este featuring cars of the past and yet, “when you look at the levels of performance available relative to what cars have historically cost,” DeLorenzo wrote, “we are living in a golden age.”

In the post of November 30th, 2010, “Nothing seems to last ...” I wrote of how readers “may have missed some commentary I provided in NonStop – A Running Commentary in the October issue of the eNewsletter, Tandemworld.Net and the slight variation I made on my earlier forecasts. Gone is the pursuit of a hypervisor capable of supporting NonStop, and the availability of hybrid clusters in a box … New are the observations of a NonStop server becoming a smart controller!” Could we see NonStop as a participant in a new golden age?

The fact that I continue to speculate about the future of NonStop is a clear sign that I truly believe there is a future for NonStop. Of late, I have a strong sense that the industry is turning ever so slightly and pursuing a course where the capabilities of NonStop will come to the fore. As Road and Track editor, DeLorenzo, wrapped up his column, he suggested (and here it’s easy to substitute NonStop servers for automobiles), “the key to making automobiles once again the symbol of progress is being able to make these new-era vehicles different from what has gone on before.”

I was shocked from the moment I first heard of the demonstration given at HP Discover by members of the Americas’ NonStop Solutions Engineering Group (ANSEG). NonStop was shown running a typical internet application written in Java – specifically, the Pet Store application – where load conditions could be triggered that led to a CloudBurst: that is, selected transactions being pushed out and away from the NonStop server and onto commodity-based Clouds (both private and public, e.g. Amazon).

How could this be? NonStop providing oversight of transactions to the point where even when they were no longer present on the NonStop server, they were still somehow connected. As processing returned to normal levels on the NonStop, the processing of these transactions returned to the NonStop.

Pulling back the layers of software involved and talking to the Team, I was to learn that this new capability had a lot to do with what was now available with Pathway, or TS/MP V2.4, to be more precise. As someone who has enjoyed a lengthy association with NonStop, I have known of process pairs, persistent processes, and Pathway, but I am the first to admit that I didn’t put it all together with quite the effect that some very clever folks within ANSEG did. As Justin Simonds, a member of this group, was to tell me later, “GuardianAngel was really just a combination of capabilities that leveraged an API that we developed, some standard open-source techniques, and, of course, Pathway.”

The new TS/MP V2.4 (Pathway) provides a Domain capability for load-balancing and distribution of workload in support of Pathway server processes across processors and server instances. With this added capability, Pathway can distribute instances of an application within a single processor or CPU, across multiple CPUs, and in particular, to any CPU within any node within a cluster. But the way it went about supporting this opened the door for yet one more capability, and with the introduction of the API that was developed in support of the demo, instances of the application could be invoked on platforms other than NonStop.

The GuardianAngel API was crucial to the CloudBurst demonstration. A small, lightweight Pathway “Gateway server”, where “half” the GuardianAngel API resides, pushed the selected transactions out onto two Linux systems. As part of the demonstration, even the resources available in this ‘private cloud’ (Linux) were exceeded, so Pathway, via its GuardianAngel Gateway server, called up resources on a public cloud (a CloudBurst).

For the demonstration, the public Cloud instances were pre-loaded to avoid public server start-up time (2-8 minutes); however, I’m told they could have been started via the Amazon or Rackspace API based on a NonStop threshold having been exceeded. As a final demonstration, one of the Linux systems ‘fails’ and its load is handled by NonStop till it recovers – so Pathway instances using the same code base are running on Linux, in a public cloud and on NonStop all at the same time under the control of Pathway – talk about hybrid!
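As a rough illustration of the bursting behavior described above, here is a minimal sketch of tiered overflow routing. It is not the ANSEG implementation: the `Pool` class, the `route` function, the pool names, and the use of an in-flight transaction count as the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A group of server instances on one platform (hypothetical model)."""
    name: str
    capacity: int          # max in-flight transactions (assumed threshold metric)
    in_flight: int = 0
    healthy: bool = True

    def has_room(self) -> bool:
        return self.healthy and self.in_flight < self.capacity


def route(pools: list[Pool]) -> str:
    """Send one transaction to the first pool, in priority order, with room."""
    for pool in pools:
        if pool.has_room():
            pool.in_flight += 1
            return pool.name
    raise RuntimeError("no capacity left on any platform")


# Priority order for burstable transactions in this sketch:
# NonStop first, then the private cloud, then the public cloud.
nonstop = Pool("nonstop", capacity=2)
private = Pool("private-linux", capacity=2)
public = Pool("public-cloud", capacity=4)
tiers = [nonstop, private, public]

# Five transactions arrive with nothing completing in between.
placements = [route(tiers) for _ in range(5)]
```

With these made-up capacities, the first two transactions stay on NonStop, the next two overflow to the private Linux pool, and the fifth bursts to the public cloud. Marking a pool as unhealthy simply removes it from consideration, which is one simple way to model the Linux 'failure' at the end of the demo.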

For those attendees viewing the Pet Store application seamlessly shifting from the cloud to NonStop and back to the cloud, according to Tom Miller and another member of ANSEG, “it was jaw-dropping for those watching and who were unaware of the capabilities of NonStop!” The promise this brings to the Enterprise is mind-boggling, in my opinion. For some time I have been fumbling around looking for the right way to express some very basic concepts, and the more I watched the demo, the more I saw how much more advanced this reality had become.

Here is another key observation: all who saw the demonstration on the HP stand, on the floor of the very busy and noisy exhibition hall, stayed glued to the screens for more than half an hour and each came up with new implementation concepts pertinent to their own business. Looking ahead, the team within HP NonStop is seizing upon the early enthusiasm and holding workshops and developing deployment scenarios.

“We have had amazing interest in the capabilities of NonStop when it comes to integration with cloud services and also, for point cross-platform business applications,” explained Keith Moore, another member of ANSEG. “Since the 2011 HP Discover event, we do 2 – 4 live real time demonstrations to customers per week many of which lead to continuing discussions about how NonStop can help deliver ‘the fundamentals’ to off-platform current and future applications. ANSEG believes that the basic ideas and implementation that we have demonstrated can help in other areas across the greater HP product suite as well as with other common business deployments.”

For me, this is starting to look like hybrid computing done right – some configurations of NonStop with Linux, for instance, could certainly prove appealing even among the more hardened mainframe community! As more use cases are uncovered, perhaps nothing stands out more prominently for me than having a database on NonStop, as scalable as it is available with NS SQL/MX, and low-value transactions being dispatched into the cloud, all managed by Pathway.

This project didn’t just suddenly appear overnight; it has its roots deep in earlier projects within NonStop development. With many code names and with several early appearances, it really did take on a life of its own following the release of the latest version of Pathway. But for me, it truly does tie in with the thoughts I have been having for some time about NonStop becoming a smart controller. Perhaps not the most glamorous of tasks, but as enterprises hasten to deploy clouds, deploying NonStop as a controller overseeing it all has a lot of appeal for me. It’s Safety and Assurance, with a capital S and a capital A!

There’s no escaping that this is a part of the NonStop history, too. After all, NonStop really did achieve its initial break-through when it was a smart front-end to mainframe computers, servicing large networks of ATMs and POS terminals. For me, GuardianAngel is a return to what NonStop has always proved effective at doing: shielding imperfection behind a level of availability simply not matched in any other manner. For business, this is something that’s exciting and is now out there, demonstrable; this genie will be impossible to put back in the bottle and, with the strategy of HP so tied to clouds, will prove difficult to ignore.

Then again, it’s not quite like a return to the past – the commodity-based NonStop server we see today is far-removed from what we worked with two or three decades ago. Modern NonStop Server blades are proving that costs can be taken out of the NonStop Server platform and business is already capitalizing on this most recent development within NonStop.

The opinions expressed by Road and Track’s editor DeLorenzo remain as valid when applied to NonStop as they are to automobiles, and to paraphrase: “when you look at the levels of performance available relative to what (computers) have historically cost, we are living in a golden age.” For the NonStop server, GuardianAngel will become highly visible and our appreciation of clouds may never be the same!


Keith Dick said...

This is the first description of that GuardianAngel demo that I have seen. Thanks for writing it.

It does raise a few questions.

One of the things you said is: "... so Pathway instances using the same code base are running on Linux, in a public cloud and on NonStop all at the same time under the control of Pathway ...". Did you really mean to say that Pathway instances run on Linux and in a public cloud, or did you mean that Pathway server instances were running there?

How is it arranged that the servers running in all three places access a common database? Arranging to have servers running on Linux and a cloud service like that is a nice trick, but if they aren't sharing a common database, I'm not sure it is very useful.

Hiding the time it would take to start up the overflow servers on the public cloud service seems to be a bit misleading, unless the folks doing the demo believe they know how to reduce that start up time to a few seconds, but were not able to do the work needed in time for the demo. Or am I overlooking something about that?

Another quote: "... and low-value transactions being dispatched into the cloud, all managed by Pathway." I wonder whether it makes sense to have the low-value transactions flow through the NonStop system. If the system architects consider the transactions to be low-valued, they probably would not want to spend any NonStop resources on them. I imagine a lot of the cost of processing occurs before that switching point, so if they enter through the NonStop system, I imagine that recognizing and forwarding them to the cloud would cost almost as much as it would to do all the processing in the NonStop system, and so would be unattractive. And even if the low-value transactions don't enter through the NonStop system, but access the database on the NonStop system, that is putting a fair chunk of the processing on the NonStop system. Am I guessing wrong on those points, or does it actually not make very much sense to have low-value transactions touch the NonStop system at all?

Using a public cloud as an "emergency CPU upgrade" does seem like a valuable capability, as long as the implications of sending possibly-sensitive data out of the company's data center are taken into account, but I wonder a bit about the notion of designing a system to work that way under normal loads. For instance, the NonStop system might pretty quickly become a bottleneck. Or the resources needed to send transactions to the cloud might be comparable to the resources needed to process the transactions locally, especially if the database they use is on the NonStop system. Do you know how closely those sorts of things have been studied? And if it does turn out that there is not such a big win to offloading transactions to the cloud, I guess that kind of undermines the notion of using this mechanism as an emergency CPU upgrade, too.

Justin said...

Keith, As they say 'good questions'. As a member of the GuardianAngel team let me try and address them. When we say instances running on Linux, Public Cloud and NonStop it probably would have been better to say instances of serverclasses running on these various platforms. Pathway itself must run on NonStop (at least today) since it is a process-pair subsystem. Pathway or should I say Pathmon, is overseeing the serverclass instances running on the various platforms through the gateway serverclass that uses the GA-API.
Common database was the number one question in Las Vegas at Discover. For the demo we use NonStop SQL/MX on NonStop but used an in-memory database for Linux and the cloud. These were kept in sync by doing a dual write. Since SQL/MX was the only database that had update activity (buying a pet - in memory DB merely displayed info about pets) - all updates were from NonStop to the in-memory database. Since there are so many flavors and versions of databases, that is something we would address individually at customer workshops (but we have thought a lot about it).
I'm not sure hiding the start-up time was misleading but we can concede the point. Amazon server start-up takes 2-8 minutes. We felt 2-8 minutes of watching a bar graph eventually go to 100% would have lost us most of our audience in Las Vegas but point taken. There is no way to speed Amazon or any other public cloud up unless you have instances pre-started. Everyone suffers the 2-8 minute start-up delay. In a real situation you would have to start based on an exceeded threshold and hope everything was ready by the time you ran out of resources which is a public cloud limitation not a GuardianAngel one.
In terms of 'low-value' it is probably application dependent. You are absolutely correct running these, even as a message switch, through NonStop adds to the overall path-length. Is it worth it? Well I am of the belief that 'low-value' does not necessarily mean 'no-value'. By running it through NonStop you are guarding against a complete outage - I know that's something almost unheard of in public clouds...but still it could happen..8^) So by running it through NonStop we would have the capability of preventing a full outage and in our example, and I believe in most instances, the high-value transactions are actually dependent on the low-value. That-is would you buy a pet from pet store without first seeing it?
Final point about the NonStop application and sensitive data in the public cloud is spot on. That would really need to be evaluated before bursting any part of a NonStop application into a public cloud. However as we said in the demo - bursting could actually occur to another NonStop system not a cloud. What if, under this rare occurrence where processing was exceeded, you 'burst' to your NonStop development system? Or perhaps there was an HP NonStop private cloud you could burst to? Please don't imply any future HP commitments from that statement - it is a personal question only.
Hopefully I've addressed the questions; if not, please post again.
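
To make the dual write mentioned above concrete, here is a minimal sketch, assuming one authoritative store that takes all updates and one read-only replica that off-platform servers browse. Plain dictionaries stand in for NonStop SQL/MX and the in-memory database, and the class and method names are hypothetical rather than part of the GA-API.

```python
class PetStoreData:
    """Authoritative store plus a read-only replica, kept in step by dual write.

    Both stores are plain dicts here; in the demo they were NonStop SQL/MX
    and an in-memory database. All names are hypothetical.
    """

    def __init__(self) -> None:
        self.primary: dict[str, int] = {}   # stands in for SQL/MX on NonStop
        self.replica: dict[str, int] = {}   # stands in for the in-memory DB

    def set_stock(self, pet: str, count: int) -> None:
        # Write the system of record first; only then refresh the replica.
        self.primary[pet] = count
        self.replica[pet] = count

    def buy(self, pet: str) -> bool:
        """Update path: checks and changes the primary, then propagates."""
        if self.primary.get(pet, 0) <= 0:
            return False
        self.set_stock(pet, self.primary[pet] - 1)
        return True

    def browse(self, pet: str) -> int:
        """Read path used by off-platform servers: replica only."""
        return self.replica.get(pet, 0)


store = PetStoreData()
store.set_stock("beagle", 1)
sold_first = store.buy("beagle")    # succeeds, stock drops to zero
sold_second = store.buy("beagle")   # refused, nothing left to sell
```

The sketch deliberately ignores what happens if the second write fails after the first succeeds; a real deployment would need to handle that case, which is presumably part of what the customer workshops would cover.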

Keith Moore said...

Keith Dick brings up some very good points. My reply is too big to fit in one posted reply. Please excuse my verbosity, but I am hoping that this information helps...
- Keith Moore
Hello Keith. I haven't seen your name in a long time. It's good to hear from you.

I will try and address these questions (in addition to Justin's comments). Hopefully, I won't confuse or contradict anything already said.

Pathway serverclasses are instantiated on Linux (or Windows) under PATHMON control as if they were local to NonStop. This allows TS/MP to manage creation/deletion, and recovery of business-level application services running on or off the NonStop platform. The value is in that NonStop is always-on and can reliably manage these services based upon performance queue metrics. I don’t need to explain Pathway to you, for sure! So it’s not PATHWAY instances, it is PATHWAY SERVERCLASS instances.

“Low-value” transactions. We have struggled with terminology here. Being NonStop biased, we consider all transactions to be “high-value”. However, as Napoleon the pig said in Animal Farm, some transactions are “more equal” than others. And in fact, some portions of some transactions are more equal than others. The idea is that in the real world of application services, some portions of a transaction are not necessarily in need of all of the NonStop fundamentals. Most often, this is because of lack of need for tight cross-transaction synchronicity, lock management, external databases, data-free, process-heavy activity. Examples would be browse-before-buy, fraud-analysis engines, market analytics, commodity pricing lookup, and so on. Also, there are some COTS applications that need to be integrated into transaction services that need NonStop fundamentals. Examples of this are, again, Analytics engines, SAP applications, and Oracle apps and databases. We don’t mean that they are “low-value” as much as perhaps they are context-free, retry-able, or in some way not in full need of the full reliability of NonStop. Most commonly, this is COTS or context-free application access as part of a greater transaction that needs to be running on NonStop. Another common example is where, as part of a NonStop “money transaction”, the application exit needs to go to a “less reliable” source to get a risk assessment or fiscal “position”. In the past, if the transaction could not get to the remote service, then stand-in logic would be required. This is still the case, but instead, the application can be designed to use TS/MP to maintain multiple instances of this service on other physical or virtual servers. With private cloud, this is likely common today except that the service management is at the SERVER level, not the application level. This is one of the key values of NSGA.

A specific example of this is in the travel industry where a single request from a travel website (Orbitz, Expedia, etc.) can generate many transactions at the travel booking site. At a car rental agency, a single car rental discovery could include up to 20 rate/car class lookups, location hours of operations and vehicle availability. Using the NSGA architecture, an application will receive that single request and farm out rate and availability requests to different pools of commodity servers; then formulate a single reply back. This would be all under cover of one transaction with several "lighter-weight" (or retry-able, context-free) query transactions within it. This is a common travel industry function.
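
The fan-out-and-gather pattern in the car rental example can be sketched as follows. This is only an illustration, assuming a local thread pool in place of remote pools of commodity servers; the rate table, `lookup_rate`, and `handle_request` are made-up names and values, not anything taken from NSGA.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rate table standing in for a pool of commodity rate servers.
RATES = {"economy": 31.0, "compact": 35.5, "suv": 58.0}


def lookup_rate(car_class: str) -> tuple[str, float]:
    """One lightweight, retry-able query; no shared state is modified."""
    return car_class, RATES[car_class]


def handle_request(car_classes: list[str]) -> dict[str, float]:
    """Fan the lookups out in parallel, then assemble a single reply."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lookup_rate, car_classes)
        return dict(results)


# One incoming request becomes several independent lookups and one reply.
reply = handle_request(["economy", "suv"])
```

Because each lookup is read-only and context-free, any one of them could be retried or sent to a different server without affecting the others, which is the property that makes this kind of work a candidate for off-platform execution.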

Keith Moore said...

(Continued from above)
There is also a side benefit to some of the hardware that makes NSGA even more viable now than this same architecture would have been on past NonStop servers. Portions of the TCP/IP stack are now "offloaded" to IP-CLIM "controllers". Because of this, the TCP/IP processing no longer competes against application logic for Itanium CPU cycles. This NSGA design benefits from this, as do many newer NonStop applications.

The team is well aware of the need for database synchronicity within the designs of these types of systems. This is no different than it is today with various cross-platform application implementations. Oracle replication is commonly used for non-NonStop databases; GoldenGate, Attunity, and Gravic Shadowbase are commonly used to manage cross-platform database synchronicity. However I think that because of the newer, more commonly used application frameworks (Axis, MyServerFaces, iBatis, Hibernate) running under MVC pattern, this data synchronization is managed more-or-less as an application access issue instead of the traditional problem of cross-platform replication. In other words, current data access frameworks are database agnostic and allow us (generic “us” application designers) to deploy to various databases and ‘assume’ that the data server will provide ACID access with lock protection. And the frameworks allow for lock requests to persist via the DAO. Fortunately, Pathway allows us to initiate and retry/recover transactions way up at the start point if NonStop is at the base of the transaction (and not subordinate to another transaction). We demonstrated a multi-database configuration using Java POJO frameworks and Pet Store. The use of a data access object (DAO) is exactly what I describe here. The DAO abstracts the database implementation from the usage. This allows the application architect to designate how it needs to protect from failure/retry/collision.

I am a CISSP and look at these implications regularly. I completely agree that there are significant issues associated with public cloud (and private cloud, for that matter) when it relates to movement of data, and ownership/responsibility of sensitive information of all types. It is not part of our demonstration to address this issue directly. However, we have all been aware of it as we demonstrate this capability.

We did not hide the delay in startup of the actual virtual server. We just decided, for the demonstration, not to show it. In effect, this is a CREATEDELAY activity and would be deployed just like the old-fashioned “NEWPROCESSCREATE” calls that we made on our old TXPs. I do recall recommending that we minimize how often we do this because often a transaction would queue waiting for a NEWPROCESS to occur. This is an extreme version of this same create problem. Creating a new virtual machine does take some time. But it does work (with about an 8 minute delay in a public cloud). Also note that Pathway still applies the DELETEDELAY and dissolves the servers when they are not needed. The most painful part of our show demonstration is when we have to wait for the “DELETEDELAY” to happen at the end of the demonstrations. This is representative of why we chose to just use existing servers that we created in the cloud. Also note that the typical public cloud pricing model allows us to leave these running for a long time with only pennies of expense. In the real world, I would assume a mix of these services; some pre-booted, and some not.
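
The interplay of the two delays can be shown with a toy model. This is a simplified sketch and not TS/MP's actual algorithm: it assumes a new dynamic server is created once a request has waited at least `create_delay` seconds, and that a server is removed once it has been unused for at least `delete_delay` seconds. The class, method, and server names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerClass:
    """Toy model of create/delete delays; not TS/MP's actual algorithm."""
    create_delay: float   # seconds a request may wait before a server is added
    delete_delay: float   # seconds a dynamic server may sit unused before removal
    last_used: dict[str, float] = field(default_factory=dict)
    next_id: int = 1

    def on_request_waiting(self, waited: float, now: float) -> Optional[str]:
        """Return the name of a newly created server, or None to keep waiting."""
        if waited < self.create_delay:
            return None
        name = f"dyn-{self.next_id}"
        self.next_id += 1
        self.last_used[name] = now
        return name

    def reap(self, now: float) -> list[str]:
        """Remove dynamic servers unused for at least delete_delay seconds."""
        expired = [s for s, t in self.last_used.items()
                   if now - t >= self.delete_delay]
        for s in expired:
            del self.last_used[s]
        return expired


sc = ServerClass(create_delay=2.0, delete_delay=30.0)
too_soon = sc.on_request_waiting(waited=0.5, now=0.0)   # still within the delay
created = sc.on_request_waiting(waited=2.5, now=2.0)    # delay exceeded
kept = sc.reap(now=10.0)       # unused for only 8 seconds
removed = sc.reap(now=40.0)    # unused for 38 seconds
```

When creating a server means booting a public cloud virtual machine, the request that triggered the creation keeps queuing for the whole boot time, which is why pre-booted instances are attractive.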

The design of applications that use this NSGA approach will assume that the application architect has researched and designed a service-based business implementation with varying service level agreements (SLAs) for the components. Armed with this design, the NSGA methods of use for TS/MP can significantly improve efficiency of most of the various server types where the business services will run.

I hope that this helps.
- Keith M.

Anonymous said...

Copied from the LinkedIn group, Pyalla Technologies:

You said, "When we say instances running on Linux, Public Cloud and NonStop it probably would have been better to say instances of serverclasses running on these various platforms".

Just to clarify, is it correct to assume that the Linux version of the serverclass simply uses stdin / stdout to receive transactions and reply to them? Are there any Guardian Angel-specific api calls that must be made, analogous to $RECEIVE READUPDATE / REPLY?

Neil Coleman,
Infrasoft Pty Ltd

Keith Moore said...

We have a very lightweight portable application programming interface (API) that allows inter-system messaging for the purposes of this capability. For example, when we demonstrated the Java Pet Store application, we were able to move Java code (binary jar files) as-is from system to system to allow java frameworks to abstract the access calls. In other words, using Java, we have to only write a portable data access object (DAO) that used these "GA API" calls. That was the only change necessary to make Java Pet Store software exploit this capability.
I hope this helps.

Justin said...

Yes, you are correct: the Linux serverclasses receive and reply through standard stdin and stdout. The API does not include calls to handle $RECEIVE - that is handled by the Gateway Server portion of GuardianAngel.
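
For readers unfamiliar with the style, a server that receives requests on stdin and replies on stdout can be as small as the sketch below. The one-request-per-line framing and the `handle` logic are assumptions made for illustration; nothing above specifies the actual message format used by GuardianAngel.

```python
import io
from typing import TextIO


def handle(request: str) -> str:
    """Hypothetical business logic: echo the request back in upper case."""
    return request.upper()


def serve(inp: TextIO, out: TextIO) -> int:
    """Read one request per line, write one reply per line; return the count."""
    served = 0
    for line in inp:
        request = line.rstrip("\n")
        if not request:
            continue  # ignore blank lines
        out.write(handle(request) + "\n")
        out.flush()  # reply promptly; the caller is waiting on this line
        served += 1
    return served


# In-memory streams stand in for the real stdin and stdout in this demo.
demo_in = io.StringIO("browse beagle\n\nbrowse tabby\n")
demo_out = io.StringIO()
count = serve(demo_in, demo_out)
```

In a real process the same function would be called as `serve(sys.stdin, sys.stdout)`, leaving whatever launched the process to own both ends of the conversation.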