For those who have hit the high seas with the intention of sailing to a faraway land, there are many elements that can contribute to missing your harbor. Sail one degree off course and for every 60 miles you sail you will miss your target by one mile. Not much, or so it seems; and yet, sail around the world following the equator and you will be off course by more than 400 miles.
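For the curious, the arithmetic behind that old navigators’ 1-in-60 rule is easy to check. Here is a quick sketch, with my own assumed figure of roughly 24,900 miles for the equatorial circumference:

```python
import math

# The navigators' 1-in-60 rule: a 1-degree heading error costs about
# 1 mile of cross-track error for every 60 miles travelled.
EQUATOR_MILES = 24_900  # approximate equatorial circumference (assumption)

off_course_rule = EQUATOR_MILES / 60                          # rule-of-thumb estimate
off_course_exact = EQUATOR_MILES * math.tan(math.radians(1))  # small-angle trigonometry

print(f"1-in-60 rule:   ~{off_course_rule:.0f} miles off course")
print(f"exact (tan 1°): ~{off_course_exact:.0f} miles off course")
```

Either way you compute it, one degree of neglect compounds into hundreds of miles.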
While flying between Sydney and San Francisco during the year I commuted between the Tandem office in North Sydney and the Cupertino campus, there was a time when I received an invitation to enter the flight deck. Those were different times, when security wasn’t an issue aboard the Queen of the Skies, that venerable Boeing 747.
Imagine my surprise to see the first officer pulling out his sextant, sighting a planet as it rose above the horizon and then checking it against the aircraft’s inertial navigation system, remarking as he did so that they were right on course.
It shouldn’t come as a surprise, then, to know that the NonStop systems of today have been tracking industry trends for decades, not just being part of a technology journey that continues unabated but closely tracking real-world customer requirements. Had it not been for the unreliability of 1970s technology, few of these customers would have wanted to invest in fault-tolerant computers, but invest they did. And for good cause!
The world was going real time, 24 x 7, and direct business interactions with end users couldn’t afford to be offline for any cause. Those early real-time applications involved financial transactions and the timely movement of goods, such that the business suffered whenever outages occurred. NonStop systems made the unreliable reliable, successfully addressing the business requirement for greater availability even though they were built from unreliable 1970s components.
But change is inevitable, as I have posted throughout the past decade, and what is really having an impact on business is the velocity of change. It was only a short time ago that building out proprietary server farms that virtualized everything was the fashion; today the cloud experience has entered the virtual conversation.
That isn’t to say virtualization is out of favor, but rather that with virtualization yet another layer of technology has arrived, overarching the physical and virtual building blocks we have assembled to date. The complexity has just ratcheted up further, and you don’t need to turn to your sextant for a better sighting of where this is leading or to understand that you are now straying off course.
And this is the course correction that of itself has the potential to set NonStop towards a vastly changed horizon. There is a strong wind coming in fresh, and at its center is the cloud. Yes, as cloud service providers continue to struggle to provide the guaranteed level of availability we all assume today with NonStop, there is an opportunity to move from supporting unreliable computers and even error-prone virtual machines and hypervisors to where NonStop can treat clouds as nothing more than another grouping of processors. Think of AWS as Cloud0, another AWS instance as Cloud1 and perhaps a private cloud as Cloud2, with Azure the home of Cloud3.
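To make that idea concrete, here is a minimal sketch, purely illustrative (the names, structure and CPU numbering are my own assumptions, not any actual NonStop configuration syntax), of what enumerating clouds as just another grouping of processors might look like:

```python
# Hypothetical illustration only: treating clouds as named groupings of
# logical CPUs, the way a NonStop system enumerates processors.
CLOUDS = {
    "Cloud0": {"provider": "AWS",     "logical_cpus": [0, 1, 2, 3]},
    "Cloud1": {"provider": "AWS",     "logical_cpus": [4, 5, 6, 7]},
    "Cloud2": {"provider": "private", "logical_cpus": [8, 9]},
    "Cloud3": {"provider": "Azure",   "logical_cpus": [10, 11]},
}

def cpus_spanning(*cloud_names):
    """Return the combined logical CPUs across the named clouds,
    as a fault-tolerant workload would see them."""
    return sorted(cpu for name in cloud_names
                      for cpu in CLOUDS[name]["logical_cpus"])

print(cpus_spanning("Cloud0", "Cloud3"))  # a workload spanning two providers
```

The point of the sketch is simply that once a cloud is reduced to a named grouping of logical CPUs, a workload need not care which provider hosts which half of it.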
This may be premature to consider today but, in talking to HPE NonStop personnel, the day may not be too far away. When I posted earlier about there being two opportunities for NonStop, one had NonStop playing a guardian role (no pun intended); that was the subject of my previous post, What’s not broken and just keeps on running? NonStop delivers! The other opportunity focused on treating the world of clouds as no different from either converged NonStop processors or virtual machines. If you think back to the origins of NonStop, this isn’t too radical an idea, even if there are obstacles to doing it effectively today.
Clearly, to those who have been around NonStop for a very long time, it all comes down to the interconnect fabric. Can I reliably deliver content between different cloud instances even when they are hybrid? Could I depend upon AWS and Azure to give me access to something that would work in this regard? Could there be a simple answer?
As one of NonStop’s leading technologists said, “The answer is ‘yes.’ We can do inter-CPU failover across ‘clouds’ if you define the ‘clouds’ part correctly. I always look at NonStop as a closely-coupled cluster of operating systems. This means that we are actually doing this now when we span VMware vSphere bare metal instances (which we do all of the time for hardware fault resiliency). So if you define a cloud as a single instance of vSphere (VMware’s orchestration tool for ESXi hypervisors), then we already can span clouds.”
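For readers less familiar with NonStop fundamentals, the failover being described is the classic process-pair model: a primary process checkpoints its state to a backup running in a different CPU, and the backup takes over if the primary’s CPU fails. Here is a deliberately simplified Python sketch of that idea, my own illustration rather than actual Guardian code, extended to the cloud framing above:

```python
# Simplified, hypothetical sketch of process-pair takeover, here imagined
# across "clouds" rather than across CPUs. Illustrative Python only; this
# is not actual NonStop (Guardian) code.
class ProcessPair:
    def __init__(self, primary, backup):
        self.primary, self.backup = primary, backup
        self.state = {}                    # state checkpointed to the backup

    def do_work(self, key, value):
        self.state[key] = value            # primary applies the update...
        # ...and in a real system would checkpoint it over the interconnect
        # fabric so the backup always holds a consistent copy.

    def primary_failed(self):
        # The backup takes over with the last checkpointed state intact.
        self.primary, self.backup = self.backup, None
        return f"takeover by {self.primary}; state preserved: {self.state}"

pair = ProcessPair(primary="Cloud0", backup="Cloud3")
pair.do_work("balance", 100)
print(pair.primary_failed())
```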
However, I am looking to go even further, even as I am hoping to attract the attention of at least one of the bigger NonStop users that has already decided to become more engaged with cloud providers. Moving the needle within the NonStop development community, after all, accelerates whenever there are interested NonStop users actively looking for a solution. Where I want to take the conversation is not just between instances of VMware within, say, a rack or server farm but rather to where it involves communicating over much longer distances.
What it comes down to is latency, together with an interconnect fabric that won’t let us down under any condition. “As with all things we have done for 45 years, the level at which NonStop can provide a viable offering to fail, grow, recover and retract services is dependent on the current latency limitations to communicate amongst the components. This has been true for all renditions of NonStop systems when it comes to CPU interconnects,” said the previously quoted NonStop technologist.
Think about this for a moment, he then suggested. “Expand has a similar requirement for the wide-area network (WAN), but it is much broader and more malleable in deployment. The reason NonStop has never had intra-CPU traffic (processor-to-processor without Expand) using a WAN is because of the latency, as has been noted. Think in terms of NonStop’s ability to leverage the legacy FOX (Expand using fiber and DMA) all the way through to now, where it leverages InfiniBand for NonStop clustering. All of these require low-latency guaranteed delivery. This is why NonStop currently requires a closed RoCE-enabled network for its vNSK CPU interconnect – it’s about guaranteed response time and message latency.”
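To put rough numbers on why a WAN has been off limits for the CPU interconnect, here is a back-of-the-envelope sketch; the figures are my own assumptions (signals in optical fiber propagate at roughly 200,000 km/s, about two-thirds the speed of light), and physics sets the floor before any switching or queuing delay is added:

```python
# Back-of-the-envelope propagation delay: why CPU interconnects stay local.
# Assumes light in fiber travels at ~200,000 km/s (about 2/3 of c); the
# distances below are rough, illustrative figures.
FIBER_KM_PER_S = 200_000

def round_trip_ms(distance_km):
    """Round-trip propagation delay in milliseconds for a one-way distance."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000

for label, km in [("within a rack", 0.01),
                  ("across a metro area", 50),
                  ("Sydney to San Francisco", 12_000)]:
    print(f"{label:>24}: ~{round_trip_ms(km):.4f} ms round trip")

# A RoCE round trip within a rack is measured in microseconds; a
# trans-Pacific round trip is ~120 ms before the network adds anything.
```

That gap of several orders of magnitude is exactly the constraint the technologist is describing.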
Before you begin to sigh and think that this was a good idea while it lasted, consider that the industry continues to move on, and one of the biggest issues being addressed is the never-ending search for greater speeds over the global WANs that tie businesses together. New networks are becoming available where switching is moving toward 200Gb and 400Gb in the 2022-2025 timeframe, even as today, from a practical usage point of view, we max out just below 100Gb. But it’s coming, at some point, with even faster speeds possible beyond 2025. This kind of networking would have been unimaginable to those first NonStop engineers back in the late 1970s.
However, this isn’t all that is needed, as there are other considerations. “In order to accomplish this, NonStop would need to leverage QoS internet services requiring low-latency packet switching and also leverage some HPE technologies … for intelligent, guaranteed packet routing,” my technologist said. As for the good news? “All of this could definitely be done. It’s not even something that requires new technologies.”
As I just wrote, all it takes is for a NonStop user committed to clouds to come around to the idea that traditional NonStop takeover can occur across disparate cloud services. With multiple cloud vendors supporting mission-critical applications, service levels could then be guaranteed such that there would be no difference from running those same mission-critical applications in-house on a traditional NonStop system. Imagine that; even if these different cloud providers may not be all that ready to provide access to critical networking components, a strong enough NonStop user backed by HPE could swing a more favorable response than would otherwise be offered.
As a technology, NonStop has never been fixed in time. That is one reason why, nearly five decades later, it is still as relevant today as it was back then. We may have moved on from clusters of real CPUs to clusters of virtual machines, and making that stretch play to clusters of clouds doesn’t seem all that far-fetched to those who spend time considering what might come next for NonStop.
There are reasons why course corrections are made. For the most part, they are the result of external forces. In these times, when IT professionals ponder the potential for cloud support of mission-critical applications, surely the time is right to steer NonStop towards supporting clouds as readily as it has real CPUs and virtual machines. When you consider that networking has never proved to be a barrier over the long term, who will be the first to demonstrate the levels of availability over their cloud deployments that we have all come to expect with NonStop? Will it be you?