What’s not broken and just keeps on running? NonStop delivers!

As I look back at the past year, perhaps the best way to describe it is as a year in which many things were broken. I am not talking about the occasional drinking glass or dinner plate but objects that mattered to us. When things are broken their performance is hindered to the point where, in some cases, they no longer serve any useful purpose. Replacement appears to be the order of the day.

Mind you, I am not talking about the weather here in Colorado, as I have lost track of how many days have passed without a meaningful snowfall. This morning may have been the exception, as a little rain together with what might pass as a little sleet did fall, but not for long enough to stick to anything. Enjoying December days that have climbed into the 70sF (20sC) would surely pass as unusual even if we didn’t talk about weather patterns that look to be broken.

However, before developing this story line further, please be happy for us as remedies for almost everything have been found. But what was broken? This time last year Margo broke her leg badly, and the remedy included the insertion of metal rods and nails. In summer our Range Rover was rear-ended on the freeway and the insurance company wrote it off. That new sectional we had waited for did finally show up but the central portion of the sectional had been scratched in transport.

When it comes to IT, the digital transformation and the pivot to everything-as-a-service, it’s hard to make light of the fact that clouds aren’t proving as rock-solid as their promoters would have you believe. To hear the promoters tell it, an outage here or there is nothing for us to worry about, but when a major cloud services provider like Amazon Web Services (AWS) breaks then yes, we should all be concerned. In fact, there were enough outages for CRN to publish The 10 Biggest Cloud Outages Of 2021 (So Far). As for the tag line, it was rather long but managed to sum up the predicament of many affected at the time:

“‘Outages can mean the end for companies, depending on their choices in design and deployment, or they can be complete non-events,’ Miles Ward, chief technology officer at Los Angeles-based Google partner SADA Systems, tells CRN. ‘Cloud has changed the nature of outages.’”

But then, CRN highlights something that should warm the hearts of many in NonStop, particularly at this time of year. Consider it the early arrival of your Christmas gift:

“‘Every cloud engineering team has seen how impossible it is for customers to engineer around these kinds of outages and is working hard to distribute, subdivide, and make fault-tolerant these central services,’ Ward said.”

This CRN article was published back in late July, so it missed the big AWS outage, but it’s worth noting the top three worst outages on its list:

In third place – Fastly Outage in June. “Fastly impacted bulletin board website Reddit, video streaming service Twitch and a number of news sites including CNN and The New York Times.” Among the comments reported at the time by CRN was this particular gem:

“Michael Goldstein, CEO of LAN Infotech, a Fort Lauderdale, Fla.-based solution provider, told CRN at the time that the global outage shows how critical it is for customers to properly architect their cloud and on-premises network.

“‘Cloud isn’t any different than on-premises—with both cloud and on-premises you need to make sure you have the right architecture,’ Goldstein said. ‘We make sure that when we put mission-critical applications in [Microsoft] Azure for our customers we have multiple data center regions to prevent an outage like this. You need a fail-safe and a continuity plan to prevent outages.’”

Rising to second place, under the generalized heading of More Microsoft Issues, was an outage that this time centered on Microsoft Teams. Apparently, “Teams’ calling service sent calls straight into some users’ voicemails.” Now depending on your level of tolerance of virtual meetings this may have been a blessing in disguise, but in reality it all came back to issues with the infrastructure, according to updates Microsoft provided via the Microsoft 365 Status Twitter account:

“…Microsoft ‘isolated a recent change that has caused portions of infrastructure to send some Microsoft Teams calls straight to voicemail.’”

But then Rosalyn Arntzen, president and CEO of Microsoft partner Amaxra, told CRN that over the past few years, Microsoft had gotten “dramatically better” at updating partners “as soon as they are aware of an issue and listing when they expect the issue to be solved—or at least provide a status.”

Coming in with the blue-ribbon winning outage of the year (so far) was the Akamai Outage, June 17. Remember this outage? Turns out it happened “Nine days after the Fastly outage, (where) a system issue with Cambridge, Mass.-based Akamai Technologies caused internet outages for global airlines, banks, and stock exchanges. The company saw service disruptions for its hosting platform, which helps defend against Distributed Denial-of-Service (DDoS) attacks.”

The way CRN reported this outage was to highlight that:

“The disruption affected several large companies around the globe, including Southwest Airlines, United Airlines, Commonwealth Bank of Australia, Westpac Bank, and Australia and New Zealand Banking Group, as well as the Hong Kong Stock Exchange’s website. Services for many of the companies impacted were restored within the day.

“Downdetector.com showed spikes in complaints about service outages for websites of companies inside the U.S. as well as in a number of other countries including Australia, Germany and India.”

And remember among the also-runs was the outage at Verizon that reports blamed on a fiber cut in Brooklyn, but that was later confirmed as being “a software issue triggered during routine network management activities.” And then there was the issue at Google when “The Google Drive cloud storage service—and associated cloud apps including Google Docs and Google Sheets—suffered multiple service issues … While users could still access Google Drive, affected users could not create new documents and were ‘seeing error messages, high latency, and/or other unexpected behavior,’ according to the company.”

And there you have it: The myth of the infallibility of clouds. Amazon, Microsoft and Google. Of course, it was left to Larry Ellison to capitalize on their circumstances by virtue of his claim that Oracle cloud didn’t fail. Surely, you cannot be serious, Larry?

For all the upside associated with capitalizing on cloud services there is still the fundamental issue that resilience and indeed reliability at the levels we associate with NonStop are simply mythical. Fail-safe continuity and indeed fault tolerance for “central services” is being openly discussed even as we know that with today’s modern languages, tools and services there is a lot that can be done to deliver a kind of pseudo fault tolerance. To think that all those years ago, the original Tandem Computers understood the issues better than any other vendor.

And yet, when those cloud services vendors, providing the underlying infrastructure and, most important of all, the networking and integration services, get it so hopelessly wrong, how can users deploying mission critical applications know for certain that these services will always be there, 24 x 7? The reality is a lot more sobering; they cannot provide anything close to ironclad guarantees. There is a reason why NonStop continues to thrive four decades after being first introduced; it’s fault tolerant in so many ways that it should be hard to ignore its contribution to cloud computing.

I am not entering into this conversation lightly. However, we aren’t discussing how best to fix a broken toy, of which there will be many reports over the holidays. Two opportunities come to mind that in the coming months I will be exploring in more detail. And they have to do with how we think about NonStop going forward and whether our own ideas about the role of NonStop may indeed be outdated.

There is the potential to have NonStop play a guardian role – no pun intended. Should there be a central NonStop essentially polling the hybrid multi-cloud environment common today among enterprises so that exposure to any one cloud can be marginalized to where outages have no impact on the running of mission critical applications? This is clearly an oversimplification but there are models that feature NonStop in this way that readily come to mind.
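
To make the guardian idea a little more concrete, here is a minimal sketch in Python of a single polling pass. It is an illustration only, not a description of any NonStop product or API: the `Cloud` class, the `pick_active` function and the cloud names are all hypothetical, and the health probes are plain callables standing in for whatever real checks an enterprise would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Cloud:
    """One cloud in the hybrid environment (hypothetical, for illustration)."""
    name: str
    is_healthy: Callable[[], bool]  # health probe supplied by the caller


def pick_active(clouds: Sequence[Cloud], current: Optional[str] = None) -> Optional[str]:
    """Return the cloud that should carry the workload after one polling pass.

    Stay on the current cloud while it is healthy; otherwise take over on the
    first healthy cloud in priority order. Returns None if every cloud is down.
    """
    healthy = []
    for cloud in clouds:
        try:
            ok = cloud.is_healthy()
        except Exception:
            ok = False  # a probe that raises is treated as an outage
        if ok:
            healthy.append(cloud.name)
    if current in healthy:
        return current
    return healthy[0] if healthy else None


if __name__ == "__main__":
    # Simulated health table; a real guardian would probe live services.
    status = {"cloud-a": True, "cloud-b": True}
    clouds = [Cloud(name, lambda name=name: status[name]) for name in status]

    active = pick_active(clouds)          # both healthy, first in the list wins
    status["cloud-a"] = False             # simulate an outage on the active cloud
    active = pick_active(clouds, active)  # guardian moves the workload
    print(active)
```

The design choice worth noting is that the guardian stays put while the current cloud is healthy and only moves on failure, which avoids flapping between clouds on every pass.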

There is also the potential for NonStop itself, virtualized as we now have the option to deploy it, to treat the world of hybrid clouds as no different from either converged NonStop processors or virtual machines. Consider one cloud as being CPU0 and another cloud as CPU1, etc. and you get the idea. This too is clearly an oversimplification, one that perhaps throws a spotlight on how well the cloud services providers interconnect with each other, but the idea is still simple in principle. A single image NonStop system spanning multiple clouds, with the ability to perform its industry-leading take-over whenever a cloud misbehaves?

Once we get past the idea that yes, like real CPUs and even Virtual Machines, clouds are just as unreliable, the future of NonStop will warm to the opportunity this represents. The mere fact that one publication is already producing an annual Top 10 Outages article should be evidence enough that enterprises need to consider more seriously what the cloud experience really entails.

For Margo and me, this is just the beginning of a theme that we will revisit in 2022, so stay tuned. But again, the items that broke for us in 2021 have all been addressed and, having said that, can you all say the same about your own hybrid IT and its supporting infrastructure? Even as we wish you the very best for the coming year perhaps it is time to ponder that ultimate question about NonStop: When was availability ever not the issue of the day?
