Glitches, the norm?

Still in Sydney – but the headlines of the past few weeks have been bothering me somewhat. Have you seen them all? It looks like computer glitches are hitting us hard! I have included a picture of Sydney in case you aren't sure what it looks like!

In my August 27th blog posting – Is 30 mins too long? – I remarked that “I have little patience for any retailer or financial institution that skimps on their infrastructure”. But now I am seeing whole sectors of the community being affected. I have to start wondering – are we becoming desensitized to all of this?

What caught my attention were the headlines here last week – well, actually, a small article in one of the financial papers – “Glitch shuts out Westpac online customers”! It turned out that about 30 percent of the bank’s 400,000 internet banking customers could not access the online banking service. The paper I was reading went on to add that, according to Westpac, “it appears to be related to an internal systems error which we’re still trying to isolate”, and then noted that the bank wasn’t sure whether this was related to a recent revamp of its website.

Now, in isolation, this would have just been something I read and had a brief chuckle about. But unfortunately, I had only moments earlier read on my BlackBerry about Barclays having a big problem in the UK that forced them to borrow 1.6 Billion Pounds. According to news@finextra.com, “Barclays blames technical glitch for 1.6 Bn Pound emergency loan”! The link between its electronic settlement system and the CREST settlement house broke down on Wednesday … for an hour!

Going back to my August 27th blog posting, you may recall that I mentioned, in passing, that Wells Fargo had suffered a serious outage on the West Coast that affected not only ATMs but a major portion of the branch banking business as well. I just went back and googled the Wells Fargo outage, and the first link I was directed to was something called SFGate.com, where the heading simply stated “Wells Fargo ATM, other glitches last longer than first reported”. The report also put the timing in perspective when it added “Wells’ computer glitch came at a poor time for nervous banking customers, considering the recent turmoil in the mortgage and stock markets.”

I began to look at this after I met with a former colleague of mine, Dieter Monch. Dieter was the Australian Managing Director of Nixdorf Computer when I worked for Nixdorf, back in the early ’80s. Dieter is an investor in, and now manages, the company that sells red-light and speeding cameras around Australia. He recently attended a state government presentation that asked potential vendors to look into providing a camera network that wouldn’t fail – borrowing words from NASA, failure was not an option. Dieter simply, and I have to believe politely, asked – how much are you prepared to pay?

Now, I am not all that sympathetic to the loss of a speeding camera – and the revenue opportunity missed. I don’t think many of us are – and don’t look positively on this form of revenue generation. But looking at it from a different perspective – if these were cameras tracking vital security operations and went down at the time a key illegal or terrorist activity was being executed – then I can see a time in the future when even these types of networks just have to remain operational at all costs.

So, glitches and their implied outages, as well as the implications of lost revenue, are beginning to show up across all industries and markets. So we are taking the issue pretty seriously, and we seem to understand the problem. But with the news coverage I have seen over the past couple of days – I am not sure how seriously we are taking the fall-out from today’s glitches. Surely, the loss of credibility in a marketplace of 400,000, as was the case in Australia, or of millions, I would have to believe, in the US – as well as the real cost in terms of interest on the short-term borrowing of 1.6 Bn Pounds – is pretty serious. Again, have we become desensitized to the issue of computer glitches? Has the term become an easy way out – a catch-all phrase to cover up any infrastructure stuff-up we may make?

Do we aggressively promote the value of applications and databases that survive single (and now, multiple) points of failure? Do we explain how all this works and the value we can provide? Or do we simply leave it to others – the comms guys, the web server guys – to explain why an element of the infrastructure failed?

Do we still believe that some subset of these applications is so fundamentally important to us that we view them as "mission critical applications", and are we prioritizing and routing these "mission critical transactions" to a platform that is orders of magnitude more reliable than the other servers we may have deployed?

While we, as users of NonStop, have come a long way in removing many sources of outages – how strong a voice do we have in other areas of infrastructure? And are we still strongly advocating NonStop in support of mission critical applications, or have we elected just to sit back and watch as less reliable platforms siphon off these transactions? In other words, have glitches become the norm, and have we reached a time where it’s OK to simply explain away a service interruption as the dreaded glitch?

Anonymous said…

Great post! Actually, I DO think the financial sector is starting to take it more seriously than ever, but I think it shows just how complex the problem space is.

It goes beyond hardening even the subsystems that you mentioned (processors, database, web/application servers, etc.). That's necessary but not sufficient (as RT Writer would say). The organization has to 'plan for failure' in establishing operational processes/procedures as well. That means taking the time to draft the procedures, but it also means doing 'just enough' testing of those to validate them.

Of course, the costs of that are hard to quantify. The value proposition of the enterprise class systems (NonStop one of the leading contenders there) is that at least you can remove the hardware/OS/database from your list of worries!

September 13, 2007 at 6:21 AM

Real Time View
