
Software glitches!

In my posting of last week I depicted a fuse box that had been reworked to bypass those pesky fuses that kept popping. In that last post I observed how frustrating it can be when services fail and we can no longer do what we had planned on doing. However, while I could sympathize with whoever it was that eliminated the need for fuses, I also suggested that this was not one of the finest examples of thinking outside the box. Unfortunately, there are other times where we just need a temporary fix, and the photo above is of just one example – the collapsing awning held in place by two appropriately positioned crutches!

This picture was taken by the same contractor who came across the fuse box and was kept as a reminder of poor decision making. The symbolism of propping up something with crutches didn’t escape me either. There have been many times when I would have gladly accepted a pair of crutches to help me rectify a deteriorating circumstance, irrespective of the foolishness that it may have represented. Also symbolic is the real estate agent’s lock box attached to the right side railing – I have to wonder how enthusiastic any prospective buyer would be to enter the dwelling once they had noticed this quick fix!

This weekend saw me back on the race track in the Corvette, continuing with my driver education and laying down laps. The venue was the Auto Club Speedway in Fontana, California. Only a few weeks before, NASCAR had held its second event of the year on this track, and it is, without doubt, the premier circuit we visit each year. Its high banking, long straights, and a demanding infield road course thrown in for good measure make the track quite challenging. It is a circuit that puts very high demands on any car and it’s not for the faint of heart to roll out of the pits and stand on the gas pedal. Speed rapidly builds on this track and with concrete barriers everywhere it’s not all that forgiving.

Three laps into my first session the automatic gearbox elected to stop shifting. It’s happened a couple of times before and I can usually free it by returning the selector to the full automatic position and forgoing the steering wheel mounted paddle shifters. Not this time – the car remained firmly stuck in third gear. Exactly the same thing happened during the second session and I was very frustrated. Unlike previous occasions, however, the dreaded “check engine” light didn’t come on and there were no codes generated for later analysis.

“Gremlins!” I was told by Dave, my local GM service manager. “We checked the data base and there are no reported symptoms, and without any error codes stored away in the car’s computer, there’s nothing to look at.” Repeating the first observation, Dave then added, “I would suggest you have an intermittent gremlin, and that there’s likely a bug in the transmission software!” The fix for this, and it’s been done twice before, is to flush the memory and reload the base program, so next time the car is back for a service we will have to go through the process one more time. Unfortunately, the temporary fix is to just drive the car in full automatic mode – and for a track-ready Corvette, that’s pretty close to having it hobble around on a pair of wooden crutches.

Software glitches?

When I came to Tandem Cupertino as a Program Manager it was my first time on a major computer vendor’s campus, and the number of development teams working on Tandem hardware and software was a real eye-opener. Coming from much smaller companies with fewer than a hundred employees, seeing literally thousands of developers engaged in implementing everything from operating systems to compilers to database and transaction processing infrastructure software was a little overwhelming at first. Cross-functional core teams kept a semblance of order through all the chaos, and attending beer busts on Friday was always as much about ensuring visibility for your program as it was about the beer and popcorn!

The fault tolerant Tandem was not a product anyone wanted to see crash. A “downed” Tandem was cause for immediate executive concern, and Tandem’s Critical Account team had been established to monitor any customer situation where the Tandem systems were experiencing difficulties. Providing a workaround or a temporary fix wasn’t unheard of and many a time I witnessed the frenetic activity surrounding the quick generation of such a fix. Unlike other systems, Tandem was designed to fail over whenever it suspected any individual processor was experiencing difficulties and, for most situations, this worked very well. In talking with field engineers I often heard how customers had been running with a processor offline, its workload picked up by other processors, without the customer being aware of any problems.

The Quality Assurance (QA) teams within Tandem development were among the most diligent technical staff on campus and they delighted in nothing more than beating the very stuffing out of any newly developed product. They took a lot of pride in ensuring that products that made it out of QA rarely generated failures in the field. But with so many products, Tandem still needed a way to ensure conflicts and incompatibilities didn’t arise among combinations of these products, particularly when they were layered to form complete stacks. Did a Tandem stack of SNAX, with Pathway, TMF and NS SQL, all work as specified, and could EMS events generated out of each layer in the stack provide the necessary insight as to what was happening above and below that layer? Did it all hang together and work in harmony?

To ensure quality across the whole system Tandem invested in building the Gremlin Test Center. Under the management of John Merrick, as I recall, this included a number of Tandem systems configured with different releases of the OS and stacks – some SNAX, others TCP/IP, some WAN-centric, some LAN-centric, and each with different combinations of Enscribe, SQL, etc. The test center represented a sizeable investment for Tandem and it had been provided with a number of working solutions to further ensure the OS, infrastructure, middleware, and all the supporting management tools worked well together. There was little that ever frightened any Program Manager more than being advised that his program was scheduled next for tests on Gremlin!

In today’s heavily consumer-focused technology marketplace this level of testing has proved too expensive to maintain, and younger generations have grown up completely at ease with situations like the dreaded blue screen of death, a la Microsoft. Modern development tools have certainly cut down on the number of deterministic bugs that make it through the development cycle. Test tools have become very sophisticated and as the adoption of industry-standard components increases, so too does the access to multiple test tools.

Forward-thinking software houses these days don’t leave the responsibility of catching every bug with the QA group – just as with Gremlin in the past, I am seeing these companies pass tested solutions, already proclaimed “QA OKed”, to support organizations for a period of intense “thrashing!” Off-the-wall usage scenarios can often uncover some of the hardest-to-find non-deterministic bugs! Maybe not quite as regimented as was the case with testing on Gremlin, but effective all the same. Anything at all that can be done to ensure bugs do not make it into releases and onto customers’ production systems is aggressively pursued, and customers are quick to recognize those companies that take these extra steps.

Even the best software houses will never find every bug in a complex software offering. And customers will never receive the “bug-free” release that they may believe is only one or two releases away! In the late ’90s a customer did go so far as to suggest that they would prefer to wait for the bug-free release, and I was called upon to explain in detail why this was unlikely to ever happen. But today, a decade and a half later, even as software houses have become more proficient in weeding out troublesome bugs before they ever make it into a release, many customers still harbor a dimly-lit hope that bugs are now a thing of the past.

It’s a testament to the effort exerted by the development teams at Tandem that the quality of NonStop products was as high as it was. With thousands of systems deployed, there were only ever a handful that experienced difficulties at any given time – mostly a combination of new solutions, untried interfaces, and variations in regional networking protocols. It was reassuring to know that when such failures occurred Tandem’s Critical Accounts team would get together to find a way to provide a quick fix or workaround that would at least prop up the customer’s system “with crutches” until a permanent solution was ready.

I’m not all that sure that I will be able to shake loose the gremlin in my car’s transmission, or be free from worrying about having just one gear. Thank goodness that, driving an American car, all I have to remember is that there is essentially an , , sequence for restarting my transmission’s processor and that can make me mobile in no time at all! Well, sort of … and, with no disrespect to other car manufacturers, at least the gremlins left my gas pedal and brakes alone!
