Wednesday, November 19, 2008

Losing my connection!

It’s been hard to escape the news of the fires as they rage, wind-driven, down through the canyons surrounding greater Los Angeles. While the fires are now subsiding, the past week or so has seen so much destruction and tragedy that it’s hard to imagine the city soon forgetting this year’s fires. Again, it has been the Santa Ana winds that have contributed to the fires’ spread, and there has been little let-up in their ferocity over the past month. In October, it was the Porter Ranch fire that brought similar tragedy right to our doorstep, and the picture I have included here is from my condo looking east towards California Highway 118 as it climbs out of Simi Valley.

Television networks gave us continuous coverage of the efforts being made by the fire brigade, police, and service providers like telephone and power companies. With mandatory evacuations, the empty streets could be seen, grid-like, alive with activity as crews rushed to hot spots with a freedom of movement rarely seen before on the streets of Los Angeles. As reporters brought us live video feeds from ground level, you could often see fire chiefs alongside fire engines studying topographic maps and looking for the best access routes that would give the firefighters an opportunity to combat the worst outbreaks.

It’s never easy to reflect on developing tragedies of this scope, or to compare the actions of the emergency services teams with anything we experience in our business lives. After watching house after house burning to the ground, the scale of the damage quickly overwhelming the senses, I was reminded of the fire-fights that can break out within the operations center. And the anxieties faced by emergency personnel fighting the flames did remind me of scenes I have witnessed inside data centers as potential business catastrophes were avoided by well-disciplined and smart network managers.

In a blog posting back on November 8, ’07, “The artists among us ...”, I asked the question “is the operator who instinctively knows what actions to take, at precisely the right time, and pursuing a sequence of commands many of us struggle to comprehend, any less an artist than the conductor of our best symphonies?” I then went on to observe “as I have looked at some of today’s data center schematics describing in minute detail the complexities of the interfaces between servers, storage, and communications paths – I can’t imagine how much time would be involved if ever we had to pore over them to figure out what we had to do next to fix a problem.”

Of all the technicians monitoring today’s IT operations, it is the network manager who faces the biggest, and often the most visible, challenges. It only has to happen to the CEO once, and everyone in the company soon knows that the company’s most senior executive couldn’t get access to information from his laptop or PDA at a crucial time – and, with all the investments made in IT, why couldn’t he stay connected?

Integrated within IT’s operations centers, the network center can be just a few screens managing a small number of phone lines, or a highly complex control room dedicated to supporting a global enterprise. The picture below is of AT&T’s network operations center on the US east coast, which is probably among the larger facilities managing a network, although I do recall seeing pictures of the EDS facility in Plano that was of similar size. Extremely costly to set up and operate, and just as likely to be in Bangalore as anywhere else these days, such centers are nevertheless essential and integral to supporting today’s mission-critical applications.

In a recent conversation with Peter Shell, a former OzTUG President who has a long association with networks and network management, I asked him what he saw as the role of network managers today. “The network world is changing - look at where Cisco is going! With TCP being the ubiquitous protocol to deliver all kinds of services such as voice (VoIP), video-on-demand, messaging, etc., the communications ‘pipes’ keep getting bigger and the network becomes more complex. Quality of Service (QoS) is important, as are redundancy, backup paths, and negotiating with multiple carriers.”

“Can you afford to lose your network connection now?” Peter asked, and then explained how “the role of network managers has changed to where they are looking after a number of networking specialists - routers, switches, firewalls, security appliances, etc. Where it used to be a series of protocols that needed to be supported - SNA, TCP/IP, X.25, NetWare, XNS, AppleTalk, etc. - these days you really only need to worry about TCP/IP. But above IP are the many routing protocols, IGRP, OSPF, RIP, etc., that are important to the data center’s operations. More recently, there are the application-specific protocols such as SIP, VoIP, etc. that need to be managed. And just to add to the confusion, there is QoS and SSL/IPsec. In former times, you could partition the network by protocol relatively easily but today, with it all being IP, it becomes more complex.”

The critical nature of the work performed by network managers often requires them to step in at times of disaster – hurricanes, earthquakes, tornados, fires, etc. And the measures some IT groups take to make sure they can continue providing the service demanded of them, including redundant connections to phone networks, power grids, and even water supplies, further add to the cost and the complexity. As I tour these facilities I like to check out the back-up power generation capabilities, and the picture here is of two 750 kilowatt Caterpillar generators that are typical of what many have built into these facilities.

Watching the firefighters battle the outbreaks around Los Angeles these past weeks, it looked as though little has changed over the years. There are now more planes and helicopters dropping fire-retardant materials to douse the flames at some of the worst hot spots, but it still comes down to the skills and experience of the firefighters face to face with the flames. Each fire had to be battled separately, and extinguished, in order to get control over the “front” fanned by the Santa Ana winds. And so it is too that the actions of experienced network managers, using all the tools available to them, keep the applications we depend upon from crashing down on top of us.

When it comes to NonStop, I have held the belief for many years that the platform is among the best on which to run key network management monitoring and control applications. For years, the cost to do so has been prohibitive when compared to off-the-shelf “wintel server” platforms, but with the arrival of blades and the BladeSystem, could we see this all changing?

So much of what NonStop provides could even help keep the staffing levels manageable – a key component of any network center. Just as we purchase power from more than one grid and still install dual power generation capacity, and just as we lease redundant communications pipes and still replicate to a second DR site – surely the inherent architecture of NonStop has a role to play in the oversight of these operations centers?

The platform continues to become more open, and the ability to port applications has been greatly simplified – I wonder how long it will be before an innovative vendor realizes that significant product differentiation could be achieved by providing support for NonStop? When I look back at AT&T’s network operations center, I could easily see a NonStop configuration supporting it all. And so many automated routines could be reliably launched from the NonStop platform to simplify any recovery steps.

In Martin Fink’s blog posting of November 4th, ’08, “The Unix Paradox – Innovation”, he makes an observation as applicable to NonStop as to Unix, that “labor costs are one of the biggest expenses in the data center. As a result, vendors truly need to focus and innovate around automating as much as possible and that which isn’t automated needs to be made simple.” And in pursuing this, Martin suggests “automate the system (redeploy operations staff to higher payback projects that are no longer required to maintain the environment); reduce the amount of planned downtime; and make it simpler/easier to deploy and manage!” Automate, zero downtime, and simple – surely, a NonStop Martin!

It’s very easy to make comparisons between firefighters battling the real fires and the actions of our network managers in times of crisis – but the tragedy we watched unfold as families lost their homes is heart-wrenching, with implications way beyond what IT faces on a routine basis. And it’s something that doesn’t warrant any trivialization at times like this.

But it did remind me of the many IT fires I was involved with early on in my career, and it does reinforce the value-add we have today with NonStop. Will we ever see NonStop in the heart of tomorrow’s IT operations centers? I wouldn’t rule it out, nor would I too quickly dismiss vendors giving it a second look with the new blades packages.

Vendors will continue to innovate and for some of them – battling the current mix of offerings from IBM, HP, and CA – this could be a real opportunity to break from the pack! Furthermore, network management can be considered as nothing more than infrastructure “application-ized”, and infrastructure offerings on NonStop are bountiful. And perhaps NonStop could better equip tomorrow’s network managers with a platform that will make sure their CEO never again complains about losing his connection! A well-connected CEO… hmm… that’s scary indeed!
