Jim Nottingham, director of switching in Northern Maryland for Chesapeake & Potomac Telephone Co., says trying to find the culprit behind Wednesday's massive phone outage is a bit like "caring for a sick child. You want to stay up until it's well, hold its hand, take its temperature, do whatever you have to do to make it better."
By most estimates, that could take a while.
Five days after the worst outage in C&P's history, technicians still don't know what caused the local phone networks in Maryland, Washington, Virginia and West Virginia to overload and shut down.
By week's end, a malfunctioning circuit board in a Baltimore computer was being blamed for triggering the massive outage. But technicians still can't explain why the chain of events that followed left up to 5 million customers across the region without phone service.
Bell Atlantic Corp., the corporate parent of C&P, assembled a task force of more than 200 technicians from across the country to try to figure out an answer to that problem. As of yesterday, there was talk of bolstering those ranks further with technical experts from New Zealand Telecom, the local phone company in New Zealand that Bell Atlantic purchased last year.
Mr. Nottingham, a member of that task force, said he plans to meet today with about a dozen technical experts from across the region to go over reams of computer printouts that detail the events that immediately preceded and followed the breakdown.
According to Mr. Nottingham, printing all the computer records from those few hours took almost three hours Friday.
"I won't know how much paper we have to look at until I get there, but it's a lot," he said.
At today's meeting, scheduled to be held at a C&P site in Hunt Valley, Mr. Nottingham and his colleagues plan to sequester themselves in a conference room and, armed with red pens and plenty of coffee, begin the tedious task of trying to figure out what led to Wednesday's catastrophic breakdown.
That means sifting through thousands of lines of computer messages for any clues that will lead them to the culprit of the breakdown. By isolating possible clues for discussion, the group hopes to piece together, second by second, Wednesday's events.
Computer messages -- the internal communications that computers send each other to confirm that everything is functioning properly -- resemble gibberish to the untrained eye: a series of numbers, punctuation marks and acronyms that sometimes send even veterans like Mr. Nottingham back to the manual for translations.
Not surprisingly, the task of wading through codes can often be slow, tedious and stressful. The difficulty of isolating electronic aberrations on paper is only compounded when the problem, like Wednesday's, is spread among four calling regions, four master computers and hundreds of electronic switches.
"It's been a long and hard process," said Mr. Nottingham yesterday as he headed, yet again, back into his office. "But you have to analyze each, individual thing if you want to get the total picture."
The only problem is, C&P may never get the total picture.
According to Philip Freedenberg, president of Federal Engineering Inc., the sophisticated nature of C&P's network -- comprising master computers that tell electronic switches how to route and sort calls -- makes it extremely difficult to diagnose a problem like the one that occurred Wednesday.
The reason: Technicians basically have to reassemble the set of circumstances that led up to the breakdown to look for clues about what happened, much as an investigator has to reconstruct a crime scene after the fact.
For a network of C&P's size, the circumstances and combinations of circumstances that possibly caused the outage could number in the millions, leading technicians into a protracted guessing game with computer codes, glitches and intangibles -- such as power surges or an unusual set of calling patterns -- that may or may not be easily uncovered.
"I'm not surprised C&P can't pinpoint it," Mr. Freedenberg said. "They may never pinpoint it, and that's the scary part, because there's no guarantee that there's not another bug in it somewhere else that will send it off into never-never land a week or month from now."
According to Kenneth Pitt, a Bell Atlantic spokesman, steps have already been taken to make sure that Wednesday's performance isn't duplicated any time soon. But he acknowledges there's no guaranteeing that outages won't occur in the future, an inherent risk of using state-of-the-art technologies.
"We're going into new areas of technology, and we're apparently learning as we go," he said.