International Engineering Consortium
Web ProForums
Service-Level Management

2. Background
Managing communications service delivery is challenging. Typical environments consist of equipment from several vendors, supporting multiple protocols, with multiple services and multiple users. The network and equipment performance is dynamic, the amount of data is large, but the amount of useful information is limited. In this complex environment, it is difficult to detect potential problems before they surface. Once they do surface, it takes time to discover the true root cause, determine the best solution, measure the potential business impact, and take corrective action. The more time it takes, the greater the negative impact on service availability and overall service quality.

While managing service delivery is challenging, the cost of not doing it well is enormous. In today’s world of converging voice and Internet services, which are offered over a combination of wired, wireless, and cable media, the service providers that cannot keep up will be left behind by those that can differentiate their services and get them to market quickly.

The task of building network operations centers (NOCs) and operations support systems (OSSs) that effectively manage the highly complex, distributed, mission-critical networks used to deliver today’s data and voice services is one of the most important issues facing service and equipment providers. Organizations must integrate legacy systems with emerging technologies, manage a host of diverse and often incongruous technologies, and deploy flexible, platform-independent applications, all with minimal staff. And they must do so faster than their competitors to ensure the successful launch of new products and services.

Whether they use Internet or corporate intranet, data transmission, fax, wireless, landline telephone, or cable service, users expect virtually 100-percent availability. To deliver it, communications service providers must be able to anticipate potential network problems before they occur. When interruptions do happen, they must be able to isolate the cause immediately and take appropriate corrective action. Failure to meet these objectives will cost companies revenue from both existing and potential customers.

This increasingly demanding environment has created OSS requirements that far exceed the capabilities of first-generation, off-the-shelf software applications. These applications are not designed to integrate the broad spectrum of new technologies with existing legacy systems, manage the multitude of continuously changing variables in real time, or scale to the ever-increasing traffic volumes generated today.

A typical service provider NOC receives hundreds of events per second from tens of thousands of managed resources. Typical OSS software includes fault-management event browser tools, with exception messages presented to the operators in a standard form and grouped by operator responsibilities (management domains and functional areas). Those messages may be filtered, and simple automations, such as event stream deduplication or managed resource up-down correlation, may be performed.
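
As a rough illustration of the two simple automations named above, the following minimal sketch shows event stream deduplication and up-down correlation. It is not taken from any particular OSS product: the `Event` fields, the `"down"`/`"up"` event kinds, and the five-second suppression window are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    resource: str     # managed resource identifier (hypothetical naming)
    kind: str         # assumed event kinds: "down" or "up"
    timestamp: float  # seconds since an arbitrary start


def deduplicate(events, window=5.0):
    """Drop an event when the same (resource, kind) was seen within `window` seconds."""
    last_seen = {}
    kept = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.resource, ev.kind)
        prev = last_seen.get(key)
        if prev is None or ev.timestamp - prev > window:
            kept.append(ev)
        # Track the most recent sighting so a continuous burst stays suppressed.
        last_seen[key] = ev.timestamp
    return kept


def correlate_up_down(events):
    """Return {resource: time it went down} for resources with no later "up" event."""
    down = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "down":
            down.setdefault(ev.resource, ev.timestamp)
        elif ev.kind == "up":
            down.pop(ev.resource, None)
    return down


raw_events = [
    Event("r1", "down", 0), Event("r1", "down", 1), Event("r1", "down", 2),
    Event("r2", "down", 3), Event("r2", "up", 4), Event("r1", "down", 20),
]
unique_events = deduplicate(raw_events)
still_down = correlate_up_down(unique_events)
```

In this sketch the repeated "down" reports for `r1` collapse into one until the burst ends, and `r2` drops out of the outstanding set because its "up" event clears the earlier "down". Only `r1` would remain on an operator's display.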

However, when complex problems occur (as is the norm), operators and knowledge experts are faced with daunting problem-solving tasks such as the following:

  • correlating multiple events from one or more managed resources across one or more management domains
  • running diagnostic tests
  • calling on the knowledge experts
  • isolating the problem and its root cause(s)
  • eliminating symptomatic errors
  • determining the impact on users and services
  • notifying appropriate management and customer support centers
  • applying corrective actions to resolve each problem
  • taking the failing component(s) out of production for repair
  • submitting trouble tickets

The situation is typically compounded by adding new managed resources to the production environment at a rate of several hundred per week. Coordinating the change/fault management responsibilities and actions is a challenge. An automated method of incorporating new managed resources and removing retired managed resources from the change/fault/problem/asset management-system environment is essential.
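
One simple way to picture such an automated method is a periodic reconciliation between the set of resources the management systems currently track and an authoritative inventory. The sketch below assumes both are available as collections of resource identifiers; the function name and the identifiers are hypothetical.

```python
def reconcile_inventory(managed, inventory):
    """Compare tracked resources with the authoritative inventory.

    Returns (to_add, to_retire): identifiers to start managing and
    identifiers to remove from the management systems.
    """
    managed = set(managed)
    inventory = set(inventory)
    to_add = sorted(inventory - managed)     # new in production, not yet managed
    to_retire = sorted(managed - inventory)  # still managed, no longer in inventory
    return to_add, to_retire


to_add, to_retire = reconcile_inventory(
    managed={"router-1", "switch-7"},
    inventory={"switch-7", "router-9"},
)
```

A real deployment would feed the two lists into the change, fault, problem, and asset management systems. The sketch shows only the comparison step.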

Also, managed resources are periodically removed from production for routine maintenance, upgrades, and problem fixes. The exception reporting, problem resolution, and notification procedures vary according to the state of the managed resource.
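
The state dependence described above can be sketched as a lookup table that maps each resource state to the procedures that apply. The state names, the action names, and the choice to ignore unknown resources are assumptions made for illustration, not a prescribed policy.

```python
from enum import Enum


class ResourceState(Enum):
    IN_PRODUCTION = "in_production"
    MAINTENANCE = "maintenance"
    RETIRED = "retired"


# Hypothetical policy: which procedures run for an exception in each state.
POLICY = {
    ResourceState.IN_PRODUCTION: ["report", "notify", "ticket"],
    ResourceState.MAINTENANCE: ["report"],  # log it, but do not page or open a ticket
    ResourceState.RETIRED: [],              # ignore exceptions from retired resources
}


def actions_for(resource_states, resource):
    """Return the procedures to run for an exception raised by `resource`."""
    state = resource_states.get(resource)
    if state is None:
        return []  # assumption: unknown resources trigger no procedures
    return list(POLICY[state])


states = {
    "router-1": ResourceState.IN_PRODUCTION,
    "switch-7": ResourceState.MAINTENANCE,
}
production_actions = actions_for(states, "router-1")
maintenance_actions = actions_for(states, "switch-7")
```

Keeping the policy in a table rather than in branching logic makes it easier to adjust procedures per state without touching the event-handling code.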

Isolating and resolving these problems is often a manual, labor-intensive, and time-consuming process, even when it takes only minutes. A loss of service availability lasting even seconds can be significant in terms of customer satisfaction and lost revenue.
