New Line Card
Line cards may be inserted into a chassis to replace cards that have failed. Also, as the load on an RNC increases it may be necessary to add new line cards. In addition, hardware evolution means that cards with superior functionality or lower cost will periodically become available, and carriers will want to upgrade their systems. For all these reasons, new line cards are a fact of life for the RNC.
In an autonomous RNC system, a line card detects the chassis controller automatically when it is initialized, and configures itself by downloading a suitable database replica from the controller. The controller then automatically reconfigures the pre-existing line cards to spread the load over the new processing power and bandwidth, self-optimizing the chassis. The controller application does this by writing the desired configuration into its local data store. The autonomic data management system takes care of forwarding this new master data to the line card replicas. Because the controller is also a replica of the EMS database, the changes are automatically forwarded to the EMS as well. The EMS can always override the controller's decision because it is the master for all configuration data.
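The write-then-forward pattern described above can be sketched as follows. This is a minimal illustration, not the actual data management system: the `MasterStore` class, the `spread_load` function, the round-robin policy, and the cell and card names are all assumptions made for the example, and the EMS is left out for brevity.

```python
class MasterStore:
    """Hypothetical master data store that forwards every write to its replicas."""

    def __init__(self):
        self.data = {}
        self.replicas = []

    def attach(self, replica):
        # A newly started replica first downloads the current contents.
        replica.update(self.data)
        self.replicas.append(replica)

    def write(self, key, value):
        # The application only writes locally; forwarding happens here.
        self.data[key] = value
        for replica in self.replicas:
            replica[key] = value


def spread_load(cells, cards):
    """Assign cells to line cards round-robin (an assumed balancing policy)."""
    assignment = {card: [] for card in cards}
    for index, cell in enumerate(cells):
        assignment[cards[index % len(cards)]].append(cell)
    return assignment


store = MasterStore()
card_a, card_b = {}, {}
cells = ["cell1", "cell2", "cell3", "cell4"]

# Initially a single line card A carries all of the load.
store.attach(card_a)
store.write("assignment", spread_load(cells, ["A"]))

# Line card B is inserted: it downloads its replica, then the controller
# writes a rebalanced configuration that reaches both cards.
store.attach(card_b)
store.write("assignment", spread_load(cells, ["A", "B"]))
```

In this sketch the controller application never addresses a line card directly; it only writes to its own store, which mirrors the division of labor described in the text.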
Controller Card Failure and Repair
In a hierarchically managed autonomic system, one of the controller cards is the primary and the other acts as its hot standby. The role of the standby is to maintain its copy of the chassis state and watch for a failure of the primary card.
If the primary card fails, the system heals itself by immediately forcing the standby to take over as a lone primary. Any application using the controller card always connects to both primary and standby. When an application discovers it has lost communication with the primary, it heals itself by switching its attention to its connection with the standby. If it had a transaction in flight against the primary when the failure occurred, it reissues that transaction. Otherwise it simply continues working, knowing that the self-protecting system has ensured that the standby contains every transaction that was committed at the primary.
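The application-side behavior can be illustrated with a small sketch. Everything here is hypothetical: the `ControllerCard` and `FailoverClient` classes, the `CardDown` exception, and the synchronous mirroring of commits to the standby are assumptions chosen to keep the example short, not a description of the real protocol.

```python
class CardDown(Exception):
    """Raised when a controller card cannot be reached."""


class ControllerCard:
    def __init__(self, name, mirror=None):
        self.name = name
        self.alive = True
        self.committed = []
        self.mirror = mirror  # the hot standby, if this card is the primary

    def commit(self, txn):
        if not self.alive:
            raise CardDown(self.name)
        # Assumed behavior: the standby applies a transaction before the
        # primary acknowledges it, so the standby never lags behind.
        if self.mirror is not None and self.mirror.alive:
            self.mirror.commit(txn)
        self.committed.append(txn)


class FailoverClient:
    """Application that holds connections to both primary and standby."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def submit(self, txn):
        try:
            self.primary.commit(txn)
        except CardDown:
            if self.standby is None:
                raise  # no card left to fail over to
            # Switch attention to the standby, now a lone primary,
            # and reissue the transaction that was in flight.
            self.primary, self.standby = self.standby, None
            self.primary.commit(txn)


card_b = ControllerCard("B")
card_a = ControllerCard("A", mirror=card_b)
client = FailoverClient(primary=card_a, standby=card_b)

client.submit("t1")   # committed at A and mirrored to B
card_a.alive = False  # the primary fails
client.submit("t2")   # reissued against B
```

After the failure the client holds a single connection, which matches the "lone primary" state described above until the failed card is repaired.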
When the controller card is repaired, it starts itself as the new standby. If some data survived the failure, the new standby can ask for a refresh of only what it is missing. Otherwise it asks for a new copy of the primary database. Once it is back in sync, it announces itself as the new standby, applications reconnect to it, and the system has healed itself back into a fault-tolerant state.
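The choice between a refresh and a full copy might look like the sketch below. It assumes, purely for illustration, that the database can be represented as an ordered list of committed transactions and that surviving data is usable only if it is a prefix of the primary's history; the `resync` function and its return values are hypothetical.

```python
def resync(primary_log, surviving=None):
    """Bring a repaired card back in sync and report which path was taken.

    primary_log: transactions committed at the primary, in commit order.
    surviving:   transactions that survived on the repaired card, if any.
    """
    if surviving and primary_log[:len(surviving)] == surviving:
        # Refresh: fetch only the transactions committed since the failure.
        missing = primary_log[len(surviving):]
        return "refresh", surviving + missing
    # Nothing usable survived: take a new copy of the primary database.
    return "full_copy", list(primary_log)


primary_log = ["t1", "t2", "t3"]
mode_with_data, log_with_data = resync(primary_log, surviving=["t1"])
mode_without_data, log_without_data = resync(primary_log)
```

Either path ends with the standby holding the same history as the primary, at which point it can announce itself to applications.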
If a standby fails, applications take no action other than dropping their useless connection. When the standby is repaired, applications heal themselves by re-establishing their connection to it, restoring the fault-tolerant state of the system.
This functionality can be used to perform a live upgrade of the controller software, by taking down the standby, upgrading it, and then restoring it. Once it is back in sync, it can be switched to a primary role, allowing the other card to be upgraded in the same way.
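The upgrade sequence can be written out as an ordered procedure. The sketch below only tracks roles and software versions; the `live_upgrade` function, the card names, and the version strings are assumptions for illustration, and the real work of taking a card down and resyncing it is reduced to recorded step descriptions.

```python
def live_upgrade(roles, versions, new_version):
    """Upgrade both controller cards while one always remains in service.

    roles:    dict mapping "primary" and "standby" to card names.
    versions: dict mapping card names to software versions.
    """
    steps = []

    def upgrade_standby():
        card = roles["standby"]
        steps.append(f"take down {card}")
        versions[card] = new_version
        steps.append(f"restore {card} and wait for sync")

    # Upgrade the current standby while the primary keeps serving.
    upgrade_standby()
    # Switch roles so that the upgraded card becomes the primary.
    roles["primary"], roles["standby"] = roles["standby"], roles["primary"]
    steps.append(f"switch {roles['primary']} to primary")
    # The other card is now the standby and is upgraded in the same way.
    upgrade_standby()
    return steps


roles = {"primary": "A", "standby": "B"}
versions = {"A": "1.0", "B": "1.0"}
steps = live_upgrade(roles, versions, "2.0")
```

At no point in this sequence are both cards down, although the system runs without a standby while each card is being upgraded.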


