International Engineering Consortium
Web ProForums
Highly Available Embedded Computer Platforms Become Reality

4. High Availability and Restart Models
The complexity of a highly available system also depends on its restart model. The strategies used in the fault management process differ depending on whether the system uses a hot restart, a warm restart, or a cold restart model. Which model is possible depends on how much information is available to the system at the time of a fault event: the more information available, the faster the restart.
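As a rough illustration of how the available information drives the choice, the Python sketch below picks the fastest restart model that two hypothetical flags allow: whether the failing component's state was saved, and whether a standby was designated (and loaded with that state) before the fault. The names `RestartModel` and `select_restart_model` are illustrative assumptions, not part of any particular high-availability framework.

```python
from enum import Enum


class RestartModel(Enum):
    HOT = "hot"    # fastest recovery, most complex to implement
    WARM = "warm"
    COLD = "cold"  # slowest recovery, least complex to implement


def select_restart_model(state_saved: bool, standby_designated: bool) -> RestartModel:
    """Pick the fastest restart model the available information supports.

    state_saved: the failing component's current state was saved (hypothetical flag).
    standby_designated: a standby already holds that state before the fault
    (hypothetical flag).
    """
    if state_saved and standby_designated:
        # Standby is ready with current state: it can take over immediately.
        return RestartModel.HOT
    if state_saved:
        # State exists, but a standby must be chosen and loaded during recovery.
        return RestartModel.WARM
    # No state information: the standby starts from its initialization point.
    return RestartModel.COLD
```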

A hot restart system has the fastest recovery time but is the most complex to implement. In a hot restart model, the application saves state information about the current activity of the system and passes it to the standby component, so the standby is ready to take over quickly. The application must be designed to restart using this state information.
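A minimal sketch of the hot restart idea follows, assuming a single active component paired with one pre-designated standby, with state held as a simple dictionary of hypothetical session records. Each update on the active side is checkpointed to the standby, so on a fault the standby resumes from the last checkpoint rather than from initialization. The classes `ActiveComponent` and `Standby` are hypothetical; a real system would carry checkpoints over a backplane or network link rather than through a method call.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Pre-designated standby that mirrors the active component's state."""
    checkpoint: dict = field(default_factory=dict)

    def receive_checkpoint(self, state: dict) -> None:
        # Keep a private copy so later changes on the active side do not leak in.
        self.checkpoint = copy.deepcopy(state)

    def take_over(self) -> dict:
        # Hot restart: resume from the last checkpoint instead of initializing.
        return copy.deepcopy(self.checkpoint)


@dataclass
class ActiveComponent:
    standby: Standby
    state: dict = field(default_factory=dict)

    def handle(self, session_id: str, status: str) -> None:
        # Update the current activity, then push the new state to the standby.
        self.state[session_id] = status
        self.standby.receive_checkpoint(self.state)
```

For example, after `active.handle("session-1", "connected")`, calling `standby.take_over()` returns `{"session-1": "connected"}` even if the active component has since failed.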

A hot restart system also requires that a standby component be designated before a fault management event occurs (see Figure 2). In clustered (2N) systems this is straightforward, because each component has its own dedicated standby. In N+1 systems, hot restart requires the single standby device to hold the state of multiple active components, so the standby must have the extra capacity to do this; otherwise, a warm restart model must be used.


Figure 2. Hot Standby
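The capacity condition for hot restart in an N+1 system can be written as a simple check. This sketch assumes the standby's capacity can be measured as the number of components whose state it can hold at once (a hypothetical `standby_state_slots` parameter); a 2N cluster is the special case of one active component per standby.

```python
def restart_model_for_n_plus_1(active_count: int, standby_state_slots: int) -> str:
    """Return "hot" if one standby can hold the state of every active component.

    active_count: number of active components protected by the standby (N).
    standby_state_slots: how many components' state the standby can keep
    at once (hypothetical capacity measure).
    """
    if active_count < 1:
        raise ValueError("need at least one active component")
    # Hot restart needs the standby to mirror all N components simultaneously;
    # without that capacity, fall back to a warm restart.
    return "hot" if standby_state_slots >= active_count else "warm"
```

With `active_count=1` (the 2N case) any standby with one slot qualifies, while a standby protecting four components with room for only two falls back to `"warm"`.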

A warm restart is similar to a hot restart. In a warm restart model, the applications save state information about the current activity of the system, but the standby component is not designated until the fault management cycle is in progress. The selected standby is then configured with the necessary application and state information. This adds time to the restart process but can reduce the cost of the standby components (see Figure 3). Warm restart is also easier to implement in systems where the standby devices are not identical to the active devices.


Figure 3. Warm Restart in N+1
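The warm restart sequence can be sketched in the same style. Here the saved state lives in a hypothetical store keyed by node name, and the standby is chosen from a pool of spare nodes only after the fault is detected, then loaded with the failed node's application and state. `Node`, `warm_restart` and the selection rule (first healthy node with no application assigned) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    healthy: bool = True
    application: Optional[str] = None
    state: dict = field(default_factory=dict)


def warm_restart(failed: Node, pool: list[Node], saved_state: dict[str, dict]) -> Node:
    """Designate a standby only after the fault, then load application and state."""
    # Step 1: choose a standby now that the fault management cycle is running
    # (hypothetical rule: first healthy node that is not already in use).
    standby = next((n for n in pool if n.healthy and n.application is None), None)
    if standby is None:
        raise RuntimeError("no spare node available")
    # Step 2: configure it with the failed node's application and saved state.
    standby.application = failed.application
    standby.state = dict(saved_state.get(failed.name, {}))
    return standby
```

Because the spare is only configured at fault time, it does not need to mirror any particular active node in advance, which is why the standbys need not be identical to the active devices.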

A cold restart is the least complex to implement but takes the most time. In a cold restart, the standby element starts from its initialization point. A cold restart is used when no information is available about the state of the failing component, so the only known state is the initial one.

A cold restart system can be implemented with little or no change to the applications of a system. The high-availability-specific software can be confined to the operating system and its services. The price of this simplicity is much longer restart times and the possible loss of current activity in the system.
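Because a cold restart needs no application state, it can be handled entirely outside the application, for example by a supervisor process. The sketch below is a generic supervisor, not a specific operating system service: it reruns an unmodified program each time it exits with a non-zero status, up to a hypothetical `max_restarts` limit. Every restart begins at the program's initialization point, so any work in progress at the time of the failure is lost.

```python
import subprocess
import sys


def supervise(cmd: list[str], max_restarts: int) -> int:
    """Cold-restart a process each time it exits with a failure status.

    The supervised program contains no HA code; each restart begins at its
    initialization point. Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        # Stop on a clean exit or once the (hypothetical) restart budget is spent.
        if result.returncode == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1


if __name__ == "__main__":
    # Demo: a child that always fails is cold-restarted up to 3 times.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=3))
```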

Restart times vary with the implementation of the system and application software. In relative terms, if a hot restart takes 1X, a warm restart can take 2X to 3X and a cold restart approximately 10X to 100X. Restart times are lowest in systems whose system software and applications are designed to support high availability.
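The relative figures above can be turned into concrete estimates once a hot restart time has been measured. In the sketch below the multipliers come from the text; the 0.5 s hot restart time used in the demo is purely hypothetical.

```python
# Relative restart-time multipliers from the text (hot restart = 1X).
RELATIVE_RESTART = {
    "hot": (1, 1),
    "warm": (2, 3),
    "cold": (10, 100),
}


def restart_time_range(model: str, hot_restart_seconds: float) -> tuple[float, float]:
    """Scale the relative multipliers by a measured hot restart time."""
    low, high = RELATIVE_RESTART[model]
    return low * hot_restart_seconds, high * hot_restart_seconds


if __name__ == "__main__":
    # Hypothetical measurement: a hot restart takes 0.5 s.
    for model in ("hot", "warm", "cold"):
        print(model, restart_time_range(model, 0.5))
```

With a hot restart time of 0.5 s, the ranges work out to 1.0 to 1.5 s for a warm restart and 5 to 50 s for a cold restart.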
