This topic applies only on the z/OS operating system.

Peer restart and recovery

Every system aims for as little downtime as possible, but some system failures are unavoidable. For example, a system failure might occur because the power unexpectedly goes out in your main system. One action that you can take after a system failure is to restart the server on a peer system in the sysplex. This type of restart uses the peer restart and recovery function. Starting a server on a system for which it was not configured implicitly places the server in peer restart and recovery mode.

Important: WebSphere for z/OS uses the z/OS Resource Recovery Services (RRS) system function to provide the same transactional recovery functionality as is provided by the high availability peer recovery support on other platforms. Therefore, high availability peer recovery support is not available on a z/OS platform.

When a main system failure leaves transactions in the InDoubt state, their outcomes are unknown, and you must determine the intended outcome of each transaction before the affected data can be used again. Peer restart and recovery automates this process by restarting the controller on a peer system so that the outcomes can be determined and the locks that block the data can be dropped. This is in contrast to how a system usually handles a failure, which is to roll back automatically.

If a failure occurs, automatic restart management:

Peer restart and recovery restarts the controller on another system and goes through the transaction restart and recovery process so that outcomes can be assigned to the transactions that were in progress at the time of failure. During this process, data might be temporarily inaccessible until recovery is complete. The restart and recovery process does not result in lost data.

Resource managers (such as DB2) that were being accessed at the time of failure might hold locks that are scoped to a transaction unit of recovery (UR). After an outcome has been assigned to a UR, the resource managers generally drop those locks.
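
The relationship between URs, outcomes, and locks can be illustrated with a small toy model. The following Python sketch is illustrative only: the names ToyResourceManager, UnitOfRecovery, and recover are hypothetical, and nothing here is an RRS, DB2, or WebSphere interface. It assumes that each lock is owned by exactly one UR and that the intended outcomes are already known to the recovery step.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    IN_DOUBT = "in-doubt"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled-back"


@dataclass
class UnitOfRecovery:
    ur_id: str
    outcome: Outcome = Outcome.IN_DOUBT


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a resource manager; not a DB2 or RRS API."""
    name: str
    # Maps a locked resource (for example, a row key) to the ID of the UR holding the lock.
    locks: dict = field(default_factory=dict)

    def lock(self, resource: str, ur: UnitOfRecovery) -> None:
        self.locks[resource] = ur.ur_id

    def is_accessible(self, resource: str) -> bool:
        return resource not in self.locks

    def resolve(self, ur: UnitOfRecovery) -> None:
        """Drop every lock held by a UR once it has a final outcome."""
        if ur.outcome is Outcome.IN_DOUBT:
            return  # no outcome yet, so the locks stay in place
        self.locks = {r: owner for r, owner in self.locks.items() if owner != ur.ur_id}


def recover(urs, decided_outcomes, resource_managers):
    """Assign the known outcome to each in-doubt UR, then let resource managers release locks."""
    for ur in urs:
        if ur.outcome is Outcome.IN_DOUBT and ur.ur_id in decided_outcomes:
            ur.outcome = decided_outcomes[ur.ur_id]
        for rm in resource_managers:
            rm.resolve(ur)


# Demonstration: data stays blocked until the UR receives an outcome.
rm = ToyResourceManager("toy-db")
ur = UnitOfRecovery("UR1")
rm.lock("row-42", ur)
print(rm.is_accessible("row-42"))  # False: the in-doubt UR still holds the lock
recover([ur], {"UR1": Outcome.COMMITTED}, [rm])
print(rm.is_accessible("row-42"))  # True: the lock was dropped after the outcome was assigned
```

In this model, a UR with no recorded outcome keeps its locks, which mirrors why data can remain temporarily inaccessible until the recovery process is complete.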




Subtopics
When might PRR fail to recover servers
Related tasks
Setting up peer restart and recovery
Related information
Repository service custom properties

Last updated: Sep 20, 2010 11:08:29 PM CDT
http://www14.software.ibm.com/webapp/wsbroker/redirect?version=vela&product=was-nd-mp&topic=cprrovr
File name: cprr_ovr.html