ptx/CLUSTERS V2.2.2 Release Notes


Introduction

These release notes support ptx®/CLUSTERS V2.2.2. Read this document before you install or run this release of ptx/CLUSTERS.


Product Compatibility

This version of ptx/CLUSTERS can be used with the following products:

For product compatibility information on products such as ptx/LAN and ptx/TCP/IP, consult the DYNIX/ptx and Layered Products Software Installation Release Notes.


Supported Cluster Configurations

This release of ptx/CLUSTERS supports the following configurations:

This release of ptx/CLUSTERS does not support "mixed" cluster configurations of Symmetry 5000 and NUMA systems. This release also does not support the use of Fibre Channel Arbitrated Loop on NUMA 2000 systems.


Installing ptx/CLUSTERS

For instructions on how to update ptx/CLUSTERS, DYNIX/ptx, and other IBM NUMA-Q products running on Symmetry systems, see the DYNIX/ptx and Layered Products Software Installation Release Notes.

IBM NUMA-Q authorizes only IBM NUMA-Q personnel to perform initial cluster installations and to upgrade NUMA-Q 2000 clusters. IBM NUMA-Q Customer Support or Professional Services personnel who install new clusters or update NUMA-Q 2000 clusters should follow the installation and configuration procedures in the ptx/CLUSTERS V2.x Installer's Guide and in the DYNIX/ptx and Layered Products Software Installation Release Notes.


ptx/SVM Limitations in 3- and 4-Node Clusters

ptx/SVM V2.2.x cannot be used to manage shared storage on clusters containing more than 2 nodes. ptx/SVM can be used for mirroring root and primary swap on local disks on the nodes of 3- and 4-node clusters. See the ptx/SVM V2.2.2 Release Notes for more information about the limitations of ptx/SVM in 3- and 4-node clusters.


Changing Cluster Node ID

Normally, when changing a node ID in a cluster, you need to reboot only the node whose ID you are changing. However, because of a software defect (problem report 235185), after changing the ID of one or more nodes, you must reboot all nodes in the cluster.

To change the node ID, follow these steps:

  1. Issue the clustadm -P nodeid=value command, where value is the new node ID (an integer between 0 and 7, inclusive). Issue this command on each node whose ID you wish to change.

  2. Shut down all cluster nodes. The recommended procedure is to first bring all the nodes to run-level 1, and then bring them to the firmware level.

  3. Start the cluster nodes back up.

Failure to follow this procedure can cause the same node to appear multiple times in clustadm output and may cause the Lock Manager to hang.
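
The following command sequence is a minimal sketch of this procedure. The node ID value 5 is an example only, and the shutdown invocation assumes standard System V syntax rather than anything documented in these notes, so verify the exact shutdown and boot procedure for your site before using it.

    # Step 1: on each node whose ID is changing (valid IDs are 0-7;
    # "5" is an example value):
    clustadm -P nodeid=5

    # Steps 2 and 3: on every node in the cluster, bring the node to
    # run-level 1, then to the firmware level, and then boot it back up.
    # The command below assumes System V shutdown syntax; the
    # firmware-level and boot steps are site-specific and not shown.
    shutdown -y -g0 -i1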


Using Disks Containing VTOCs with ptx/SVM

In ptx/CLUSTERS V1.x, if you built a VTOC on a shared device from one of the nodes, the disk's slices were then available on all of the cluster nodes. In ptx/CLUSTERS V2.x, if you build a VTOC on a shared device from only one of the nodes, the remaining nodes will not be aware of the VTOC's slices.

If you wish to place a disk containing a VTOC under ptx/SVM control and use the disk in a cluster, you must ensure that each member node's /etc/devtab file contains a VTOC entry for that disk. Then issue the devbuild command so that the virtual devices included in that VTOC are created on all nodes in the cluster.
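
As an illustration only, the fragment below assumes a shared disk named sd12 (a hypothetical name) and assumes that devbuild accepts the device name as an argument; confirm the /etc/devtab entry format and the devbuild syntax in the devtab and devbuild reference pages on your system.

    # Run on each node in the cluster. First confirm that /etc/devtab
    # contains a VTOC entry for the shared disk (sd12 is hypothetical):
    grep sd12 /etc/devtab

    # Then build the virtual devices described by that VTOC on this node
    # (argument syntax assumed; see the devbuild reference page):
    devbuild sd12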

If you build the VTOC for a disk on one node (where the disk will be recognized as a "sliced" disk), but not on the other node(s) (where the disk will be recognized as a "simple" disk), then the ptx/SVM shared disk groups will not match across the cluster and you will not be able to use them.


Forced Shutdown of a Cluster Node

There are several situations in which the Integrity Manager must reboot a cluster member node because the node is no longer able to safely resume access to shared storage. In these situations, the Integrity Manager invokes the kernel panic mechanism to prevent any further user-level activity that might require access to shared storage and to return the node to cluster membership as quickly as possible. The panic messages used, and their causes, are the following:


Guidelines for Removing a Node from a Cluster

To remove a node from a cluster, follow these steps:

  1. Shut down the node you wish to disconnect from the cluster and power it off.

  2. Disconnect all shared storage from the node to be removed from the cluster.

  3. Disconnect the node from the CCI networks.

  4. Boot the node you wish to remove from the cluster and go to single-user mode, either by setting the bootflags or by entering s at the "Waiting for cluster membership, enter 's' to go to single-user mode" prompt.

  5. Through ptx/ADMIN, deinstall the ptx/CLUSTERS software. For information on how to deinstall software, see the DYNIX/ptx and Layered Products Software Installation Release Notes.


    ATTENTION

    To avoid destroying or corrupting data, do not remove the ptx/CLUSTERS software before detaching the node from all shared storage.


  6. On the remaining nodes, reset the number of expected votes to equal the number of remaining nodes plus one vote for the quorum disk, if one is configured.
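
For example, if a four-node cluster with a quorum disk loses one node, the new expected-vote count is the three remaining nodes plus one quorum disk vote, or four. The command below is only a sketch: the property name expected_votes is an assumption, not something documented in these notes, so confirm the correct clustadm syntax before using it.

    # Run on each remaining node. Expected votes = 3 remaining nodes
    # + 1 quorum disk vote = 4. The property name "expected_votes" is
    # assumed; check the clustadm reference page for the actual syntax.
    clustadm -P expected_votes=4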


Product Documentation

The following documentation is available on the online documentation CD or at http://webdocs.numaq.ibm.com/:


Problem Reports

This section lists the following problem report summaries:

The numbers in parentheses identify the problems in the problem-tracking system.


Fixed Problems in ptx/CLUSTERS V2.2.2

This release of ptx/CLUSTERS contains fixes for the following software defects:


Open Problems in ptx/CLUSTERS V2.2.2

This section lists open problems in this release of ptx/CLUSTERS.


devctl Complains of No Quorum When Cluster Has Quorum (255394)

Under certain conditions, devctl returns the following error message even though the cluster has quorum:

Transactions require a quorum of database replicas.

This message appears under the following conditions:

Workaround. This error arises because the rc2 script S01deactivate was not run to complete the naming database synchronization across cluster nodes. To work around this problem:


Node Index Change Causes IMD to List Same IP Address Twice (235185)

When a node's index is changed on one cluster node and that node is restarted, the unchanged nodes list two entries for the node whose index changed: one for the original node index and one for the new node index.

Workaround. See the section in these release notes entitled "Changing Cluster Node ID" for information on how to change a node's index.


Installation of ptx/CLUSTERS Removes ptx/CTC Menu Entries (237168)

ptx/CTC menus in ptx/ADMIN are removed if an updated version of ptx/CLUSTERS is installed and ptx/CTC is not reinstalled.

Workaround. Always install ptx/CLUSTERS and ptx/CTC together. If you have already installed ptx/CLUSTERS, install ptx/CTC from the CD-ROM so that the menus will reappear.


ptx/CLUSTERS Does Not Recognize CCI Name Changes Done Through devctl (241128)

When you use devctl to change the name of a CCI device, ptx/CLUSTERS does not recognize that the name has changed.

Workaround. Use clustadm to deconfigure and reconfigure the CCI device.


clustadm -C and clustadm -D Hang When Node Has Lost Quorum (241693)

The clustadm -C (configure quorum disk) and clustadm -D (deconfigure quorum disk) commands hang when the node has lost quorum. The commands cannot be suspended or interrupted.

Workaround. Boot another node to restore quorum, or reboot the affected node. Make quorum disk configuration changes only while the node has quorum.


devdestroy Fails for Device That Hosted the Quorum Disk (243003)

When a quorum disk is configured, the VTOC, if it is not already in place, is built for the device on remote nodes by the kernel. However, this does not update the user-level list of built devices.

Workaround. Execute the devbuild command on the node where the devdestroy command is failing; doing so updates the list of built devices. Then reissue the devdestroy command.
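
A minimal sketch of this workaround, assuming the affected device is named sd7 (a hypothetical name) and that both commands take the device name as an argument; confirm the syntax in the devbuild and devdestroy reference pages.

    # On the node where devdestroy is failing (sd7 is hypothetical):
    devbuild sd7       # refreshes the user-level list of built devices
    devdestroy sd7     # should now succeed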


A Node May Be Down or Powered Off and Other Nodes May Report It Is Still a Cluster Member (246261)

When a cluster member node is shut down, the other member nodes continue to report that it is a member until VSYNC has completed its membership view-change protocol. This protocol includes a delay known as "I/O drain time," during which the view change waits for any I/O requests initiated before the node was shut down either to complete or to fail. If this I/O drain time is long enough, a node can be completely shut down, and even powered off, while clustadm on the other nodes continues to report that the node is still a cluster member. During this delay, clustadm reports that all links to the shut-down node are DOWN; this indicates that the view-change protocol is underway and should complete shortly.

Workaround. This is a transient problem that corrects itself after the I/O drain time has passed.


ptx/SDI Must Be Installed on All Nodes or No Nodes (250365)

If one cluster node is running ptx/SDI and has a ptx/SDI device, then another node that attempts to join the cluster must also have ptx/SDI installed. Likewise, if an existing cluster node does not have ptx/SDI installed, then another node that attempts to join the cluster must also not have ptx/SDI installed. Otherwise, when the nodes attempt to synchronize their naming databases, the following error will occur:

devctl: Internal error 3 during NDB merge operation: Invalid argument
devctl: unable to synchronize NDB: Invalid argument

Workaround. Ensure that ptx/SDI is either installed on all cluster nodes or on none.


devctl Commands May Not Be Propagated to All Cluster Nodes (251549)

When the devctl command is used to deconfigure or configure devices, the results may not be propagated to all nodes in the cluster if the devctl command is executed just before a cluster membership transition begins.

Workaround. Avoid configuring or deconfiguring devices while the cluster membership is transitioning. Use the /sbin/ndbcompall command to verify that the devctl change was propagated to all nodes. If it was not, shut down and reboot the nodes to resynchronize their naming databases.
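
A hedged example of that check, assuming ndbcompall runs with no arguments and reports any nodes whose naming databases differ:

    # After a devctl configuration change, compare the naming databases
    # across the cluster nodes. If the report shows a mismatch, shut down
    # and reboot the affected nodes to resynchronize them.
    /sbin/ndbcompall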