These release notes support ptx®/CLUSTERS V2.3.1. Read this document before you install or run this release of ptx/CLUSTERS.
This version of ptx/CLUSTERS can be used with the following products:
For product compatibility information on products such as ptx/LAN and ptx/TCP/IP, consult the DYNIX/ptx and Layered Products Software Installation Release Notes.
This release of ptx/CLUSTERS supports 2-, 3-, and 4-node NUMA system clusters with Direct-Connect interconnects.
This release of ptx/CLUSTERS does not support the use of Fibre Channel Arbitrated Loop on NUMA systems.
This release of ptx/CLUSTERS includes the following:
This release of ptx/CLUSTERS supports clusterwide execution of the devbuild and devdestroy commands for building and destroying devices in DYNIX/ptx. In previous releases of ptx/CLUSTERS, every devbuild or devdestroy operation had to be executed on every cluster node. Now, a devbuild or devdestroy operation on a shared device needs to be executed on only one node; the operation takes effect on the other active nodes of the cluster.
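For example, after attaching a new shared disk, you could run devbuild from any one active node (the argument shown is only a placeholder; supply the device arguments your configuration requires, as described in the devbuild reference documentation):

    # devbuild <new-shared-device>

The device is then also built on the other active cluster nodes, so you do not need to repeat the command on them.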
In DYNIX/ptx V4.6.x, the root and primary swap partitions can be located on shareable disks attached to the Fibre Channel. The system must be booted on V4.6.x before you install the root filesystem on a shareable disk or move it to one. To support placing the root and primary swap partitions on a Fibre Channel device, disk labels and a Master ID are now required.
Disk labels are now used to specify whether a disk is owned by a node or by a cluster. The operating system uses these labels to control access to shareable disks (that is, disks that can potentially be accessed from multiple nodes in a cluster). In DYNIX/ptx V4.6.x, a disk without a disk label cannot be opened.
The disks containing the root and primary swap partitions must be node-owned. These disks are labeled automatically when you install DYNIX/ptx V4.6.x.
A new command, diskown, is available for assigning labels to disks. This command can also be used to modify or delete a label, or to display disk label information.
You can also specify disk label information in the /etc/devlabel file. This file is read when the system is booted to multiuser mode and the appropriate labels are assigned to the disks specified in the file. We recommend that you create this file during the V4.6 installation so that all disks will be labeled appropriately.
In clustered systems, information in the /etc/devlabel file is used to write labels when you run the /etc/rolling_upgrade_complete script on each cluster node. For more information about assigning disk labels, installing DYNIX/ptx V4.6.x and ptx/CLUSTERS V2.3.x, and performing a rolling upgrade, see the DYNIX/ptx V4.6.1 and Layered Products Software Installation Release Notes and the Clusters OS and Console Rolling Upgrade to DYNIX/ptx V4.6.1 Release Notes.
A Master ID is now required. On a single-node system, this ID can be the same as the Node ID. Each node on a clustered system must have the same Master ID, which can be the Node ID of one of the nodes in the cluster. The Master ID should be set from the VCS console after downloading the console software. (If you do not set the Master ID, the system will stop at the stand-alone kernel when it is booted.) Instructions for setting the Master ID through the VCS console are available in the DYNIX/ptx V4.6.1 and Layered Products Software Installation Release Notes.
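For example, in a three-node cluster whose nodes have Node IDs 0, 1, and 2, you could choose 0 as the Master ID; each of the three nodes would then be given Master ID 0. On a single-node system with Node ID 0, the Master ID can also simply be 0.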
ptx/SVM can now be used to manage shared storage on clusters containing more than 2 nodes. If you previously disabled ptx/SVM sharing on the third or fourth nodes in a cluster, you can enable sharing on them if you wish to use ptx/SVM to manage shared storage. See the section entitled "Re-enable ptx/SVM Sharing on Cluster Nodes" later in these release notes for more information.
IBM authorizes only IBM personnel to perform initial cluster installations and to upgrade NUMA system clusters. IBM Customer Support or Professional Services personnel who install new clusters or update NUMA system clusters should follow the procedures in the ptx/CLUSTERS V2.x Installer's Guide and in the DYNIX/ptx V4.6.1 and Layered Products Software Installation Release Notes for installation and configuration.
In previous releases of ptx/CLUSTERS and ptx/SVM, ptx/SVM could not be used to manage shared storage in 3- and 4-node clusters. If you have a 3- or 4-node cluster and wish to enable ptx/SVM sharing on those nodes now that it is available, you must shut down all cluster nodes at the same time, install DYNIX/ptx V4.6, ptx/CLUSTERS V2.3, ptx/SVM V2.3, and other layered software on each node, and then reboot each node. If you upgrade and reboot only one node at a time and had previously disabled ptx/SVM sharing on the pre-DYNIX/ptx V4.6 nodes, the DYNIX/ptx V4.6 and pre-DYNIX/ptx V4.6 nodes will not match and the cluster will not form.
Normally, when changing a node ID in a cluster, you need to reboot only the node whose ID you are changing. However, because of a defect in the software (problem report 235185), after changing the ID of one or more nodes, you must reboot all nodes.
To change the node ID, follow these steps:
Issue the clustadm -P nodeid=value command, where value is the new node ID (an integer between 0 and 7, inclusive). Issue this command on each node whose ID you wish to change. (A command sketch follows this procedure.)
Shut down all cluster nodes. The recommended procedure is to first bring all the nodes to run-level 1, and then bring them to the firmware level.
Start the cluster nodes back up.
Failure to follow this procedure can cause the same node to appear multiple times in clustadm output and may cause the Lock Manager to hang.
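As an illustration of this procedure, suppose you are changing one node's ID to 3 (the value 3 is only an example, and the shutdown commands shown follow standard System V conventions; substitute your site's usual shutdown procedure):

    # clustadm -P nodeid=3      (on the node whose ID is changing)
    # init 1                    (on every cluster node, to reach run-level 1)

Then bring every node down to the firmware level and boot all of the nodes again.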
DYNIX/ptx V4.6.x and later no longer use the /etc/devtab file to store information that devbuild uses to build virtual devices on disks. Instead, the information is stored in the device naming database; the CMPT virtual device is obsolete. In ptx/CLUSTERS V2.3.x and later and ptx/SVM V2.3.x and later, you need only build the VTOC driver for a disk on one node; the naming database will propagate the information to the remaining nodes and the ptx/SVM shared disk groups will match across the cluster.
There are several situations in which it is necessary for the Integrity Manager to reboot a cluster member node. In these situations, the node has become unable to safely resume access to shared storage. The Integrity Manager invokes the kernel panic mechanism to prevent any further user-level activity that might require access to shared storage and to bring the node most rapidly back into cluster membership. The panic messages used, and their causes, are the following:
Taking this node out of the cluster, as some critical transition script has failed
One example of a transition-script failure that warrants a system shutdown is when the lmrecovery script fails. If lmrecovery fails, it could mean that the Lock Manager is disabled on all nodes of the cluster until the problem is fixed. When lmrecovery terminates abnormally on a node, that node is shut down and will normally reboot in order to restore the normal operation of the cluster.
Lost the qdisk to a partitioned node %d
This message indicates that a cluster with a quorum disk had CCI communication problems. The node that shut down lost connectivity with the other node(s) and, when it read the quorum disk, found that it had been removed from the set of active member nodes.
Normally, a node that loses CCI communications enters a NO QUORUM state. While in this state, the node continues to monitor cluster and quorum disk states. If other nodes form a cluster without the disconnected node, quorum disk data will reflect the new cluster membership. The disconnected node's internal state (for example, Lock Manager locks) is now invalid and the node will reboot itself. You must then address the communication problem(s) and reboot the node in order for it to again become an active member of the cluster.
Forcing a system panic - This node out of sync with the rest of the cluster
This panic message means the same as the previous panic message, except that the quorum disk is not involved. The node that shut down discovered through CCI communication that the other node(s) had formed a new cluster without it. Because its state is now invalid, the node shut down.
To remove a node from a cluster, follow these steps:
Shut down the node you wish to disconnect from the cluster and power it off.
Disconnect all shared storage from the node to be removed from the cluster.
Disconnect the node from the CCI networks.
Boot the node you wish to remove from the cluster. Go to single-user mode, either by setting the bootflags or by entering s at the prompt "Waiting for cluster membership, enter 's' to go to single-user mode".
Through ptx/ADMIN, deinstall the ptx/CLUSTERS software. For information on how to deinstall software, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
ATTENTION To avoid destroying or corrupting data, do not remove the ptx/CLUSTERS software before detaching the node from all shared storage.
On the remaining nodes, reset the number of expected votes to equal the number of remaining nodes plus the quorum disk, if one is configured.
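For example, if a four-node cluster with a quorum disk loses one node, reset the expected votes on the remaining nodes to 4 (3 remaining nodes plus 1 for the quorum disk). Expected votes are normally adjusted through clustadm cluster parameters; see the ptx/CLUSTERS Administration Guide for the exact procedure.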
The following documentation is available on the online documentation CD and at http://webdocs.numaq.ibm.com/:
ptx/CLUSTERS Administration Guide
ptx/CLUSTERS Installer's Guide (IBM Customer Support group only)
This section lists problem report summaries for problems fixed in ptx/CLUSTERS V2.3.1, problems fixed in ptx/CLUSTERS V2.3.0, and open problems in this release.
The numbers in parentheses identify the problems in the problem-tracking system.
The following problems have been fixed in ptx/CLUSTERS V2.3.1.
(243550) Memory was wasted when an event cell, allocated within alloc_retrans_hdr, was not freed when a call to CLUST_DUPMSG failed.
(247061) A panic occurred in res_freqenable() because the sleep on the semaphore was not enabled on the arrival of the interrupt.
(253730) devctl hung in a three-node cluster.
(253807) The header comments for gms_join and gms_join_async were incorrect.
(254027) The -R option of edc was not used for attaching/detaching menus, and now it is.
(254030) ptx/CLUSTERS hung during boot when all four nodes were rebooted at once.
(254109) The naming database generation number was out of sync on the quorum disk.
(254125) The system panicked when cl_data_size did not match the actual size of the message.
(254141) It was possible for the quorum disk to recognize a different set of shared generation numbers than any of the other cluster nodes, causing the devctl -A command to hang.
(254192) The vsync node block information was not updated when evotes were changed.
(254644) An unpartitioned disk used as a quorum disk was unavailable after reboot.
The following problems were fixed in ptx/CLUSTERS V2.3.0.
(249704) Conflict messages appeared when ptx/CLUSTERS was deinstalled through ptx/INSTALL.
(251549) devctl commands were not propagated to all cluster nodes.
(251488) When a system panicked with the message "Lost the qdisk to a partitioned node 0," the review of system logs on the surviving or panicked node gave no indication why one node declared the other dead and took ownership of the quorum disk.
(251093) A misleading message appeared when the user attempted to deconfigure a non-existent quorum disk.
(252107) A scratch installation of ptx/CLUSTERS using the CD-ROM failed to ask for the cluster parameter.
(251738) The Clusters Replica Manager and the Global Messaging System output too many messages that were not useful.
(251707) Retransmitted messages were appearing hundreds of times a day in backup windows during smash and silver operations.
This section lists open problems in this release of ptx/CLUSTERS. The numbers in parentheses identify the problems in the IBM problem-tracking system.
ptx/CTC menus in ptx/ADMIN are removed if an updated version of ptx/CLUSTERS is installed and ptx/CTC is not reinstalled.
Workaround. Always install ptx/CLUSTERS and ptx/CTC together. If you have already installed ptx/CLUSTERS, install ptx/CTC from the CD-ROM so that the menus will reappear.
The clustadm -C (configure quorum disk) and clustadm -D (deconfigure quorum disk) commands will hang when the node has lost quorum. The commands cannot be suspended or interrupted.
Workaround. Boot another node to restore quorum, or reboot the node. Make quorum disk configuration changes only while the node has quorum.
When a quorum disk is configured, the VTOC, if it is not already in place, is built for the device on remote nodes from the kernel. However, this does not update the list of built devices at the user level.
Workaround. Execute the devbuild command on the node where the devdestroy command is failing. Doing so updates the list of built devices. Then execute the devdestroy command.
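The following sketch shows the sequence (the device argument is a placeholder; use whatever arguments the affected device requires):

    # devbuild <device>        (on the node where devdestroy failed; updates the built-device list)
    # devdestroy <device>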
When a cluster member node is shut down, the other member nodes continue to report that it is a member until VSYNC has completed its membership view change protocol. This protocol includes a delay known as "I/O drain time," during which the view change waits for any I/O requests initiated before the node was shut down to either complete or fail. Thus, if the I/O drain time is long enough, it is possible for a node to be completely shut down, and even powered off, while clustadm on other nodes continues to report that the node is still a cluster member. During this delay, clustadm reports that all links to the shut-down node are DOWN. This is the indication that the protocol is underway and should complete shortly.
Workaround. This is a transient problem which will correct itself after the I/O drain time delay has passed.
If one cluster node is running ptx/SDI and has a ptx/SDI device, then another node that attempts to join the cluster must also have ptx/SDI installed. Likewise, if an existing cluster node does not have ptx/SDI installed, then another node that attempts to join the cluster must also not have ptx/SDI installed. Otherwise, when the nodes attempt to synchronize their naming databases, the following error will occur:
devctl: Internal error 3 during NDB merge operation: Invalid argument
devctl: unable to synchronize NDB: Invalid argument
Workaround. Ensure that ptx/SDI is either installed on all cluster nodes or on none.
When a cluster with a quorum disk is booted with the -I -n <ndb.new> option, the clustadm command displays the quorum disk path (for example, /dev/rdsk/clust_qdisk) in place of the quorum disk name.
Workaround. Take all nodes to multiuser mode and ensure that device naming has completed on all nodes in the cluster. Then deconfigure the quorum disk (using clustadm -D) and reconfigure it (using clustadm -C <qdisk>).
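For example, if the quorum disk device is sd10 (a placeholder name used here only for illustration):

    # clustadm -D
    # clustadm -C sd10

Run these commands only after all nodes have reached multiuser mode and device naming has completed.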