These release notes support ptx®/CLUSTERS V2.2.1. Read this document before you install or run this release of ptx/CLUSTERS.
This version of ptx/CLUSTERS can be used with the following products:
IBM® NUMA-Q® 2000 systems
IBM Symmetry® 5000 systems
DYNIX/ptx® V4.5.1 or later
ptx/SVM V2.2.1 or later
For compatibility information on products such as ptx/LAN and ptx/TCP/IP, consult the DYNIX/ptx and Layered Products Software Installation Release Notes.
If you wish to use Oracle® Parallel Server, we recommend Oracle V7.3.4 or higher.
This release of ptx/CLUSTERS supports the following configurations:
2-, 3-, and 4-node NUMA-Q clusters (with Fibre Channel interconnection).
2-, 3-, and 4-node "mixed" NUMA-Q 1000/NUMA-Q 2000 clusters (with Fibre Channel interconnection).
2-node Symmetry 5000 clusters (with standard QCIC SCSI-direct connection).
This release of ptx/CLUSTERS does not support "mixed" cluster configurations of Symmetry 5000 and NUMA-Q systems. This release also does not support the use of Fibre Channel Arbitrated Loop on NUMA-Q 2000 systems.
This release of ptx/CLUSTERS contains fixes for the following software defects:
(250175) The cluster time-synchronization check generated unnecessary log error messages.
(249623) Users in lock manager domains were required to exist in the /etc/passwd file.
(249513) Exemption from I/O shutdown was not honored for the quorum disk.
(248893) Output from the clustadm command needed to be more useful. Changes were made so that the clustadm -m output is equivalent to clustadm -m all, and the default (no option) output is equivalent to clustadm -cvm all.
(248891) The lmdomain command needed a usable default output. The lmdomain command now has default output that matches the lmdomain -l output.
(248894) Changes were made to the res command provided by the ptx/CLUSTERS crash library. "sqntkern" is now the default domain instead of "oracle". Also, when a domain name is specified with the -d option, the default domain is set to the given domain name; subsequent res commands issued without the -d option will dump the Lock Manager resource information from that domain.
(248342) The update mechanism for the clustcfg file was made more robust.
(246976) clustadm incorrectly reported that a node was not a cluster member when an incorrect argument (for example, "alll") was given to it.
(245042) The qdisk daemon logged -1 instead of errno following a read failure.
(244827) A confusing message was entered into the ktlog when clustadm -m node was used.
(231434) Some clustadm error messages were not useful. For example, if deconfiguration of a quorum disk failed, the error message failed to say why.
(231250) The console message that appeared after CCI failure was not descriptive.
The following sections describe changes to the ptx/CLUSTERS software that were introduced with the ptx/CLUSTERS V2.2.0 release.
The ptx/CLUSTERS V2.2.0 release contained fixes for the following software defects:
(249608) A MUTS message processing problem caused a cluster to hang.
(248981) It was possible for the imd to fail without being detected by the kernel components. This caused the cluster to hang, because the other cluster nodes sent messages to the imd and waited for it to respond. Since the imd was no longer present, the response was never sent.
(248969) The imd was unresponsive because the member_trans script looped forever due to an incorrect exit status.
(248800) In a 4-node cluster, the imd failed unexpectedly and the cluster then hung in lmrecovery.
(246352) ptx/INSTALL did not prompt the installer for the value of the node index.
(246084) CFS filesystem recovery caused a node to hang during a cluster transition.
(245391) The Lock Manager recovery process hung while a node booted because lmrecovery was stopped on another node.
(245059) The clust_time usage message misspelled the name of the command.
(245061) The clust_time command did not process its arguments correctly.
(245005) An MMU fault occurred in the callback_doconvert routine when a lock request failed.
(244983) When the system was heavily loaded, the ktlog filled with "Rcvd" messages, filling the root filesystem.
(244923) When MUTS displayed IP addresses in EES messages, it did so in hexadecimal.
(244896) The qdisk monitor generated repeated messages indefinitely when the quorum disk became inaccessible.
(243702) The rcv_GMS_msg could lose a message due to a memory allocation failure.
(242936) When a node in a cluster ran a number of devctl -c scsibus* commands, the node panicked with a "COMMIT before PREPARE" message.
(242571) clustadm did not emit a warning when an invalid password was specified.
(242070) The clustadm -P command did not emit an error message when an invalid parameter was passed.
(240819) gms_alloc() did not return an error when it was called with an allocation request size that was too large.
(239722) ptx/CLUSTERS did not relinquish ownership of the quorum disk quickly enough when there was an I/O problem.
(238951) A node returned to the firmware level when it was booted, because initdefault was set to 1 in inittab.
(237374) A node with a corrupt clustcfg file was allowed to join the cluster.
(234577) Changes in shared device configurations were not clusterwide.
(231227) Unwanted error messages were sent to the console.
With V2.2.x, ptx/CLUSTERS ensures the names of shared devices are synchronized throughout the cluster; that is, a shared device is guaranteed to have the same name on all cluster nodes. If devices are added or removed or the name of a device is changed from one cluster node, the changes will be propagated to all the other cluster nodes. Under normal circumstances, if changes are made to device names while some of the cluster nodes are down, the changes will be propagated to these nodes when they rejoin the cluster.
The naming database is a replicated database; a complete copy of the database is stored on each cluster node. You can make changes to the naming database using the devctl command. In releases prior to ptx/CLUSTERS V2.2.x, the naming database on each node was independent: devctl commands affected only the naming database on the system on which they were run, and to keep the naming databases synchronized, the administrator had to make identical changes manually to all the naming databases in the cluster.
In ptx/CLUSTERS V2.2.x, the devctl command coordinates with all the cluster members to ensure any changes are made to all the naming databases together. In addition, when a new node joins an existing cluster, its naming database is updated to be the same as the naming databases of the existing members.
To support this new functionality, the format of the naming database in ptx/CLUSTERS V2.2.x differs from earlier releases. The new naming database has version numbers and timestamps which allow a cluster to determine which naming database is the most recent version. Also, the quorum disk, if configured, stores naming database version information. A ptx/CLUSTERS V2.2.x system can read both pre-ptx/CLUSTERS V2.2.x and ptx/CLUSTERS V2.2.x naming database formats. When a ptx/CLUSTERS V2.2.x system boots on a node with an earlier version of the naming database, it will rewrite the database to be in ptx/CLUSTERS V2.2.x format.
ptx/CLUSTERS uses a mechanism similar to that of cluster quorum to ensure that naming database copies are identical on all cluster nodes and that naming database changes are not lost when cluster nodes fail. Cluster quorum is based on the idea of an expected number of cluster members. The expected number of members starts at 2 (in ptx/CLUSTERS V2.2.x) and increases whenever additional cluster members arrive. A quorum of members is a strict majority. If a quorum of members participates in a decision, it is guaranteed that no other independent set of cluster nodes also comprises a quorum of members.
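To make the strict-majority rule concrete, the following minimal sketch (in C) tests whether a given number of participating members constitutes a quorum. The function and variable names are illustrative only and are not part of the ptx/CLUSTERS programming or administrative interfaces.

    #include <stdio.h>

    /*
     * Illustrative strict-majority test: "participating" is the number of
     * members currently taking part, "expected" is the expected number of
     * cluster members.  A strict majority requires more than half.
     */
    static int
    have_quorum(int participating, int expected)
    {
        return (2 * participating > expected);
    }

    int
    main(void)
    {
        /* With 4 expected members, 2 members are not a quorum but 3 are. */
        printf("2 of 4: %s\n", have_quorum(2, 4) ? "quorum" : "no quorum");
        printf("3 of 4: %s\n", have_quorum(3, 4) ? "quorum" : "no quorum");
        return 0;
    }

Because more than half of the expected members must participate, two disjoint sets of nodes can never both hold a quorum at the same time.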
For naming databases, the logic is similar. The cluster maintains the number of naming database replicas it expects to participate in the naming database update protocols. If a set of nodes constitute a quorum of naming database replicas, they can use and update the naming database knowing that there are no other groups of cluster nodes doing the same thing and that changes in the naming database will be preserved across failures of the nodes.
A node synchronizes its own naming database with the naming databases of the rest of the cluster members when the rc2.d/S02deactivate script issues a devctl -A command. The command waits until a quorum of naming database replicas is present in the cluster. The cluster members then decide which naming database is the newest, and all members synchronize their naming databases to that version. Subsequently, any change made to the naming database from one node is made simultaneously on all the nodes. When a new node joins a cluster that already has a quorum of members, it synchronizes its naming database to the version determined by that quorum.
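As a rough illustration of the selection step described above, the following sketch (in C) chooses the newest naming database copy from those reported by a quorum of nodes, comparing version numbers first and timestamps second. The structure, field names, and comparison order are assumptions made for this example; they do not represent the actual ptx/CLUSTERS data structures or the on-disk naming database format.

    #include <stdio.h>
    #include <stddef.h>
    #include <time.h>

    /* Hypothetical summary of one node's naming database copy. */
    struct ndb_replica {
        int    node_id;      /* node holding this copy                  */
        long   version;      /* version number, incremented on update   */
        time_t timestamp;    /* time of the last committed update       */
    };

    /*
     * Return the copy that all members would synchronize to: the one with
     * the highest version number, using the timestamp as a tie-breaker.
     */
    const struct ndb_replica *
    newest_replica(const struct ndb_replica *copies, size_t n)
    {
        const struct ndb_replica *best = NULL;
        size_t i;

        for (i = 0; i < n; i++) {
            if (best == NULL ||
                copies[i].version > best->version ||
                (copies[i].version == best->version &&
                 copies[i].timestamp > best->timestamp))
                best = &copies[i];
        }
        return best;    /* NULL only if no copies were reported */
    }

    int
    main(void)
    {
        struct ndb_replica copies[] = {
            { 0, 41, 1000 }, { 1, 42, 900 }, { 2, 42, 950 }
        };
        const struct ndb_replica *best =
            newest_replica(copies, sizeof(copies) / sizeof(copies[0]));
        printf("synchronize to the copy on node %d\n", best->node_id);
        return 0;
    }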
One important result of the naming database changes is that it is no longer possible to change the naming database by replacing the /etc/system/ndb file. Neither a copy of a naming database from a different machine nor a copy of a naming database from the same machine will work correctly if it is simply copied into /etc/system/ndb. The only way to install the contents of a naming database file into the active naming database is with the devctl -l command.
This release of ptx/CLUSTERS includes two utilities, ndbcomp and ndbcompall, which are installed in /sbin. ndbcomp compares ndb files; each ndb file to be compared must first be copied to the local system. Alternatively, you can use ndbcompall, which locates all the nodes in the cluster, copies over their ndb files, and passes them to ndbcomp for comparison.
For more information about these utilities, see the ndbcomp and ndbcompall man pages.
For instructions on how to update ptx/CLUSTERS, DYNIX/ptx, and other IBM NUMA-Q products running on Symmetry systems, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
IBM NUMA-Q authorizes only IBM NUMA-Q personnel to perform initial cluster installations and to upgrade NUMA-Q 2000 clusters. IBM NUMA-Q Customer Support or Professional Services personnel who install new clusters or update NUMA-Q 2000 clusters should follow the installation and configuration procedures in the ptx/CLUSTERS V2.x Installer's Guide and in the DYNIX/ptx and Layered Products Software Installation Release Notes.
ptx/SVM V2.2.x cannot be used to manage shared storage on clusters containing more than 2 nodes. ptx/SVM can be used for mirroring root and primary swap on local disks on the nodes of 3- and 4-node clusters. See the ptx/SVM V2.2.1 Release Notes for more information about the limitations of ptx/SVM in 3- and 4-node clusters.
Normally, when changing a node ID in a cluster, you need to reboot only the node whose ID you are changing. However, because of a software defect (problem report 235185), after changing the ID of one or more nodes, you must reboot all nodes.
To change the node ID, follow these steps:
Issue the clustadm -P nodeid=value command, where value is the new node ID (an integer between 0 and 7, inclusive). Issue this command on each node whose ID you wish to change.
Shut down all cluster nodes. The recommended procedure is to first bring all the nodes to run-level 1, and then bring them to the firmware level.
Start the cluster nodes back up.
Failure to follow this procedure can cause the same node to appear multiple times in clustadm output and may cause the Lock Manager to hang.
In ptx/CLUSTERS V1.x, if you built a VTOC on a shared device from one of the nodes, the disk's slices were then available on all of the cluster nodes. In ptx/CLUSTERS V2.x, the remaining node(s) will not be aware of the existence of the VTOC slices if you build a VTOC on a shared device from only one of the nodes.
If you wish to place a disk containing a VTOC under ptx/SVM control and use the disk in a cluster, you must ensure that each member node's /etc/devtab file contains a VTOC entry for that disk. Then issue the devbuild command to create the virtual devices included in that VTOC on all nodes in the cluster.
If you build the VTOC for a disk on one node (where the disk will be recognized as a "sliced" disk), but not on the other node(s) (where the disk will be recognized as a "simple" disk), then the ptx/SVM shared disk groups will not match across the cluster and you will not be able to use them.
There are several situations in which it is necessary for the Integrity Manager to reboot a cluster member node. In these situations, the node has become unable to safely resume access to shared storage. The Integrity Manager invokes the kernel panic mechanism to prevent any further user-level activity that might require access to shared storage and to bring the node back into cluster membership as quickly as possible. The panic messages used, and their causes, are the following:
Taking this node out of the cluster, as some critical transition script has failed
One example of a transition-script failure that warrants a system shutdown is when the lmrecovery script fails. If lmrecovery fails, it could mean that the Lock Manager is disabled on all nodes of the cluster until the problem is fixed. When lmrecovery terminates abnormally on a node, that node is shut down and will normally reboot in order to restore the normal operation of the cluster.
Lost the qdisk to a partition node %d
This message indicates that a cluster with a quorum disk had CCI communication problems. The node that shut down lost connectivity with the other node(s) and when it read the quorum disk, found that it had been removed from the set of active member nodes.
Normally, a node that loses CCI communications enters a NO QUORUM state. While in this state, the node continues to monitor cluster and quorum disk states. If other nodes form a cluster without the disconnected node, quorum disk data will reflect the new cluster membership. The disconnected node's internal state (for example, Lock Manager locks) is now invalid and the node will reboot itself. You must then address the communication problem(s) and reboot the node in order for it to again become an active member of the cluster.
Forcing a system panic - This node out of sync with the rest of the cluster
This panic message means the same as the previous panic message, except that the quorum disk is not involved. The node that shut down discovered through CCI communication that the other node(s) had formed a new cluster without it. Because its state is now invalid, the node shut down.
To remove a node from a cluster, follow these steps:
Shut down the node you wish to disconnect from the cluster and power it off.
Disconnect all shared storage from the node to be removed from the cluster.
Disconnect the node from the CCI networks.
Boot the node you wish to remove from the cluster. Go to single-user mode, either with the bootflags or by entering s at the Waiting for cluster membership, enter 's' to go to single-user mode prompt.
Through ptx/ADMIN, deinstall the ptx/CLUSTERS software. For information on how to deinstall software, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
ATTENTION To avoid destroying or corrupting data, do not remove the ptx/CLUSTERS software before detaching the node from all shared storage.
On the remaining nodes, reset the number of expected votes to equal the number of remaining nodes, plus one for the quorum disk if one is configured. For example, if three nodes and a quorum disk remain, set the number of expected votes to 4.
The ptx/CLUSTERS V2.2.1 documentation includes the following:
ptx/CLUSTERS Administration Guide
ptx/CLUSTERS Installer's Guide (NUMA-Q Customer Support group only)
This section lists open problems in this release of ptx/CLUSTERS. The numbers in parentheses identify the problems in the IBM NUMA-Q problem-tracking system.
When a node index is changed on one cluster node and the node is restarted, the unchanged node has two entries for the node whose index has changed. One entry is for the original node index, and the other entry is for the new node index.
Workaround. See the section in these release notes entitled "Changing Cluster Node ID" for information on how to change a node's index.
ptx/CTC menus in ptx/ADMIN are removed if an updated version of ptx/CLUSTERS is installed and ptx/CTC is not reinstalled.
Workaround. Always install ptx/CLUSTERS and ptx/CTC together. If you have already installed ptx/CLUSTERS, install ptx/CTC from the CD-ROM so that the menus will reappear.
When you use devctl to change the name of a CCI device, ptx/CLUSTERS does not know the name has changed.
Workaround. Use clustadm to deconfigure and reconfigure the CCI device.
The clustadm -C (configure quorum disk) and clustadm -D (deconfigure quorum disk) commands will hang when the node has lost quorum. The commands cannot be suspended or interrupted.
Workaround. Boot another node to restore quorum, or reboot the node. Make quorum disk configuration changes only while the node has quorum.
When a quorum disk is configured, the VTOC for the device, if it is not already in place, is built on remote nodes by the kernel. However, this does not update the user-level list of built devices, so a subsequent devdestroy of those devices on a remote node may fail.
Workaround. Execute the devbuild command on the node where the devdestroy is failing. Doing so will update the list of built devices. Then do the devdestroy.
When a cluster member node is shut down, the other member nodes will continue to report that it is a member until VSYNC has completed its membership view change protocol. This protocol includes a delay known as "I/O drain time," during which the view change waits for completion of any I/O requests initiated before the node was shut down to either complete or fail. Thus it is possible, if this I/O drain time is long enough, for a node to be completely shut down and even powered off while clustadm on other nodes continues to report that the node is still a cluster member. clustadm will report that all links to the shut down node are DOWN during this delay in the membership view change protocol. This is the indication that the protocol is underway and should complete shortly.
Workaround. This is a transient problem which will correct itself after the I/O drain time delay has passed.
Conflict messages for several files appear when ptx/CLUSTERS is deinstalled through ptx/INSTALL. The conflict message for each file is:
/sbin/ptxinstall/remove: File to be deleted does not exist
Workaround. These messages can be safely ignored; the deinstallation will not fail because of them.
If one cluster node is running ptx/SDI and has a ptx/SDI device, then another node that attempts to join the cluster must also have ptx/SDI installed. Likewise, if an existing cluster node does not have ptx/SDI installed, then another node that attempts to join the cluster must also not have ptx/SDI installed. Otherwise, when the nodes attempt to synchronize their naming databases, the following error will occur:
devctl: Internal error 3 during NDB merge operation: Invalid argument
devctl: unable to synchronize NDB: Invalid argument
Workaround. Ensure that ptx/SDI is installed either on all cluster nodes or on none.
When the devctl command is used to deconfigure or configure devices, the results may not be propagated to all nodes in the cluster if the devctl command is executed just before a cluster membership transition begins.
Workaround. Try not to configure or deconfigure devices when the cluster is transitioning. Use the /sbin/ndbcompall command to verify that the devctl command was propagated to all nodes. If it was not, shut down and reboot the nodes to resynchronize the cluster nodes' naming databases.