These release notes support ptx®/CLUSTERS V2.1.4. Read this document before you install or run this release of ptx/CLUSTERS.
This version of ptx/CLUSTERS can be used with the following products:
IBM® NUMA systems
Symmetry® 5000 systems
DYNIX/ptx® V4.4.10
ptx/SVM V2.1.4
ATTENTION ptx/SVM V2.1.4 cannot be used to manage shared storage on clusters containing more than 2 nodes. ptx/SVM can be used for mirroring root and primary swap on local disks on the nodes of 3- and 4-node clusters. See the ptx/SVM V2.1.4 Release Notes for more information about the limitations of ptx/SVM in 3- and 4-node clusters.
For product compatibility information on other products, such as ptx/LAN and ptx/TCP/IP, consult the DYNIX/ptx Release Notes.
This release of ptx/CLUSTERS supports the following configurations:
Up to 4-node IBM NUMA-Q® 2000 clusters (with Fibre-Channel Interconnection) or 2-node IBM NUMA 1000 clusters (with Fibre-Channel Interconnection).
2-node IBM NUMA-Q 2000/1000 "mixed" clusters (with Fibre-Channel Interconnection).
2-node Symmetry 5000 clusters (with standard QCIC SCSI-direct connection).
This release of ptx/CLUSTERS does not support "mixed" cluster configurations of Symmetry 5000 with IBM NUMA systems. This release also does not support the use of Fibre Channel Arbitrated Loop on IBM NUMA systems.
ptx/CLUSTERS V2.x is a major revision of ptx/CLUSTERS V1.x and includes the following improvements:
Higher capacity. ptx/CLUSTERS V2.x supports the SCI-based system architecture, including the use of Fibre Channel as the cluster's storage interconnect. This significantly increases the number of shared storage devices (disk and tape) supported by ptx/CLUSTERS.
Symmetry 5000 systems, which use standard QCIC SCSI direct interconnections, are also supported.
Higher availability. Fibre Channel allows multiple paths to disks; these redundant paths provide higher availability.
Flexibility of configuration. ptx/CLUSTERS V2.x is simpler to administer than previous versions of ptx/CLUSTERS. There are no special or separate names required for shared devices, no CMAs, and no maintenance mode.
Sharing model simplified. In ptx/CLUSTERS V2.x, shared storage device support has been both simplified and enhanced. This redesign includes integration with the new DYNIX/ptx autoconfiguration and device-naming protocol. It also includes support for on-line insertion and removal of any shared devices.
Cluster formation simplified. With ptx/CLUSTERS V2.x, a cluster is formed based on the ability of all nodes to communicate with one another, not on a predeclared shared storage subsystem. Because all storage (disk devices, tapes) may be physically connected in a Fibre Channel topology regardless of its intended usage, the sharing model in ptx/CLUSTERS V2.x is imposed by software and not based upon physical topology as it was in ptx/CLUSTERS V1.x. A node needs no shared storage to be a cluster member.
This section compares ptx/CLUSTERS V2.x with ptx/CLUSTERS V1.1 and V1.3 and details what is new, what has stayed the same, and what has been eliminated.
The following major features are new with ptx/CLUSTERS V2.x or have changed since previous releases of ptx/CLUSTERS. The ptx/CLUSTERS Administration Guide provides details about all of the features of ptx/CLUSTERS.
The CSCS component of ptx/CLUSTERS is a new feature that provides the basic mechanisms for reliable and coordinated communication among member nodes of a cluster. The CSCS replaces the Network and Cluster Management Area Monitor daemon (ncmd) of ptx/CLUSTERS V1.x with components (modules) inside the kernel. Features of the CSCS are the following:
Establishes communication links among potential cluster members.
Exchanges various node parameter values, such as node index, node votes, expected votes, and quorum disk information. Determines whether these values make the node an acceptable cluster member.
Determines whether a quorum exists for cluster formation.
Coordinates transition of members into and out of the cluster.
Notifies the Integrity Manager daemon whenever a transition has been successfully completed so that the daemon can run transition notification scripts with the changed set of members.
The mechanisms implemented by the CSCS completely replace those of the V1.x releases of ptx/CLUSTERS for connecting active members of the cluster and protecting against cluster partitioning. The CSCS replaces the disk-based CMA mechanism of ptx/CLUSTERS V1.x for cluster configuration and membership specification with reliable group broadcast communication services.
The Integrity Manager daemon still controls and coordinates the activities of the member nodes in the cluster. Although its role has been greatly reduced because of services provided by the CSCS, the Integrity Manager daemon provides the following functions:
Monitors and reports cluster status.
Handles configuration change requests.
Performs other control functions that affect the cluster as a whole.
In ptx/CLUSTERS V2.x, the Integrity Manager daemon relies on the communication it receives from the CSCS. Because of the CSCS layer, the architecture of the Integrity Manager has changed significantly. These changes include the following:
Consolidation of the ncmd and the limd. In ptx/CLUSTERS V1.x, two user-level daemons, ncmd and limd, implemented the Integrity Manager's functions. The CSCS handles most of the functions provided by the ncmd, and the remaining functions of the ncmd are split between portions of DYNIX/ptx and the Integrity Manager daemon.
Removal of the CMA. With the introduction of the CSCS, the CMA is no longer needed for communication and maintenance of cluster membership and state. The CSCS uses a communication medium that consists of virtual circuits built over the dedicated LAN interfaces. The CSCS discovers nodes by probing on the dedicated LAN. The CSCS also manages cluster membership dynamically.
Elimination of maintenance mode. In ptx/CLUSTERS V1.x, the nodes that constituted the cluster and the LLI list were defined and stored in the CMA. Maintenance mode was needed to declare and manipulate these lists. In ptx/CLUSTERS V2.x, each node dynamically recognizes the presence of other nodes, making maintenance mode unnecessary.
The immbroker(1M) command has been replaced by a much simpler utility, clustadm(1M), to monitor and control a cluster. The following table compares the immbroker command to the clustadm command. Note that many of the options to immbroker are unnecessary in ptx/CLUSTERS V2.x.
immbroker Option | clustadm Equivalent Option or Other Explanation
-A listname | None. ptx/CLUSTERS V2.x does not maintain shared, master, or LAN lists.
-C listname | None. Same as explanation for -A.
-D listname | None. Same as explanation for -A.
-G listname | None. Same as explanation for -A.
-N nodename | Use -m nodename for short output. Use -vm nodename for verbose output, which includes the node name, node index, number of quorum votes contributed by the node, and when it joined the cluster. Use -l or -vl for short or verbose output about the local node.
-S | None. To remove a node from a cluster, the node must be completely shut down.
-Z | None. In ptx/CLUSTERS V2.x, there is no clusters driver to enable devices to be sharable or shared. Any storage device that is accessible by more than one member node is sharable.
-b | None. See explanation for -Z.
-c | Use -m all for short output. Use -vm all for verbose output, which includes the node names, node indexes, number of quorum votes contributed by each node, and when each node joined the cluster.
-d | None. ptx/CLUSTERS V2.x supports the DYNIX/ptx V4.4.x devctl(1M) command for deconfiguring (spinning down) sharable and nonsharable devices.
-e | None. ptx/CLUSTERS V2.x does not maintain shared, master, or LAN lists.
-f | None. See explanation for -e.
-i | None.
-l | -l for short output. -vl for verbose output, which includes the node name, node index, number of quorum votes contributed by the node, and when it joined the cluster.
-m | None.
-n | -m all for short output. -vm all for verbose output, which includes the node names, node indexes, number of quorum votes contributed by each node, and when each node joined the cluster.
-o objectname | None.
-p plexname | None.
-q | -c (for short output) and -vc (for verbose output) each include "quorum state" information. There are no Integrity Manager states, such as Halted, Maintenance, and Normal.
-r diskname | None. In ptx/CLUSTERS V2.x, there is no clusters driver to enable devices to be sharable or shared; any device on the system may be sharable as a direct result of its physical proximity.
-s | None. In ptx/CLUSTERS V2.x, a node joins the cluster while it is booting, usually before it reaches single-user mode.
-t | None. There is no CMA concept in ptx/CLUSTERS V2.x.
-u diskname | None. ptx/CLUSTERS V2.x supports the DYNIX/ptx V4.4 devctl(1M) command for configuring (spinning up) sharable and nonsharable devices.
-v | Same as for immbroker.
-w | None.
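For example, the following clustadm invocations cover the most common former immbroker queries. This is a minimal sketch using only the options listed in the table above; no sample output is shown because it varies by cluster.

# Short and verbose membership listings for all cluster nodes
clustadm -m all
clustadm -vm all

# Short and verbose information about the local node
clustadm -l
clustadm -vl

# Cluster status, including quorum state
clustadm -c
clustadm -vc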
The clustadm command also lets you set kernel configuration parameters that provide bootstrap information. (During the installation process, you will be required to set these parameters.) The options are listed as follows:
You can optionally configure a quorum disk. The quorum disk concept is discussed in the next section.
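A hedged sketch of this setup step follows. Only the nodeid parameter and the -P, -C, and -D options are taken from these notes; the angle-bracket placeholder and any other parameter names must come from the ptx/CLUSTERS Administration Guide and clustadm(1M).

# Set this node's bootstrap node ID (an integer from 0 through 7)
clustadm -P nodeid=1

# After the cluster has formed, optionally designate a quorum disk
# (the argument form shown here is a placeholder; see clustadm(1M))
clustadm -C <quorum-disk-partition>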
ptx/CLUSTERS V2.x implements a quorum consensus algorithm that the CSCS uses to control cluster availability. This algorithm requires that a majority of the potential cluster nodes be fully connected to enable cluster operation. If the number of nodes available is one-half of the expected cluster membership or fewer, then none of the nodes will function as cluster members with access to shared storage. The expected cluster membership is specified by the user during initial cluster setup and is incremented as new nodes join the cluster.
To avoid cessation of activity in a two-node cluster when a single node fails, a quorum disk can be designated as a virtual cluster member. Its "vote," along with the remaining node's, enables the cluster to remain running (with a single node). The quorum disk, which is simply a single partition on a non-mirrored disk, is designated by the user after the cluster has formed.
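The majority test itself is simple arithmetic. The shell sketch below is illustrative only; the variable names are invented here and do not correspond to ptx/CLUSTERS parameters.

# votes_present: votes held by connected members plus the quorum disk, if any
# expected_votes: the expected cluster membership
votes_present=2      # one surviving node plus the quorum disk
expected_votes=3     # two nodes plus the quorum disk

if [ $((votes_present * 2)) -gt "$expected_votes" ]; then
    echo "Quorum held: the cluster continues running."
else
    echo "No quorum: nodes stop functioning as cluster members."
fi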
The Low Latency Interconnect (LLI) has been renamed the Cluster Communications Interconnect (CCI). The CCI serves the same basic communication purpose as the LLI. The new name more accurately reflects the use and function of the interconnect.
The way errors are handled and logged has changed in ptx/CLUSTERS V2.x. All cluster warnings and errors are logged to ktlog. In addition, changes to the cluster membership are logged to /var/clusters/trans_log and changes to a node's application availability are logged to /var/clusters/avail_log.
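To review these records, it is enough to inspect the two log files named above (a trivial sketch; ktlog entries are viewed with the standard DYNIX/ptx error-logging tools).

# Recent cluster membership transitions on this node
tail /var/clusters/trans_log

# Recent changes to this node's application availability
tail /var/clusters/avail_log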
ptx/CLUSTERS V2.x supports the use of Fibre Channel as the cluster storage interconnect on IBM NUMA systems, significantly increasing the number of shared storage devices. ptx/CLUSTERS V2.x also supports SCSI-based Symmetry systems.
In ptx/CLUSTERS V2.x, a local node can have the following states (or modes):
Unlike previous versions of ptx/CLUSTERS, V2.x does not have Normal, Halted, Maintenance, and InTransition modes.
Normal mode is replaced by ACTIVE mode.
Halted mode is replaced by NO_QUORUM mode, which the system automatically assigns to a node.
Maintenance mode has no equivalent in V2.x. A node does not need to be in a special state, with the remaining nodes halted, in order to perform configuration changes.
The InTransition state is no longer visible outside the kernel components of the ptx/CLUSTERS V2.x software. As in previous versions of ptx/CLUSTERS, InTransition mode occurs whenever cluster membership changes.
In ptx/CLUSTERS V2.x, cluster software startup is invoked much earlier in the system boot sequence of any node configured as a member of a cluster. A node on which ptx/CLUSTERS V2.x has been installed is expected to become a cluster member as soon as it boots. Cluster formation is attempted as early as possible in the boot sequence so that the nodes can provide the necessary votes for the cluster to reach and maintain quorum and participate in shared-device coordination protocols and other distributed decisions.
A node will wait until it becomes a cluster member before running the /etc/rc2.d scripts to go to multiuser mode.
The ptx/CLUSTERS V1.x shared-device naming and access mechanism (SVTOC), which was layered over the DYNIX/ptx device names and services, has been eliminated. ptx/CLUSTERS V2.x now has mechanisms for device identification and naming that support device access through Fibre Channel Interconnects. Each shared device is assigned a unique identifier, which forms the basis for access locking on each device. These identifiers are guaranteed to be recognized by all nodes of a cluster. The identifiers are unique and remain consistent whether the device is connected to a cluster or to a single node.
Since ptx/CLUSTERS V2.x eliminates the separate shared device namespace (shqd--) and its associated shared object lists, the process of configuring and setting up a cluster is greatly simplified.
The following features were part of ptx/CLUSTERS V1.x and are also part of ptx/CLUSTERS V2.x. Though they may have been modified for use with the new software, their external interfaces are unchanged.
The Lock Manager is a kernel-level component that supports coordination of application access to shared resources. Lock Manager errors in ptx/CLUSTERS V2.x are reported in the new system-standard format.
You can assign a unique domain name for each application that you plan to run on a cluster to avoid potential resource-name conflicts.
In ptx/CLUSTERS V2.x, it is no longer necessary to make all cluster configuration changes on one node while the other cluster members are halted. For the following reasons, ptx/CLUSTERS V2.x reliably maintains synchronization among all cluster members without interruption of service:
The amount and types of state information have been simplified.
The communication mechanisms, implemented through the CSCS, guarantee synchronous communication among cluster members.
The Active Monitor has been replaced in ptx/CLUSTERS V2.x by a more dynamic mechanism for selecting the node responsible for coordinating each cluster transition. This process is transparent to the administrator and the user.
In ptx/CLUSTERS V2.x, all usage of the CMA for communication and maintenance of cluster membership and state has been replaced by the CSCS.
Shared device names (sh---) are no longer required for sharable devices. Users who wish to migrate from ptx/CLUSTERS V1.x to V2.x, however, can retain the existing sh--- device names if they wish.
For instructions on how to update ptx/CLUSTERS, DYNIX/ptx, and other products running on Symmetry systems, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
Only IBM NUMA personnel are authorized to perform initial cluster installations and to upgrade IBM NUMA 2000 clusters. Customer Support or Professional Services personnel who install new clusters or update IBM NUMA 2000 clusters should follow the procedures in the ptx/CLUSTERS V2.x Installer's Guide and in the DYNIX/ptx and Layered Products Software Installation Release Notes for installation and configuration.
Normally, when changing node IDs in a cluster, you need to reboot only the node whose ID you are changing. However, because of a defect in the software (problem report 235185), after changing the ID of one or more nodes, you need to reboot all nodes.
To change the node ID, follow these steps:
Issue the clustadm -P nodeid=value command, where value is the new node ID (an integer between 0 and 7, inclusive). Issue this command on each node whose ID you wish to change.
Shut down all cluster nodes. The recommended procedure is to first bring all the nodes to run-level 1, and then bring them to the firmware level.
Start the cluster nodes back up.
Failure to follow this procedure can cause the same node to appear multiple times in clustadm output and may cause the Lock Manager to hang.
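A condensed sketch of the procedure follows; the node ID value 3 is only an example, and the shutdown and boot steps are summarized in comments rather than site-specific commands.

# Step 1: on each node whose ID is changing, assign the new node ID (0-7)
clustadm -P nodeid=3

# Steps 2-3: bring ALL cluster nodes to run-level 1, then to the firmware
# level, and boot every node back up before relying on clustadm output again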
If you wish to place a disk containing a VTOC under ptx/SVM control and use the disk in a cluster, you must ensure that each member node's /etc/devtab file contains a VTOC entry for that disk and then issue the devbuild command to create the virtual devices included in that VTOC on all nodes in the cluster.
If you build the VTOC for a disk on one node (where the disk will be recognized as a "sliced" disk), but not on the other node(s) (where the disk will be recognized as a "simple" disk), then the ptx/SVM shared disk groups will not match across the cluster and you will not be able to use them.
In ptx/CLUSTERS V1.x, if you built a VTOC on a shared device from one of the nodes, the disk's slices were then available on all of the cluster nodes. In ptx/CLUSTERS V2.x, the remaining node(s) will not be aware of the existence of the VTOC slices if you build a VTOC on a shared device from only one of the nodes.
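A sketch of the per-node step is shown below. The disk name sd42 is purely a placeholder, and the assumption that devbuild accepts the disk name as its argument should be checked against devbuild(1M).

# On EVERY member node: confirm that /etc/devtab carries the VTOC entry
# for the shared disk, then build the virtual devices it defines
grep sd42 /etc/devtab
devbuild sd42        # argument form assumed; see devbuild(1M)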
There are several situations in which it is necessary for the Integrity Manager to reboot a cluster member node. In these situations, the node has become unable to safely resume access to shared storage. The Integrity Manager invokes the kernel panic mechanism to prevent any further user-level activity that might require access to shared storage and to bring the node most rapidly back into cluster membership. The panic messages used, and their causes, are the following:
Taking this node out of the cluster, as some critical transition script has failed
One example of a transition-script failure that warrants a system shutdown is when the lmrecovery script fails. If lmrecovery fails, it could mean that the Lock Manager is disabled on all nodes of the cluster until the problem is fixed. When lmrecovery terminates abnormally on a node, that node is shut down and will normally reboot in order to restore the normal operation of the cluster.
Lost the qdisk to a partition node %d
This message indicates that a cluster with a quorum disk had CCI communication problems. The node that shut down lost connectivity with the other node(s) and when it read the quorum disk, found that it had been removed from the set of active member nodes.
Normally, a node that loses CCI communications enters the NO_QUORUM state. However, in this situation, if the node enters the NO_QUORUM state and the other node(s) are rebooted, then there is the potential for the node to regain quorum with its old state. This would cause data corruption, so the node is shut down instead. You must then address the communication problem(s) and reboot the node in order for it to again become an active member of the cluster.
Forcing a system panic - This node out of sync with the rest of the cluster
This panic message means the same as the previous panic message, except that the quorum disk is not involved. The node that shut down discovered through CCI communication that the other node(s) had formed a new cluster without it. Because its state is now invalid, the node shut down.
To remove a node from a cluster, follow these steps:
Shut down the node you wish to disconnect from the cluster and power it off.
Disconnect all shared storage from the node to be removed from the cluster.
Disconnect the node from the CCI networks.
Boot the node you wish to remove from the cluster. Go to single-user mode, either with the bootflags or by entering s at the "Waiting for cluster membership, enter 's' to go to single-user mode" prompt.
Through ptx/ADMIN, deinstall the ptx/CLUSTERS software. For information on how to deinstall software, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
ATTENTION To avoid destroying or corrupting data, do not remove the ptx/CLUSTERS software before detaching the node from all shared storage.
On the remaining nodes, reset the number of expected votes to equal the number of remaining nodes, plus one for the quorum disk if one is configured.
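A hedged sketch of this final step on a remaining node follows. The expected-votes parameter is shown under the assumed name evotes; confirm the actual parameter name in the ptx/CLUSTERS Administration Guide before using it.

# Verify the remaining membership
clustadm -vm all

# Reset expected votes to the remaining nodes plus one for the quorum disk
# (parameter name assumed; see the Administration Guide)
clustadm -P evotes=3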
The following ptx/CLUSTERS V2.1.4 documentation is available on the online HTML documentation CD-ROM:
This section lists the following problem report summaries:
The numbers in parentheses identify the problems in the problem-tracking system.
This release of ptx/CLUSTERS contains fixes for the following software defects.
(243482) ptx/CLUSTERS used a maximum number of CPUs that was too low (32), which could lead to memory corruption.
(243502) The gms_node_left() routine did not handle memory allocation failures, resulting in system panics.
(243522) A memory leak in the muts_open() function occurred in low memory situations.
(243523) A memory leak in the start_heartbeat() function occurred in low memory situations.
(243526) When the le_t structure was cleared, the le_dl pointer was overwritten and led to a memory leak.
(244983) The ktlog filled with lock manager messages when the system was under a heavy load.
(247355) The information available through the clusters crash library on CCI links was not formatted similarly to the output of clustadm -vi. It now is.
(248077) Resilvering in a cluster was too slow.
(248800) A cluster node hung during a transition and the imd was not running.
(249546) The system panicked on an assert_debug in the SLM deadlock code.
(249704) When ptx/CLUSTERS was deinstalled through ptx/INSTALL, a conflict message appeared regarding the process order.
(250448) The pullupmsg() routine was given a bad message to process by dgstr_in(), resulting in a system panic.
(250743) The ASSERT function was incorrectly written and caused an unnecessary panic.
(252107) During a scratch installation of ptx/CLUSTERS, after booting off of the CD-ROM to a different disk, the cluster installation did not ask for cluster parameters.
(252744) A CFS hang occurred when a lock asked for a convert from "nl" to "pr."
(253120) A panic occurred in srlk_open().
(253300) The macros QMON_VALID_EVENT and QMON_VALID_STATE were not defined properly.
(254030) A cluster hung during the boot process when all four nodes were rebooted at once.
(254192) vysnc node block information was not updated when the number of evotes was changed.
(254655) The system panicked with the message: PANIC: ../io/slm/slm_cremote.c:ABORT with ZERO sequence:line.
(254658) An assertion failed with the following message: (slk->slk_flags & (SLK_TIMEOUT|SLK_TIMEOUTSUSP)) == SLK_TIMEOUT.
(254906) A filesystem hang occurred during a cluster reboot.
(254943) vxconfigd never returned from lm_convert call.
(255084) An assertion failure occurred in crlk_restate() during a cluster up/down transition.
This section lists open problems in this release of ptx/CLUSTERS.
When a node index is changed on one cluster node and the node is restarted, the unchanged node has two entries for the node whose index has changed. One entry is for the original node index, and the other entry is for the new node index.
Workaround. See the section in these release notes entitled "Changing Cluster Node ID" for information on how to change a node's index.
ptx/CTC menus in ptx/ADMIN are removed if an updated version of ptx/CLUSTERS is installed and ptx/CTC is not reinstalled.
Workaround. Always install ptx/CLUSTERS and ptx/CTC together. If you have already installed ptx/CLUSTERS, install ptx/CTC from the CD-ROM so that the menus will reappear.
In a single-node cluster with a quorum disk configured, attempting to boot to single-user mode by changing the initdefault entry in /etc/inittab caused the node to loop with the following message:
Cannot satisfy request to go to run-level 1 because this node
does not have cluster quorum. Going to firmware instead.
To get to single-user mode, reboot from firmware.
Workaround. Do not set is:1:initdefault: in /etc/inittab. Instead, use the bootflag option to specify single-user mode.
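A quick check that the default run level is not causing the loop (the firmware bootflag syntax itself is hardware-specific and not reproduced here):

# The initdefault entry should not request run-level 1 on this node
grep initdefault /etc/inittab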
During the upgrade procedure from ptx/CLUSTERS V1.x to ptx/CLUSTERS V2.x, the following error may be returned from the parser:
Creating ptx/CLUSTERS devices ...
Adding cscs to /installmnt/etc/services ...
Bad format for immbroker -G shared output in line 3
Line = disk
Workaround. None. This message can be safely ignored.
When you use devctl to change the name of a CCI device, ptx/CLUSTERS does not know the name has changed.
Workaround. Use clustadm to deconfigure and reconfigure the CCI device.
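As a sketch of the recovery (only the -vi option is documented in these notes; the specific deconfigure and reconfigure options are described in clustadm(1M)):

# Review the CCI links that ptx/CLUSTERS currently knows about
clustadm -vi
# Then deconfigure and reconfigure the renamed CCI device using the
# clustadm options described in clustadm(1M)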
The clustadm -C (configure quorum disk) and clustadm -D (deconfigure quorum disk) commands may cause a node to hang when the node has lost quorum. The commands cannot be suspended or interrupted.
Workaround. Reboot the node. Make quorum disk configuration changes only while the node has quorum.
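A sketch of the safe ordering, using only the options already mentioned in these notes (the quorum-disk argument is a placeholder):

# Confirm that this node currently holds quorum
clustadm -vc

# Only then configure or deconfigure the quorum disk
clustadm -C <quorum-disk-partition>
clustadm -D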
A system panic may result on large systems if a memory allocation failure within ptx/CLUSTERS occurs.
Workaround. Reboot the system.
When a quorum disk is configured, the VTOC, if it is not already in place, is built for the device on remote nodes by the kernel. However, this does not update the list of built devices at the user level.
Workaround. Execute the devbuild command on the node where the devdestroy is failing. Doing so updates the list of built devices. Then run the devdestroy again.
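As a sketch, with the device argument left as a placeholder (see devbuild(1M) and devdestroy(1M) for the exact forms):

# On the node where devdestroy fails, refresh the list of built devices,
# then retry the destroy
devbuild <quorum-device>
devdestroy <quorum-device>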