These release notes support ptx®/CLUSTERS V2.1.3. Read this document before you install or run this release of ptx/CLUSTERS.
This version of ptx/CLUSTERS can be used with the following products:
NUMA® systems
Symmetry® 5000 systems
DYNIX/ptx® V4.4.9 or later
ptx/SVM V2.1.3
ATTENTION ptx/SVM V2.1.3 cannot be used to manage shared storage on clusters containing more than 2 nodes. ptx/SVM can be used for mirroring root and primary swap on local disks on the nodes of 3- and 4-node clusters. See the ptx/SVM V2.1.3 Release Notes for more information about the limitations of ptx/SVM in 3- and 4-node clusters.
For product compatibility information on other products, such as ptx/LAN and ptx/TCP/IP, consult the DYNIX/ptx Release Notes.
If you wish to use Oracle® Parallel Server, we recommend Oracle V7.3.2.3 ELF or later, or Oracle V8.0.4 or later.
This release of ptx/CLUSTERS supports the following configurations:
Up to 4-node NUMA 2000 clusters (with Fibre-Channel Interconnection) or 2-node NUMA 1000 clusters (with Fibre-Channel Interconnection).
2-node NUMA 2000/NUMA 1000 "mixed" clusters (with Fibre-Channel Interconnection).
2-node Symmetry 5000 clusters (with standard QCIC SCSI-direct connection).
This release of ptx/CLUSTERS does not support "mixed" cluster configurations of Symmetry 5000 with NUMA systems. This release also does not support the use of Fibre Channel Arbitrated Loop on NUMA systems.
This release of ptx/CLUSTERS includes two new utilities, ndbcomp and ndbcompall, which are installed in /sbin.
ndbcomp compares the device naming database (ndb) files from different cluster nodes. Each ndb file that is going to be compared must first be copied to the local system. Alternatively, you can use ndbcompall, which locates all the nodes in the cluster, copies over their ndb files, and passes them to ndbcomp for comparison.
For more information about these utilities, see the ndbcomp and ndbcompall man pages.
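As an illustration, the following is a hedged sketch of how these utilities might be invoked; the ndb file locations and argument forms shown are assumptions, so consult the ndbcomp and ndbcompall man pages for the exact syntax.

    # Copy a remote node's ndb file to the local system (paths are placeholders),
    # then compare it against the local ndb file:
    rcp node2:/path/to/ndb /tmp/ndb.node2
    /sbin/ndbcomp /path/to/ndb /tmp/ndb.node2

    # Or let ndbcompall locate the cluster nodes, copy their ndb files,
    # and pass them to ndbcomp for comparison:
    /sbin/ndbcompall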
This release of ptx/CLUSTERS contains fixes for the following software defects:
(237374) A node with a corrupt clustcfg file was allowed to join the cluster.
(239722) When there was a problem doing I/O to the quorum disk, the cluster retained ownership of the quorum disk long past when ownership should have been relinquished.
(240594) A method did not exist for detecting conflicting entries for any shared devices in the naming databases on different nodes of a cluster. The ndbcomp script now exists for this purpose.
(242070) The clustadm -P command did not issue a warning when an invalid parameter was specified.
(242571) clustadm did not issue a warning when an invalid password was specified.
(245061) clust_time did not process its arguments correctly, and it was not possible to use the -v or -d options at all.
(245391) There was a race between lmrecovery and killall (run by init) when the cluster was in recovery and a node was shutting down; the cluster transition could hang in lmrecovery.
(245624) getservbyname() failed, and the error report logged by ptx/CLUSTERS CLOG_SEND did not include either errno or the DNS completion status value, h_errno.
(246352) ptx/INSTALL did not prompt for the node index.
(248715) On a two-node cluster, the Lock Manager stalled while waiting for directory bidding and statements to complete on remote nodes.
(248891) The default output for the lmdomain command when it is issued with no options is now equivalent to that of lmdomain -l.
(248893) The default output for the clustadm command when it is issued with no options is now equivalent to that of clustadm -cvm all.
(248914) Extremely long delays were seen during node joins and reboots.
(249623) lmdomain returned a confusing error message when it terminated after finding a problem with one of the domains listed in the lmconfig file.
(250147) The lmdomain -D command failed to list all the existing domains.
ptx/CLUSTERS V2.x is a major revision of ptx/CLUSTERS V1.x and includes the following improvements:
Higher capacity. ptx/CLUSTERS V2.x supports the SCI-based system architecture, including the use of Fibre Channel as the cluster's storage interconnect. This significantly increases the number of shared storage devices (disk and tape) supported by ptx/CLUSTERS.
Symmetry 5000 systems, which use standard QCIC SCSI direct interconnections, are also supported.
Higher availability. Fibre Channel allows multiple paths to disks, providing redundant components that mean higher availability.
Flexibility of configuration. ptx/CLUSTERS V2.x is simpler to administer than previous versions of ptx/CLUSTERS. There are no special or separate names required for shared devices, no CMAs, and no maintenance mode.
Sharing model simplified. In ptx/CLUSTERS V2.x, shared storage device support has been both simplified and enhanced. This redesign includes integration with the new DYNIX/ptx autoconfiguration and device-naming protocol. It also includes support for on-line insertion and removal of any shared devices.
Cluster formation simplified. With ptx/CLUSTERS V2.x, a cluster is formed based on the ability of all nodes to communicate with one another, not on a predeclared shared storage subsystem. Because all storage (disk devices, tapes) may be physically connected in a Fibre Channel topology regardless of its intended usage, the sharing model in ptx/CLUSTERS V2.x is imposed by software and not based upon physical topology as it was in ptx/CLUSTERS V1.x. A node needs no shared storage to be a cluster member.
This section compares ptx/CLUSTERS V2.x with ptx/CLUSTERS V1.1 and V1.3 and details what is new, what has stayed the same, and what has been eliminated.
The following major features are new with ptx/CLUSTERS V2.x or have changed since previous releases of ptx/CLUSTERS. The ptx/CLUSTERS Administration Guide provides details about all of the features of ptx/CLUSTERS.
The CSCS component of ptx/CLUSTERS is a new feature that provides the basic mechanisms for reliable and coordinated communication among member nodes of a cluster. The CSCS replaces the Network and Cluster Management Area Monitor daemon (ncmd) of ptx/CLUSTERS V1.x with components (modules) inside the kernel. Features of the CSCS are the following:
Establishes communication links among potential cluster members.
Exchanges various node parameter values, such as node index, node votes, expected votes, and quorum disk information. Determines whether these values make the node an acceptable cluster member.
Determines whether a quorum exists for cluster formation.
Coordinates transition of members into and out of the cluster.
Notifies the Integrity Manager daemon whenever a transition has been successfully completed so that the daemon can run transition notification scripts with the changed set of members.
The mechanisms implemented by the CSCS completely replace those of the V1.x releases of ptx/CLUSTERS for connecting active members of the cluster and protecting against cluster partitioning. The CSCS replaces the disk-based CMA mechanism of ptx/CLUSTERS V1.x for cluster configuration and membership specification with reliable group broadcast communication services.
The Integrity Manager daemon still controls and coordinates the activities of the member nodes in the cluster. Although its role has been greatly reduced because of services provided by the CSCS, the Integrity Manager daemon provides the following functions:
Monitors and reports cluster status.
Handles configuration change requests.
Performs other control functions that affect the cluster as a whole.
In ptx/CLUSTERS V2.x, the Integrity Manager daemon relies on the communication it receives from the CSCS. Because of the CSCS layer, the architecture of the Integrity Manager has changed significantly. These changes include the following:
Consolidation of the ncmd and the limd. In ptx/CLUSTERS V1.x, two user-level daemons, ncmd and limd, implemented the Integrity Manager's functions. The CSCS handles most of the functions provided by the ncmd, and the remaining functions of the ncmd are split between portions of DYNIX/ptx and the Integrity Manager daemon.
Removal of the CMA. With the introduction of the CSCS, the CMA is no longer needed for communication and maintenance of cluster membership and state. The CSCS uses a communication medium that consists of virtual circuits built over the dedicated LAN interfaces. The CSCS discovers nodes by probing on the dedicated LAN. The CSCS also manages cluster membership dynamically.
Elimination of maintenance mode. In ptx/CLUSTERS V1.x, the nodes that constituted the cluster and the LLI list were defined and stored in the CMA. Maintenance mode was needed to declare and manipulate these lists. In ptx/CLUSTERS V2.x, each node dynamically recognizes the presence of other nodes, making maintenance mode unnecessary.
The immbroker(1M) command has been replaced by a much simpler utility, clustadm(1M), to monitor and control a cluster. The following table compares the immbroker command to the clustadm command. Note that many of the options to immbroker are unnecessary in ptx/CLUSTERS V2.x.
immbroker Option | clustadm Equivalent Option or Other Explanation
-A listname | None. ptx/CLUSTERS V2.x does not maintain shared, master, or LAN lists.
-C listname | None. Same as explanation for -A.
-D listname | None. Same as explanation for -A.
-G listname | None. Same as explanation for -A.
-N nodename | Use -m nodename for short output. Use -vm nodename for verbose output, which includes the node name, node index, number of quorum votes contributed by the node, and when it joined the cluster. Use -l or -vl for short or verbose output about the local node.
-S | None. To remove a node from a cluster, the node must be completely shut down.
-Z | None. In ptx/CLUSTERS V2.x, there is no clusters driver to enable devices to be sharable or shared. Any storage device that is accessible by more than one member node is sharable.
-b | None. See explanation for -Z.
-c | Use -m all for short output. Use -vm all for verbose output, which includes the node names, node indexes, number of quorum votes contributed by each node, and when each node joined the cluster.
-d | None. ptx/CLUSTERS V2.x supports the DYNIX/ptx V4.4.x devctl(1M) command for deconfiguring (spinning down) sharable and nonsharable devices.
-e | None. ptx/CLUSTERS V2.x does not maintain shared, master, or LAN lists.
-f | None. See explanation for -e.
-i | None.
-l | -l for short output. -vl for verbose output, which includes the node name, node index, number of quorum votes contributed by the node, and when it joined the cluster.
-m | None.
-n | -m all for short output. -vm all for verbose output, which includes the node names, node indexes, number of quorum votes contributed by each node, and when each node joined the cluster.
-o objectname | None.
-p plexname | None.
-q | -c (for short output) and -vc (for verbose output) each include "quorum state" information. There are no Integrity Manager states, such as Halted, Maintenance, and Normal.
-r diskname | None. In ptx/CLUSTERS V2.x, there is no clusters driver to enable devices to be sharable or shared; any device on the system may be sharable as a direct result of its physical proximity.
-s | None. In ptx/CLUSTERS V2.x, a node joins the cluster while it is booting, usually before it reaches single-user mode.
-t | None. There is no CMA concept in ptx/CLUSTERS V2.x.
-u diskname | None. ptx/CLUSTERS V2.x supports the DYNIX/ptx V4.4 devctl(1M) command for configuring (spinning up) sharable and nonsharable devices.
-v | Same as for immbroker.
-w | None.
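For example, the following clustadm invocations, using only options listed in the table above, show typical status queries (output formats are not reproduced here):

    clustadm -vm all     # verbose membership information for every node
    clustadm -vl         # verbose information about the local node
    clustadm -vc         # verbose cluster status, including quorum state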
The clustadm command also lets you set kernel configuration parameters that provide bootstrap information. (During the installation process, you will be required to set these parameters.) The options are listed as follows:
You can optionally configure a quorum disk. The quorum disk concept is discussed in the next section.
ptx/CLUSTERS V2.x implements a quorum consensus algorithm that the CSCS uses to control cluster availability. This algorithm requires that a majority of the potential cluster nodes be fully connected to enable cluster operation. If the number of nodes available is one-half of the expected cluster membership or fewer, then none of the nodes will function as cluster members with access to shared storage. The expected cluster membership is specified by the user during initial cluster setup and increments as new nodes join the cluster.
To avoid cessation of activity in a two-node cluster when a single node fails, a quorum disk can be designated as a virtual cluster member. Its ``vote,'' along with the remaining node's, enables the cluster to remain running (with a single node). The quorum disk, which is simply a single partition on a non-mirrored disk, is designated by the user after the cluster has formed.
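To illustrate the majority rule described above, the following example vote counts are illustrative only and assume that each node and the quorum disk contribute one vote each:

    Two nodes, no quorum disk:    expected votes = 2, quorum requires 2 votes,
                                  so the loss of either node halts cluster operation.
    Two nodes plus quorum disk:   expected votes = 3, quorum requires 2 votes,
                                  so one node plus the quorum disk keeps the cluster running.
    Four nodes, no quorum disk:   expected votes = 4, quorum requires 3 votes,
                                  so the cluster survives the loss of a single node.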
The Low Latency Interconnect (LLI) has been renamed the Cluster Communications Interconnect (CCI). The CCI serves the same basic communication purpose as the LLI. The new name more accurately reflects the use and function of the interconnect.
The way errors are handled and logged has changed in ptx/CLUSTERS V2.x. All cluster warnings and errors are logged to ktlog. In addition, changes to the cluster membership are logged to /var/clusters/trans_log, and changes to a node's application availability are logged to /var/clusters/avail_log.
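For example, to review recent cluster activity on a node, you might inspect these logs with standard commands (a hedged sketch; log formats are not reproduced here):

    tail /var/clusters/trans_log     # recent cluster membership changes
    tail /var/clusters/avail_log     # recent application-availability changes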
ptx/CLUSTERS V2.x supports the use of Fibre Channel as the cluster storage interconnect on NUMA systems, significantly increasing the number of shared storage devices. ptx/CLUSTERS V2.x also supports SCSI-based Symmetry systems.
In ptx/CLUSTERS V2.x, a local node can have the following states (or modes):
Unlike previous versions of ptx/CLUSTERS, V2.x does not have Normal, Halted, Maintenance, and InTransition modes.
Normal mode is replaced by ACTIVE mode.
Halted mode is replaced by NO_QUORUM mode, which the system automatically assigns to a node.
Maintenance mode has no equivalent in V2.x. A node does not need to be in a special state, with the remaining nodes halted, in order to perform configuration changes.
The InTransition state is no longer visible outside the kernel components of the ptx/CLUSTERS V2.x software. As in previous versions of ptx/CLUSTERS, InTransition mode occurs whenever cluster membership changes.
In ptx/CLUSTERS V2.x, cluster software startup is invoked much earlier in the system boot sequence of any node configured as a member of a cluster. A node on which ptx/CLUSTERS V2.x has been installed is expected to become a cluster member as soon as it boots. Cluster formation is attempted as early as possible in the boot sequence so that the nodes can provide the necessary votes for the cluster to reach and maintain quorum and participate in shared-device coordination protocols and other distributed decisions.
A node will wait until it becomes a cluster member before running the /etc/rc2.d scripts to go to multiuser mode.
The ptx/CLUSTERS V1.x shared-device naming and access mechanism (SVTOC), which was layered over the DYNIX/ptx device names and services, has been eliminated. ptx/CLUSTERS V2.x now has mechanisms for device identification and naming that support device access through Fibre Channel Interconnects. Each shared device is assigned a unique identifier, which forms the basis for access locking on each device. These identifiers are guaranteed to be recognized by all nodes of a cluster. The identifiers are unique and remain consistent whether the device is connected to a cluster or to a single node.
Since ptx/CLUSTERS V2.x eliminates the separate shared device namespace (shqd--) and its associated shared object lists, the process of configuring and setting up a cluster is greatly simplified.
The following features were part of ptx/CLUSTERS V1.x and are also part of ptx/CLUSTERS V2.x. Though they may have been modified for use with the new software, their external interfaces are unchanged.
The Lock Manager is a kernel-level component that supports coordination of application access to shared resources. Lock Manager errors in ptx/CLUSTERS V2.x are reported in the new system-standard format.
You can assign a unique domain name for each application that you plan to run on a cluster to avoid potential resource-name conflicts.
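As a hedged illustration, the lmdomain options mentioned elsewhere in these notes can be used to inspect domains once they are configured; see the ptx/CLUSTERS Administration Guide for how domain names are assigned to applications.

    lmdomain -l     # detailed domain listing (also the default output with no options)
    lmdomain -D     # list the existing domains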
In ptx/CLUSTERS V2.x, it is no longer necessary to make all cluster configuration changes on one node while the other cluster members are halted. For the following reasons, ptx/CLUSTERS V2.x reliably maintains synchronization among all cluster members without interruption of service:
The amount and types of state information have been simplified.
The communication mechanisms, implemented through the CSCS, guarantee synchronous communication among cluster members.
The Active Monitor has been replaced in ptx/CLUSTERS V2.x by a more dynamic mechanism for selecting the node responsible for coordinating each cluster transition. This process is transparent to the administrator and the user.
In ptx/CLUSTERS V2.x, all usage of the CMA for communication and maintenance of cluster membership and state has been replaced by the CSCS.
Shared device names (sh---) are no longer required for sharable devices. Users who wish to migrate from ptx/CLUSTERS V1.x to V2.x, however, can retain the existing sh--- device names if they wish.
For instructions on how to update ptx/CLUSTERS, DYNIX/ptx, and other products running on Symmetry systems, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
Only IBM NUMA personnel are authorized to perform initial cluster installations and to upgrade NUMA 2000 clusters. Customer Support or Professional Services personnel who install new clusters or update NUMA 2000 clusters should follow the procedures in the ptx/CLUSTERS V2.x Installer's Guide and in the DYNIX/ptx and Layered Products Software Release Notes for installation and configuration.
Normally, when changing node IDs in a cluster, you need to reboot only the node whose ID you are changing. However, because of a defect in the software (problem report 235185), after changing the ID of one or more nodes, you must reboot all nodes.
To change the node ID, follow these steps:
Issue the clustadm -P nodeid=value command, where value is the new node ID (an integer between 0 and 7, inclusive). Issue this command on each node whose ID you wish to change.
Shut down all cluster nodes. The recommended procedure is to first bring all the nodes to run-level 1, and then bring them to the firmware level.
Start the cluster nodes back up.
Failure to follow this procedure can cause the same node to appear multiple times in clustadm output and may cause the Lock Manager to hang.
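A hedged sketch of step 1 follows; the node ID value shown is an example only.

    # On each node whose ID is to change (here, assigning node ID 3):
    clustadm -P nodeid=3

    # Then shut down ALL cluster nodes (run level 1, then firmware) and
    # boot them back up, as described in steps 2 and 3 above.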
If you wish to place a disk containing a VTOC under ptx/SVM control and use the disk in a cluster, you must ensure that each member node's /etc/devtab file contains a VTOC entry for that disk and then issue the devbuild command to create the virtual devices included in that VTOC on all nodes in the cluster.
If you build the VTOC for a disk on one node (where the disk will be recognized as a ``sliced'' disk), but not on the other node(s) (where the disk will be recognized as a ``simple'' disk), then the ptx/SVM shared disk groups will not match across the cluster and you will not be able to use them.
In ptx/CLUSTERS V1.x, if you built a VTOC on a shared device from one of the nodes, the disk's slices were then available on all of the cluster nodes. In ptx/CLUSTERS V2.x, the remaining node(s) will not be aware of the existence of the VTOC slices if you build a VTOC on a shared device from only one of the nodes.
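The following is a minimal sketch of the procedure described above; the devbuild argument form is an assumption, so consult the devbuild documentation for the exact syntax.

    # On EACH cluster node:
    # 1. Confirm that /etc/devtab contains the VTOC entry for the shared disk.
    # 2. Build the virtual devices for that VTOC so its slices are known on
    #    this node (the device name below is a placeholder):
    devbuild sd3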
There are several situations in which it is necessary for the Integrity Manager to reboot a cluster member node. In these situations, the node has become unable to safely resume access to shared storage. The Integrity Manager invokes the kernel panic mechanism to prevent any further user-level activity that might require access to shared storage and to bring the node most rapidly back into cluster membership. The panic messages used, and their causes, are the following:
Taking this node out of the cluster, as some critical transition script has failed
One example of a transition-script failure that warrants a system shutdown is when the lmrecovery script fails. If lmrecovery fails, it could mean that the Lock Manager is disabled on all nodes of the cluster until the problem is fixed. When lmrecovery terminates abnormally on a node, that node is shut down and will normally reboot in order to restore the normal operation of the cluster.
Lost the qdisk to a partition node %d
This message indicates that a cluster with a quorum disk had CCI communication problems. The node that shut down lost connectivity with the other node(s) and when it read the quorum disk, found that it had been removed from the set of active member nodes.
Normally, a node that loses CCI communications enters a NO QUORUM state. However, in this situation, if the node enters a NO QUORUM state and the other node(s) are rebooted, then there is the potential for the node to regain QUORUM with its old state. This would cause data corruption, so the node is shut down instead. You must then address the communication problem(s) and reboot the node in order for it to again become an active member of the cluster.
Forcing a system panic - This node out of sync with the rest of the cluster
This panic message means the same as the previous panic message, except that the quorum disk is not involved. The node that shut down discovered through CCI communication that the other node(s) had formed a new cluster without it. Because its state is now invalid, the node shut down.
To remove a node from a cluster, follow these steps:
Shut down the node you wish to disconnect from the cluster and power it off.
Disconnect all shared storage from the node to be removed from the cluster.
Disconnect the node from the CCI networks.
Boot the node you wish to remove from the cluster. Go to single-user mode, either with the bootflags or by entering s at the "Waiting for cluster membership, enter 's' to go to single-user mode" prompt.
Through ptx/ADMIN, deinstall the ptx/CLUSTERS software. For information on how to deinstall software, see the DYNIX/ptx and Layered Products Software Installation Release Notes.
ATTENTION To avoid destroying or corrupting data, do not remove the ptx/CLUSTERS software before detaching the node from all shared storage.
On the remaining nodes, reset the number of expected votes to equal the number of remaining nodes, plus one vote for the quorum disk if one is configured (see the command sketch below).
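The clustadm -P form is documented earlier in these notes, but the exact parameter name for setting expected votes is not given here; the name below is a placeholder, so consult clustadm(1M) for the actual parameter.

    # On each remaining node (parameter name is a placeholder; the value 3
    # assumes three remaining voting members):
    clustadm -P <expected-votes-parameter>=3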
The following ptx/CLUSTERS V2.1.3 documentation is available on the online HTML documentation CD-ROM:
This section lists open problems in this release of ptx/CLUSTERS. The numbers in parentheses identify the problems in the problem-tracking system.
When a node index is changed on one cluster node and the node is restarted, the unchanged node has two entries for the node whose index has changed. One entry is for the original node index, and the other entry is for the new node index.
Workaround. See the section in these release notes entitled "Changing Cluster Node ID" for information on how to change a node's index.
ptx/CTC menus in ptx/ADMIN are removed if an updated version of ptx/CLUSTERS is installed and ptx/CTC is not reinstalled.
Workaround. Always install ptx/CLUSTERS and ptx/CTC together. If you have already installed ptx/CLUSTERS, install ptx/CTC from the CD-ROM so that the menus will reappear.
In a single-node cluster with a quorum disk configured, attempting to boot to single-user mode by changing the initdefault entry in /etc/inittab caused the node to loop with the following message:
Cannot satisfy request to go to run-level 1 because this node
does not have cluster quorum. Going to firmware instead.
To get to single-user mode, reboot from firmware.
Workaround. Do not set is:1:initdefault: in /etc/inittab. Instead, use the bootflag option to specify single-user mode.
During the upgrade procedure from ptx/CLUSTERS V1.x to ptx/CLUSTERS V2.x, the following error may be returned from the parser:
Creating ptx/CLUSTERS devices ...
Adding cscs to /installmnt/etc/services ...
Bad format for immbroker -G shared output in line 3
Line = disk
Workaround. None. This message can be safely ignored.
When you use devctl to change the name of a CCI device, ptx/CLUSTERS does not know the name has changed.
Workaround. Use clustadm to deconfigure and reconfigure the CCI device.
The clustadm -C (configure quorum disk) and clustadm -D (deconfigure quorum disk) commands may cause a node to hang when the node has lost quorum. The commands cannot be suspended or interrupted.
Workaround. Reboot the node, and make quorum disk configuration changes only while the node has quorum.
On large systems, a memory allocation failure within ptx/CLUSTERS may result in a system panic.
Workaround. Reboot the system.
When a quorum disk is configured, the VTOC, if it is not already in place, is built for the device on remote nodes by the kernel. However, this does not update the list of built devices at the user level.
Workaround. Execute the devbuild command on the node where devdestroy is failing; doing so updates the list of built devices. Then run devdestroy.
This problem occurs only if you are running a manufacturing kernel. When the system is under a heavy workload, the ktlog may fill with tracing messages from the Lock Manager. These messages look like the following:
386db400 00:00:00 tolog/nocons/printf q0/e3/p14086 ptx/CLUSTERS V2.1.3 MUTS
muts_health.c #224 MUTS_receive event=2033 debug: Rcvd from 0 on
0x3e15a8c0 GSN = 1896913
Workaround. Archive the ktlog regularly. Restrict use of the manufacturing kernel.