Chapter 11
Rolling Upgrade for Symmetry 5000 Clusters


Getting Ready


Software Versions

Rolling upgrades are supported only from DYNIX/ptx V4.4.4, V4.4.6, V4.4.7, V4.4.8, and V4.5.1. Software must be at the version levels indicated in Table 11-1. Use the ptx/ADMIN menu system to view the list of software packages currently installed: System Administration -> Software Management -> List Software Packages. Alternatively, view the /etc/versionlog file for the most recent installation dates and versions.
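
For a quick check from a shell on either node, you can page through the version log directly (a sketch only; the exact layout of the log can vary between releases):

    # more /etc/versionlog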

Table 11-1. Software Versions Before a Rolling Upgrade

Product                 V4.4.4 Stack    V4.4.6 Stack    V4.4.7/4.4.8 Stack    V4.5.1 Stack
DYNIX/ptx               V4.4.4          V4.4.6          V4.4.7                V4.5.1
ptx/BaseComms           V1.1.1          V1.1.1          V1.1.2                V1.2.0
ptx/CLUSTERS            V2.1.1          V2.1.2          V2.1.3                V2.2.1
ptx/SVM                 V2.1.1          V2.1.2          V2.1.3                V2.2.1
ptx/TCP/IP              V4.5.1          V4.5.2          V4.5.3                V4.6.1
ptx/LAN                 V4.5.4          V4.6.1          V4.6.2                V4.7.1
ptx/SPDRIVERS           V2.2.0          V2.3.0          V2.3.0                V3.1.0
ptx/CTC                 V1.1.2          V1.1.2          V1.1.2                V1.1.3
ptx/CFS                 V1.0.2          V1.0.3          V1.0.4                V1.1.1
ptx/RAID (CLARiiON)     V2.0.4          V2.0.4          V2.0.5                V2.1.0


Pre-Installation Tasks

Complete the pre-installation tasks described in Part 1 of the "Upgrade Checklist for a Single Node" in Chapter 1. Be sure to back up the ptx/SVM database as described in Chapter 2.


Naming Databases

Verify that the naming databases on both nodes are synchronized by comparing dumpconf output from each of the nodes. If you need to change a device name, use the devctl -n oldname newname command. See the devctl(1M) man page for more information.
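
One way to compare the databases is to capture the dumpconf output on each node and compare the two files. The file names shown here are examples only, and the copy step assumes a remote-copy utility such as rcp is available between the nodes; use whatever method your site permits.

On Node 1:

    # dumpconf > /tmp/naming.node1

On Node 2:

    # dumpconf > /tmp/naming.node2

Copy one file to the other node, then compare:

    # diff /tmp/naming.node1 /tmp/naming.node2

Resolve any differences in shareable device names before the upgrade begins.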


ATTENTION

If you change the name of a device, be sure that all references to that device (such as entries in the vfstab file, ORACLE data files, and ptx/SVM devices) now point to the new name of the device.
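
For example, to look for leftover references to a renamed device in the filesystem table, you can search for the old name (the name sd7 used here is hypothetical):

    # grep sd7 /etc/vfstab

Check ORACLE data files and ptx/SVM device references through their respective administrative interfaces.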



ATTENTION

Do not start the upgrade process until both databases are in sync.



Clusters Check

  1. Ensure that a quorum disk exists for the cluster. A quorum disk is necessary to maintain cluster activity when one of the nodes in a cluster is down. Check for a quorum disk with the following command:

    # clustadm -vc  
    Cluster ID = 5
    Cluster formation time = Sun Sep 1 11:24:14 1996
    Cluster generation number = 2
    Last transition time = Mon Sep 2 09:00:35 1996
    Expected votes = 3
    Current aggregate votes = 3
    Minimum votes for quorum = 2
    Quorum state = Quorum
    Quorum disk:
    Name = sd6s14
    Votes contributed = 1
    Clusterwide State = UP
    Local State = OWNED

    If a quorum disk is configured, the cluster failover process defined with ptx/CTC will take effect when the first node is taken down during the upgrade.

    If there is a quorum disk, skip Step 2 and go to the later section "Upgrade Node 1."

  2. If there is not a quorum disk, create and configure one by following these steps:

    1. If you do not already have a free type-1 partition that is at least 1 MB in size for use as the quorum disk, you will need to create a custom VTOC on one of the nodes to designate the appropriate partition. For information about creating a custom VTOC, see the chapter entitled "Disk Drive Management" in the DYNIX/ptx System Administration Guide.

    2. From one of the active nodes, configure the path of the quorum disk, using either the following command line or ptx/ADMIN. In the command, qdisk_name is the disk partition to be used for the quorum disk (for example, sd7s3). qdisk_name cannot be a fully-qualified pathname.

      # clustadm -C qdisk_name

      Once the quorum disk is configured on one cluster node, the other nodes will become aware of the quorum disk and the cluster will start using it automatically. You do not need to reboot any of the nodes for the quorum disk configuration to take effect.

    3. Check the /etc/devtab file on both nodes to verify the qdisk entry.
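
      As a quick cross-check, you can search for the entry on each node and then confirm that the cluster reports the quorum disk. This is a sketch only; it assumes the devtab entry is tagged qdisk, as described above.

      # grep qdisk /etc/devtab
      # clustadm -vc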


Upgrade Node 1


ATTENTION

During a rolling upgrade, a hung condition can occur when a devctl -A command is issued, either from the command line or by a startup script. Refer to the later section "Recover From a devctl -A Hang" for more information about this condition and a workaround.


  1. Upgrade the operating system and necessary layered products on Node 1. Refer to Chapter 5, "Upgrade Symmetry 5000 Systems Running ptx/SVM V2.x."


    ATTENTION

    Compile a new kernel, but do not reboot.


  2. Verify that the /installmnt/etc/vfstab file was updated with the location of the root disk (a verification sketch follows this procedure).

  3. Resolve any remaining file conflicts.

  4. Verify that the naming database on the alternate disk matches the naming database on the original root disk.

  5. Verify that the /var/tcp/ifnets and /var/tcp/ifaddrs files include the appropriate entries.

  6. Return the device where the installation was performed to the rootdg. See "Prepare to Reboot the System" in Chapter 5.

  7. Check the n0 boot string with /etc/bootflags. If it includes the -i option, remove the option.

  8. Shut down the operating system on Node 1 with the shutdown command.

  9. Set the boot path to boot the newly installed operating system to single-user mode. Be sure to specify the disk used for the installation.

    ---> bh osPath='2 slic(2)scsi(1)disk(0)'
  10. If you installed the QCIC or CSM software, download it as described in the QCIC or CSM release notes.

  11. Boot the operating system to single-user mode.

  12. If you have CTC objects defined in the database to fail over, use the following command on the other node to verify that the failover is complete.

    # /usr/ctc/bin/ctcadm -a getstatus
    Object <node_1_name> STOPPED
    Object <node_2_name> STARTED
  13. Perform a ROOT installation of any remaining layered products. See Chapter 8.

  14. Use the ptx/ADMIN menu system to compile a new kernel to include the new products. See Chapter 12, "Build a Custom Kernel." Reboot the operating system to single-user mode.

  15. Perform the post-installation tasks described in Chapter 14, "After the Installation."

  16. Shut down the operating system.

  17. Edit the boot path to allow Node 1 to boot to multiuser mode.

    ---> bh osPath='0 slic(2)scsi(1)disk(0)'
  18. Boot Node 1 to multiuser mode.

  19. On the Node 1 console, verify application failback (if CTC objects are defined in the database to fail over) and cluster rejoin.

    # /usr/ctc/bin/ctcadm -a getstatus
    Object <node_1_name> STARTED
    Object <node_2_name> STARTED

    # clustadm -vc

    Cluster ID = 5
    Cluster formation time = Sun Sep 1 11:24:14 1996
    Cluster generation number = 2
    Last transition time = Mon Sep 2 09:00:35 1996
    Expected votes = 3
    Current aggregate votes = 3
    Minimum votes for quorum = 2
    Quorum state = Quorum
    Quorum disk:
    Name = sd6s14
    Votes contributed = 1
    Clusterwide State = UP
    Local State = OWNED

    Also, check the location of the CTC database on both nodes. If the databases are in different locations, failover will not occur.

  20. Go to the Node 2 console.
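
The following sketch illustrates one way to perform the file checks called for in Steps 2 and 5 of the preceding procedure. The device name sd1s0 is a hypothetical example; substitute the disk on which the installation was actually performed.

To confirm that the new vfstab references the installation root disk (Step 2):

    # grep sd1s0 /installmnt/etc/vfstab

To review the network configuration entries (Step 5):

    # more /var/tcp/ifnets
    # more /var/tcp/ifaddrs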


Upgrade Node 2

  1. Upgrade the operating system and necessary layered products on Node 2. Refer to Chapter 5, "Upgrade Symmetry 5000 Systems Running ptx/SVM V2.x."


    ATTENTION

    Compile a new kernel, but do not reboot.


  2. Verify that the /installmnt/etc/vfstab file was updated with the location of the root disk.

  3. Verify that the naming database on the alternate disk matches the naming database on the original root disk.

  4. Verify that the /var/tcp/ifnets and /var/tcp/ifaddrs files include the appropriate entries.

  5. Resolve any remaining file conflicts.

  6. Return the device where the installation was performed to the rootdg. See "Prepare to Reboot the System" in Chapter 5.

  7. Check the n0 boot string with /etc/bootflags. If it includes the -i option, remove the option.

  8. Shut down the operating system on Node 2 with the shutdown command.

  9. Set the boot path to boot the newly installed operating system to single-user mode. Be sure to specify the disk used for the installation.

    ---> bh osPath='2 slic(2)scsi(1)disk(0)'
  10. If you installed new versions of the QCIC or CSM software, download it as described in the QCIC or CSM release notes.

  11. Boot the operating system to single-user mode.


    ATTENTION

    Upon booting to single-user mode, the naming databases of both nodes will be at the V4.5.2 level and will synchronize with each other automatically. The names of all shareable devices will be propagated automatically from the new V4.5.2 Node 1 naming database to the new Node 2 database.


  12. If you have CTC objects defined in the database to fail over, use the following command on the other node to verify that the failover is complete.

    # /usr/ctc/bin/ctcadm -a getstatus
    Object <node_1_name> STARTED
    Object <node_2_name> STOPPED
  13. Perform a ROOT installation of any remaining layered products. See Chapter 8.

  14. Compile a new kernel to include the new products. See Chapter 12, "Build a Custom Kernel." Boot the operating system to single-user mode.

  15. Perform the post-installation tasks described in Chapter 14, "After the Installation."

  16. Shut down the operating system.

  17. Edit the boot path to allow Node 2 to boot to multiuser mode.

    ---> bh osPath='0 slic(2)scsi(1)disk(0)'
  18. Boot Node 2 to multiuser mode.

  19. On the Node 2 console, verify application failback (if CTC objects are defined in the database to fail over) and cluster rejoin.

    # /usr/ctc/bin/ctcadm -a getstatus

    Object <node_1_name> STARTED
    Object <node_2_name> STARTED

    # clustadm -vc

    Cluster ID = 5
    Cluster formation time = Sun Sep 1 11:24:14 1996
    Cluster generation number = 2
    Last transition time = Mon Sep 2 09:00:35 1996
    Expected votes = 3
    Current aggregate votes = 3
    Minimum votes for quorum = 2
    Quorum state = Quorum
    Quorum disk:
    Name = sd6s14
    Votes contributed = 1
    Clusterwide State = UP
    Local State = OWNED

    Also, check the location of the CTC database on both nodes. If the databases are in different locations, failover will not occur.


Reestablish the Root and Swap Mirrors

Do not perform this procedure until you are sure that the cluster is up and your applications are running correctly on the new operating system.


ATTENTION

After performing this procedure, you can no longer return the cluster to the V4.4.x environment.


To reestablish the root and swap mirrors, complete these steps:

  1. On Node 1, reestablish the mirrors as described under "Mirror the Original Root Partition to the Upgraded Root Volume" in Chapter 5. You do not need to reboot the system.

  2. On Node 2, reestablish the mirrors as described under "Mirror the Original Root Partition to the Upgraded Root Volume" in Chapter 5. You do not need to reboot the system.

On the next reboot, the original root disk will be restored as the root disk.


Recover From a devctl -A Hang

During a rolling upgrade, a hung condition can occur when a devctl -A command is issued, either from the command line or by a startup script. This condition has been observed during the upgrade of the first node when an initial installation of the operating system failed and the node was rebooted or the installation was retried. It can also occur if the second node, which is still running the old operating system, must be rebooted for any reason during the upgrade.

To recover from the hung condition so that the devctl -A command may complete, perform the following steps.

From the node running the older version of the operating system (Node 2):

  1. Determine the name of the quorum disk:

    # clustadm -vc
  2. Remove the quorum disk from the cluster configuration:

    # clustadm -D
  3. Clear all data from the quorum disk partition:

    # dd if=/dev/zero of=/dev/rdsk/qdisk_name bs=8k
  4. Restore the quorum disk:

    # clustadm -C qdisk_name
  5. The hung devctl -A process on Node 1 should now complete; you do not need to reissue the command.

  6. Return to the upgrade procedure.