System situations

The system-level predefined situations are described below in alphabetical order. You can access the description of a specific situation by selecting its name in the Contents tab under situations in the OMEGAMON XE on z/OS section of the help.

Renamed situations

The names of several pre-existing situations have been changed in version 4.1.

Several of the OMEGAMON XE for UNIX System Services predefined situations were renamed when the product was merged with OMEGAMON XE on z/OS. The old and new names are shown in the following table. If you are migrating from OMEGAMON XE for UNIX System Services, existing situations are migrated when support for OMEGAMON XE on z/OS is installed into the hub monitoring server. You may continue to use those situations, in which case you should not start the OMEGAMON XE on z/OS versions. Alternatively, you can delete the original situations and use the OMEGAMON XE on z/OS versions.

OMEGAMON XE for UNIX System Services  OMEGAMON XE on z/OS
Check_Missing_Mount_Point             Check_Missing_UNIX_Mount_Point
Excess_Kernel_CPU_Time                Excess_UNIX_Kernel_CPU_Time
Excess_Process_UNIX_Run_Time          Excess_Process_UNIX_Run_Time
Excess_UNIX_System_Time               Excess_UNIX_System_Time
Excess_UNIX_User_Time                 Excess_UNIX_User_Time
ENQ_Contention_Critical               UNIX_ENQ_Contention_Critical
ENQ_Contention_Warning                UNIX_ENQ_Contention_Warning
File_System_Free_Space_Critical       UNIX_File_System_FreeSpace_Crit
File_System_FreeSpace_Warning         UNIX_File_System_Free_Space_Warn
Logged_On_User_Idle                   UNIX_Logged_On_User_Idle
Missing_inetd_Process                 Missing_UNIX_inetd_Process
Quiecsed_File_System                  Quiecsed_UNIX_File_System
Shortage_of_Processes_Critical        Shortage_of_UNIX_Processes_Crit
Shortage_of_Processes_Warning         Shortage_of_UNIX_Processes_Warn
Unwanted_inetd_Process                Unwanted_UNIX_inetd_Process

Four situations that monitor real storage have been renamed for version 4.1 to reflect the fact that z/OS no longer distinguishes between central and expanded storage. If you are running a mixed environment during a staged upgrade, the older situations will continue to work with V3.1 agents, but you must use the new versions with V4.1 agents. You cannot distribute the new situations to V3.1 agents. When you have completed your upgrade, you should replace and delete the older versions. (They will run, but they will never become true.)

Version 3.1 and before          Version 4.1
OS390_CentralAvailFrames_Crit   OS390_Available_Frames_Crit
OS390_CentralAvailFrames_Warn   OS390_Available_Frames_Warn
OS390_CentralOnlineFrames_Crit  OS390_Frames_Online_Crit
OS390_CentralOnlineFrames_Warn  OS390_Frames_Online_Warn

OS390_Unref_Interval_Cnt_Crit and OS390_Unref_Interval_Cnt_Warn situations are still shipped, but they have changed slightly to allow for the new table data format.

Note:
Situations which monitor expanded storage are no longer shipped with V4.1. For more information, see IBM Tivoli OMEGAMON XE on z/OS: User's Guide.

Because of architectural changes, the names of the situations listed in the following table have been shortened. You should replace the old situations with the renamed versions and delete the old ones. If you continue to use the older situations, when the situations evaluate to true, you will see correct data in the Initial Situation values column of the Situation Event Console, but not in the Current Situation column.

Old name                          New name
OS390_System_PageFault_Rate_Crit  OS390_System_PageFaultRate_Crit
OS390_System_PageFault_Rate_Warn  OS390_System_PageFaultRate_Warn
OS390_Channel_LPAR_Busy_Pct_Crit  OS390_Channel_LPAR_BusyPct_Crit
OS390_Channel_LPAR_Busy_Pct_Warn  OS390_Channel_LPAR_BusyPct_Warn
OS390_Cache_FastWrite_HitPt_Crit  OS390_Cache_FastWriteHitPt_Crit
OS390_Cache_FastWrite_HitPt_Warn  OS390_Cache_FastWriteHitPt_Warn
OS390_Common_PageDS_PctFull_Crit  OS390_Common_PageDSPctFull_Crit
OS390_Common_PageDS_PctFull_Warn  OS390_Common_PageDSPctFull_Warn
OS390_Tape_Permanent_Errors_Crit  OS390_Tape_Permanent_Error_Crit
OS390_Tape_Permanent_Errors_Warn  OS390_Tape_Permanent_Error_Warn
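
When auditing which situations are still distributed under an old name, the table of shortened names can be kept as a simple lookup. The following Python sketch is illustrative only: the names in the dictionary are copied from the table, but the helper new_situation_name is hypothetical and is not part of the product.

```python
# Old-to-new situation names, copied from the table of shortened names.
SHORTENED_NAMES = {
    "OS390_System_PageFault_Rate_Crit": "OS390_System_PageFaultRate_Crit",
    "OS390_System_PageFault_Rate_Warn": "OS390_System_PageFaultRate_Warn",
    "OS390_Channel_LPAR_Busy_Pct_Crit": "OS390_Channel_LPAR_BusyPct_Crit",
    "OS390_Channel_LPAR_Busy_Pct_Warn": "OS390_Channel_LPAR_BusyPct_Warn",
    "OS390_Cache_FastWrite_HitPt_Crit": "OS390_Cache_FastWriteHitPt_Crit",
    "OS390_Cache_FastWrite_HitPt_Warn": "OS390_Cache_FastWriteHitPt_Warn",
    "OS390_Common_PageDS_PctFull_Crit": "OS390_Common_PageDSPctFull_Crit",
    "OS390_Common_PageDS_PctFull_Warn": "OS390_Common_PageDSPctFull_Warn",
    "OS390_Tape_Permanent_Errors_Crit": "OS390_Tape_Permanent_Error_Crit",
    "OS390_Tape_Permanent_Errors_Warn": "OS390_Tape_Permanent_Error_Warn",
}


def new_situation_name(name: str) -> str:
    """Return the shortened name for an old situation, or the name unchanged."""
    return SHORTENED_NAMES.get(name, name)
```

A name that is not in the table, such as Crypto_No_Coprocessors, is returned as is.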

Check_Missing_UNIX_Mount_Point

Check_Missing_UNIX_Mount_Point raises a Critical alert when a specified mount point is missing. The formula is:

IF *MISSING USS_Mounted_File_Systems.Mount_Point *EQ ( / )

If this alert is raised, a system programmer should be notified.
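
The *MISSING test in the formula above can be read as "the listed value does not appear in any collected row". The Python sketch below illustrates that reading only; it assumes that interpretation of *MISSING, and the function missing_values and the sample mount points are hypothetical.

```python
def missing_values(expected, observed):
    """Return the expected values that do not appear among the observed ones."""
    seen = set(observed)
    return [value for value in expected if value not in seen]


# Hypothetical sample of Mount_Point values reported for one system.
mount_points = ["/", "/etc", "/tmp"]

# An empty result means nothing is missing, so the situation would not be true.
root_missing = missing_values(["/"], mount_points)

# A non-empty result lists the mount points that would raise the alert.
user_missing = missing_values(["/u"], mount_points)
```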

Crypto_CKDS_80PCT_Full

Crypto_CKDS_80PCT_Full monitors the Cryptographic Key Dataset (CKDS) and issues a Critical alert when it reaches 80% or more of its maximum capacity.

The formula is:

VALUE ICSF.Status EQ Active AND VALUE ICSF.CKDS_80Full EQ Yes

The CKDS is a VSAM linear dataset used to store encryption and authorization keys. If the dataset is at 80% or more of its maximum capacity, a new dataset should be created using a new master key and all keys contained in the dataset should be re-enciphered into the new dataset. The name of the current CKDS is shown in the CKDSname attribute. Refer to the ICSF Administration Guide for further details.

Crypto_CKDS_Access_Disabled

Crypto_CKDS_Access_Disabled monitors the Cryptographic Key Dataset (CKDS) and raises a Warning alert if access has been disabled.

The formula is:

VALUE ICSF.CKDSAccess EQ Disabled

The CKDS is a VSAM linear dataset used to store encryption and authorization keys. Access is normally disabled when a new master key or CKDS is being initialized. This interruption is temporary and access is enabled after key management operations are completed.

Crypto_Internal_Error

Crypto_Internal_Error monitors for internal errors and issues a Critical alert if one is detected.

The formula is:

VALUE ICSF.MonStatus NE Enabled

Contact IBM Support with the event attributes to report the error and for assistance in correcting the problem. MonStatus = Overrun indicates that an internal queue overflow has been detected. SCEDisabled > 0 indicates that one or more service call exits have ABENDed and are no longer collecting performance data.

Crypto_Invalid_Master_Key

Crypto_Invalid_Master_Key monitors for the existence of a valid master key and raises a Critical alert if none is detected.

The formula is:

VALUE ICSF.Status EQ Active AND VALUE ICSF.CCMKeyOK EQ No

A valid master key must be loaded into at least one of the cryptographic coprocessors. Use the ICSF ISPF dialog, TKE, or the system element to load the master key into each cryptographic coprocessor. A different master key may be loaded into coprocessors shared by PR/SM logical partitions. Each LPAR is associated with a separate Domain Index to isolate cryptographic keys. For PCI coprocessors, the master key must be the same value as the symmetric-keys master key (SYM-MK).

Crypto_Invalid_PKA_Master_Keys

Crypto_Invalid_PKA_Master_Keys monitors for the existence of a valid Key Management Master Key (KMMK) and a valid Signature Master Key (SMK) and raises a Critical alert if either is invalid or missing.

The formula is:

VALUE ICSF.PKAMKeys EQ Invalid

If this situation is raised, ensure that the KMMK and SMK are loaded into each coprocessor. For PCI coprocessors, the SMK key must be the same value used for the asymmetric-keys master key (ASYM-MK). Use the KMMK and SM attributes to validate the values of the verification hash patterns for these keys.

Crypto_No_Coprocessors

Crypto_No_Coprocessors monitors for cryptographic coprocessors and raises a Critical alert if none is online.

The formula is:

VALUE ICSF.Status EQ Active AND VALUE ICSF.1_CC EQ No

At least one cryptographic coprocessor must be online for cryptographic services to become available. Verify that at least one coprocessor has been configured for the z/OS system. Use the System Element console to configure the coprocessors for use by systems.

Crypto_No_PCI_Coprocessors

Crypto_No_PCI_Coprocessors monitors for PCI cryptographic coprocessors and raises a Warning alert if none is detected.

The formula is:

VALUE ICSF.1_PCI EQ No

Several Public Key Algorithm (PKA) service calls will not function without a PCI coprocessor available. Because PCI coprocessors are optimized for PKA operations, PKA services will run slower on CMOS coprocessors.

Crypto_PCI_Unavailable

Crypto_PCI_Unavailable monitors for PCI coprocessors and raises a Critical alert if one is detected but is not online or active.

The formula is:

VALUE ICSF.1_PCI EQ Yes AND VALUE ICSF.PCIStatus NE Active

PCI coprocessors are optimized for Public Key Algorithm (PKA) operations, which will run slower on CMOS coprocessors. Also, several PKA services will not run without a PCI coprocessor available.

Crypto_PKA_Services_Disabled

Crypto_PKA_Services_Disabled monitors PKA service calls and raises a Warning alert if the service calls are disabled.

The formula is:

VALUE ICSF.Status EQ Active AND VALUE ICSF.PKACall EQ Disabled

Disable the services only to update the PKA Key Management Master Key (KMMK) or Signature Master Key (SMK), or to manage the Public Key Dataset (PKDS). Enable PKA service calls only after PKA management operations are completed.

Crypto_PKDS_Read_Disabled

Crypto_PKDS_Read_Disabled monitors the status of the Public Key Dataset (PKDS) and issues a Warning alert if read operations have been disabled.

The formula is:

VALUE ICSF.PKDSRead EQ Disabled

The PKDS is a VSAM dataset used to store Public Key Algorithm (PKA) keys used for encryption and authentication. Read operations may be temporarily disabled for management operations on the PKDS. Read access to the PKDS is restored following completion of management operations. The PKDSname attribute displays the name of the current PKDS.

Crypto_PKDS_Write_Disabled

Crypto_PKDS_Write_Disabled monitors the status of the Public Key Dataset (PKDS) and issues a Warning alert if write operations have been disabled.

The formula is:

VALUE ICSF.PKDSWrite EQ Disabled

The PKDS is a VSAM dataset used to store Public Key Algorithm keys used for encryption and authentication. Write access to the dataset may be temporarily disabled to allow key management operations to occur. Write access should be enabled following completion of PKDS key management operations. The PKDSname attribute displays the name of the current PKDS.

Crypto_Service_Unavailable

Crypto_Service_Unavailable monitors the status of cryptographic services and raises a Critical alert if they are unavailable.

The formula is:

VALUE ICSF.CryptoSvcs EQ Inactive

If this situation is raised, verify that the ICSF subsystem is running on this system. If the ICSF subsystem is active, ensure that cryptographic coprocessors are online and available to this system. Also verify that a valid master key has been loaded in each coprocessor configured for this system.

Excess_Process_UNIX_Run_Time

Excess_Process_UNIX_Run_Time detects when a process exceeds 50% UNIX run time. The formula is:

IF *VALUE USS_Processes.UNIX_Run_Time% *GT 50

CPU utilization attributable to UNIX work for the indicated process exceeds the threshold defined as excessive. This condition might not be a matter of immediate concern. It may indicate that the process is in a loop requesting UNIX System Services.

Excess_UNIX_Kernel_CPU_Time

Excess_UNIX_Kernel_CPU_Time detects when the UNIX kernel is using more than 50% CPU. The formula is:

IF *VALUE USS_Kernel.CPU% *GT 50

UNIX System Services kernel CPU utilization exceeds the threshold defined as excessive. This condition might not be a matter of immediate concern. It may indicate that a process or address space is in a loop requesting kernel services. Look for processes or address spaces with abnormally high CPU utilization.

Excess_UNIX_System_Time

Excess_UNIX_System_Time detects when a dubbed address space exceeds 50% UNIX system time. The formula is:

IF *VALUE USS_Address_Spaces.UNIX_System_Time% *GT 50

CPU utilization attributable to execution of z/OS UNIX System Services kernel code exceeds the threshold currently defined as excessive. This condition might not be a matter of immediate concern. It may indicate that the address space is in a loop requesting UNIX System Services.

Excess_UNIX_User_Time

Excess_UNIX_User_Time detects when a dubbed address space exceeds 50% UNIX user time. The formula is:

IF *VALUE USS_Address_Spaces.UNIX_User_Time% *GT 50

Address space CPU utilization attributable to UNIX work exceeds the threshold currently defined as excessive. This condition might not be a matter of immediate concern. It may indicate that the address space is in a loop.

INET_Max_Sockets_Critical

INET_Max_Sockets_Critical detects when the percentage of Internet sockets in use has reached 95%. The formula is:

*IF *VALUE USS_Kernel.ISock_Curr_Pct *GE 95.0

This situation indicates when Internet socket usage is near the maximum. If the maximum is reached, UNIX System Services will not be accessible by Internet connections. Help from a system programmer is needed immediately. Consider increasing the value of NETWORK DOMAINNAME(AF_INET) MAXSOCKETS(). Use the SETOMVS RESET command to change the MAXSOCKETS value dynamically or, to make a permanent change, edit the BPXPRMxx member in SYS1.PARMLIB.

INET_Max_Sockets_Warning

INET_Max_Sockets_Warning detects when the percentage of internet sockets in use is between 80 and 95%. The formula is:

*IF *VALUE USS_Kernel.ISock_Curr_Pct *GE 80.0 *AND
*VALUE USS_Kernel.ISock_Curr_Pct *LT 95.0

This situation indicates when Internet socket usage is approaching the maximum. If the maximum is reached, UNIX System Services will not be accessible by Internet connections. Consider increasing the value of NETWORK DOMAINNAME(AF_INET) MAXSOCKETS(). Use the SETOMVS RESET command to change the MAXSOCKETS value dynamically or, to make a permanent change, edit the BPXPRMxx member in SYS1.PARMLIB.
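
The INET_Max_Sockets_Warning and INET_Max_Sockets_Critical thresholds form two adjacent bands over the same attribute, so at most one of the two situations is true for a given value. A minimal Python sketch of the banding follows; the function name and the "OK" label for values below 80% are hypothetical, and only the thresholds come from the formulas.

```python
def inet_socket_severity(isock_curr_pct: float) -> str:
    """Classify a USS_Kernel.ISock_Curr_Pct value using the shipped thresholds."""
    if isock_curr_pct >= 95.0:
        return "Critical"   # INET_Max_Sockets_Critical: *GE 95.0
    if isock_curr_pct >= 80.0:
        return "Warning"    # INET_Max_Sockets_Warning: *GE 80.0 *AND *LT 95.0
    return "OK"             # neither situation is true
```

The same banding applies to the INET6 situations, which test I6Sock_Curr_Pct against the same limits.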

INET6_Max_Sockets_Critical

INET6_Max_Sockets_Critical detects when the percentage of Internet V6 sockets in use has reached 95%. The formula is:

*IF *VALUE USS_Kernel.I6Sock_Curr_Pct *GE 95.0

This situation indicates when Internet V6 socket usage is near the maximum. If the maximum is reached, UNIX System Services will not be accessible by Internet connections. Help from a system programmer is needed immediately. Consider increasing the value of NETWORK DOMAINNAME(AF_INET6) MAXSOCKETS(). Use the SETOMVS RESET command to change the MAXSOCKETS value dynamically or, to make a permanent change, edit the BPXPRMxx member in SYS1.PARMLIB.

INET6_Max_Sockets_Warning

INET6_Max_Sockets_Warning detects when the percentage of Internet V6 sockets in use is between 80 and 95%. The formula is:

*IF *VALUE USS_Kernel.I6Sock_Curr_Pct *GE 80.0 *AND
*VALUE USS_Kernel.I6Sock_Curr_Pct *LT 95.0

This situation indicates when Internet V6 socket usage is approaching the maximum. If the maximum is reached, UNIX System Services will not be accessible by Internet connections. Consider increasing the value of NETWORK DOMAINNAME(AF_INET6) MAXSOCKETS(). Use the SETOMVS RESET command to change the MAXSOCKETS value dynamically or, to make a permanent change, edit the BPXPRMxx member in SYS1.PARMLIB.

KM5_Avail_CSA_Warning

KM5_Avail_CSA_Warning raises a Warning alert when CSA or ECSA Available Free Storage bytes goes below a desired threshold, as it may not be possible to start a workload that is required to meet operational demands. The formula is:

KM5_Common_Storage_SubKey.Record_Type *EQ Summary
*AND KM5_Common_Storage_SubKey.Area *EQ CSA
*AND KM5_Common_Storage_SubKey.Free_Storage_Total *LT 1024
*OR KM5_Common_Storage_SubKey.Record_Type *EQ Summary
*AND KM5_Common_Storage_SubKey.Area *EQ ECSA
*AND KM5_Common_Storage_SubKey.Free_Storage_Total *LT 4096

Examine the running workload (address spaces) and determine whether a workload of a lower priority that is utilizing ECSA or CSA can be stopped to free up the storage for higher priority workload.

To examine common storage usage by address space, navigate to the Address Space Overview workspace and from the Address Space Counts view, link to the Address Space Common Storage - Active Users workspace. In this last workspace, the table view can be sorted by the "CSA/ECSA In Use" columns.
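
The KM5_Avail_CSA_Warning formula combines two groups of conditions, one for CSA and one for ECSA. The Python sketch below shows how a single row would be evaluated, assuming that *AND binds more tightly than *OR so that each group is tested as a unit. The function name is hypothetical, and the record type "Detail" used in the note after the code is an invented sample value.

```python
def avail_csa_warning(record_type: str, area: str, free_storage_total: float) -> bool:
    """Evaluate the KM5_Avail_CSA_Warning conditions for one row."""
    is_summary = record_type == "Summary"
    # First group: summary row for CSA with free storage below 1024.
    low_csa = is_summary and area == "CSA" and free_storage_total < 1024
    # Second group: summary row for ECSA with free storage below 4096.
    low_ecsa = is_summary and area == "ECSA" and free_storage_total < 4096
    return low_csa or low_ecsa
```

Under this reading, a row with record type "Detail" never satisfies the formula, whatever its free storage value.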

KM5_CPU_Loop_Warn

This situation detects when the value of all CPU, zIIP, zIIP on CP, zAAP, and zAAP on CP using and waiting counts, divided by total sample count, is greater than 95.0%.

The formula is:

*IF *VALUE Address_Space_Bottlenecks.CPU_Loop_Index *GT 95.0

This situation indicates that address space CPU usage is high. A high value indicates either that the address space is in a CPU loop or that it is in a very CPU-intensive phase of processing. Examine each address space. If the application is known to be a heavy CPU user, then continue monitoring it. It may not actually be in a loop. If the address space is using unexpectedly large amounts of CPU, it is a candidate for cancellation. Note that very CPU intensive jobs may read high without being in a loop, so the CPU Loop Index value is a guide, not a guarantee.

This situation is refreshed once every 5 minutes, but a minimum of 10 minutes is used to calculate the loop index value. For any address space that reaches the 90% loop index value after 10 minutes and is considered to be of either medium or low importance, the calculation is extended over longer time periods to avoid false positives. The extended refresh period and the workload importance considerations make false positives unlikely.
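
The loop index described above is a ratio of sample counts expressed as a percentage. The following Python sketch shows that arithmetic only; the function name and the sample counts are hypothetical, and the sketch does not model the extended calculation periods or the workload importance handling.

```python
def cpu_loop_index(using_and_waiting_counts, total_samples: int) -> float:
    """Sum the using and waiting counts and express them as a percentage of samples."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return 100.0 * sum(using_and_waiting_counts) / total_samples


# Hypothetical counts for CPU, zIIP, zIIP on CP, zAAP, and zAAP on CP.
counts = [560, 20, 5, 0, 0]
index = cpu_loop_index(counts, total_samples=600)

# The situation is true when the index is greater than 95.0.
is_loop_candidate = index > 95.0
```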

KM5_HDSP_Pct_Busy_Warning

KM5_HDSP_Pct_Busy_Warning raises a Warning alert when overall System MVS Percent utilization on a system running with HiperDispatch Management enabled exceeds the specified threshold. In a system running with HiperDispatch Management, the Logical Processor Parked Time is accounted for in the calculation of MVS Percent utilization. In such a system, a high (for example, greater than 95%) overall System MVS Percent utilization may be an indication of latent demand.

The formula is:

*IF *VALUE HiperDispatch_Management.System_MVS_Pct *GT 95

Monitor and examine the LPAR and CPC CPU utilization and configuration (LPAR Weights, Logical Processor configuration) to determine if there is a need to shift CPC workload or resource to relieve the LPAR that is exhibiting high overall System MVS Percent utilization.

KM5_Job_Avail_LSQA_Warning

KM5_Job_Avail_LSQA_Warning raises a Warning alert when available LSQA goes below a desired threshold. This may be an indication of an impending operational issue. The formula is:

KM5_Address_Space_Storage_SubKey.ASNAME *EQ "D81*"
*AND KM5_Address_Space_Storage_SubKey.Record_Type *EQ Total
*AND KM5_Address_Space_Storage_SubKey.Percent_LSQA_Allocated *GT 84

Note:
This formula should be edited to provide an appropriate address space name.

Examine the address space or spaces for the workload of interest and determine whether one or more of the address spaces may need to be recycled to avoid an operational issue.

To examine address space storage usage, navigate to the Address Space Overview workspace and, from the Address Space CPU Utilization Summary view, select a row for an address space of interest. From the link-to pull-down menu, select Address Space Storage Subpools and LSQA.
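
Before editing the address space name in the KM5_Job_Avail_LSQA_Warning formula, it can help to check which names a pattern would select. The Python sketch below assumes that the trailing asterisk in "D81*" acts as a wildcard matching any suffix; the function name and the sample address space names in the note after the code are hypothetical.

```python
from fnmatch import fnmatchcase


def lsqa_warning(asname: str, record_type: str, percent_lsqa_allocated: float,
                 name_pattern: str = "D81*") -> bool:
    """Evaluate the KM5_Job_Avail_LSQA_Warning conditions for one row."""
    return (
        fnmatchcase(asname, name_pattern)      # ASNAME *EQ "D81*"
        and record_type == "Total"             # Record_Type *EQ Total
        and percent_lsqa_allocated > 84        # Percent_LSQA_Allocated *GT 84
    )
```

With the default pattern, an address space named D81DMSTR at 90% would satisfy the formula, and one named CICSPROD would not.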

KM5_Job_Subp_Key_Use_Warning

KM5_Job_Subp_Key_Use_Warning raises a Warning alert when utilization of a specified storage subpool and key by any of a similarly named group of address spaces exceeds a target threshold. This may be an indication of impending operational issues. The formula is:

KM5_Address_Space_Storage_SubKey.ASNAME *EQ "D81*"
*AND KM5_Address_Space_Storage_SubKey.SUBPOOL *EQ 241
*AND KM5_Address_Space_Storage_SubKey.Allocation *GT 4096
*AND KM5_Address_Space_Storage_SubKey.Storage_Key *EQ 0

Note:
This formula should be edited to provide an appropriate address space name.

Examine the address space or spaces for the workload of interest and determine whether one or more of the address spaces may need to be recycled or stopped to avoid an operational issue.

To examine address space storage usage by subpool and key, navigate to the Address Space Overview workspace and, from the Address Space CPU Utilization Summary view, select a row for an address space of interest. From the link-to pull-down menu, select Address Space Storage Subpools and LSQA.

KM5_LPAR_Cap_Warn

KM5_LPAR_Cap_Warn detects when an LPAR is being soft-capped. This situation is raised if the average physical standard CP resource consumed by the LPAR has exceeded its Defined MSU Capacity (WLM Soft Cap) or its entitled limit based on guaranteed physical standard CP resources over the last 4 hours. This situation is also true if the LPAR is a member of an LPAR group and it is being soft-capped by the LPAR group 4-Hour MSU Limit being exceeded. The formula is:

System_CPU_Utilization.Percent_LPAR_MSU_Capacity *GE 100.0
*OR (System_CPU_Utilization.Average_Unused_Group_MSUs *LE 0
*AND System_CPU_Utilization.LPAR_Group_Capacity_Limit *GT 0)

If a WLM soft cap is defined for the LPAR, consider re-evaluating this limit based on the service goals of the workload or workloads running on the LPAR.

Check that LPAR weights are established such that the LPAR is getting a sufficient share of the CPC's physical CPU resources. This applies if the CPC physical CPU resources are not fully utilized at the time of the high utilization and the entitled physical standard CP resource available to the LPAR is insufficient to meet workload demands.

In the case of capping due to the LPAR group 4-Hour Rolling Average MSU Limit being exceeded, the consumption of CPU resources across all the LPAR Group members requires evaluation.
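
The two capping cases described above, the LPAR's own limit and the LPAR group limit, correspond to the two halves of the formula. A minimal Python sketch of the same test follows; the function name is hypothetical, and reading a group capacity limit of 0 as "no group limit applies" is an assumption drawn from the *GT 0 test.

```python
def lpar_cap_warn(percent_lpar_msu_capacity: float,
                  average_unused_group_msus: float,
                  lpar_group_capacity_limit: float) -> bool:
    """Evaluate the KM5_LPAR_Cap_Warn conditions for one system."""
    # The LPAR has reached its own defined capacity or entitled limit.
    own_cap_reached = percent_lpar_msu_capacity >= 100.0
    # A group limit exists and the group has no unused MSUs left.
    group_cap_reached = (average_unused_group_msus <= 0
                         and lpar_group_capacity_limit > 0)
    return own_cap_reached or group_cap_reached
```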

KM5_LPAR_MSU_Warn

KM5_LPAR_MSU_Warn detects when the LPAR 4-Hour Rolling average is high. This situation is raised if the average physical standard CP resource consumed by the LPAR has reached, or is near to, its Defined MSU Capacity (WLM Soft Cap) or its entitled limit based on guaranteed physical standard CP resources over the last 4 hours. The formula is:

System_CPU_Utilization.Percent_LPAR_MSU_Capacity *GT 95.0
*OR (System_CPU_Utilization.Average_Unused_Group_MSUs *LT 5
*AND System_CPU_Utilization.LPAR_Group_Capacity_Limit *GT 0)

If a WLM soft cap is defined for the LPAR, consider re-evaluating this limit based on the service goals of the workload or workloads running on the LPAR.

Check that LPAR weights are established such that the LPAR is getting a sufficient share of the CPC's physical CPU resources. This applies if the CPC physical CPU resources are not fully utilized at the time of the high utilization and the entitled physical standard CP resource available to the LPAR is insufficient to meet workload demands.

KM5_Storage_Shortage_Critical

KM5_Storage_Shortage_Critical raises an alert when a critical-level storage shortage signal (an ENF Event) has been raised. A Storage Shortage Type is displayed with each critical-level storage shortage alert.

The formula is:

KM5_Storage_Shortage_Status.Storage_Shortage_Level *EQ Critical

A possible cause of the storage shortage condition might be one or more address spaces which are allocating and fixing too much storage or using too much auxiliary storage. Identify address spaces representing the most significant cause of the storage shortage. From the Real Storage workspace, navigate to the Storage Shortage Alerts workspace, then to the Storage Shortage Details workspace. This workspace displays the top twenty contributing address spaces. Consider stopping or canceling one or more of these address spaces to relieve the storage shortage. Note that SRM, with guidance from PARMLIB OPT settings, will take steps to relieve the storage shortage condition. Action may be required only if the critical storage condition persists (that is, if SRM actions are ineffective).

KM5_Storage_Shortage_Warning

KM5_Storage_Shortage_Warning raises an alert when a warning-level storage shortage signal (that is, an ENF Event) has been raised. A Storage Shortage Type is displayed with each warning-level storage shortage alert.

The formula is:

KM5_Storage_Shortage_Status.Storage_Shortage_Level *EQ Warning
*OR KM5_Storage_Shortage_Status.Storage_Shortage_Level *EQ Appl_Warning

A possible cause of the storage shortage condition might be one or more address spaces which are allocating and fixing too much storage or using too much auxiliary storage. Identify address spaces representing the most significant cause of the storage shortage. From the Real Storage workspace, navigate to the Storage Shortage Alerts workspace, then to the Storage Shortage Details workspace. This workspace displays the top twenty contributing address spaces. Consider stopping or canceling one or more of these address spaces to relieve the storage shortage. Note that SRM, with guidance from PARMLIB OPT settings, will take steps to relieve the storage shortage condition. Action may be required only if the problem reaches a critical threshold condition.
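
The two storage shortage situations test the same attribute against different values, which can be summarized as a lookup from Storage_Shortage_Level to the situation that becomes true. The Python sketch below is illustrative only: the function name is hypothetical, and the three level values are the ones that appear in the formulas.

```python
from typing import Optional

# Storage_Shortage_Level values from the two formulas, mapped to the situation raised.
SHORTAGE_SITUATIONS = {
    "Critical": "KM5_Storage_Shortage_Critical",
    "Warning": "KM5_Storage_Shortage_Warning",
    "Appl_Warning": "KM5_Storage_Shortage_Warning",
}


def shortage_situation(storage_shortage_level: str) -> Optional[str]:
    """Return the situation that is true for a level, or None if neither is."""
    return SHORTAGE_SITUATIONS.get(storage_shortage_level)
```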

Missing_UNIX_inetd_Process

Missing_UNIX_inetd_Process detects a missing inetd process. The formula is:

*IF *MISSING USS_Processes.Command_Name *EQ ( inetd )

The inetd daemon that provides UNIX networking services is not active. Start the inetd daemon.

OS390_Allocated_CSA_Crit

OS390_Allocated_CSA_Crit monitors to determine whether the percentage of the Common Storage Area allocated is equal to or greater than 95% and issues a Critical alert if the condition is true.

The formula is:

IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Allocation_Percent GE 95

A system crash can occur because of exhausted CSA. Use this situation to identify the address spaces using high amounts of CSA and stop or cancel nonessential address spaces with high usage.

OS390_Allocated_CSA_Warn

OS390_Allocated_CSA_Warn monitors to determine whether the percentage of the Common Storage Area (CSA) allocated is between 90% and 94.9% inclusive and issues a Warning if the condition is true.

The formula is:

IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Allocation_Percent GE 90 AND
VALUE Common_Storage.Allocation_Percent LT 95

A system crash can occur due to exhausted CSA. Identify the address spaces using high amounts of CSA and stop or cancel nonessential address spaces with high usage.

OS390_AvgCPU_Pct_Crit

OS390_AvgCPU_Pct_Crit monitors to determine the average percentage of time that all processors available in this z/OS system were busy dispatching work and issues a Critical alert if the average percent value is equal to or greater than 100.

The formula is:

IF VALUE System_CPU_Utilization.Average_CPU_Percent GE 100

This condition might not be a matter of immediate concern. If it arises suddenly on a uniprocessor, it may indicate that a unit of work is in a loop. If it is a chronic condition, it may be that the system is kept busy with low priority work. However, if service classes are missing their goals, a capacity increase may be needed.

OS390_AvgCPU_Pct_Warn

OS390_AvgCPU_Pct_Warn monitors to determine the average percent of time that all processors available in this system were busy dispatching work, and issues a Warning if the average percent value is between 95 and 99% inclusive.

The formula is:

IF VALUE System_CPU_Utilization.Average_CPU_Percent GE 95 AND
VALUE System_CPU_Utilization.Average_CPU_Percent LT 100

This condition might not be a matter of immediate concern. If it arises suddenly on a uniprocessor, it may indicate that a unit of work is in a loop. If it is a chronic condition, it may be that the system is kept busy with low priority work. However, if service classes are missing their goals, a capacity increase may be needed.

OS390_Cache_FastWriteHitPt_Crit

OS390_Cache_FastWriteHitPt_Crit monitors the percentage of successful I/O requests to write data to the cache and issues a Critical alert if the percentage is greater than 0 and less than or equal to 50%. If there is no service class that is missing its goal, this situation's thresholds may need to be adjusted.

The formula is:

IF VALUE DASD_MVS_DEVICES.Fast_Write_Hit_Percent GT 0 AND
VALUE DASD_MVS_Devices.Fast_Write_Hit_Percent LE 50

If service class periods are missing goals due to delay from the indicated device, datasets may need to be moved so that the Fast Write Cache capacity is better matched to the workload.

OS390_Cache_FastWriteHitPt_Warn

OS390_Cache_FastWriteHitPt_Warn monitors the percentage of successful I/O requests to write data to the cache and issues a Warning alert if the percentage is greater than 50% and less than or equal to 70%.

The formula is:

IF VALUE DASD_MVS_DEVICES.Fast_Write_Hit_Percent LE 70 AND
VALUE DASD_MVS_DEVICES.Fast_Write_Hit_Percent GT 50

If there is no service class that is missing its goal, this situation's thresholds may need to be adjusted. If service class periods are missing goals due to delay from the indicated device, datasets may need to be moved so that the Fast Write Cache capacity is better matched to the workload.
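
Unlike the utilization situations, the fast write hit situations alert on low values and exclude a value of exactly 0 from both bands. The Python sketch below shows the two bands taken from the Critical and Warning formulas; the function name and the "OK" label are hypothetical.

```python
def fast_write_hit_severity(fast_write_hit_percent: float) -> str:
    """Classify a Fast_Write_Hit_Percent value using the shipped thresholds."""
    if 0 < fast_write_hit_percent <= 50:
        return "Critical"   # GT 0 AND LE 50
    if 50 < fast_write_hit_percent <= 70:
        return "Warning"    # GT 50 AND LE 70
    return "OK"             # 0, or above 70: neither situation is true
```

The cache read hit and cache write hit situations use the same limits on their respective attributes.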

OS390_Cache_Read_HitPct_Crit

OS390_Cache_Read_HitPct_Crit monitors the percent of successful I/O requests to read data from the cache and issues a Critical alert if the percentage is greater than 0 and less than or equal to 50%.

The formula is:

IF VALUE DASD_MVS_DEVICES.Cache_Read_Hit_Percent GT 0 AND
VALUE DASD_MVS_DEVICES.Cache_Read_Hit_Percent LE 50

If this situation is raised, determine whether dataset placement should be adjusted (I/O tuning). If goals are being missed, tuning may be required. If no goals are being missed, the threshold may need to be adjusted.

OS390_Cache_Read_HitPct_Warn

OS390_Cache_Read_HitPct_Warn monitors the percent of successful I/O requests to read data from the cache and issues a Warning if the percentage is between 51% and 70% inclusive.

The formula is:

IF VALUE DASD_MVS_Devices.Cache_Read_Hit_Percent LE 70 AND
VALUE DASD_MVS_Devices.Cache_Read_Hit_Percent GT 50

If this situation is raised, determine whether dataset placement should be adjusted (I/O tuning). If goals are being missed, tuning may be required. If no goals are being missed, the threshold may need to be adjusted.

OS390_Cache_Write_HitPct_Crit

OS390_Cache_Write_HitPct_Crit monitors the percent of successful I/O requests to write temporary data to the cache and issues a Critical alert if the percentage is greater than 0 and less than or equal to 50%.

The formula is:

IF VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent GT 0 AND
VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent LE 50

If this situation is raised, determine whether any address spaces that have this device number allocated are in service classes that are missing their goals. If goals are being missed, dataset placement (I/O tuning) may be required.

OS390_Cache_Write_HitPct_Warn

OS390_Cache_Write_HitPct_Warn monitors the percent of successful I/O requests to write temporary data to the cache and issues a Warning alert if the percentage is between 51% and 70% inclusive.

The formula is:

IF VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent LE 70 AND
VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent GT 50

If this situation is raised, determine whether any address spaces that have this device number allocated are in service classes that are missing their goals. If goals are being missed, dataset placement (I/O tuning) may be required.

OS390_Available_Frames_Crit

OS390_Available_Frames_Crit monitors to determine when available frames of real storage are less than the specified threshold and issues a Critical alert when the condition is true. This problem should correct itself in a short time by means of page stealing. However, if the problem occurs more often than once a day, there may be a performance problem in the paging subsystem or an address space is using an excessive number of pages.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ Summary AND
VALUE Real_Storage.Available_Frames LE 0

The available frame queue is a list of frames that are available to the system. If the number of frames is too low, the system resources manager (SRM) automatically replenishes the queue by stealing frames that have not been recently referenced. Controlling the available frame queue is the method SRM uses to manage central storage use. A low available frame queue is a problem only if it causes contention for central storage.

OS390_Available_Frames_Warn

OS390_Available_Frames_Warn monitors to determine when available frames of real storage are less than the specified threshold and issues a Warning alert when the condition is true. This problem should correct itself in a short time by means of page stealing. However, if the problem occurs more often than once a day, there may be a performance problem in the paging subsystem or an address space is using an excessive number of pages.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ Summary AND
VALUE Real_Storage.Available_Frames LT 1 AND
VALUE Real_Storage.Available_Frames GT 0

The available frame queue is a list of frames that are available to the system. If the number of frames is too low, the system resources manager (SRM) automatically replenishes the queue by stealing frames that have not been recently referenced. Controlling the available frame queue is the method SRM uses to manage central storage use. A low available frame queue is a problem only if it causes a high page fault rate.

OS390_Frames_Online_Crit

OS390_Frames_Online_Crit monitors the central storage online frame count and issues a Critical alert when the condition is true. This situation indicates that central (real) storage available to this system is less than the threshold. If this alert results from a deliberate reconfiguration action, you should reset this situation's threshold using the Situation editor. Otherwise, check for a possible hardware problem.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ Summary AND
VALUE Real_Storage.Online_Frames LE 0

Set the critical threshold for central storage that should be online. The value must be less than the warning threshold. To be alerted of any loss of storage, you might want to set the threshold to the amount of central storage that should always be online.

OS390_Frames_Online_Warn

OS390_Frames_Online_Warn monitors the central storage online frame count and issues a Warning alert when the condition is true. This situation indicates that central (real) storage available to this system is less than the threshold. If this alert results from a deliberate reconfiguration action, you should reset this situation's threshold using the Situation editor. Otherwise, check for a possible hardware problem.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ Summary AND
VALUE Real_Storage.Online_Frames LT 1 AND
VALUE Real_Storage.Online_Frames GT 0

Set the warning threshold (in frames) for online central storage. The value must be greater than the critical threshold. You might want to leave this situation off, and use the critical situation to alert you to any storage loss.
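The default formulas for the two online frame situations use the placeholder values 0 and 1. As a hedged illustration of how the Critical and Warning thresholds relate once you set them, the following Python sketch assumes that the critical threshold replaces the 0 and the warning threshold replaces the 1 in the formulas above. The function and parameter names are hypothetical and are not part of the product.

```python
def online_frames_severity(online_frames, critical_threshold, warning_threshold):
    """Classify an online frame count against customized thresholds (illustrative only)."""
    # The text requires the critical threshold to be less than the warning threshold.
    if critical_threshold >= warning_threshold:
        raise ValueError("critical threshold must be less than warning threshold")
    # Critical formula shape: Online_Frames LE critical_threshold
    if online_frames <= critical_threshold:
        return "Critical"
    # Warning formula shape: Online_Frames LT warning_threshold AND GT critical_threshold
    if critical_threshold < online_frames < warning_threshold:
        return "Warning"
    return None
```

With the shipped values (0 and 1), a system that reports any positive whole number of online frames matches neither formula, which is consistent with the advice to set the thresholds yourself.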

OS390_ChannelComplexBusy_Crit

OS390_ChannelComplexBusy_Crit monitors channel path activity and issues a Critical alert when one or more channel paths are busier across all systems than the current Critical threshold. Identify the particular channels that are unusually active. If the threshold provided in this situation is too low, adjust it using the Situation editor. Note that acceptable busy levels for tape channels, ESCON channels, and FICON channels are typically much higher than for parallel DASD channels.

The formula is:

IF VALUE Channel_Paths.Complex_Percent GE 100

OS390_ChannelComplexBusy_Warn

OS390_ChannelComplexBusy_Warn monitors channel path activity and issues a Warning when one or more channel paths are busier across all systems than the current Warning threshold. Identify the particular channels that are unusually active. If the threshold provided in this situation is too low, adjust it using the Situation editor. Note that acceptable busy levels for tape channels, ESCON channels, and FICON channels are typically much higher than for parallel DASD channels.

The formula is:

IF VALUE Channel_Paths.Complex_Percent GE 100

OS390_Channel_LPAR_BusyPct_Crit

OS390_Channel_LPAR_BusyPct_Crit monitors the activity of the channel paths and issues a Critical alert if one or more channel paths is busier than the threshold. Identify the particular channel or channels that are unusually active. Note that the acceptable busy levels for tape channels, ESCON channels, and FICON channels are much higher than typical levels for parallel DASD channels. If the threshold for this situation is too low, adjust it using the Situation editor.

The formula is:

IF VALUE Channel_Paths.LPAR_Percent GE 100

OS390_Channel_LPAR_BusyPct_Warn

OS390_Channel_LPAR_BusyPct_Warn monitors the activity of the channel paths and issues a Warning alert if one or more channel paths is busier than the threshold. Identify the particular channel or channels that are unusually active. Note that the acceptable busy levels for tape channels, ESCON channels, and FICON channels are much higher than typical levels for parallel DASD channels. If the threshold for this situation is too low, adjust it using the Situation editor.

The formula is:

IF VALUE Channel_Paths.LPAR_Percent GE 100

OS390_Channel_Path_Offline_Crit

OS390_Channel_Path_Offline_Crit monitors to determine whether a channel path is offline and issues a Critical alert when this condition is true. This may be a normal condition if the indicated channel path is dynamically managed.

The formula is:

IF VALUE Channel_Paths.Online EQ N

If this situation is raised, check the configuration matrix for the current image and determine whether this channel path should be online. If so, attempt to VARY it online.

OS390_Channel_Path_Offline_Warn

OS390_Channel_Path_Offline_Warn monitors to determine whether a channel path is offline and issues a Warning when this condition is true. This may be a normal condition if the indicated channel path is dynamically managed.

The formula is:

IF VALUE Channel_Paths.Online EQ N

If this situation is raised, check the configuration matrix for the current image and determine whether this channel path should be online. If so, attempt to VARY it online.

OS390_Common_PageDSPctFull_Crit

OS390_Common_PageDSPctFull_Crit monitors to determine whether the percentage of slots in use on the common page dataset is greater than or equal to 80% and issues a Critical alert if the condition is true. If the common page dataset becomes full, a system crash is imminent.

The formula is:

IF VALUE Page_Dataset_Activity.Dataset_Type EQ Common AND
VALUE Page_Dataset_Activity.Percent_Full GE 80

If this situation is raised, determine which address spaces are using the largest number of common slots and terminate those that can be shut down at this time. If this situation occurs more frequently than once a month, a larger common page data set should be created and activated at the next IPL.

OS390_Common_PageDSPctFull_Warn

OS390_Common_PageDSPctFull_Warn monitors to determine whether the percentage of slots in use on the common page dataset is greater than or equal to 60% and less than 80% and issues a Warning if the condition is true. If the common page dataset becomes full, a system crash is imminent.

The formula is:

IF VALUE Page_Dataset_Activity.Dataset_Type EQ Common AND
VALUE Page_Dataset_Activity.Percent_Full GE 60 AND
VALUE Page_Dataset_Activity.Percent_Full LT 80

If this situation is raised, determine which address spaces are using the largest number of common slots and terminate those that can be shut down at this time. If this situation occurs more frequently than once a month, a larger common page data set should be created and activated at the next IPL.

OS390_CSA_Growth_Crit

OS390_CSA_Growth_Crit monitors to determine whether the growth in use of the Common Storage Area is greater than or equal to 50 and issues a Critical alert if the condition is true. The formula is:

IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Growth GE 50

If this situation is raised, identify the address spaces using high amounts of CSA and showing rapid growth in its use. Stop or cancel nonessential address spaces to avert a crash.

OS390_CSA_Growth_Warn

OS390_CSA_Growth_Warn monitors to determine whether the growth in use of the Common Storage Area is between 35 and 49 inclusive and issues a Warning alert if the condition is true. The formula is:

IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Growth GE 35 AND
VALUE Common_Storage.Growth LT 50

If this situation is raised, identify the address spaces using high amounts of CSA and showing rapid growth in its use. Stop or cancel nonessential address spaces to avert a crash.

OS390_DASD_Busy_Percent_Crit

OS390_DASD_Busy_Percent_Crit monitors DASD device utilization and issues a Critical alert when the percentage of time a device is busy is greater than or equal to 100. The formula is:

IF VALUE DASD_MVS_Devices.Percent_Busy GE 100

This condition may represent a current or pending performance problem if any service class period is missing its goal because of I/O delay for this device. This threshold is set to 100% by default and should be set to a lower value only when troubleshooting a chronic DASD performance problem.

OS390_DASD_Busy_Percent_Warn

OS390_DASD_Busy_Percent_Warn monitors DASD device utilization and issues a Warning alert when the percentage of time a device is busy is greater than or equal to 100. The formula is:

IF VALUE DASD_MVS_Devices.Percent_Busy GE 100

This condition may represent a current or pending performance problem if any service class period is missing its goal because of I/O delay for this device. This threshold is set to 100% by default and should be set to a lower value only when pursuing a chronic DASD performance problem.

OS390_DASD_Dropped_Ready_Crit

OS390_DASD_Dropped_Ready_Crit monitors the count of devices in this condition and issues a Critical alert if the number is greater than or equal to 5. The formula is:

IF VALUE DASD_MVS.Dropped_Ready GE 5

Should this rare situation occur, a hardware service person should be notified.

OS390_DASD_Dropped_Ready_Warn

OS390_DASD_Dropped_Ready_Warn monitors the count of devices in this condition and issues a Warning alert if the number is greater than 0 but less than 5. The formula is:

IF VALUE DASD_MVS.Dropped_Ready GT 0 AND
VALUE DASD_MVS.Dropped_Ready LT 5

Should this rare situation occur, a hardware service person should be notified.

OS390_DASD_NoDynamicReconn_Critical

OS390_DASD_NoDynamicReconn_Critical monitors the count of devices in this condition and issues a Critical alert if the number is greater than or equal to 5.

The formula is:

IF VALUE DASD_MVS.No_Dynamic_Path_Reconnect GE 5

This problem should be referred to appropriate personnel to determine whether the devices should be offloaded.

OS390_DASD_NoDynamicReconn_Warn

OS390_DASD_NoDynamicReconn_Warn monitors the count of devices in this condition and issues a Warning if the number is greater than 0 and less than 5.

The formula is:

IF VALUE DASD_MVS.No_Dynamic_Path_Reconnect GT 0 AND
VALUE DASD_MVS.No_Dynamic_Path_Reconnect LT 5

This problem should be referred to appropriate personnel to determine whether the devices should be offloaded.

OS390_DASD_Not_Responding_Crit

OS390_DASD_Not_Responding_Crit monitors the count of devices in this condition and issues a Critical alert if the number is greater than or equal to 5.

The formula is:

IF VALUE DASD_MVS.Not_Responding GE 5

Should this rare situation occur, a hardware service person should be notified.

OS390_DASD_Not_Responding_Warn

OS390_DASD_Not_Responding_Warn monitors the count of devices in this condition and issues a Warning if the number is greater than 0 but less than 5. Should this rare situation occur, a hardware service person should be notified.

The formula is:

IF VALUE DASD_MVS.Not_Responding GT 0 AND
VALUE DASD_MVS.Not_Responding LT 5

OS390_DASD_Response_Time_Crit

OS390_DASD_Response_Time_Crit monitors the response time for a DASD device and issues a Critical alert when the threshold value is exceeded. This situation is distributed as disabled by default and should be activated only when attempting to solve a problem where excessive DASD response time is likely to be the cause.

The formula is:

IF VALUE DASD_MVS_Devices.Response GE 1000000000

OS390_DASD_Response_Time_Warn

OS390_DASD_Response_Time_Warn monitors the response time for a DASD device and issues a Warning when the threshold value is exceeded. This situation is distributed as disabled by default and should be activated only when attempting to solve a problem where excessive DASD response time is likely to be the cause.

The formula is:

IF VALUE DASD_MVS_Devices.Response GE 1000000000

OS390_ECSA_Allocation_Pct_Crit

OS390_ECSA_Allocation_Pct_Crit monitors to determine whether the percentage of the Extended Common Storage Area allocated is greater than or equal to 95% and issues a Critical alert if the condition is true. Check the current size of ECSA (the second CSA subparameter in IEASYSxx). The value may need to be adjusted before the next IPL. Attempt to determine who is using excessive ECSA or causing it to grow rapidly.

The formula is:

IF VALUE Common_Storage.Area EQ ECSA AND
VALUE Common_Storage.Allocation_Percent GE 95

OS390_ECSA_Allocation_Pct_Warn

OS390_ECSA_Allocation_Pct_Warn monitors to determine whether the percentage of the Extended Common Storage Area allocated is greater than or equal to 90% and less than 95% and issues a Warning if the condition is true. Check the current size of ECSA (the second CSA subparameter in IEASYSxx). The value may need to be adjusted before the next IPL. Attempt to determine who is using excessive ECSA or causing it to grow rapidly.

The formula is:

IF VALUE Common_Storage.Area EQ ECSA AND
VALUE Common_Storage.Allocation_Percent GE 90 AND
VALUE Common_Storage.Allocation_Percent LT 95

OS390_ExpandedToCentralStor_Crit

OS390_ExpandedToCentralStor_Crit monitors the page movement rate from expanded storage to central storage and issues a Critical alert when the threshold value is exceeded. This situation is disabled by default and should be activated only when attempting to solve a problem where excessive page movement is likely to be a cause.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Pages_Read_From_Expanded GE 1000000000

OS390_GlobalEnqueueReserve_Crit

OS390_GlobalEnqueueReserve_Crit monitors to determine whether the maximum wait time or the current wait time of any enqueue is greater than 60 seconds and issues a Critical alert if the condition is true. Check to determine who is holding the ENQ. If it is a batch job that can be cancelled and requeued, you can break the deadlock by so doing. If it is a started task or online user, report the problem to the appropriate personnel.

The formula is:

IF VALUE Enqueues.Maximum_Wait_Time GT 60 OR
VALUE Enqueues.Wait_Time GT 60

OS390_GlobalEnqueueReserve_Warn

OS390_GlobalEnqueueReserve_Warn monitors to determine whether the maximum wait time or the current wait time of any enqueue is between 31 and 60 seconds inclusive and issues a Warning if the condition is true. Check to determine who is holding the ENQ. If it is a batch job that can be cancelled and requeued, you can break the deadlock by so doing. If it is a started task or online user, report the problem to the appropriate personnel.

The formula is:

IF (VALUE Enqueues.Maximum_Wait_Time GT 30 AND
VALUE Enqueues.Maximum_Wait_Time LE 60) OR
(VALUE Enqueues.Wait_Time GT 30 AND VALUE Enqueues.Wait_Time LE 60)
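Each of the two enqueue formulas tests the maximum wait time and the current wait time separately and joins the results with OR. The following Python sketch, offered as an illustration under that reading, evaluates both formulas for one enqueue. The function name and the dictionary it returns are hypothetical and are not part of the product.

```python
def enqueue_situations(maximum_wait_time, wait_time):
    """Evaluate both enqueue formulas for one set of wait times in seconds (illustrative only)."""
    # OS390_GlobalEnqueueReserve_Crit: either attribute GT 60
    crit = maximum_wait_time > 60 or wait_time > 60
    # OS390_GlobalEnqueueReserve_Warn: either attribute GT 30 AND LE 60
    warn = (30 < maximum_wait_time <= 60) or (30 < wait_time <= 60)
    return {
        "OS390_GlobalEnqueueReserve_Crit": crit,
        "OS390_GlobalEnqueueReserve_Warn": warn,
    }
```

Because the attributes are tested independently, both formulas can be true for the same enqueue, for example when the maximum wait time is above 60 seconds and the current wait time is between 31 and 60 seconds.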

OS390_GRS_Broken_Crit

OS390_GRS_Broken_Crit monitors to determine whether the Global Resource Serialization (GRS) complex is broken and issues a Critical alert if it is. If the GRS complex is broken, it may be necessary to attempt to restart GRS from the console. You can display the status of the channel-to-channel adaptors on each system by entering the command D GRS.

The formula is:

IF VALUE Operator_Alerts.GRS_Status EQ Broken

OS390_GRS_Broken_Warn

OS390_GRS_Broken_Warn monitors to determine whether the Global Resource Serialization (GRS) complex is broken and issues a Warning if it is. If the GRS complex is broken, it may be necessary to attempt to restart GRS from the console. You can display the status of the channel-to-channel adaptors on each system by entering the command D GRS.

The formula is:

IF VALUE Operator_Alerts.GRS_Status EQ Broken

OS390_GTF_Active_Crit

OS390_GTF_Active_Crit monitors to determine whether the Generalized Trace Facility is active and issues a Critical alert if the condition is true. While the Generalized Trace Facility is a useful diagnostic tool, it can cause performance degradation. Ensure that GTF is active for the minimum time required to obtain the needed data.

The formula is:

IF VALUE Operator_Alerts.GTF_Active EQ True

OS390_GTF_Active_Warn

OS390_GTF_Active_Warn monitors to determine whether the Generalized Trace Facility is active and issues a Warning if the condition is true. While the Generalized Trace Facility is a useful diagnostic tool, it can cause performance degradation. Ensure that GTF is active for the minimum time required to obtain the needed data.

The formula is:

IF VALUE Operator_Alerts.GTF_Active EQ True

OS390_HSM_RecallWait_Crit

OS390_HSM_RecallWait_Crit monitors to determine whether the wait time in seconds of the longest single HSM recall that is waiting is greater than or equal to 1200 seconds and issues a Critical alert if the condition is true. Make sure that there is no outstanding tape mount for an HSM tape. In some cases, a wait can occur when a Migration Level 1 volume is tied up by a RESERVE or other conflicting activity such as a volume backup.

The formula is:

IF VALUE Operator_Alerts.HSM_Recall_Wait_Time GE 1200

OS390_HSM_RecallWait_Warn

OS390_HSM_RecallWait_Warn monitors to determine whether the wait time in seconds of the longest single HSM recall that is waiting is between 600 and 1199 seconds inclusive and issues a Warning if the condition is true. Make sure that there is no outstanding tape mount for an HSM tape. In some cases, a wait can occur when a Migration Level 1 volume is tied up by a RESERVE or other conflicting activity such as a volume backup.

The formula is:

IF VALUE Operator_Alerts.HSM_Recall_Wait_Time GE 600 AND
VALUE Operator_Alerts.HSM_Recall_Wait_Time LT 1200

OS390_Indexed_VTOC_Lost_Crit

OS390_Indexed_VTOC_Lost_Crit monitors the count of devices in this condition and issues a Critical alert if the count is greater than or equal to 5. Refer this problem to an appropriate storage management specialist.

The formula is:

IF VALUE DASD_MVS.Indexed_VTOC_Lost GE 5

OS390_Indexed_VTOC_Lost_Warn

OS390_Indexed_VTOC_Lost_Warn monitors the count of devices in this condition and issues a Warning if the number is greater than 0 but less than 5. Refer this problem to an appropriate storage management specialist.

The formula is:

IF VALUE DASD_MVS.Indexed_VTOC_Lost GT 0 AND
VALUE DASD_MVS.Indexed_VTOC_Lost LT 5
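The device-count situations in this section (dropped ready, no dynamic path reconnect, not responding, and indexed VTOC lost) all use the same bands: a Warning for a count greater than 0 and less than 5, and a Critical alert for a count of 5 or more. The following Python sketch illustrates that shared pattern. It assumes a dictionary keyed by the attribute names used in the formulas; the function name and the input layout are hypothetical and are not part of the product.

```python
# Attribute names as they appear in the DASD_MVS formulas in this section.
DEVICE_COUNT_ATTRIBUTES = (
    "Dropped_Ready",
    "No_Dynamic_Path_Reconnect",
    "Not_Responding",
    "Indexed_VTOC_Lost",
)

def device_count_alerts(dasd_mvs_row):
    """Map each device-count attribute to the alert level its formulas would yield."""
    alerts = {}
    for attribute in DEVICE_COUNT_ATTRIBUTES:
        count = dasd_mvs_row.get(attribute, 0)  # treat a missing attribute as 0
        if count >= 5:        # Crit formula: GE 5
            alerts[attribute] = "Critical"
        elif count > 0:       # Warn formula: GT 0 AND LT 5
            alerts[attribute] = "Warning"
    return alerts
```

Attributes with a count of 0 are left out of the result because neither formula is true for them.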

OS390_Local_PageDS_Errors_Crit

OS390_Local_PageDS_Errors_Crit monitors to determine whether the number of errors in a local page dataset is greater than or equal to 5 and issues a Critical alert if the condition is true. Identify the failing dataset or datasets. Check for a spare page dataset slot, and if there is no spare, increase the PAGTOTL parameter in IEASYSxx. There should be at least one spare slot per two page datasets. Remove the failing dataset from the PAGE parameter in IEASYSxx. If there is a spare slot, PAGEADD a dataset and use PAGEDEL REPLACE to move the pages to a good dataset.

The formula is:

IF VALUE Page_Dataset_Activity.Errors GE 5

OS390_Local_PageDS_Errors_Warn

OS390_Local_PageDS_Errors_Warn monitors to determine whether the number of errors in a local page dataset is greater than or equal to 1 and less than 5 and issues a Warning if the condition is true. Identify the failing dataset or datasets. Check for a spare page dataset slot, and if there is no spare, increase the PAGTOTL parameter in IEASYSxx. There should be at least one spare slot per two page datasets. Remove the failing dataset from the PAGE parameter in IEASYSxx. If there is a spare slot, PAGEADD a dataset and use PAGEDEL REPLACE to move the pages to a good dataset.

The formula is:

IF VALUE Page_Dataset_Activity.Errors GE 1 AND
VALUE Page_Dataset_Activity.Errors LT 5

OS390_Local_PageDS_PctFull_Crit

OS390_Local_PageDS_PctFull_Crit monitors to determine whether the percentage of slots in use on a local page dataset is greater than or equal to 35% and issues a Critical alert if the condition is true. When usage approaches 30%, paging efficiency begins to decline, and blocked paging disappears at about 35% occupancy. If this situation occurs, prepare to PAGEADD another dataset if the critical threshold is passed. If the current PAGTOTL setting in IEASYSxx does not allow another dataset to be added, it should be increased before the next IPL.

The formula is:

IF VALUE Page_Dataset_Activity.Dataset_Type EQ Local AND
VALUE Page_Dataset_Activity.Percent_Full GE 35

OS390_Local_PageDS_PctFull_Warn

OS390_Local_PageDS_PctFull_Warn monitors to determine whether a local page dataset is greater than or equal to 25% full and less than 35% full and issues a Warning if the condition is true. When usage approaches 30%, paging efficiency begins to decline, and blocked paging disappears at about 35% occupancy. If this situation occurs, prepare to PAGEADD another dataset if the critical threshold is passed. If the current PAGTOTL setting in IEASYSxx does not allow another dataset to be added, it should be increased before the next IPL.

The formula is:

IF VALUE Page_Dataset_Activity.Dataset_Type EQ Local AND
VALUE Page_Dataset_Activity.Percent_Full GE 25 AND
VALUE Page_Dataset_Activity.Percent_Full LT 35

OS390_LPAR_OverheadPercent_Crit

OS390_LPAR_OverheadPercent_Crit monitors to determine whether the percentage of time the system spends managing a logical partition is greater than or equal to 20% and issues a Critical alert if the condition is true. Possible causes include saturation of the CPU capacity leading to excessive overhead switching CPUs between LPARs. This can be compounded when there are too many logical processors assigned to an LPAR.

The formula is:

IF VALUE System_CPU_Utilization.Partition_Overhead% GE 20

OS390_LPAR_OverheadPercent_Warn

OS390_LPAR_OverheadPercent_Warn monitors to determine whether the percentage of time the system spends managing a logical partition is greater than or equal to 10% and less than 20% and issues a Warning if the condition is true. Possible causes include saturation of the CPU capacity leading to excessive overhead switching CPUs between LPARs. This can be compounded when there are too many logical processors assigned to an LPAR.

The formula is:

IF VALUE System_CPU_Utilization.Partition_Overhead% GE 10 AND
VALUE System_CPU_Utilization.Partition_Overhead% LT 20

OS390_LPAR_STATUS_Crit

OS390_LPAR_STATUS_Crit monitors to determine whether LPAR CPU Management Overhead or Velocity Index have exceeded thresholds and if so, issues a Critical alert. These conditions might not be of immediate concern. In the case of LPAR CPU Management Overhead, if the number of configured LPARs is substantial, it may trigger this situation. If the conditions persist, you may consider reducing the number of configured LPARs. In the case of the Velocity Index, you may want to adjust LPAR weights if the LPARs' workloads are not meeting expected service levels.

The formula is:

IF (VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.LPAR_Effective_Weight_Index LT 0.9) OR
(VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.Host_LPAR_Flag EQ Y AND
VALUE LPAR_Clusters.CPC_CPU_Overhead GT 15.0)

OS390_LPAR_STATUS_Warn

OS390_LPAR_STATUS_Warn monitors to determine whether LPAR CPU Management Overhead or Velocity Index have exceeded thresholds and if so, issues a Warning. These conditions might not be of immediate concern. In the case of LPAR CPU Management Overhead, if the number of configured LPARs is substantial, it may trigger this situation. If the conditions persist, you may consider reducing the number of configured LPARs. In the case of the Velocity Index, you may want to adjust LPAR weights if the LPARs' workloads are not meeting expected service levels.

The formula is:

IF (VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.LPAR_Effective_Weight_Index LT 1.0) OR
(VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.Host_LPAR_Flag EQ Y AND
VALUE LPAR_Clusters.CPC_CPU_Overhead GT 10.0)
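Both LPAR status formulas repeat the same three name exclusions in each branch, so they can be read as "not a totals row, and either the effective weight index is low or this is the host LPAR with high CPC CPU overhead." The following Python sketch expresses that reading as an illustration. It assumes each row is a dictionary keyed by the attribute names in the formulas; the function names and the row layout are hypothetical and are not part of the product.

```python
# LPAR names that both formulas exclude in every branch.
EXCLUDED_NAMES = {"_CLTotal", "_CPTotal", "PHYSICAL"}

def lpar_status(row, weight_index_limit, overhead_limit):
    """Evaluate the shared shape of the LPAR status formulas for one row."""
    if row["LPAR_Name"] in EXCLUDED_NAMES:
        return False
    # First branch: LPAR_Effective_Weight_Index LT limit
    low_weight_index = row["LPAR_Effective_Weight_Index"] < weight_index_limit
    # Second branch: Host_LPAR_Flag EQ Y AND CPC_CPU_Overhead GT limit
    high_overhead = row["Host_LPAR_Flag"] == "Y" and row["CPC_CPU_Overhead"] > overhead_limit
    return low_weight_index or high_overhead

def lpar_status_crit(row):
    # OS390_LPAR_STATUS_Crit limits: index LT 0.9, overhead GT 15.0
    return lpar_status(row, 0.9, 15.0)

def lpar_status_warn(row):
    # OS390_LPAR_STATUS_Warn limits: index LT 1.0, overhead GT 10.0
    return lpar_status(row, 1.0, 10.0)
```

Because the Warning limits are looser than the Critical limits and the formulas have no upper bound, any row that satisfies the Critical formula also satisfies the Warning formula.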

OS390_MAX_ASIDs_in_Use_Crit

OS390_MAX_ASIDs_in_Use_Crit monitors to determine whether the percentage that represents the maximum number of address space vector table slots that are in use or unavailable is greater than or equal to 90% and issues a Critical alert if the condition is true. Check the values of the MAXUSER, RSVNONR, and RSVSTART parameters as well as for any problems that could lead to address space IDs becoming unusable.

The formula is:

IF VALUE Operator_Alerts.ASVT_Slot_Utilization GE 90

OS390_MAX_ASIDs_in_Use_Warn

OS390_MAX_ASIDs_in_Use_Warn monitors to determine whether the percentage that represents the maximum number of address space vector table slots that are in use or unavailable is between 80 and 89% inclusive, and issues a Warning if the condition is true. Check the values of the MAXUSER, RSVNONR, and RSVSTART parameters as well as for any problems that could lead to address space IDs becoming unusable.

The formula is:

IF VALUE Operator_Alerts.ASVT_Slot_Utilization GE 80 AND
VALUE Operator_Alerts.ASVT_Slot_Utilization LT 90

OS390_Network_ResponseTime_Crit

OS390_Network_ResponseTime_Crit monitors the Network Response Time and when it equals or exceeds 10, issues a Critical alert. Appropriate personnel should be notified if the condition persists.

The formula is:

IF VALUE User_Response_Time.Network_Response GE 10

OS390_Network_ResponseTime_Warn

OS390_Network_ResponseTime_Warn monitors the Network Response Time and when it equals or exceeds 5 but is less than 10, issues a Warning. Appropriate personnel should be notified if the condition persists.

The formula is:

IF VALUE User_Response_Time.Network_Response GE 5 AND
VALUE User_Response_Time.Network_Response LT 10

OS390_OLTEP_Active_Crit

OS390_OLTEP_Active_Crit monitors to determine whether OLTEP is active and issues a Critical alert if the situation is true. Determine who is using OLTEP and minimize the time of its use.

The formula is:

IF VALUE Operator_Alerts.OLTEP_Active EQ True

OS390_OLTEP_Active_Warn

OS390_OLTEP_Active_Warn monitors to determine whether OLTEP is active and issues a Warning alert if it is. Determine who is using OLTEP and minimize the time of its use.

The formula is:

IF VALUE Operator_Alerts.OLTEP_Active EQ True

OS390_Outstanding_WTORs_Crit

OS390_Outstanding_WTORs_Crit monitors to determine whether the number of outstanding Write to Operator with Reply requests is greater than or equal to 12 and issues a Critical alert if the condition is true. Check the operator console for outstanding replies and address these. If all of the outstanding replies are correct and routine, you may want to adjust this situation's threshold.

The formula is:

IF VALUE Operator_Alerts.Outstanding_Operator_Replies GE 12

OS390_Outstanding_WTORs_Warn

OS390_Outstanding_WTORs_Warn monitors to determine whether the number of outstanding Write to Operator with Reply requests is 10 or 11 and issues a Warning if the condition is true. Check the operator console for outstanding replies and address these. If all of the outstanding replies are correct and routine, you may want to adjust this situation's threshold.

The formula is:

IF VALUE Operator_Alerts.Outstanding_Operator_Replies GE 10 AND
VALUE Operator_Alerts.Outstanding_Operator_Replies LT 12

OS390_PageDSNotOperational_Crit

OS390_PageDSNotOperational_Crit monitors the number of page datasets in this condition and issues a Critical alert if the number is greater than or equal to 5. Verify that paging devices are operational. If a device is not operational, attempt to VARY it online. If a page data set was drained by a prior PAGEDEL DRAIN command, it may now be removed by a PAGEDEL DELETE command. If this alert occurs without warning, an IPL may be imminent. Prepare to shut down and request appropriate assistance.

The formula is:

IF VALUE System_Paging_Activity.Datasets_Not_Operational GE 5

OS390_PageDSNotOperational_Warn

OS390_PageDSNotOperational_Warn monitors the number of page datasets in this condition and issues a Warning if the number is from 1 to 4 inclusive. Verify that paging devices are operational. If a device is not operational, attempt to VARY it online. If a page data set was drained by a prior PAGEDEL DRAIN command, it may now be removed by a PAGEDEL DELETE command. If this alert occurs without warning, an IPL may be imminent. Prepare to shut down and request appropriate assistance.

The formula is:

IF VALUE System_Paging_Activity.Datasets_Not_Operational GT 0 AND
VALUE System_Paging_Activity.Datasets_Not_Operational LT 5

OS390_Page_Rate_Crit

This situation monitors the current paging rate and raises a Critical alert when the rate exceeds the threshold. The formula is:

IF VALUE Page_Dataset_Activity.Page_Rate LT 0

Excessive paging may increase application wait and response time. Because system page rate is dependent on processor type, real storage configuration, and workload, you may need to adjust your paging system based on your installation-defined service requirements. This situation is disabled by default and should be activated only when necessary to characterize a chronic excessive paging problem.

OS390_Page_Rate_Warn

This situation monitors the current paging rate and raises a Warning alert when the rate exceeds the threshold. The formula is:

IF VALUE Page_Dataset_Activity.Page_Rate LT 0

Because system page rate is dependent on processor type, real storage configuration, and workload, you may need to adjust your paging system based on your installation-defined service requirements. This situation is disabled by default and should be activated only to diagnose a chronic high paging rate.

OS390_Physical_CPUs_Online_Crit

OS390_Physical_CPUs_Online_Crit monitors the number of online CPUs and issues a Critical alert when the number is less than the current threshold. This situation is disabled (set to 0) by default and should be activated only to diagnose chronic configuration problems.

The formula is:

IF VALUE System_CPU_Utilization.Physical_CPU_Count LT 0

OS390_Physical_CPUs_Online_Warn

OS390_Physical_CPUs_Online_Warn monitors the number of online CPUs and issues a Warning when the number is less than the current threshold. This situation is disabled (set to 0) by default and should be activated only to diagnose chronic configuration problems.

The formula is:

IF VALUE System_CPU_Utilization.Physical_CPU_Count LT 0

OS390_RMF_Not_Active_Crit

OS390_RMF_Not_Active_Crit monitors to determine whether the RMF monitor is inactive and issues a Critical alert if the condition is true. RMF data is essential to performance management and problem analysis. If you cannot restart the RMF, notify appropriate personnel.

The formula is:

IF VALUE Operator_Alerts.RMF_Not_Active EQ True

OS390_RMF_Not_Active_Warn

OS390_RMF_Not_Active_Warn monitors to determine whether the RMF monitor is inactive and issues a Warning if the condition is true. RMF data is essential to performance management and problem analysis. If you cannot restart the RMF, notify appropriate personnel.

The formula is:

IF VALUE Operator_Alerts.RMF_Not_Active EQ True

OS390_SMF_Not_Recording_Crit

OS390_SMF_Not_Recording_Crit monitors to determine whether SMF is recording information and issues a Critical alert if it is not. SMF data has numerous uses, including resource accounting and capacity management. Check the SMF datasets and restart the collection process as soon as possible. If you cannot restart SMF recording, notify appropriate personnel.

The formula is:

IF VALUE Operator_Alerts.SMF_Not_Recording EQ True

OS390_SMF_Not_Recording_Warn

OS390_SMF_Not_Recording_Warn monitors to determine whether SMF is recording information and issues a Warning if it is not. SMF data has numerous uses, including resource accounting and capacity management. Check the SMF datasets and restart the collection process as soon as possible. If you cannot restart SMF recording, notify appropriate personnel.

The formula is:

IF VALUE Operator_Alerts.SMF_Not_Recording EQ True

OS390_SYSLOG_Not_Recording_Crit

OS390_SYSLOG_Not_Recording_Crit monitors to determine whether the System Log is recording information and issues a Critical alert if it is not. Determine why logging has stopped. One possibility is that JES spool space is exhausted.

The formula is:

IF VALUE Operator_Alerts.SYSLOG_Not_Recording EQ True

OS390_SYSLOG_Not_Recording_Warn

OS390_SYSLOG_Not_Recording_Warn monitors to determine whether the System Log is recording information and issues a Warning if it is not. Determine why logging has stopped. One possibility is that JES spool space is exhausted.

The formula is:

IF VALUE Operator_Alerts.SYSLOG_Not_Recording EQ True

OS390_System_Page_Rate_Crit

This situation monitors the system page rate and raises a Critical alert when the threshold is reached. The formula is:

IF VALUE System_Paging_Activity.System_Page_Rate *LT 0

Excessive paging may increase application wait and response time. Because system page rate is dependent on processor type, real storage configuration, and workload, you may need to adjust your paging system based on your installation-defined service requirements. This situation is disabled by default and should be activated only to diagnose a chronic high paging rate.

OS390_System_Page_Rate_Warn

This situation monitors the system page rate and raises a Warning alert when the threshold is reached. The formula is:

IF VALUE System_Paging_Activity.System_Page_Rate *LT 0

Excessive paging may increase application wait and response time. Because system page rate is dependent on processor type, real storage configuration, and workload, you may need to adjust your paging system based on your installation-defined service requirements. This situation is disabled by default and should be activated only to diagnose a chronic high paging rate.

OS390_System_PageFault_Rate_Crit

OS390_System_PageFault_Rate_Crit monitors the system page fault rate and issues a Critical alert when the threshold is exceeded. This situation is shipped disabled by default.

The formula is:

IF VALUE System_Paging_Activity.Page_Fault_Rate GE 1000000000

OS390_System_PageFault_Rate_Warn

OS390_System_PageFault_Rate_Warn monitors the system page fault rate and issues a Warning when the threshold is exceeded. This situation is shipped disabled by default.

The formula is:

IF VALUE System_Paging_Activity.Page_Fault_Rate GE 1000000000

OS390_Tape_Dropped_Ready_Crit

OS390_Tape_Dropped_Ready_Crit monitors the number of tape drives that have dropped ready and issues a Critical alert if the threshold is exceeded. Check the devices and attempt to make them ready. If this is not possible, report the condition to the appropriate personnel.

The formula is:

IF VALUE Tape_Drives.Dropped_Ready GE 5

OS390_Tape_Dropped_Ready_Warn

OS390_Tape_Dropped_Ready_Warn monitors the number of tape drives that have dropped ready and issues a Warning if the threshold is exceeded. Check the devices and attempt to make them ready. If this is not possible, report the condition to the appropriate personnel.

The formula is:

IF VALUE Tape_Drives.Dropped_Ready GT 0 AND
VALUE Tape_Drives.Dropped_Ready LT 5
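The Critical and Warning formulas for dropped-ready drives divide the count into bands that do not overlap, so at most one of the two situations is true for a given sample. The following Python sketch illustrates that banding. The function name and the returned labels are illustrative assumptions, not part of the product.

```python
def classify_dropped_ready(count: int) -> str:
    """Map a Tape_Drives.Dropped_Ready count to the band implied by the two formulas."""
    if count >= 5:        # OS390_Tape_Dropped_Ready_Crit: GE 5
        return "Critical"
    if 0 < count < 5:     # OS390_Tape_Dropped_Ready_Warn: GT 0 AND LT 5
        return "Warning"
    return "None"         # no drives affected: neither situation is true


for n in (0, 1, 4, 5):
    print(n, classify_dropped_ready(n))
```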

OS390_Tape_Mount_Pend_Time_Crit

This situation issues a Critical alert when any tape unit has been waiting for a tape mount for 1200 seconds (20 minutes) or more. The situation has a monitoring interval of 5 minutes. The situation message displays the volume or volumes being requested so that tape operators know which tape volumes require mounts.

The formula is:

IF VALUE Tape_Drives.Tape_Mount_Pending_Time *GE 1200

A tape mount pending time that exceeds the threshold might require contacting operations to ensure that the MOUNT request has been recognized by personnel responsible for mounting tapes on the requested tape unit.
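Because a situation is checked once per monitoring interval, the alert can lag the moment at which the pending time crosses the threshold. The following Python sketch estimates that lag for the Critical situation. It assumes that the formula is evaluated exactly once per interval and that the pending time is read at each evaluation; the function and its parameters are hypothetical.

```python
def pending_time_at_first_alert(threshold_s: int, interval_s: int, offset_s: int) -> int:
    """Return the pending time (seconds) seen at the first sample that meets the threshold.

    offset_s is the pending time already elapsed when the first sample is taken.
    """
    pending = offset_s
    while pending < threshold_s:   # Critical formula: *GE 1200
        pending += interval_s      # next evaluation, one interval later
    return pending


# Critical situation: 1200-second threshold, 300-second (5-minute) interval.
best = pending_time_at_first_alert(1200, 300, 0)
worst = pending_time_at_first_alert(1200, 300, 299)
print(best, worst)
```

Under these assumptions, the mount request has been pending for between 1200 and just under 1500 seconds when the Critical alert is first raised.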

OS390_Tape_Mount_Pend_Time_Warn

This situation issues a Warning alert when any tape unit has been waiting for a tape mount for more than 600 seconds (10 minutes) and less than 1200 seconds (20 minutes). The situation has a monitoring interval of 2 minutes. The warning situation message displays the volume or volumes being requested to allow tape operators to know which tape volumes require mounts.

The formula is:

IF VALUE Tape_Drives.Tape_Mount_Pending_Time *GT 600 *AND
*VALUE Tape_Drives.Tape_Mount_Pending_Time *LT 1200

A tape mount pending time that exceeds the threshold might require contacting operations to ensure that the MOUNT request has been recognized by personnel responsible for mounting tapes on the requested tape unit.

OS390_Tape_Not_Responding_Crit

OS390_Tape_Not_Responding_Crit monitors the number of tape drives that are not responding and issues a Critical alert if the threshold is exceeded. If the condition is persistent and the devices cannot be activated by VARYing them online, report the problem to the appropriate personnel.

The formula is:

IF VALUE Tape_Drives.Not_Responding GE 5

OS390_Tape_Not_Responding_Warn

OS390_Tape_Not_Responding_Warn monitors the number of tape drives that are not responding and issues a Warning if the threshold is exceeded. If the condition is persistent and the devices cannot be activated by VARYing them online, report the problem to the appropriate personnel.

The formula is:

IF VALUE Tape_Drives.Not_Responding GT 0 AND
VALUE Tape_Drives.Not_Responding LT 5

OS390_Tape_Permanent_Error_Crit

OS390_Tape_Permanent_Error_Crit monitors the count of permanent errors on a tape drive and issues a Critical alert if the number is greater than or equal to 30.

The formula is:

IF VALUE Tape_Drives.Permanent_Errors GE 30

OS390_Tape_Permanent_Error_Warn

OS390_Tape_Permanent_Error_Warn monitors the count of permanent errors on a tape drive and issues a Warning if the number is between 5 and 29 inclusive.

The formula is:

IF VALUE Tape_Drives.Permanent_Errors GE 5 AND
VALUE Tape_Drives.Permanent_Errors LT 30

OS390_Tape_Temp_Errors_Crit

OS390_Tape_Temp_Errors_Crit monitors the count of temporary errors on a tape drive and issues a Critical alert if the number is greater than or equal to 30. The problem could be caused either by the media or by the device. Monitor to determine whether there is additional degradation and if so, report the problem to appropriate personnel.

The formula is:

IF VALUE Tape_Drives.Temporary_Errors GE 30

OS390_Tape_Temp_Errors_Warn

OS390_Tape_Temp_Errors_Warn monitors the count of temporary errors on a tape drive and issues a Warning if the number is between 5 and 29 inclusive. The problem could be caused either by the media or by the device. Monitor to determine whether there is additional degradation and if so, report the problem to appropriate personnel.

The formula is:

IF VALUE Tape_Drives.Temporary_Errors GE 5 AND
VALUE Tape_Drives.Temporary_Errors LT 30

OS390_Undispatched_Tasks_Crit

OS390_Undispatched_Tasks_Crit monitors to determine whether the number of tasks or address spaces that have not been dispatched by the SRM due to constraints is greater than or equal to 20 and issues a Critical alert if the condition is true. If the condition persists for more than an hour, a capacity upgrade may be required. Determine whether any important service classes are missing their goals.

The formula is:

IF VALUE System_CPU_Utilization.Undispatched_Tasks GE 20

OS390_Undispatched_Tasks_Warn

OS390_Undispatched_Tasks_Warn monitors to determine whether the number of tasks or address spaces that have not been dispatched by the SRM due to constraints is greater than or equal to 5 and less than 20 and issues a Warning if the condition is true. If the condition persists for more than an hour, a capacity upgrade may be required. Determine whether any important service classes are missing their goals.

The formula is:

IF VALUE System_CPU_Utilization.Undispatched_Tasks GE 5 AND
VALUE System_CPU_Utilization.Undispatched_Tasks LT 20

OS390_Unowned_Common_Stor_Crit

OS390_Unowned_Common_Stor_Crit monitors the amount of unowned storage in the Common Service Area (CSA) and issues a Critical alert if the threshold is exceeded. Ensure that the CSA Analyzer collector is running.

The formula is:

IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Unowned GE 1000000000

OS390_Unowned_Common_Stor_Warn

OS390_Unowned_Common_Stor_Warn monitors the amount of unowned storage in the Common Service Area (CSA) and issues a Warning alert if the threshold is exceeded. Ensure that the CSA Analyzer collector is running.

The formula is:

IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Unowned GE 1000000000

OS390_Unref_Interval_Cnt_Crit

OS390_Unref_Interval_Cnt_Crit monitors to determine whether the storage type is Summary and the amount of time, in seconds, that the oldest frame of pageable storage has gone without being referenced is less than or equal to 10 seconds, and issues a Critical alert if the condition is true.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ Summary AND
VALUE Real_Storage.Unreferenced_Interval_Count LE 10

If this situation is raised, determine whether any important service classes are failing to meet their goals and if Private Page-in Wait is a significant reason. If so, central storage may be overcommitted, possibly the result of a capacity problem.

Because lower values indicate resource contention, the unreferenced interval count (UIC) is an alert where the critical threshold must be less than the warning threshold.

OS390_Unref_Interval_Cnt_Warn

OS390_Unref_Interval_Cnt_Warn monitors to determine whether the storage type is Summary and the amount of time, in seconds, that the oldest frame of pageable storage has gone without being referenced is greater than 10 and less than 20 seconds, and issues a Warning if the condition is true. Determine whether any important service classes are failing to meet their goals and if Private Page-in Wait is a significant reason. If so, central storage may be overcommitted, possibly the result of a capacity problem.

The formula is:

IF VALUE Real_Storage.Storage_Type EQ Summary AND
VALUE Real_Storage.Unreferenced_Interval_Count LT 20 AND
VALUE Real_Storage.Unreferenced_Interval_Count GT 10

Because lower values indicate resource contention, the unreferenced interval count (UIC) is an alert where the warning threshold must be greater than the critical.
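The inverted ordering of the UIC thresholds can be seen in the following Python sketch, which applies both formulas to one sample. The function name, the returned labels, and the storage type value "Other" used in the loop are illustrative assumptions.

```python
def classify_uic(storage_type: str, uic: int) -> str:
    """Apply the Crit and Warn UIC formulas to one Real_Storage sample."""
    if storage_type != "Summary":   # both formulas require Storage_Type EQ Summary
        return "None"
    if uic <= 10:                   # Crit: Unreferenced_Interval_Count LE 10
        return "Critical"
    if 10 < uic < 20:               # Warn: GT 10 AND LT 20
        return "Warning"
    return "None"                   # 20 or more: no contention indicated


for sample in (("Summary", 5), ("Summary", 15), ("Summary", 20), ("Other", 5)):
    print(sample, classify_uic(*sample))
```

Note that the lowest values fall in the Critical band and that a value of exactly 20 raises neither alert.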

OS390_User_Host_Resp_Time_Crit

OS390_User_Host_Resp_Time_Crit monitors to determine whether the host (internal) response time for the indicated TSO user is exceeding the Critical threshold and if so, issues a Critical alert. If the user's service class is meeting its goal, there may be a specific problem in this user's address space. If the service class is missing its goal, the goal may be too demanding and may need to be adjusted.

The formula is:

IF VALUE User_Response_Time.Host_Response GE 100000000

OS390_User_Host_Resp_Time_Warn

OS390_User_Host_Resp_Time_Warn monitors to determine whether the host (internal) response time for the indicated TSO user is exceeding the Warning threshold and if so, issues a Warning alert. If the user's service class is meeting its goal, there may be a specific problem in this user's address space. If the service class is missing its goal, the goal may be too demanding and may need to be adjusted.

The formula is:

IF VALUE User_Response_Time.Host_Response GE 100000000

OS390_User_Total_Resp_Time_Crit

OS390_User_Total_Resp_Time_Crit monitors the total response time (host plus network) for a TSO user and issues a Critical alert if the threshold is exceeded. If the service class for this user is meeting its goal, the problem may be a network response time problem. The formula is:

IF VALUE User_Response_Time.Total_Response GE 100000000.0

OS390_User_Total_Resp_Time_Warn

OS390_User_Total_Resp_Time_Warn monitors the total response time (host plus network) for a TSO user and issues a Warning alert if the threshold is exceeded. If the service class for this user is meeting its goal, the problem may be a network response time problem.

The formula is:

IF VALUE User_Response_Time.Total_Response GE 100000000.0

OS390_WTO_Buffers_Left_Crit

OS390_WTO_Buffers_Left_Crit monitors to determine whether the remaining WTO buffer pool is becoming dangerously small and issues a Critical alert if the condition is true. Determine whether a console device is down and if so, switch the message stream to another device.

The formula is:

IF VALUE Operator_Alerts.WTO_Buffers_Remaining LE 20

OS390_WTO_Buffers_Left_Warn

OS390_WTO_Buffers_Left_Warn monitors to determine whether the remaining WTO buffer pool is becoming short of resources and issues a Warning if the condition is true. Determine whether a console device is down and if so, switch the message stream to another device.

The formula is:

IF VALUE Operator_Alerts.WTO_Buffers_Remaining GT 20 AND
VALUE Operator_Alerts.WTO_Buffers_Remaining LE 100
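To check how a formula behaves at its boundary values before changing a threshold, it can help to evaluate the formula text against sample values offline. The following Python sketch is a minimal evaluator for the subset of formula syntax used in this section (IF, VALUE, EQ, GE, GT, LE, LT, AND, with or without leading asterisks). It is an illustration only and is not the parser used by the monitoring server; the function name and the dictionary-based sample format are assumptions.

```python
import operator

# Comparison keywords that appear in the formulas in this section.
OPS = {
    "EQ": operator.eq,
    "GE": operator.ge,
    "GT": operator.gt,
    "LE": operator.le,
    "LT": operator.lt,
}


def evaluate(formula: str, sample: dict) -> bool:
    """Evaluate 'IF VALUE attr OP literal [AND VALUE attr OP literal ...]' on one sample."""
    # Keywords may be written with or without a leading asterisk.
    tokens = [t.lstrip("*") for t in formula.split()]
    if tokens and tokens[0].upper() == "IF":
        tokens = tokens[1:]
    result = True
    i = 0
    while i < len(tokens):
        if tokens[i].upper() == "AND":
            i += 1
            continue
        keyword, attribute, op, literal = tokens[i:i + 4]
        if keyword.upper() != "VALUE":
            raise ValueError(f"unsupported keyword: {keyword}")
        actual = sample[attribute]
        if isinstance(actual, (int, float)) and not isinstance(actual, bool):
            matched = OPS[op.upper()](actual, float(literal))   # numeric comparison
        else:
            matched = OPS[op.upper()](str(actual), literal)     # text comparison
        result = result and matched
        i += 4
    return result


crit = "IF VALUE Operator_Alerts.WTO_Buffers_Remaining LE 20"
warn = ("IF VALUE Operator_Alerts.WTO_Buffers_Remaining GT 20 AND "
        "VALUE Operator_Alerts.WTO_Buffers_Remaining LE 100")
sample = {"Operator_Alerts.WTO_Buffers_Remaining": 64}
print(evaluate(crit, sample), evaluate(warn, sample))  # False True
```

With 64 buffers remaining, only the Warning formula is true. A sample value of 20 makes only the Critical formula true, and a value above 100 makes neither true.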

Quiesced_UNIX_File_System

Quiesced_UNIX_File_System detects a quiesced file system. The formula is:

IF *VALUE USS_Mounted_File_Systems.Status *EQ Quiesced

The file system indicated is in a Quiesced state. This condition might not be a matter of immediate concern. For example, this could be due to an HSM backup recovery in progress. If this condition persists, a system programmer should be notified.

Shortage_of_UNIX_Processes_Crit

Shortage_of_UNIX_Processes_Crit checks to determine whether the Used_Processes value is 90 or higher, which means that the current number of processes is very close to the maximum. The formula is:

*VALUE USS_Kernel.Used_Processes *GE 90

If the maximum is reached, no more processes can be started. If this condition is due to expected growth, increase the maximum value. Otherwise, a system programmer should be notified.

Shortage_of_UNIX_Processes_Warn

Shortage_of_UNIX_Processes_Warn checks to determine if the current number of processes is between 80 and 90. The formula is:

*VALUE USS_Kernel.Used_Processes *GE 80 *AND *VALUE USS_Kernel.Used_Processes *LT 90

If the maximum number of processes is reached, no more processes can be started. If this condition is due to expected growth, increase the maximum value. Otherwise, a system programmer should be notified.

UNIX_ENQ_Contention_Critical

UNIX_ENQ_Contention_Critical detects when an HFS enqueue contention has lasted 30 seconds or more. The formula is:

*VALUE USS_HFS_ENQ_Contention.Time *GE 30

If this condition occurs, check the details to determine who is holding the enqueue. If it is a batch job that can be canceled and requeued, the deadlock can be broken by doing that. If it is a started task, a UNIX process, or an online user, a system programmer should be notified.

UNIX_ENQ_Contention_Warning

UNIX_ENQ_Contention_Warning detects when an HFS enqueue contention has lasted at least 10 seconds but less than 30 seconds. The formula is:

*VALUE USS_HFS_ENQ_Contention.Time *GE 10 *AND
*VALUE USS_HFS_ENQ_Contention.Time *LT 30

If this condition occurs, check the details to determine who is holding the enqueue. If it is a batch job that can be canceled and requeued, the deadlock can be broken by doing that. If it is a started task, a UNIX process, or an online user, a system programmer should be notified.

UNIX_File_System_FreeSpace_Crit

UNIX_File_System_FreeSpace_Crit detects when any file system has 10% or less free space. The formula is:

IF *VALUE USS_Mounted_File_Systems.Percent_Used *GE 90

If this condition occurs, the file system space should be extended. This can be accomplished with UNIX System Services commands. If this condition becomes chronic, a system programmer should be notified.

UNIX_File_System_FreeSpace_Warn

UNIX_File_System_FreeSpace_Warn detects when any file system has more than 10% but no more than 20% free space. The formula is:

IF *VALUE USS_Mounted_File_Systems.Percent_Used *GE 80 *AND
*VALUE USS_Mounted_File_Systems.Percent_Used *LT 90

If this condition occurs, consider extending the file system space. This can be accomplished with UNIX System Services commands. If this condition becomes chronic, a system programmer should be notified.
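The two file system situations are described in terms of free space, but their formulas test the Percent_Used attribute. The following Python sketch shows the conversion and the resulting bands. The function name and the returned labels are illustrative assumptions.

```python
def free_space_band(percent_used: float):
    """Return (percent_free, band) for a USS_Mounted_File_Systems.Percent_Used value."""
    percent_free = 100.0 - percent_used
    if percent_used >= 90:      # Crit: Percent_Used GE 90 (10% or less free)
        band = "Critical"
    elif percent_used >= 80:    # Warn: GE 80 AND LT 90 (more than 10%, up to 20% free)
        band = "Warning"
    else:
        band = "None"
    return percent_free, band


for used in (50, 80, 89, 90):
    print(used, free_space_band(used))
```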

UNIX_Logged_On_User_Idle

UNIX_Logged_On_User_Idle detects a logged-on user who has been idle for more than 480 minutes (8 hours). The formula is:

IF *VALUE USS_Logged_on_Users.Idle_Time_Mins *GT 480

This condition might not be a matter of immediate concern. Consult installation procedures for appropriate action.

UNIX_Max_Sockets_Critical

UNIX_Max_Sockets_Critical detects when the percentage of UNIX sockets in use has reached 95%. The formula is:

IF *VALUE USS_Kernel.USock_Curr_Pct *GE 95.0

This situation indicates that usage of UNIX sockets is near the maximum. If the maximum is reached, UNIX System Services functionality will be adversely affected. Help from a system programmer is needed immediately. Consider increasing the MAXSOCKETS() value on the NETWORK DOMAINNAME(AF_UNIX) statement. Use the SETOMVS RESET command to dynamically change the MAXSOCKETS value, or, to make a permanent change, edit the BPXPRMxx member in SYS1.PARMLIB.

UNIX_Max_Sockets_Warning

UNIX_Max_Sockets_Warning indicates that the percentage of UNIX sockets in use is at least 80% but less than 95%. The formula is:

IF *VALUE USS_Kernel.USock_Curr_Pct *GE 80.0 *AND
*VALUE USS_Kernel.USock_Curr_Pct *LT 95.0

This situation indicates that usage of UNIX sockets is approaching the maximum. If the maximum is reached, UNIX System Services functionality will be adversely affected. Consider increasing the MAXSOCKETS() value on the NETWORK DOMAINNAME(AF_UNIX) statement. Use the SETOMVS RESET command to dynamically change the MAXSOCKETS value, or, to make a permanent change, edit the BPXPRMxx member in SYS1.PARMLIB.

Unwanted_inetd_UNIX_Process

Unwanted_inetd_UNIX_Process detects an active inetd process that may be unwanted. The formula is:

IF *VALUE USS_Processes.Command_Name *EQ inetd

The inetd daemon provides UNIX networking services. If the active process is unwanted, consult the installation procedures for appropriate action. If the inetd daemon should be running, stop or do not start this situation.