Health monitoring

A TCP/IP load balancer that allocates connections to a Gateway daemon can take account of whether a CICS® server is available, because the Gateway daemon reports the health of its CICS server connections to the TCP/IP load balancer.

The best results are obtained if one Gateway daemon connects to one CICS server. If a Gateway daemon connects to more than one CICS server, a failure in one CICS server may prevent work from being sent to the others.

Sysplex Distributor and TCP/IP port sharing can both use health monitoring when allocating new connections. If you do not activate health reporting in the Gateway daemon, statistics are still collected by the Gateway daemon, but are not reported to the TCP/IP load balancer.

Health reporting is effective only in TCP/IP load balancing topologies in which CICS Transaction Gateway runs in remote mode. During each interval specified by the health interval setting, the Gateway daemon monitors certain error codes to determine the health of communications with CICS. The TCP/IP load balancer then uses the reported health values to prioritize the allocation of new incoming client application connections among the Gateway daemons in the load balancing group. Gateway daemons reporting a higher health value receive a greater proportion of the incoming connections than those reporting a lower health value.
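
The effect of health values on connection allocation can be pictured with a short Python sketch. The actual allocation algorithm used by the load balancer is not described in this topic, so the strictly proportional split below is an assumption made for illustration only; the function name `connection_shares` is hypothetical, and the daemon names CTG1 and CTG2 are taken from the diagram description.

```python
def connection_shares(health_by_daemon):
    """Split new connections in proportion to reported health.

    Assumption: a simple proportional rule. A real load balancer may
    weight Gateway daemons differently.
    """
    total = sum(health_by_daemon.values())
    if total == 0:
        # No daemon reports any health, so none is preferred.
        return {name: 0.0 for name in health_by_daemon}
    return {name: health / total for name, health in health_by_daemon.items()}


# CTG1 is fully healthy; CTG2 reports a reduced health value.
shares = connection_shares({"CTG1": 100, "CTG2": 50})
for name, share in shares.items():
    print(f"{name}: {share:.0%} of new connections")
```

Under this assumed rule, the daemon that reports the higher health value is offered the larger share of new connections, which matches the behavior described above.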

Health monitoring in a Sysplex with Sysplex Distributor

The diagram shows Gateway daemons reporting on the availability of CICS servers to IBM® Workload Manager.

The image shows a remote WebSphere Application Server connecting through Sysplex Distributor to Gateway daemon CTG1, which connects over EXCI to CICS1. CTG1 and CICS1 are both in z/OS LPAR1, and are cloned as Gateway daemon CTG2 and CICS2 in z/OS LPAR2. Both Gateway daemons report to a TCP/IP workload manager. Availability of CICS1 is reported as CTG1 WLM weighting. Availability of CICS2 is reported as CTG2 WLM weighting.

How health is calculated

The Gateway daemon health interval defines the amount of time, in seconds, that the Gateway daemon monitors particular error codes to determine the health of communications with CICS. The default health interval is 60 seconds. If no connectivity problems occur, the Gateway daemon health remains at 100.

Intermittent problems can cause the health of communications with CICS to drop, which in turn causes the load balancer to reduce the amount of work sent to the affected CICS server. If the problem disappears, health recovers.

If the health of communications with CICS drops to zero, the Gateway daemon issues a warning message, and the load balancer stops sending connection requests to the Gateway daemon until the health value has been reset by a Gateway daemon administrator.
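
The behavior at zero health can be modeled as a simple latch. The following Python sketch is illustrative only: the class `GatewayHealth`, its method names, the warning text, and the assumption that a reset restores the value to 100 are all hypothetical and are not taken from the product.

```python
class GatewayHealth:
    """Hypothetical holder for the health value a Gateway daemon reports."""

    def __init__(self):
        self.value = 100      # healthy until problems are observed
        self.latched = False  # True once health has dropped to zero
        self.warnings = []    # stand-in for the daemon's message log

    def report(self, interval_health):
        """Record the health measured for one interval; return the reported value."""
        if self.latched:
            # Health stays at zero until an administrator resets it.
            return self.value
        self.value = interval_health
        if self.value == 0:
            self.latched = True
            self.warnings.append("health of communications with CICS is zero")
        return self.value

    def reset(self):
        """Administrator action. Assumption: the value returns to 100."""
        self.value = 100
        self.latched = False


health = GatewayHealth()
health.report(70)         # intermittent problem, health drops
health.report(0)          # health reaches zero, warning recorded
print(health.report(90))  # still zero: no recovery without a reset
health.reset()
print(health.report(90))  # normal reporting resumes after the reset
```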

These CICS return codes indicate that a request failed because of problems with the health of communications with CICS:
  • ECI_ERR_NO_CICS
  • ECI_ERR_RESOURCE_SHORTAGE
  • ECI_ERR_SYSTEM_ERROR
  • ESI_ERR_NO_CICS
  • ESI_ERR_RESOURCE_SHORTAGE
  • ESI_ERR_SYSTEM_ERROR

The health of communications with CICS represents the percentage of requests during the health interval that succeeded. If all requests succeed, the health of communications with CICS is 100. If 30% of requests fail, health is 70. If there are fewer than 20 requests in the interval, each failing request reduces health by 5. The health of communications with CICS can never drop below zero.
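
The calculation described above can be sketched in Python as follows. This is an illustration, not the Gateway daemon's implementation: the function name `interval_health`, the rounding to a whole number, and the choice to carry the previous value forward when an interval has no requests (suggested by interval 2 in Table 1) are assumptions.

```python
def interval_health(requests, failed, previous=100):
    """Return a health value (0-100) for one health interval.

    Hypothetical helper: the name, the rounding, and the handling of an
    empty interval are assumptions, not documented product behavior.
    """
    if requests == 0:
        # Assumption: nothing was observed, so keep the previous value.
        return previous
    if requests < 20:
        # Fewer than 20 requests: each failure costs 5, never below zero.
        return max(0, 100 - 5 * failed)
    # Otherwise health is the percentage of requests that succeeded.
    return round(100 * (requests - failed) / requests)


print(interval_health(1000, 300))  # 30% of the requests failed
print(interval_health(15, 1))      # small interval with one failure
```

The separate rule for small intervals keeps a single failure among a handful of requests from being treated as a large percentage drop.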

The table shows how health can fluctuate:

Table 1. Health fluctuation
Event              Requests processed  Requests failed  System health
Health interval 1  1000                200              80%
Health interval 2  0                   0                80%
Health interval 3  500                 50               90%
Health interval 4  15                  1                95%
Health interval 5  200                 0                100%


Last updated: Tuesday, 19 November 2013


https://ut-ilnx-r4.hursley.ibm.com/tgzos_latest/help/topic/com.ibm.cics.tg.zos.doc//ctgzos/c0200020.html