Define health policy condition properties

Use this page to define health policy condition properties while creating a new health policy. To view this administrative console page, click Operational Policies > Health Policies > New.

Extended information about the health policy properties follows:

Age-based condition:

Maximum age This field sets the age value so that the policy restarts the associated members when their age reaches that value. Acceptable values are positive whole numbers in days or hours between 1 hour and 365 days. Decimal numbers are not supported. To use fractions of days, convert to hours. For example, for 1.5 days, use 36 hours.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations on the Runtime tasks page.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach. Restart server: Restarts the server. For the age-based condition policy, the action must be Restart server.
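
The conversion rule for Maximum age can be sketched in a few lines. This is only an illustration of the arithmetic described above, not part of the product; the function name `max_age_in_hours` and its `unit` parameter are hypothetical.

```python
def max_age_in_hours(value: float, unit: str) -> int:
    """Convert an age to the whole number of hours to enter in the field.

    Hypothetical helper: mirrors the documented rule that fractional days
    must be expressed as whole hours between 1 hour and 365 days.
    """
    if unit not in ("days", "hours"):
        raise ValueError("unit must be 'days' or 'hours'")
    hours = value * 24 if unit == "days" else value
    if hours != int(hours):
        raise ValueError("age must convert to a whole number of hours")
    hours = int(hours)
    if not 1 <= hours <= 365 * 24:
        raise ValueError("age must be between 1 hour and 365 days")
    return hours


print(max_age_in_hours(1.5, "days"))  # 36
```

A value that does not convert to a whole number of hours, or that falls outside the documented range, is rejected rather than rounded.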

Excessive response time condition:

Response time This field is available for the excessive response time condition health policy. The excessive response time policy restarts members when the average time that requests take to complete exceeds this period of time. Acceptable values for this field are between, and including, 1 millisecond and 60 minutes.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach. Restart server: Restarts the server. For the excessive response time condition policy, the action must be Restart server.

Excessive request timeout condition:

Total memory used The excessive memory policy restarts members when the memory usage exceeds a percentage of your heap size over a period of time. The total memory used percentage is used with the time over memory threshold value to determine when to restart members. Acceptable values for this field are whole numbers from 1 to 99.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach.
  • Take thread dumps: Takes thread dumps on IBM Java Development Kit (JDK).
  • Restart server: Restarts the server.

Memory condition: excessive memory:

JVM heap size Threshold value for the percentage of the maximum heap size used for the Java Virtual Machine process. Acceptable values for this field are whole numbers from 1 to 99.
Offending time period The length of time that total memory usage must remain above the JVM heap size threshold before corrective action is taken. Acceptable values for this field are between, and including, 1 second and 60 minutes.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach. Restart server: Restarts the server. For the excessive memory condition policy, the action must be Restart server.
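
How the heap threshold and the offending time period combine can be illustrated with a small sketch. It assumes heap usage is sampled as (timestamp in seconds, percent of maximum heap used) pairs; the function `breach_sustained` and that sample format are hypothetical and do not describe how the product evaluates the condition internally.

```python
def breach_sustained(samples, threshold_pct, period_s):
    """Return True if usage stayed above threshold_pct for at least period_s.

    samples: list of (timestamp_seconds, heap_used_pct), oldest first.
    Hypothetical illustration only, not the product's algorithm.
    """
    breach_start = None
    for timestamp, used_pct in samples:
        if used_pct > threshold_pct:
            if breach_start is None:
                breach_start = timestamp  # first sample over the threshold
            if timestamp - breach_start >= period_s:
                return True
        else:
            breach_start = None  # usage dropped back, so the clock resets
    return False


samples = [(0, 80), (60, 92), (120, 95), (180, 93)]
print(breach_sustained(samples, threshold_pct=90, period_s=120))
```

A single dip below the threshold resets the timer, which is why a short offending time period reacts faster but is more sensitive to brief spikes.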

Memory condition: memory leak:

Detection level for condition You can choose from the following detection levels. For each level there is a trade-off between the speed and accuracy of detecting suspected memory leaks.
  • Faster detection, higher probability of false alarms: A faster detection policy detects a potential memory leak quickly, but it has a greater chance of falsely identifying a memory leak than a slower detection policy because it analyzes before the Java heap has expanded to its maximum configured size.
  • Standard detection, standard probability of false alarms: A standard detection policy is more accurate than a faster one, but not as quick to identify a potential memory leak. The standard and faster settings require the same amount of historical data, but the standard setting analyzes after the Java heap has expanded to its maximum configured size.
  • Slower detection, lower probability of false alarms: A slower detection policy is the most accurate, but it does not detect a potential memory leak as quickly as the faster detection policy does. The slower setting requires the most historical data.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach.
  • Take JVM heap dumps on IBM Java Development Kit (JDK) only: Takes heap dumps on IBM JDK.
  • Restart server: Restarts the server.

Storm drain condition:

Detection level for condition You can choose from the following detection levels. For each level there is a trade-off between the speed and accuracy of detecting suspected storm drain conditions.
  • Standard detection, normal probability of false alarms: A standard detection policy is less accurate than a slower one, but quicker to identify a potential storm drain condition. This policy uses fewer samples (N=10) for both response times and deployment workload manager weights and tries to detect a change point in each of the metrics based on the sample set. It reaches a conclusion faster because it waits for only 20 samples (10 for the left mean and 10 for the right mean) to calculate a difference of means and look for a local maximum. The samples are collected at intervals of 15 seconds, so a storm drain condition can be detected within five minutes of its occurrence. Because the number of samples is smaller, if the samples have a lot of transient peaks or dips, there is a higher probability of false alarms.
  • Slower detection, lower probability of false alarms: A slower detection policy is the most accurate, but it does not detect a potential storm drain condition as quickly as the standard detection policy does. This policy uses more samples (N=15) for both response times and deployment workload manager weights. It reaches a conclusion more slowly because it has to wait for 30 samples (15 for the left mean and 15 for the right mean) to calculate a difference of means. The detection time is seven minutes and 30 seconds. Because the number of samples is higher, the presence of a few samples with transient peaks or dips does not overly affect the means, and the probability of false alarms is lower.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach. Restart server: Restarts the server. For the storm drain condition policy, the action must be Restart server.
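
The detection times quoted for the two storm drain detection levels follow directly from the sample counts and the 15-second sampling interval. The sketch below reproduces that arithmetic and shows a plain difference of means over a left and a right window. The function names are hypothetical, and the real change-point detection (including the search for a local maximum) is not reproduced here.

```python
from statistics import mean

SAMPLE_INTERVAL_S = 15  # sampling interval stated for the storm drain policy


def detection_time_seconds(n: int) -> int:
    """Time to collect 2*n samples: n for the left mean, n for the right mean."""
    return 2 * n * SAMPLE_INTERVAL_S


def difference_of_means(samples, n: int) -> float:
    """Right-window mean minus left-window mean over the last 2*n samples.

    Hypothetical illustration of the 'difference of means' idea only.
    """
    if len(samples) < 2 * n:
        raise ValueError("need at least 2*n samples")
    window = samples[-2 * n:]
    return mean(window[n:]) - mean(window[:n])


print(detection_time_seconds(10))  # 300 seconds = 5 minutes (standard)
print(detection_time_seconds(15))  # 450 seconds = 7 minutes 30 seconds (slower)
```

With more samples per window, a single transient peak moves each mean less, which matches the lower false alarm probability described for the slower level.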

Workload condition:

Total requests In this field you can assign a numerical request value to your workload policy. The workload condition policy restarts members when this number of requests is serviced. An acceptable request value must be a whole number between 1000 and 9223372036854775807.
Reaction mode
  • Supervise: Indicates the health policies are active and recommendations for appropriate actions are being sent to the administrator, who can accept or decline the recommendations.
  • Automatic: Indicates the health policies are active, and the system is both logging data and taking action.
Select actions to take on health condition breach. Restart server: Restarts the server. For the workload condition policy, the action must be Restart server.
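
The upper bound for Total requests, 9223372036854775807, equals 2**63 - 1, the largest signed 64-bit integer. A minimal range check might look like the following; the function name `is_valid_total_requests` is hypothetical and is not part of the administrative console.

```python
MIN_TOTAL_REQUESTS = 1000
MAX_TOTAL_REQUESTS = 9223372036854775807  # equals 2**63 - 1


def is_valid_total_requests(value) -> bool:
    """Check the documented range: a whole number from 1000 to 2**63 - 1.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        return False  # only whole numbers are accepted
    return MIN_TOTAL_REQUESTS <= value <= MAX_TOTAL_REQUESTS


print(is_valid_total_requests(50000))
print(is_valid_total_requests(999))
```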

When you complete the fields, click Next.