Health policy settings
Use this page to modify existing health policies. Health policies
are used to maintain a healthy environment using prevention and detection
methodologies.
To view this administrative console page, click Operational Policies
> Health Policies > health_policy_name.
If you are a user with either a monitor or an operator role, you can only
view health policy information. If you are a user with either a configurator
or an administrator role, you have all configuration privileges for health
policies.
This page has two tabs: Configuration and Local Topology.
On the Configuration tab, you can view and configure the settings for
the health policy. On the Local Topology tab, you can view the health
policy memberships in a visual representation.
- Name
Specifies the name of a health policy. The health policy name is
required and must be unique among all the health policies in the cell.
The name cannot begin with a period (.) or
a space. A leading space does not generate an error, because leading and trailing spaces
are automatically deleted. Use meaningful and consistent health policy names.
For example, age-based health policies can be indicated by naming the policies AGE_20DAYS, AGE_15DAYS, and so on.
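The naming rules above can be sketched as a small check. This is only an illustration, not the console's actual validation code: the function name `normalize_policy_name` and the `existing_names` argument are hypothetical, and the order in which the rules are applied is an assumption.

```python
def normalize_policy_name(raw_name, existing_names):
    """Return the cleaned policy name, or raise ValueError if a rule is broken.

    Hypothetical helper: existing_names stands in for the names of the
    health policies that are already defined in the cell.
    """
    # Leading and trailing spaces are deleted rather than rejected.
    name = raw_name.strip(" ")
    if not name:
        raise ValueError("a health policy name is required")
    if name.startswith("."):
        raise ValueError("the name cannot begin with a period")
    if name in existing_names:
        raise ValueError("the name must be unique among the policies in the cell")
    return name
```

For example, `normalize_policy_name("  AGE_20DAYS ", {"AGE_15DAYS"})` returns the name without the surrounding spaces.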
- Description
Specifies an additional description of the health policy. The description
is optional. You can edit the description when you are creating or editing
a health policy. Consider using the optional description when you are using
many health policies or when multiple administrators manage the same set of
health policies.
- Health condition
The health condition defines the specific policy that is implemented.
Some policies are prevention-based and some are detection-based.
Prevention-based policies are used to avoid conditions that might lead to
problems, while the detection-based policies are used to identify existing
conditions and to achieve resolution. These policies can be used to perform
health-based assessments on clusters, dynamic clusters, and application server
instances running on nodes. In the case of dynamic clusters, regardless of
the health policy that you are using, the minimum number of dynamic cluster
instances remains running.
- The age-based condition policy restarts the associated members
when their age reaches a certain user-defined value. This restart clears
all cached data and acquired memory. If you select the age-based condition policy,
you must define the age criteria. The age-based condition
is supported for all server types.
- The excessive request timeout condition policy tracks the percentage
of requests that time out. When the percentage of timeouts exceeds
the threshold that you define, the members are restarted. If you select the excessive
request timeout condition, you must set the timed out requests percentage threshold. The excessive request timeout condition is supported for
all server types.
Restriction: The excessive request timeout
condition does not apply to Java Message Service (JMS) and Internet Inter-ORB
Protocol (IIOP) traffic.
- The excessive response time condition policy tracks the requests
and the amount of time they take to complete. Use this policy to clean up
servers whose requests, on average, take longer than a specified
time. When the average response time exceeds that
time, the members are restarted. When you select the excessive response
time policy, you must define the response time threshold. The
excessive response time condition is supported for all server types.
- The memory condition: excessive memory usage policy tracks the
memory usage for a member. When the memory usage exceeds a percentage of the
heap size for a specified time, actions are taken to correct this situation.
If you define the health policy against a standalone server, static cluster,
or dynamic cluster in manual mode, then the member stops and restarts. If
you define the health policy against a dynamic cluster that is in automatic
or supervised mode, then the member that is flagged by the condition stops.
The placement controller dynamically decides which, if any, servers to start
based on its evaluation of the environment. These actions occur automatically
if you are in automatic mode. If you are in supervised mode, you can approve
the runtime tasks that are generated to correct the situation. If you select the
excessive memory usage policy, you must define the memory used and the time-over-memory
threshold. The excessive memory usage condition is
supported only on application servers on nodes that run WebSphere Application
Server or WebSphere Application Server Community Edition.
You cannot define the excessive memory usage condition for other middleware
server types.
- The memory condition: memory leak policy tracks consistent downward
trends in free memory that is available to a server in the Java heap. The
detection level setting determines when these trends are detected. If you
select the memory condition: memory leak policy, you must define a detection
level. The slower detection level setting requires the most historical data.
The normal and faster detection level settings require the same amount of
historical data, but the faster setting allows analysis before the Java heap
has expanded to its maximum configured size. This provides earlier detection
capability, but is also more prone to false positives. This condition supports
heap dumps in addition to server restarts as reactions. The
memory leak condition is not supported for other middleware server types.
- The storm drain condition policy tracks stuck requests. The server
that is associated with this policy restarts when the specified detection
level is reached. Storm drain detection relies on change point detection in
time series data. The metrics that are used for detecting storm drain
are the response times and dynamic workload manager weights that are observed
for the server. The storm drain condition applies only to dynamic clusters
and cells. If you select the storm drain condition policy, you must select
the detection level.
To detect change points, the health controller calculates
a left mean and a right mean for each sample point. The left mean
is the mean value of the N samples that arrive before the point,
and the right mean is the mean value of the N samples that begin with the
point. The difference between the left and right mean values
is stored and compared with the other stored differences to
determine whether it is a local maximum. If this difference is the
maximum difference, then the point to which it corresponds is
declared a change point.
The storm drain condition
is supported for all server types.
Restriction: The storm drain condition does not apply to JMS and IIOP traffic.
- The workload condition policy restarts the members when a certain
user-defined number of requests are serviced. This policy cleans out the memory
and caches. If you select the workload policy, you must define the total request
criteria. The workload condition is supported for all
server types.
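The left mean and right mean comparison that is described for the storm drain condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the health controller's implementation: the function names, the use of an absolute difference, and the `min_jump` parameter (which keeps a flat series from reporting a change point) are all hypothetical.

```python
from statistics import mean


def mean_differences(samples, n):
    """Map each index that has n samples on both sides to |right mean - left mean|."""
    diffs = {}
    for i in range(n, len(samples) - n + 1):
        left = mean(samples[i - n:i])    # n samples that arrive before point i
        right = mean(samples[i:i + n])   # n samples that begin with point i
        diffs[i] = abs(right - left)     # assumption: magnitude of the shift
    return diffs


def find_change_point(samples, n, min_jump=0.0):
    """Return the index with the largest difference of means, or None.

    min_jump is a hypothetical guard; a real detector would apply its own
    criteria for deciding whether the maximum is significant.
    """
    diffs = mean_differences(samples, n)
    if not diffs:
        return None  # fewer than 2 * n samples are available
    best = max(diffs, key=diffs.get)
    return best if diffs[best] > min_jump else None
```

With response times that jump from 100 to 900 at index 12 and N=10, the largest difference of means falls at index 12, so `find_change_point([100] * 12 + [900] * 12, 10)` returns that index.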
- Health condition properties
Specifies properties that are specific to the health condition.
Table 1. Age-based condition properties
Setting |
Description |
Maximum age |
This field is available for the age-based policy.
The age-based condition policy restarts the associated members when their
age reaches the maximum age. Acceptable values are positive whole numbers
in days or hours between 1 hour and 365 days. Decimal numbers
are not supported, so to enter a value like 1.5 days, use
36 hours.
|
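Because the maximum age accepts only whole numbers, a fractional number of days has to be converted to hours before it is entered. The following sketch shows that arithmetic and the range check; the function name and the choice to pass the value as text are assumptions made for the example.

```python
from fractions import Fraction

MIN_HOURS = 1           # lower bound: 1 hour
MAX_HOURS = 365 * 24    # upper bound: 365 days, expressed in hours


def days_to_whole_hours(days_text):
    """Convert a day count such as "1.5" to whole hours, or raise ValueError."""
    hours = Fraction(days_text) * 24  # exact arithmetic, no float rounding
    if hours.denominator != 1:
        raise ValueError(f"{days_text} days is not a whole number of hours")
    if not MIN_HOURS <= hours <= MAX_HOURS:
        raise ValueError("maximum age must be between 1 hour and 365 days")
    return int(hours)
```

For example, `days_to_whole_hours("1.5")` yields 36, while a value that does not correspond to a whole number of hours is rejected.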
Table 2. Excessive request timeout condition properties
Setting |
Description |
Timed out requests |
This field is available for the excessive request
timeout condition policy. The excessive request timeout condition policy
restarts members when the percentage of requests that time out exceeds
this threshold. Acceptable values for this field
are whole numbers between 1 and 99.
|
Table 3. Excessive response time condition properties
Setting |
Description |
Response time |
This field is available for the excessive response
time condition policy. The excessive response time condition policy restarts
members when the average response time exceeds the specified threshold.
Acceptable values for this field are between 1 millisecond and 60 minutes.
|
Table 4. Memory condition: excessive memory usage properties
Setting |
Description |
JVM heap size |
The excessive memory usage condition policy restarts
members when the memory usage exceeds a percentage of your heap size over
time. The total memory used percentage is used with the offending time period
value to determine when to restart members. Acceptable values for this field
are whole numbers between 1 and 99.
|
Offending time period |
This field is available for the excessive memory
usage condition policy. The excessive memory usage condition policy restarts
members when the memory usage exceeds a percentage of your heap size over
time. Acceptable values for this field are between 1 second and 60 minutes.
|
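The way that the heap percentage and the offending time period combine can be sketched as follows. This is an illustration only: the sample format, the function name, and the rule that a dip below the limit resets the timer are assumptions, not documented behavior of the health controller.

```python
def memory_breached(samples, heap_bytes, used_percent, offending_seconds):
    """Return True if usage stays above the limit for the offending period.

    samples: list of (timestamp_seconds, used_bytes) pairs, oldest first.
    """
    limit = heap_bytes * used_percent / 100
    over_since = None  # timestamp at which usage first exceeded the limit
    for timestamp, used in samples:
        if used > limit:
            if over_since is None:
                over_since = timestamp
            if timestamp - over_since >= offending_seconds:
                return True
        else:
            over_since = None  # assumption: any dip restarts the clock
    return False
```

With a 1000-byte heap, an 80 percent threshold, and a 60 second offending period, usage that stays above 800 bytes from second 0 to second 60 counts as a breach, while a dip at second 30 does not.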
Table 5. Memory condition: memory leak condition properties
Setting |
Description |
Detection level |
You can choose from the following detection levels.
For each level a trade-off exists between the speed and accuracy of detecting
suspected memory leaks.
- Faster detection, higher probability of false alarms: A faster
detection level detects a potential memory leak quickly. However, this detection
level has a greater chance of falsely identifying a memory leak than a slower
detection level because the analysis is done before the Java heap expands
to its maximum configured size.
- Standard detection, standard probability of false alarms: A standard
detection level is more accurate than a faster one, but not as quick to identify
a potential memory leak. The standard and faster settings require the same
amount of historical data, but the standard setting analyzes after the Java
heap has expanded to its maximum configured size.
- Slower detection, lower probability of false alarms: A slower detection
level is the most accurate. However, this detection level does not detect a
potential memory leak as quickly as the faster detection level does. This
slower setting requires the most historical data.
|
Table 6. Storm drain condition properties
Setting |
Description |
Detection level |
- Standard detection, normal probability of false alarms: A standard
detection policy is less accurate than a slower one, but quicker to identify
a potential storm drain.
This level uses fewer samples (N=10) for both
response times and dynamic workload manager weights and detects a change point
in each of the metrics based on the sample set. As a result, this policy reaches
a conclusion faster because it waits for 20 samples, 10 for the left mean
and 10 for the right mean, to calculate a difference of means and look
for a local maximum. The samples are collected at intervals of 15 seconds.
Therefore, the storm drain can be detected within 5 minutes of its occurrence.
However, because the samples are fewer, if the samples have multiple transient
peaks or dips, then there is a higher probability for false alarms.
- Slower detection, lower probability of false alarms: A slower detection
policy is the most accurate. However, it does not detect a potential storm
drain as quickly as the standard detection policy does.
This level uses
more samples (N=15) for both response times and dynamic workload manager weights.
As a result, this policy reaches a conclusion more slowly because the policy has
to wait for 30 samples (15 for the left mean and 15 for the right mean) to
calculate a difference of means. The detection time is 7 minutes and
30 seconds. However, because there are more samples, the presence of samples
with transient peaks or dips does not overly affect the mean values. Therefore,
the probability of false alarms is lower.
|
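The detection times for the two storm drain levels follow from the sample counts and the 15 second collection interval. The short sketch below reproduces that arithmetic; the function and constant names are chosen for the example only.

```python
SAMPLE_INTERVAL_SECONDS = 15  # samples are collected every 15 seconds


def detection_window_seconds(n):
    """Seconds needed to collect n samples for each of the two means."""
    return 2 * n * SAMPLE_INTERVAL_SECONDS


standard = detection_window_seconds(10)  # 20 samples
slower = detection_window_seconds(15)    # 30 samples
print(standard / 60, slower / 60)        # minutes for each level
```

The standard level (N=10) needs 300 seconds, or 5 minutes, and the slower level (N=15) needs 450 seconds, or 7 minutes and 30 seconds.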
Table 7. Workload condition properties
Setting |
Description |
Total requests |
The workload condition policy restarts members when
a certain user-defined number of requests are serviced. A request value must
be a whole number between 1000 and 9223372036854775807.
|
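The upper bound for the total requests value is the largest signed 64-bit integer, 2^63 - 1. A range check for the field might look like the following sketch; the function name is hypothetical.

```python
MIN_TOTAL_REQUESTS = 1000
MAX_TOTAL_REQUESTS = 2**63 - 1  # 9223372036854775807


def is_valid_total_requests(value):
    """Return True if value is a whole number within the accepted range."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    return MIN_TOTAL_REQUESTS <= value <= MAX_TOTAL_REQUESTS
```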
Table 8. Custom condition properties
Setting |
Description |
Run reaction plan when |
Specifies a subexpression that represents the metrics
that you are evaluating in your custom condition. |
- Health management monitor reaction
Specifies how WebSphere Extended Deployment behaves when a defined
health condition needs improving.
- Reaction mode
Specifies the reaction mode that defines the behavior of the health
policy. The reaction mode can be Supervise or Automatic.
- When the reaction mode is set to Supervise, health policies are
active and recommendations on actions are sent to the administrator with a
runtime task. The administrator can follow the recommendations. If the administrator
approves a recommendation, actions are taken to improve the health condition
automatically.
- When the reaction mode is set to Automatic, health policies are
actively logging data, and WebSphere Extended Deployment automatically takes
actions to improve the health conditions, without approval from the administrator.
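The difference between the two reaction modes can be summarized in a short sketch. It is a conceptual model only: the `ReactionMode` enum, the `handle_breach` function, and the `approve` callback (which stands in for the administrator's decision on a runtime task) are hypothetical and do not correspond to a product API.

```python
from enum import Enum


class ReactionMode(Enum):
    SUPERVISE = "Supervise"
    AUTOMATIC = "Automatic"


def handle_breach(mode, actions, approve):
    """Return the list of actions that run when a health condition breaches.

    approve: callable that models the administrator accepting or rejecting
    the recommended actions in supervised mode.
    """
    if mode is ReactionMode.AUTOMATIC:
        return list(actions)  # no approval is needed
    # Supervised mode: actions run only if the recommendation is approved.
    return list(actions) if approve(actions) else []
```

In this model, `handle_breach(ReactionMode.SUPERVISE, ["restart server"], lambda a: False)` runs nothing, because the recommendation was not approved.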
- Take the following actions when the health condition breaches
You can define a specific set of actions to occur when the health
condition breaches. These actions can be the existing default actions, or
you can define custom actions to run an executable file.
A list of actions displays in the order that they are run when the health
condition breaches. To add an action, click Add Action.... You can
choose an existing default health policy action, a custom action that you
have created, or you can create a new custom action.
To remove an action, select the action and click Remove Action. To change
the order of the actions, select one action to move and click Move up or Move
down.
- Memberships
Specifies the members for the health policy, which activates the
health policy that is defined for the members. Membership is not a one-to-one
relationship; members can be associated with multiple policies.
Edit the Membership field by selecting the appropriate member type
from the list. The resulting potential members display in the Available
for Membership field. Select the appropriate members from the Available
for Membership list. To select multiple members, hold down the Ctrl key
while you click each member, and then click Add to add
your selections to the membership for the health policy.