Health policy settings
Use this page to modify existing health policies. Health
policies are used to maintain a healthy environment using prevention
and detection methodologies.
To view this administrative console page, click .
If you are a user with either a monitor or an operator role, you
can only view health policy information. If you are a user with either
a configurator or an administrator role, you have all configuration
privileges for health policies.
This page has two tabs: Configuration and Local
Topology. On the Configuration tab,
you can view and configure the settings for the health policy. On
the Local Topology tab, you can
view the health policy memberships in a visual representation.
- Name
Specifies the name of a health policy. The health policy
name is required and must be unique among all the health policies
in the cell.
The name cannot begin with a period (.)
or a space. A space does not generate an error, but leading and trailing
spaces are automatically deleted. Use meaningful and consistent health
policy names. For example, age-based health policies can be indicated
by naming the policies AGE_20DAYS, AGE_15DAYS,
and so on.
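The naming rules above can be sketched as a small validation routine. This is only an illustration, not the console's own code: the function name `normalize_policy_name` and the `existing_names` argument are hypothetical, and treating uniqueness as case-sensitive is an assumption.

```python
def normalize_policy_name(raw_name, existing_names):
    """Return the cleaned policy name, or raise ValueError if a rule is broken.

    Hypothetical helper: it mirrors the documented rules, not the product's code.
    """
    # Leading and trailing spaces are deleted rather than reported as an error.
    name = raw_name.strip(" ")
    if not name:
        raise ValueError("The health policy name is required.")
    if name.startswith("."):
        raise ValueError("The health policy name cannot begin with a period.")
    # Assumption: uniqueness within the cell is checked case-sensitively.
    if name in existing_names:
        raise ValueError("The health policy name must be unique in the cell.")
    return name
```

For example, `normalize_policy_name("  AGE_20DAYS ", {"AGE_15DAYS"})` returns `"AGE_20DAYS"`, while a name such as `".hidden"` is rejected.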
- Description
Specifies an additional description of the health policy.
The description is optional. You can edit the description when you
are creating or editing a health policy. Consider using the optional
description when you are using many health policies or when multiple
administrators manage the same set of health policies.
- Health condition
The health condition defines the specific policy that is
implemented.
Some policies are prevention-based and some are detection-based.
Prevention-based policies are used to avoid conditions that might
lead to problems, while the detection-based policies are used to identify
existing conditions and to achieve resolution. These policies can
be used to perform health-based assessments on clusters, dynamic clusters,
and application server instances running on nodes. In the case of
dynamic clusters, regardless of the health policy that you are using,
the minimum number of dynamic cluster instances remains running.
- The age-based condition policy restarts the associated
members when their age reaches a certain user-defined value. This
restart clears all cached data and releases acquired memory. If you select
the age-based condition policy, you must define the age criteria. The
age-based condition is supported for all server
types.
- The excessive request timeout condition policy tracks the
percentage of requests that time out. When the percentage of timeouts
exceeds the configured threshold, the members are restarted. If you
select the excessive request timeout condition, you must set the timed
out requests percentage threshold. The excessive request
timeout condition is supported for all server types.
Restriction: The excessive request timeout condition does not
apply to Java Message Service (JMS) and Internet Inter-ORB Protocol
(IIOP) traffic.
- The excessive response time condition policy tracks the
requests and the amount of time they take to complete. Use this policy
to clean up servers whose average response time is longer than a
specified time. If the average response time exceeds the threshold,
the members are restarted. When
you select the excessive response time policy, you must define the
response time threshold. The excessive response
time condition is supported for all server types.
- The memory condition: excessive memory usage policy tracks
the memory usage for a member. When the memory usage exceeds a percentage
of the heap size for a specified time, actions are taken to correct
this situation. If you define the health policy against a standalone
server, static cluster, or dynamic cluster in manual mode, then the
member stops and restarts. If you define the health policy against
a dynamic cluster that is in automatic or supervised mode, then the
member that is flagged by the condition stops. The placement controller
dynamically decides which, if any, servers to start based on its evaluation
of the environment. These actions occur automatically if you are in
automatic mode. If you are in supervised mode, you can approve the
runtime tasks that are generated to correct the situation. If you select
the excessive memory usage policy, you must define the memory used
and the time-over-memory threshold. The excessive
memory usage condition is supported only on application servers on
nodes that run WebSphere Application Server or
WebSphere Application Server Community Edition. You cannot define
the excessive memory usage condition for other middleware server types.
- The memory condition: memory leak policy tracks consistent
downward trends in free memory that is available to a server in the
Java heap. The detection level setting determines when these trends
are detected. If you select the memory condition: memory leak policy,
you must define a detection level. The slower detection level setting
requires the most historical data. The standard and faster detection
level settings require the same amount of historical data, but the
faster setting allows analysis before the Java heap has expanded to
its maximum configured size. This provides earlier detection capability,
but is also more prone to false positives. This condition supports
heap dumps in addition to server restarts as reactions. The memory leak condition is not supported for
other middleware server types.
- The storm drain condition policy tracks stuck requests.
The server that is associated with this policy restarts when the specified
detection level is reached. Storm drain detection relies on change
point detection in time series data. The metrics that are
used for detecting storm drain are the response times and dynamic
workload manager weights that are observed for the server. The storm
drain condition applies only to dynamic clusters and cells. If you
select the storm drain condition policy, you must select the detection
level.
To detect change points, the health controller calculates
a left mean and a right mean for each point. The left
mean is the mean value of the N samples that arrive before
the point, and the right mean is the mean value of the N samples,
starting with the current point, that arrive from then on. The difference
between the left and the right mean values is stored and compared with the
other differences in the set of values to determine whether this difference
is a local maximum. If this difference is the maximum difference, then
the point to which this difference corresponds is declared a change
point. Change points are detected in both metrics: the response times
and the dynamic workload manager weights that are
observed for the server.
The storm drain
condition is supported for all server types. Restriction: The
storm drain condition does not apply to JMS and IIOP traffic.
- The workload condition policy restarts the members when
a certain user-defined number of requests are serviced. This policy
cleans out the memory and caches. If you select the workload policy,
you must define the total request criteria. The
workload condition is supported for all server types.
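The change point calculation described for the storm drain condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the health controller's implementation: the function name `find_change_point` and the `min_shift` parameter are hypothetical, the absolute difference of the two means is used, and the difference is compared against every other difference in the series.

```python
from statistics import mean


def find_change_point(samples, n, min_shift=0.0):
    """Return the index of the most likely change point, or None.

    samples   -- metric values in arrival order (for example, response times)
    n         -- number of samples in the left mean and in the right mean
    min_shift -- hypothetical floor so that a flat series reports no change
    """
    best_index, best_diff = None, min_shift
    # A point needs n samples before it and n samples starting at it.
    for i in range(n, len(samples) - n + 1):
        left_mean = mean(samples[i - n:i])    # n samples before the point
        right_mean = mean(samples[i:i + n])   # n samples from the point on
        diff = abs(right_mean - left_mean)
        # Keep the point whose difference of means is the largest so far.
        if diff > best_diff:
            best_index, best_diff = i, diff
    return best_index
```

With `n=10`, the function needs at least 20 samples before it can evaluate a single point, which matches the sample counts given for the detection levels.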
- Health condition properties
Specifies properties that are specific to the health condition.
Table 1. Age-based condition properties
Setting | Description
Maximum age | This field is available for the age-based policy. The age-based condition policy restarts the associated members when their age reaches the maximum age. Acceptable values are positive whole numbers in days or hours between 1 hour and 365 days. Decimal numbers are not supported, so to enter a value like 1.5 days, use 36 hours.
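The maximum age rules can be illustrated with a short check. This is a sketch only: the function name `max_age_in_hours` and the unit strings are hypothetical, and the console may validate input differently.

```python
HOURS_PER_DAY = 24
MIN_AGE_HOURS = 1                      # documented lower bound: 1 hour
MAX_AGE_HOURS = 365 * HOURS_PER_DAY    # documented upper bound: 365 days


def max_age_in_hours(value, unit):
    """Validate a maximum age and return it expressed in hours."""
    # Only positive whole numbers are accepted; decimals such as 1.5 are not.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError("Maximum age must be a positive whole number.")
    if unit == "hours":
        hours = value
    elif unit == "days":
        hours = value * HOURS_PER_DAY
    else:
        raise ValueError("Unit must be 'hours' or 'days'.")
    if not MIN_AGE_HOURS <= hours <= MAX_AGE_HOURS:
        raise ValueError("Maximum age must be between 1 hour and 365 days.")
    return hours
```

A fractional age such as 1.5 days is rejected by this check and has to be entered as `max_age_in_hours(36, "hours")` instead.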
Table 2. Excessive request timeout condition properties
Setting | Description
Timed out requests | This field is available for the excessive request timeout condition policy. The policy restarts members when the percentage of requests that time out exceeds this value. Acceptable values for this field are whole numbers between 1 and 99.
Table 3. Excessive response time condition properties
Setting | Description
Response time | This field is available for the excessive response time condition policy. The excessive response time condition policy restarts members when the average response time exceeds the specified value. Acceptable values for this field are between 1 millisecond and 60 minutes.
Table 4. Memory condition: excessive memory usage properties
Setting | Description
JVM heap size | The excessive memory usage condition policy restarts members when the memory usage exceeds a percentage of your heap size over time. The total memory used percentage is used with the time over memory threshold value to determine when to restart members. Acceptable values for this field are whole numbers between 1 and 99.
Offending time period | This field is available for the excessive memory usage condition policy. The excessive memory usage condition policy restarts members when the memory usage exceeds a percentage of your heap size over time. Acceptable values for this field are between 1 second and 60 minutes.
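The way the two settings combine can be sketched as follows. This is an illustration under assumptions, not the product's algorithm: the function name and the sample format are hypothetical, and the sketch assumes that memory usage must stay above the heap percentage continuously for the whole offending time period.

```python
def memory_condition_breached(samples, heap_percent, offending_seconds):
    """Return True if usage stays above heap_percent for offending_seconds.

    samples -- (timestamp_seconds, used_percent_of_heap) pairs in time order
    """
    breach_start = None
    for timestamp, used_percent in samples:
        if used_percent > heap_percent:
            # Remember when the current run above the threshold began.
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= offending_seconds:
                return True
        else:
            # Usage dropped back under the threshold, so the clock resets.
            breach_start = None
    return False
```

With a threshold of 80 percent and an offending time period of 120 seconds, the samples `[(0, 50), (60, 85), (120, 90), (180, 88)]` breach the condition, because usage stays above 80 percent from second 60 through second 180.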
Table 5. Memory condition: memory leak condition properties
Setting | Description
Detection level | You can choose from the following detection levels. For each level, a trade-off exists between the speed and accuracy of detecting suspected memory leaks.
- Faster detection, higher probability of false alarms: A
faster detection level detects a potential memory leak quickly; however,
this detection level has a greater chance of falsely identifying a
memory leak than a slower detection level because the analysis is
done before the Java heap expands to its maximum configured size.
- Standard detection, standard probability of false alarms: A
standard detection level is more accurate than a faster one, but not
as quick to identify a potential memory leak. The standard and faster
settings require the same amount of historical data, but the standard
setting analyzes after the Java heap has expanded to its maximum configured
size.
- Slower detection, lower probability of false alarms: A
slower detection level is the most accurate; however, this detection
level does not detect a potential memory leak as quickly as the faster
detection level does. This slower setting requires the most historical
data.
Table 6. Storm drain condition properties
Setting | Description
Detection level |
- Standard detection, normal probability of false alarms:
A standard detection policy is less accurate than a slower one, but
quicker to identify a potential storm drain.
This level uses fewer
samples (N=10) for both response times and dynamic workload manager
weights and detects a change point in each of the metrics based on
the sample set. As a result, this policy reaches a conclusion faster
because it waits for 20 samples, 10 for the left mean and 10 for the
right mean, to calculate a difference of means and look for
a local maximum. The samples are collected at intervals of 15 seconds.
Therefore, the storm drain can be detected within 5 minutes of its
occurrence. However, because the samples are fewer, if the samples
have multiple transient peaks or dips, then there is a higher probability
of false alarms.
- Slower detection, lower probability of false alarms: A
slower detection policy is the most accurate; however, it does not
detect a potential storm drain as quickly as the standard detection
policy does.
This level uses more samples (N=15) for both response
times and dynamic workload manager weights. As a result, this policy
reaches a conclusion more slowly because the policy has to wait for 30
samples (15 for the left mean and 15 for the right mean) to calculate
a difference of means. The detection time is seven minutes and 30
seconds. However, because there are more samples, the presence of
samples with transient peaks or dips does not overly affect the mean
values. Therefore, the probability of false alarms is lower.
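The detection times quoted for the two levels follow directly from the sample counts and the 15-second collection interval. The short calculation below reproduces them; the function name `detection_delay_seconds` is hypothetical.

```python
SAMPLE_INTERVAL_SECONDS = 15  # samples are collected every 15 seconds


def detection_delay_seconds(n):
    """Time needed to collect n samples for the left mean and n for the right."""
    return 2 * n * SAMPLE_INTERVAL_SECONDS


standard_delay = detection_delay_seconds(10)  # 20 samples
slower_delay = detection_delay_seconds(15)    # 30 samples
```

Here `standard_delay` is 300 seconds (5 minutes) and `slower_delay` is 450 seconds (7 minutes and 30 seconds).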
Table 7. Workload condition properties
Setting | Description
Total requests | The workload condition policy restarts members when a certain user-defined number of requests are serviced. A request value must be a whole number greater than 1000.
Table 8. Custom condition properties
Setting | Description
Run reaction plan when | Specifies a subexpression that represents the metrics that you are evaluating in your custom condition.
- Health management monitor reaction
Specifies how WebSphere Extended Deployment behaves when
a defined health condition needs improving.
- Reaction mode
Specifies the reaction mode that defines the behavior of
the health policy. The reaction mode can be Supervised or Automatic.
- When the reaction mode is set to Supervised,
health policies are active and recommendations on actions are sent
to the administrator with a runtime task. The administrator can follow
the recommendations. If the administrator approves a recommendation,
actions are taken to improve the health condition automatically.
- When the reaction mode is set to Automatic,
health policies are actively logging data, and WebSphere Extended
Deployment automatically takes actions to improve the health conditions,
without approval from the administrator.
- Take the following actions when the health condition breaches
You can define a specific set of actions to occur when
the health condition breaches. These actions can be the existing default
actions, or you can define custom actions to run an executable file.
A list of actions displays in the order that they are run when
the health condition breaches. To add an action, click Add
Action.... You can choose an existing default health policy
action, a custom action that you have created, or you can create a
new custom action.
To remove a step, select the step and click Remove Action.
To change the order of your steps, select one step to move and click Move
Up or Move Down.
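The relationship between the reaction mode and the ordered action list can be sketched as follows. This is a conceptual illustration only: the function name `run_reaction_plan`, the `approved` flag, and the representation of actions as callables are all hypothetical and do not reflect how the product is implemented.

```python
def run_reaction_plan(reaction_mode, actions, approved=False):
    """Run the configured actions in list order and return their results.

    reaction_mode -- "Supervised" or "Automatic"
    actions       -- callables, in the order shown in the action list
    approved      -- whether the administrator approved the runtime task
    """
    if reaction_mode not in ("Supervised", "Automatic"):
        raise ValueError("Reaction mode must be Supervised or Automatic.")
    # In supervised mode nothing runs until the administrator approves.
    if reaction_mode == "Supervised" and not approved:
        return []
    # Actions run one after another, in the order they appear in the list.
    return [action() for action in actions]
```

For example, with the actions `[lambda: "heap dump", lambda: "restart"]`, automatic mode returns both results in that order, while supervised mode returns an empty list until `approved=True` is passed.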
- Memberships
Specifies the members for the health policy, which activates
the health policy that is defined for the members. Membership is not
a one-to-one relationship; members can be associated with multiple
policies.
Edit the Membership field by selecting the
appropriate member type from the list. The resulting potential members
display in the Available for Membership field.
Select the appropriate members from the Available for Membership list.
To select multiple members, hold down the Ctrl key while you click each
member. When all of your selections are highlighted, click Add to
add your selection to the membership for the health policy.