This topic shows how to troubleshoot performance problems and illustrates why solving them is frequently an iterative process.
About this task
Solving a performance problem is frequently an iterative process of:
- Measuring system performance and collecting performance data
- Locating a bottleneck
- Eliminating a bottleneck
This process is often iterative because when one bottleneck is removed, performance is then constrained by some other part of the system. For example, replacing slow hard disks with faster ones might shift the bottleneck to the CPU of the system.
Measuring system performance and collecting performance data
- Begin by choosing a benchmark, a standard set of operations to
run. This benchmark exercises those application functions experiencing performance
problems. Complex systems frequently need a warm-up period to cache objects,
optimize code paths, and so on. System performance during the warm-up period
is usually much slower than after the warm-up period. The benchmark must
be able to generate work that warms up the system prior to recording the measurements
that are used for performance analysis. Depending on the system complexity,
a warm-up period can range from a few thousand transactions to longer than
30 minutes.
- If the performance problem under investigation only occurs when a large
number of clients use the system, then the benchmark must also simulate multiple
users. Another key requirement is that the benchmark must be able to produce
repeatable results. If the results vary by more than a few percent from one run to another, consider the possibility that the initial state of the system is not the same for each run, that the measurements are being taken during the warm-up period, or that the system is running additional workloads.
- Several tools facilitate benchmark development. They range from simple tools that invoke a URL to script-based products that can interact
with dynamic data generated by the application. IBM Rational has tools that
can generate complex interactions with the system under test and simulate
thousands of users. Producing a useful benchmark requires effort and needs
to be part of the development process. Do not wait until an application goes
into production to determine how to measure performance.
- The benchmark records throughput and response time results in a form that allows graphing and other analysis techniques; for one way to structure such a harness, see the sketch after this list. The performance data that is provided by WebSphere Application Server Performance Monitoring Infrastructure (PMI) helps to monitor and tune the application server performance. Request metrics is another source of performance data that is provided by WebSphere Application Server. Request metrics allows a request to be timed at WebSphere Application Server component boundaries, enabling a determination of the time that is spent in each major component.
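As an illustration only, the following sketch shows one way such a benchmark harness might be structured in Java. It drives a hypothetical runTransaction() placeholder, standing in for the application function under test, through a warm-up phase and then records throughput and a response-time percentile. The class and method names are assumptions made for this sketch; they are not part of WebSphere Application Server or of any benchmarking product.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Minimal benchmark sketch: warm up first, then measure throughput
 * and response times. runTransaction() is a hypothetical placeholder
 * for the application operation being benchmarked.
 */
public class BenchmarkSketch {

    // Placeholder for the operation under test, for example an HTTP request.
    static void runTransaction() throws Exception {
        // ... invoke the application function being measured ...
    }

    public static void main(String[] args) throws Exception {
        int warmUpIterations = 5_000;    // increase until results stabilize
        int measuredIterations = 20_000;

        // Warm-up phase: populate caches, trigger JIT compilation, and so on.
        for (int i = 0; i < warmUpIterations; i++) {
            runTransaction();
        }

        // Measurement phase: record each response time in milliseconds.
        List<Long> responseTimesMs = new ArrayList<>(measuredIterations);
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            long t0 = System.nanoTime();
            runTransaction();
            responseTimesMs.add((System.nanoTime() - t0) / 1_000_000);
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1_000_000_000.0;

        // Report throughput and a simple response-time percentile.
        Collections.sort(responseTimesMs);
        System.out.printf("Throughput: %.1f requests per second%n",
                measuredIterations / elapsedSeconds);
        System.out.printf("95th percentile response time: %d ms%n",
                responseTimesMs.get((int) (responseTimesMs.size() * 0.95)));
    }
}

In practice the measured operation issues real requests against the system under test, possibly from many concurrent client threads, and the iteration counts are tuned until repeated runs produce results that agree within a few percent.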
Locating a bottleneck
Consult the following
scenarios and suggested solutions:
- Scenario: Poor performance occurs with only a single user.
Suggested solution: Use request metrics to determine how much each component contributes to the overall response time. Focus on the component that accounts for the most time.
Use Tivoli Performance
Viewer to check for resource consumption, including frequency of garbage
collections. You might need code profiling tools to isolate the problem to
a specific method.
- Scenario: Poor performance only occurs with multiple users.
Suggested solution: Check whether any system has high CPU, network, or disk utilization and address those constraints. For clustered configurations, check for uneven loading across cluster members.
- Scenario: None of the systems seems to have a CPU, memory, network,
or disk constraint but performance problems occur with multiple users.
Suggested
solutions:
- Check that work is reaching the system under test. Ensure that some external
device does not limit the amount of work reaching the system. Tivoli Performance
Viewer helps determine the number of requests in the system.
- A thread dump might reveal a bottleneck at a synchronized method or a large number of threads waiting for a resource; see the first sketch after this list for one way to spot such contention.
- Make sure that enough threads are available to process the work in IBM HTTP Server, the database, and the application servers. Conversely, too many threads can increase resource contention and reduce throughput.
- Monitor garbage collections with Tivoli Performance Viewer or the verbosegc option of your Java virtual machine; the second sketch after this list shows one way to read garbage collection statistics programmatically. Excessive garbage collection can limit throughput.
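As an illustration of reading a thread dump, the following sketch uses the standard java.lang.management API to group blocked threads by the monitor they are waiting on; many threads blocked on the same lock point to a synchronization bottleneck. This is a generic sketch that inspects the JVM it runs in; in practice you would examine a thread dump taken from the application server process itself.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: group blocked threads by the monitor they are waiting on.
 * Many threads blocked on the same lock suggest a synchronization
 * bottleneck, much as a full thread dump would show.
 */
public class LockContentionCheck {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Integer> blockedByLock = new HashMap<>();

        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockName() != null) {
                blockedByLock.merge(info.getLockName(), 1, Integer::sum);
            }
        }

        blockedByLock.forEach((lock, count) ->
                System.out.printf("%d thread(s) blocked on %s%n", count, lock));
    }
}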
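The verbosegc option referred to above corresponds to the -verbose:gc argument on most Java virtual machines. As a further illustration, the following sketch reads cumulative garbage collection statistics through the standard GarbageCollectorMXBean interface for the JVM it runs in; collecting the same data from an application server would normally go through PMI or a remote JMX connection.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Sketch: print cumulative garbage collection counts and times for the
 * current JVM. Comparing snapshots taken before and after a benchmark
 * run shows how much time is being spent in garbage collection.
 */
public class GcSnapshot {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}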
Eliminating a bottleneck
Consider the
following methods to eliminate a bottleneck:
- Reduce the demand
- Increase resources
Reducing the demand for resources can be accomplished in several
ways. Caching can greatly reduce the use of system resources by returning
a previously cached response, thereby avoiding the work needed to construct
the original response. Caching is supported at several points in the following
systems:
- IBM HTTP Server
- Command
- Enterprise bean
- Operating system
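To illustrate why caching reduces demand, the following generic sketch shows a minimal response cache; it is not the WebSphere dynamic cache or command cache API, and buildResponse is a hypothetical placeholder for whatever expensive work produces the response.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Minimal response cache sketch: return a previously built response when
 * possible, so the work of constructing it again is avoided.
 */
public class ResponseCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> buildResponse;

    public ResponseCache(Function<String, String> buildResponse) {
        this.buildResponse = buildResponse;
    }

    public String get(String key) {
        // computeIfAbsent runs the expensive builder only on a cache miss.
        return cache.computeIfAbsent(key, buildResponse);
    }
}

A real cache also needs an invalidation or expiry policy so that stale responses are not served.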
Application code profiling can lead to a reduction in the CPU
demand by pointing out hot spots you can optimize. IBM Rational and other
companies have tools to perform code profiling. An analysis of the application
might reveal areas where some work might be reduced for some types of transactions.
Some resources can be increased by changing tuning parameters, for example, the number of file handles, while others require a hardware change, for example, more or faster CPUs or additional application servers. Key tuning parameters
are described for each major WebSphere Application Server component to facilitate
solving performance problems. Also, the performance
advisors can provide advice on tuning a production system under a real
or simulated load.
Some critical sections of the application and server code require synchronization to prevent multiple threads from running the code simultaneously and producing incorrect results. Synchronization preserves correctness, but it can also reduce throughput when several threads must wait for one thread to exit the critical section. When several threads are waiting to enter a critical section, a thread dump shows these threads waiting in the same procedure. Synchronization can often be reduced by changing the code to synchronize only when necessary, by reducing the path length of the synchronized code, or by reducing the frequency with which the synchronized code is invoked.
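As a generic illustration of these techniques (not code taken from WebSphere Application Server), the following sketch contrasts a coarse synchronized method, where every thread contends for the same lock, with a version that narrows the critical section by relying on a concurrent data structure.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Sketch: two ways to count events per key. CoarseCounter serializes all
 * threads on a single monitor; ConcurrentCounter narrows contention by
 * letting the concurrent map and per-key adders handle thread safety.
 */
public class CounterExamples {

    // Every call contends for the same lock, even for unrelated keys.
    static class CoarseCounter {
        private final Map<String, Long> counts = new HashMap<>();

        public synchronized void increment(String key) {
            counts.merge(key, 1L, Long::sum);
        }
    }

    // The critical section is gone from application code: the map handles
    // concurrency internally and LongAdder spreads updates across cells.
    static class ConcurrentCounter {
        private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

        public void increment(String key) {
            counts.computeIfAbsent(key, k -> new LongAdder()).increment();
        }
    }
}

In a thread dump of the first version under load, many threads appear blocked in increment waiting for the same monitor, which is exactly the pattern described above.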