Each experiment has different goals. These goals dictate the size of the workload, its ramp-up period, and
what measurements are taken. For example, if the experiment is to discover the full system capacity with the current
system configuration and tuning, you may try ramping up one virtual user per second in a balanced manner to, say, two
thirds or three quarters of the system's expected capacity. Once you attain this load level, study the run statistics
for a few minutes at that level to make sure all the transactions are still operating correctly. When this has been
verified (usually by verification point statistics and steady hit rates), manually add another 25% to the workload and
repeat the verification. Add another 25% and perhaps yet another, until you have clearly exceeded some threshold of
acceptability. This may be when response times become 2-5 times longer than acceptable, transaction rates start
dropping or stay flat over a wide load range, or the number of failing verification points becomes unsatisfactory.
Using this kind of experiment, the verification point statistics, response times, and transaction rates can be
examined to determine the maximum
capacity of the system in terms of the number of virtual users.
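As a concrete illustration, the following Java sketch prints a stepped schedule of this kind; the expected capacity,
starting fraction, step size, and step count are hypothetical values you would replace with your own estimates.

    // A minimal sketch of the stepped capacity probe described above. The
    // expected capacity, starting fraction, step size, and step count are
    // hypothetical values, not Performance Tester defaults.
    public class CapacityProbe {
        public static void main(String[] args) {
            int expectedCapacity = 1000;                      // assumed full-capacity estimate (virtual users)
            int startLevel = (int) (expectedCapacity * 0.70); // start at roughly two thirds to three quarters

            // Ramp at one virtual user per second up to the starting plateau.
            System.out.printf("Ramp 0 -> %d users over %d seconds, then verify transactions%n",
                    startLevel, startLevel);

            // Add 25% per step, re-verifying at each level, until a threshold of
            // acceptability (response times, rates, verification points) is exceeded.
            int level = startLevel;
            for (int step = 1; step <= 4; step++) {
                level = (int) (level * 1.25);
                System.out.printf("Step %d: hold at %d users and re-check statistics%n", step, level);
            }
        }
    }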
Another type of experiment is used to delve into why a certain transaction or class of transactions is taking the
system too long to process. Once you know the approximate system capacity, you set up a test run for only a few minutes
of steady state transactions. This test should be run with about a 50% load, but with the application server
instrumentation turned on at the highest level. In addition, the key operating system resources on all of the servers
(Web, application, database, and any other) should be monitored. Once you have captured the data for the test,
terminate it after no more than 10-15 minutes of steady state data; this should ensure that the data is not too
voluminous to analyze. First, you filter the data down to only the steady state response interval. Usually, by looking
at the user load and hit rate charts, you can discover when a steady state was reached. You can double-check this
against the page response time versus time graphs to make sure the response times of interest have become stable. By
decomposing the transaction times into the pages, and then the page elements, that take the most time to process, you
can see which operations are the most critical to tune. For a particular page element, you can then break down the
request you want to analyze by method call from an application tuning perspective. Depending on the frequency of
calls as well as the time spent in each call, you may want to forward this data to the development organization for
algorithm tuning.
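As an illustration of this triage step, the following sketch filters a set of timing samples to an assumed steady
state window and ranks page elements by total time; the Sample record, the sample values, and the window bounds are
hypothetical stand-ins for data you would export from the test results.

    import java.util.*;
    import java.util.stream.*;

    // A sketch of the triage step described above: filter raw timing samples to
    // the steady state window, then rank page elements by total time. All names
    // and values here are hypothetical.
    public class ElementBreakdown {
        record Sample(long timestampMs, String pageElement, double elapsedMs) {}

        public static void main(String[] args) {
            List<Sample> samples = List.of(
                    new Sample(30_000, "login.do", 180.0),
                    new Sample(95_000, "search.do", 420.0),
                    new Sample(96_000, "search.do", 460.0),
                    new Sample(97_000, "cart.do", 120.0));

            long steadyStart = 60_000, steadyEnd = 600_000;   // chosen from the user load / hit rate charts

            Map<String, Double> totalByElement = samples.stream()
                    .filter(s -> s.timestampMs() >= steadyStart && s.timestampMs() <= steadyEnd)
                    .collect(Collectors.groupingBy(Sample::pageElement,
                            Collectors.summingDouble(Sample::elapsedMs)));

            // Print the most expensive elements first; these are the tuning candidates.
            totalByElement.entrySet().stream()
                    .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                    .forEach(e -> System.out.printf("%-12s %8.1f ms total%n", e.getKey(), e.getValue()));
        }
    }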
A third type of experiment is used to verify the stability of a system intended for continuous operation without
downtime. Many enterprise systems have to be available 24 hours a day, 7 days a week, with little, if any, regular
system downtime. To accept a new software version for one of these systems, a 3-7 day test of continuous operation at
full system load is typical to ensure that the system is ready to be put into production. For this test, you set up the
statistical results (at the page level only) and operating system monitoring data to be sampled once every 5 to 15
minutes. Test log data is completely turned off for this test. Because normal logging is turned off, running this test
may require that you write special log messages into a data file every few minutes to verify that transactions are
executing properly. As long as the number of counters being sampled in the statistics model is only a few
hundred, this mode should permit a test run of several days. If your test runs with very few errors, you can also run
with test logging set to errors-only mode and determine the number of sampled users from which you need error data to
get an accurate picture of a failure mode when it occurs.
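A heartbeat of this kind might look like the following sketch, which appends a timestamped line to a data file every
few minutes; the file name, interval, and message format are hypothetical, and in practice the confirmation that
transactions are executing would come from custom code in the running test.

    import java.io.*;
    import java.time.*;
    import java.util.concurrent.*;

    // A sketch of the heartbeat logging described above: with normal test logging
    // off, append a timestamped line every few minutes so you can later confirm
    // that transactions kept executing. File name, interval, and message are
    // hypothetical.
    public class Heartbeat {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try (PrintWriter out = new PrintWriter(new FileWriter("heartbeat.log", true))) {
                    out.printf("%s transactions-ok%n", Instant.now());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }, 0, 5, TimeUnit.MINUTES);
        }
    }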
The final type of experiment to discuss is taking the final capacity measurements. In this test run, you collect
resource data from each of your test agent systems to verify that you are in a safe operating region. In Performance
Tester terms, this means that the CPU utilization on the test agents averages no more than 70%, and there are no
periods where utilization peaks above 90%. Because the memory allocation is constrained by the
Java heap specified on the test agent, memory statistics should not be an issue. If, however, you have multiple
playback engines sharing physical memory that cannot easily hold the working sets of all the Java processes, then you
may have to monitor paging and swapping on that test agent. In general, this is not a recommended test agent
configuration, because one playback engine can usually consume all available CPU resources on a system. Network and
file I/O data rates should be monitored to make sure that the test agent is not constrained by limited I/O bandwidth,
which would leave it unable to accept server responses at full server output speed.
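The check itself is simple arithmetic over the exported utilization samples, as in this sketch; the sample values are
made up, while the 70% average and 90% peak thresholds are the ones given above.

    import java.util.*;

    // A sketch of the safe-operating-region check described above, applied to CPU
    // utilization samples exported from a test agent. Sample values are made up;
    // the 70% average and 90% peak thresholds come from the text.
    public class AgentHealthCheck {
        public static void main(String[] args) {
            double[] cpuPercent = {55.2, 61.8, 67.0, 72.4, 88.9, 64.1};

            double average = Arrays.stream(cpuPercent).average().orElse(0.0);
            double peak = Arrays.stream(cpuPercent).max().orElse(0.0);

            boolean safe = average <= 70.0 && peak <= 90.0;
            System.out.printf("avg=%.1f%% peak=%.1f%% -> %s%n", average, peak,
                    safe ? "safe operating region" : "agent overloaded; results suspect");
        }
    }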
If there is a concern about the negative impact of Java garbage collection (GC) on the test agent, you can turn on the
verbose GC logging (using the -verbosegc -verbosegclog:C:\temp\gclog.log options) and view the length of time spent
doing
garbage collection. There are tools available through the IBM Support Assistant to analyze these logs. In general, this
should not be a problem, unless you are running very close to the edge in heap size and doing frequent, unproductive
garbage collection cycles.
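If you want a quick first look before turning to those tools, a scan like the following can flag long pauses; note
that the pause=<n>ms pattern here is purely hypothetical (actual verbose GC log formats vary by JVM), so treat this as
a template to adapt, with the IBM Support Assistant tools remaining the proper analyzers.

    import java.io.*;
    import java.nio.file.*;
    import java.util.regex.*;

    // A sketch of a quick first pass over a GC log for long pauses. The
    // "pause=<n>ms" pattern is hypothetical; adapt it to the log format your
    // JVM actually writes.
    public class GcLogScan {
        public static void main(String[] args) throws IOException {
            Pattern pause = Pattern.compile("pause=(\\d+(?:\\.\\d+)?)ms");
            double worstMs = 0.0;
            for (String line : Files.readAllLines(Paths.get("C:/temp/gclog.log"))) {
                Matcher m = pause.matcher(line);
                while (m.find()) {
                    worstMs = Math.max(worstMs, Double.parseDouble(m.group(1)));
                }
            }
            System.out.printf("Longest observed GC pause: %.1f ms%n", worstMs);
        }
    }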
Once you have validated that the test agents are operating in a healthy operating region, you should verify that the
transactions are operating properly, that the steady state operating region has been identified, and that the report
data has been filtered down to only that region. Based on this region, you can export the performance report and any
other reports that contain interesting data so that they can be included in a formal final report of your findings.
Candidates include the
transaction report (if those have been defined in the tests), the verification point report, and the HTTP percentile
report (assuming that the test log at the primary action level has been kept). The performance report by itself
contains all the data necessary to show the expected page performance of the system. Statistically speaking, response
times are best reported as an 85th, 90th, or 95th percentile, while transaction throughput is best reported as average
completion times or as system rates per second or per hour, depending on execution time.
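For example, the nearest-rank percentiles recommended above can be computed as in the following sketch; the response
time samples are hypothetical.

    import java.util.*;

    // A sketch of the percentile reporting recommended above, using the
    // nearest-rank method over a set of page response times. Sample values are
    // hypothetical.
    public class PercentileReport {
        static double percentile(double[] sortedMs, double p) {
            int rank = (int) Math.ceil(p / 100.0 * sortedMs.length); // nearest-rank index
            return sortedMs[Math.max(0, rank - 1)];
        }

        public static void main(String[] args) {
            double[] responseMs = {120, 135, 140, 150, 160, 175, 190, 220, 310, 480};
            Arrays.sort(responseMs);
            for (double p : new double[] {85, 90, 95}) {
                System.out.printf("%.0fth percentile: %.0f ms%n", p, percentile(responseMs, p));
            }
        }
    }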