Guideline: Performance Test Experiments
This guideline describes different types of experiments and provides some examples for each of them.
Main Description

Each experiment has different goals, and those goals dictate the size of the workload, its ramp-up period, and which measurements are taken. For example, if the goal is to discover the full capacity of the system with its current configuration and tuning, you might ramp up at one virtual user per second, in a balanced manner, to roughly two thirds or three quarters of the system's expected capacity. Once you reach this load level, study the run statistics for a few minutes to make sure that all of the transactions are still operating correctly. When this has been verified (usually by the verification point statistics and steady hit rates), manually add another 25% to the workload and repeat the verification. Add another 25%, and perhaps yet another, until you have clearly exceeded some threshold of acceptability. That threshold might be response times becoming 2-5 times longer than acceptable, transaction rates dropping or remaining flat over a wide load range, or an unsatisfactory number of failing verification points. From this kind of experiment, the verification point results, response times, and transaction rates can be examined to determine the maximum capacity of the system in terms of the number of virtual users.
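
As a rough sketch of the arithmetic behind such a staged ramp-up, the following example prints the user counts for an initial ramp to three quarters of expected capacity followed by successive 25% increments. The capacity, ramp rate, and class name are assumptions for illustration, not values from any particular project or a Performance Tester schedule.

    // StagedRampSketch.java -- illustrative only; the capacity figures are assumptions.
    public class StagedRampSketch {
        public static void main(String[] args) {
            int expectedCapacity = 1000;          // assumed expected capacity in virtual users
            int rampRatePerSecond = 1;            // add one virtual user per second
            double initialFraction = 0.75;        // ramp first to three quarters of capacity

            int load = (int) (expectedCapacity * initialFraction);
            System.out.printf("Stage 1: ramp to %d users over ~%d seconds, then hold and verify%n",
                    load, load / rampRatePerSecond);

            // Add 25% of the current load at each subsequent stage until capacity is clearly exceeded.
            int stage = 2;
            while (load < expectedCapacity * 1.5) {
                int increment = load / 4;
                load += increment;
                System.out.printf("Stage %d: add %d users (total %d), hold and re-check verification points%n",
                        stage++, increment, load);
            }
        }
    }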

Another type of experiment is used to investigate why a certain transaction, or class of transactions, takes the system too long to process. Once you know the approximate system capacity, set up a test run of only a few minutes of steady state transactions. Run this test at about 50% load, but with the application server instrumentation turned on at its highest level. In addition, monitor the key operating system resources on all of the servers (Web, application, database, and any others). Once you have captured the data, terminate the test after no more than 10-15 minutes of steady state data; this keeps the data from becoming too voluminous to analyze. First, filter the data down to only the steady state interval. The user load and hit rate charts usually show when a steady state was reached, and you can double-check this against the page response time versus time graphs to confirm that the response times of interest have stabilized. By decomposing the transaction times into pages, and then into the page elements that take the most time to process, you can see which operations are the most critical to tune. Taking a particular page element, you can then break down the request you want to analyze by method call from an application tuning perspective. Depending on the frequency of the calls and the time spent in each one, you may want to forward this data to the development organization for algorithm tuning.
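
To illustrate the kind of decomposition described above, the following sketch ranks a handful of page elements by their average steady state response time so that the slowest operations stand out. The element names and timings are invented; the real analysis would be done against the Performance Tester report data.

    // ElementRankingSketch.java -- hypothetical element names and timings, for illustration only.
    import java.util.Comparator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ElementRankingSketch {
        public static void main(String[] args) {
            // Assumed steady-state averages (milliseconds) for a handful of page elements.
            Map<String, Double> avgElementTimeMs = new LinkedHashMap<>();
            avgElementTimeMs.put("/app/login", 180.0);
            avgElementTimeMs.put("/app/searchResults", 2350.0);
            avgElementTimeMs.put("/app/itemDetail", 640.0);
            avgElementTimeMs.put("/images/logo.gif", 15.0);

            // Sort descending by average time to identify the most critical operations to tune.
            avgElementTimeMs.entrySet().stream()
                    .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                    .forEach(e -> System.out.printf("%-25s %8.1f ms%n", e.getKey(), e.getValue()));
        }
    }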

A third type of experiment is used to verify the stability of a system intended for continuous operation without down time. Many enterprise systems must be available 24 hours a day, 7 days a week, with little if any scheduled down time. To accept a new software version for one of these systems, a 3-7 day test of continuous operation at full system load is typical to ensure that the system is ready to be put into production. For this test, set up the statistical results (at the page level only) and the operating system monitoring data to be sampled once every 5 to 15 minutes, and turn test log data completely off. Because normal logging is turned off, running this test may require writing special logged messages to a data file every few minutes to verify that transactions are executing properly. As long as the number of counters being sampled in the statistics model is only a few hundred, this mode should permit a test run of several days. If the test runs with very few errors, you can also set the test logging to errors-only mode and choose the sampled number of users from which you need error data to get an accurate picture of a failure mode when it occurs.
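
The periodic progress messages mentioned above can be as simple as appending a timestamp and a running transaction count to a data file. The sketch below shows the general idea in plain Java; the file name, interval, and counter are assumptions, and this is not the Performance Tester custom code API.

    // HeartbeatLogSketch.java -- a generic illustration of periodic progress logging;
    // file path, interval, and counter are assumptions.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.time.Instant;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class HeartbeatLogSketch {
        private static final AtomicLong completedTransactions = new AtomicLong();

        public static void main(String[] args) {
            Path logFile = Path.of("heartbeat.log");   // assumed location
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

            // Every five minutes, append a timestamp and the running transaction count
            // so that a multi-day run can be checked without the full test log.
            scheduler.scheduleAtFixedRate(() -> {
                String line = Instant.now() + " transactions=" + completedTransactions.get()
                        + System.lineSeparator();
                try {
                    Files.writeString(logFile, line,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }, 0, 5, TimeUnit.MINUTES);
        }

        // Called from the transaction path (simulated here) whenever a transaction completes.
        public static void recordTransaction() {
            completedTransactions.incrementAndGet();
        }
    }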

The final type of experiment to discuss is taking the final capacity measurements. In this test run, resource data is collected from each of your test agent systems to verify that they are operating in a safe region. In Performance Tester terms, this means that CPU utilization on the test agents averages no more than 70%, with no peak periods where utilization exceeds 90%. Because memory allocation is constrained by the Java heap specified on the test agent, memory statistics should not be an issue. If, however, multiple playback engines share a physical memory that cannot easily hold the working sets of all of the Java processes, you may have to monitor paging and swapping on that test agent. In general, this is not a recommended test agent configuration, because a single playback engine can usually consume all available CPU resources on a system. Network and file I/O data rates should also be monitored to make sure that the test agent is not constrained by limited I/O bandwidth and therefore unable to accept the server responses at full server output speed.
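
As a simple illustration of the test agent health check described above, the following sketch applies the 70% average and 90% peak CPU thresholds from the text to a set of made-up utilization samples.

    // AgentCpuCheckSketch.java -- thresholds taken from the text; sample data is invented.
    public class AgentCpuCheckSketch {
        public static void main(String[] args) {
            double[] cpuSamples = { 55.0, 62.5, 71.0, 68.4, 74.9, 66.2 };  // assumed % CPU per interval

            double sum = 0;
            double peak = 0;
            for (double s : cpuSamples) {
                sum += s;
                peak = Math.max(peak, s);
            }
            double average = sum / cpuSamples.length;

            boolean healthy = average <= 70.0 && peak <= 90.0;
            System.out.printf("average=%.1f%% peak=%.1f%% -> %s%n",
                    average, peak, healthy ? "test agent in safe operating region"
                                           : "test agent overloaded; results may be invalid");
        }
    }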

If there is a concern about the negative impact of Java garbage collection (GC) on the test agent, you can turn on verbose GC logging (using the -verbosegc -verbosegclog:C:\temp\gclog.log options) and examine the length of time spent in garbage collection. Tools for analyzing these logs are available through the IBM Support Assistant. In general, this should not be a problem unless you are running very close to the edge of the heap size and performing frequent, unproductive garbage collection cycles.
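
Once the GC pause times have been extracted from the log (for example, by one of the analysis tools), the overhead calculation itself is simple arithmetic. The sketch below uses invented pause durations and an assumed 10-minute interval; it is not a parser for the verbose GC log format.

    // GcOverheadSketch.java -- assumes GC pause durations have already been pulled from the
    // verbose GC log by another tool; values here are made up for illustration.
    public class GcOverheadSketch {
        public static void main(String[] args) {
            double[] gcPauseMs = { 35.0, 48.2, 41.7, 220.5, 39.9 };  // assumed pause times
            double elapsedMs = 10 * 60 * 1000;                        // a 10-minute steady-state window

            double totalGcMs = 0;
            for (double p : gcPauseMs) {
                totalGcMs += p;
            }

            double overheadPercent = 100.0 * totalGcMs / elapsedMs;
            System.out.printf("time in GC: %.1f ms (%.2f%% of the interval)%n", totalGcMs, overheadPercent);
            // A persistently high percentage suggests the heap is too small and collections are unproductive.
        }
    }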

Once you have validated that the test agents are operating in a healthy region, verify that the transactions are operating properly, that the steady state operating region has been identified, and that the report data has been filtered down to only that region. Based on this region, you can export the performance report, and any other reports that contain interesting data, for inclusion in a formal final report of your findings. Candidates include the transaction report (if transactions have been defined in the tests), the verification point report, and the HTTP percentile report (assuming that the test log has been kept at the primary action level). The performance report by itself contains essentially all of the data needed to show the expected page performance of the system. Statistically speaking, response times are best reported as an 85th, 90th, or 95th percentile, while transaction throughput is best reported as average completion times or as system rates per second or per hour, depending on execution time.
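
As a small illustration of the percentile reporting suggested above, the following sketch applies a nearest-rank percentile calculation and a simple average throughput rate to a set of invented response time samples; the sample values and measurement window are assumptions.

    // PercentileSketch.java -- response-time samples are invented; shows a nearest-rank
    // percentile calculation plus a simple average rate.
    import java.util.Arrays;

    public class PercentileSketch {
        // Nearest-rank percentile: sort, then take the value at ceil(p/100 * n) - 1.
        static double percentile(double[] sortedMs, double p) {
            int rank = (int) Math.ceil(p / 100.0 * sortedMs.length);
            return sortedMs[Math.max(0, rank - 1)];
        }

        public static void main(String[] args) {
            double[] responseMs = { 410, 380, 525, 690, 450, 1200, 475, 505, 395, 610 };
            Arrays.sort(responseMs);

            System.out.printf("85th percentile: %.0f ms%n", percentile(responseMs, 85));
            System.out.printf("90th percentile: %.0f ms%n", percentile(responseMs, 90));
            System.out.printf("95th percentile: %.0f ms%n", percentile(responseMs, 95));

            // Throughput reported as an average rate: transactions completed over the measured interval.
            int transactions = responseMs.length;
            double intervalSeconds = 60.0;                       // assumed measurement window
            System.out.printf("throughput: %.2f transactions/second%n", transactions / intervalSeconds);
        }
    }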