Introduction
This general guideline discusses some key principles for performance testing, from formulating testing goals and building a workload model through controlling the test environment, logging test runs, and reporting the results.
There are many reasons to run a performance test. These include verifying high availability (long-running stability under load), determining system capacity (how many users or transactions can the system handle?), and validating service level agreements (do the key transactions return in under X seconds?). Find out from your customer exactly what questions need to be answered so you know how to
design your tests and lab experiments. By formulating and documenting testing goals, you will have a good basis to
begin building your workload model.
The key to a successful and orderly (well-run) performance test is developing a detailed workload model. You will need
access to either a statistical breakdown of an existing production workload, or the ability to synthesize the workload
by gathering the equivalent data with analysts from the customer's business operations department.
With the help of the analysts, you will need to answer questions like the following:
- What are the most frequently executed 20% of the transactions, which probably generate 80% of the total system load?
- What are the expected transaction rates for each of these transaction types?
- Which data inputs are the independent variables that spread the data accesses across any back-end databases involved?
- Are there classes of users on the system who do very different sets of transactions that each need to be modeled separately?
- How many of each user class are simultaneously active during the busiest hour of the day?
- Are there any variable data design or uniqueness constraints on the data inputs?
- Are there any background batch jobs or reports that cause a significant portion of the system load (even though their frequency is not in the top 20% of the operations)?
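The answers to these questions can be captured in a simple, structured form so they can be reviewed with the customer and reused across test cycles. The sketch below is one possible way to represent them; all class and field names are illustrative, not part of any particular tool or methodology.

```python
# A minimal sketch of a workload model that captures the answers to the
# questions above. All class and field names are illustrative and not
# part of any particular tool or methodology.
from dataclasses import dataclass, field

@dataclass
class TransactionType:
    name: str                   # e.g. "enter_new_customer"
    rate_per_hour: float        # expected rate during the Busy Hour
    avg_think_time_s: float     # average think time before this interaction
    data_inputs: list[str] = field(default_factory=list)   # independent variables

@dataclass
class UserClass:
    name: str                       # e.g. "call_center_agent"
    active_users_busy_hour: int     # simultaneously active during the Busy Hour
    transactions: list[TransactionType] = field(default_factory=list)

@dataclass
class WorkloadModel:
    busy_hour: str                  # e.g. "10:00-11:00 local time"
    user_classes: list[UserClass] = field(default_factory=list)
    background_jobs: list[str] = field(default_factory=list)   # batch jobs and reports
    data_constraints: list[str] = field(default_factory=list)  # uniqueness rules, data state
```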
Based on the answers to these and other similar questions, you need to construct a user scenario for each of the transactions or other operations contained in the workload model. If types of users have been identified, with relative or absolute numbers of users in each, then these need to be specified in the workload model. If user types are not used, then cumulative transaction rates need to be translated into numbers of simultaneous user scenarios that together produce the total workload transaction rates.
You need to be specific about average think times between user interactions and average transaction rates, and then
identify data inputs for variation. You also need to identify key transactions for measurement and tuning activities,
as well as expected response time values for each.
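If cumulative transaction rates must be converted into a number of simultaneous user scenarios, one common rule of thumb is Little's Law: concurrent users ≈ arrival rate × (response time + think time). The sketch below illustrates the arithmetic; the rates, response times, and think times are hypothetical.

```python
# Estimating concurrent users from a cumulative transaction rate with
# Little's Law: N = X * (R + Z), where X is throughput, R is average
# response time, and Z is average think time. All numbers are illustrative.

def concurrent_users(rate_per_hour: float,
                     avg_response_time_s: float,
                     avg_think_time_s: float) -> float:
    """Estimate the simultaneous user scenarios needed to sustain a rate."""
    rate_per_second = rate_per_hour / 3600.0
    return rate_per_second * (avg_response_time_s + avg_think_time_s)

# Example: 18,000 transactions in the Busy Hour, 2 s average response time,
# 28 s average think time -> 5 tx/s * 30 s = 150 simultaneous users.
print(concurrent_users(18_000, 2.0, 28.0))
```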
The modeled Busy Hour should be explicitly called out and associated with the workload characteristics. Constraints on the size and composition of the test environment's data state at the beginning of each test run also need to be documented in the workload specification.
Most of the time, the customer will want far more of the real workload modeled in the performance test than is practical. Work with the customer to agree on as many simplifications of the workload model as possible while preserving its usefulness as a predictor. As long as the workload model accurately predicts the Busy Hour performance of the system and adequately stresses all of the subsystems with approximately the work done under production conditions, the model is useful for predicting the production system's behavior.
This kind of approximation is the essence of valuable performance testing. For example, if there are three variations on how to enter a new customer record into a customer database, pick the most frequent variation, or the one that is closest (by some estimation) to the average user scenario. Then combine the transaction rates for all three variations and drive that single representative scenario at the combined rate in the workload model.
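As a small illustration of that consolidation, the sketch below picks the most frequent variation as the representative scenario and drives it at the combined rate of all three; the scenario names and rates are invented for the example.

```python
# Consolidating three hypothetical variations of "enter a new customer
# record" into one representative scenario driven at the combined rate.
variation_rates_per_hour = {
    "new_customer_web_form": 600,
    "new_customer_phone_entry": 250,
    "new_customer_walk_in": 150,
}

# Keep the most frequent variation as the single user scenario, but run it
# at the sum of all three rates so the total system load is preserved.
representative = max(variation_rates_per_hour, key=variation_rates_per_hour.get)
combined_rate = sum(variation_rates_per_hour.values())
print(representative, combined_rate)    # new_customer_web_form 1000
```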
Ideally, simplifying the workload yields a much quicker implementation of the workload model, and makes it both more flexible and more adaptable as the customer's workload changes over time. Adjustments can be made to transaction rates, database sizes, data input variations, and total volume of work without having to implement new or different user scenarios. This makes a workload model reusable and useful for predicting the answers to many "what-if" scenarios posed by the customer, both now and in the future.
Because of the abstraction contained in the workload model, you need to sit down with the customer and explain its details, including both what it does model and what it does not. All of this needs to happen before implementation of the model begins. Gaining acceptance of the workload model content ahead of time, in an atmosphere of collaboration, means that later meetings with the customer center on the test results rather than on whether the model was correct.
These later meetings tend to be emotionally charged, because major business decisions often ride on the outcome of the performance test measurements. If the customer is not familiar with the workload model
employed in the test, the focus shifts to understanding the model and figuring out if it is a true predictor of
production system performance. This is counterproductive after the workload model is implemented and being used, as the
focus needs to be on system tuning and potentially re-hosting or re-implementing some of the system before placing it
into production. These are costly and typically time-critical decisions that should be the focus of the customer
meetings once measurement data is available.
Often a small number of transaction types or critical business scenarios contain the measurement points that signal whether or not the system is behaving within acceptable parameters. Different operations usually produce response times that differ enough that they should not be statistically grouped together; taken as a single sample set, they do not predict the behavior of any individual operation.
You should specify in the workload model which of these measurements are the important ones to monitor, and also define an acceptable response time threshold for each of these measurement points. Much of the system tuning effort should be focused on achieving these values through system parameter adjustments and, in some cases, coding adjustments within the application. Since these key measurements are the ones most heavily scrutinized, they should be chosen carefully in conjunction with the customer and their business analysts as part of the workload model development.
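One way to make these key measurements concrete is sketched below: group the response-time samples by transaction type and evaluate each group against its own acceptance value. The transaction names, thresholds, and the 90th-percentile criterion are assumptions for illustration only.

```python
# Grouping response-time samples per transaction type and checking each
# group against its own acceptance threshold, instead of pooling all
# samples into one statistic. The transaction names, thresholds, and the
# 90th-percentile criterion are assumptions for illustration only.
from collections import defaultdict

thresholds_s = {"enter_new_customer": 2.0, "account_lookup": 1.0}

def check_key_measurements(samples: list[tuple[str, float]]) -> dict[str, bool]:
    """samples: (transaction_name, response_time_s) pairs from one test run."""
    by_txn: dict[str, list[float]] = defaultdict(list)
    for name, response_time in samples:
        by_txn[name].append(response_time)

    results: dict[str, bool] = {}
    for name, limit in thresholds_s.items():
        times = sorted(by_txn.get(name, []))
        if not times:
            continue                                    # no samples for this key measurement
        p90 = times[int(0.9 * (len(times) - 1))]        # simple 90th-percentile estimate
        results[name] = p90 <= limit
    return results
```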
Usually some peripheral measurements come up with unacceptable values and result in changes to the system. However, the
key measurements point to whether or not the system is ready for production deployment, and are central to that
decision.
The performance testing laboratory environment has a large number of components, the states of which must be tightly
controlled so that changes made between test runs are well understood. This promotes predictability and reproducibility
of performance test results.
Normally you should keep two change logs, one for the test driver complex and one for the system under test complex.
The change log for the performance test driver complex is the record of changes to the hardware, software, and networking fabric that are specifically not part of the system to be deployed. The change log for the system under test is the record of
all changes to the hardware, software, and networking that need to be reflected in the production system once it is
deployed. System tuning changes are very important to keep in the log and associate with specific test results. This
ensures that once an experiment has been tried and results gathered, the experiment will not have to be repeated just
to reconstruct the results because there was some uncertainty about the system state when certain results were
recorded.
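A change log does not need to be elaborate; a simple append-only record with a timestamp, the complex affected, the component, the change, and the related test run is usually enough. The sketch below shows one hypothetical way to record such entries; the file layout and field names are assumptions, not a prescribed format.

```python
# A hypothetical append-only change-log record; one file is kept for the
# driver complex and one for the system under test. The fields are
# illustrative, not a prescribed format.
import csv
from datetime import datetime, timezone

CHANGE_LOG_FIELDS = ["timestamp", "complex", "component", "change", "related_run_id"]

def log_change(path: str, complex_name: str, component: str,
               change: str, related_run_id: str = "") -> None:
    """Append one change record; complex_name is 'driver' or 'system_under_test'."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=CHANGE_LOG_FIELDS)
        if f.tell() == 0:                       # new file: write the header row first
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "complex": complex_name,
            "component": component,
            "change": change,
            "related_run_id": related_run_id,
        })

# Example: a tuning change on the system under test, tied to a test run.
log_change("sut_change_log.csv", "system_under_test",
           "app_server_1", "JVM heap raised from 2 GB to 4 GB", "run_017")
```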
If changes that affect results are made to the driver complex, such as a change to the user think times or some transaction rates, no meaningful tuning comparisons can be made to previous test runs. Because the workload has changed, the results must be re-baselined so that new tuning experiments can again be compared before and after each change.
Since these large performance labs can consist of dozens of systems and networking components, an accurate account of
all changes is crucial to make progress towards a tuned and fully characterized system.
Tactics often shift dynamically between consecutive performance test runs, based on the measured results from the previous run. Having a general test strategy and daily objectives is important so that you can make progress towards the basic objectives of your test. Rather than constantly rewriting the test plan, be pragmatic about the adjustments needed during the test interval and record them in a separate test log of experiments performed. Each test run should be logged, including excerpts from both change logs, as well as the load level of the workload specification that was being run during the experiment. The results associated with that run are then logged with the environment data.
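The test log can follow the same lightweight pattern. The sketch below records one hypothetical entry per test run, tying together the load level, excerpts from both change logs, and the results; the structure and field names are again assumptions rather than a prescribed format.

```python
# A hypothetical per-run test log entry that ties together the load level,
# excerpts from both change logs, and the measured results. Structure and
# field names are illustrative.
import json
from datetime import datetime, timezone

def log_test_run(path: str, run_id: str, load_level: str,
                 driver_changes: list[str], sut_changes: list[str],
                 results: dict) -> None:
    entry = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "load_level": load_level,                   # e.g. "100% of the Busy Hour workload"
        "driver_complex_changes": driver_changes,   # excerpt from the driver change log
        "system_under_test_changes": sut_changes,   # excerpt from the SUT change log
        "results": results,                         # e.g. per-transaction 90th-percentile times
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")           # one JSON record per test run

log_test_run("test_log.jsonl", "run_017", "100% of the Busy Hour workload",
             ["think time for account_lookup reduced to 20 s"],
             ["JVM heap raised from 2 GB to 4 GB"],
             {"enter_new_customer_p90_s": 1.8, "account_lookup_p90_s": 0.9})
```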
The test plan becomes a high-level test strategy with mid-point objectives, such as:
- Have the application servers tuned by Tuesday
- Perform database tuning on Wednesday and Thursday
- Reserve Friday for any other subsystem tuning
The final system measurements could then be made the following week between Monday and Wednesday.
Once all of the experiments have been run and the detailed results documented, the performance test analyst goes back and summarizes for the customer the steps taken over the entire testing interval and the incremental results achieved. You should then give the customer a bottom-line conclusion about the current state of the system, as well as the steps that should now be taken to move the system from its current state to one that fully meets all of the objectives of the performance test. You should also document any recommended procedures or test environment changes that would make the next performance test run more smoothly. This final report should be of very high quality, and should include a complete appendix or a reference to the detailed test results. It should also reference the workload specification where the testing goals were documented in detail.
Presenting this final report to the customer is the culmination of the performance test, and should provide a complete
communication of the summarized findings of the test.