Optimizing SQL Server Performance
11-17-92

To optimize SQL Server performance most effectively, it is necessary to identify the areas that will yield the largest performance increases over the widest variety of situations, and to focus analysis on these areas. Otherwise, significant time and effort may be expended on topics that do not yield sizable improvements.

This document does not address the performance issues stemming from multi-user concurrency. That is a separate, complex topic treated by the document "Maximizing Database Consistency and Concurrency", which ships with SQL Server 4.2.

Experience has shown that the greatest benefit in SQL Server performance can be gained from the general areas of logical database design, index design, and query design. Conversely, the biggest performance problems are often caused by deficiencies in these same areas. If performance is a concern, it is wise to concentrate on these areas first, since very large performance improvements can often be achieved with a relatively small time investment. While other performance issues, such as memory, cache buffers, and hardware, are certainly candidates for study, experience has shown that the gain from these areas is often incremental. The suggestions below have historically yielded significant performance gains.

Normalize Logical Database Design

Experience has shown that reasonable normalization of the logical database design yields the best performance. A greater number of narrower tables is characteristic of a normalized database; a lesser number of wider tables is characteristic of a denormalized database. A highly normalized database is normally associated with complex relational joins, which can hurt performance. However, the SQL Server optimizer is very efficient at selecting rapid, efficient joins, as long as effective indexes are available.

Benefits of normalization:

* accelerates sorting and index creation, since tables are narrower
* allows more clustered indexes, since there are more tables
* indexes tend to be narrower and more compact
* allows better use of segments to control physical placement of tables
* fewer indexes per table, helping UPDATE performance
* fewer NULLs and less redundant data, increasing the compactness of the database

With SQL Server, normalization often helps performance rather than hurting it. As normalization increases, however, so do the number and complexity of the joins required to retrieve data. We suggest continuing the normalization process unless it causes many queries to require joins of more than four tables.

If the logical database design is already fixed and total redesign is not feasible, it may be possible to selectively normalize a large table if analysis shows a bottleneck on that table. If access to the database is conducted through stored procedures, this schema change can take place without impacting applications. If not, it may be possible to hide the change by creating a view that presents the illusion of a single table, as shown below.
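For instance, suppose a wide ORDERS table has been split into ORDERS and ORDER_DETAILS during selective normalization. A view can present the original single-table image to existing applications. The table, column, and view names here are hypothetical and purely illustrative:

    CREATE VIEW ORDERS_ALL
    AS
        /* Rejoin the two narrower tables so existing queries
           still see one wide table. */
        SELECT O.ORDER_NO, O.CUST_NO, O.ORDER_DATE,
               D.ITEM_NO, D.QTY, D.PRICE
        FROM ORDERS O, ORDER_DETAILS D
        WHERE O.ORDER_NO = D.ORDER_NO

Queries that formerly read the single wide table can select from ORDERS_ALL unchanged, while the optimizer joins the two narrower tables underneath.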
Use Efficient Index Design

Unlike the access structures of many non-relational systems, relational indexes are not considered part of the logical database design. Indexes can be dropped, added, and changed without impacting the database schema or application design in any way other than performance. Efficient index design is paramount to achieving good SQL Server performance. For these reasons, you should not hesitate to experiment with different indexes.

Experience has shown that the optimizer will reliably choose the most effective index in most cases. The overall strategy should be to provide a good selection of indexes to the optimizer and trust it to make the right decision. This reduces analysis time and gives good performance over a wide variety of situations. Following are index recommendations.

Examine the WHERE clause of your SQL queries, since this is the primary focus of the optimizer. Each column listed in the WHERE clause is a possible candidate for an index. If you have too many queries to examine, pick a representative set, or just the slow ones. If your development tool transparently generates SQL, this will be more difficult. Many of these tools allow the logging of the generated SQL syntax to a file or screen for debugging purposes. You may want to ask the product's vendor whether such a feature is available.

Use narrow indexes. Narrow indexes are often more effective than multi-column, compound indexes. Narrow indexes have more rows per page and fewer index levels, boosting performance. Also, the SQL Server optimizer maintains statistics only on the most significant column of a compound index; therefore, if the first column of a compound index has poor selectivity, the optimizer may not use the index at all.

The optimizer can rapidly and effectively analyze hundreds, or even thousands, of index and join possibilities. Having a greater number of narrow indexes gives the optimizer more possibilities to choose from, which usually helps performance. Having a lesser number of wide, multi-column indexes gives the optimizer fewer possibilities to choose from, which may hurt performance.

You should not have more indexes than are necessary to achieve adequate read performance, because of the overhead involved in updating those indexes. However, even update-oriented operations typically require far more reading than writing. Therefore, do not hesitate to try a new index if you think it will help; you can always drop it later.

Use clustered indexes. Appropriate use of clustered indexes can tremendously increase performance. Even UPDATE and DELETE operations are often accelerated by clustered indexes, since these operations require much reading. Only a single clustered index is allowed per table, so use it wisely. Queries that return many rows, or that involve a range of values, are good candidates for acceleration by a clustered index. Examples:

    SELECT * FROM PHONEBOOK WHERE LASTNAME = 'SMITH'

    SELECT * FROM MEMBERTABLE WHERE MEMBER_NO > 5000 AND MEMBER_NO < 6000

By contrast, the LASTNAME and MEMBER_NO columns mentioned above would likely not be good candidates for a nonclustered index. Try to use nonclustered indexes on columns where few rows are returned.

Examine column uniqueness. This will help you decide which columns are candidates for a clustered index, a nonclustered index, or no index at all. Example query to examine column uniqueness:

    SELECT COUNT(DISTINCT COLNAME) FROM TABLENAME

This returns the number of unique values in the column. Compare this to the total number of rows in the table. On a 10,000-row table, 5000 unique values would make the column a good candidate for a nonclustered index. On the same table, 20 unique values would better suit a clustered index, and a column with only 3 unique values should not be indexed at all. These are only examples, not hard-and-fast rules. Remember to place the indexes on the individual columns listed in the WHERE clauses of the queries.
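As an illustration, the following hypothetical session examines two columns of a 10,000-row EMPLOYEES table and chooses indexes accordingly. The table, column, and index names are purely illustrative:

    SELECT COUNT(DISTINCT DEPT_NO) FROM EMPLOYEES
    /* Suppose this returns 20: few unique values and many rows
       per value, so DEPT_NO better suits a clustered index. */

    SELECT COUNT(DISTINCT EMP_NAME) FROM EMPLOYEES
    /* Suppose this returns 5000: fairly unique, so EMP_NAME is a
       good candidate for a nonclustered index. */

    CREATE CLUSTERED INDEX DEPT_NO_INDEX ON EMPLOYEES (DEPT_NO)
    CREATE NONCLUSTERED INDEX EMP_NAME_INDEX ON EMPLOYEES (EMP_NAME)

The clustered index is created first, since building a clustered index re-sorts the table and rebuilds any existing nonclustered indexes.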
The number of rows returned by a query is also an important factor in index selection. The optimizer considers a nonclustered index to cost at least one page I/O per returned row. At this rate, it does not take long before it becomes more efficient to simply scan the entire table. This is another reason to restrict the size of the result set, or to locate a large result set with a clustered index.

Use Efficient Query Design

Some types of queries are inherently resource-intensive. This is related to fundamental database and index issues common to most RDBMSs, not to SQL Server in particular. Such queries are not "inefficient", since the optimizer will implement them in the most efficient fashion possible. However, they are resource-intensive, and the set-oriented nature of SQL may make them appear inefficient. No degree of optimizer intelligence can eliminate the inherent resource cost of these constructs; they are intrinsically costly when compared to a simpler query. Although SQL Server will use the optimal access plan, this is limited by what is fundamentally possible. Examples:

* large result sets
* IN and OR queries
* highly nonunique WHERE clauses
* != (not equal) comparisons
* certain column functions, such as SUM
* expressions or data conversions in the WHERE clause
* local variables in the WHERE clause
* complex views with GROUP BY or ORDER BY

Various factors may necessitate the use of some of these query constructs. Their impact will be lessened if the optimizer can restrict the result set before applying the resource-intensive portion of the query. Examples:

Inefficient:

    SELECT SUM(SALARY) FROM TABLE

More efficient:

    SELECT SUM(SALARY) FROM TABLE WHERE ZIP = '98052'

Inefficient:

    SELECT * FROM TABLE WHERE LNAME = @VAR

More efficient:

    SELECT * FROM TABLE WHERE LNAME = @VAR AND ZIP = '98052'

In the first example, the SUM operation cannot itself be accelerated with an index; each row must be read and summed. Assuming there is an index on the ZIP column, the optimizer will likely use it to restrict the result set before applying the SUM, which can be much faster.

In the second example, the local variable is not resolved until run time. However, the optimizer cannot defer the choice of access plan until run time; it must choose at compile time. Yet at compile time, when the access plan is built, the value of @VAR is not known and consequently cannot be used as input to index selection. The technique illustrated above lessens the impact by restricting the result set with an additional AND clause. An alternative technique is to use a stored procedure and pass the value for @VAR as a parameter to it, as sketched below.

Large result sets are costly on most RDBMSs. Try not to return a large result set to the client for final data selection via browsing. It is much more efficient to restrict the size of the result set, allowing the database back end to perform the function for which it was intended.
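To illustrate the stored procedure technique, the following hypothetical procedure receives the search value as a parameter. The procedure name and the EMPLOYEE table stand in for the generic names above and are purely illustrative:

    CREATE PROCEDURE FIND_BY_LNAME @LNAME_PARM VARCHAR(30)
    AS
        /* Unlike a local variable, the parameter value is known
           when the access plan is built on first execution, so
           the optimizer can consider an index on LNAME. */
        SELECT * FROM EMPLOYEE WHERE LNAME = @LNAME_PARM
    GO

    EXECUTE FIND_BY_LNAME 'SMITH'
    GO

Because the parameter value is available when the procedure's access plan is compiled, it can be used as input to index selection.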
Techniques to Analyze Slow Performance

First, isolate the query or queries that are slow. Often an entire application will appear to be slow when only a few of its SQL queries are slow. For a development tool that transparently generates SQL, use any available diagnostic or debug mode of the tool to capture the generated SQL. For tools that use embedded SQL, this is much easier. After the slow query is isolated, do the following:

* Run the suspected slow query in isolation, using a query tool such as ISQL or SAF, and verify that it is in fact slow.
* Use SET STATISTICS IO ON and SET SHOWPLAN ON to examine the I/O consumed by the query and the access plan chosen (a brief example session appears at the end of this article). Pay particular attention to the count of logical page I/Os; the optimizer's goal is to minimize the I/O count. Record the logical I/O count. This forms a baseline against which to measure improvement.
* If the query involves a view or stored procedure, extract the underlying query and run it separately. This allows the access plan to change as you experiment with different indexes.
* Be aware that triggers or views on the involved tables can transparently generate additional I/O.
* Examine the indexes of the tables used by the slow query. Use the techniques listed previously to determine whether these are good indexes, and change them if necessary.
* Re-run the query after making each index change, and observe any change in the I/O count or the access plan.
* After noting improvement, run the main application to see whether overall performance is better.

Check the program for I/O-bound or CPU-bound behavior. It is often useful to determine whether a query is I/O bound or CPU bound, because this focuses improvement efforts on the true bottleneck. For example, if a query is CPU bound, adding memory to SQL Server will likely not improve performance, since more memory improves only the cache hit ratio, which in this case is already high.

Steps for examining whether SQL Server is I/O bound or CPU bound:

* Use an OS/2 CPU monitoring program. Several are available on the CompuServe IBMOS2 forum.
* While the query runs, a consistently high CPU graph (say, above 70%) indicates a CPU-bound state.
* While the query runs, a consistently low CPU graph (say, below 50%) indicates an I/O-bound state.
* Compare the CPU graph with the STATISTICS IO information.

Summary

SQL Server is capable of very high performance on large databases. To achieve this performance potential, it is necessary to use efficient database design, index design, and query design. These areas are the best candidates for obtaining significant performance improvement. Experimentation with indexes, using the guidelines in this document, is especially suggested. A methodical approach to analyzing performance problems will often yield very significant improvement for a relatively small time investment.
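The following hypothetical ISQL session illustrates the baseline-measurement step referenced above. The query and table are illustrative, and the exact SHOWPLAN and STATISTICS IO output varies by version:

    SET SHOWPLAN ON
    SET STATISTICS IO ON
    GO
    SELECT * FROM MEMBERTABLE WHERE MEMBER_NO > 5000 AND MEMBER_NO < 6000
    GO
    /* SHOWPLAN shows whether a table scan or an index was chosen.
       STATISTICS IO reports logical and physical page reads; record
       the logical read count as the baseline, then re-run the query
       after each index change and compare. */

If the plan shows a table scan and the logical read count is high, adding a clustered index on MEMBER_NO, as discussed earlier, would be a reasonable experiment.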