Optimizing SQL Server Performance
11-17-92

To optimize SQL Server performance most effectively, it is necessary to identify the areas that will yield the largest performance increases over the widest variety of situations, and to focus analysis on these areas. Otherwise, significant time and effort may be expended on topics that do not yield sizable improvements.

This document does not address the performance issues stemming from multi-user concurrency. That is a separate, complex topic treated by the document "Maximizing Database Consistency and Concurrency", which ships with SQL Server 4.2.

Experience has shown that the greatest benefit in SQL Server performance can be gained from the general areas of logical database design, index design, and query design. Conversely, the biggest performance problems are often caused by deficiencies in these same areas. If performance is a concern, it is wise to concentrate on these areas first, since very large performance improvements can often be achieved with a relatively small time investment. While other performance issues, such as memory, cache buffers, and hardware, are certainly candidates for study, experience has shown that the gain from these areas is often incremental. The suggestions below have historically yielded significant performance gains.

Normalize Logical Database Design

Experience has shown that reasonable normalization of the logical database design yields the best performance. A greater number of narrower tables is characteristic of a normalized database; a lesser number of wider tables is characteristic of a denormalized database. A highly normalized database is normally associated with complex relational joins, which can hurt performance. However, the SQL Server optimizer is very efficient at selecting rapid, efficient joins, as long as effective indexes are available.

Benefits of normalization:

* accelerates sorting and index creation, since tables are narrower
* allows more clustered indexes, since there are more tables
* indexes tend to be narrower and more compact
* allows better use of segments to control physical placement of tables
* fewer indexes per table, helping UPDATE performance
* fewer NULLs and less redundant data, increasing the compactness of the database

With SQL Server, normalization often helps performance rather than hurting it. As normalization increases, however, so do the number and complexity of the joins required to retrieve data. We suggest continuing the normalization process unless it causes many queries to require joins of more than four tables.

If the logical database design is already fixed and total redesign is not feasible, it may be possible to selectively normalize a large table if analysis shows a bottleneck on that table. If access to the database is conducted through stored procedures, this schema change can take place without impacting applications. If not, it may be possible to hide the change by creating a view that presents the illusion of a single table, as shown below.
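For instance, suppose a wide ORDERS table has been split into ORDERS and ORDER_DETAILS during selective normalization. A view can present the original single-table image to existing applications. The table, column, and view names here are hypothetical and purely illustrative:

    CREATE VIEW ORDERS_ALL
    AS
        /* Rejoin the two narrower tables so existing queries
           still see one wide table. */
        SELECT O.ORDER_NO, O.CUST_NO, O.ORDER_DATE,
               D.ITEM_NO, D.QTY, D.PRICE
        FROM ORDERS O, ORDER_DETAILS D
        WHERE O.ORDER_NO = D.ORDER_NO

Queries that formerly read the single wide table can select from ORDERS_ALL unchanged, while the optimizer joins the two narrower tables underneath.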
Use Efficient Index Design

Unlike the access structures of many non-relational systems, relational indexes are not considered part of the logical database design. Indexes can be dropped, added, and changed without impacting the database schema or application design in any way other than performance. Efficient index design is paramount to achieving good SQL Server performance. For these reasons, you should not hesitate to experiment with different indexes.

Experience has shown that the optimizer will reliably choose the most effective index in most cases. The overall strategy should be to provide a good selection of indexes to the optimizer and trust it to make the right decision. This reduces analysis time and gives good performance over a wide variety of situations. Following are index recommendations.

Examine the WHERE clause of your SQL queries, since this is the primary focus of the optimizer. Each column listed in the WHERE clause is a possible candidate for an index. If you have too many queries to examine, pick a representative set, or just the slow ones. If your development tool transparently generates SQL, this will be more difficult. Many of these tools allow the logging of the generated SQL syntax to a file or screen for debugging purposes. You may want to ask the product's vendor whether such a feature is available.

Use narrow indexes. Narrow indexes are often more effective than multi-column, compound indexes. Narrow indexes have more rows per page and fewer index levels, boosting performance. Also, the SQL Server optimizer maintains statistics only on the most significant column of a compound index; therefore, if the first column of a compound index has poor selectivity, the optimizer may not use the index at all.

The optimizer can rapidly and effectively analyze hundreds, or even thousands, of index and join possibilities. Having a greater number of narrow indexes gives the optimizer more possibilities to choose from, which usually helps performance. Having a lesser number of wide, multi-column indexes gives the optimizer fewer possibilities to choose from, which may hurt performance.

You should not have more indexes than are necessary to achieve adequate read performance, because of the overhead involved in updating those indexes. However, even update-oriented operations typically require far more reading than writing. Therefore, do not hesitate to try a new index if you think it will help; you can always drop it later.

Use clustered indexes. Appropriate use of clustered indexes can tremendously increase performance. Even UPDATE and DELETE operations are often accelerated by clustered indexes, since these operations require much reading. Only a single clustered index is allowed per table, so use it wisely. Queries that return many rows, or that involve a range of values, are good candidates for acceleration by a clustered index. Examples:

    SELECT * FROM PHONEBOOK WHERE LASTNAME = 'SMITH'

    SELECT * FROM MEMBERTABLE WHERE MEMBER_NO > 5000 AND MEMBER_NO < 6000

By contrast, the LASTNAME and MEMBER_NO columns mentioned above would likely not be good candidates for a nonclustered index. Try to use nonclustered indexes on columns where few rows are returned.

Examine column uniqueness. This will help you decide which columns are candidates for a clustered index, a nonclustered index, or no index at all. Example query to examine column uniqueness:

    SELECT COUNT(DISTINCT COLNAME) FROM TABLENAME

This returns the number of unique values in the column. Compare this to the total number of rows in the table. On a 10,000-row table, 5000 unique values would make the column a good candidate for a nonclustered index. On the same table, 20 unique values would better suit a clustered index, and a column with only 3 unique values should not be indexed at all. These are only examples, not hard-and-fast rules. Remember to place the indexes on the individual columns listed in the WHERE clauses of the queries.
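As an illustration, the following hypothetical session examines two columns of a 10,000-row EMPLOYEES table and chooses indexes accordingly. The table, column, and index names are purely illustrative:

    SELECT COUNT(DISTINCT DEPT_NO) FROM EMPLOYEES
    /* Suppose this returns 20: few unique values and many rows
       per value, so DEPT_NO better suits a clustered index. */

    SELECT COUNT(DISTINCT EMP_NAME) FROM EMPLOYEES
    /* Suppose this returns 5000: fairly unique, so EMP_NAME is a
       good candidate for a nonclustered index. */

    CREATE CLUSTERED INDEX DEPT_NO_INDEX ON EMPLOYEES (DEPT_NO)
    CREATE NONCLUSTERED INDEX EMP_NAME_INDEX ON EMPLOYEES (EMP_NAME)

The clustered index is created first, since building a clustered index re-sorts the table and rebuilds any existing nonclustered indexes.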
The number of rows returned by a query is also an important factor in index selection. The optimizer considers a nonclustered index to cost at least one page I/O per returned row. At this rate, it does not take long before it becomes more efficient to simply scan the entire table. This is another reason to restrict the size of the result set, or to locate a large result set with a clustered index.

Use Efficient Query Design

Some types of queries are inherently resource-intensive. This is related to fundamental database and index issues common to most RDBMSs, not to SQL Server in particular. Such queries are not "inefficient", since the optimizer will implement them in the most efficient fashion possible. However, they are resource-intensive, and the set-oriented nature of SQL may make them appear inefficient. No degree of optimizer intelligence can eliminate the inherent resource cost of these constructs; they are intrinsically costly when compared to a simpler query. Although SQL Server will use the optimal access plan, this is limited by what is fundamentally possible. Examples:

* large result sets
* IN and OR queries
* highly nonunique WHERE clauses
* != (not equal) comparisons
* certain column functions, such as SUM
* expressions or data conversions in the WHERE clause
* local variables in the WHERE clause
* complex views with GROUP BY or ORDER BY

Various factors may necessitate the use of some of these query constructs. Their impact will be lessened if the optimizer can restrict the result set before applying the resource-intensive portion of the query. Examples:

Inefficient:

    SELECT SUM(SALARY) FROM TABLE

More efficient:

    SELECT SUM(SALARY) FROM TABLE WHERE ZIP = '98052'

Inefficient:

    SELECT * FROM TABLE WHERE LNAME = @VAR

More efficient:

    SELECT * FROM TABLE WHERE LNAME = @VAR AND ZIP = '98052'

In the first example, the SUM operation cannot itself be accelerated with an index; each row must be read and summed. Assuming there is an index on the ZIP column, the optimizer will likely use it to restrict the result set before applying the SUM, which can be much faster.

In the second example, the local variable is not resolved until run time. However, the optimizer cannot defer the choice of access plan until run time; it must choose at compile time. Yet at compile time, when the access plan is built, the value of @VAR is not known and consequently cannot be used as input to index selection. The technique illustrated above lessens the impact by restricting the result set with an additional AND clause. An alternative technique is to use a stored procedure and pass the value for @VAR as a parameter to it, as sketched below.

Large result sets are costly on most RDBMSs. Try not to return a large result set to the client for final data selection via browsing. It is much more efficient to restrict the size of the result set, allowing the database back end to perform the function for which it was intended.
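To illustrate the stored procedure technique, the following hypothetical procedure receives the search value as a parameter. The procedure name and the EMPLOYEE table stand in for the generic names above and are purely illustrative:

    CREATE PROCEDURE FIND_BY_LNAME @LNAME_PARM VARCHAR(30)
    AS
        /* Unlike a local variable, the parameter value is known
           when the access plan is built on first execution, so
           the optimizer can consider an index on LNAME. */
        SELECT * FROM EMPLOYEE WHERE LNAME = @LNAME_PARM
    GO

    EXECUTE FIND_BY_LNAME 'SMITH'
    GO

Because the parameter value is available when the procedure's access plan is compiled, it can be used as input to index selection.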
Techniques to Analyze Slow Performance

First, isolate the query or queries that are slow. Often an entire application will appear to be slow when only a few of its SQL queries are slow. For a development tool that transparently generates SQL, use any available diagnostic or debug mode of the tool to capture the generated SQL. For tools that use embedded SQL, this is much easier. After the slow query is isolated, do the following:

* Run the suspected slow query in isolation, using a query tool such as ISQL or SAF, and verify that it is in fact slow.
* Use SET STATISTICS IO ON and SET SHOWPLAN ON to examine the I/O consumed by the query and the access plan chosen (a brief example session appears at the end of this article). Pay particular attention to the count of logical page I/Os; the optimizer's goal is to minimize the I/O count. Record the logical I/O count. This forms a baseline against which to measure improvement.
* If the query involves a view or stored procedure, extract the underlying query and run it separately. This allows the access plan to change as you experiment with different indexes.
* Be aware that triggers or views on the involved tables can transparently generate additional I/O.
* Examine the indexes of the tables used by the slow query. Use the techniques listed previously to determine whether these are good indexes, and change them if necessary.
* Re-run the query after making each index change, and observe any change in the I/O count or the access plan.
* After noting improvement, run the main application to see whether overall performance is better.

Check the program for I/O-bound or CPU-bound behavior. It is often useful to determine whether a query is I/O bound or CPU bound, because this focuses improvement efforts on the true bottleneck. For example, if a query is CPU bound, adding memory to SQL Server will likely not improve performance, since more memory improves only the cache hit ratio, which in this case is already high.

Steps for examining whether SQL Server is I/O bound or CPU bound:

* Use an OS/2 CPU monitoring program. Several are available on the CompuServe IBMOS2 forum.
* While the query runs, a consistently high CPU graph (say, above 70%) indicates a CPU-bound state.
* While the query runs, a consistently low CPU graph (say, below 50%) indicates an I/O-bound state.
* Compare the CPU graph with the STATISTICS IO information.

Summary

SQL Server is capable of very high performance on large databases. To achieve this performance potential, it is necessary to use efficient database design, index design, and query design. These areas are the best candidates for obtaining significant performance improvement. Experimentation with indexes, using the guidelines in this document, is especially suggested. A methodical approach to analyzing performance problems will often yield very significant improvement for a relatively small time investment.
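The following hypothetical ISQL session illustrates the baseline-measurement step referenced above. The query and table are illustrative, and the exact SHOWPLAN and STATISTICS IO output varies by version:

    SET SHOWPLAN ON
    SET STATISTICS IO ON
    GO
    SELECT * FROM MEMBERTABLE WHERE MEMBER_NO > 5000 AND MEMBER_NO < 6000
    GO
    /* SHOWPLAN shows whether a table scan or an index was chosen.
       STATISTICS IO reports logical and physical page reads; record
       the logical read count as the baseline, then re-run the query
       after each index change and compare. */

If the plan shows a table scan and the logical read count is high, adding a clustered index on MEMBER_NO, as discussed earlier, would be a reasonable experiment.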