Faster SQL calls with index covering

As sites and applications begin to scale on the data side of the equation, you can quickly find your database to be a bottle neck to performance. Index covering is one technique that can help speed up your queries dramatically.

Technique: Index Covering
Applies To: SQL Server, MySQL, Sybase (Possibly also Oracle, DB2)
When to use it: Thin queries (limited columns) that do not operate on the clustered index.

So what is index covering? In SQL Server when you perform a query on a table that returns only columns contained in a non-clustered index, then SQL will return the values form that index, thus bypassing reads to the actual data pages. This is called Index Covering, and can greatly reduce the IO needed to perform the query.

Let’s say you want to make a call to your database that returns all of your customer names when passed in a particular state. The clustered index on the Customer table is CustomerID.

Select FirstName, LastName From Customer Where State = ‘Oregon’;

SQL Will scan through all of the data pages looking for all of records where ‘Oregon’ is the state. If you have large data rows this can be fairly costly in terms of IO. The field State is a horrible candidate for an index as there are only a finite number of distinct values that can be placed in the field. So what’s a princess to do when her turing machine slows to a veritable crawl? Index covering.

Create Index cust_state On Customer(State, LastName, FirstName);

Since all of the fields needed for the query are now contained in the index SQL will go to the index, find all rows where ‘Oregon’ is the state, and then, seeing that all of the other columns it needs are in the index, it will return the results. In this scenario SQL will never touch the actual data pages. Because the size of the index pages are significantly smaller than those of the data pages, SQL uses less IO. Now place your clustered index in a different filegroup and your performance gain can be even greater. But that’s for a different article. Also note that our covered index would also work with the following query.

Select State, FirstName, LastName From Customer Group By State;

“I’m confused. You said earlier that State was a bad candidate for an index?!” Under normal circumstances it is. Good index practice would tell us to order our index columns as so (LastName, FirstName, State), with the most unique (selective) values first. Remember that when working with index covering we are creating an index for a special purpose. When dealing with large scale databases you need to start thinking differently about your data.

Large scale data systems like the ones I’ve been thinking about lately can seem more like data warehouses than OLTP systems. In some cases they are both at the same time. I think the biggest mistake a developer or architect can make is to become rigid in their thinking about how data needs to be defined and structured.

PS: For those of you who got the veiled reference to Diamond Age. Gr@75!

Leave a comment