Faster SQL calls with index covering

As sites and applications begin to scale on the data side of the equation, you can quickly find your database to be a bottle neck to performance. Index covering is one technique that can help speed up your queries dramatically.

Technique: Index Covering
Applies To: SQL Server, MySQL, Sybase (Possibly also Oracle, DB2)
When to use it: Thin queries (limited columns) that do not operate on the clustered index.

So what is index covering? In SQL Server when you perform a query on a table that returns only columns contained in a non-clustered index, then SQL will return the values form that index, thus bypassing reads to the actual data pages. This is called Index Covering, and can greatly reduce the IO needed to perform the query.

Let’s say you want to make a call to your database that returns all of your customer names when passed in a particular state. The clustered index on the Customer table is CustomerID.

Select FirstName, LastName From Customer Where State = ‘Oregon’;

SQL Will scan through all of the data pages looking for all of records where ‘Oregon’ is the state. If you have large data rows this can be fairly costly in terms of IO. The field State is a horrible candidate for an index as there are only a finite number of distinct values that can be placed in the field. So what’s a princess to do when her turing machine slows to a veritable crawl? Index covering.

Create Index cust_state On Customer(State, LastName, FirstName);

Since all of the fields needed for the query are now contained in the index SQL will go to the index, find all rows where ‘Oregon’ is the state, and then, seeing that all of the other columns it needs are in the index, it will return the results. In this scenario SQL will never touch the actual data pages. Because the size of the index pages are significantly smaller than those of the data pages, SQL uses less IO. Now place your clustered index in a different filegroup and your performance gain can be even greater. But that’s for a different article. Also note that our covered index would also work with the following query.

Select State, FirstName, LastName From Customer Group By State;

“I’m confused. You said earlier that State was a bad candidate for an index?!” Under normal circumstances it is. Good index practice would tell us to order our index columns as so (LastName, FirstName, State), with the most unique (selective) values first. Remember that when working with index covering we are creating an index for a special purpose. When dealing with large scale databases you need to start thinking differently about your data.

Large scale data systems like the ones I’ve been thinking about lately can seem more like data warehouses than OLTP systems. In some cases they are both at the same time. I think the biggest mistake a developer or architect can make is to become rigid in their thinking about how data needs to be defined and structured.

PS: For those of you who got the veiled reference to Diamond Age. Gr@75!

Performant Scalable Thoughts

I’ve been filling my brain with thoughts of online community building and involving fans more in how you define your brand. I’ve been reading what I can in the space, whether it’s blogs or books trying to form a clear picture of where we are headed. Not just as an industry, but also as a connected society. For those of you paying attention, that’s right I said “fans” and not customers. If you aren’t thinking about your customers as fans, then you probably also think that this whole internet thing is just a fad.
So what does this have to do with Scalability? Lots! For those of you who don’t already know, it’s my job to translate what our marketing and design teams come up with into deliverable solutions for clients. As a digital agency this mostly consist of ways to communicate with brand fans, and a campaign that takes off can need to scale-up quickly to handle large amounts of users, then scale back down over time. At the same time you don’t want to spend an exorbitant amount of time (and client money) building a monolithic site that can infinitely scale if you are only expecting a few thousand users going to it. Don’t get me wrong. A few thousand well targeted users who are fanatic (there’s that word again) about your brand, can be a higher measure of success than 20 times that number who visit your site once, and never give you another thought. But what if your site takes off virally? Would you have to throw out your existing code and start over, or would you have all the pieces in place from the start that will help you scale?

While not every site needs to scale like facebook or twitter, there are several things that can be done with no impact to existing budgets that will help with scalability down the road and performance immediately. This post is the start of a thread on the subject of scalability and performance. While some posts will be short snippets of practical advice that any site can take advantage of, others will be targeted to specific problems involved in scaling a system.

So here is the first tidbit of information that can be used in any web site. It involves client side browser performance.

Most browser cache is case sensitive as it does a straight string compare of a named resource. Therefore if include a reference to an image file on a web page like so

<img src=”Image.jpg” />
it will download the file and place it in the browser cache. If you later reference the same image on the same or another page but change the case in the source url as such

<img src=”image.JPG” />

the browser will download the file from the server again as it perceives it as a different file/url, and create a second entry in the browser cache.

This isn’t a problem if your web server acts in the same manor as Apache where all urls are case sensitive to begin with. In this case the server will not recognize a resource unless the url is presented in the same case as the file on the server. Because IIS isn’t case sensitive a browser could potentially download a file hundreds of times if the case in the source url is slightly different every time.