SQL Server Indexes: Using the Clustered Index

If you really want to understand SQL Server indexing I suggest following Stairway to SQL Server Indexes. My first blog post on SQL Server indexes is going to focus on clustered indexes I’m going to paraphrase a lot of the information found in the articles linked above to save those of us who don’t want an intimate scientific knowledge on this topic and address what is pertinent to re-factoring indexes for to reduce deadlocks.

Simply put, indexes are smaller, concise reference tables associated with the data tables that tell SQL Server where some of the information requested in a query is located. Without indexes, the query engine performs a full table scan (sequentially looks at every Page in a table) to retrieve the requested data rather than jumping around segmented portions of the data where the requested rows are stored. In some cases where all the information requested resides in an index, SQL Server will simply return the row from the index and not access the main data table at all.

Non-clustered indexes are separate objects with separate storage. A clustered index instructs SQL Server how to sort the data in the main data table itself and creates a logical hidden permanent key. This is why identity columns are popular to create clustered indexes on. These integer or big integer column values are created sequentially by SQL Server when a new row is inserted; there is basically never any resorting when rows are inserted or deleted.

SQL Server saves data tables and non-clustered indexes in 8k byte blocks called Pages within their data files. As CRUD actions are performed on your tables the data within these pages must be re-sorted. SQL Server determines which rows go into which Pages. In the case where sequentially created pages do not contain the data sequentially, external fragmentation is created. The percentage of empty page space is called internal fragmentation. The fixed size nature of pages means some data types are better for indexes then others (creating less fragmentation), additionally the number of field included in an index can adversely affect index performance. I’m going to address fragmentation in a later post and explain why creating clustered indexes on GUID data types is death to a database.

From Stairway to SQL Server Indexes: Level 3, Clustered Indexes:

The clustered index key can be comprised of any columns you chose; it does not have to be based on the primary key.

Keep in mind these additional points about SQL Server clustered indexes:

Because the entries of the clustered index are the rows of the table, there is no bookmark value in a clustered index entry. When SQL Server is already at a row, it does not need a piece of information that tells it where to find that row.
A clustered index always covers the query. Since the index and the table are one and the same, every column of the table is in the index.
Having a clustered index on a table does not impact your options for creating non-clustered indexes on that table.

There can be only one clustered index on a table; the data in a table can’t be sorted two different ways at the same time. Clustering does not require uniqueness. In the case where the clustered index is made up of non-unique fields the sorting results in grouping for these fields.

If the preceding is clear, it should be obvious that creating a clustered index on a particular column and then also creating a non-clustered index with the same column is a waste of resources. The creation of the clustered index enforces sorting of the data and created a permanent key for SQL server to quickly locate the Page where every row is stored.

Additionally, creating indexes in this matter can lead to deadlocks. Consider a large number of rows being inserted into a table in a batch. Even if your clustered index is created on an identity column, SQL Server will perform a sort check on the clustered index. After the Clustered sort is finished the non-clustered indexes that also include this column will be filled and the batch will not be committed until the indexes are ready. The reason is the non-clustered indexes can’t create their bookmarks until page each record is going to reside is determined. When a select against this table happens at the same time, the query engine may decide to access the non-clustered index, but the insert batch has it locked. The insert can’t complete because the select is accessing the non-clustered index is the insert also has to write to.

Now wait, if the clustered column is removed from the non-clustered index, the select query is just going to use the clustered index (if the clustered index column is part of the table links or where clause of the select statement) and then the deadlock will occur due to the clustered index being locked. So how do we get the query engine to use the non-clustered index without using it to create the bookmark in the index? We use a covering index, with is created with the include statement.

From Stairway to SQL Server Indexes: Level 5, Included Columns:

Columns that are in a non-clustered index, but are not part of the index key, are called included columns. These columns are not part of the key, and so do not impact the sequence of entries in the index. Also, as we will see, they cause less overhead than key columns.

There are several advantages in having these columns in the index but not in the index key, such as:

Columns that are not part of the index key do not affect the location of the entry within the index. This, in turn, reduces the overhead of having them in the index.
The effort required to locate an entry(s) in the index is less.
The size of the index will be slightly smaller.
The data distribution statistics for the index will be easier to maintain.

Deciding whether an index column is part of the index key, or just an included column, is not the most important indexing decision you will ever make. That said, columns that frequently appear in the SELECT list but not in the WHERE clause of a query are best placed in the included columns portion of the index.

select objt.object_id, idxc.index_id, clmn.column_id

into #ClustIdxCols

from sys.objects objt

inner join sys.indexes idx on objt.object_id = idx.object_id

inner join sys.index_columns idxc on objt.object_id = idxc.object_id and idx.index_id = idxc.index_id

inner join sys.columns clmn on objt.object_id = clmn.object_id and idxc.column_id = clmn.column_id

where objt.type = ‘U’ and idx.type = 1

order by objt.object_id, idxc.index_id, clmn.column_id

select distinct objt.name, idx.name

from sys.objects objt

inner join sys.indexes idx on objt.object_id = idx.object_id

inner join sys.index_columns idxc on objt.object_id = idxc.object_id

and idx.index_id = idxc.index_id

inner join sys.columns clmn on objt.object_id = clmn.object_id

and idxc.column_id = clmn.column_id

inner join #ClustIdxCols test on objt.object_id = test.object_id

and idxc.column_id = test.column_id

where idx.type = 2 and idxc.is_included_column = 0

order by objt.name, Idx.name

This script identifies non-clustered indexes using clustered index columns (include columns are filtered out). In some cases the tables only contain two indexes, a clustered index on the identity column for the table and an non-clustered index on the same column.

Wisdom from the Stache

Bitcoin Simplified

SQL Server Indexes: Using the Clustered Index

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply