GUIDs – Never for Clustered Indexes

Globally Unique Identifiers (GUIDs) have their place in software development. They’re great for identifying a library in the GAC or the Windows registry. From the database perspective, however, they are a huge data type.

Oracle, MySQL, Sybase, and DB2 do not provide a special data type for fields that store GUIDs; for these vendors a GUID is a 32-38 character string (depending on whether the dashes and “{}” are included). SQL Server provides a Unique Identifier data type, which has some storage and access-speed benefits over a 36 character varchar or nvarchar field. However, they’re still huge…
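For a sense of the sizes involved, here is a small Python sketch (Python is used purely for illustration and is not part of any of the database products mentioned) that measures the common text forms of a GUID against its raw 128-bit form. The variable names are made up for this example.

```python
import uuid

# Any GUID works for measuring lengths; uuid4() generates a random one.
g = uuid.uuid4()

hex_only = g.hex              # hex digits only, no dashes
dashed = str(g)               # 8-4-4-4-12 grouping with dashes
braced = "{" + dashed + "}"   # registry style, wrapped in curly braces
raw = g.bytes                 # the underlying 128-bit binary value

# Character counts of the three text forms, then the byte count of the raw value.
print(len(hex_only), len(dashed), len(braced), len(raw))
```

Stored as single-byte text, the dashed form takes more than twice the space of the raw binary value, which is the gap a native binary GUID type closes.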

Unique Identifier Data Type

http://msdn.microsoft.com/en-us/library/ms190215(v=sql.105).aspx

SQL Server’s Unique Identifier displays as a 36 character string (dashes, no “{}”) and stores a GUID as a 16 byte binary value. There’s no argument that creating a duplicate GUID is nearly impossible (though not mathematically impossible), but how many data sources are going to outgrow a bigint, which ranges from -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)? A bigint is only 8 bytes, half the size of a Unique Identifier. Hard disk space has gotten cheap, so why do we care about data type size anyway? The article linked above notes that indexes created on Unique Identifier fields perform slower than indexes built on integer fields. That statement hardly scratches the surface of the performance implications of Unique Identifier indexes, and it all comes down to size.
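To put the bigint range in perspective, the following Python sketch estimates how long it would take to use up every positive bigint value. The insert rate is an assumption picked to be deliberately aggressive; it is not a measurement from any real system.

```python
# Range of a signed 64-bit integer, matching the bigint limits quoted above.
BIGINT_MIN = -(2**63)
BIGINT_MAX = 2**63 - 1

# Assumed workload: one million new rows every second, around the clock.
INSERTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Years needed to consume every positive value at the assumed rate.
years_to_exhaust = BIGINT_MAX / INSERTS_PER_SECOND / SECONDS_PER_YEAR

print(f"bigint range: {BIGINT_MIN:,} to {BIGINT_MAX:,}")
print(f"years to exhaust at {INSERTS_PER_SECOND:,} rows/s: {years_to_exhaust:,.0f}")
```

Even at that rate the positive range alone lasts for hundreds of thousands of years, so running out of values is rarely a good reason to reach for a 16 byte key.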

Pages and Extents

http://msdn.microsoft.com/en-us/library/ms190969(SQL.105).aspx

The above article explains how SQL Server stores data and indexes in 8 KB pages. 96 bytes are reserved for the page header, and a single row can hold at most 8,060 bytes of data and overhead. As a rough illustration that ignores per-row overhead, if your table consisted of just one column, 8,060 bytes would hold 503 GUIDs, 1,007 bigints, or 2,015 ints. Put another way, the fewer bytes in a row, the more rows you can store in one page. SQL Server doesn’t control where the pages are written on the hard disk; the O/S and hardware decide. The more pages stored for each table or index in the system, the greater the chance that consecutive pages end up in distant disk sectors. As the number of index pages grows, they fall further out of sync with the data pages, which leads to index fragmentation.
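The per-page arithmetic can be reproduced with a short Python sketch. It takes the 8,060 byte figure from the text at face value and ignores per-row overhead, so the results are upper bounds rather than what a real table would achieve; the function names and the 10 million row table are hypothetical.

```python
# Bytes assumed available for values on one page (the figure used in the text).
USABLE_BYTES = 8060

# Storage size in bytes of each candidate key type.
KEY_SIZES = {"uniqueidentifier": 16, "bigint": 8, "int": 4}

def values_per_page(size_bytes, usable=USABLE_BYTES):
    """Whole values that fit in the usable space, ignoring per-row overhead."""
    return usable // size_bytes

def pages_needed(row_count, size_bytes):
    """Pages required to hold row_count single-column rows."""
    per_page = values_per_page(size_bytes)
    return -(-row_count // per_page)  # ceiling division

# Compare the key types for a hypothetical 10 million row table.
for name, size in KEY_SIZES.items():
    print(name, values_per_page(size), pages_needed(10_000_000, size))
```

Halving the key size roughly halves the number of pages, and every page saved is a page that never has to be read, cached, or defragmented.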

Index Fragmentation

http://www.brentozar.com/archive/2009/02/index-fragmentation-findings-part-1-the-basics/

http://www.brentozar.com/archive/2009/02/index-fragmentation-findings-part-2-size-matters/

Let’s recap what we have so far:

  1. GUIDs are randomly generated values with no sequential ordering.
  2. GUIDs are twice as big as the biggest integer data type (16 bytes versus 8 for a bigint).
  3. The larger a table’s rows are, the more pages have to be created to store the data.
  4. The more pages an index has, the more fragmented it gets.
  5. The more fragmented the indexes get, the more frequently they have to be rebuilt.

Clustered Index Implications

Clustered indexes set the organization and sort order of the actual data in the table. Non-clustered indexes created on a table with a clustered index have to be updated with pointer changes as records are inserted or deleted, or when the clustered index value is updated, because these changes require the data pages to be re-sorted and new keys generated. SQL Server Identity columns of an integer data type avoid a lot of I/O overhead and SQL Server processing because the rows are always inserted in the correct order, at the end of the index. GUID values are almost never inserted in the correct order because of their random nature. Thus, with a GUID clustered index, nearly every insert, delete, or update of that field can require data page reorganization, non-clustered index updates, more pages to be created, and increased fragmentation.
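The difference in insert order is easy to see in a toy simulation. The Python sketch below is not a model of SQL Server's storage engine: it keeps keys in a plain sorted list and counts how many inserts land somewhere other than the end. Random 128-bit integers stand in for GUIDs and are compared as plain numbers, which is a simplification, since SQL Server orders Unique Identifier values by its own byte-group rules. The function name, seed, and row count are arbitrary choices for this example.

```python
import bisect
import random

def count_mid_table_inserts(keys):
    """Insert keys one at a time into a sorted list and count how many
    land anywhere other than the end of the list."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1  # key belongs between existing keys
        ordered.insert(pos, key)
    return mid_inserts

N = 10_000
rng = random.Random(42)  # fixed seed so the run is repeatable

# Identity-style keys: 1, 2, 3, ... always arrive in sorted order.
identity_keys = list(range(1, N + 1))

# Random 128-bit values standing in for GUIDs (an assumption of this sketch).
guid_keys = [rng.getrandbits(128) for _ in range(N)]

print("identity keys inserted mid-table:", count_mid_table_inserts(identity_keys))
print("random keys inserted mid-table:  ", count_mid_table_inserts(guid_keys))
```

The sequential keys never land in the middle, while almost every random key does. In a real clustered index, each of those mid-table inserts is a candidate for shuffling rows within a page or splitting it.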
