I was recently discussing this topic with a colleague and less than a day later a similar was question was asked on StackOverflow. Then a few days later I was asked a related question by another colleague.
Let’s start with the answer to the question “Is it OK to use a uniqueidentifier (GUID) as a Primary Key?”: Yes, a uniqueidentifier (GUID) column can be fine as a Primary Key, BUT it is not a particularly good choice for the clustered index. In many cases, you will be better off creating the clustered index on a column (or columns) that are likely be used in range searches, and create a non-clustered index on the GUID column.
OK, that’s the answer out of the way! Now for the reasons.
A common reason for using a GUID column as a primary key (over an int or bigint) is the ability to create and assign these in the middle application tier without making a round trip to the database, while still avoiding key clashes between distributed clients.
A primary key is a column (or set of columns) that uniquely identifies each row in a table (or entity). When a column (or a set of columns) is defined as a primary key, SQL Server checks that ALL of the columns that make up the PRIMARY KEY meet the requirement of a PRIMARY KEY constraint which is that they must not contain any NULLs. If the table contains data, a check is made to ensure that this existing data meets the uniqueness constraint. If there are any duplicate rows, the addition of the constraint will fail, as will the creation of the primary key.To enforce this for new rows, SQL Server builds a UNIQUE index. If you do not specify the index type when adding the constraint, SQL Server creates a UNIQUE CLUSTERED index. Which is almost certainly why people sometimes confuse the clustered index with the primary key.
It’s important, so it’s worth repeating: If you don’t specify the index type when designating a Primary Key, SQL Server creates a UNIQUE CLUSTERED index by default.
What’s the significance of the clustering key?
- It defines the lookup value used by all the non-clustered indexes (and therefore should be unique, as narrow as possible and unchanging).
- It defines the table’s order, so fragmentation (and index maintenance) needs to be taken into account.
- It can be used to satisfy a query, either as a table scan or a range query (where the clustering key supports that range)
Notice point (1): Each index row in the non-clustered index contains the non-clustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value (Nonclustered Index Structures). So a wide clustering key adds overhead to every non-clustered index. This overhead can be significant.
What are good choices for the clustering key?
- An identity column (int or bigint)
- A composite key based on date and identity (in that order)
- An ‘ordered’ (or pseudo sequential) GUID (such as SQL Server’s NEWSQUENTIALID(), or a modified, ordered GUID sometimes called a ‘COMB’)
A uniqueidentifier (GUID) column requires 16 bytes of storage (4 times wider than an int identity column), so the space used per row is higher (and less rows fit on a database page), and bandwidth consumed by data transfer is higher.
Column Type | Size (bytes) | Range |
int | 4 | 231 – 1 |
bigint | 8 | 263 – 1 |
uniqueidentifier (GUID) | 16 | 2128 |
If all you need is a key range larger than that supported by an int (4 bytes, 231 – 1 rows) then consider using a bigint (8 bytes), which supports 263 – 1 rows (which should be sufficient for most applications) and saves 8 bytes per row compared to using a GUID.
Are there any performance considerations? Yes. In addition to the increased storage issues, a clustered index based upon a non-ordered GUID will experience fragmentation due to page splits during inserts. If you absolutely have to have a (non-ordered) GUID as a primary key with a clustered index, make sure you have a regular index rebuild as part of your database maintenance plans, and maybe even consider leaving extra space per page by decreasing the Fill Factor for the clustered index.When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth, and reducing the number of page splits. [Note: The fill-factor setting applies only when the index is created, or rebuilt.]