Clustered Index vs Non-Clustered on Heap
Published: 2nd November 2013
I had a question recently in which someone asked the following:

"I'm trying to determine what the overall performance difference would be in the following situation…
Assume I have a large table, 500M records that have a non-unique RecordID column (eg. RecordID (BIGINT), SubID (BIGINT), Name, Detail)
I'm only ever going to select * from Table where RecordID = ?
If I create a clustedIndex on RecordID the execution plan shows only two steps
Select + Clustered Index Seek
If I leave the table as a HEAP and create a no-clustered index on RecordID the execution plan shows 5 steps
Nested Loops Inner Join <- ((Compute Scaler + Index Seek) + RID Lookup)
Clearly I'm going to get better insert performance when inserting into the HEAP, especially when page spits are required on the clustered index.
What I don't know is what the select performance difference would be under load.
e.g. I can expect the RID lookup to be X% slower than the Clustered Index Seek"


The answer I gave was as follows:

This is a loaded question from all angles...

Basically if you're going to create a non-clustered index then please create it on a clustered index and not on a heap. This is because the indexes themselves are MUCH easier to maintain against a clustered index than a heap. Also you end up in a situation where you could have forwarding records in your data pages themselves which mean that trying to access 1 single record would cause multiple records to be read (the forwarding record then the forwarded record).

Also, what you're seeing is that you are creating the non-clustered index on a single column. This means that when you perform "select *" you are asking for all rows. The non-clustered index only contains the single column and a pointer to the location of the record in the heap. Therefore when you request "recordID 1", for example, SQL will seek against the non-clustered index for recordID 1, then obtain the RID (which is the location of the record in the heap), and then it has to scan the heap looking for that RID and therefore the rest of the columns.

In a clustered index you are simply storing the entire table in an ordered fashion. Therefore it contains ALL columns. This means that when you search for the RecordID all the other columns are right there with it and therefore there's no second lookup.

As you can see, in the case you provide, I can't see anything but indicators that you should use the clustered index as down the road your non-clustered index could become very expensive indeed.
Comments:
NB: Comments will only appear once they have been moderated.