Unique Index with Union All
Published: 1st October 2014
This follows on directly from the previous post in regard to ensuring that you use UNIQUE if possible on an index as I’ve also found that not having UNIQUE can make quite a hefty difference when using UNION ALL.

Everyone knows that UNION ALL is a very quick and efficient way to merge datasets because there is no de-duping taking place. However, this can all go awry if you want the results to be ordered and you’re not helping SQL Server out with your indexes.

Here’s a simple example… we’re going to create 2 heaps and then we’ll ask SQL Server to perform a UNION ALL over some of the columns but with an ORDER BY clause…

use AdventureWorks2012
go

select *
into #salesHeader
from sales.SalesOrderHeader

select *
into #salesHeader1
from sales.SalesOrderHeader
go

select SalesOrderID, OrderDate, PurchaseOrderNumber, SubTotal
from #salesHeader
union all
select SalesOrderID, OrderDate, PurchaseOrderNumber, SubTotal
from #salesHeader1
order by salesOrderID
go

drop table #salesHeader, #salesHeader1
go


What we end up with is a Parallel plan with a cost of 4 and a memory grant of 10MB. Not massive, but this is a very small example.

The main thing to note though, is that we have a SORT operator which is expensive but not altogether surprising because we have 2 Heaps and therefore unordered data which warrants a sort in order to meet our output criteria.

So let’s try and remove this sort from the execution plan… the most obvious method is to add a Clustered index to each table as we know that this forces order…

use AdventureWorks2012
go

select *
into #salesHeader
from sales.SalesOrderHeader

select *
into #salesHeader1
from sales.SalesOrderHeader
go

create clustered index ix_salesHeaderTemp on #salesHeader(salesOrderID)
create clustered index ix_salesHeaderTemp1 on #salesHeader1(salesOrderID)
go

select SalesOrderID, OrderDate, PurchaseOrderNumber, SubTotal
from #salesHeader
union all
select SalesOrderID, OrderDate, PurchaseOrderNumber, SubTotal
from #salesHeader1
order by salesOrderID
go

drop table #salesHeader, #salesHeader1
go


It actually made no different at all… we have the same plan (albeit with Clustered Index Scans), the same cost, the same memory grant, and the same costly Sort.

But why?

Well, without a Unique column SQL Server will attempt to enforce order across EVERY column listed in the output, starting with the column listed in the Order By.

Therefore, in this example of “select salesOrderID, orderDate, PurchaseOrderNumber, SubTotal” SQL Server will presume that salesOrderID might not be unique and therefore it needs to also sort by orderDate, which in turn might not be unique and therefore it moves to the next column etc.

So how do we stop this? It’s not hard… if your clustered index is unique then tell SQL Server…

use AdventureWorks2012
go

select *
into #salesHeader
from sales.SalesOrderHeader

select *
into #salesHeader1
from sales.SalesOrderHeader
go

create unique clustered index ix_salesHeaderTemp on #salesHeader(salesOrderID)
create unique clustered index ix_salesHeaderTemp1 on #salesHeader1(salesOrderID)
go

select SalesOrderID, OrderDate, PurchaseOrderNumber, SubTotal
from #salesHeader
union all
select SalesOrderID, OrderDate, PurchaseOrderNumber, SubTotal
from #salesHeader1
order by salesOrderID
go

drop table #salesHeader, #salesHeader1
go


Now we have a much nicer plan, the Sort has gone, we now have a Merge Concatonate, the query cost has fallen to 1, and we no longer have a memory grant. All because we used UNIQUE when defining our clustered index.
Comments:
NB: Comments will only appear once they have been moderated.