Query Tuning - "not in"
Published: 5th July 2013
This is something I’ve seen a lot recently and rarely has it been anywhere near the best way to achieve a result. In essence I’m talking about this type of query:

select *
from largerTable
where id not in
(
  
select largerTableID
  
from subsetTable
)


It’s obvious to see what results we’re trying to achieve (everything from the larger table where that data is not referenced in the subset table), and the syntax is perfectly acceptable. Also, the results you receive will be accurate. So where’s the problem?

Well, the problem comes in the manner in which this query is executed. Let’s set up a test example and see what happens…

if object_id('subsetTable') is not null drop table subsetTable
if object_id('largerTable') is not null drop table largerTable
go

create table largerTable
(
  
id int identity(1, 1),
  
letterA char(100) default('a'),
  
letterF char(100) default('f'),
  
letterN char(100) default('n'),
  
constraint pk_largerTable primary key clustered(id)
)
go

create table subsetTable
(
  
id int identity(1, 1),
  
largerTableID int,
  
letterB char(100) default('b'),
  
letterG char(100) default('g'),
  
letterO char(100) default('o'),
  
constraint pk_subsetTable primary key clustered(id)
)
alter table subsetTable add constraint fk_subsetLarger foreign key (largerTableID) references largerTable(id)
go

set nocount on

declare
@counter int = 1

begin transaction
   while
@counter <= 1000000
  
begin
       insert into
largerTable default values

       if
(select @counter%3) in (0, 1)
          
insert into subsetTable(largerTableID) select @counter

      
set @counter += 1
  
end
commit transaction

dbcc
freeproccache
dbcc dropcleanbuffers

select *
from largerTable
where id not in
(
  
select largerTableID
  
from subsetTable
)
option (maxdop 1)


Let’s have a quick look at the execution plan for the select (I’ve run this with maxdop 1 just to ensure parallelism doesn’t cloud the plan)…

You can clearly see that there is a problem with the Nested Loop join. In a Nested Loop join we take one row from the input table and match it with the output table. This means that we ideally want a small input table with very few rows, whereas in this case we have largerTable as the input which means that we are looping through 1 million records and looking for them in the subsetTable.

Therefore our smaller subsetTable is being scanned 1 million times. Not ideal.

Having executed the statement with Profiler running you can see that this has therefore resulted in a huge number of reads, even with such small tables…

So what’s the better approach? Well the hint is actually hidden away inside the execution plan itself…

The clustered index scan of the subsetTable tells us that it’s actually trying to find records where the largerTableID is null. Ie. Where the record does not exist in the subsetTable. So why don’t we actually take that idea and go with it… let’s join the two tables and pull out those records where the largerTableID is null?

select l.*
from largerTable l
left join subsetTable s
on l.id = s.largerTableID
where s.largerTableID is null
option (maxdop 1)


This gives us a much nicer looking execution plan…

And if we look at the associated trace, we can instantly see that although duration and CPU haven’t moved, we have cut the reads down by a third. That’s quite a good saving as we all know that IOs are something we want to keep to a minimum in our servers.

It’s also worth noting that if parallelism kicks in on your server, then the results for this example query are even more interesting. Running the first query with parallelism produces the following trace results:

Whereas running the latter, better query, produces these:

Worth noting as you can see that the savings in this example were 60%+ in disk IO for the latter query, and it out performed the maxdop 1 version, however, the “not in” query with the nested loop was atrocious when parallelism was invoked.

My advice would be to look out for these “not in” type of statements and if you find any then take a close look at them using the actual execution plan AND Profiler. The fact is that it's very rare to find a tuning tip that's correct every time and therefore you may see different results than I have (in one of my test servers the performance was the same, on the majority though, the above was consistent)... just that this is definitely something to bear in mind if you see it present in a badly performing proc.
Comments:
NB: Comments will only appear once they have been moderated.