Nested Loop Operator
Published: 19th August 2013
Join operators within SQL Server seem to be very misunderstood in general as people are forever telling me that they want a query to use this or that types of operator for the joins in their plan and they will even use hints to force these. Sadly it seems to come from a misunderstanding that certain types are better than others. This is not the case as they are all designed for a reason and each have their speciality.

In this case I’m going to cover a quick overview of the Nested Loop Join… what it is, how it works, and what it’s best suited for.

This isn’t going to be a massive Microsoft White Paper standard post (mainly because I’m nowhere near qualified to write one), but a simple overview just to get you understanding the basics as to why the optimiser will sometimes choose this over the alternatives and what it can mean to performance.

For a VERY in depth and, frankly, brilliant blog as to the in depth internals of this operator, see Craig Freedman’s blog here.

For the simple version, read on.

So, what is a Nested Loop join?

Well it’s exactly as the name implies… it’s a nested loop. Ie. It’s a loop inside a loop which is used to join records from 2 tables.

Think of it this way… we have tableA and tableB… what a nested loop does is it loops through tableA picking up a record at a time. For each record in tableA it loops through tableB looking for a match.

Hence, loop inside a loop being a nested loop. Simple, yes? At least I hope that made sense.

Here’s what one looks like in an execution plan:

So… when would we want to see one of these in our execution plans? As a rough guide (as there are numerous factors which can influence when you’d want to see this), they’re ideal for relatively small data inputs which are seeking into a larger table using an index.

Let’s create an example using AdventureWorks:

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID <= 100


Looking at the execution plan we can see the following:

Success… we’re using a Nested Loop join. But why? Well, as per the above, we’re pulling just a few records from the Sales.SalesOrderHeader table and then using the index on Sales.SalesOrderDetail to seek out the matches.

This is MUCH more efficient than any other type of join.

I’ll run this again with the other joins and include some stats:

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID = 29825
go

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
inner merge join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID = 29825
go

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
inner hash join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID = 29825
go


As you can see… for this particular type of query, the loop join is by far the best performer as it is highly selective.

For this reason you will usually see this type of join if you ever encounter the Bookmark Lookup which is simply SQL Server having got a subset of data from a non-covering nonclustered index and using that to seek out some extra columns using the clustered index of a table.

So why wouldn’t you want to use this? Well, there is obviously a tipping point at which it becomes less expensive to simply perform a table scan (sequential IO) on either table versus a large amount of random IO to pull out matching records.

We can easily show this by simply extending the criteria of our query:

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
inner loop join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID >= 29825
go

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
inner merge join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID >= 29825
go

select h.customerID, d.ProductID, d.UnitPrice
from sales.SalesOrderHeader h
inner hash join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.CustomerID >= 29825
go


Doing this means that we’re effectively joining both tables together in their entirety… therefore to perform a nested loop we would be reading all 1707 rows from the Sales.SalesOrderHeader table and scanning the entire Sales.SalesOrderDetail table for each record. That’s a huge number of reads and clearly less practical than simply scanning each table once in full and performing a HASH join. This can be easily shown in Profiler:

A couple of additional notes for this join type… it has NO memory requirement… and it relies on statistics and cardinality in order to make sure that the smallest table is on the top of the join (in the visual execution plan). Therefore if you do see this join and it’s performing badly then the optimiser has chosen it for a reason, therefore check if you’re giving the optimiser all the info it needs in order to make the best decision. This is a much better option than simply forcing the join type you want as, if the data drastically changes, this could suddenly cause huge performance issues without warning… and no-one wants that.

Without question the Nested Loop has its advantages and pitfalls as a join, but that’s the point… when it’s needed, it’s ideal and when it’s not then there are other preferred options. Again showing that there’s no “best join”… they all have their place.
Comments:
NB: Comments will only appear once they have been moderated.