Better Row Estimates with Table Variables
Published: 5th November 2014
In this episode of “Fun with Flags” we’ll be looking at TF 2453.

Having recently been playing with SQL Server 2014, I was intrigued to see the following in the list of fixes included in Cumulative Update 3 (note that this is also included in SQL Server 2012 Service Pack 2)…

“FIX: Poor performance when you use table variables in SQL Server 2012 or SQL Server 2014”. As I’m not a complete fan of table variables (they’re good, but only for limited uses), this caught my attention.

Reading further, it turns out that what Microsoft have done is provide a trace flag which, when enabled, tells SQL Server to obtain a more accurate row estimate for the table variable before creating an execution plan.

This addresses a major flaw in table variables, so I thought I would have a play.

We’re all well aware that SQL Server always provides an estimated row count of 1 for a table variable, no matter how many rows it actually holds.

This can be overridden with OPTION (RECOMPILE), but I’m not a fan of that due to the lack of plan re-use and the increased resource overhead.

Because of this I have always used table variables sparingly, generally when I know I’m going to have a very small dataset and can therefore be confident that an estimate of 1 row won’t cause too much mayhem in my plan.

So what happens with this new trace flag? Let’s have a look:

Here’s the good old table variable performing atrociously with a large dataset as we’ve come to expect…

declare @temp table
(
    salesOrderID int
)

insert into @temp
select SalesOrderID
from sales.SalesOrderHeader

select t.salesOrderID, sum(d.OrderQty) totalOrderQty
from @temp t
join sales.SalesOrderDetail d
    on t.salesOrderID = d.SalesOrderID
group by t.salesOrderID

As you can see, this has done what we expected… SQL Server has presumed that we’re only getting 1 row from our table variable and has therefore chosen a Nested Loops join, meaning we’ve encountered nearly 100,000 reads.
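If you want to reproduce that read count yourself, you can wrap the query in SET STATISTICS IO and check the Messages tab; the “logical reads” figure for SalesOrderDetail reflects one seek per outer row from the Nested Loops join (a sketch against the AdventureWorks sample database; your exact numbers will depend on your copy of the data):

```sql
-- Report per-table I/O for the statements that follow
set statistics io on;

declare @temp table
(
    salesOrderID int
)

insert into @temp
select SalesOrderID
from sales.SalesOrderHeader

select t.salesOrderID, sum(d.OrderQty) totalOrderQty
from @temp t
join sales.SalesOrderDetail d
    on t.salesOrderID = d.SalesOrderID
group by t.salesOrderID

set statistics io off;
-- Messages tab shows lines such as:
-- Table 'SalesOrderDetail'. Scan count ..., logical reads ...
```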

So how would we deal with this in the past? We’d force a statement recompile…

declare @temp table
(
    salesOrderID int
)

insert into @temp
select SalesOrderID
from sales.SalesOrderHeader

select t.salesOrderID, sum(d.OrderQty) totalOrderQty
from @temp t
join sales.SalesOrderDetail d
    on t.salesOrderID = d.SalesOrderID
group by t.salesOrderID
option (recompile)


This works a treat… SQL Server now knows how many rows we’re expecting and therefore we get a MUCH better plan.

However, I tend to work in VERY highly transactional environments, and you simply can’t afford the overhead of a recompile every time a statement touches a table variable.
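You can see that cost in the plan cache itself: a statement compiled with OPTION (RECOMPILE) is recompiled on every execution rather than accumulating reuse of a cached plan. A rough way to check (a sketch; the text filter is just a convenient way to find our statement) is:

```sql
-- Plans that are being reused accumulate usecounts in the cache;
-- statements running with OPTION (RECOMPILE) won't build up reuse this way.
select cp.usecounts, cp.objtype, st.text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like '%SalesOrderDetail%'
order by cp.usecounts desc;
```

Run the earlier query a handful of times with and without the hint and compare what the cache reports.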

This brings me back to our flag… Microsoft have now introduced TF 2453 which can alleviate all our pain…

dbcc traceon(2453)

declare @temp table
(
    salesOrderID int
)

insert into @temp
select SalesOrderID
from sales.SalesOrderHeader

select t.salesOrderID, sum(d.OrderQty) totalOrderQty
from @temp t
join sales.SalesOrderDetail d
    on t.salesOrderID = d.SalesOrderID
group by t.salesOrderID

dbcc traceoff(2453)


As you can see, this is ideal… there’s no recompile (if you don’t believe me, run the code yourself with Profiler on in the background) and yet SQL Server has gathered enough statistics to produce the same plan without the recompilation overhead.
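If you’d rather not fire up Profiler, an Extended Events session watching for statement recompiles does the same job (a minimal sketch; the session name is my own invention):

```sql
-- Capture statement-level recompiles into a ring buffer
create event session trackRecompiles on server
add event sqlserver.sql_statement_recompile
add target package0.ring_buffer;

alter event session trackRecompiles on server state = start;

-- ...run the trace flag example above, then inspect the ring buffer
-- target in Management Studio for any sql_statement_recompile events...

alter event session trackRecompiles on server state = stop;
drop event session trackRecompiles on server;
```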

Also note that the new trace flag cannot be used with OPTION (QUERYTRACEON) and has to be set for the whole connection. That’s not a major problem, but it is worth noting, as I much prefer statement-level to connection-level scope; others may not share that concern.
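For what it’s worth, if juggling connection-level DBCC TRACEON calls is too fiddly, the flag can also be enabled globally for the whole instance (the usual caveat applies: test carefully before doing this anywhere near production):

```sql
-- Enable TF 2453 for all connections on the instance
dbcc traceon(2453, -1);

-- Check whether the flag is active, and at what scope
dbcc tracestatus(2453, -1);

-- Turn it back off globally
dbcc traceoff(2453, -1);
```

Alternatively it can be applied as the startup parameter -T2453 so it survives a service restart.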

Now, before you get overly excited, it’s not perfect, and I would therefore still mostly use temp tables; but it does alleviate some of the headaches, meaning I would now at least consider table variables instead of immediately writing them off.

So why am I still leaning towards temp tables? It’s because the new mechanism is good for something simple like basic row counts, but it still won’t provide any actual statistics on the column values…

dbcc traceon(2453)

declare @temp table
(
    salesOrderID int primary key clustered,
    orderDate datetime
)

insert into @temp
select SalesOrderID, OrderDate
from sales.SalesOrderHeader

select t.salesOrderID, sum(d.OrderQty) totalOrderQty
from @temp t
join sales.SalesOrderDetail d
    on t.salesOrderID = d.SalesOrderID
where t.salesOrderID < 46000
group by t.salesOrderID

dbcc traceoff(2453)


This produces the following Estimated plan:

Versus this Actual plan:

We see the same pattern when using a WHERE clause against the OrderDate column as well (which I’m not going to include, but feel free to test).

However, these are the Estimated and Actuals for the same query but using a temp table instead:

As you can see, these are almost identical.

So, to emphasise: this trace flag is very useful for pure row counts, but beyond that table variables are still lacking. Still… it’s a huge step in the right direction and I might now start to use table variables a little more than before.