Query Tuning - Scalar Value Functions
Published: 10th July 2013
Now, let me start by saying that if you have a scalar function which includes table access then you REALLY need to read this as they are bad... so, so bad that I never want you to use them. Seriously. These are horrible, horrible, horrible things when used in queries… please just abandon them now… there are better ways to achieve results.

This is a pet peeve of mine and I also know it’s the same with most DBAs who ever encounter this. In fact, not that I want to you go elsewhere, but a quick search on Google for “T-SQL Scalar Value Functions are evil” will return a plethora of results showing that it’s not just me who hates these abominations of the T-SQL language.

Here’s a fun example of one that I’ll use for this demo… it makes my blood run cold just to see it, but here we go:

select dbo.fn_fullName(personID) as fullName, accountNumber
from sales.Customer
where dbo.fn_fullName(personID) is not null


You can probably see that this is using the AdventureWorks database as I needed some good data with which to demo this, so if you want to play along then get yourself a copy (it’s free).

Right, so, why is this bad? Well, the problem with functions in a query like this is that the function is called against every single row in the table… therefore if you have 1000 rows, this function is called 1000 times. And most of these functions aren’t nice… they reference tables, not just perform simple concatenations (although they’re also bad, just less so). Therefore you could be hitting another table 1000 times just to return your records.

To demonstrate this, create the following function in AdventureWorks:

create function dbo.fn_fullName(@personID int)
returns varchar(100)
as
begin
   return
  
(
      
select FirstName + ' ' + LastName as fullName
      
from person.Person
      
where BusinessEntityID = @personID
  
)
end


Now, let’s turn on Profiler and run the following query:

select dbo.fn_fullName(personID) as fullName, accountNumber
from sales.Customer
where dbo.fn_fullName(personID) is not null


The results from Profiler show just how bad this query can be:

As you can see, profiler is firing the function for every single row in the table. Each of these requires just a couple of reads from the Person.Person table in the database, but that soon adds up. Looking at the totals at the end of the batch, we can see that it used 114,841 reads. That’s a huge number.

If you were using a simple, non-table accessing, scalar function, then I would recommend just putting the raw code into the query and not calling a function, however in this case we’re referencing tables and I would rather not go adding a load of joins into my query just to make it work. But we don’t need to as there’s a nifty little tactic that will allow us to still use our function, but with minimal cost.

Yes, we turn it into a table valued function instead:

drop function dbo.fn_fullName
go

create function dbo.fn_fullName(@personID int)
returns table
as
   return
  
(
      
select FirstName + ' ' + LastName as fullName
      
from person.Person
      
where BusinessEntityID = @personID
  
)
go


Now, to include it in our query, we need to adjust the code to be a CROSS APPLY rather than to use the function in-line:

select f.fullName, accountNumber
from sales.Customer s
cross apply dbo.fn_fullName(personID) f
where f.fullName is not null


Now when we execute this, despite it being called for each record in the table, it does not have anywhere near the number of reads, cpu, or duration of the first usage:

This is a massive improvement. 100,000 reads down to just 200!!! Therefore this is something I would ALWAYS recommend over a scalar function. In a large scale environment this can cut minutes off a query execution time and save your disks from spinning themselves into a frenzy.

If this wasn’t enough, let’s also see what other tweaks can be applied to this type of situation?

Well, this is the execution plan for the scalar function:

Not much we can do about this really… the only table access recorded is that of the Sales.Customer table and that’s always going to require a table scan. So let’s have a look at the table valued function:

Wait a minute… there’s an index recommendation? Now, I’m not saying always follow the examples given to you by SQL Server as there are a LOT of caveats surrounding their suggestions, but in this case I think we should give it a go:

create nonclustered index ix_personID
on sales.Customer (personID) include (accountNumber)


Now re-run our query:

The reads may not have changed much, but the duration and cpu have dropped into insignificance. Overall we now have a duration saving of 99.98%, a CPU saving of 99.98%, and an IO saving of 99.998%.

Anyone still need convincing of the benefits?
Comments:
NB: Comments will only appear once they have been moderated.