PItfalls of the MERGE Statement and NULL Values
Published: 21st May 2013
We all know and love the MERGE statement (if you don’t know it, then read this first), but it does have its quirks, especially when using null values and these can be a major problem if they go undetected as they can leave huge holes in your data if you’re not careful.



As you can see… the data in the picture above is clearly incorrect, yet in a very large dataset this type of oversight can go unnoticed and have a detrimental effect to any business.

To demonstrate the problem, run the following code:

if object_id('tempDB..#name') is not null drop table #name
if object_id('tempDB..#newName') is not null drop table #newName
go

create table #name
(
  
id int not null,
  
firstName varchar(10) not null,
  
lastName varchar(15) not null,
  
hairColour varchar(10) null,
  
constraint pk_name primary key clustered(id)
)
create table #newName
(
  
id int not null,
  
firstName varchar(10) not null,
  
lastName varchar(15) not null,
  
hairColour varchar(10) null,
  
constraint pk_newName primary key clustered(id)
)
go

insert into #name values(1, 'Homer', 'Simpson', null), (2, 'Bart', 'Simpson', 'Yellow')
insert into #newName values(1, 'Homer', 'Simpson', 'Brown'), (3, 'Marge', 'Simpson', 'Blue')
;
merge #name n
using
#newName nn
on
(
  
n.id = nn.id
)
when matched and
(
  
n.hairColour != nn.hairColour
)
then
   update
   set
n.hairColour = nn.hairColour
when not matched then
   insert
(id, firstName, lastName, hairColour)
  
values(nn.id, nn.firstName, nn.lastName, nn.hairColour)
;
select *
from #name




This is clearly wrong… Homer’s hair colour wasn’t updated?

At first glance there’s nothing wrong with the code at all. However, on closer inspection we can see that the problem lies here:

when matched and
(
  
n.hairColour != nn.hairColour
)


This is because SQL cannot use null for an equality based statement. Therefore is there an easy way to cater for this? Well, yes. You COULD use “isnull” to surround your nullable columns and therefore eliminate the problem that way:

when matched and
(
  
isnull(n.hairColour, '') != isnull(nn.hairColour, '')
)


BUT this has its disadvantages as well. For example, how do you know that he value you’re using inside “isnull” is not going to trigger the equality? Eg. isnull(a.score, 0) != isnull(b.score, 0) would not update if b.score is null and a.score is 0.

Also, for this to work, you need to know the datatype of the column in question because isnull(hairColour, ‘a’) may be ideal, but isnull(score, ‘a’) will obviously fail with a datatype conversion error. Therefore this technique is fine for small column lists with known datatypes, but what if you have a table with 200 nullable columns you wish to compare? Finding the datatype of each column alongside a valid value for an isnull function suddenly becomes a daunting task.

So what’s the solution? Use “except”. I admit this makes the statement look a little more complicated and it may take a little longer to type, but it solves all problems perfectly as it deals with null values in exactly the way we require:

when matched and
(
   exists (
select n.hairColour except select nn.hairColour)
)


Therefore let’s go back to our full example and see that we get the correct answer as expected:

if object_id('tempDB..#name') is not null drop table #name
if object_id('tempDB..#newName') is not null drop table #newName
go

create table #name
(
  
id int not null,
  
firstName varchar(10) not null,
  
lastName varchar(15) not null,
  
hairColour varchar(10) null,
  
constraint pk_name primary key clustered(id)
)
create table #newName
(
  
id int not null,
  
firstName varchar(10) not null,
  
lastName varchar(15) not null,
  
hairColour varchar(10) null,
  
constraint pk_newName primary key clustered(id)
)
go

insert into #name values(1, 'Homer', 'Simpson', null), (2, 'Bart', 'Simpson', 'Yellow')
insert into #newName values(1, 'Homer', 'Simpson', 'Brown'), (3, 'Marge', 'Simpson', 'Blue')
;
merge #name n
using
#newName nn
on
(
  
n.id = nn.id
)
when matched and
(
   exists (
select n.hairColour except select nn.hairColour)
)
then
   update
   set
n.hairColour = nn.hairColour
when not matched then
   insert
(id, firstName, lastName, hairColour)
  
values(nn.id, nn.firstName, nn.lastName, nn.hairColour)
;
select *
from #name


Comments:
NB: Comments will only appear once they have been moderated.