Execution Plans and XML Documents
Published: 11th October 2014
Over the past few months I’ve been working with a lot of XML and stored procedures. It’s been a different experience, and not one I’d always refer to as “fun”.

But I have learnt a couple of things along the way which I’ve found interesting… mostly involving XML Documents and the Optimizer.

As it transpires, if you’re using sp_xml_preparedocument to handle your XML then you need to be very wary of how the Optimizer interprets it, as it can cause some very nasty-looking execution plans to appear.

Let’s do a very simple and very quick example and see what happens. It’s simplistic and meaningless in itself, but it highlights the point I want to make…

-- Define some very basic XML, this is just a very simple example for demo purposes
declare @xml xml =
'<notes>
  <note id="1" name="Invoice Correction" noteDesc="Anyone ordering more than 3 items had a discount missed">
  </note>
</notes>'

-- Parse the document and grab a handle to it
declare @iHandle int
exec sp_xml_preparedocument @iHandle output, @xml

-- Join the shredded XML directly against the sales tables
select h.SalesOrderID, d.OrderQty, d.ProductID, d.LineTotal, d.ModifiedDate, n.name, n.noteDesc
into #details
from sales.SalesOrderHeader h
cross join OpenXml(@iHandle, '/notes/note', 2)
with (id int '@id', name varchar(25) '@name', noteDesc varchar(250) '@noteDesc') n
join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.OrderDate < '2008-01-01'

-- Release the document handle
exec sp_xml_removedocument @iHandle


In essence there’s nothing wrong with that query… it’s fine… so let’s have a look at the execution plan…
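As an aside, if you want to follow along and capture the plan yourself, you can either turn on “Include Actual Execution Plan” in Management Studio before running the batch, or wrap it in SET STATISTICS XML, which returns the actual plan as XML alongside the results:

-- Returns the actual execution plan as XML for everything run while it's on
set statistics xml on

-- ... run the OpenXml batch from above here ...

set statistics xml off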

Okay, there’s a Clustered Index Scan that would need dealing with, but again… it doesn’t look TOO strange a plan.

However, if you look closely you’ll see that the inputs for the Hash Join are the wrong way around. This would imply to me that SQL Server expected more rows coming from the SalesOrderHeader direction. So let’s have a look at the estimates…

And the estimated plan…

Ever get the feeling there’s something horribly wrong?

So what’s happened? Well, SQL Server presumes that there are ALWAYS 10,000 records coming from an XML document opened this way, regardless of what the document actually contains. That fixed guess is what produced the truly horrible estimates above.
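You can see that fixed guess in isolation with a minimal sketch like the one below (my own throwaway XML, not the demo data): select straight from OpenXml with the actual plan captured, and the operator reports an estimate of 10,000 rows even though the document contains exactly one note.

declare @xml xml = '<notes><note id="1" /></notes>'  -- a single actual row
declare @iHandle int
exec sp_xml_preparedocument @iHandle output, @xml

-- With the actual plan captured, this operator shows an
-- Estimated Number of Rows of 10000 against an Actual of 1
select id
from OpenXml(@iHandle, '/notes/note', 2)
with (id int '@id')

exec sp_xml_removedocument @iHandle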

This also played havoc with the Memory Grant of the query…
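If you’d rather see the grant itself than read it off the plan, the memory grants DMV will show it while the query is executing (query it from a second session):

-- Lists queries holding or waiting on memory grants; for the query
-- above, requested_memory_kb is far larger than the work justifies
select session_id, requested_memory_kb, granted_memory_kb, ideal_memory_kb
from sys.dm_exec_query_memory_grants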

Therefore we need to bring this under control, as our concurrency will go flying out the window if we allow queries to run like this.

So what we need to do is shred the XML into a temp table first and join to that instead… the problem instantly goes away…

-- Define some very basic XML, this is just a very simple example for demo purposes
declare @xml xml =
'<notes>
  <note id="1" name="Invoice Correction" noteDesc="Anyone ordering more than 3 items had a discount missed">
  </note>
</notes>'

declare @iHandle int
exec sp_xml_preparedocument @iHandle output, @xml

-- Shred the XML into a temp table first, so the join below sees a
-- real row count instead of the Optimizer's fixed guess
select *
into #note
from OpenXml(@iHandle, '/notes/note', 2)
with (id int '@id', name varchar(25) '@name', noteDesc varchar(250) '@noteDesc')

exec sp_xml_removedocument @iHandle

-- Now join against the temp table rather than OpenXml directly
select h.SalesOrderID, d.OrderQty, d.ProductID, d.LineTotal, d.ModifiedDate, n.name, n.noteDesc
into #details
from sales.SalesOrderHeader h
cross join #note n
join sales.SalesOrderDetail d
on h.SalesOrderID = d.SalesOrderID
where h.OrderDate < '2008-01-01'


That’s MUCH better. We now have hardly any memory grant, our estimates are good, we have the correct inputs going into our join and everyone is happy.

Again, we could probably benefit from an index on SalesOrderDetail, but for the purpose of demonstrating this problem I didn’t deem it necessary.
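For completeness, the kind of index I have in mind is sketched below; the key and included columns are an assumption lifted straight from the demo query, so treat it as a starting point rather than a recommendation:

-- Hypothetical covering index for the demo query only; column
-- choice is an assumption based on the select list and join above
create nonclustered index IX_SalesOrderDetail_Demo
on sales.SalesOrderDetail (SalesOrderID)
include (OrderQty, ProductID, LineTotal, ModifiedDate)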

Therefore please be a little careful when using XML with sp_xml_preparedocument within your code.