Vertical Partitioning To Reduce Reads
Published: 26th January 2014
Directly following on from my posts about Reads and the 8k Page, vertical partitioning your tables can suddenly make a lot more sense, and it’s quick and easy to see how vast improvements can be made to the speed of your queries and their impact on your disks.

The concept itself is incredibly basic… you literally take one wide table and break it up into “regularly returned columns” and “infrequently returned columns”. This sounds unnecessary, but if you’ve read my reads and 8k page posts your cogs should already be turning, and you’ll start to see the implications this change can have.

So… first things first, let’s create a simple table to use in our test:

set nocount on

if object_id('testProductFull') is not null drop table testProductFull
go

create table testProductFull
(
    id int identity(1, 1) not null,
    colourID tinyint,
    weightID smallint,
    shapeID tinyint,
    makeID smallint,
    modelID smallint,
    priceRangeID smallint,
    prodDescription char(970)
)
go


As you can see, I’ve designed this so that each row takes up roughly 1,000 bytes (4 + 1 + 2 + 1 + 2 + 2 + 2 + 970 = 984 bytes of column data, plus a few bytes of row overhead). Therefore we know that we can fit 8 records per 8k page.

To ensure this I’m going to create a clustered index on the table and specify a fillfactor of 100 so that we fit the maximum number of records on each page:

create unique clustered index ix_testProductFull on testProductFull(id) with (fillfactor = 100)
go


What I’m now going to do is populate the table with 8,000 records of random data, using a random number generator to limit the values which can go into each column (for example, colourID can only contain the values 0, 1, and 2). This will ensure a decent sized data set for some basic queries and also ensure exactly 1,000 data pages are used:

insert into testProductFull
select abs(convert(int, convert(varbinary, newID()))) % 3 colourID,
       abs(convert(int, convert(varbinary, newID()))) % 6 weightID,
       abs(convert(int, convert(varbinary, newID()))) % 4 shapeID,
       abs(convert(int, convert(varbinary, newID()))) % 7 makeID,
       abs(convert(int, convert(varbinary, newID()))) % 3 modelID,
       abs(convert(int, convert(varbinary, newID()))) % 8 priceRangeID,
       'a'
go 8000

select object_name(i.object_id), a.data_pages
from sys.indexes i
join sys.partitions p
on i.object_id = p.object_id
and i.index_id = p.index_id
join sys.allocation_units a
on p.partition_id = a.container_id
where i.name = 'ix_testProductFull'
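
If you want to double-check the record size estimate as well, sys.dm_db_index_physical_stats in detailed mode reports the leaf-level averages… we’d expect figures close to 984 bytes per record and 1,000 pages:

select avg_record_size_in_bytes, page_count
from sys.dm_db_index_physical_stats(db_id(), object_id('testProductFull'), 1, null, 'detailed')
where index_level = 0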


We’re now set up in a way I’ve seen regularly in companies doing a lot of web-based searching. For example, someone selling bikes might let you select the colour, make, model etc. from a series of drop-down boxes, and then results appear on screen.

What often happens is that all of the data, description included, is loaded into the results page up front, and when someone then selects a bike a separate window opens on the webpage showing a more detailed description of the product.

On the face of it, then, the wide table above seems a good way to set up your database schema.

However, in terms of disk IO and reads, vertical partitioning may well be the way forward.

To explain, we’ll run the following queries and look at the resulting output:

set statistics io on

select colourID, weightID, shapeID, makeID, modelID, priceRangeID, prodDescription
from testProductFull
where colourID = 2

select colourID, weightID, shapeID, makeID, modelID, priceRangeID, prodDescription
from testProductFull
where makeID = 5


As you can see, just to get some results back we’ve performed roughly 1,000 logical reads in each case… that’s every data page in the table, just so that when someone clicks “details” we already have the description on hand.

This is a perfect example of where vertical partitioning can help.

What we’re now going to do is split the data into “regularly queried” and “rarely queried” columns. Basically, we’re going to leave all the IDs in one table and split the description out into a separate table.

This means that to achieve our website functionality we’re going to need to make two calls to the database… one to return the initial results, then a second to return the description of the selected item.

However, what you’ll see is that even though we’re making an extra database call, the overhead on the disk subsystem is massively reduced: the ID rows are now only around 20 bytes wide including overhead, so hundreds of them fit on each 8k page and the whole search table shrinks to a couple of dozen pages.

So let’s split up our table:

if object_id('testProductFullIDs') is not null drop table testProductFullIDs
if object_id('testProductFullDesc') is not null drop table testProductFullDesc
go

create table testProductFullIDs
(
    id int identity(1, 1) not null,
    colourID tinyint,
    weightID smallint,
    shapeID tinyint,
    makeID smallint,
    modelID smallint,
    priceRangeID smallint
)
go

create unique clustered index ix_testProductFullIDs on testProductFullIDs(id) with (fillfactor = 100)
go

create table testProductFullDesc
(
    id int not null,
    prodDescription char(970)
)
go

create unique clustered index ix_testProductFullDesc on testProductFullDesc(id) with (fillfactor = 100)
go

alter table testProductFullDesc add constraint fk_testProductFullDescIDs foreign key (id) references testProductFullIDs(id)
go

set identity_insert testProductFullIDs on
insert into testProductFullIDs
(
    id, colourID, weightID, shapeID, makeID, modelID, priceRangeID
)
select id, colourID, weightID, shapeID, makeID, modelID, priceRangeID
from testProductFull
set identity_insert testProductFullIDs off
go

insert into testProductFullDesc
select id, prodDescription
from testProductFull
go
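
Before we drop anything, it’s worth a quick sanity check (not strictly necessary, just reassuring) that both new tables now hold all 8,000 rows:

select (select count(*) from testProductFullIDs) idRows,
       (select count(*) from testProductFullDesc) descRows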


Okay, now, in order not to have to change our website code, we can drop our original table and add a view with the same name over the top of these two tables:

if object_id('testProductFull') is not null drop table testProductFull
go

create view testProductFull
as
    select p.id, colourID, weightID, shapeID, makeID, modelID, priceRangeID, prodDescription
    from testProductFullIDs p
    left join testProductFullDesc d
        on p.id = d.id
go


This means that any code reading from the old wide table will not have to be changed. Any code that updates or inserts data will, but that is usually easier to change. It could also be dealt with using indexed views, but I’m not going to go into that here.
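
As a rough sketch of what the amended write path might look like (the procedure name and parameters here are hypothetical, not part of the schema above), an insert now writes to both tables inside one transaction, using scope_identity() to link the rows:

create procedure addProduct
    @colourID tinyint,
    @weightID smallint,
    @shapeID tinyint,
    @makeID smallint,
    @modelID smallint,
    @priceRangeID smallint,
    @prodDescription char(970)
as
begin
    set nocount on

    begin transaction

    -- parent row first, so the identity value exists for the foreign key
    insert into testProductFullIDs (colourID, weightID, shapeID, makeID, modelID, priceRangeID)
    values (@colourID, @weightID, @shapeID, @makeID, @modelID, @priceRangeID)

    -- child row keyed on the identity value we just generated
    insert into testProductFullDesc (id, prodDescription)
    values (scope_identity(), @prodDescription)

    commit transaction
end
go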

Anyway, we’ll now run the same queries as above, but this time in two parts… first we’ll obtain the main search results, then we’ll make a second database call to obtain the description of the item selected:

set statistics io on

select id, colourID, weightID, shapeID, makeID, modelID, priceRangeID
from testProductFull
where colourID = 2

select prodDescription
from testProductFull
where id = 37

select id, colourID, weightID, shapeID, makeID, modelID, priceRangeID
from testProductFull
where makeID = 5

select prodDescription
from testProductFull
where id = 251


As you can see, even though we’re actually making more calls to the database, the number of reads involved has dropped from a total of roughly 2,000 using the wide table to a total of around 60.
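
To see where that saving comes from, you can point the earlier allocation query at the two new indexes… the narrow ID table should come back at only a couple of dozen data pages, while the description table still holds the bulky rows but is now only ever touched via single-row seeks:

select object_name(i.object_id), a.data_pages
from sys.indexes i
join sys.partitions p
on i.object_id = p.object_id
and i.index_id = p.index_id
join sys.allocation_units a
on p.partition_id = a.container_id
where i.name in ('ix_testProductFullIDs', 'ix_testProductFullDesc')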

This is a simple example using just 8,000 records in a fairly thin table… imagine what a difference this could make to your IO subsystem when you scale it up to millions of records across a much wider table. The savings could potentially be huge, which makes this something you should definitely consider.