How to Use Partitioning to Make kCura Relativity Faster

kCura Relativity is an e-discovery program used by law firms to find evidence quickly. I’ve blogged about performance tuning Relativity, and today I’m going to go a little deeper to explain why DBAs have to be aware of Relativity database contents.

In Relativity, every workspace (case) lives in its own SQL Server database. That one database houses:

  • Document metadata – where the document was found, what type of document it is
  • Extracted text from each document – the content of emails, spreadsheets, files
  • Document tagging and highlighting – things the lawyers discovered about the documents and noted for later review
  • Workspace configuration – permissions data about who’s allowed to see what documents
  • Auditing trails – who’s searched for what terms, what documents they’ve looked at, and what changes they made

For performance tuners like me, that last one is kinda interesting. I totally understand that we have to capture every activity in Relativity and log it to a table, but log-sourced data has different performance and recoverability requirements than other e-discovery data.

The AuditRecord table is append-only. We don’t go back and modify AuditRecord data – we just constantly add to it, tacking on data at the end of the table. This means it has different backup requirements – I certainly don’t want to be doing a full backup on this table every day, repeatedly backing up the exact same data over and over and over when it will never change.

It will never get deleted. We have to keep this data throughout the life of the case because the client may need to go back through time to see who did what during the case. This means it will keep growing and growing, making backups tougher over time.

In big cases, it can dominate the database. I’ve seen instances where the AuditRecord table consumed more than half of the database size – meaning in a 1TB database, 500GB of the contents are AuditRecord. This means backups will take twice as long.

In the event of a disaster, I don’t need it right away. In our 1TB workspace example, I would prefer to restore the 500GB workspace data first, let the lawyers in to do document review, and then take my time restoring a separate 500GB AuditRecordArchive database.

Splitting AuditRecord Into an Archive Database

In order to pull this off, I need two separate databases:

  1. Workspace database – the normal EDDS12345 database for workspace 12345. Inside this database, I have all the normal Relativity tables, but only a small AuditRecord table with the current audits – say the last 7 days. As people review documents, I’d log that data into this AuditRecord table.
  2. AuditRecordArchive database – say, EDDS12345_AuditRecordArchive. In here, I have one AuditRecordArchive table that has all of the AuditRecord data more than 1 week old.

Once I’ve set this up, then I need a job that sweeps the EDDS12345.AuditRecord data into the archive database once per week. Immediately after that job finishes, then I do a full backup and DBCC of EDDS12345_AuditRecordArchive – and then I don’t need to back it up again until the next sweep.

If I want to get really fancy, I don’t do daily DBCCs or index maintenance against that AuditRecordArchive database either. If anything goes wrong with it, like database corruption, I just restore to last week’s full backup and I’m off to the races. This means less downtime for database maintenance.

Great News! kCura Supports This

It’s called “partitioning the AuditRecord table”, and your kCura support contacts can walk you through it. It doesn’t involve SQL Server partitioned tables at all – they just call it partitioning because it’s the same basic concept, only done with application-level code.

However, I don’t recommend doing this by default across all your databases. This technique is going to instantly double the number of databases you have and make your management much more complex. However, I do recommend reviewing your largest workspaces to see if AuditRecord is consuming half or more of the database space. If so, consider partitioning their AuditRecord tables to get faster backups, database maintenance jobs, and restores.

At the risk of sounding like a fanboy, this is one of the reasons I love working with the kCura folks. They really care about database performance, they take suggestions like this, and they implement it in a way that makes a real difference for customers.

This is also why database administrators need to:

  1. Understand the real business purpose of the biggest tables in their databases
  2. Build working, productive relationships with their software vendors
  3. Come up with creative approaches to ease SQL Server pains
  4. Help the vendors implement these approaches in software

Back Up Transaction Logs Every Minute. Yes, Really.

Right now, your transaction log backups are set up to run every 15 minutes, every hour, or maybe every few hours.

It’s time for a reality check. Figure out what internal customer owns the data inside the database, and walk into their office. Ask them how much data they’re comfortable losing.

You’re probably going to have to repeat the question a couple of times because they won’t believe these words are coming out of your mouth. After they eventually recover, they’ll blurt out words like “NONE” and “ZERO” and “ZOMG”. Give them a quick overview of how much it costs to have a no-data-loss system – typically in the millions of dollars – and then ask for another answer.

They’re going to ask for a list of options and their costs. Forget the fine-grained details for now – let’s just assume one of your solutions involves transaction log backups.

The big hand is on the network cable, and the little hand is on the power cable.

The big hand is on the network cable, and the little hand is on the power cable.

Is it more expensive to run your log backups:

  • A – Every hour
  • B – Every 15 minutes
  • C – Every 5 minutes
  • D – Every minute

It’s a trick question – they all cost the same.


Nope – it actually means less overhead. To keep the numbers easy, let’s say you generate 60GB of transaction log data per hour. Would you rather back up 1GB per minute in small chunks, or kick the server in the junk every hour trying to back up 60GB all at once? The users would definitely notice the latter – the server will slow to a crawl while it handles that firehose of log data for a sustained period.


Yes, but if you think you’re really going to manually restore log backups through the GUI, one at a time, you haven’t been through a real mission-critical crash. Long before disaster strikes, you need to be familiar with a script-based method to automatically restore all of the log files in a directory. Test it, get comfortable with it, and then when the outage hits, you can let your script restore everything ASAP, no clicks required.


If you have too many databases and your jobs can’t keep up, it’s time to start breaking up the log backups into multiple jobs. Some of my clients use two jobs, for example – one to back up all odd-numbered databases by the database ID in sys.databases, and another one to back up all the evens. The log backups don’t stress out their servers by any means, and users don’t notice the impact of two simultaneous log backups in two different databases.


I’m just as serious as you were when you randomly picked 15 minutes or 1 hour as the log backup schedule. The point is that it’s not for the geeks to decide – it’s for the customer to decide. Ask your business user about the business needs for their data, and then give them a menu of options to meet that goal.

You think they’re all going to insist that all data is mission-critical, but you’d be surprised. Once they see the costs involved with log shipping, AlwaysOn Availability Groups, SAN replication, and so forth, they’re able to make better informed decisions about which data really is priceless, and which data we could really re-enter at minimal cost.

But until you ask them this question, they assume you’re never going to lose data.

Alaska has a SQL Server User Group!

Alaska is on the PASS map!

Alaska is on the PASS map!

I’m really stoked to share the news: Alaska (my home state) finally has a PASS Chapter of its own! The group just got started last December, and officially welcomed into the PASS organization at the end of January. While they don’t have a Facebook or Twitter account yet, they do have a website and a YouTube channel.

The group meets every month and draws about twenty attendees, according to Chapter Leader Greg Burns. (Greg also runs the local SharePoint user group, which has about eighty members.) The audience is a mix of DBAs and developers, mostly.

Curious. Why would I mention the audience?

Because Greg is running a PASS Chapter for the first time, he could use a lot of help. He’s looking for speakers — remote or in-person — to present at upcoming meetings. If you’re interested in presenting to the group remotely, or just looking for an excuse to visit by far the largest state in the union…[prolonged eye contact with Texas]…just drop Greg a line at AlaskaSQL(at)

But wait, there’s more! If you’re a current or former PASS Chapter leader, you probably have some great tips on how to structure meetings, build membership, advertise your group, line up sponsors, and other things it takes to grow a user group. Rather than flood Greg’s inbox with your collective wisdom, let’s assemble them here in the comments so they’re all in one place. I can think of no better way to welcome Alaska to the SQL Server community than to show them how much we help each other.

Extended Events for Tracking Code Improvements (video)

Ever wanted to prove that your code or index changes had a positive impact? In this 30 minute video, Kendra shows you how to use Extended Events to measure a production workload– and how you can aggregate the results to prove your changes made a difference.

Want to skip straight to the demos? Here’s a quick timeline:

  • Demo 1: How I generate a fake workload – 2:45
  • Demo 2: Taking an initial benchmark with Extended Events – 4:45
  • Demo 3: Aggregating the results of the Extended Events Trace -10:52
  • Demo 4: Taking a second benchmark with Extended Events and aggregating the results – 18:05

Want a sample of the type of Extended Events trace run in the demo? Here’s a sample script. This isn’t any rocket science on my part: I generated the script with the Extended Events wizard and then told it to script it out. (Try it yourself!)

--These scripts were generated from SQL Server 2012
--Management Studio Extended Events Wizard
--Customization is primarily just formatting for readability

CREATE EVENT SESSION [Production Perf Sample-Before] ON SERVER 

ADD EVENT sqlserver.sp_statement_completed(SET collect_object_name=(1)
    WHERE ((([package0].[divides_by_uint64]([sqlserver].[session_id],(5)))
	AND ([package0].[greater_than_uint64]([sqlserver].[database_id],(4))))
	AND ([package0].[equal_boolean]([sqlserver].[is_system],(0)))
	AND ([sqlserver].[query_hash]<>(0))

ADD EVENT sqlserver.sql_statement_completed(
    WHERE ((([package0].[divides_by_uint64]([sqlserver].[session_id],(5)))
	AND ([package0].[greater_than_uint64]([sqlserver].[database_id],(4))))
	AND ([package0].[equal_boolean]([sqlserver].[is_system],(0)))
	AND ([sqlserver].[query_hash]<>(0))

ADD TARGET package0.event_file
	(SET filename=N'c:\Xevents\Traces\Production Perf Sample-Before')



The Evolution of SQL Server Instrumentation

One of my favorite things about SQL Server is the instrumentation. It’s incredibly good at giving you information about what’s going on in the database engine and query optimizer.

Want to know why your server is slow? Need to know what made it go offline? You have a lot of tools available.

This hasn’t always been the case. One of the toughest things about the old SQL Server 2000 is the lack of instrumentation. The tools that help you improve SQL Server have all been added in the last 10 years.


The cool thing is that the instrumentation gets better and better with each version of SQL Server– new features get new Dynamic Management Views and functions. And sometimes we get a whole new feature to help us get insight into SQL Server.

If you’ve still got SQL Server 2000 instances where performance and availability is important, explain to your management how limited the tools are to diagnose and repair issues. SQL Server’s come an awful long way!

San Diego Class Recap

We had a lot of fun this month teaching SQL Server classes at the San Diego Harbor Hilton. (We wrote about how we picked the cities earlier.) Even when we teach, we learn, and then we share what we learned, so here we are.

San Diego Harbor Hilton

San Diego Harbor Hilton

We like to try new things with our products and services, and this year’s classes brought a few firsts for us. Let’s recap what’s new and how it worked.

We tried new class titles and module groupings. This year, we’re running a 2-day class on How to Be a Senior DBA and a 3-day class on SQL Server Performance Troubleshooting. We wanted to find out if we’d gotten the module mixes right.

To learn, we surveyed the bejeezus out of the attendees at the start. How did they rate their skills at each module before we started, and which modules were they looking forward to the most? Then at lunchtime and end-of-day, we gave them more surveys to find out if they would recommend each module to others.

We learned right from the first day – it was obvious from the looking-forward-to question that the attendees weren’t interested in one particular module, so we gave them the option to switch it out for another module instead. That went really well, and we’ll adapt the Chicago and Philadelphia itineraries to include that change by default.

Be Your Neighbor's Consultant

Be Your Neighbor’s Consultant

We added a lot of attendee interaction. One of the best ways to learn something is to teach it to someone else. In the high availability architecture module, we ran a 5-minute portion where you had to describe your app’s RPO/RTO needs to the person sitting next to you, and they had to recommend the right HA/DR infrastructure.

Great side effect: this got the attendees out of their shell right from the start! Before the very first bio break, each attendee knew at least one other person’s SQL Server challenges and felt comfortable talking about it.

Kendra Discusses the Outage Post-Mortem

Kendra Discusses the Outage Post-Mortem

We added optional after-hours group lab exercises. For example, on the first night of the How to Be a Senior DBA class, we broke attendees up into teams and handed them a production outage to fix. We watched via the projector as each team worked on their servers.

After the outage finished, the team conducted a post-mortem meeting just as they would (maybe?) in real-life, explaining what caused the outage, how they fixed it, and how they would ensure it didn’t happen again. Prizes went to the best solvers and best explainers.

Jeremiah Teaching

Jeremiah Teaching

We used attendee questions to improve the training. While the training takes place, one of us sits in the back of the room and transcribes every attendee question.

Now that the training is over, we’ll revisit the slides and demos to see where we can build those answers right into the training to keep making each delivery better. (We do this at our conference sessions, user group sessions, and webcasts too.)

Granted, we can’t just add material like crazy – otherwise we’d run way long every day – but sometimes it makes sense to take out portions of training material that isn’t working well, add in other portions, or reorder slides to introduce answers in a way that makes more sense for attendees. We have some slide decks we’ve been working on for several years, and gradually improving this way the whole time.

Brent Teaching with Hardware

Brent Teaching with Hardware

We added hands-on hardware for illustration. One of my favorite teaching icons says that if you want to teach students an abstract topic, give them something to put their hands on. I brought motherboards, memory, hard drives, and cables in to teach about how wait stats work, and as various queries ran onscreen, we talked about the physical holdup.

This class was the first time I’d tried this approach in person, and this is a great example of a session I’ll need to tweak over time. The time budgeting failed miserably – I didn’t get to go as deep as I wanted because I tried to cover too many things in too short of time. Now the fun part starts – I get to revisit the presentation, the attendee questions, and the subject matter, and figure out how I can impart maximum learning in 60 minutes.

Kendra Teaching

Kendra Teaching

We added prerequisite training videos and resources. To make sure everybody was on the same page when class started, each attendee got homework when they enrolled. They got a login for our training video system that gave them access to a group of videos we picked just for them.

When the in-person class started, their training login suddenly had access to all of the scripts and resources for the class, too. We use that same delivery system to handle all of our resources, and we love how it works.

We even extended it based on what we learned in the first class. One of the Senior DBA students suggested that we compile a list of learning resources mentioned during the class, build a PDF of the links, and add that PDF to the training site, too. Great idea! Plus, since we’ll be teaching these same classes in Chicago and Philadelphia later this year, we can even start with that same resource list because many of them will be applicable.

San Diego (the Hilton is on the left)

San Diego (the Hilton is on the left)

See, we learn a lot even when we’re teaching. We have such a great time with this, and we can’t wait to keep sharing what we learn. If it sounds like fun, join us in Chicago or Philly later this year.

How to count the number of rows in a table in SQL Server

“How many rows exist in a table?”

It seems like such an innocent request. It isn’t too hard to get this information out of SQL Server. But before you open SSMS and whip out a quick query, understand that there are multiple methods to get this information out of SQL Server – and none of them are perfect!

COUNT(*) or COUNT(1)

The seemingly obvious way to get the count of rows from the table is to use the COUNT function. There are two common ways to do this – COUNT(*) and COUNT(1). Let’s look at COUNT(*) first.

FROM dbo.bigTransactionHistory;

The STATISTICS IO output of this query shows that SQL Server is doing a lot of work! Over 100,000 logical reads, physical reads, and even read-ahead reads need to be done to satisfy this query.

(1 row(s) affected)
Table 'bigTransactionHistory'. Scan count 5, logical reads 132534, 
physical reads 3, read-ahead reads 131834, lob logical reads 0, 
lob physical reads 0, lob read-ahead reads 0.

Looking at the execution plan, we can see an Index Scan returning over 31 million rows. This means that SQL Server is reading every row in the index, then aggregating and counting the value – finally ending up with our result set. The cost of this query? 123.910000.

count 1

The query results: 31,263,601 rows.

count 2

Now, let’s look at the behavior of COUNT(1).

FROM dbo.bigTransactionHistory;

We can see from STATISTICS IO that we have a large number of logical reads – over 100,000.

(1 row(s) affected)
Table 'bigTransactionHistory'. Scan count 5, logical reads 132531, 
physical reads 3, read-ahead reads 131849, lob logical reads 0, 
lob physical reads 0, lob read-ahead reads 0.

The execution plan again shows an index scan returning over 31 million rows for processing. The query cost is the same, 123.910000.

count 3

The results here are the same – 31,263,601 rows.

count 4

The benefit of using COUNT is that it is an accurate indicator of exactly how many rows exist in the table at the time query processing begins. However, as the table is scanned, locks are being held. This means that other queries that need to access this table have to wait in line. This might be acceptable on an occasional basis, but I frequently see applications issuing these types of queries hundreds or thousands of times per minute.

sys.tables + sys.indexes + sys.partitions

We can join several SQL Server catalog views to count the rows in a table or index, also. sys.tables will return objects that are user-defined tables; sys.indexes returns a row for each index of the table; and sys.partitions returns a row for each partition in the table or index. I am going to query for the table ID, name, and count of rows in all partitions.

DECLARE @TableName sysname
SET @TableName = 'bigTransactionHistory'

SELECT TBL.object_id,, SUM(PART.rows) AS rows
FROM sys.tables TBL
INNER JOIN sys.partitions PART ON TBL.object_id = PART.object_id
INNER JOIN sys.indexes IDX ON PART.object_id = IDX.object_id
AND PART.index_id = IDX.index_id
WHERE = @TableName
AND IDX.index_id < 2
GROUP BY TBL.object_id,;

The output of STATISTICS IO here shows far fewer reads – 15 logical reads total.

(1 row(s) affected)
Table 'syssingleobjrefs'. Scan count 1, logical reads 2, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysidxstats'. Scan count 2, logical reads 4, physical reads 
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 5, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 4, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The execution plan is more complex, but much less work – the query cost here is 0.0341384.

count 5

The results of the query are also the same – 31,263,301.

count 6

The benefits of using this method are that the query is much more efficient, and it doesn’t lock the table you need the count of rows for.

However, you need to be cautious when counting the rows on a table that is frequently inserted into or deleted from. The TechNet documentation for sys.partitions.rows says it indicates the “approximate number of rows for this partition”. How approximate? That information isn’t documented. Understand, though, that if you use this method, you potentially sacrifice up-to-the-moment accuracy for performance.


A third option is to use the dynamic management view sys.dm_db_partition_stats. This returns one row per partition for an index.

DECLARE @TableName sysname
SET @TableName = 'bigTransactionHistory'

SELECT OBJECT_NAME(object_id), SUM(row_count) AS rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(@TableName)
AND index_id < 2

The STATISTICS IO output of this query is even lower – this time, only two logical reads are performed.

(1 row(s) affected)
Table 'sysidxstats'. Scan count 1, logical reads 2, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The execution plan is less complex than our second example involving the three system views. This query also has a lower cost – 0.0146517.

count 7

The query results are the same as the previous examples – 31,263,301 rows.

count 8

Using this DMV has the same benefits as the system views – fewer logical reads and no locking of the target table. The query is also simpler to write, involving only one object.

But again, the TechNet documentation for sys.dm_db_partition_stats.row_count says it is “the approximate number of rows in the partition”, and when this information is updated is not documented. Here, you are also potentially sacrificing accuracy for performance.

Time to do some digging

The questions that you need to work with the business to answer are, “How up-to-date must the row count be? What is the business purpose? How often do you insert into or delete from that table, and how often do you count the rows?” If the accuracy of the row count is crucial, work to reduce the amount of updates done to the table. If performance is more important, and the row count could be approximate, use one of the system views.

Wanna learn more tricks?

Check out our T-SQL Level Up online class – we guarantee it’s the best T-SQL training trailer you’ve ever seen:

Learn more about our T-SQL course now.

Why Your Slow SQL Server Doesn’t Need a SQL Consultant (or Does It?)

Your SQL Server is slow – but should you call us in? Jeremiah and Brent had a throw-down the other day, and we figured we’d capture it here on the blog.

Brent Says You Do, and Here’s Why

5. You’ve been blindly throwing money at the problem without results. You’re on a first name basis with your local SAN salesperson. Your server racks are glistening with blue and green lights. But somehow, when users hit your app, they’re not as happy as your sysadmins – or your SAN salesperson. Before you burn another stack of Benjamins, it’s time to get an outside opinion.


4. You’re doing something for the first time. Sure, you’re pretty good at Googling your way out of trouble, but the company’s starting to make riskier and riskier gambles on data storage. Maybe you should talk to someone who’s faced this problem before.

3. You’re relying on the manual. I love Microsoft as much as the next guy – especially now that they brought out a new ergonomic keyboard – but Books Online doesn’t tell the whole truth. When Microsoft unveils a new feature, they talk about all the positives, but they don’t always disclose the drawbacks. Get a reality check before you bet the farm on PowerFilePivotProRT, and hear what our other clients are doing to accomplish the same goal.

2. You need answers faster. We get together on a Monday, and by end of day Wednesday, you’ve got a prioritized action plan showing you how to make the pain go away by the end of the week. You get the knowledge and confidence to keep going without expensive long-term consultants. You’re really close – you just need our 3-day SQL Critical Care® to unlock the tips and tricks to make it work.

1. Developers can get back to adding features. Your real business isn’t SQL Server administration – it’s adding features to your app to make your end user happier. Bring us in, get the answers, and get back to work.

Jeremiah Says You Don’t, and Here’s Why

5. You’re probably blindly throwing money at the problem without results. Unless a consultant is willing to provide a list of happy customers, there’s no way to verify that they know something. Heck, even if they do provide a list of happy customers, you have no way of knowing that Luanne in IT management isn’t really someone’s Aunt Mavis.


4. Best practices aren’t universal. Every situation is different and the most complicated scenarios require a deep understanding of business goals, features, and SLAs. Consultants can help you understand best practices, but you’re the only person who knows what’s right in your environment. If you’re doing something for the first time and your Google-fu is running out, you can’t expect much help from outside.

3. Peer pressure shouldn’t change your decisions. We jokingly call this “design by Hacker News”. Just because a startup, our clients, or your next door neighbor are doing something, that doesn’t mean it’s the right fit for you. For many application feature decisions, it’s easy to build two or three prototype solutions and decide on the best one.

2. You need performance yesterday. Rather than wait a few weeks for a reputable consultant to show up, have you considered buying memory? If you’re on SQL Server Standard Edition and you have less than 64GB of memory, just buy more RAM. If you’re on EE and you have less RAM than you have data, why not max out that server with 16GB DIMMs; they’re cheap and you can solve most code sins with memory. Heck, we even recommend buying memory as one of the first ways to solve problems quickly.

1. Developers: understand your features. While developers should be adding features, they also need to understand the consequences of those features. Some functionality that’s possible in SQL Server requires an understanding of how to write queries to take advantage of those features – filtered indexes, indexed views, and ColumnStore indexes immediately spring to mind. The best way to understand a feature is to get in the database, make it work, and then make it work fast.

Brent’s Upcoming User Group Presentations

In a world where servers rule their administrators, one man will change your life. It just won't be this man.

In a world where servers rule their administrators, one man will change your life. It just won’t be this man.

Coming soon to a user group near you, it’s…me.

March 5 – South Florida SQL Server User Group (Remotely)

The Best Free SQL Server Downloads – You’re trying to manage SQL Server databases, and every time you Google for something, you get overwhelmed with all kinds of free tools, white papers, blogs, and newsletters. There’s so many that suck, and you’re tired of wasting time on bad ones. Microsoft Certified Master Brent Ozar has been using SQL Server for over a decade, and he’s built a killer list of the very best freebies. He’ll show you where to get the good stuff and how to use it.

March 12 – Philadelphia SQL Server User Group

How to Think Like the SQL Server Engine
You’re a developer or a DBA, and you’re comfortable writing queries to get the data you need. You’re much less comfortable trying to design the right indexes for your database server. In this 90-minute session with Microsoft Certified Master Brent Ozar, you’ll role play as the database engine while Brent gives you queries. You’ll learn first-hand about clustered indexes, nonclustered indexes, execution plans, sargability, statistics, TempDB spills, and T-SQL anti-patterns. Register here.

March 13 – PASS DC (Chevy Chase) User Group

How to Think Like the SQL Server Engine – You’re a developer or a DBA, and you’re comfortable writing queries to get the data you need. You’re much less comfortable trying to design the right indexes for your database server. In this 90-minute session with Microsoft Certified Master Brent Ozar, you’ll role play as the database engine while Brent gives you queries. You’ll learn first-hand about clustered indexes, nonclustered indexes, execution plans, sargability, statistics, TempDB spills, and T-SQL anti-patterns. Register now.

March 14 – Richmond SQL Server User Group

How to Think Like the SQL Server Engine – You’re a developer or a DBA, and you’re comfortable writing queries to get the data you need. You’re much less comfortable trying to design the right indexes for your database server. In this 90-minute session with Microsoft Certified Master Brent Ozar, you’ll role play as the database engine while Brent gives you queries. You’ll learn first-hand about clustered indexes, nonclustered indexes, execution plans, sargability, statistics, TempDB spills, and T-SQL anti-patterns. Register now.

April 11-12 – SQLSaturday #267 Lisbon, Portugal

Brent is flying across the pond to go visit one of the best-regarded SQLSaturdays in Europe. Join Brent for a one-day Friday pre-con on virtualization, storage, and hardware for just 100€. Register now for the pre-con, and then register for the SQLSaturday too.

April 26 – SQLSaturday Chicago

The sessions haven’t been picked yet, but Brent’s fingers are crossed that he’ll be one of the chosen ones. You can register now.

PREEMPTIVE_OS_WRITEFILEGATHER and Instant File Initialization (IFI)

The other day I was running a script I’d written to expand out some tables in the AdventureWorks sample database for some testing. The script was taking a long time, and at first I thought I was just spoiled because I usually run my tests on my laptop, whose storage is backed by SSDs. In this case I was running my test against a SQL Server instance up in AWS.

After a delicious fifteen minute coffee break, I realized something was wrong– it STILL wasn’t done. I grabbed sp_BlitzFirst® to do a quick check and see if my storage was really THAT slow.

I ran sp_BlitzFirst® with a 60 second sample, like this:

exec sp_BlitzFirst @seconds=60, @ExpertMode=1;

Here’s what I saw (click for a larger version):

Lack of Instant File Initialization


In a 60 second sample, I’d been waiting for PREEMPTIVE_OS_WRITEFILEGATHER for 36 seconds. Whoops!

The script I was running was making tables larger. The database was out of room, and was growing the data file in small autogrowth increments. Each of these automatic growths was stopping to “zero out” the files, because I hadn’t enabled Instant File Initialization in SQL Server.

How I enabled Instant File Initialization

My test instance is set up a bit uniquely– it uses UNC paths for the SQL Server databases. This is supported in SQL Server 2012, and I was testing it out in a Windows Failover Cluster. (If you use shares for files, you need to be very careful about redundancy and availability of course, but that’s a topic for another post.)

I’d forgotten to enable Instant File Initialization for the SQL Server Service on the server that hosted the file shares! Wherever Windows will actually grow the files, the SQL Server Service account needs the “Perform Volume Maintenance” tasks right.

Using the Local Security Policy to Grant “Perform Volume Maintenance Tasks”

Here’s how to grant the rights:

First, open the local security. I usually do this by running the command secpol.msc, but you can also find this in the “Administrative Tools” menu.

Opening the Local Security Policy

Then navigate to Local Policy -> User Right Assignment -> Perform Volume Maintenance Tasks.

Perform Volume Maintenance Tasks

Add the SQL Server Service account. The change will take effect the next time you restart SQL Server.

Grant IFI to the SQL Server Service Account

So, did it improve my performance?

Heck yeah, it did! On the first run I got tired of waiting and killed my expansion script after 20 minutes and just dealt with the rollback. After enabling Instant File Initialization on fileserver that holds the shares where my database files live, the expansion script finished in four minutes and 48 seconds. Shazam!