Getting developers production data stinks
DBAs are stuck in this annoying cycle where they need to give developers production, or production-like data, but… Production data can be huge. Hundreds of gigabytes if you’re lucky, several terabytes if you’re not. Then once it gets there, you have other considerations. Either you lock down Dev, or you come up with a process to scramble data. This poses another set of problems because once data is scrambled, you lose some of the nuances of your actual data. Pattern matching scrambled data is useless and I hate it.
One alternative has been to script out a statistics only copy of your database. Which is cool, but kind of clunky. Microsoft has sought to ease that a bit with… yet another DBCC command.
That doesn’t solve any of those problems.
There’s gonna be a lot of HOOP-la!
SQL 2014 SP2 introduces DBCC CLONEDATABASE! Let’s get this straight. it’s not really meant to solve those problems. It’s meant to give you a minimal copy of your data to reproduce and diagnose performance issues with, without adding further load to your production server. But it suffers from the same problem that any statistics only copy of your database will. Let’s go through some of the issues.
Get your clone on
The syntax is really simple. Command. Database name. Clone database name.
DBCC CLONEDATABASE(StackOverflow, StackClone)
This will produce a Read-Only ‘copy’ of your database on the same server as the database you cloned.
Here’s where the problems begin!
Production values are still visible
If you run DBCC SHOW_STATISTICS, or go through the GUI, you can see the steps in the histogram. For my StackClone database, the command looks about like this.
DBCC SHOW_STATISTICS(Users, _WA_Sys_00000005_0519C6AF)
And there’s a bunch of names! Not all the names, but enough to be dangerous. I’m only showing the top 10 because a picture of the whole thing would be tacky and useless, like a condom in your wallet.
TIME and IO wait for no man
And they certainly don’t register anything with only statistics to reference.
SET STATISTICS TIME, IO ON SET NOCOUNT ON SELECT u.Id, u.DisplayName FROM dbo.Users AS u WHERE u.DisplayName = 'Brent Ozar'
On real data, we get back this:
Table 'Users'. Scan count 1, logical reads 52412, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 593 ms, elapsed time = 594 ms.
On the clone, we get back this:
Table 'Users'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
This immediately narrows the range of issues you can troubleshoot down. If you want to do any sort of meaningful performance tuning, you’re already short a couple of the best indicators that you’re on the right track.
Read-Only Means Read-Only
When your database springs to life, like what probably happened to Lazarus, it’s all grey and sticky.
Okay, maybe not Lazarus. Maybe that dude Tooms from X-Files that eats livers and takes wicked long naps. The down side here is that new queries won’t auto-create statistics. So if you run a query, and you don’t already have statistics for a column, you just get a warning.
I mean, yeah, you can make it Read-Write to get around this, but you should probably move it off to another server before doing this. I’m guessing you don’t want a database sitting in prod that’s virtually indistinguishable from your other databases.
Updating statistics and Rebuilding indexes ruins everything
The other downside of making it Read-Write, is that if a maintenance task comes along and checks the data, it wipes out your existing statistics. This is also true of index Rebuilds, which if I haven’t beaten it into you already, updates statistics with a full scan. And hey, forget about adding new indexes. SQL is hip to your empty table game. You’ll have to add them in Prod, and then re-clone your database, or script that single statistic out and add it to your database clone. Sound like fun? Yeah? You’re a real sick puppy.
Questions and answers
This is pretty handy for a few different scenarios. Just be aware of the limitations and risks. If you’re out there reading this, and you’ve got a good process in place for scrubbing production data, leave a comment. I’m sure lots of other people out there could use the advice.
Thanks for reading!
Brent says: DBCC CLONEDATABASE is just one of many cool things in SQL 2014 SP2. There’s new memory grant columns in sys.dm_exec_query_stats, tons more stuff in execution plans, and even faster performance on partitioned tables. The dev teams are pouring a lot of effort into backporting awesome improvements into 2014. I’ve never been so excited about Microsoft’s commitment to SQL Server.