Getting developers production data stinks
DBAs are stuck in this annoying cycle where they need to give developers production, or production-like data, but… Production data can be huge. Hundreds of gigabytes if you’re lucky, several terabytes if you’re not. Then once it gets there, you have other considerations. Either you lock down Dev, or you come up with a process to scramble data. This poses another set of problems because once data is scrambled, you lose some of the nuances of your actual data. Pattern matching scrambled data is useless and I hate it.
One alternative has been to script out a statistics only copy of your database. Which is cool, but kind of clunky. Microsoft has sought to ease that a bit with… yet another DBCC command.
That doesn’t solve any of those problems.
There’s gonna be a lot of HOOP-la!
SQL 2014 SP2 introduces DBCC CLONEDATABASE! Let’s get this straight. it’s not really meant to solve those problems. It’s meant to give you a minimal copy of your data to reproduce and diagnose performance issues with, without adding further load to your production server. But it suffers from the same problem that any statistics only copy of your database will. Let’s go through some of the issues.
Get your clone on
The syntax is really simple. Command. Database name. Clone database name.
1 |
DBCC CLONEDATABASE(StackOverflow, StackClone) |
This will produce a Read-Only ‘copy’ of your database on the same server as the database you cloned.
Here’s where the problems begin!
Production values are still visible
If you run DBCC SHOW_STATISTICS, or go through the GUI, you can see the steps in the histogram. For my StackClone database, the command looks about like this.
1 |
DBCC SHOW_STATISTICS(Users, _WA_Sys_00000005_0519C6AF) |
And there’s a bunch of names! Not all the names, but enough to be dangerous. I’m only showing the top 10 because a picture of the whole thing would be tacky and useless, like a condom in your wallet.
TIME and IO wait for no man
And they certainly don’t register anything with only statistics to reference.
1 2 3 4 5 6 |
SET STATISTICS TIME, IO ON SET NOCOUNT ON SELECT u.Id, u.DisplayName FROM dbo.Users AS u WHERE u.DisplayName = 'Brent Ozar' |
On real data, we get back this:
1 2 3 4 |
Table 'Users'. Scan count 1, logical reads 52412, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 593 ms, elapsed time = 594 ms. |
On the clone, we get back this:
1 2 3 4 |
Table 'Users'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. |
This immediately narrows the range of issues you can troubleshoot down. If you want to do any sort of meaningful performance tuning, you’re already short a couple of the best indicators that you’re on the right track.
Read-Only Means Read-Only
When your database springs to life, like what probably happened to Lazarus, it’s all grey and sticky.
Okay, maybe not Lazarus. Maybe that dude Tooms from X-Files that eats livers and takes wicked long naps. The down side here is that new queries won’t auto-create statistics. So if you run a query, and you don’t already have statistics for a column, you just get a warning.
I mean, yeah, you can make it Read-Write to get around this, but you should probably move it off to another server before doing this. I’m guessing you don’t want a database sitting in prod that’s virtually indistinguishable from your other databases.
Updating statistics and Rebuilding indexes ruins everything
The other downside of making it Read-Write, is that if a maintenance task comes along and checks the data, it wipes out your existing statistics. This is also true of index Rebuilds, which if I haven’t beaten it into you already, updates statistics with a full scan. And hey, forget about adding new indexes. SQL is hip to your empty table game. You’ll have to add them in Prod, and then re-clone your database, or script that single statistic out and add it to your database clone. Sound like fun? Yeah? You’re a real sick puppy.
Questions and answers
This is pretty handy for a few different scenarios. Just be aware of the limitations and risks. If you’re out there reading this, and you’ve got a good process in place for scrubbing production data, leave a comment. I’m sure lots of other people out there could use the advice.
Thanks for reading!
Brent says: DBCC CLONEDATABASE is just one of many cool things in SQL 2014 SP2. There’s new memory grant columns in sys.dm_exec_query_stats, tons more stuff in execution plans, and even faster performance on partitioned tables. The dev teams are pouring a lot of effort into backporting awesome improvements into 2014. I’ve never been so excited about Microsoft’s commitment to SQL Server.
25 Comments. Leave new
Very Useful Information.Nice article!
wait just a darned second…is it april 1st?
my reaction on reading the title of this article. this is pretty cool, though. pretty dang cool.
on the doc page it doesn’t list constraints–surely those come along for the ride? constraints being a child-object of tables/views?
Well, Primary Keys followed. Can’t vouch for anything else since my copy of SO is constraint-free. I’ll follow up though.
confirmed that foreign keys follow as well.
Great feature, any idea if this will make its way to SQL 2016 as well (soon) ?
Would be silly not to.
I’d imagine so, but I don’t know for sure. I’d keep an eye on release notes for 2016 CUs.
Hi everyone,
just for information: tried it on Microsoft SQL Server 2016 (RTM) – 13.0.1601.5 (X64)
Developer Edition (64-bit) , got for
DBCC CLONEDATABASE(Test, TestClone)
Msg 2526, Level 16, State 3, Line 1
Incorrect DBCC statement. Check the documentation for the correct DBCC syntax and options.
Hi Eric,
I am getting below error while executing CLONEDATABASE query on my local SO db. What could be the issue?
Database cloning for ‘StackOverflow’ has started with target as ‘StackClone’.
Msg 2601, Level 14, State 1, Line 1
Cannot insert duplicate key row in object ‘sys.sysowners’ with unique index ‘nc1’. The duplicate key value is (Neon).
You are not alone.
https://connect.microsoft.com/SQLServer/feedback/details/2926086/dbcc-databaseclone-fails-on-sys-sysowners
Fix released: https://support.microsoft.com/en-ca/help/4016238/fix-dbcc-clonedatabase-is-unsuccessful-if-the-source-database-has-an-o
. I’m only showing the top 10 because a picture of the whole thing would be tacky and useless, like a condom in your wallet.”
+1 for that alone
Thought this was gonna be an awesome new tool.
I really wanted to use this to refresh our QA environment. Remove all data refill from production.
Got pretty far, just can’t reindex the indexed views.
Steps:
Save all the indexes
Clone the DB
Backup the cloned DB
Restore the cloned DB over the original. (So that the data and log file names stay the same.)
Resize the data and log files to what they were before.
Rebuild any missing indexes. (This is where things came to a stop!)
Drop the cloned DB
Fill the new QA DB with selected data from production.
Where’s that DacPac script of mine…..?
Hi Erik,
Clone can not be used for database corruption detection.
I have restored a copy of corrupt database (Page Corruption) and create clone of it. When run dbcc checkdb it came out clean.
Strange..!!!!
Nisarg – correct, the clone only scripts things out, and it doesn’t include the data. If you want to check the data, you have to run CHECKDB on it.
Hi Brent,
I am using clones for my developers to run pre-deployment schema change script testing. So no excuses, all code is tested before deployment to development databases. I also have a copy on QA environments for me to test the code push before I ruin QA (not that I do not trust my developer friends;)
I forgot and left a clone on production and Mr. Hallengren’s maintenance plan failed dbcc with error 8992. I am guessing this would be normal for a clone as this error refers to meta data misalignment with objects and there is no data in a clone. Is this something anyone has run into during testing?
Btw Igor, you need to put the database names in quotes e.g.
DBCC CLONEDATABASE (N’AdventureWorks2014′, N’AdventureWorks2014_CLONE’);
“If a snapshot cannot be created, DBCC CLONEDATABASE will fail. ”
I think this will rule out databases with read-only filegroups – from what I know you cannot snapshot a database with read-only filegroups. I have seen this before because CheckDB takes a snapshot for schema stability – but if read-only filegroups exist it cannot take a snapshot and instead utilises table locks. If I get chance I’ll spin up a test machine and try it out but don’t expect that any time soon 🙂
Thanks for the Article.
We have been using database cloning to refresh non-prod server (restore+clone+drop original DB) for some databases (logging DBs/sensitive info).
We have seen one issue/bug. Cloned database somehow changes EXECUTE AS OWNER clause of SPs to “CALLER”. And the weird thing is SP definition shows EXECUTE AS OWNER but SP properties shows EXECUTE AS CALLER.
I know it is very bad to have execute as owner (which now I cant do anything about).
But this behavior is not documented any where. Do you have any insight on this?
It is SQL 2016 one
Redeploying SPs fix this. But not sure about the root cause.
It appears that the “Cannot insert a duplicate key row in object ‘sys.sysowners’ with unique index ‘nc1’. The duplicate key value is” issue is still happening with SQL 2017 Dev Edition. I have applied all CU’s and the issue still remains.
I too am having that same error, SQL 2017 Enterprise Edition CU13.
For me at least, I have SCOM monitoring, and the error was on the NT SERVICE\HealthService. This user has permissions to all databases for alerting purposes. I removed permissions for this user from the database and the clone worked successfully and without error. I then just added the permissions for SCOM back onto the database. If this is production and its a permissions issue with something more important than monitoring, it may be more difficult, otherwise just remove the user temporarily and then add it back after the clone to see if that works for you.
Do you know how to solve the wrong values that left from the source database at sys.partitions ?
For general questions, head to a QA site like DBA.stackexchange.com.