Poll Results: Yep, Developers are Using Unmasked Production Data.

By Brent Ozar · July 5, 2019 · 14 comments

Earlier this week, I asked where your developers are getting their development databases from. The poll results are in:

9% of development happens on the production database server
57% are copying the production data to another server, and using it as-is
31% are copying the production data, but then masking/scrambling private data before developers get access to it
25% are using made-up data for development

The totals add up to over 100% because y’all were allowed to pick multiple responses when development was happening in more than one way in your organization.

To put it another way, about 2/3 of the time (9% + 57% = 66%), developers are seeing the production data as-is.

You left comments, too.

And I got a chuckle out of a lot of these, so I’m putting in my favorites verbatim. Based on the comments, you can probably guess what their answer was:

A Dev environment would be nice
Also, it’s on their own workstation.
and it pains me so.
And then we have problems with the test DB being tiny and developers missing needed indexes.
Application Development is a copy of production. Reporting teams develop directly in production environment.
But the data is of poor quality!
data is generally not copied from production database, but is loaded via etl from production sources
Depends on what i am doing, if it’s major then i will work on a copy of prod on my dev server but most of the time it’s just straight prod. I will caveat this by saying i’m a one man band here, doing dev and dba in a small company.
dev databases use to be replicas which caused devs to come up with stupid work arounds to deal with the possible overwrites and sometimes promote code to production that queried both dev and prod – or sometimes write dev code to production to replicate it down. am forcing them to use stale data now that is only updated ad hoc after a request, and the response to requests to get the data updated is usually “no.”
Developers are also Tier 2 & 3 support.
Development databases that import data from 3rd party databases, which in turn may or may not have sensitive data deleted/masked.
Gonna get better, soon. Boy, the users will hate it. Screw ’em. Growth hurts.
hooooooooooooooooooooooooooooboy
How do we remove the data but keep the data?
How does one get the devs to work in test after they’re all working directly in prod? I just started here…
I don’t always develop, but when I do, it’s in production. 😉
I try to keep my working development data sounding at least vaguely realistic, Pete’s Pretzels, Cyndi’s Cinnamon, Bill’s Barley, etc. so I can tell from names what relationships there are between different data items as I see them, and if a client drops by, it won’t be terribly embarrassing, and they won’t be confused by “Company 1” and “Company Xyz” that makes “Product 1” and “Product Test 3” and so forth. This way, too, I can be sure that no live data accidentally gets into my local database, and if I have to set the flags on it to pretend to be “production”, for testing, emails will never inadvertently sent to real clients. The downside is I can never generate more than a tiny fraction of the volume of data we have in production, so troubleshooting performance issues requires working in production.
It depends! if its a brand new app then no live data exists. if its to fix a bug related to specific data then a copy of prod – that may or may not get deleted
It is a copy of production data but we do not store sensitive data in our production database.
It’s a copy of Production from before the days when most of today’s functionality existed, therefore it has a lot of made up data and test orders created by developers and QA personnel.
it’s a pain in the neck having to generate so much synthetic dev data (c;
Its where the best data is 😉
Masked prod data is useful but too large for Dev env. Prod level data is tested at the prod support deploy stage
moving to dev db with make up contents soon
No sensitive data in production database
Not refreshed on a schedule so very out of date.
Oh – how we laughed!?!?!?!
Our client’s production data is free from PII, so that is not a major concern. We do try to use updated copies of production data whenever possible. In our case it doesn’t need to be up-to-the minute accurate data – it can be many months old and still meet our development and testing needs.
Production = a database per customer so there is no copying a single production database
Production copies with masked data are only used when needed to debug complex issues which are data dependent and unable to be reproduced on our standard development data.
Sensitive data remains unchanged, and devs are given full read access. Go figure.
Small development team; no sensitive data
small team — devs == operational staff
so much prod data in violation of internal security policy as well as good practices.
Some sensitive data is removed but not all.
Sometimes a combination, depending on how much the developer is trusted
Sometimes some of the above, but mostly a dev database on a dev server, but with data imported as per prod import process. Sometimes obfuscated, depending on data and client.
Switch between full prod backup for debugging and blank with only reference data for unit/integration testing
The “made-up” contents may mimic the patterns we see in production (e.g. some records may have only minimum fields populated, the description fields may have really large HTML comments copied from a webpage or an E-mail, etc), but are fully synthesized. Our legacy applications used restored copies of production data, but anything new that we have developed in the last 5-6yrs use synthesized data.
The data needs to match across multiple data sets
The database is updated once a year, developers will add their own test data as well.
the Dev/UAT/SIT databases can’t be rolled back at all due to interaction with other servers!!
The production database, in 2–3 different schemas that mirror the production tables structure. Don’t ask!
There’s almost no sensitive data in the production database
They actually all work on a local instance. We are trying to change that to a centralized development database. There was no data team here before!
Trying to go to a model where Dev is Prod with sensitive data removed.
usually sensitive data is not stored in my databases and does not make it into the SQL production databases.
we are trying to move to masking sensitive data
We do sometimes run production data through the development database. Purging it when testing is done.
We have a single database structure used by 150+ different companies with very different data profiles, so using a copy of live data is a must in some circumstances. If we just had made up test data, we could easily miss potential problems. We only have 4 developers in the company, one of whom is the boss and 2 of us do db development/dba work as well as code development, so there is a fair bit of cross over of job roles.
We have Dev + QA. So they work in Dev which is “A development database with made-up contents, not restored from production” and test in QA “A development database copied from production”. Right at this moment i am working on a FULL refresh of the QA system from production. All databases with production data.
We have development databases on a “production” server – it mostly just houses non-client (internal use) databases. Increasingly, our developers are using Docker with a database with fake contents in a Docker image.
We have multiple “development” environments for the various phases of development (QA, UAT, Staging) and the later environments are copies of production with sensitive data removed.
We really do not contain sensitive data in our SQL Server environment. However if that changes we would mask the sensitive data.
We require financial info to be tied back to it’s respective client, employees and other reporting systems. Masked data would unfortunately be nonsense for our testers.
We want to go to the made-up content, but the developers don’t take the time to define the test case data.
You skipped, “development database on prod server”

These are all good comments about hard problems.

I wish there was an easy, free, quick answer to solve these problems.

There isn’t one: the answers involve process changes and extra labor. I get the feeling this situation is going to continue for a decade or more. I salute those of you who are fighting the good fight to keep data safe from breaches and stolen backups. It ain’t easy.
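
To make "process changes and extra labor" a little more concrete, here's a minimal sketch in Python of one idea the comments keep circling: masking a sensitive value the same way everywhere, so rows still match across data sets. This is not a tool recommendation. Everything in it is made up for illustration: the MASK_KEY secret, the mask_value and mask_rows helpers, and the sample rows. It also assumes a keyed hash (HMAC-SHA256) is an acceptable way to pseudonymize your data, which may not be true for your compliance rules.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only. A real key would live outside
# source control and would never be handed to the development environment.
MASK_KEY = b"replace-with-a-real-secret"


def mask_value(value: str, prefix: str = "masked") -> str:
    """Return a stable pseudonym: the same input always yields the same token."""
    digest = hmac.new(MASK_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def mask_rows(rows, sensitive_columns):
    """Return copies of the rows (dicts) with only the named columns masked."""
    masked = []
    for row in rows:
        clean = dict(row)  # copy, so the caller's rows are left untouched
        for col in sensitive_columns:
            if clean.get(col) is not None:
                clean[col] = mask_value(str(clean[col]), prefix=col)
        masked.append(clean)
    return masked


# Made-up sample data: two "tables" that share an email column.
customers = [{"customer_id": 1, "email": "pete@example.com"}]
orders = [{"order_id": 10, "email": "pete@example.com", "total": 42.50}]

masked_customers = mask_rows(customers, ["email"])
masked_orders = mask_rows(orders, ["email"])
```

Because the same input always produces the same token, a join on the masked email column still lines up between the two sets of rows. The sketch does nothing about private data hiding in free-text fields, and the tokens don't keep the ordering of the original values, so range queries on a masked column may behave differently than they do in production.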

14 comments

Teun van den Biggelaar

July 5, 2019 at 5:09 am

Hi Brent,

In an ideal world we would all scramble the data for our developers. If we were forced to do this, is there a tool (or tools) you can really recommend?

1. Knotty
  
  July 5, 2019 at 5:37 am
  
  Custom scripts would be the best solution. If you want to generate data, try using ApexSQL Generate or Redgate tools…
  
2. Emanuele Meazzo Student since 2017
  
  July 5, 2019 at 6:15 am
  
  Check out dbatools’ Static Data Masking cmdlets: https://dbatools.io/mask/
  
  A dynamic data masking configuration with a VERY tight security control could work too for the short term, while static data masking is implemented.
  
3. Brent Ozar
  
  July 5, 2019 at 6:20 am
  
  Teun – I haven’t seen one that’s easy to implement while avoiding these problems:
  
  https://www.brentozar.com/archive/2011/09/how-do-you-mask-data/
  
  That’s why I’d much rather build development data from scratch. I know it’s hard as hell, but…that’s the right long term solution.
  
Frédéric Hébert

July 5, 2019 at 11:04 am

#Like

Dave Murray

July 8, 2019 at 2:47 am

My fellow devs and I would love a proper masked database with personal info removed but our Data Protection Officer claims we don’t need it. We’re in the EU btw and she also thinks we can keep information on employees and former employees forever. We have details of people who worked here 30 years ago in databases that are open to everyone in the organisation! Even when I pointed out to her that this is required under GDPR and companies have spent a lot of money adding masking features to products like SQL Server she still disagrees.

1. Richard
  
  July 8, 2019 at 5:04 am
  
  Send her an email, copying-in senior management, stating that GDPR rules, if broken, mean MASSIVE, MASSIVE, MASSIVE fines. Include this link: https://gdpr.eu/fines (and draw attention to “The less severe infringements could result in a fine of up to €10 million, or 2% of the firm’s worldwide annual revenue from the preceding financial year, whichever amount is higher.”).
  
  State clearly that you accept zero responsibility for any future data breach unless GDPR rules are followed, and that it will be she & senior management who will be in the smelly stuff if a breach occurs.
  
  Send a copy to your private email address.
  
  And sleep well.
  
  1. Richard
    
    July 9, 2019 at 2:09 pm
    
    Oh look.
    https://www.theguardian.com/business/2019/jul/09/marriott-fined-over-gdpr-breach-ico
    
    1. Brent Ozar
      
      July 9, 2019 at 2:14 pm
      
      I might have missed something obvious, but how is that connected to dev/test data? Sounded like their production systems were hacked.
      
      1. Richard
        
        July 9, 2019 at 11:35 pm
        
        It was to illustrate just how heavily the EU can fine a company for GDPR-related snafus. Dave (2 posts up) has had difficulty getting a Data Protection Officer to take GDPR seriously.
        
        One wonders if that hotel chain had its own Dave, who was repeatedly pooh-poohed … right up until the inevitable happened.
      2. Brent Ozar
        
        July 10, 2019 at 4:27 am
        
        Gotcha – for that, a better recommended link is the overall list of enforcements: http://www.enforcementtracker.com/
Alen T

July 10, 2019 at 9:13 am

So say you scramble your DEV/QA databases. What do you do when a query that runs perfectly in DEV runs slow in PROD because the execution plan is different due to data distribution differences? I’ve already dealt with this because it’s impossible to run every process in DEV as in production, but with masking data you might be causing problems later on.

Art

December 10, 2019 at 11:08 am

Total fines are nearly 55 million euros: https://finestracker.com/

What's In Your Development Database? The Answer: Production Data. - Brent Ozar Unlimited®

October 23, 2024 at 1:15 pm

[…] It was the same story about 5 years ago when I asked the same question, and back then, about 2/3 of the time, developers were using production data as-is: […]
