Poll Results: Yep, Developers are Using Unmasked Production Data.

Earlier this week, I asked where your developers are getting their development databases from. The poll results are in:

  • 9% of development happens on the production database server
  • 57% are copying the production data to another server, and using it as-is
  • 31% are copying the production data, but then masking/scrambling private data before developers get access to it
  • 25% are using made-up data for development

The totals add up to over 100% because y’all were allowed to pick multiple responses when development was happening in more than one way in your organization.

To put it another way, 2/3 of the time, developers are seeing the production data as-is.
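
If you want to check the math, here's a quick sketch in Python (my pick for illustration; the dictionary keys are made-up labels, not the poll's exact wording). It adds up all four answers, then just the two where developers see unmasked production data. Because folks could pick more than one answer, treat the second number as a rough figure rather than an exact headcount.

```python
# Poll percentages as reported above (multi-select, so they can total over 100).
poll = {
    "develop on production server": 9,
    "copy of production, as-is": 57,
    "copy of production, masked": 31,
    "made-up data": 25,
}

# Sum of every answer, and of the two "unmasked production data" answers.
total = sum(poll.values())
unmasked = poll["develop on production server"] + poll["copy of production, as-is"]

print(f"Total of all answers: {total}%")
print(f"Seeing production data as-is: {unmasked}%")
```

The second figure is where the "2/3 of the time" comes from.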

You left comments, too.

And I got a chuckle out of a lot of these, so I’m putting in my favorites verbatim. Based on the comments, you can probably guess what their answer was:

  • A Dev environment would be nice
  • Also, it’s on their own workstation.
  • and it pains me so.
  • And then we have problems with the test DB being tiny and developers missing needed indexes.
  • Application Development is a copy of production. Reporting teams develop directly in production environment.
  • But the data is of poor quality!
  • data is generally not copied from production database, but is loaded via etl from production sources
  • Depends on what i am doing, if it’s major then i will work on a copy of prod on my dev server but most of the time it’s just straight prod. I will caveat this by saying i’m a one man band here, doing dev and dba in a small company.
  • dev databases use to be replicas which caused devs to come up with stupid work arounds to deal with the possible overwrites and sometimes promote code to production that queried both dev and prod – or sometimes write dev code to production to replicate it down. am forcing them to use stale data now that is only updated ad hoc after a request, and the response to requests to get the data updated is usually “no.”
  • Developers are also Tier 2 & 3 support.
  • Development databases that import data from 3rd party databases, which in turn may or may not have sensitive data deleted/masked.
  • Gonna get better, soon. Boy, the users will hate it. Screw ’em. Growth hurts.
  • hooooooooooooooooooooooooooooboy
  • How do we remove the data but keep the data?
  • How does one get the devs to work in test after they’re all working directly in prod? I just started here…
  • I don’t always develop, but when I do, it’s in production. 😉
  • I try to keep my working development data sounding at least vaguely realistic, Pete’s Pretzels, Cyndi’s Cinnamon, Bill’s Barley, etc. so I can tell from names what relationships there are between different data items as I see them, and if a client drops by, it won’t be terribly embarrassing, and they won’t be confused by “Company 1” and “Company Xyz” that makes “Product 1” and “Product Test 3” and so forth. This way, too, I can be sure that no live data accidentally gets into my local database, and if I have to set the flags on it to pretend to be “production”, for testing, emails will never inadvertently sent to real clients. The downside is I can never generate more than a tiny fraction of the volume of data we have in production, so troubleshooting performance issues requires working in production.
  • It depends! if its a brand new app then no live data exists. if its to fix a bug related to specific data then a copy of prod – that may or may not get deleted
  • It is a copy of production data but we do not store sensitive data in our production database.
  • It’s a copy of Production from before the days when most of today’s functionality existed, therefore it has a lot of made up data and test orders created by developers and QA personnel.
  • it’s a pain in the neck having to generate so much synthetic dev data (c;
  • Its where the best data is 😉
  • Masked prod data is useful but too large for Dev env. Prod level data is tested at the prod support deploy stage
  • moving to dev db with make up contents soon
  • No sensitive data in production database
  • Not refreshed on a schedule so very out of date.
  • Oh – how we laughed!?!?!?!
  • Our client’s production data is free from PII, so that is not a major concern. We do try to use updated copies of production data whenever possible. In our case it doesn’t need to be up-to-the minute accurate data – it can be many months old and still meet our development and testing needs.
  • Production = a database per customer so there is no copying a single production database
  • Production copies with masked data are only used when needed to debug complex issues which are data dependent and unable to be reproduced on our standard development data.
  • Sensitive data remains unchanged, and devs are given full read access. Go figure.
  • Small development team; no sensitive data
  • small team — devs == operational staff
  • so much prod data in violation of internal security policy as well as good practices.
  • Some sensitive data is removed but not all.
  • Sometimes a combination, depending on how much the developer is trusted
  • Sometimes some of the above, but mostly a dev database on a dev server, but with data imported as per prod import process. Sometimes obfuscated, depending on data and client.
  • Switch between full prod backup for debugging and blank with only reference data for unit/integration testing
  • “The “made-up” contents may mimic the patterns we see in production (e.g. some records may have only minimum fields populated, the description fields may have really large HTML comments copied from a webpage or an E-mail, etc), but are fully synthesized. Our legacy applications used restored copies of production data, but anything new that we have developed in the last 5-6yrs use synthesized data.
  • The data needs to match across multiple data sets
  • The database is updated once a year, developers will add their own test data as well.
  • the Dev/UAT/SIT databases can’t be rolled back at all due to interaction with other servers!!
  • The production database, in 2–3 different schemas that mirror the production tables structure. Don’t ask!
  • There’s almost no sensitive data in the production database
  • They actually all work on a local instance. We are trying to change that to a centralized development database. There was no data team here before!
  • Trying to go to a model where Dev is Prod with sensitive data removed.
  • usually sensitive data is not stored in my databases and does not make it into the SQL production databases.
  • we are trying to move to masking sensitive data
  • We do sometimes run production data through the development database. Purging it when testing is done.
  • We have a single database structure used by 150+ different companies with very different data profiles, so using a copy of live data is a must in some circumstances. If we just had made up test data, we could easily miss potential problems. We only have 4 developers in the company, one of whom is the boss and 2 of us do db development/dba work as well as code development, so there is a fair bit of cross over of job roles.
  • We have Dev + QA. So they work in Dev which is “A development database with made-up contents, not restored from production” and test in QA “A development database copied from production”. Right at this moment i am working on a FULL refresh of the QA system from production. All databases with production data.
  • We have development databases on a “production” server – it mostly just houses non-client (internal use) databases. Increasingly, our developers are using Docker with a database with fake contents in a Docker image.
  • We have multiple “development” environments for the various phases of development (QA, UAT, Staging) and the later environments are copies of production with sensitive data removed.
  • We really do not contain sensitive data in our SQL Server environment. However if that changes we would mask the sensitive data.
  • We require financial info to be tied back to it’s respective client, employees and other reporting systems. Masked data would unfortunately be nonsense for our testers.
  • We want to go to the made-up content, but the developers don’t take the time to define the test case data.
  • You skipped, “development database on prod server”

These are all good comments about hard problems.

I wish there was an easy, free, quick answer to solve these problems.

There isn’t one: the answers involve process changes and extra labor. I get the feeling this situation is going to continue for a decade or more. I salute those of you who are fighting the good fight to keep data safe from breaches and stolen backups. It ain’t easy.

12 Comments

  • Teun van den Biggelaar
    July 5, 2019 5:09 am

    Hi Brent,

    In an ideal world we would all scramble the data for our developers. If we would be forced to do this, is there a tool(s) you can really recommend ?

  • Frédéric Hébert
    July 5, 2019 11:04 am

    #Like

  • Dave Murray
    July 8, 2019 2:47 am

    My fellow devs and I would love a proper masked database with personal info removed but our Data Protection Officer claims we don’t need it. We’re in the EU btw and she also thinks we can keep information on employees and former employees forever. We have details of people who worked here 30 years ago in databases that are open to everyone in the organisation! Even when I pointed out to her that this is required under GDPR and companies have spent a lot of money adding masking features to products like SQL Server she still disagrees.

    • Send her an email, copying-in senior management, stating that GDPR rules, if broken, mean MASSIVE, MASSIVE, MASSIVE fines. Include this link: https://gdpr.eu/fines (and draw attention to “The less severe infringements could result in a fine of up to €10 million, or 2% of the firm’s worldwide annual revenue from the preceding financial year, whichever amount is higher.”).

      State clearly that you accept zero responsibility for any future data breach unless GDPR rules are followed, and that it will be she & senior management who will be in the smelly stuff if a breach occurs.

      Send a copy to your private email address.

      And sleep well.

  • So say you scramble your DEV/QA databases. What do you do when a query runs perfectly in DEV runs slow in PROD cause the execution plan is different because of the data distribution differences. I’ve already dealt with this because it’s impossible to run every process in DEV as in production, but with masking data you might be causing problems later on.
