Hi. I’m not Brent.
My name’s Alex, and I care about databases, DevOps and data protection. Last year I gave a session on GroupBy all about how to do Database DevOps right. Now, dear reader, I’m going to give you a cheerful, three-step crash course about how to screw it all up.
All set? Let’s go!
Step 1: Apathy – wait until it’s too late
We’ve got a monster database.
Well, it’s actually fourteen databases split across three servers, with a tangled web of circular cross-db dependencies. We have a CLR layer (for which we’ve lost the source code) and our table and column names are obfuscated and unintelligible.
We’ve got four petabytes of data and we know a bunch of it is sensitive PII, but due to the complicated table structure and lack of documentation we aren’t exactly sure where it all lives. However, our outsourced, overseas dev team tell us it’s critical they have all the prod data in their dev database to test the Entity Framework code in their apps work.
We are running SQL Server 2005 and have 37,000 tables, some with over 1,000 columns. (That’s not including the automatically generated tables). Also, some of our stored procedures are a few thousand lines long. Needless to say, none of the database source control tools that we tried have been able to cope – so the schema isn’t in source control. Our dev and production environments have many inconsistencies.
We managed to pull off our 2015 release in under 3 months, but we’ve been struggling with our 2018 release since February and we are still trying to figure out how to make it work for our dozen or so oldest clients. (They use customised versions of the CLR layer for legacy reasons. We lost that source code too.)
We’d like you to take a few days to automate our deployment and create a test suite with 100% code coverage to help us deploy more easily.
(Also, one of my developers has told me about this new GDPR law. Do you think we should budget a day or two to cover our bases there? Thanks.)
J. Doe, Project manager at Corporation Inc.
While this email is fake, it’s made up of a whole bunch of issues I’ve seen with various real customers. The point is that people begin to understand why source control, testing and deployment automation is so awesome when their databases become hard to manage.
Unfortunately, at this point it’s often super hard to retro-fit a sensible process.
For many folks who manage legacy databases, DevOps theory and the tools that can help us are new and complicated things to learn.
In general, teams are far more likely to succeed if they learn the basics with a much simpler database. Once they’ve honed their skills with the easy stuff they are far more likely to succeed with their monster database.
The problem is that often folks don’t see the point with greenfield projects. While the schema is still relatively simple people don’t see the value in writing unit tests for their stored procedures or automating their deployment pipeline. They just want to get started cutting code.
Unfortunately, in time, that greenfield project will sure enough turn into a new monster. If they had taken the time at the start to put together a basic source control, automated build, test and deployment process they might have avoided it. When making architectural decisions they would have been forced to think not just about scalability, durability and performance etc – but also testability and deployability. Having designed the database with those things in mind they might have saved themselves a bunch of headaches a few months or years down the road – but they didn’t.
Instead, in a few years some new start-up will be able to respond to evolving market demands more quickly.
Step 2: Use a shared dev database
Let’s talk about how C# developers work.
Their code lives in a source control system. When they want to make changes to it they do the following:
- Pull down the latest code to their local machine
- Make their changes, run their tests
- Build it locally, check it works
- Push their changes back up to source control
It works pretty well. Imagine telling your C# developers that you wanted them to change their process as follows:
- The code will now live on a file share
- All developers will work on the same set of files at the same time
- When you want to commit, cherry pick the changes you made and only push those
They would either laugh, get angry or quit. (Or all three.) What you are asking is plainly bonkers for various reasons:
- Cherry picking changes to commit makes it impossible to manage changes effectively.
- Developers will be unable to test code in isolation – if it broke, was that because of something I did or something you did?
- Developers will sure enough be overwriting each other’s code – and they won’t notice until deployment day when their feature doesn’t work in production.
- If one developer breaks the code – the others will be blocked until it’s fixed. (If it’s not recoverable, they’ll lose their work.)
- People have their opinions about branching, and I’m not going to get opinionated about it here, but it’s plainly impossible to branch effectively with a shared dev database. It’s a plain contradiction.
- The shared dev version of the code will get bogged down with old test code and half-finished or abandoned features that never get cleaned up, meaning your dev environment behaves differently from production. Sooner or later someone will utter those familiar words… “But it worked in dev?”
There is a reason that C# developers would find this strategy entirely stupid.
So, given that it’s a one-way ticket to a world of pain, why do some organisations insist on using shared development servers for database development?
Here are a few commonly touted reasons. They are all flawed:
- Devs need lots of data and their workstation doesn’t have enough space. (You can solve this problem in other ways.)
- It helps me lock down my data. (Wrong. If developers can access sensitive data you are already doing it wrong. The hacker doesn’t care if that sensitive data is on a workstation or a dev server. If the dev clicks that phishing link and can also see the sensitive data – it’s game over.)
- As a DBA how can I control the SQL estate if all developers have their own SQL instances? (You don’t. Developers are responsible for their own dev databases.)
- My database is complicated and impossible to spin up on a dev workstation. (Congratulations, you’ve successfully achieved Step 1, above.)
If you want to ensure that your development process is doomed to be a single-threaded waterfall or a horrendously complicated attempt to implement feature branches without branching, you should use a shared development database. If you want to ensure your dev database is inconsistent with production, and you want to hear people say “it worked in dev”, you should use a shared database. If you want to build a monster database that is hard to deploy, you should use a shared development database.
There are other ways to screw up your ability to deliver database updates – but few are as effective is using a shared development database.
If you’d like to read more of my thoughts about shared development databases, I wrote about them in more detail here.
Step 3: Re-invent the wheel
One of the reasons I started DLM Consultants is because I’ve seen so many awful DevOps consultants or engineers who fly in and build a DevOps*. I thought I could do better.
The consultants circle around like vultures, seeking out people like J. Doe (who sent me the email above). They swoop in and build a custom DevOps* that’s highly customised to the specific needs of Corporation Inc. Unfortunately, the DevOps* they built is so complicated and poorly documented and no-one really understands it, so the DevOps vultures** hang around on an expensive retainer to manage all the DevOps-ing* for Corporation Inc (as the company slowly loses more and more business to the new start-ups that are eating into its market-share).
Another pest to look out for is the DevOps parasite***. These critters are typically employees at organisations, such as a developer or DBA at Corporation Inc. They might even be a DevOps engineer*. They’ll try to solve the deployment problem by building a very clever and specially customised DevOps*, but no-one else will understand it. By building it in-house they’ve managed to simultaneously become the bottleneck on the company’s ability to release database updates and also make themselves essential to the successful running of the company. They’ve found a way to siphon cash out of Corporation Inc as a salary until either they retire or the company goes bust – whichever comes first.
You can defend yourself from DevOps vultures and parasites by adopting well documented, industry standard tools and techniques where possible. These will be easier for everyone in the organisation to understand and it will be possible to recruit talent that already understands how your source control, testing and deployment tools work because they’ve used the exact same tools with previous employers.
And if you are thinking about building your own DevOps*, just remember that whatever you can come up with in an afternoon, as a side project, is not going to be as good as the tools that are available off the shelf either from Microsoft, third party vendors or open source communities who have sunk years into solving the problems you haven’t thought of yet.
Unless of course, you want to become a DevOps vulture** or a DevOps parasite*** yourself. I wouldn’t recommend it, becoming the bottleneck of your company isn’t a fun job – even if it is hard for them to fire you. But hey, if that’s what you are looking for, go ahead.
*DevOps is not a thing you do or a thing you build. It’s the way a group of people with different skills work together effectively to achieve a goal.
** DevOps is also not a breed of vulture.
*** Or a type of parasite.
Want to know more ways to screw up?
This blog post was inspired by a talk I delivered for SQL in the City Streamed back in 2016. If you would like to watch the 30-minute recording of my 15-step guide you can do that here.