This has been blogged before by several people. One thing I’ve learned from Brent is to not let the fact that the topic has been blogged before stop you from blogging about it. With that out of the way…
When you have sysadmin access, you are bound to make a big mistake in production at some point in your career. Not everyone has. Maybe they’re perfect, or maybe it just hasn’t happened yet. I’ve made two big mistakes in production.
15 years ago. I remember it like it was yesterday. This was back in SQL Server 2000 days. We were using Log Shipping for our Disaster Recovery needs. This company failed over to the DR site 1-2 times per year and would run production there for 2-3 weeks at a time. This had two big benefits.
-Proves to the customers that the DR plan works (as long as resources are availabe in an unplanned situation)
-Allows you to do major maintenance at the primary site
During one of the planned failover maintenance windows, I was tasked with dropping Log Shipping and setting it up again in the reverse (from DR site to primary site). One of the drawbacks with Log Shipping is that after you failover to the secondary server, you can’t reverse it without dropping Log Shipping and setting it up again.
Well 15 years ago, I was a lot less experienced than I am now. My steps would be different these days. But here’s what I used to do:
- Drop Log Shipping
- Check the Log Shipping tables in the msdb database to make sure it dropped cleanly (back then there were often rows that had to be manually deleted)
- Drop the database at the primary site to avoid having to use WITH MOVE in the RESTORE DATABASE command
- Setup Log Shipping
I accidentally dropped the database at the DR site where production was now running. I immediately told my manager and the manager of the WebOps team. We were still inside our maintenance window, so I started restoring the database. I restored to the last transaction log backup that had been run, but there’d be data loss if I stopped there. I got lucky on that system that the incoming data could be recovered from text files. The bad news was that someone else had to do that task as it wasn’t a SQL Server task. I felt bad for not only having dropped the database but that I increased the work of someone else who already had a ton of stuff to complete in that maintenance window.
Now every time that I have had to drop a database in production, it takes me a good minute before I am confident that I am doing it on the right server and on the right database.
I’m embarrassed to say that my second big mistake in production occurred 2 years ago. The system that this big mistake occurred on required many manual data changes due to various reasons, such as the application not having a certain feature or an application bug that hadn’t been fixed yet.
I needed to delete some data in a table. You see where this is going, right?
First I had to query the table to see what needed to be deleted so that I could get my WHERE clause right. I changed the SELECT to a DELETE. The full DELETE query was in the query window including the WHERE clause. I highlighted the query (I had other things in the query window) and hit F5.
The problem was that I accidentally missed highlighting the WHERE clause and hit F5 too fast. To top it off, I had dismissed the SSMSToolsPack warning about deleting data without a WHERE clause. I was so used to dismissing the warning that I clicked it without even thinking about what it said.
My stomach dropped. I noted the time (this is important for point-in-time recovery). I ran to my manager’s office and fessed up to what I had done. He told me to start the side-by-side restore while he worked on figuring out the impact. We had enough disk space that I could restore the database with a different name so that there wouldn’t be downtime. I restored the database and then the transaction logs to the point in time before the DELETE command had been run. I then copied the data over to the actual database. There wasn’t foreign key data that I had to worry about, luckily. It was just the data in this table that needed to be recovered.
So, fess up! What have you done?
What is the biggest mistake you made in production? Feel free to comment with non-SQL Server mistakes, just as long as they were made in production. Mistakes in non-production can be costly, but doing them in production takes it to another level.
Brent says: I’ve done a lot of dumb stuff, and I should probably write a whole compendium blog post. One of my favorites was trying to move a big 4U rack-mount server from a bottom spot to a top spot, by myself, in the middle of the night during a Saturday maintenance window. I’m no Erik Darling, and my pencil-thin arms managed to drop the server from above my head. It missed my foot by maybe an inch, shattering the raised floor tile. Took me a good half hour just to calm down. I have to hand it to IBM, too – that x445 kept right on working when I plugged it back in.