
Scary SQL Surprises: Crouching Tiger, Hidden Replication

The more SQL Servers I see, the more scared I get.  There are a staggering number of ways that your server can sneak around behind you and … well, let’s just say you shouldn’t drop the soap in the datacenter.

As part of my health checks, I’m used to verifying that full backups are done regularly, but that’s not enough.

Suspicious Cat, Angry Birds

If your database is in the full recovery model, then it needs transaction log backups to clear out the transaction log.  Full backups by themselves don’t free up log space – only transaction log backups do.  (This explains why people are still searching the web for how to do BACKUP LOG WITH TRUNCATE_ONLY in SQL 2008 – they need to clear out log space without waiting around for a ginormous log backup.)
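If you don’t already have log backups scheduled, a minimal sketch looks something like this – the database name and backup path are placeholders, and in real life you’d wrap this in an Agent job or your maintenance solution:

-- Minimal sketch: SalesDB and the path are placeholders.
BACKUP LOG SalesDB
    TO DISK = N'D:\Backups\SalesDB_log.trn'
    WITH CHECKSUM;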

However, I ran into an interesting surprise recently at a client.  The full backups were working great, the transaction log backups were working great, so the DBA team thought everything was fine.

Except that we found a 16GB database with a 160GB log file.

After the disbelief and panic died down, we ran a simple query to see the problem:

SELECT name, log_reuse_wait_desc FROM sys.databases

By querying sys.databases, we can get all kinds of nifty information about database status.  The log_reuse_wait_desc column shows why SQL Server can’t reuse the existing space in the transaction log and must continue growing it out – in our case, to 10x the data size.  Some possibilities include:

  • NOTHING – you’re in good shape
  • LOG_BACKUP – you need to take a log backup (and preferably schedule it to happen regularly in a way that matches your recovery point objective)
  • DATABASE_MIRRORING, REPLICATION, and AVAILABILITY_REPLICA – one of your mirrors or replicas is down, and SQL Server is holding on to log data until the replica comes back up

In our case, we saw REPLICATION – which was quite a surprise since the company wasn’t using any SQL Server replication – or so they thought.  Turned out that somebody had played around with replication once and tore it down, but didn’t rip out all of the parts.  SQL Server was patiently waiting for a dead server to come back online and catch up.

Whoops.
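If you find yourself in the same boat – replication long gone, but log_reuse_wait_desc still says REPLICATION – one option is sp_removedbreplication, which strips the leftover replication metadata out of the database. Treat this as a sketch (the database name is a placeholder), and test it somewhere safe first:

-- Removes ALL replication objects and settings from the database.
EXEC sys.sp_removedbreplication @dbname = N'SalesDB';

-- Then check whether the log can clear again:
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'SalesDB';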

The Moral of the Story

Just because you’ve got the right jobs in place and the jobs are running successfully doesn’t mean they’re doing everything you expect.  Yes, transaction log backups are supposed to let SQL Server reuse log space, but there are a lot of prerequisites for that to work.

The more you monitor, the more you can prevent these kinds of problems.  The answer isn’t to build your own monitoring tool and continue to add checks like this one by one AFTER they’ve jumped you in the shower.  The answer is to reuse wheels that other people have already invented that have extensive checks built in, and learn from other people’s mistakes.

Open Question and Answer Webcast Recording

It’s like a can of mixed nuts, except – actually, that’s pretty much exactly what it is.  In this extended video from yesterday’s Tech Triage Tuesday webcast, we talk TempDB, fragmentation, CXPACKET, resumes, and more:

Who’s taking your backups? No, really – who’s taking your backups?

Who in your company is responsible for database backups?

It’s Me

Fantastic! Check your backup job. Did it run successfully the last time it was supposed to? Are the files where they are supposed to be? Have you checked the integrity of the backup files by restoring them to a secondary server? If you are in the full recovery model, are you taking both full and transaction log backups?
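If you want a quick way to answer the first couple of those questions, msdb keeps backup history. Here’s a rough starting point – adjust it to taste:

-- Last full ('D') and log ('L') backup per database, from msdb history.
SELECT  d.name,
        d.recovery_model_desc,
        MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full_backup,
        MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b ON b.database_name = d.name
WHERE d.name <> 'tempdb'
GROUP BY d.name, d.recovery_model_desc
ORDER BY d.name;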

One of these Storm Troopers is the guilty party.

It’s Someone Else

Great! Who? I’d like you to walk over to that person’s desk or call them, and ask them the questions listed above. Do they know the backups ran? Do they know where they are stored? Have they tested them?

You’re going to sound like you’re questioning their ability to do their job. You’re going to sound paranoid. You’re going to sound suspicious. Good. You should. This is a task that should regularly be reviewed and challenged, to make sure the processes in place are optimal for your current environment and business needs.

I Don’t Know

If this is your answer, now is the time to panic. The business’s data is the business. A task as important as backups should not be left to chance. This should be a well-documented process with everyone knowing which department or person the responsibility falls to.

If you do a bit of research and find out that backups aren’t being taken, or the job has been failing, or you’re running out of space on servers, consider this an opportunity – not a problem. You get to put on your superdatabasehero cape. Determine which backups need to be taken, and when. Come up with a plan. Present it to your boss. Execute the backups. Monitor the jobs. Restore the databases. Document your process. Make sure people in the company – especially business stakeholders – know that the data is protected, and by whom.

I need to know more about backups!

You’re in luck! The Brent Ozar Unlimited crew has been advocating for backups for years. Check out Brent’s post on Backup Best Practices. Watch Jeremiah’s Backup to Basics video. Read How to Test Your Backup Strategy by Kendra. Then, check out my Backup & Recovery, Step by Step 115-minute training. You’ll be smarter, and you could be saving your company from a huge disaster.

The Top Five Signs You Need a Break from SQL Server

Even when you love your job, you can get too much of a good thing. How do you know when your love of technology is taking you too far away from reality?

Take our quiz to diagnose your mental state.

Our Quiz: Diagnose if Your Inner Geek is Out of Control

Have you found yourself doing any of the following? Assign yourself 1 point for each positive answer.

5. Your voicemail greeting says, “The DBA was not found or was not accessible. Verify that the name is correct and that the DBA is configured to allow remote connections.”

4. You answer a user complaint with the statement, “You have been chosen as the deadlock victim, please retry your transaction.”

3. Your boss asks you to redo work for a coworker. You say, “Your ROLLBACK request has no corresponding BEGIN TRANSACTION.”

2. Your email signature contains the phrase “The DBA is waiting for an internal operation to complete. If you encounter this problem regularly, report it to Microsoft.”

1. When asked to do anything unpleasant, you answer: “A severe error occurred on the current command. The results, if any, should be discarded.”

How to Get Help

If you scored even a single point on our quiz, it’s time to seek help. As a first step back to reality, try looking at pictures of cute animals.

We’re Taking sp_BlitzUpdate Out to the Farm

Last year at the PASS Summit, one of my favorite moments in my Blitz session was showing off my new equivalent of Windows Update for stored procedures.  I demoed how to use OPENROWSET to connect your SQL Server to the cloud, fetch an updated version of our sp_Blitz® stored procedure, and apply the update automatically.

I knew it was risky and I specifically told the audience repeatedly, “You shouldn’t use this in production.”  It was chock full of nuts security risks.  You’re connecting to a complete stranger’s server and running the code you find.  I could be dropping your databases, and you wouldn’t know until it was too late.  Even worse, my SQL Server could get hacked and a malicious user could replace my tasty sp_Blitz® code with sp_HackTheServer code and trash your stuff.

It was a lot of fun while it lasted, and nobody got hacked – but it’s time to quit while we’re ahead.  This is really more of a security risk than we’re comfortable taking with your SQL Servers; after all, you trust us to take care of you.  We decided to shut down the update service and let everybody get their updates the old-fashioned way – through our Blitz page.

That’s not to say I don’t have tricks up my sleeve for this Summit, though.  One of them is up to you: vote for my Enterprise DBA: Bob Dylan Explains TempDB lightning talk.  Today’s the last day for voting, and I’ve been practicing my impression.  If you wanna see my next costumed session, vote now!

Potential Problems with Partitioning

Getting started with table partitioning is difficult; there are many decisions to make after you’ve decided to partition data. Correctly configuring partitioning is critical to the long term performance and stability of the database from both a querying and data modification standpoint. There are three key considerations that database architects should devote time to before embarking on a partitioning implementation: the design of the partitioning key, a thorough study of querying practices, and an examination of data modification practices.

Partitioning Key

Keys are important

A critical component of database partitioning is the partitioning key; the partitioning key is used to identify the location of data in the partitioned table. The partitioning key is also used during query execution to avoid table scans by eliminating partitions. The difficulty comes in designing a good partitioning key.

SQL Server’s table partitioning only allows for a single column to be used as a partitioning key, but this is not as limiting as you might think. Table partitioning is not limited in the type of column that is used; it is possible to use a computed column for the partitioning key. The advantage of using a computed column is that we can combine multiple columns and create a “smart key” to effectively identify a row’s partition. A smart key is a key that naturally occurs in your database; it could be something similar to the VIN of a car or it could be a combination of a date and a warehousing code (e.g. “ASDF:20120810”). While many database architects discourage the use of natural keys (and thus smart keys) in OLTP systems, natural keys play an important role in table partitioning because they can provide an easy way to split data into logically discrete chunks.
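As a hypothetical sketch of that idea – all of the object names, data types, and boundary values below are invented – a persisted computed column can combine a warehouse code and a date into a smart key and then serve as the partitioning key:

-- Sketch only: names and boundaries are placeholders.
CREATE PARTITION FUNCTION pf_WarehouseMonth (char(13))
    AS RANGE RIGHT FOR VALUES ('ASDF:20120801', 'ASDF:20120901');

CREATE PARTITION SCHEME ps_WarehouseMonth
    AS PARTITION pf_WarehouseMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.WarehouseActivity
(
    WarehouseCode char(4) NOT NULL,
    ActivityDate  date    NOT NULL,
    -- Persisted computed "smart key", e.g. ASDF:20120810
    PartitionKey  AS CONVERT(char(13),
        WarehouseCode + ':' + CONVERT(char(8), ActivityDate, 112)) PERSISTED NOT NULL,
    Quantity      int NOT NULL
) ON ps_WarehouseMonth (PartitionKey);

Whether RANGE RIGHT or RANGE LEFT fits depends on how you want rows on the boundary values to fall, so confirm that choice against your own data.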

The design of the partitioning key is also important because once data has been partitioned to disk, the only way to change the partitioning scheme is to move the data into new tables. A significant amount of scripting and automation typically builds up around partitioned data, so a change to the partitioning scheme can mean a large amount of development work as well as a large amount of I/O.

Querying Practices

The most important consideration for designing the partitioning key comes from how data is queried. A simple scan of sys.sql_modules, combined with querying sys.dm_exec_cached_plans and sys.dm_exec_sql_text, will reveal commonly used search criteria. If all queries feature a common set of search criteria (say, sales territory and date), then it becomes easy to select a partitioning key.
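For the plan cache side of that survey, something along these lines works as a rough starting point (the table name filter is just a placeholder):

-- Most-reused cached statements that mention a given table.
SELECT TOP (50)
        cp.usecounts,
        st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%WarehouseActivity%'   -- placeholder table name
ORDER BY cp.usecounts DESC;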

Understanding how data is queried is critical for the success of table partitioning. If queries do not explicitly query on something that amounts to the partitioning key, it will be important to re-write queries so that they will use the partitioning key. This will ensure that the smallest amount of data is touched during any single query’s execution. When using a computed column to create a partitioning key it may be necessary to update queries to include a search on the partitioning key through dynamic SQL or by generating the queries in the application tier.

It may not be possible to rewrite all queries to use the partitioning key. This is normal – when aggregating data across the entire business, it may not be desirable to confine queries to a single partition. While large queries are unavoidable, it’s best to confine queries to the smallest number of partitions possible to reduce I/O and potentially increase the accuracy of optimizations. It may even be necessary, in some cases, to force partition elimination using the $partition function.
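Continuing the invented names from the earlier sketch, forcing partition elimination with $PARTITION looks roughly like this:

-- Only touch the partition containing this smart key value.
SELECT SUM(Quantity)
FROM dbo.WarehouseActivity
WHERE $PARTITION.pf_WarehouseMonth(PartitionKey)
        = $PARTITION.pf_WarehouseMonth('ASDF:20120810');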

Data Modification

Finally, special consideration should be given to how data is modified in the database. OLTP applications typically modify a small number of rows in a single transaction. Data warehouses, on the other hand, modify large amounts of data at once. Although these different data modification patterns do not directly change how data is partitioned, they should influence the decisions around partitioning. For instance, partitioning is typically used for data warehousing because a partition swap is only a metadata change – which means an entire partition can be reloaded and index fragmentation can be avoided.

The data warehouse scenario is the easiest approach to partitioning – a new partition is loaded and appended to the table, existing partitions are modified outside the current table, and the newly loaded partitions are swapped into place. For OLTP systems, partitioning may still be used as a way to spread write load across multiple drives.
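A sketch of that swap, again using the invented names from above: load the new data into a staging table with an identical structure (and a check constraint matching the target partition’s range), then switch it in as a metadata-only operation.

-- Staging table lives on the same filegroup and matches the target structure.
ALTER TABLE dbo.WarehouseActivity_Staging
    SWITCH TO dbo.WarehouseActivity PARTITION 3;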

Conclusions

Table partitioning is not a project that should be entered into lightly – it is a major structural change for the database and a major software change for database developers.

Before you start designing a table partitioning strategy, or if you’re trying to troubleshoot why your partitioned tables aren’t working as fast as you expect, check out our SQL Server table partitioning resources page.

Webcast Video: Sizing Up the TempDB Database – Best Practices

You may have heard that it’s a best practice to use multiple files in SQL Server’s TempDB database – but do you understand why this is the case? Do you know how to tell when you have enough TempDB data files, and when you should add files to improve performance? In this free 30 minute webcast, Microsoft Certified Master Kendra Little will explain common causes of slow performance in TempDB and steps you can take to speed up your SQL Server. If you have one year of experience with database administration, you’ll leave the webcast armed with a new understanding of TempDB.
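If the webcast convinces you to add files, the statement itself is simple – this is only a sketch, since file counts, sizes, and paths depend entirely on your server:

-- Add a second TempDB data file, pre-sized to match the first.
ALTER DATABASE tempdb
    ADD FILE (NAME = N'tempdev2',
              FILENAME = N'T:\TempDB\tempdev2.ndf',
              SIZE = 4096MB,
              FILEGROWTH = 512MB);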

Want a rundown on the links in the video? Scroll down to the bottom of the post.

Helpful Links on TempDB

Change Management: The Question You’re Not Asking

I work with clients with a variety of change management practices— everything from, “I just email the team when I’m going to change something” to “I need to submit formal changes each Tuesday by 2pm for weekly review for the next approved change window.” And occasionally, “Just Do It.”

Don’t cry over this: plan for it.

There’s no single right way to do change management. There are some really basic standards: yes, you should be checking in your code! But different teams communicate, propose, and approve changes in their own way.

When planning changes, successful teams typically cover these questions in one way or another:

  • What are you doing?
  • Why are you doing it?
  • What steps will you be doing?

Sometimes people even include, “How do you undo the change?” I’m a huge fan of that— if there’s no way to undo the change, that’s important to know! It’s also important to know when it’s dead simple to undo the change.

There’s one question which almost nobody asks, however, and it’s one of the most important questions.

What’s the Worst Thing That Could Happen?

With every change you do in production, ask what realistically is the worst case scenario. We’re not talking about space trash striking the datacenter and causing company-wide failure; we’re asking the question, “What specifically would fail if this change went really bad?”

This is a simple question to ask, but you learn a lot from answering it. It’s a much more informative question than asking something like “is this low, medium, or high risk?”

Why this Question Saves Your Business

The “worst case” question gives you insight about how you run your business. It shows you what levels of risk you regularly take. It also shows you how well your team understands the possible impacts of changes.

Asking this question also builds your teams in an interesting way. When I first started tracking the “worst case” scenario for changes, I found that I needed to talk to my colleagues more. I could easily tell that a change might produce incorrect data in a given table or cause a problem in a given area of data processing, but I wasn’t sure exactly what that might mean to the end users. To work out that question I needed to work with others. Sometimes I got information from our developers, sometimes from our level 2 customer support team, and sometimes I worked out the information with business users.

Great Change Management Builds Documentation the Easy Way

Asking “what’s the worst thing that could happen?” was a natural and easy way for my team to document the role of our database servers and points of impact on the business when they went offline. As we collected this information in changes we also documented it in our troubleshooting guides.

My team went from having a vague idea of the level of risk of changes to having a clear, documented list of the roles of all our databases. We worked in an extremely active environment and it was always safe to assume we didn’t have a complete list of impacts: we had to keep communicating. But this added a huge efficiency to planning changes, responding to incidents, and also demonstrating the value of our investment in data management.

Don’t wait for Management to Demand Better Change Management

Sure, improving change management can be a great win for managers. But if you’re reading this and you’re a database administrator, improving change management can be an even bigger win for you.

If your team’s change management can be improved, you have a great opportunity. Start with your own changes: try out a new method of documenting your changes and getting them approved. Use it to iteratively improve your documentation of the system. Show your management what you’ve been doing and the value it provides, and soon you’ll see the process take off. Know what you just did? You just contributed to your business in a huge way.

No Substitute for Time, Experience, and Training

I’m a runner. I wasn’t always a runner. Three years ago, I couldn’t run a mile. I wanted to, but I wasn’t physically capable. So I started training with the Couch to 5K plan. Eventually, I ran a 5K – I did it! – even though I was a mess. I didn’t wear the best clothing. I ran slowly. I struggled to finish. I hurt for days afterward.

left: 5K #1 – bad hair, bad shirt, bad shoes.
right: 26.2 #1 – big smile, big medal

I’ve now run 10Ks, half marathons, and a marathon. I have moisture-wicking clothes. I’ve been doing speed training to cut down my time. I know how to pace myself so I can finish strong, and fast. I know what to stretch and ice afterwards to prevent soreness. How did I learn that those things would help me? I followed a training plan. I ran more. I ran longer distances. I ran faster. I tried new clothing. I talked to other runners. I stepped outside of my comfort zone.

I’m a SQL Server consultant and MCP, also. I wasn’t always a consultant, or an MCP. I started in IT on the help desk. After taking a SQL class, I volunteered to help the DBA write reports. I realized I really liked working with SQL Server, and that is what I wanted to do. I wasn’t good at SQL Server right away. I didn’t know the difference between backup recovery models. I couldn’t explain how clustered and nonclustered indexes worked. I didn’t know what a filegroup was.

Eventually, I was promoted to DBA. Now, I’m a consultant. I give presentations on Reporting Services and filegroups. I’ve put clustered SQL Server instances into production. I’m writing a book chapter on automating administration. How did I learn those things? Over the years, I’ve tried new things. I’ve read books and blogs. I’ve attended webcasts and presentations.  I know far more about SQL Server than I used to – enough to help other people!

What is the connection between being a better runner and being a better SQL Server consultant? Both have taken time. Both have required me to try things – I’ve failed at some and succeeded at others, but learned from all of them – thus gaining experience. Both required training and dedication.

Create A SQL Server Training Plan

To improve your skills with SQL Server, possibly leading to a promotion or a new job or a speaking gig you really want, you need a training plan. Yes, a plan. One that has dates and actions. My current half marathon training plan is from Hal Higdon’s website.

First, identify what you want to accomplish. I did not set out wanting to run a marathon. “I want to learn everything about SQL Server” is a big statement, and nearly impossible. It is too big of a product for one person to know everything. You will get lost and discouraged trying to do that. I set one goal at a time. I want to run my next half marathon in less than 2 hours and 10 minutes. I want to give an hour-long presentation on automating SQL Server administration. I want to pass the 70-461 Querying Microsoft SQL Server 2012 exam.

Second, identify your preferred training method. I love long runs – Saturday mornings are my time to run for hours and hours. You have many options. There are books and blogs you can read, webcasts you can watch, podcasts you can listen to, and in-person events to attend like user group meetings, SQL Saturdays, and PASS Summit. (Shameless plug: Brent Ozar Unlimited® has a ton of great blogs, a free weekly webcast, and training classes.)

Third, find something outside your comfort zone. I hate speed work. I’ve tried fartleks, intervals, hills, and track workouts, and I dislike all of them equally. However, speed work has helped me get faster. Doing something outside of your comfort zone increases your skillset – and you may find something you enjoy! Answer forum questions. Create a presentation to give for a user group or virtual chapter. Try writing an instructional blog.

Fourth, create your training plan. I’ve already got my half marathon training plan printed and hung on my corkboard. Decide what book you are going to read or what set of webcasts you’re going to watch. Then – and this is the part people usually forget – set aside the time. Block out time on your calendar. Keep a book in plain sight. Every morning, when I wake up, I spend 15-30 minutes reading while I drink my first cup of coffee, before I go for a run or hit the weights. I just finished “Troubleshooting SQL Server – A Guide for the Accidental DBA” and now I’m working on “On Writing Well”. I have time blocked off on my calendar weekly to watch a PASS Summit presentation and a TED Talk. I go to a user group meeting monthly. These are things I have worked into my schedule because learning – training – is important to me.

Fifth, and most importantly, execute. Some days, I don’t even want to lace up my shoes. But I do. Some days you won’t feel like reading a book chapter, or writing even one page in Word. But you need to. If you don’t do it on the tough days, you won’t get where you want to be.

Training is Important

I’ve learned that no matter how much I want to be good at something, it doesn’t happen automatically. I have to invest time into preparing and training, whether that is by making time to run 4 or 5 days a week, or setting aside time to read blogs and try new SQL Server features. There is no substitute for hands-on experience. I have gotten better as a runner by running different roads, different distances, and different paces. I have grown as a SQL Server professional by working with different versions, trying new ways of writing queries, and investigating how things work.

To grow in your career, understand that there is no substitute for time and experience. Identify what you want to learn, determine how you learn best, set aside the time, and train!

SQL Server DBA Interview Q&A Part 2: The Answers

Last week, we asked you the questions, and this week we’ll talk about the answers.  We’ll discuss what we look for in a candidate – not just the written answers, but also the way they answer questions.  You’ll learn why you shouldn’t bluff, and why you should never, ever rate yourself a 10 on a 10-point scale.  We’ll even share our favorite questions that candidates asked US during the interviews in this 30-minute session.

Want more interview questions? Check out some of our past posts:

Developer, Sysadmin, and SQL Interview Questions

Good Tips for Interviewing Database Administrators

  • How to Hire Top Talent – First of all, you’re not the only one looking.  But guess what? You’re probably looking for the wrong person, and you’re almost certainly asking the wrong questions.
  • The Myth of the Perfectly Qualified Candidate – companies look for someone with exactly the right amount of experience – but what kind of employee has it?  I explain.
  • (SQL Server) Size Matters – how to determine if a DBA’s experience is right for your company.
  • Tips for Hiring a DBA – my thoughts after going through the interview process at a few companies.
  • Show Candidates Their Work Areas – and watch their reactions.  It’ll help you gauge if they’re a good fit for you, plus give you insight about your environment.

Interview Tips for Candidates
