Blog

My Easy New Year’s Resolutions (Guest Post)

Hi!  I’m your SQL Server.  I know you don’t usually listen to me, so I’ve decided to take a drastic step and find you where you spend all your time – Brent’s blog.  Seriously, you need to spend less time on the intertubes and more time on the error logs.  You’re lucky this post is about me, not about you, or else I’d also have to divulge the fact that you installed Adobe Flash on my desktop.  Whoops, I just did.

Resolution 1: Keep in Touch with Loved Ones

Bad things have been happening to me all year long, but I’ve been keeping it secret from you.  It’s time you and I were more honest with each other.  To do this, I want you to write your name in my heart – and by that I mean, set yourself up as my operator.  The below script sets you up as an operator and sends you alert emails whenever bad things happen.  Before you run it, make these two changes:

  • Change YourEmailAddress@Hotmail.com to your actual email
  • Change 8005551212@cingularme.com to your actual phone’s email address (look up your provider)
USE [msdb]
GO
EXEC msdb.dbo.sp_add_operator @name = N'The Database Administrator',
    @enabled = 1, @weekday_pager_start_time = 0,
    @weekday_pager_end_time = 235959, @saturday_pager_start_time = 0,
    @saturday_pager_end_time = 235959, @sunday_pager_start_time = 0,
    @sunday_pager_end_time = 235959, @pager_days = 127,
    @email_address = N'YourEmailAddress@Hotmail.com',
    @pager_address = N'8005551212@cingularme.com',
    @category_name = N'[Uncategorized]'
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 016', @message_id = 0,
    @severity = 16, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 016',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 017', @message_id = 0,
    @severity = 17, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 017',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 018', @message_id = 0,
    @severity = 18, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 018',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 019', @message_id = 0,
    @severity = 19, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 019',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 020', @message_id = 0,
    @severity = 20, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 020',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 021', @message_id = 0,
    @severity = 21, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 021',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 022', @message_id = 0,
    @severity = 22, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 022',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 023', @message_id = 0,
    @severity = 23, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 023',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 024', @message_id = 0,
    @severity = 24, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 024',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO
EXEC msdb.dbo.sp_add_alert @name = N'Severity 025', @message_id = 0,
    @severity = 25, @enabled = 1, @delay_between_responses = 60,
    @include_event_description_in = 1,
    @job_id = N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name = N'Severity 025',
    @operator_name = N'The Database Administrator', @notification_method = 7
GO

Resolution 2: Do a Data Detox Diet

Over the years, I’ve eaten a lot of bad stuff.  It’s time to do a little cleansing ritual to purge myself.  People have dumped in heaps of temporary backup tables that never got queried again.  This query looks for heaps (tables without clustered indexes) that haven’t been accessed since the last server restart (or database restore/attach).  Run it in my largest or most important databases:

SELECT SCHEMA_NAME(o.schema_id) AS [schema]
,OBJECT_NAME(i.object_id) AS [table]
,p.rows
,'EXEC sp_rename ''[' + SCHEMA_NAME(o.schema_id) + '].[' + OBJECT_NAME(i.object_id) + ']'', ''[' + SCHEMA_NAME(o.schema_id) + '].[_ToBeDropped_' + OBJECT_NAME(i.object_id) + ']''' AS ScriptToRename
,'DROP TABLE ''[' + SCHEMA_NAME(o.schema_id) + '].[' + OBJECT_NAME(i.object_id) + ']''' AS ScriptToDrop
FROM sys.indexes i
INNER JOIN sys.objects o ON i.object_id = o.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE i.type_desc = 'HEAP'
AND COALESCE(ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates, ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup) IS NULL
ORDER BY rows desc

These heaps are slowing down my backup times, my DBCC times, and my 100 meter sprint times.  We can make some quick judgment calls based on table names – if they’re names like Sales_Backup2009BeforeDeployment, it’s probably a table someone made once as a backup – and then forgot about it.  We could either rename them temporarily and then drop ‘em in a few days, or just drop ‘em outright.  Standard warning about deleting data, blah blah blah.  (Hey, my resolution wasn’t to be more cautious.)

Resolution 3: Lose Waits

Users love me, and they show it to me by feeding me huge dinners of data. I love my life, don’t get me wrong, but sometimes I see those new guys with their solid state drives and their 512GB of memory, and I think, wow, they must be able to handle anything. I’ll never be that well-endowed, but there’s still some easy things I can do to get in better shape. Right now, I’m in heart attack territory, and it’s time to fix that.

I’ll start by working out with Brent Ozar and Buck Woody, watching this video where Brent is dressed up like Richard Simmons. It’s an oldie but a goodie, and it’s better than being seen in public with a ShakeWeight.

Now, weren’t those resolutions easy?  What, we’re not done yet?  You haven’t run the scripts?  What the hell, you lazy meatbag?  I can’t pull this off by myself – I need you to do ‘em.  What do you think this is, the cloud?  And wasn’t one of your resolutions this year to make sure the boss doesn’t think the cloud is better than us?  Hop to it before we lose our jobs.

SQL Server High Availability and Disaster Recovery Basics Webcast

If your boss wants you to make your SQL Server more reliable, there’s a lot of confusing options. I’ll walk you through the pros and cons of each method and show you how they can help make queries go faster too. We’ll finish with a list of books and blogs I’d recommend for each option.

This 30-minute session is for production DBAs who haven’t implemented clustering, replication, or log shipping before, and aren’t sure where to start.

HA-and-DR-worksheetChoosing the Right Technology with Our Worksheet

Should you use a cluster, replication, or AlwaysOn Availability Groups? How often should you do backups? Figure it all out with our HA/DR planning PDF worksheet.

Clustering Resources

Clustering had a bad reputation for being difficult to configure and manage, but Microsoft’s made dramatic improvements in recent versions of both Windows and SQL Server.  Whether you’re just getting started with clustering for the first time, or you still think clusters require heartbeat networks, your first resource should be Pro SQL Server 2008 Clustering by Allan Hirt (@SQLHA).

Database Mirroring Resources

For the first couple of years that I used Database Mirroring, I thought it was brain-dead-simple: just Next-Next-Next your way through a wizard, and you’re in business.  I couldn’t imagine how anyone could write an entire book about mirroring.  That, however, was before I found out about Pro SQL Server 2008 Mirroring, which I’d definitely recommend.  The book’s coauthors are Robert Davis (@SQLSoldier) and Ken Simmons (@KenSimmons).

For some quick free learning, check out Robert’s posts tagged with database mirroring.

Replication Resources

Replication’s been out for several SQL Server versions now, and it’s slowly changed with each version.  As a result, there’s a ton of good free info out there, but it’s not all up-to-date.  My favorite place to start is the Books Online topic for Designing and Implementing Replication, and then move on to:

Log Shipping Resources

Log shipping is more of a technique than a technology: it just means restoring your backups continuously on another server.  When database administrators first started doing this, they did it with their own scripts.  Third party backup compression tools like LiteSpeed started offering cool easy-to-manage user interfaces for it too.

These days, the log shipping and backup compression built into SQL Server 2008 R2 Standard Edition put log shipping within everybody’s reach.  The easiest place to get started is with the Books Online section on Log Shipping.  If you use a third party backup compression tool, poke around in their documentation too.

SAN/VM Replication Resources

These solutions are all vendor-specific, and unfortunately a lot of vendors lock their documentation behind support sites that require logins. However, there’s one very good place to start: Mike Laverick’s free ebook Administering VMware Site Recovery Manager 4.  This isn’t the most current information available, but don’t worry about that – just read Chapter 1 for an excellent vendor-neutral introduction to disaster recovery planning using SAN and VMware replication.

If your SAN happens to be covered in his book, skim through that chapter too just to get an idea of the difficulty level for these solutions.  These don’t have the ease-of-use of, say, SQL Server log shipping, but it’s one pane of glass to manage all of your disaster recovery needs regardless of the operating system and application.

For more learnin’, check out Veeam’s free whitepaper section.  Veeam is a vendor that sells their own virtualization backup and recovery software, and I’m not affiliated with ‘em or getting paid to say this, but I’m a big fan of vendors that give away free education to the community.

AlwaysOn Availability Groups Resources

This is a brand new feature of SQL Server 2012, so we don’t have a lot of good community resources out there yet.  There’s a few screenshot walkthroughs, but they don’t really tell you anything you can’t figure out in five minutes of playing around.  For now, the place to go is the Books Online section on Availability Enhancements.

No matter how you do HA or DR, you need backups.

There is no more critical task for a DBA than to back up the business data – except perhaps restoring it to ensure validity. A disaster – in many forms – can strike at any time. SQL Server offers many backup and restore options, to help you meet business requirements.

In Jes Schultz Borland’s 2-hour video class, you’ll learn about:

  • Simple, Full, and Bulk logged recovery models
  • Full, differential, and log backups
  • Restore options
  • Best practices

It always helps when you see something in action, so she’ll demonstrate

  • Full backups
  • Differential backups
  • Log backups
  • Restores

Buy it now.

2011 Yearly Link Roundup

This year we started publishing a weekly email newsletter (subscribe) – nothing deep, just each of us writing our 3-4 favorite links from the week.  We read a lot, and we thought you might like some of the same things we liked.  It’s not technology-specific, just interesting stuff.

To finish up 2011, here’s the most popular links from each of us:

Jeremiah’s Link History

Why how is boring and how why is awesome

Benjamin Pollack reveals the secret of great communication: focus on the why. People can figure out how to do something for themselves, telling the audience why your ideas can solve their problems will enage them and give them a reason to learn more. Read it!

Indexing for ORs

Most people write about indexing strategy for single predicate or multiple predicates with AND conditions, but very few people write about a bigger problem: indexing for OR. A lot of time and effort goes into developing great queries and indexing strategies that make most of your code run fast. Shouldn’t the rest of your code get the same treatment? Gail Shaw tackles the difficult problem of indexing for OR.

Get Your Learn On!

When I have to learn something fast, I don’t try to memorize as much as possible. Instead I form links between the concepts I’m learning and link them to analogs in other areas of expertise. I’ve found that I catch on faster when I’ve been able to do this vs memorizing detail.

Database Architect Cat Fight!

Mike Stonebraker is a polarizing guy in the world of databases. He worked on INGRES, POSTGRES, Vertica, and now VoltDB. In this article, Dr. Stonebraker and one Facebook’s database architects square off about the future of databases.

Leveling Up

Figuring out how to get to the next level in your career can be tricky. Jason Rudolph puts a different spin on the idea by saying “What experiences has a master benefited from that a novice has not?” The trick, of course, is to link that experience to concrete goals, like merit badges.

Database Anti-Pattern: The Queue

You can make a relational database do some crazy things. Some of those crazy things are incredibly cool. Some of them are incredibly bad. Using your database as a queue is not so smart. The article talks about MySQL, but the ideas apply everywhere.

Lock the doors!

Here’s a quick primer for when you need to start looking into security problems in databases. It’s not comprehensive, but it’s a great start into combining application and database security.

Partitioning vs Federation vs Sharding

Sharding isn’t magic, it’s traumatic.

How Long Will an Identity Last?

Ever wonder how long your identity column will last? Wonder no more. You can cut this number in half for SQL Server since there’s no way to use an unsigned integer, but it’s a good calculation to have.

Build a Better Index

Use these examples from history to build a better index.

Brent’s 2011 Highlights

“I will jiggle things randomly until they unbreak”

Linus Torvalds, the creator of Linux, bashes a programmer for a bad bug fix. We love this line, and it motivates us to do a better job of root cause analysis.

How to Troubleshoot Locking

Michael J. Swart brings you a simple, easy-to-use flow chart showing how to figure out if you’ve got locking – and how to react. Don’t just print it out and hang it on your wall – the flow chart is linkified to make your life even easier. This must be locking week, because Kendra Little also covered three ways to diagnose locking issues too.

Find Your Lost T-SQL Code

Ever accidentally closed a tab in SSMS without saving the T-SQL first? Or even worse, closed SSMS entirely? Grant Fritchey shows you how to get it back.

8-CPU, 2TB SQL Servers

You’re probably never going to need to configure one of these big, bad HP DL980 8-CPU servers with up to 2TB of memory, but just to be safe, you should read about how Microsoft says to configure it for SQL Server. The special notes about SSD configuration are especially amusing.

The FBI Might Take Your Servers

If you’re using a colo facility and your next-door-neighbor is doing something illegal, the FBI might just grab all the nearby servers to be safe. It happened to Instapaper. Does your DR plan cover that?

20 Movies Every Hacker Should See

I’m embarrassed to say I’ve only seen 9 of these movies.

Why Is My Query Faster in SSMS?

When you’ve got a query that runs slow in production, but fast in SSMS, this monster article from Erland Sommerskog will teach you everything you need to know. Query plans, caching, dynamic SQL, parameter sniffing – it’s all here.

Why MS Won’t Build SQL Server for Linux

Hal Berenson recently retired from Microsoft, and he takes a moment to look back at why the SQL Server 7.0 team didn’t build a version for *nix. They’re still valid today.

The Jobless Future

It’s amazing to think about the kinds of jobs that have disappeared as I’ve grown up. Whole categories are disappearing, and in the future, I really believe that in order to be successful, you’re going to have to be an entrepreneur, period.

Move Rows with One Statement

Need to move rows from a live table to an archive table with as little work as possible? Check out this cool trick by Merrill Aldrich.

When Does Join Order Matter?

Conor Cunningham, Microsoft’s query guru, explains in this oldie but goodie.

Make SSMS Awesome

Jes Borland shares her favorite Management Studio tweaks to script objects better, save changes easier, and filter large numbers of objects.

Convert Yourself Into a Designer

No matter what you’re doing for a living, Penelope Trunk says you need better visual communication skills if you’re going to take your career to the next level. I wholeheartedly agree.

Signs You Suck at Programming

This programmer’s equivalent of a Cosmo quiz asks tough questions about your code and your habits. Storing passwords in plaintext? Not validating your SQL strings? Uh oh…

SQL Indexing Strategies

Gail Shaw sums up everything you need to know about indexing with lots of links.

The Best Windows Utilities

Every year, Scott Hanselman puts out his Developer and Power Tools List and it’s always amazing. He knows all the good stuff. New tools are in green.

Be All the TempDB You Can Be

Aaron Bertrand brings you a list of user suggestions for improvements to SQL Server’s TempDB. This list a great read – it points out some super-sneaky gotchas.

Be Careful With Those Temp Tables

Temp Tables and stored procedures can be a great combination. There are some gotchas worth remembering— Jack Li shows you how altering temp tables in a stored procedure has important side effects to be aware of.

A Cheat Sheet for All the *{_(%#$] PowerShell Punctuation

Jes Shultz Borland brings you a handy PowerShell cheatsheet.

Shorter Transactions Aren’t Always Better

It’s easy to think that keeping individual statements in their own transaction is best for concurrent workloads. Amit Banerjee gives you a simple and compelling demo to show this isn’t always the case.

What Should I Learn Next?

Technology is fascinating because there’s always something new to learn. What should you learn next? Buck Woody gives you a menu of options.

The Past and Future of Famous Logos

Wow, these are right on. And that original Apple logo from 1976? It’s real.

Syntax Errors Ahead;

Do you terminate each of your TSQL statements with a semi-colon? You’ve got a good reason to start.

Extreme Negotiating

How do you negotiate when you are under stress? Here are some practical tips we can all apply.

Can You Peel a Head of Garlic in Less than 10 Seconds?

Yes, you can. Be a garlic genius.

Does the Optimizer Care about SQL Server’s Max Memory?

It cares a lot! This short blog gives interesting insights into memory settings and optimizer costing. Read on!

Break Out of All Those Ands and Ors

Many situations call for dynamic criteria. Instead of writing a giant query, make your queries faster by using dynamic search conditions.

Like These? Subscribe to the Newsletter

Every Monday morning, we send out our favorite links from last week.  We keep you up to date on tech news, SQL Server, and just interesting stuff.  We don’t spam you, and we don’t give your email address to others.  Come join us – subscribe now.

The 9 Letters That Get DBAs Fired

I’m proud to say that I’ve only lost one job in my life.  I was a grocery store bagger when the store came under pressure to increase their percentage of minority workers.  They laid off all of the Caucasian baggers, but I’m not bitter.  I actually loved bagging groceries, and if I hadn’t been laid off, I’d still be the guy asking you “Paper or plastic?”

I probably didn’t deserve to lose that job, but I’ve deserved to lose others.  I’ve just been lucky.  Through my career, I’ve seen a lot of database administrators go through some dicey situations that cost them their jobs.  Maybe not right away – but once these situations happen, it’s only a matter of time until management finds a legally defensible reason to swing the axe.  These situations often involve just nine letters.

The First Three Letters: RPO

Recovery Point Objective (RPO) measures how much data you’d lose if your server went down and you brought up the secondary server.  It’s measured in time, not megabytes.  The less time you have, the more frequently you have to get the data to a secondary location.

In theory, if you’re doing transaction log backups every 5 minutes to a network share, so you’ve got a 5-minute RPO.

In theory, “in theory” and “in practice” are the same.  In practice, they are different.

If you lit up the oxygen tank, the view would actually improve.

The Gremlin of Disaster

In practice, you’re unlucky, and your transaction log backups will fail before the engine itself does.  I know this because it’s happened to me and my clients.  In one case, the volume for backups filled up and the transaction log backups stopped working.  Since the log backups didn’t run, the transaction logs didn’t clear, and they continued to grow.  The Gremlin of Disaster arrived, sowing corruption in his wake.  By the time the dust cleared, we’d lost over an hour of data – permanently.

If you’re backing up to a local drive and the RAID controller fails, check out this helpful article.  You’ll be needing it, because with a failed RAID controller, you can’t access any of the data on the local drives.  Before you try pulling the hard drives out of the failed server and putting them into another server, check out this article.  After all, you’re gambling – gambling that the RAID controller firmware matches between servers and that it’ll be able to read and rebuild the newly inserted drives.  Guess wrong, and it will format the drives, thereby erasing your backups.

Savvy readers will detect a sense of acerbic humor in the post so far – even more biting than my usual pieces.  Brace yourself, because it’s going to get worse.

The Next Three Letters: RTO

Recovery Time Objective is how much time you’ve got to get your server’s act together.  Some companies measure it from the time you learn about the problem, some measure when the service is first unavailable.

Beep beep, back that thing up. Wait, on second thought...

The Pinto of Peril

The clock’s starting point matters a lot.  When things go wrong, SQL Server Agent might just email the failsafe operator – if you’ve gone to the trouble of setting one up and configuring alerting.  Even then, you probably won’t notice the email because you’ll be drunk at the pub having a nutritious dinner with your lovely family.  An hour later, when you realize that your cell phone is set on vibrate, you’ll panic, run to the car, fire up your laptop, and try connecting over your phone’s 1-bar internet connection.  Should we start the clock when the server started failing, when the email was sent to you, when you saw the email, or when you finally connected to the VPN?  If the business wants to start the clock when the server stops, you have to account for the meatbag reaction time, and with most of us, there’s a pretty long delay.  (If you’re the kind of DBA who sets up rules to move automated alerts into a separate email folder, you’re off to a bad start.)

If our idea of disaster protection is restoring the databases onto another server, we might assume that the RTO just needs to cover the length of time to do a full restore.  Not so fast, Speedy McPinto – here’s a few things you’ll need to do to get the server online:

  • Provision enough space to do the restore – do you know how much space you need on each drive?
  • Make sure it’s on the right version/build of SQL Server – sometimes we have to install service packs, cumulative updates, or hotfixes to get production to work right.  There’s no guarantee our standby box has those same binaries.
  • Get the sp_configure settings right – if production had a certain maxdop, for example, it might fail terribly on the new hardware.
  • Get security working correctly – you did script out the logins and passwords, didn’t you?
  • Configure TempDB correctly – I’ve seen people restore production databases onto dev environments where TempDB was just configured with a few gigabytes of storage, and it instantly started growing when handling production load.  TempDB ran out of space, and the server went Pete Tong.
  • Restore the database – and hope to hell it restores correctly the very first time.

The less time you have, the more tasks you have to perform ahead of time, and the more you have to document, test, and rehearse the steps you’re going to perform at the moment of disaster.

The Most Important Three Letters: CYA

Right now, all of your users believe your RPO and RTO is zero.  They believe you’ve got everything under control, and that when disaster strikes, you’re going to instantly bring up another server with no data loss whatsoever.  We both know the truth, though – you spend your time at work checking blogs, not backups, so we’ve got some work to do.

Start with a statement of fact that avoids blame: “Right now, if we lost the production system at 3:45 PM, it would take me at least four hours to bring another copy online, and that copy would have lost all data between 3PM and 3:45PM.  Is that acceptable?”

They’re going to say, “ZOMG, NO WAY!!!1!  IT HAS TO BE BETTER THAN THAT!”

You say, “I thought you’d say that, because I’m uncomfortable with that too.”  (See what we did there?  We just put you on their side.)  “Here are our options to improve our RPO and RTO.”

To give them options, check out my free video on HA/DR Options: Clustering, Log Shipping, and Replication.  I cover the pros and cons of all your high availability and disaster recovery options including database mirroring, SAN replication, virtualization, and more.  I won’t show you how to implement each of those (hey, I’ve only got 30 minutes) but I’ll give you enough information to make a recommendation to your business users.

To learn more about backups in-depth, check out Backup & Recovery Step by Step.

What’s Going On in My SQL Server? Video

What do you turn to when you need to find out what’s happening in SQL Server? Do you know the pros and cons of using SQL Server Profiler? Kendra Little shows you two lightweight alternatives for collecting activity and bottlenecks from SQL Server. She’ll show you how these methods are even *more* powerful than a SQL Trace and how you can use them to solve problems like a hero.

This free webcast is useful for DBAs and Database Developers who have one year of experience with SQL Server.

Want a list of links and references from the video? Check out the bottom of this post!

Links from the Video and More

In the video, I show using powershell to generate activity in SQL Server. The commands I ran are very similar to the ones I blogged about here.

The sp_whoisactive stored procedure is written by Adam Machanic (blog | twitter). In the video, I describe setting up a procedure to loop and log activity to a table. First you want to get comfortable with sp_whoisactive, start looking at the output, and understand what parameters are right for *your* environment. Then you want to set things up so you can loop and store data. Here are the links to get you going:

To log interval samples from sys.dm_os_wait_stats, I excerpted the list of waits to exclude from Glenn Berry’s (blog | twitter) published SQL Server DMV queries.

  • Glenn maintains versions of the queries for different versions of SQL Server. Find the right one for you using our shortlink http://BrentOzar.com/go/Glenn
  • I will be blogging the sample code I used in the video shortly. I have enough to say about how you may choose to adapt the code that it is worthy of its own post. Check back soon.

Thanks to everyone for watching!

Silent CPU Killers

It’s always important to make sure that SQL Server is running at peak efficiency – nobody wants to waste resources. In the past we’ve concentrated on ways to reduce I/O – either network I/O or disk I/O. The increasing number of cores available in modern CPUs meant that CPU tuning wasn’t as much of a concern. With the licensing changes coming in SQL Server 2012, tuning SQL Server for CPU should be in front of every DBA and developer’s mind.

Old or Missing Statistics

Statistics help SQL Server determine the best way to access data. When we have accurate statistics, the database is able to examine the statistics and make good guesses about how much data will be returned by a given query. The statistics are then used to determine the most appropriate join or data access method (seek vs scan, for instance). The problem with statistics is that they’re only good for a short period of time. Statistics are recorded as a histogram that describes how data is distributed in a table or index. If the histogram is inaccurate, then SQL Server has no reliable way of knowing how much data is present.

Looking at the AdventureWorks sample database, we can see there’s an index on the SalesOrderHeader table IX_SalesOrderHeader_OrderDate. If we use DBCC SHOW_STATISTICS to look at the index, we can see that the most recent valued for RANGE_HI_KEY is ‘2008-07-31 00:00:00’. What happens when we query for data that’s outside of that range? SQL Server assumes that there’s only 1 row present no matter how many rows there are. There could be 0 rows, 5 rows, or 500,000 rows. SQL Server doesn’t have any statistics about the data, so it cannot make an accurate decision.

Old, or missing, statistics can be cause huge performance problems because SQL Server will have to make a bad guess (1 row) regardless of how much data is present. SQL Server may choose an inappropriate join type and perform a large number of additional reads; a nested loop join may be used when a merge or hash join would be more appropriate.

The solution is to make sure that statistics are kept up to date.

  1. Understand data change rates. If data is loaded as part of regular jobs, make sure that index maintenance is performed on a regular basis and that statistics are updated during index maintenance.
  2. Understand the type of data change. If data is written at the end of table, there’s a chance its data won’t be included in statistics (especially in the case of something like an order date). If a lot of data coming into the system is time sensitive, regular statistics updates will be necessary to keep things performing well.
  3. Make sure that AUTO_UPDATE_STATISTICS is enabled for all databases. Even if your jobs to update statistics fail to run, having AUTO_UPDATE_STATISTICS enabled will make sure that SQL Server updates your data as it changes. Statistics will not get updated as often as we’d like, but it is better than nothing.

Ordering Data

The ORDER BY clause exists with good reason, right? Users don’t just need to get data back from the database, they want to get it back in a specific order. While that’s true, ordering data has a definite cost associated with it, if the data isn’t already in the correct order. Take a look at this query:

SELECT SalesOrderID, OrderDate, DueDate, ShipDate, SalesOrderNumber, PurchaseOrderNumber FROM Sales.SalesOrderHeader WHERE OrderDate = '2007-02-01 00:00:00.000' ORDER BY SalesOrderNumber ; 

When we look at the actual execution plan, sorting the results take up 79% of the cost of the query. While this is a simple query, it’s easy to see how sorting can chew up a lot of CPU in SQL Server.

There’s no easy solution. However, many programming languages have the ability to use functional programming techniques to sort data in memory. The .NET framework makes it even easier by providing datatypes like the DataTable and DataSet. Both of these datatypes make it easy to sort data in memory in an application server. The idea here is that CPU is cheap on a Windows Server – we only pay to license the server, not each socket or CPU core. It is also much easier to horizontally scale the middle tier of an application than it is to horizontally scale SQL Server.

Functions

Functions are one of the biggest offenders when it comes to wasting SQL Server’s CPU time. The tricky part about functions is that they seem like they an easy way to encapsulate and re-use functionality. While functions accomplish this goal, there are other aspects of functions that make them a bad choice for many activities.

Simple, single statement table valued functions can be easily inlined by SQL Server. That is to say SQL Server will be able to include the execution plan of the function in the calling execution plan. The function’s execution plan can be optimized with the rest of the query. This can potentially lead to additional optimizations within the function. Multi-statement table valued functions and scalar functions won’t be optimized much of the time. Instead SQL Server will repeatedly execute the underlying function – once for every row in the result set. Over very large result sets this can result in a considerable amount of CPU utilization. IT’s often best to replace functions like this with inlined query logic. For a look at the behavior of functions in SQL server, take a gander at SQL Server Functions: The Basics.

Another downside to functions in SQL Server is that they frequently don’t get good statistics from the optimizer. When a table valued function returns data, SQL Server has no way to know (at compile time) how much data will be returned from the function. Instead it makes the assumption that there is only 1 row present in the result set from the TVF. This can result in poor join choice as well as bad data lookup decisions, just as if there were incorrect or missing statistics on an index or table.

Summing It Up

Keeping statistics up to date, moving ordering out of the database, and eliminating costly functions will go a long way to keeping your CPU utilization low. There are, of course, many other ways to combat high CPU, but these are some of the biggest culprits and some of the items that can be most easily fixed by database administrators and developers alike.

sp_Blitz® Updated to v5

Happy holidays! Santa SQL comes bringing tidings of cheer with a new sp_Blitz® – but also doom and gloom about configuration problems in your servers.  Here’s some of the recent contributions from readers:

  • John Miner suggested tweaking checkid 48 and 56, the untrusted constraints and keys, to look for is_not_for_replication = 0 too.  This filters out constraints/keys that are only used for replication and don’t need to be trusted.
  • Ned Otter caught a bug in the URL for check 7, startup stored procs.
  • Scott (Anon) recommended using SUSER_SNAME(0x01) instead of ‘sa’ when checking for job ownership, database ownership, etc.
  • Martin Schmidt caught a bug in checkid 1 and contributed code to catch databases that had never been backed up.
  • Added parameter for @CheckProcedureCache.  When set to 0, we skip the checks that are typically the slowest on servers with lots of memory.  I’m defaulting this to 0 so more users can get results back faster.
  • Andreas Schubert caught a typo in the explanations for checks 15-17.
  • K. Brian Kelley added checkid 57 for SQL Agent jobs set to start automatically on startup.
  • Added parameter for @CheckUserDatabaseObjects.  When set to 0, we skip the checks that are typically the slowest on large servers, the user database schema checks for things like triggers, hypothetical indexes, untrusted constraints, etc.
  • David Tolbert caught a bug in checkid 2.  If some backups had failed or been aborted, we raised a false alarm about no transaction log backups.
  • Fixed more bugs in checking for SQL Server 2005. (I need more 2005 VMs!)
  • Ali Razeghi added checkid 55 looking for databases owned by <> SA.
  • Fixed bugs in checking for SQL Server 2005 (leading % signs)

Whew!  And there’s more to come – I’ve got another half-dozen improvements queued up that also require new web pages, so those will take a little while longer.

If you’ve already downloaded sp_Blitz®, you can run master.dbo.sp_BlitzUpdate to fetch the latest version from The Clouds.  Enjoy!

Consulting Lines: SQL Server Needs a Dog

SQL Server isn’t lonely, but it needs a dog.  In today’s Consulting Lines series post, I’ll share my very favorite consulting line of all – one that makes people laugh while illustrating a point.

The Situation: Bob the Button Basher

My Blitz script helps me quickly spot SQL Server settings that have been changed from the default values.  I always work with the client’s staff in the room, and when I see a changed setting, I turn to them and ask:

Me: “Hey, Bob, I notice that max degree of parallelism has been set to 1.  The default is zero.  Can you tell me why that was set to 1?”  (The key here is to not say anything whatsoever about what the setting does – let Bob explain it to you.)

Bob: “Yeah, I read a blog saying that all OLTP servers should have maxdop set to 1.”

Me: “Ah, yeah, I’ve read those too.  Did we measure the before and after impact of the change?”

Bob: “No, it just seemed like a good setting to change.”

Me: “Gotcha.  Funny thing there – SQL Server is like an airplane.  Modern airplanes are so advanced that there’s only two things in the cockpit: there’s the pilot, and there’s the dog.  It’s the pilot’s job to feed the dog, and it’s the dog’s job to bite the pilot if he touches anything.”

Bob: “HA HA HO HO”

Me: “The problem with SQL Server is that it doesn’t have a dog.”

Bob: “Wait, what?”

Me: “SQL Server has a lot of knobs in the cockpit, and there’s a lot of advice out on the web that tells us to push all kinds of buttons to go faster.  Thing is, SQL Server ships out of the box with a pretty good set of defaults.  Let’s use those defaults until we have a really good reason to change something, and when we do, we need to document what we change so we know how it improved performance or reliability.”

What That Line Does

SQL Server really does have a lot of knobs, but every setting – every single one of them, including the defaults – has drawbacks.  I salute Microsoft for doing a really good job of weighing the pros and cons and coming up with a solid set of defaults.

In my line of work, I flip a lot of knobs, but for every one I touch, I try to explain thoroughly what I’m doing.  I may casually say, “This server needs Optimize for Ad Hoc Workloads enabled,” but when I do it, I also explain what I’m seeing about that particular server’s workload that justifies the change.

Jonathan Kehayias (Blog@SQLPoolBoy) recently wrote an excellent article about Lock Pages in Memory, one of those settings people just love playing with.  Before he posted the article, we had a great discussion about why we have differing opinions on that recommendation.  I understand why he recommends using it as a default, but the reason I don’t recommend it is illustrated beautifully by the fourth comment on the post:

“With LPIM, SQLOS will back down its usage to your ‘min server memory’ value. At that point Windows can try and hard trim the other processes, and potentially page the process requiring the 30GB memory out, or if it can’t respond efficiently this way, you may face a crash due to Out Of Memory for the OS.” – Jonathan

Ouch.  I’m not a big fan of giving a default recommendation that might crash the OS.  There’s also a bug in SQL2008R2 that causes corruption with LPIM.  I understand why Microsoft doesn’t enable Lock Pages in Memory by default, and I agree with them – it’s just not a safe button to push if you aren’t intimately familiar with the consequences and how to monitor for them.

This consulting line helps establish whether the other person understands the real ramifications of pushing buttons.  If they’re blindly bashing buttons based on blogs, it helps them understand the risks.

What Happens Next

The success of this line depends on your ability to dive deep and explain the concepts.  You have to be able to explain the pros and cons of both settings – the default setting and the one they chose.  Jonathan’s post on LPIM is a great example – if you can be comfortable explaining a topic to that level of detail, then you should feel comfortable flipping that switch.  (If, on the other hand, you’re not even comfortable reading a post with that level of detail, then you shouldn’t be flipping that switch.)

Bob: “So should we leave maxdop set to 1, or go back to the default?  We have no CXPACKET waits, so it must be working.”

Me: “Well, that’s true – no queries are going parallel. But parallelism isn’t always a bad thing – sometimes we want to throw lots of CPU power at queries.  Right now, we’ve got queries that aren’t running as fast as they could because they’re not getting the CPU they need.”

Bob: “But the server isn’t CPU-bound – CPU % is usually under 10%.”

Me: “That’s because you’re measuring CPU use as a total, not per thread.  We’ve only got a few active queries at a time, but we’ve got a four-socket server with 10 cores per processor, and hyperthreading is turned on.  Our CPUs are sitting around bored.  If we let some queries go parallel, we’ll be able to use those CPUs more effectively.  Based on what I’m seeing in your procedure cache, I recommend setting maxdop to 4 and cost threshold for parallelism to 100.  Here’s the evidence that supports it, and here’s how we’ll measure the success of that change…”

More of My Favorite Consulting Lines

Perk Up Your Career with the SQL Server Troubleshooting Checklist

When an application is offline or performing so badly that the users are complaining, what do you turn to?

When your cellphone says a major incident is ongoing and you can’t get to a computer, where do you send the team at the office?

When you’re putting out fires with whatever you can find, how do you record what steps you’ve taken and what information you’ve gathered?

You’re Only As Strong As Your Team

One major reason to put together a troubleshooting checklist is simple: you can’t always be working.

SQL Server Troubleshooting Checklist

What Would Brent Ozar PLF Do? (PDF)

If you’re the only person who knows how to use SQL Server in your office, you need to train someone. They don’t need to be an expert, they just need to be able to run through your checklist successfully and gather information. Find someone sensible, practical, and with a steady hand who you can practice the steps with, and tailor the checklist so it makes sense to your secondary.

If you work with a large team of SQL Server experts, you need a troubleshooting checklist just as much. As we all gain experience we all develop different ways of doing things. We each focus on different things and may interpret things differently.

The troubleshooting checklist gives your team consistency: it gives everyone a base process to make sure the major areas are covered.

You’re Only As Smart As Your Documentation

The other major reason you want a troubleshooting checklist is that all the rules change when things get really bad.

When a company is losing money, it’s hard to do things in a logical fashion. You’ve got four people at your desk asking all manner of questions from “we just bounced the Heffalamps services, is it OK now?” to “are the Wuzzles impacted by this problem?” Your boss’ boss keeps poking their head around your monitor saying, “I’m just checking in.” You’ve got 500 emails from your monitoring.

When this happens, you remember about 75% of all the basic stuff to check. It’s incredibly easy to miss some obvious things, though. After all, you’ve needed to go for the bathroom for about forty five minutes and still can’t leave your desk.

The troubleshooting checklist helps here in three ways:

  • You always hit a consistent list of things that are important to your business.
  • If someone misses a step when following up on a problem, you have a place to add a step to ensure the mistake doesn’t happen again. The troubleshooting checklist gives you a way to correct human error— that’s the secret to de-personalizing an embarrassing mistake and instead showing you’re a professional who follows up on errors and is in control of their process.
  • You have a place to record information which others can read. This helps you clear out that crowd from behind your desk! Save the checklist in a place where they can read it, and let them know they can see updates on your progress if they let you get going.

To Do: Give Our Checklist To Your Manager

There’s one person who really wants you to have a troubleshooting checklist, but they think of it in slightly different terms. They think of this as an ‘Incident Management Response Process’.

Your manager would love to have predictable response to problems. This helps them understand and explain to others what value you add to the company. It also helps them understand and justify having someone who can be your backup when you’re not available. It helps your manager show that you’re working to have an organized production environment with defined processes for keeping applications available.

Here’s how to handle this. Download our SQL Server Troubleshooting Checklist and give it a read. Think about a couple of things you’d customize for your environment.

Then show the checklist to your manager and say, “I think having a basic process like this would be helpful for our team. I’d like to lead a project customizing it for our applications. What do you think?”

According to Penelope Trunk, the path to promotion is shortest in December.

Now there’s a holiday score.

Check Out the Troubleshooting Checklist

Download the free SQL Server Troubleshooting checklist (PDF).

Coping with Change Control Video

Managing change is part of a DBA’s job. If you aren’t managing database changes effectively, though, disaster can strike. Jeremiah Peschka has worked with good and bad change management practices and even brought some awful ones under control. In this session, he introduces you to tools, tips, and techniques that will help you effectively handle changes in the database. If you need to deploy changes to the database, this session is for you.

This is a three pronged attack covering a set of techniques for gauging the scope of change, tracking code and configuration changes, and coping with the inevitable 3:00AM production hot fix. This is aimed at development teams and DBAs who are deploying both configuration and code changes to production servers.

Change Control Scripts & Resources

Coping With Change Control T-SQL Scripts

 

css.php