Blog

The Easiest Way to Restore Transaction Logs to a Point in Time (Spoiler: Use Amazon RDS)

… is to use a SQL Server where a robot does it for you!

Let’s take a look at how you can do this with Amazon RDS’s hosted SQL Server these days.

Normally, restoring transaction logs is super tedious.

You’ve got to restore all the files in the right order, for every single database. Even if you script it, restoring a bunch of databases is a giant pain. Amazon RDS has essentially done all the scripting for you.

Amazon RDS makes sure it can always restore your SQL Server to a point in time

When you set up an RDS SQL Server instance, it asks you how long you’d like to keep backups, and if you have a preferred full backup window. And that’s all it asks. Transaction log backups are taken every 5 minutes to support point in time restores, no matter what you want.

Amazon RDS Instance Backup Settings

So I wondered, what happens if someone creates a database in the wrong recovery model by accident? Does this break the transaction log backups?

Nope! I created a database named DB1_CreatedInSIMPLE using the SIMPLE recovery model. Check out what showed up in the RDS logs right afterward:

RDS Recovery Model Set Back to Full

Alrighty then, we’ll do it your way
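If you want to double-check it yourself from SSMS, a quick sanity query does the trick (nothing RDS-specific here, just the standard catalog view). On my instance it came back FULL, matching the event log:

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'DB1_CreatedInSIMPLE';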

How Recently Can I Restore?

You can see log backups in the “Recent Events & Logs” above, or you can just look at the latest restore time for your instance. Since transaction log backups are every five minutes, you can’t always restore to one minute ago.

RDS Last Restore Time on Instance

OK, Let’s Restore This Thing! How Easy Is it?

I wasn’t lying when I said it was easy. You just highlight your instance in the console and select “Restore to Point in Time” under instance actions.

Restore RDS Instance to Point in Time

This is going to create a whole new instance using the backups they’ve been taking. You get to pick the point in time you want to restore to. The available range depends on how long the instance has existed and what your backup retention period is set to. (Mine’s at 7 days; it can go up to 35 days.)

Restore RDS Instance to Point in Time- set time

You also get to pick the instance name you’re going to restore to, along with all sorts of properties and security settings:

Restore RDS Instance to Point in Time- set new instance name

Setting up the restored instance isn’t immediate. RDS has to set up a new Windows instance and apply a whole lot of your backups, and that takes time.

Restore may take a bit of time

Once all your settings are done, click Launch to set creation of your new RDS instance in motion:

Restoring is launching a new instance

Our instance is Cooking

The instance goes through multiple phases. It automatically backs up the instance as part of the process if you’ve got backups enabled, just like any other instance.

New restored instance is creating

Once that completes, I can connect

Here I am, connected to my original and restored instances via SSMS on my desktop!

Connected to both RDS instances via SSMS

SQL Server in RDS Isn’t Perfect, But It Does a Lot For You

I am not a founding member of Team Cloud. But I meet a lot of people who struggle to make sure that backups are being taken properly and that they can restore them. They’re frequently teams of developers who don’t want to learn to deal with backups and restores. They also want things like database mirroring without having to learn it, and RDS handles that for them, too.

This isn’t for everyone. RDS limits the number of databases you can have on an instance. It doesn’t do all maintenance for you — you have to set up your own CHECKDB jobs, for instance, which you can do in the SQL Server Agent. You don’t get tools like Database Mail inside the SQL Server, or the ability to see the SQL Server Error log via SSMS (don’t worry, you can get to that and other logs in the AWS Console). You can’t use every version of SQL Server out there (2008R2 and 2012 only, currently). You can’t do Availability Groups in Amazon RDS, either, just database mirroring. (I’m a big fan of mirroring, so I’m not all that sad about that.)

Putting it All Together

Database as a service is a pretty compelling option these days. It doesn’t take away the role of the DBA, and it doesn’t handle everything for you — but the flip side of that is that you get a lot of flexibility and customization.

And dang, I love the ease of restoring that instance to a point in time.

Brent says: what I love about the cloud is that it takes away the parts of the DBA job that I never really liked in the first place.

Jeremiah says: I’m with Brent – I love getting rid of the parts of the job I never liked. Cloud lets me do that. And RDS lets me ease up management on a certain class of servers even more.

Erik says:  I, for one, welcome our new Robot DBA overlords.

Database Connection Hazards with Entity Framework

I recently came across a curious case where a SQL Server was suffering a number of long-running queries coming from an application written in Entity Framework. When I measured the average query execution times, I got some unexpected results:

  • CPU time: 12 milliseconds
  • Reads: 273
  • Elapsed time: 6800 milliseconds

Wait, what?

Looking at the wait stats for these queries, I saw there was a lot of ASYNC_NETWORK_IO — often 1000+ milliseconds. That didn’t make any sense either! How can a query with so little CPU time and so few reads take so long to complete? It’s not like the application was asking for millions of rows and couldn’t consume the results fast enough.

I ran the query myself in Management Studio:

  • CPU time: 17 milliseconds
  • Reads: 273
  • Elapsed time: 155 milliseconds

These numbers made a lot more sense, and confirmed that the application was to blame. But what exactly was the application doing for 6645 milliseconds?
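If you want to pull averages like these for your own workload, here’s a rough sketch against sys.dm_exec_query_stats (sp_BlitzCache® will give you a friendlier version of the same idea):

/* Times in sys.dm_exec_query_stats are in microseconds, hence the / 1000 */
SELECT TOP 20
    qs.execution_count,
    qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
    qs.total_logical_reads / qs.execution_count AS avg_reads,
    qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset END
          - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time / qs.execution_count DESC;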

Entity Framework will try to be clever about connections

After some research, I found that Entity Framework can be sneaky about how it manages database connections. An article about EF connection management on MSDN proved both enlightening and slightly terrifying:

Entity Framework Connection Management

Entity Framework will handle database connections automatically by default. Note two things here: EF will open the connection if you specify any LINQ or ObjectQuery method, and that connection won’t be closed until the ObjectResult has been completely consumed or disposed.

In this case, the EF code invoked the Where method, then went on to do a number of foreach loops, assuming the connection had been closed when it hadn’t. SQL Server was left holding the phone until EF said good-bye and hung up. The solution here was to open the connection, do as little as necessary, and then close the connection.

UPDATE: EF MVP Julie Lerman (@julielerman on Twitter) mentions in the comments below that the MSDN documentation is outright wrong about some methods opening a connection, and has passed this on to Microsoft so they can correct the error.

This would be the end of the connection management story, except…

Entity Framework would like to announce it will no longer be clever about connections (when you open them yourself)

Another MSDN article about EF connection management points out changes to Entity Framework 6 and later:

EF6+ Connection Management


Again, the seemingly innocuous and trivial “Note” is anything but. It’s a reasonable assumption on Entity Framework’s part; if you’re going to open it yourself, you’ll be the one who closes it. Still, it means we have to be careful with Entity Framework code when it comes to database connections. And now, depending on the Entity Framework version, we’ll see one of two different connection management behaviors.

How to spot Entity Framework keeping connections open

The tell-tale signs, as we discovered in this case, are:

  • When running the query from the application:
    • Relatively low CPU time but high elapsed time
    • ASYNC_NETWORK_IO waits for the query (see the sketch after this list)
  • When running the query from SQL Server Management Studio:
    • Relatively similar CPU time and elapsed time
  • Significant amounts of application code that execute in between the connection open event and close event. To prove the connection is left waiting between those events, step through the code in a debugger and pause before the connection is closed. You should see the query racking up ASYNC_NETWORK_IO waits. (Remember, the code that opens and closes the connection may not be doing so explicitly.)
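Here’s a rough sketch of a query that catches this in the act; it only shows the wait while the query is actively running, so time it with the debugger pause described above:

/* Requests currently waiting on the client to consume results */
SELECT r.session_id,
       s.host_name,
       s.program_name,
       r.wait_type,
       r.wait_time AS wait_time_ms,
       r.total_elapsed_time,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.wait_type = N'ASYNC_NETWORK_IO';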

It’s not a SQL Server problem. It’s an Entity Framework problem.

Entity Framework is great for developers who don’t have the spare time or motivation to learn SQL Server querying, but that convenience comes with costs. One of those costs is keeping a suspicious eye on how it manages database connections. It’s tempting to look at a long-running query and blame SQL Server for being slow. However, if this happens to you and your Entity Framework-based application, it’s worth investigating further to see who’s leaving whom waiting.

Brent says: don’t be ashamed of using EF, either! I’m all about getting to market as quickly as possible. After all, if you don’t get your app out there, you don’t get paid, and DBAs need to get paid too.

Pull Request 101 for DBAs Using GitHub

I’ve worked with source control quite a bit over the years — everything from ye olde Visual Source Safe to Subversion to TFS. I even supported the Source Depot system at Microsoft as an engineer for a year back in the day!

These days I don’t use source control a ton. We keep repositories of scripts in GitHub to help manage them as a team, but I don’t really use it often enough to learn the command line well.

Even so, I’ve learned the basics of branching and merging. Git can be really confusing because it has a ton of features, but here’s an outline of how you can safely work with a team in an uber-simple branching scenario.

When I Say 101, I Mean 101

We’re talking GUI screenshots and simple concepts here. I’m not talking about how to do source control for schema and diffs; I’m just talking about how to save all those scripts you wrote about checking for that weird thing that happens in production every third Tuesday.

Onward!

1. Create a Branch

Branches are great, because they keep you safe. They give you a sandbox to work on your code where you don’t disrupt the main line of code. You can work in iterations and make multiple commits to your code, so you can do / undo incremental changes.

Creating a branch in the GitHub client for Windows

2. Check in Your Code (Commit to your branch)

You can make changes to multiple files. For this example, I made changes to just one file. Make sure that you’re committing to the new branch you created!

Review and commit the change you saved in the GitHub for Windows Client

3. Create a Pull Request (to merge into another Branch)

Once all your commits are done and you’ve tested your code and you’re ready to roll, create a pull request. You’re essentially proposing your change for someone else to review and merge to a destination branch of your choice.

The big advantage of pull requests is that they make it simple for your reviewer to see exactly the changes you’ve made and know clearly what will be impacted by merging.

Create Pull Request in GitHub for Windows Client

Your pull request will automatically send out an email and notify other users of your repo that you’re ready for things to happen, and it’s time to review.

My pull request is ready for action!

4. Reviewing the Pull Request (Someone else does this!)

Now it’s time for someone else to be responsible. The good news is, your pull request makes it easier for them to review your code. Here’s what it looks like in the GitHub web application:

Pull Request Summary

To get into the details, it’s super easy to click on the “Commits” tab and see all the changes. The reviewer can easily open all the files, leave comments, and even edit changes if they really wanna get up in your business. Or they can push it back to you.

Reviewing changes on the commit tab

5. When the Reviewer is Happy, they Confirm the Merge

This pushes the change down to the branch you targeted, nice and easy.

I’m reviewing my own pull request. So tacky!!!!

6. Delete the Branch

Right after merging, it’s very easy to delete the branch and clean up. You can also leave the branch there if you want, and there’ll be a history of what happened with the pull request. No more wondering, “What happened with this branch?”

Cleanup couldn’t be easier

It’s so easy, even a manager can do it

You don’t have to be sloppy and save all your scripts in a file share!

Jeremiah says: pull requests make collaborative development easier – you can bundle up related changes, comment on why you did what you did, and collaborate with your team to get the best fix in place.

Doug says: This also answers a question we get frequently: “What can I use for source control for SQL Server?” It’s not integrated with SQL Server, but it’s free and not too shabby.

Three Easy Tweaks to Tune Up Your SQL Server

I’ve been doing SQL Critical Care® work with clients for over a year now. It’s absolutely true that every client’s situation is different and needs special attention. However, I’ve found that there’s a short list of issues that nearly all clients have in common. Drawing from that list, here are the top three high-impact, low-effort areas you can work on today to tune up your SQL Server.

Check power savings everywhere

I recently had a client whose CPU would peg at 100% for short stretches even though the server was under a light load. After some exploration, we found they had power savings turned on. In the BIOS. Because we couldn’t get to the BIOS remotely (and we didn’t want to reboot the server right then and there), we used a free tool called CPU-Z to watch core speeds. Can you spot the difference in speeds?

Specification Core Speed – Actual Core Speed = Sadness Factor

That’s a whole lot of CPU speed you paid for and aren’t getting! When checking power savings, make sure you check all of the following:

  • Windows Power Options (under Control Panel)
  • If a physical server, also check the BIOS at startup
  • If a virtual server, check:
    • hypervisor power settings/performance plan
    • BIOS at startup of the hypervisor host server

One other thing: make sure you don’t have outdated BIOS firmware. That can have some nasty CPU consequences too.

Changes to the BIOS require a server restart, so plan accordingly.

How to tell if it worked: Using CPU-Z, watch the Core Speed box at lower left. If it deviates from the core speed in the Specification box by more than approximately 1%, there’s probably power savings turned on somewhere.

Address hot missing index requests

Every time a query runs and wants an index that doesn’t exist, SQL Server files that missing index request away. You need to know which missing indexes are being requested and how helpful that index will be. There are DMVs you can query for this information, but my favorite method is sp_BlitzIndex®. It will tell you the missing index definition, as well as three numbers:

  • Impact – the difference this index is expected to make in query cost.
  • Avg. Query Cost – the price in magical Microsoft Query Bucks as determined by SQL Server.
  • Uses – the number of times this index would have been used.

The product of these three numbers (Impact x Cost x Uses) equals the Estimated Benefit.
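If you’d rather hit the DMVs directly, here’s a rough approximation of that math (sp_BlitzIndex® does a lot more sanity checking than this sketch):

/* Estimated Benefit = Impact x Avg Query Cost x Uses */
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.avg_user_impact,
       gs.avg_total_user_cost,
       gs.user_seeks + gs.user_scans AS uses,
       gs.avg_user_impact * gs.avg_total_user_cost
           * (gs.user_seeks + gs.user_scans) AS estimated_benefit
FROM sys.dm_db_missing_index_group_stats AS gs
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_group_handle = gs.group_handle
JOIN sys.dm_db_missing_index_details AS d
    ON d.index_handle = g.index_handle
ORDER BY estimated_benefit DESC;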

Because these numbers get reset with every service restart, we need to factor in uptime as well. If you’ve been up for seven days or more with your average user load during that time:

  • Est. Benefit > 1,000,000: Keep an eye on this index.
  • Est. Benefit > 10,000,000: Try it out in dev environment and see how it does.
  • Est. Benefit > 100,000,000: Try it out in a dev environment — TODAY.

I can’t recommend outright that you deploy these missing indexes to production, just that you test them. One reason: it may be more efficient to alter a very similar existing index rather than add the new one. You’ll have to decide what’s best, but don’t let the 100M+ indexes go unnoticed.

And in case you’re wondering, the record high I’ve seen for the Estimated Benefit number is just over 14,000,000,000 (billion). Let’s hope you can’t beat that. (You don’t want to.)

How to tell if it worked: This missing index recommendation should go away in sp_BlitzIndex® results and the query running it should decrease in CPU time and cost.

Raise cost threshold for parallelism

Is your server still humming “Mambo #5”?

SQL Server’s default Cost Threshold for Parallelism (CTP) is 5. Microsoft has left this setting untouched since the 20th century.

(Don’t laugh — that was at least sixteen years ago.)

Face it — pretty much everything that old needs an update. Your hairstyle. Your clothes. Your car. Your server’s CTP is no different. CPUs are way more powerful today than they were in the late 90’s. With CTP set to 5, chances are a lot of queries are going parallel when they don’t have to. Raise the bar to 25 or even 50 (as always, test this in dev first). Unlike licensing additional cores, changing CTP settings is absolutely free.
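Here’s what the change looks like; 50 is just a starting point, not a magic number, and it’s an instance-wide setting, so test it in dev first:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
GO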

How to tell if it worked: You should see a drop in CXPACKET waits, along with some query plans no longer showing parallelism.

There you have it: three simple and quick checks to speed up your SQL Server. For more ideas, try Five SQL Server Settings to Change.

Brent says: Listen, people, this is free advice. Don’t pay us to tell you this. Just do it today. Then pay us anyway. Actually, ignore this post.

Replication Won’t Refresh Your Dev and Pre-Production Environments


It’s almost like the real thing, except it’s totally different.

At first glance, SQL Server’s transactional replication seems like it’s useful for moving data around in all sorts of situations: it works in Standard Edition, it’s not synchronous, and you can have multiple subscribers.

Why People Want Replication to Test and Pre-Production Environments

Setting up a test environment sounds simple at first. But it’s pretty tricky. Here are typical requirements:

  1. Data in the test environment needs to be fresh, recent production data to support query tuning
  2. Data and schema in the test environment need to be writable for testing purposes
  3. Data and schema in the test environment need to be periodically reset back to production-like configuration

The first of those requirements is why replication sounds like a great idea. There are a lot of gotchas even with that one, but the second and third requirements are where replication just clearly doesn’t meet the bar.

Replication isn’t Friendly to Changing Subscriber Tables

Transactional replication won’t like it if you go around changing data and schema on its subscriber tables. If you change the data, updates and deletes flowing through replication may start to fail. Well, let’s just be realistic – they WILL fail, and fixing it will be manual and a major pain.

Schema changes are even dicier. You can’t just drop a replicated column or change its data type; you need to modify replication for that – and that means adding filters to the publication.

Resetting Replication isn’t Fun, Either

Re-initializing a replication subscriber is often very impactful on the publisher: running a replication snapshot is an intensive, time-consuming process that causes CPU, IO, and lock overhead on the publisher. (Warning: this snapshot is a totally different thing from the database snapshot I talk about a bit later. This one basically copies out all the contents of the published articles on the publisher into flat files, copies them over, and then loads them into the subscriber.)

You can initialize a replication subscriber from a full backup, but that’s not lightweight or fast for large databases, and it’s very dependent on great custom scripting and timing. And usually people are looking at replication because they want to avoid big backups and restores.

What About a Creative Solution Involving Replication?

Refreshing test and staging environments is a tough problem. So tough that people often try to get creative, like this:

  • ProductionDB in Denver is a transactional replication publisher on PROD01
  • SubscriberDB in Portland is a transactional replication subscriber on STG01
  • Periodically, some process is run against the subscriber DB to reset other databases on STG01. Ideas usually involve database snapshots, backups, and custom scripts.

But when you dig into the details, this is never great. Here’s why.

Transactional Replication Impacts the Publisher, Your Developers, and Your DBAs

If there’s one thing that most DBAs and developers can agree on, it’s that replication has made them work late and curse a lot.

Replication requires that every table have a primary key. This may mean schema changes just to get it set up.

Replication slows down your software deployment, because it’s easy to mess up publishing tables when you make schema changes to them. The only way to not mess this up is to also implement replication in your development environment, which developers hate. And in our scenario, where the test database is the subscriber, what would that even mean? Recursive replication torture?

Replication impacts performance on the publisher, particularly under high load. It frequently requires tuning storage, setting up a scale out distributor ($$$ hardware and licensing $$$), fine tuning your publisher and subscriber properties, implementing custom monitoring, and basically having at least one team member obtain a Masters Degree in Replication at the School of Late Nights. I went to that school a long time ago, and I didn’t keep the yearbook, but I did write about it here.

Database Snapshots are Read-Only

There’s no such thing as a writable database snapshot in SQL Server, even though it’s an Enterprise Edition feature. (Fancy SANs have options for that, SQL Server itself does not.) That pretty much takes most of the uses out of it for a staging environment.

If You’re Writing a Custom Script in the Staging Environment, Why Use Replication?

If you have to basically write a custom ETL to read from the subscriber to refresh staging databases, transaction log shipping is much easier to manage than replication, and it allows a read only standby mode on the subscriber.

If You’re Using Backup and Restore, Replication is Also Overkill

While it’s technically possible to back up a replication subscriber and restore it, then remove replication from it, it’s not all that great.  This is a lot of complexity without a lot of payoff.

  • You’ve still got the time to do the backup and restore (which you were trying to avoid in the first place)
  • You haven’t validated that the database you’ve restored has a schema that matches production (it could be very different on the replication subscriber)

You’re much better off basing the restore off production in this case. Even if the pre-production and testing databases are in a different datacenter, you can optimize the types of backups used, compression and network copy time, and restore schedule. Even setting up multiple test instances that restore “in rounds” and which can be used at different times a day is often operationally preferable to the performance, monitoring, and caretaking needs of replication.

There’s No Easy Answer to this Problem

Refreshing staging and development environments doesn’t have a single right answer. This is done many different ways based on application requirements. Just don’t pin all your hopes on replication and start investing in it without looking at the details of exactly how it will work — because it’s probably not what you think it is at first. Always make sure you factor in:

  • Performance impact on the publisher and hardware/storage/licensing costs to mitigate this
  • Any restrictions or requirements on schema changes at the publisher and impact on code development
  • Operational cost of upkeep of the technology

Don’t give up! Having staging and development environments with recent data that work well for you is totally possible; you just need to consider more technologies than replication.

Need Help?

We’ve got lots of resources, for money and for love. Or just for free.

Brent says: Yeah, replication isn’t what I call “refreshing.”

Why Zero Data Loss in SQL Server is Harder Than You Think

But Kendra, it can’t be that hard… after all, we have synchronous modes in Database Mirroring and Availability Groups, right?

Synchronous Commit doesn’t mean “zero data loss”

When we think about limiting data loss, the first thing we think of is a technology that lets us reduce points of failure. If every transaction must be written to two separate storage systems, we have a pretty good chance to have no data loss, right?

Maybe.

Let’s say you’re using a SQL Server Availability Group in SQL Server 2014 with a synchronous replica to do this. The secondary replica fails and is offline, but you don’t lose quorum. If you want 0 data loss, the primary needs to stop accepting writes immediately, right?

It doesn’t do that. The primary replica keeps going and writes can continue. Here’s what that looks like:

Synchronous AG Exposed

Tabloid headline: Synchronous AG Running Exposed! Scandal!

You could write custom scripts to detect the situation and stop the primary replica, but there are a couple of problems with that. First, you’re offline, and you probably don’t want that. And second, it’s going to take some time to get that done, which means you don’t have zero data loss – you could lose anything written in the meantime. You could add another synchronous commit replica, but that has obvious cost and support impacts, and you still aren’t guaranteed zero data loss.
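For what it’s worth, here’s a rough sketch of the kind of detection query those custom scripts would run on the primary. It flags databases on synchronous replicas that aren’t currently synchronized:

/* Synchronous replicas whose databases are not currently synchronized */
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       drs.database_id,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
WHERE ar.availability_mode_desc = N'SYNCHRONOUS_COMMIT'
  AND drs.synchronization_state_desc <> N'SYNCHRONIZED';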

Synchronous writes don’t necessarily guarantee zero data loss; you’ve got to dig into the details.

This stuff isn’t obvious

I’ll be straight up: I’ve been working with high availability and disaster recovery for a long time, and I hadn’t actually thought very critically about this until a recent chat room conversation with Brent discussing why it’s not super easy for cloud hosting providers to offer zero data loss in return for a lot of dollar signs.

Crazy facts: you can learn things from chat rooms and from the cloud. Who knew?

NEED TO PROTECT YOUR DATA? YOU NEED A FULL TIME EMPLOYEE WHO IS RESPONSIBLE FOR THAT.

If data loss is important to you, don’t just assume that you’ve got it under control because you’re paying a vendor to take care of it. If you look closely, you may find that nothing’s working like you think! When your data is important, you need to make someone responsible for ensuring that you’re meeting your RPO and RTO, and have them prove that it works on a scheduled basis. Their job title doesn’t have to be “Database Administrator,” but they need to work for you, and they need to take their responsibility seriously.

Want to Learn More About High Availability and Disaster Recovery?

We just launched our new DBA’s Guide to SQL Server High Availability and Disaster Recovery – a 6-hour online video course that teaches you about clustering, AlwaysOn AGs, quorum, database mirroring, log shipping, and more.

New Quiz! SQL Server High Availability and Disaster Recovery

Think you know everything there is to know about high availability and disaster recovery?

Head on over to our new free quiz to find out! Take the SQL Server High Availability and Disaster Recovery Quiz.

How Do You Manage DBAs? Part 1: Measuring Backups

When the DBA reports to you, ask yourself two questions:

  1. “What would I fire the DBA for?”
  2. “How can I prove that they’re not doing that?”

There are a lot of possible answers to #1, many of which involve cocaine and the office copier, but let’s focus on one I hear a lot: an inability to restore data that they claimed was backed up.

Backups are the foundation of our Hierarchy of Database Needs. They’re incredibly important, yet they often get ignored because nobody’s banging on the DBA’s cubicle telling them to focus on backups. Instead, users are screaming about performance or too-tight security.

It’s up to you, their manager, to enable them to get the basics down. Here’s how to do it.

1. Ask for a report of most recent backups, by database.

KPI: Coffee Cups Consumed (by André Freitas)

Call in the DBA and give them a list of columns you want in the report:

  • Server name
  • Database name
  • Database size
  • Purpose (production, QA, development)
  • Last full backup finish date/time
  • Last log backup finish date/time
  • Last successful DBCC CHECKDB finish date/time
  • Last date/time the backups were restored for testing

They can produce the report in any format that works for them, but they have to be able to produce it on demand with less than five minutes of manual work. (Ideally, they’d give you a solution you can run on your own, without telling them, but let’s start with something easy.)
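I’m not prescribing the exact query, but a sketch like this against msdb’s backup history covers the backup columns. Server name, purpose, CHECKDB, and restore-test dates need their own plumbing:

SELECT d.name AS database_name,
       (SELECT CAST(SUM(mf.size) * 8 / 1024. / 1024. AS DECIMAL(10,1))
        FROM sys.master_files AS mf
        WHERE mf.database_id = d.database_id) AS size_gb,
       (SELECT MAX(b.backup_finish_date)
        FROM msdb.dbo.backupset AS b
        WHERE b.database_name = d.name AND b.type = 'D') AS last_full_backup,
       (SELECT MAX(b.backup_finish_date)
        FROM msdb.dbo.backupset AS b
        WHERE b.database_name = d.name AND b.type = 'L') AS last_log_backup
FROM sys.databases AS d
ORDER BY last_full_backup;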

Tell them this first task is about getting the lay of the land, not fixing the problems. Oh, they’re going to find problems alright – but the goal right now is a clear picture of the work ahead.

This task may take them a day or two because they have to assemble an exact list of the servers they’re responsible for, plus build the report process. (The good DBAs reading this are thinking, “I could do this in fifteen minutes!” If you’re just saying that now, that means you haven’t done it yet, and you’re not quite as good as you think. The great DBAs are thinking, “Yeah, that’s about how long it took me to get it right the first time.”)

2. Help the DBA come up with an improvement plan.

They’re going to be ashamed/disappointed by some of these numbers. (Or if they’re not, there’s your answer about their fit for the DBA position.)

Let them build a back-of-the-napkin mini-project plan to improve the metrics for your most important server. Give them that time – typically there’s a day or two of work involved – and after they finish, review what worked about the mini-project and what didn’t. Use that knowledge to build out a plan for the rest of the servers.

Your job is to run interference for them while they get the backups and DBCCs under control. When the DBA gets “urgent” help requests from end users, here’s what I coach them to say:

“Right now, I’m focused on a project to get our backups under control. If this issue is more important than getting our production data backed up, let’s go into my manager’s office together and talk through it. If not, I hate to say this, but I’ll need you to file a help desk ticket and I’ll get to it once we’ve got the backups under control.”

Yes, this means admitting that the backups aren’t under control. It’s the only way end users will accept a “no” from the DBA.

After this project finishes, run the report again, talk through the results, and then take the results to the business users who are storing data on these servers.

3. Ask the business if this amount of data loss is acceptable.

Take the backup numbers to the business along with our free RPO/RTO Worksheet. By default, the business usually wants zero data loss and zero downtime, but as good IT managers, we have to communicate the realities of today’s databases.

Armed with the costs and timelines in that worksheet, bring written RPO goals back to the DBA. (Yes, to be a good manager takes work, and these are political problems, not technical problems – save your DBA for the technical problems.) Sometimes, this will mean bringing budget numbers back as well – if the business wants tighter RPO/RTO goals than the current infrastructure provides, the DBA will need resources to build the infrastructure improvements.

4. Measure the gap between IT’s RPO goals and the DBA’s delivery.

Remember that report the DBA built? Add two columns:

  • IT’s goal for this database’s data loss (in minutes/hours) – you and the DBA come to an agreement on a realistic goal. In underfunded shops, this is sadly less ambitious than the business’s goals. (Example: “The business wants us to lose no more than 1 minute of data, but I understand that they’re not giving you space for log backups, so let’s aim for losing no more than 1 hour of data.”) In well-funded shops, the IT goal may actually be more ambitious than the business’s goal.
  • The gap between the DBA’s delivery and IT’s goal – if the last successful backup was 45 minutes ago, but we’re aiming for <30 minutes of data loss, we’re in the red by 15 minutes. This metric is how we measure DBAs.

As long as this gap is in the black, the DBA is doing a good job on backups. (Note: I’m not even talking about restore tests or DBCCs here – this is just what I consider table stakes on managing a DBA.)
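A sketch of what that measurement could look like, assuming you keep the goals in a little table of your own (dbo.DatabaseRpoGoals here is hypothetical; use whatever your DBA builds):

/* dbo.DatabaseRpoGoals (database_name, rpo_goal_minutes) is a hypothetical
   table you'd create to hold the agreed-upon goals. */
SELECT g.database_name,
       g.rpo_goal_minutes,
       DATEDIFF(MINUTE, lb.last_backup_finish, GETDATE()) AS minutes_of_exposure,
       DATEDIFF(MINUTE, lb.last_backup_finish, GETDATE())
           - g.rpo_goal_minutes AS minutes_in_the_red
FROM dbo.DatabaseRpoGoals AS g
OUTER APPLY (SELECT MAX(b.backup_finish_date) AS last_backup_finish
             FROM msdb.dbo.backupset AS b
             WHERE b.database_name = g.database_name) AS lb;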

When this number goes into the red, the DBA should stop focusing on anything else – anything – and focus exclusively on getting the data protected again.

For bonus points, trend the total server count, database count, and total data size over time. This helps your user base understand that the infrastructure will come under more pressure as the data size grows – you can only cram so much data into a server before performance issues arise.

In subsequent posts in this series, I’ll be exploring how to measure a DBA’s workload, the infrastructure’s complexity, RTO delivery, and other boring management topics. Buckle up.

Want to Learn More About Keeping Your Databases Online?

We just launched our new DBA’s Guide to SQL Server High Availability and Disaster Recovery – a 6-hour online video course that teaches you about clustering, AlwaysOn AGs, quorum, database mirroring, log shipping, and more.

The Nine Circles of Developer Hell

Commit message: Duck!

Everyone’s code is terrible. But exactly how terrible is yours?

First Circle: Code Limbo

We can’t say this is good code, but for the most part nobody notices it.

Second Circle: Code Lust

Cool function, bro!

Third Circle: Gluttonous Code

There’s always one simple piece of logic that’s been written hundreds of times, all over the codebase, but in slightly different ways. And then, one day, you need to change it.

Fourth Circle: Angry Code

Your application has periodic timeouts, which disappear as soon as users report them.

Fifth Circle: Greedy Code

This code was only supposed to run once when a user logged in and their profile was loaded. But sp_BlitzCache® says it runs 4,000 times per minute when you’re hardly doing anything.

Sixth Circle: Code Heresy

This code looks incredibly bad. It violates common sense, but it works right now. You’re afraid to touch it, for fear it will burst into flames.

Seventh Circle: Violent Code

Your application is deadlocking, and literally killing itself.

Eighth Circle: Code Fraud

That day you find out that large portions of your source code don’t actually belong to your company. And now you need to fix it.

Ninth Circle: Treacherous Code

When your database corrupts itself.

What Permissions does QUERYTRACEON Need?

The QUERYTRACEON hint can be particularly useful for testing the new cardinality estimator in SQL Server 2014 on individual queries using Trace Flag 2312, if your database is still using compatibility level 110:

SELECT ClosedDate, Id
FROM dbo.Posts
WHERE LastActivityDate > '2013-09-05 11:57:38.690' 
OPTION (QUERYTRACEON 2312);
GO

Conversely, if you’re using the new estimator everywhere by having database compatibility set to 120, you can use the old estimator for an individual query by using QUERYTRACEON 9481:

SELECT ClosedDate, Id
FROM dbo.Posts
WHERE LastActivityDate > '2013-09-05 11:57:38.690' 
OPTION (QUERYTRACEON 9481);
GO

But… Permissions?!?!

There’s a little problem: turning on a trace flag requires high permissions. Let’s say I’m attempting to use this hint from an account which doesn’t have superpower permissions. Here we create a login and user for app_account, add it to db_datareader, and grant execute on the dbo schema:

USE [master]
GO
CREATE LOGIN [app_account] WITH PASSWORD=N'DontBeLikeMeUseWindowsAuth', 
  DEFAULT_DATABASE=StackOverflow, CHECK_EXPIRATION=ON, CHECK_POLICY=ON
GO

USE StackOverflow;
GO

CREATE USER [app_account] FOR LOGIN [app_account];
GO

ALTER ROLE [db_datareader] ADD MEMBER [app_account];
GO

GRANT EXECUTE ON  SCHEMA::dbo to [app_account];
GO

We can now impersonate app_account to run some tests!

EXECUTE AS LOGIN='app_account';
GO

Meet Error 2571, User Does Not Have Permission to Run DBCC TRACEON

When I try to run the query, things screech to a halt:

Error 2571 - no permission to run DBCC TRACEON

Oh, sad panda

Even if I try to get tricky, I can’t sneak this into a temporary procedure as app_account, either:

cannot sneak this into a temp procedure, either

Deferred name resolution doesn’t help me here. I guess it’s not a name.

Stored Procedures to the Rescue

We need a little help from a stored procedure. We’re going to revert back to our sa role, and create a procedure around our query:

REVERT
GO

CREATE PROCEDURE dbo.RecentPosts
    @DateVal DATETIME
AS
    SELECT ClosedDate, Id
    FROM dbo.Posts
    WHERE LastActivityDate > @DateVal OPTION (QUERYTRACEON 2312, RECOMPILE);
GO

/* Now we go back to running as app_account again */
EXECUTE AS LOGIN='app_account';
GO

Due to the magic of ownership chaining, we can now run the query as app_account:

No errors to be seen


This Means You Don’t Have to Give Your Applications Sysadmin Rights

… but you may have problems if those applications need to run a lot of ad hoc queries.

Want to learn more about statistics and the new cardinality estimator? Join me and Jeremiah Peschka for our Advanced Querying and Indexing course!
