Are you frustrated because you can’t get the tools or training you need? Does your manager keep saying “NO!” and make you do things the hard way? The real problem isn’t usually budgets: it’s communication. In this 20-minute video, I’ll show you how to make the case for the stuff you want, show you free resources to help convince your manager, and tell you about how I struggled with that same problem too.
Some of the links we covered:
- Merrill Aldrich’s Post on Leaking Money
- Brent’s Post on How to Get Budget Approval for Conferences
- Jeremy Clarkson Drives the Maserati Quattroporte
Liked this? Here’s the free webcasts we’ve got coming up:
March 6 – How to Get Your First Job as a DBA
Kendra Little, Tech Triage Tuesday
You’d love to become a Database Administrator, but how do you get your foot in the door? Ten years ago, Kendra Little was in your position. Since then, she gained a job as a DBA, worked her way up to Senior DBA, and is now a partner in a database consulting company. Along the way, she’s hired junior DBAs and helped employers develop interview procedures for database administrators. In this 30 minute session you’ll learn what employers look for, steps you can take to make yourself a great candidate, and how to build the right resume to get your first DBA job. Register now.
March 13 – How to Succeed in Database Development Without Really Trying
Jeremiah Peschka, Tech Triage Tuesday
March 20 – Monitoring SQL Server: How to Pick Your Strategy
Kendra Little, Tech Triage Tuesday
Monitoring is critical to project your database servers AND your job: if you don’t know when a problem is occurring, how can you respond quickly? In this 30 minute session, Kendra Little will discuss availability and performance monitoring for SQL Server. She’ll summarize the tools available and give you guidelines to plan your strategy to tackle a monitoring problem. This session is appropriate for DBAs or IT Managers with one year or more experience with a production environment. Register now.
No, this isn’t a touchy-feely inspirational post that talks you into taking leaps and bounds in your career. I write those every now and then, but that ain’t today.
Instead, today’s post is about a question: what’s stopping you from snooping in your company’s data?
You, as someone with a lot of database access, have more security permissions than just about anyone in the company. You probably have access to payroll records, financial data, company secrets, you name it. Sure, the company would like to think everything’s kept under lock and key, but developers and database administrators have the keys and they know how the locks work. Even in the most secure environments, I usually find that the staff could answer any question they feel like asking.
But we rarely do.
When I was a DBA, I felt a strange pride in knowing I could query any data whatsoever, yet I was beyond temptation. I didn’t want to know anybody else’s salaries because I believed it didn’t matter. If someone else was making more money than me, good for them and their negotiating skills. If nobody was making more than me, I shouldn’t be proud – I should be ashamed for taking more than I probably deserved. I didn’t want to see anybody else’s employee review, I didn’t want to know how much the company was making, and I didn’t want to see confidential data.
I’ll confess, though, since it’s just the two of us here – at a former company, when our sysadmins stumbled across a cache of videos on the file server, I stood around and watched with the rest of the IT team. We played an employee’s videos of their trip to Russia in search of a mail order bride, and my mind is permanently scarred by the memory of that employee sitting in a hot tub auditioning a prospective wife. It’d be one thing if these videos were in the employee’s private home directory, but these were shared with his entire department! We called HR, and oddly, the guy kept his job. He must have had more incriminating videos of other folks.
I’ll also confess that in the 2000s, while I lived in Houston, there were only two companies I ever aspired to work for: Arthur Anderson and Enron. Oh, how I longed to work in the beautiful architecture of the downtown Enron complex, but the Enron scandal brought both companies down in a flaming mess, almost before the buildings were ready. I counted my odd blessings that I hadn’t been admitted entry into those doomed businesses. However, what if I had? What if I’d gone to work for either company, and I suspected shady dealings? Would I have had the foresight, guts, and security permissions to make database queries to find out what was going on behind the scenes?
My guess is no – I’d never dream of writing a query to get data I’m not supposed to access in my daily job. Why is that, and what’s stopping you?
Over and over, readers leave comments here saying things like “Nice comment, Brent” on a post that Kendra wrote. We laugh about it – after all, they can’t really be reading that closely if they don’t even notice the name on the bottom of the post, right above where they’re commenting – but we started to ask ourselves questions. Are our writing styles really so similar that people can’t tell us apart despite hand-drawn illustrations, references to our favorite music, and our individual topic selections?
So we figured it’s time for a test. We’ve picked 3 topics, and Brent, Jeremiah, and Kendra each wrote a paragraph on each topic.
Your Mission: Guess the author of each paragraph.
Topic 1: Working with Bad Managers
Mystery Author A: In the real world, we don’t skip to work humming Don’t Worry Be Happy while eating a bacon-covered donut. Sometimes we drag our feet on every step of the way, cringing as we walk in the front door of the office. Looking back through my career, the one thing that’s always made me hate my job hasn’t been the work itself – it’s been a bad manager. I’ve been happy doing some truly awful work, but if I’ve got a bad manager, it doesn’t matter how good the work is, because I’ll still hate it. Working with bad managers means working against my own happiness, and I’ve come to learn that there’s no changing bad managers. I can try to manage them and groom them, but doing so comes at the expense of my own happiness and productivity. Life’s too short – take that effort and focus it toward getting a better environment where you can happily contribute and grow.
Mystery Author B: Let’s face it, we’ve all had bad managers. While you should never make your manager look bad, you should help them get better. Mentor your manager just like you would mentor a junior DBA. Provide them with guidance about interacting with the rest of the team – who responds well to which leadership styles – as well as help your bad manager understand how to communicate with the rest of the organization. Above all else, clearly set expectations around your role on the team and, your communication style, and just where your help is going to end. Making sure everyone knows what to expect is a good way to make the best of a bad manager.
Mystery Author C: Bad managers come in so many varieties. Ever worked for The Big Talker who sets out a bold plan, then folds at the first complaint from customers? How about The Groundhog who never remembers anything from the day before– so you get to constantly re-explain what you do? Have you reported to The Gossip, who can’t keep anyone’s secrets, much less their own? The trick to working for all bad managers is to remember that they aren’t your parent. Take charge of your career. Track your goals and weekly progress. Proactively schedule and manage meetings. Direct your own work. Make connections with your customers. People around you will notice who on your team is effective and helpful, and the skill of managing yourself successfully is well worth having in the long run.
Topic 2: Working with Remote Teams
Mystery Author A: Working remotely can be tricky; there can be a significant lag in communications and people in different time zones may have different expectations about when you leave work. Like almost anything else the key is to make sure that everyone on the team has an understanding of how and when communication will occur. If you pick up the phone at 8:00PM, then you can rest assured that your co-workers will assume you’ll be working until 8:00PM. It’s just as important to stay in communication with your remote team via email and some tool like Skype or Microsoft Communicator. It’s just as important to communicate at the right time as it is to communicate face to face.
Mystery Author B: You might not trust the guy sitting in the next cubicle over, but if you haven’t worked with successful telecommuters, you probably trust a remote worker even less. The key to remote team success is verifiable trust: feeling confident that invisible team members are doing the right amount of the right thing at the right time. Both words are important – verifiable and trust. When I’m working away from the rest of the team, I stay visible on instant messaging tools and try to respond as quickly as possible even when I’m away from the computer. A phone with email and good apps is your best friend. When a member of the remote team drops offline for extended periods, not visible in chat and not responding to emails, trust erodes and it’s time to start doing some verification. Are they checking in good code at frequent intervals? Are they responding to tickets and requests from other team members? If not, the thrill is gone.
Mystery Author C: I’ve learned one thing from years of experience working with remote teams: email can be your worst enemy. What seems clear, precise, and specific in one time zone comes across as vague, soupy, and just plain weird in another. It usually takes days to find a critical misunderstanding about instructions sent in an email, then more time to clarify the issue. Meeting regularly with webcams– at least twice per month– may require working some strange hours, but it will save your projects hundreds of hours in the long run. Schedule the meetings in advance and make sure it’s OK to use flex time to avoid burnout.
Topic 3: Working with Your Nemesis
Mystery Author A: Sooner or later, we all work with that one special person who pushes every button. You may feel that your nemesis is unethical or has harmed your career. How do you handle it? The most important technique is the most difficult: don’t gossip about the issue. Ever. If you have knowledge of specific unethical behavior, you need to report it. That’s where discussion should end. As soon as you blow off steam at work by talking about your colleague, you complicate the situation and make it more difficult to resolve. Instead, focus on what’s best for the company. When things feel too personal, block off some time in your calendar and use it to take a walk or work on your own goals in a place where you can’t be interrupted. Every job is temporary, but your reputation stays with you.
Mystery Author B: I’ve worked with a few people that I just couldn’t stand. Part of me always wanted to be a jerk, but another part of me wants to be respected and to get my job done. When working with your nemesis, it’s important to minimize unhealthy conflict. Conflict itself is good; healthy conflict will lead to challenging your existing ideas and making you better as a professional. Unhealthy conflict is going to lead to leaving three week old tuna fish sandwiches in someone’s desk drawer. I survived working with a nemesis by cultivating healthy conflict. We challenged each other on our ideas; even when we were on different projects, I always worked to make sure we were pushing each other forward instead of down the stairs. It wasn’t always a good time, but I got better as a result.
Mystery Author C: Oh no, it’s that guy. Try to avoid eye contact. We’ll slink into the stairway rather than use the elevator – he’s too lazy to take the stairs, so we can avoid having to spend another lunch hour with him. Whew! Made it. How the heck does he keep his job here? They say he produces good results, but I don’t believe that for a second. He must have incriminating pictures of somebody, because I’d have fired him long ago. You know what? I’ve had enough. Let’s make a plan to get rid of that stupid, arrogant jerkwad. We’ll take him out for drinks tonight and – no, wait, put the knife down, bear with me for a second. When he’s plastered, we’ll steal his phone and take pictures of our junk and text it to Shiela in HR. She’ll think it’s his junk – well, I mean, as long as she’s never actually seen his junk, but you know what a prude Shiela is, so that can’t be possible. He’ll get fired, and then hopefully the next guy will be better. I swear, I can’t understand why this company can’t just hire nice, normal people? Nobody ever sticks around here other than you and me. Raise your glass – let’s have a toast to dedicated, friendly, hard-working people like us. Cheers! Hey, waiter, bring us another round of martinis.
Test Time! Who Wrote What?
Copy/paste the below into your comment and give us your best guess in the comments before Sunday, March 4th at midnight GMT. We’ll throw all the correct answers in a hat, draw one winner at random, and announce their name on our March 6th webcast, How to Get Your First Job as a Database Administrator. If no one guesses the answers correctly, we’ll throw all of the most correct answers in a hat (meaning, if four people guessed 7 names correctly, those four people will go in the hat.) The winner will receive a set of PASS Summit 2011 Spotlight, Half-Day and Regular Session Recordings DVDs.
The fine print: Contest limited to US and Canada residents, because international shipping is such a pain in the old blog post. If you’re abroad, you can still enter for the fame and bragging rights, but you won’t get the prize – we’ll designate an honorary US/Canada winner on your behalf.
Good luck, and show us how you know us so well!
Topic 1: Working with Bad Managers
Mystery Author A is –
Mystery Author B is –
Mystery Author C is –
Topic 2: Working with Remote Teams
Mystery Author A is –
Mystery Author B is –
Mystery Author C is –
Topic 3: Working with Your Nemesis
Mystery Author A is –
Mystery Author B is –
Mystery Author C is –
And the Winner Is…David Skelton!
David, Andy Galbraith, and Jesse Reich all guessed the correct answers, and we pulled David’s name out of a hat on our live webcast. Here’s the correct answers:
Topic 1: Working with Bad Managers
Mystery Author A is – Brent
Mystery Author B is – Jeremiah
Mystery Author C is – Kendra
Topic 2: Working with Remote Teams
Mystery Author A is – Jeremiah
Mystery Author B is – Brent
Mystery Author C is – Kendra
Topic 3: Working with Your Nemesis
Mystery Author A is – Kendra
Mystery Author B is – Jeremiah
Mystery Author C is – Brent
How does the SQL Server transaction log impact the performance of your code?
What impacts the performance of the transaction log, and how do you know when you’re pushing the limits of your log?
Kendra Little explains the role of the transaction log in SQL Server and what you can do to optimize your use of the log– we’ll discuss everything from the physical structure of the log to using minimal logging in your code. This talk is aimed toward DBAs and developers who have worked with SQL Server >= 1 year.
Looking for references and links Kendra refers to in the video? Scroll on down to the bottom of this post.
A Quick Note on Wording
Videos are great for monitoring your own wording. After watching this video I realized I have a tendency to sometimes use phrases along the lines of “clearing transactions” from the log. The better way to say things like this is that “checkpoint flushes dirty pages to disk.” When in the simple recovery model as I was in my demo, this can allow inactive portions of the log to be marked for re-use. But it doesn’t really “clear” them– that’s misleading, even if it is easier to say. (Read more on checkpoint here.)
- Learn more with Brent’s “DBA Darwin Awards: Log File Edition” video: http://www.brentozar.com/go/LogBasics/
- Check out Amit Banerjee’s blog on autocommit, transactions, and log flushing– the inspiration for the first demo in the video.
- Read the Data Loading Performance Guide to learn when and how you can use minimal logging
- Find out the SQL CAT team’s take on the topic in Diagnosing Transaction Log Performance Issues and Limits of the Log Manager
- Get the in-depth scoop on how those writes work in How It Works: Bob Dorr’s SQL Server I/O Presentation
Tools We Used: sys.dm_os_wait_stats, fn_dblog, and perfmon counters
- We looked at wait stats by querying sys.dm_os_wait_stats with a stored procedure based off Glenn Berry’s DMV queries. To get yours, find his most recent set that’s right for your version of SQL Server, then search for sys.dm_os_wait_stats.
- We looked at log records in the active portion of the transaction log using fn_dblog. This is technically an undocumented and unsupported function– use your own best judgment about when and where it’s safe to use. In the video, I’m talking about test environments! For a sample use, check out an interesting use of fn_dblog in Kalen Delaney’s Geek City: What gets logged for index rebuild operations?
- We looked at the following performance counters on the SQLServer: Databases perfmon object:
- Log Bytes Flushed / Sec
- Log Flushes / Sec
- Log Flush Waits / Sec
With SQL Server 2012’s new AlwaysOn Availability Groups, we have the ability to run queries, backups, and even DBCCs on database replicas. But should we?
“It’s not checking the I/O subsystem of the log shipping primary database – only that the log backups are working correctly…. Log shipping only ships transaction log backups – so any corruptions to the data files from the I/O subsystem on the primary will likely not be detected by the offloaded consistency checks…. I’ve heard it discussed that the SQL Server 2012 Availability Groups feature can allow consistency checks to be offloaded to one of the secondary copies – no – by the same argument.”
Actually – brace yourself – I’m about to suggest that Paul’s DBCC advice needs to be even stronger. Take the following configuration:
- SQLPRIMARY – primary replica where users connect. We run DBCC here daily.
- SQLNUMBERTWO – secondary replica where we do our full and transaction log backups.
In this scenario, the DBCCs on the primary server aren’t testing the data we’re backing up. We could be backing up corrupt data every night as part of our full backups. If we experienced corruption on the primary server, and it tried to take care of business with automatic page repair by fetching a copy from the secondary server (which also happened to be corrupt), we’d be screwed. We wouldn’t have a clean backup of the page, and our data would be permanently lost.
The moral of the story: with Availability Groups, we need to run DBCCs on whatever server is doing our full backups, too. Doing them on the primary database server isn’t enough. In a perfect world, I’d run them regularly on any server that could serve a role as the primary database. Automatic page repair can only save your bacon if a replica is online and has a clean copy of the page.
One group I’m working with has taken a rather unique approach to running DBCCs. They can’t afford the performance overhead of running DBCC or backups on the primary replica (ServerA), so every night they run backups and DBCC on a secondary (asynchronous) replica in the same datacenter (ServerB). On Saturday night during their regularly scheduled outage, they switch into synchronous, get the boxes in sync, and then fail over to ServerB. Then they run on ServerB all week long, and run DBCCs on ServerA every night. It’s more manual work, but the payoff is a blazing fast and safe primary node. (And in case you’re wondering about lost transactions in async mode, they’ve thought about that too – when the server becomes mission-critical, they plan to add a ServerC instance acting as a synchronous replica as well.)
And no, I never thought I’d write that Paul Randal isn’t telling you to run DBCC often enough.
Here at Brent Ozar Unlimited®, we’re very open to alternative viewpoints on information technology. We’d like to help broaden your horizons and introduce you to new possibilities for your servers, so we’re introducing a new Monday series – the Celebrity Guest Post.
Our first guest is life coach Phony Robbins, an internationally renowned motivational speaker who’s written books, delivered sessions, and brought millions of people to the success they’ve always wanted. With no further ado, I give you our first guest post.
What Great Admins Have in Common
Hi, I’m Phony Robbins. I’ve studied some of the best admins around, and I’ve had the extraordinary privilege to ask them personal questions. Why do you do what you do? How do you help others get things done? How do you accomplish so much at such little cost? Where are the bodies buried?
I’ve noticed a thread that weaves all of them together. The very best technology people like to listen. No, not to people. They can’t stand people. They like listening to machines.
They jump at the excuse to walk into the datacenter and pretend like they’re fixing something, but watch what happens when they reboot a machine. They don’t jump into another machine to check email. They don’t update executives on status. They just lean casually against a rack, with one hand on a server, but their relaxed demeanor hides their real work: they’re feeling the power.
The best admins hear right away when a server reboots because the amount of power coursing through the server changes, the fans slow down, then they speed up again. They feel the varying forces of air acting on the temperature of the datacenter and they know it all comes down to power.
Your Server is Part of the 99%
The sheep, the masses, they believe power needs to be balanced – but that’s simply not true. A balance of power means that you have some of the power, and someone else has the rest. If you believe that, and if you let that happen to you, then you’re never going to become one of the 1%. That’s what you want, isn’t it? There’s nothing wrong with wanting the very best for you and your servers. Those suckers at Occupy, they’re going about it wrong by trying to bring the 1% down to the rest of us. We need to bring all of ourselves up to the 1% level.
Forget 99% or 1%. Even forget 100% power – that’s not powerful enough for us. We need more than 100% power.
You probably think that getting all of the power is hard, that it requires political work, that it requires convincing. Nope. The answer is inside you, right now. Well, it’s not yet, but it’s about to be, because I’m going to empower you with the secret to powerful success.
You just have to believe.
No, not in yourself – you’re a loser. You have to believe in CPU-Z. It’s a completely free piece of software that tells you how fast your CPUs are supposed to go, and how little power you actually have. Download the zip-no-installation version, extract it, and run it on your server. Here’s what it’s going to look like:
Read the “Specification” line about halfway down the screen – that’s how fast the manufacturer told you the processor would go. In this case, it’s 2.66GHz. Then read the Core Speed line at the bottom left – that’s how fast your processor is actually going.
Or rather, how slow your processor is going, because your server is lazy. Your server hasn’t awoken the power within. In this case, this server’s plugging along at just 1995.6 MHz. 1995 was great for a year, because it was the year my book broke a million sales per year and I bought my second wife her third Mercedes. 1995 is crappy for your server’s speed, because like Sammy Hagar said, he can’t drive 1995.
Awakening the Power Within Your Server
It’s time to turn things around and take control, and we know where that happens – the Control Panel. Open it up, and go to Power, because that’s where the 1% are keeping things back from us. Your server will say something weaselly like Power Saver or Balanced or Saving the Whales. That’s not powerful. Click High Performance, and that’s powerful.
Run CPU-Z again, and if you clicked that mouse like a boss, your screen will look more like this:
The server above has a core speed of 3,168.5MHz, and the specification is only 2.93GHz (2,9300MHz). This server isn’t part of the 99% or the 1% – this server is part of the 110%. Isn’t that what you want, real power? Sure, it comes at a cost of higher electrical bills that cost more money, but money isn’t a problem for the powerful, kiddo.
If it still doesn’t show at least the same speed as the Specification, your hardware guys might be holding you back, too. Server BIOS settings can be configured to override the operating system and leave power saving – or as I like to call it, poverty-causing – enabled no matter what. At your next scheduled outage (and by that I mean the next time you can get into the datacenter and pretend to trip over the power cord) you should change that back again.
Run CPU-Z today, activate the power within you, and stop letting the 99% and the 1% boss you around. From this point forward, when developers or users come to you whining about why your servers are running slow, the answer isn’t your servers – it’s their weak and puny code. After all, your server simply can’t get any more powerful.
SQL Server and SSD make for a hot and crazy relationship. In this webcast, you’ll learn about the wild obsession of the database and solid state storage. It might just be time for you to consider the siren song of the SSD, but know what a dangerous mistress she can be. http://www.youtube.com/watch?v=FVPmsrXpKjI This session is a subset of a 2-hour presentation Brent’s doing on SQLCruise 2012. Come join us – there’s still a few spots left on the May 26th cruise to Alaska. The training is $895, and your cabin is as low as $850 per person for double occupancy. Often companies pay for their employees’ training cost and let them go without taking vacation time, and the employee’s just responsible for the cruise & airfare costs. It’s a pretty fun way to learn a lot and network with some great presenters, including all of Brent Ozar Unlimited®. To get an idea of what it’s like, check out the 2012 trailer.
When preparing for a technical interview, most people forget to think about how they answer questions.
Even though you’re focusing on technical topics, make sure you avoid some simple statements that could spoil your chances to land that job.
#3: My Coworker Was an Idiot
Whether or not it’s true, don’t let this statement come out of your mouth.
You will often be asked questions in a technical interview about mistakes you’ve made or witnessed in your career. Your job is to describe the cause of the problem, your response, and what you learned.
Whatever the incident, be cautious pointing the finger of blame. Calling out your colleagues for incompetence or stupidity will bring you down as well. After all, why didn’t you stop them? (Tip: you don’t want to answer that.)
Instead: Focus on the problem with the technology and the process. Talk about your own role– take responsibility for both your mistakes and your achievements. Talk about changes which would prevent the problem from happening again. (Bonus points if you can tell a story about helping make that happen!)
#2: That Wasn’t My Job
Technical interviews will often touch on topics outside your experience. This is no accident: screeners want to see how you react to questions either above your level or in a different area.
When the conversation goes into these areas, never make excuses. Many people try to provide explanations which boil down to “I don’t know that because it wasn’t my responsibility.” This is a huge mistake: you immediately sound like you have no passion, no intellectual curiosity, and only do what you are tasked with.
Instead: It may seem counter-intuitive, but admitting when you don’t know something can work hugely in your advantage in a technical screening. The key is to do so calmly and with confidence. If you have a theory as to how you could figure the question out, or strategies you might take to solve the problem, offer them!
#1: I’m Sorry
Unless you spill coffee on someone, apologies have no place in a technical screening.
You may apologize without even realizing it. When in a high pressure situation, it’s natural for many people to apologize if they feel it isn’t going well. The more nervous you are, the greater chance that you’ll start apologizing.
Apologizing when you don’t know things doesn’t show your strengths. Instead, it gives an impression that you’ve given up on the question.
Instead: Even if you’re on the tenth question you don’t know the answer to, keep your chin up and keep going strong. If many of the questions are on the same topic, ask if the technical screener has a good references where you could learn about that topic. You can try to save the situation by doing some research and sending a great follow up email after the screening. Whatever you do, proceed through the interview with confidence— it’s never too late to succeed.
What to Remember For Your Technical Screening
Your greatest mission in your technical screening is not to get all the questions right: it’s to best represent how you solve problems. Even if you don’t know answers to half of the questions, you can make the screening a success if you show your strengths along the way.
I have seen quite a few situations where a technical screen shows that a candidate absolutely can’t handle the job in question, but the screener is still incredibly impressed by their performance. In many of these cases, the candidate lands a job at the company— it’s just not the job they initially applied for.
Crazy Ballmer’s Database Hut is having the fire sale of the century! SQL Azure pricing is sliced in half or more. We must be crazy, they say! Check out these discounts:
When SQL Azure first came out with its absurd limitations (no backups, tiny database sizes), I viewed the pricing the same way I view cocaine pricing. You’d have to be pretty stupid to do cocaine, but if you do, then you’re not really the kind of person who’s deterred by astronomically high prices anyway. (You’re probably proud of the high prices, come to think of it.)
Today, it’s a different story. We’ve got:
- Databases up to 150GB
- Federations to make sharding easier (not easy, but easier)
- Backup capabilities
- Reasonable pricing
Suddenly, SQL Azure isn’t cocaine anymore. I’m not saying you’ll want to start sniffing it today, but for starters, why not run a query to find out how much your own server would cost to host in SQL Azure?
SELECT @@SERVERNAME AS ServerName , DB_NAME(database_id) AS DatabaseName , SUM(( size * 8.0 ) / 1048576) SizeGB , SQLAzurePrice = CASE WHEN SUM(( size * 8 ) / 1048576) > 150 THEN 999999.000 WHEN SUM(( size * 8.0 ) / 1048576) > 50 THEN (125.874 + ((SUM(( size * 8.0 ) / 1048576) - 50) * .999)) WHEN SUM(( size * 8.0 ) / 1048576) > 10 THEN (45.954 + ((SUM(( size * 8.0 ) / 1048576) - 10) * 1.998)) WHEN SUM(( size * 8.0 ) / 1048576) > 1 THEN (9.990 + ((SUM(( size * 8.0 ) / 1048576) - 1) * 3.996)) WHEN SUM(( size * 8.0 ) / 1024) > 100 THEN 9.990 ELSE 4.995 END FROM sys.master_files WHERE type_desc <> 'LOG' GROUP BY DB_NAME(database_id) ORDER BY DB_NAME(database_id)
Disclaimer: that query is probably wildly inaccurate because I write horrendous T-SQL and barely tested it. For exact pricing, check the Azure pricing page and scroll down for SQL Azure, or use the Azure pricing calculator.
My query lists the data file sizes – not the actual data sizes, so if you’ve comically overprovisioned your files with lots of empty space, you’ll get artificially high SQL Azure costs. That’s how I like to do my projections, though – if I’m worried that the data will grow to a certain size, then I size the database for it, and that’s a reasonable projection number for my SQL Azure costs as well.
That query also doesn’t warn you about features you’re using that aren’t available in SQL Azure, nor the security challenges of having your customer data offsite, nor does it cover how your applications might be impacted by a database that lives outside of your network, yadda yadda yadda. These aren’t unsolvable challenges – especially when the price is right, and today, it’s gotten a lot closer to right.
Here’s the much harder query to write: we need to be ready when management asks, “So how much does it cost to host these databases on-premise, and do we have three replicated copies like SQL Azure?” SQL Server 2012’s AlwaysOn Availability Groups get us much closer to that solution, but it ain’t cheap.
The options for high availability can get confusing. I was lucky enough to begin working with SQL Server clusters early in my career, but many people have a hard time finding simple information on what a cluster does and the most common gotchas when planning a cluster.
Today, I’ll tell you what clusters are, what they’re good for, and why I like to plan out my clusters in a very specific way. I’ll also give an overview of how clustering relates to the AlwaysOn Availability Groups feature in SQL Server 2012, and wrap up with frequently asked questions about clustering SQL Server.
What Type of SQL Clustering are we Talking About?
There are lots of types of clusters out there. When we cluster SQL Server, we install one or more SQL Server instances into a Windows Failover Cluster. In this post I’m talking specifically about clustering SQL Server 2005 or later using Windows Server 2008 or later.
Key Concept: A Windows Failover Cluster uses shared storage– typically, this shared storage is on a SAN. When a SQL Server instance is installed on the cluster, system and user databases are required to be on the shared storage. That allows the cluster to move the SQL instance to any server (or “node”) in the cluster whenever you request, or if one of the nodes is having a problem. There is only one copy of the data, but the network name and SQL Server service for the instance can be made active from any cluster node.
Translation: A failover cluster basically gives you the ability to have all the data for a SQL Server instance installed in something like a share that can be accessed from different servers. It will always have the same instance name, SQL Agent jobs, Linked Servers and Logins wherever you bring it up. You can even make it always use the same IPAddress and port– so no users of the SQL Server have to know where it is at any given time.
Here is a diagram of a SQL Server cluster. The cluster is named SQLCLUSTER01. It has two nodes (servers), which are named SQLCLU01NODE01 and SQLCLU01NODE02. People connect to the SQL Server instance at SQLCLU01A\SQL. The instance has been configured on port 1433.
Oh no! There’s been a failure in our environment!
Here’s what happened.
The SQLCLU01NODE01 server crashed unexpectedly. When this happened, the Windows Failover Cluster service saw that it went offline. It brought up the SQL Server services on SQLCLU01NODE02. The SQLCLU01A\SQL instance started up and connected to all the same databases on the shared storage– there’s one copy of the data, and it doesn’t move. As part of the SQL Server startup, any transactions that were in flight and had not committed at the time of the crash were rolled back.
While this automatic failover was occurring, users could not connect to the SQLCLU01A\SQL instance. However, after it came back up they were able to resume operations as normal, and had no idea that a server was still offline.
Why You Care About SQL SERVER Clustering
If you’re a business owner, manager, or DBA, you care about clustering because it helps keep your applications online more of the time— when done properly, it makes your database highly available.
Here are some ways that clustering makes your life easier:
- Hardware failures are a nightmare on standalone servers. If a server starts having problems in a failover cluster, you can easily run your SQL Server instance from another node while you resolve the issue.
- Applying security patches on a standalone server can be very tedious and annoying to the business: the SQL Server is offline while you wait for the server to reboot. By using failover clustering, you can apply patches with only brief downtimes for your application as you move your SQL Server instance to a different node.
- Failover clusters can also give you an additional tool in your troubleshooting toolkit. Example: if you start seeing high latency when using storage and you’ve ruled out all the immediate candidates, you can fail to another node to try to rule out if it’s a problem with a per-node component like an HBA.
- Clustering is transparent to the calling application. Lots of things with SQL Server “just work” with clustering, whereas they’re a little harder with other alternatives. With clustering, all of my databases, logins, agent jobs, and everything else that’s in my SQL Server instance fail over and come up together as a single unit— I don’t have to script or configure any of that. I can also cluster my distributed transaction coordinator and fail it over with my instance as well.
Gotchas and Notes for Planning a SQL Cluster
Know What Clustering SQL Server Doesn’t Do
The first gotcha is to be aware of what a failover cluster won’t help you with.
Clustering won’t improve your performance, unless you’re moving to more powerful servers or faster storage at the same time you implement clustering. If you’ve been on local storage, don’t assume moving to a SAN means a nirvana of performance. Also, clustering doesn’t guarantee that everything involved in your SAN is redundant! If your storage goes offline, your database goes too.
Clustering doesn’t save you space or effort for backups or maintenance. You still need to do all of your maintenance as normal.
Clustering also won’t help you scale out your reads. While a SQL Server instance can run on any node in the cluster, the instance is only started on one node at a time. That storage can’t be read by anyone else on the cluster.
Finally, clusters won’t give you 100% uptime. There are periods of downtime when your SQL Server instance is “failing over”, or moving between nodes.
Invest Time Determining the Right Naming Convention
You have a lot of names involved in a cluster: a name for the cluster itself, names for each of the servers in the cluster, and names for each SQL instance in the cluster. This can get confusing because you can use any of these names later on when connecting with Remote Desktop– so if you’re not careful, there may be times when you’re not entirely sure what server you’re logged onto! I have two general rules for naming:
First, make sure it’s obvious from the name what type of component it is– whether it’s a cluster, physical server, a SQL Server instance, or a Distributed Transaction Coordinator. I also recommend installing BGINFO to display the server name on the desktop for every server in the cluster.
Second, name everything so that if you later add further nodes or install another SQL Server instance onto the cluster, the names will be consistent.
Avoid Putting Too Many Nodes in One SQL Cluster
I prefer to have only two or three nodes in a cluster. For example, if I need to cluster five SQL Server instances, I would put them in two failover clusters.
This requires a few extra names and IP Addresses overall, but I prefer this for management reasons. When you apply patches or upgrades, you must make sure that each service on your cluster runs on each node successfully after you’ve applied the change. Having a smaller cluster means you don’t need to fail your instance over as many times after a change.
Don’t Assume Your Applications Will Reconnect Properly After Failover
Even though your SQL Server instance will come up with the same network name and IPAddress (if not using DHCP), many applications aren’t written to continue gracefully if the database server goes offline briefly.
Include application testing with your migration to a failover cluster. Even though the application doesn’t know it’s talking to a cluster (it’s a connection string like any other), it may not reconnect after a failover. I worked with one application where everything worked fine after a failover, except web servers stopped writing their log data to a database because they weren’t designed to retry after a connection failure. The data was written asynchronously and didn’t cause any failures that impacted users, but the issue wasn’t noticed immediately and caused the loss of some trending data.
“Active Active” Can Be Useful
My ideal cluster layout to work with is a two node cluster with identical hardware and two SQL Server instances on it. This is commonly called “Active Active” clustering, but that term is technically a no-no. Officially this is called a “Multi-Instance Failover Cluster.” Not quite as catchy, is it?
Many people think the ideal situation is to put their most important SQL Server instance on a two node cluster and leave the second node ready, waiting, and idle. So, why do I want a second SQL Server instance?
I like to put my critical, heavy hitter database on one of those instances in the cluster. I then want to take a couple of less critical, less busy databases and put them on the second instance. The perfect examples are logging databases. There are two requirements for these databases: first, they can’t require a large amount of memory or processor use to run well, because I absolutely have to know that these two instances can run successfully at peak load on a single node if required. Second, the databases on the “quiet” instance shouldn’t cause the whole application to go offline if they aren’t available.
Why do I like having a “quiet” instance? Well, whenever I need to apply updates to Windows or SQL Server, this is the canary I send into the coal mine first. You can perform rolling upgrades with failover clusters, which is great. But it’s even better to know that the first instance you fail over onto an upgraded node won’t take absolutely everything down if it has a problem.
Notes: Because of licensing costs, this option won’t always be realistic. If you go this route you have to make sure everything can stay within SLA if it has to run on a single node at your busiest times– don’t overload that “quiet” instance!
Re-Evaluate your SQL Server Configuration Settings
Revisit your configuration settings as part of your planning. For example, on a multi-instance cluster, you use the minimum memory setting for SQL Server to configure how your instances will balance their memory usage if they are on the same node.
Do I Have to Use Clustering to Use Availability Groups in SQL Server 2012?
This is an interesting question– don’t let it confuse you. We have a very cool new feature called Availability Groups coming in SQL Server 2012, which does offer awesome scale-out read functionality. You’ll read in many places that it “requires Failover Clustering.”
This is true. In order to use the Availability Group feature in SQL Server 2012, the Failover Clustering feature must be enabled in Windows. If you’re using Windows Server 2008 or prior, this feature is only available in Datacenter and Enterprise edition of Windows Server, so that feature isn’t free. This feature is now included in Windows Server 2012 for all editions.
But wait, there’s a catch! Even though you’re enabling the Failover Cluster feature, you are NOT required to have shared storage to use Availability Groups. You have the option to use a Failover Cluster in an Availability Group, but you can also run your Availability Groups with entirely independent storage subsystems if you desire. The feature is required because no matter what, Availability Groups will use parts of the Failover Clustering feature to manage a virtual network name and IP Address.
Frequently Asked Questions for Clustering SQL Server
Q: Can I install every SQL Server component on my cluster?
A: Nope. SQL Server Integration Services is not “cluster-aware” and can’t fail back and forth with your cluster.
Q: How long does it take to fail over?
A: There are several factors to consider in failover time. There’s the time for the SQL Server Instance’s service to go down on one node, be initiated on another node, and start up. This time for instances to start and stop includes normal database recovery times. If you need to keep failovers within an SLA, you’ll want to test failover times in a planned downtime, but also estimate in how long failover might be if it happened at peak load.
Q: Can I cluster a virtualized server?
A: Yes, you can create failover clusters with virtual servers with VMware or Hyper-V, and install SQL Server into it. I think this is great for learning and testing, but I’m not crazy about this for production environments. Before you go too far down this path, look at leveraging your hypervisor’s native components for high availability.
Q: Why do you make such a big deal about the shared storage?
A: Because not everyone has robust shared storage available. You want to make sure you’re using shared storage that has redundancy in all the right places, because in a failover cluster shared storage is a single point of failure, no matter how magical the SAN seems. This also means that if your data is corrupted, it’s going to be corrupted no matter which node you access it from.
Q: What’s the minimum number of nodes in a failover cluster?
A: One. This is called a single-node cluster. This is useful for testing purposes and in case you have a two node cluster and need to do a work on a node. You can evict a node without destroying the cluster.
Q: Can I use geo-clustering for Disaster Recovery?
A: Yes, but it requires some fancy setup. Most SQL Server clusters are installed in the same subnet in a single datacenter and are suitable for high availability. If you want to look into multi-site clustering, “geo-clustering” became available with SQL Server 2008, and is being enhanced in SQL Server 2012. Note: you’ll need storage magic like SAN replication to get your Geo-cluster on.
Q: Does it matter which version of Windows I use?
A: Yes, it matters a lot. Plan to install your Windows Failover Cluster on the most recent version of Windows Server, and you need Enterprise or Datacenter edition. If you must use an older version of Windows, make sure it’s at least Server 2008 with the latest service packs installed. The Failover Clustering Component of Windows was rewritten with Server 2008, so if you run on older versions you’ll have fewer features and you’ll be stuck chasing old problems.
Q: What is Quorum?
A: Quorum is a count of voting members— a quorum is a way of taking attendance of cluster members who are present. The cluster uses a quorum to determine who should be online. Read more about quorum here.