Here’s some of the reasons companies usually virtualize their SQL Servers:
- Cost savings on hardware
- Cost savings on Windows OS licensing
- Cost savings on SQL Server licensing
- Protect against the failure of a single hardware element
- Leverage extended features for Disaster Recovery
- Automatic load balancing across multiple hosts
- Easier hardware replacement/migration
When we perform a SQL Critical Care® on a virtualized SQL Server, we often ask, “Are we actually getting those benefits?”
1. Cost savings on hardware – do you find yourself putting one SQL Server guest on each host, isolating them to make sure they get the performance they need? If so, you’re not actually saving money on hardware.
2. Cost savings on Windows OS licensing – as a standard, some companies license all their virtualization hosts with Windows Server Datacenter Edition in order to get unlimited virtualization rights. However, if you’re only running one guest per host (or just a few), then you’re not saving money here either.
3. Cost savings on SQL Server licensing – for this one, you’ve gotta do a little bit harder work. Add up the licensing you’re spending now, and look at what it would take to run similar instances on bare metal hardware. Keep in mind that you can still buy dual-socket, quad-core servers that are insanely powerful (768GB RAM, dozens of SSDs), thereby keeping your SQL licensing lower.
4. Protect against the failure of a single hardware element – on the free versions of most hypervisors, you don’t get automatic failover protection. You can manually start up a guest on another host with some human intervention. Is that enough for the business, or are they assuming it’ll all happen automatically with only a minute or two of downtime – even when you’re not around? Or even worse, do you not have enough hardware horsepower to start up your biggest SQL Server guest somewhere else if its host fails? Or, heaven forbid, are you using local SSDs with virtualization, thereby missing the entire ability to move guests around?
5. Leverage extended features for Disaster Recovery – VMware and Hyper-V have killer features (and third-party app extensions) that make it easy to replicate a guest from one site to another. Are you using those, or have you given up because SQL Server’s data change rates are too high, and your network can’t keep up?
6. Automatic load balancing across multiple hosts – VMware’s Distributed Resource Scheduler (DRS) will automatically shuffle VMs around between hosts based on resource utilization. It’s an amazing way to react to performance issues with less human intervention. You should be using it.
7. Easier hardware replacement/migration – because SQL Server licensing is priced by the CPU core, and it’s super expensive, many shops choose to improve their virtualization host hardware annually. Whenever they need more capacity in their VMware or Hyper-V clusters, they drop in a couple of new hosts, vMotion or LiveMigrate the most expensive per-core guests over to those hosts (thereby taking advantage of today’s faster processors), and then give everybody else the hand-me-downs. It’s easy to do even live during the daytime. However, some shops are still running their SQL Servers on CPUs that might get featured on Antiques Roadshow.
If you’re not leveraging at least some of these virtualization features, and you don’t plan to…then what was the point of virtualizing to begin with? Jump on in – the water’s fine!
If you’re using AGs, don’t apply these patches:
- SQL 2012 SP2 CU3
- SQL 2012 SP2 CU4
- SQL 2014 CU5
until you read this Microsoft post about a breaking bug.
Your AG may stop synchronizing due to blocking between user queries and a system session. The fix is to disable automatic failover, restart the primary, and enable automatic failover again.
You’re a developer or a DBA, and you’re comfortable writing queries to get the data you need. You’re much less comfortable trying to design the right indexes for your database server.
Join me on Wednesday, Jan 28, for a free one-hour session explaining how the SQL Server engine looks at your indexes and builds your query results.
- The differences between clustered and nonclustered indexes
- How (and when) to make a covering index
- The basics of execution plans
- What determines sargability
- What SQL Server uses to estimate the memory your query needs
- What happens when those estimates are wrong
- And it’s all demoed with these Stack Overflow database pages (PDF)
I’m not kidding when I say it’s the best session I’ve ever written, and I’m really proud of it. This session is normally $29 in our online store, but thanks to Idera, this live version will be free. Register now.
When we first started our company, we laid out our goals in a shared Google Doc. We wrote about what we wanted to do, how we wanted to treat our employees and customers, and the growth numbers we wanted to achieve. One of our original goals was to hire one consultant per year.
In 2012, it was Jes Schultz Borland, and in 2013, it was Doug Lane. We held off adding a consultant in 2014 because we wanted to be financially able to hire two consultants at once in the next round. I’m not saying we WILL, I’m just saying we wanted to be ABLE to. See, one of our other goals is to be able to survive a 9/11-type business event – no incoming revenue for several months, and know that we can still pay everybody’s paychecks for that duration. That means we’re a much slower-growing consulting company, but everybody involved can sleep soundly at night.
In 2013, we also added a part-time admin assistant – my wife, Erika. She was previously a legal assistant at a downtown Chicago law firm, managing administrative and office work, so she’s been able to clean up a lot of our messes. (Scheduling training classes, selling videos via credit cards, helping out our accountants, and sending out Christmas cards with stickers requires a surprisingly high amount of time.) Having Erika on the team has really helped free us up to focus on the technology parts of the job that we love.
In 2014, we realized that just like admin duties, handling incoming sales requests had become a pretty big chunk of time too. Do the numbers: we have 5 consultants, and our main product is our 4-day SQL Critical Care®. That means we sign 4-5 new clients per week, which translates into hundreds of clients per year. That’s a lot of emails to answer, contracts to sign, and engagements to schedule. I do that work today in my spare time – and oddly, I really enjoy it – but if we’re going to scale this little shop, we’re gonna have to bring in a sales pro to manage this process full time.
So on our Employment Opportunities page, we added a
job description for our new salesperson.
We’re looking for a salesperson with experience. If you’ve been working with SQL Server for the last several years, and only SQL Server, it’s probably not a good fit. (That means most, if not all, of our readers, ironically.) However, you might know someone who’s a good fit, so we’re putting this out there.
We’re excited to bring them on board and get our sales process polished – because hey, that means we can hire more consultants and trainers, woohoo!
Update Jan 27 – we’ve got the resumes we need. Thanks everybody!
To figure it out, we have to define what the words senior and DBA even mean. I explain in this video.
To follow along, print this image out:
Kendra says: The part of this that resonated with me most was how to work with your existing management to show that you’ve been growing your skills – I wish I’d heard that back when I was struggling to figure out how to be a Senior DBA!
Jeremiah says: The best part of this was the reminder that I don’t need to be a specialist everywhere. If I had kept this advice in mind when I was struggling to become a Senior DBA.
Jes’s thoughts: my favorite advice from this video is to focus on 1-2 things to learn in the next 6-12 months, and make a plan for that. When I started, I assumed I’d be able to learn everything at the same rate – I know now that isn’t true. In 2015, I plan to learn more about Extended Events and virtualizing with VMware.
Doug says: Love it. It’s all true in my experience too — you can’t become a specialist without focus, you can’t focus without a plan, and you won’t get noticed for a promotion (or a new job) if you aren’t really good at a few things. No one in the SQL Server world ever made for themselves a reputation for being good at everything.
And the next step after finishing the video and building your learning plan: learn to say no.
When you’re trying to fix a problem (or even just researching something you’re curious about), do what scientists do.
1. Ask a question. “Why is this database so slow?”
2. Do background research. Hit Google, check Stack Exchange, maybe even read Books Online.
3. Construct a hypothesis. “I think this database is slow because I don’t have Lock Pages in Memory enabled.”
4. Test your hypothesis with an experiment. “On Saturday night during the change control window, I’ll enable Lock Pages in Memory. After that change, I believe that Page Life Expectancy will be higher during our normal business hour load periods – it’s currently 300, and I expect to see it go up.” Pick the exact metric you want to watch, and explain – in advance – how you expect that metric to change.
5. Analyze your data and draw a conclusion. “I measured PLE, but it still hovers around 200-300 during our weekday peak times. Therefore, enabling Lock Pages in Memory was not the solution I needed.”
6. If it didn’t get the result you expected, put it back. This is the part most folks miss – we make changes willy-nilly when we’re troubleshooting, and we leave behind all kinds of crazy sp_configure options, trace flags, and registry settings. The next time you have a problem on the server, it’s hard to know what settings we changed on purpose – and which unrelated ones we just button-bashed.
I can hear the experienced DBAs in the crowd screaming out that we should never run experiments in production, and that’s somewhat true. In a perfect world, we’d all have exact replicas of our production environment, and the ability to run production-quality loads. We don’t. Let’s start by taking the first step – making sure our production changes have the effect we expect.
If you’re a production database administrator responsible for backups, corruption checking, and index maintenance on SQL Server, try Ola Hallengren’s free database maintenance scripts. They’re better than yours (trust me), and they give you more flexibility than built-in maintenance plans.
However, the index maintenance defaults aren’t good for everyone. Here’s how they ship:
CREATE PROCEDURE [dbo].[IndexOptimize] @Databases nvarchar(max), @FragmentationLow nvarchar(max) = NULL, @FragmentationMedium nvarchar(max) = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', @FragmentationHigh nvarchar(max) = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE', @FragmentationLevel1 int = 5, @FragmentationLevel2 int = 30,
The defaults on some of these parameters are a little tricky:
When an index is between @FragmentationLevel1 (5%) and @FragmentationLevel2 (30%), the scripts will perform the actions listed in @FragmentationMedium, in order. First it’ll try an index reorganize, which is an online operation. If the index’s design doesn’t allow for that, IndexOptimize will try to rebuild the index online. If the index’s design or your SQL Server edition doesn’t allow for that, it’ll perform the last resort – an offline index rebuild.
When an index has higher fragmentation than @FragmentationLevel2 (30%), the scripts will try the actions listed in @FragmentationHigh – first an online rebuild, then if that’s not possible, an offline rebuild.
These defaults aren’t dangerous or deadly, but they do have drawbacks.
Index maintenance changes data pages and creates transaction log records. This means larger transaction log backups, which at first doesn’t sound too terribly harmful. However, if you’re using database mirroring, transaction log shipping, AlwaysOn Availability Groups, or anything else that copies your logged transactions around, you’ve put yourself behind. In one case, I saw an all-solid-state server generate ~25GB of transaction logs in under 3 minutes – all of which had to be copied out to the various reporting and disaster recovery servers. This creates an RPO/RTO problem without really solving a performance problem.
Index maintenance changes data pages, too. This also sounds harmless, but if you’re doing differential backups or storage replication, boom, you just inflated your storage and bandwidth costs. The larger your database becomes, the more you’ll start to investigate these recovery options, and the more index maintenance starts to backfire.
Offline index rebuilds cause blocking. As low as 5% fragmentation, the script defaults have the potential to take tables offline and block queries. I take kind of an extreme view on this: I would rather not take tables offline when there’s not a human being around to watch what’s happening.
So here’s the defaults I prefer:
- FragmentationLevel1 = 30%
- FragmentationLevel2 = 50%
- FragmentationMedium = ‘INDEX_REORGANIZE,INDEX_REBUILD_ONLINE’
- FragmentationHigh = ‘INDEX_REBUILD_ONLINE’
Note that I raised the fragmentation thresholds by a lot, and I removed the offline rebuild operations. I’m not saying I never rebuild indexes offline – I just don’t want to do that by default, with nobody around. When I’ve got tables that are heavily fragmented, and I’m convinced that a rebuild will solve the problem, AND I can only rebuild them offline, then I want to run the stored procedure manually when I can be around to keep an eye on it.
How to Change the IndexOptimize Defaults
You could just edit the IndexOptimize creation script. This way, whenever you execute it without any parameters, it will inherit the defaults you set at creation time. If you maintain IndexOptimize across a lot of servers, and you keep a single master version in a central location, this works.
But you have to be doggone sure to remember to change those defaults every single time you download a fresh version from Ola. (You won’t. You’re too busy reading blogs.)
The safer way is to leave his script the way it is, but when you work with SQL Agent jobs to run IndexOptimize, change the job’s settings.
You should probably be doing this anyway because you should create two separate Agent jobs: one to do your normal lightweight optimizations, and a second one for deeper optimizations (like with offline rebuilds). Some folks like to have weekday jobs that only do low-impact maintenance techniques for highly fragmented tables (like if something manages to hit 80% fragmentation on a Tuesday, go ahead and rebuild it online) followed by more invasive techniques on the weekends.
Just keep your backup schedule in mind – if you do full backups weekly on Saturday, followed by differential backups the rest of the week, don’t do your index rebuilds on Sunday. That instantly has the effect of inflating your differential backups all week long. Do the index maintenance first (like Saturday morning) followed by the full backup Saturday evening.
You may have heard the term “commodity hardware” thrown around when describing solutions like Redis, Elasticsearch, or kCura’s new Data Grid.
Commodity hardware refers to cheap, standardized servers that are easy to buy off the shelf from any vendor. Here’s a typical example of a 2u, 2CPU commodity hardware server:
- Supermicro SYS-1028R-TDW 1U rack server
- Two Xeon E5-2600 v3 CPU sockets
- 16 RAM slots, holding up to 1TB RAM
- 8 2.5″ hot-swap SATA 3 drive bays
- 2 PCI-E 3.0 x16 full height half length slots
- $900 at NewEgg
Say we buy a few parts to get ‘er started:
- Two Intel Xeon E5-2623 v3’s (quad core) – $900 total
- 128GB RAM (using 8GB DIMMs) – $1,920
- Two 512GB SSDs for fast storage – $450
- Six 4TB hard drives for slow storage – $900
- Grand total: $5,070
Not bad. Want a little more power? Here’s a faster config:
- Two Intel Xeon E5-2623 v3’s (quad core) – $900 total
- 256GB RAM (using 16GB DIMMs) – $3,500
- 8 1TB SSDs – $2,600
- Grand total: $7,900.
The term “commodity hardware” used to mean really crappy gear, but when you look at these numbers, that’s not necessarily the case anymore. You can build yourself quite the army of pizza boxes.
When vendors say, “You can deploy our solution on commodity hardware,” they’re not saying you’re going to get amazing performance with 16GB of RAM and a couple of spinning rusty frisbees. It’s time to reset your expectations about what commodity means.
Every now and then I run across an automated script that does something a little suspicious. I’m not saying you should never put these things into a script, but if you do, seriously think about the surprise gotchas when someone runs the script:
- DBCC commands other than CHECKDB
- sp_configure (and especially RECONFIGURE afterwards)
- ALTER SERVER CONFIGURATION
- ALTER DATABASE
- ALTER AVAILABILITY GROUP
- CREATE INDEX or DROP INDEX
- And most of the database engine management stored procedures
If you’re doing any of these on a scheduled basis, take a few minutes to document what you’re doing, why, and whether it’s safe to stop doing it. Your successor will thank you.
Believe me, otherwise she’s going to throw you under the bus when you’re gone, and if you haven’t left something behind to defend you, you’re going to look like That Guy.