Book Review: Virtualizing SQL Server with VMware

tl;dr – I do not recommend this book.

I was so incredibly excited when it was originally announced. A book published by VMware Press, written by prominent VMware, SQL, and storage consultants? GREAT! So much has changed in those topics over the last few years, and it’s high time we got official word on how to do a great job with this combination of technology. Everybody’s doin’ it and doin’ it and doin’ it well, so let’s get the best practices on paper.

When it arrived on my doorstep, I did the same thing I do with any new tech book: I sit down with a pad of post-it notes, I hit the table of contents, and I look for a section that covers something I know really well. I jump directly to that and I fact-check. If the authors do a great job on the things I know well, then I’ve got confidence they’re telling the truth about the things I don’t know well.

I’ll jump around through pages in the same order I picked ‘em while reading:

Page 309: High Availability Options

Here’s the original. Take your time looking at it first, then click on it to see the technical problems:

Page 309 – read it yourself critically first, think about what you know might not be right, then click for my annotated notes

OK, maybe it was bad luck on the first page. Let’s keep going.

Page 111: Database File Design

Page 111 – read it critically first, then click here for my annotated version

The “Microsoft Recommended Settings” are based on a 2006 article about Microsoft SQL Server 2005. I pointed this out to the book’s authors, who responded that Microsoft’s page is “published guidance” that they still consider to be the best advice today about SQL Server performance. Interesting.

Even so, the #3 tip in that ancient Microsoft list is:

3. Try not to “over” optimize the design of the storage; simpler designs generally offer good performance and more flexibility.

The book is recommending the exact opposite – a minimum of one data file per core for every single database you virtualize. That’s incredibly dangerous: it means on a server with, say, 50 databases and 8 virtual CPUs, you’ll now have 400 data files to deal with, all of which will have their own empty space sitting around.

I asked the authors how this would work in servers with multiple databases, and they responded that I was “completely wrong.” They say that in a virtual world, each mission-critical database should have its own SQL Server instance.

That doesn’t match up with what I see in the field, but it may be completely true. (I’d be curious if any of our readers have similar experiences, getting management to spin up a new VM for each important database.)

So how are you supposed to configure all those files? Let’s turn to…

Page 124: Data File Layout on Storage

Page 124 – read it, think on it, and then check out my notes

Imagine this setup for a server with dozens of databases. And imagine the work you’d have to do if you decide to add another 4 or 8 virtual processors – you’d have to add more LUNs, add files, rebalance all of the data by rebuilding your clustered indexes (possibly taking an outage in the process if you’re on SQL Server Standard Edition).
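To make that concrete, here’s roughly what that ongoing maintenance looks like – a sketch with hypothetical database, file, and index names, not a recommendation:

ALTER DATABASE SalesDB
    ADD FILE (NAME = N'SalesDB_Data9', FILENAME = N'E:\MSSQL\Data\SalesDB_Data9.ndf', SIZE = 10GB)
    TO FILEGROUP [PRIMARY];

-- Adding the file doesn't rebalance existing data; rebuilding the clustered indexes does.
-- Repeat for every clustered index, in every database, every time the vCPU count changes.
ALTER INDEX PK_Orders ON dbo.Orders REBUILD WITH (ONLINE = ON); -- ONLINE = ON requires Enterprise Edition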

What’s the point of all this work? Let’s turn to…

Page 114: You Need Data Files for Parallelism

Page 114 – you don’t even have to click for my thoughts this time. See, I’m all about you.

No, you don’t need more data files for parallelism. Paul Randal debunked that in 2007, and if anybody still believes it, make sure to read the full post including the comments. It’s simply not true.

I asked the authors about this, and they disagree with Paul Randal, Bob Dorr, Cindy Gross, and the other Microsoft employees who went on the record about what’s happening in the source code. The authors wrote:

You can’t say Microsoft debunked something when they still have published guidance about it…. If in fact if your assertions were accurate and as severe then we would have not had the success we’ve had in customer environments or the positive feedback we’ve had from Microsoft. I would suggest you research virtualization environments and how they are different before publishing your review.

(Ah, he’s got a point – I should probably start learning about SQL on VMware. I’ll start with this guy’s 2009 blog posts – you go ahead and keep reading while I get my learn on. This could take me a while to read all these, plus get through his 6-hour video course on it.)

So why are the authors so focused on micromanaging IO throughput with dozens of files per database? Why do they see so many problems with storage reads? I mean, sure, I hear a lot of complaints about slow storage, but there’s an easy way to fix that. Let’s turn to page 19 for the answer:

Page 19: How to Size Your Virtual Machines

Page 19 – read critically, and then click for my annotated notes

Ah, I think I see the problem.

To make matters worse, they don’t even mention how licensing affects this. If you’re licensing SQL Server Standard Edition at the VM guest level, the smallest VM you can pay for is 4 vCPUs. Oops. You’ll be paying for vCPUs you’re not even using. (And if you’re licensing Enterprise at the host level, you pay for all cores, which means you’re stacking dozens of these tiny database servers on each host, and managing your storage throughput will be a nightmare.)

In fact, licensing doesn’t even merit a mention in the Table of Contents or the book’s index – ironic, given that it’s the very first thing you should consider during a virtual SQL Server implementation.

In Conclusion: Wait for the Second Edition

I’m going to stop here because you get the point. I gave up on the book after about fifty pages of chartjunk, outdated suggestions, and questionable metrics (proc cache hit ratio should be >95% for “busy” databases, and >70% for “slow” databases).

This is disappointing because the book is packed with information, and I bet a lot of it is really good.

But the parts I know well are not accurate, so I can’t trust the rest.

Watch This Week’s Webcast Today (and Win a Prize Tomorrow)

We always like to innovate — not just with the solutions we design for our consulting customers and in how we teach, but in our free videos, too.

Our YouTube channel has become super popular. Lots of folks watch the recordings of our live webcasts. We stopped recently and asked, “How can we make this even better for the folks who attend our live event?” And we realized: we can give you more time to ask questions about that week’s training topic!

Here’s your mission:

  1. Watch the video below today. We won’t be presenting this live this week or re-covering the material from the video; instead, we’re doing more advanced Q&A for the folks who’ve already watched it.
  2. Note down questions or comments you have on this post. (This is totally optional, but it means you won’t forget your question and it’s more likely we have time to talk about it with you.)
  3. Attend the live webcast on Tuesday at the normal time (11:30 am Central). Register here.
  4. During the first 10 minutes of the webcast, we’ll give away a prize – but you must be present to win!

The live discussion of the video and Q&A won’t be recorded and published, and you also need to be present to win the prize. See you on Tuesday!

The Mysteries of Missing Indexes [Video]

SQL Server is trying to help you – when you run a query, you see a missing index request. Before you run that CREATE INDEX script in production, consider a few things. How helpful will the index be? Is it similar to an existing index? Why is it recommending three indexes that are very similar? Jes will solve some common missing index mysteries in this 30-minute webcast.
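If you want to poke at those questions yourself before the webcast, SQL Server exposes its suggestions through the missing index DMVs. A minimal sketch, ordering by a rough “how helpful” estimate:

SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;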

#SQLPASS Speakers: Here’s How to Improve Your Session

This year there are a couple of special challenges to keep in mind for projecting your beautiful knowledge out to the audience.

Fixing the Contrast Levels

PASS distributes a custom PowerPoint template – looks great this year – but the contrast level is extremely light. The template uses light blue and light gray text on a white background. In a bright room with less-than-perfect projectors, this is going to wash out pretty badly, and people won’t be able to read your brilliant nuggets of wisdom.

To fix this, open the slide template and click View, Master, Slide Master. Mine is Office for the Mac, but the general idea will hold true across all versions of PowerPoint:

PASS Summit Template – Slide Master

Click on the slide title (“Click to edit Master title style”) so that the entire title box is selected – not just the text inside it – then click the Home tab of the ribbon. From there, you can change the font color:

Changing the font color

Just use a darker version of the existing color – that way the slides will still look matchy-matchy, but just have more contrast.

Repeat the same process with the content box as well, turning it into a darker gray or black.

This is totally okay – Lana Montgomery of PASS HQ writes, “Feel free to adjust the depths of the colors for the blue and grey for your presentation.”

Next: That Widescreen Thing

This year’s slide template is widescreen – 16:9 aspect ratio - and the projectors are widescreen as well. The minimum resolution will be 1280×720.

This is a fantastic resolution for slides, but if you’ve got demos, you need to make sure to test them at the 1280×720 resolution. There are some parts of SSMS’s user interface where buttons disappear at just 720 pixels of vertical resolution, for example. If you’re using a virtual machine, and you’re sacrificing some pixels at the top and bottom for toolbars, this may be even worse.

If you’re doing query demos, you may also want to take the widescreen effect into account. Often I run SSMS at a squarer resolution like 1280×1024, and there’s plenty of space to put the query at the top and my execution plan at the bottom. In a widescreen aspect ratio, that’s not the case – you’re better off putting the query in a pane on the left, and the plan in a pane on the right.

Knock ‘em dead! And while you’re prepping, ask these 51 questions about your session.

Is Transparent Data Encryption Right for Your Data? [Video]

Transparent Data Encryption is a method to encrypt your SQL Server data on disk. Using it can present challenges, though. How does it work? How does it interact with other SQL Server features? What does it not encrypt? Join Jes as she explains the ins and outs of TDE.
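For reference, the moving parts look roughly like this – a minimal sketch with placeholder names and passwords, not the downloadable script:

USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'UseAStrongPasswordHere!';
CREATE CERTIFICATE TDE_Cert WITH SUBJECT = N'TDE certificate';  -- back this certificate up!

USE MyTestDatabase;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDE_Cert;
ALTER DATABASE MyTestDatabase SET ENCRYPTION ON;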

Want to try it yourself (in a test environment!)? Download the script.

Announcing a New 2-Day Event: Make SQL Server Apps Go Faster

We started with our top-selling PASS Summit pre-con, and then doubled it to two days long!

You’re a developer or DBA stuck with a database server that’s not going fast enough. You’ve got a hunch that the biggest bottleneck is inside the database server somewhere, but where? In just two days, you’ll learn how to use powerful scripts to identify which queries are killing your server, what parts of the database server are holding you back, how to tackle indexing improvements, and how to identify query anti-patterns.

On Monday and Tuesday before the PASS Summit conference, we’ll dive deeply into:

  • How wait stats tell you where to focus your tuning
  • How the plan cache shows you which queries are the worst
  • How to make fast improvements by picking the right indexes
  • How to identify and fix the most common query anti-patterns
  • And you’ll even get execution plan, query, and index challenges to solve during the class

Make yourself comfortable and get your learn on.

You’ll be plenty comfortable because we’re holding this independent event in the Big Picture Theater in downtown Seattle, one mile from the Convention Center. Coffee, beverages, snacks, and lunch are all covered. (Sorry, no popcorn – you know we wouldn’t be able to stop eating it once we get started.)

Registration is $695, and for another $200, we’ll throw in our Developer’s Guide to SQL Server Performance video class. This way, you can get started learning right away without waiting for the class!

Act fast, save $100 - if you register during July with coupon code GOFAST, registration is only $595.

Space is limited to just 100 seats – way fewer than we had last year, so you’d better move fast if you want to get in. This is a totally independent training event run by Brent Ozar Unlimited – you can’t get in with your PASS Summit ticket, and you can’t register for it as part of the PASS check-out process.

Check out the details now.

Capturing Deadlocks in SQL Server [Video]

If you’re experiencing deadlocks in SQL Server, you want to know more about them. What queries are causing them? What tables and indexes are at the root of the problem? You can find this information by using tools such as trace flags, Profiler, and Extended Events. Jes shows you how in this 25-minute video:

Download the scripts here.

For more videos like this:

How to Get “Senior” in Your Title [Video]

In my blog post Sample SQL Server Database Administrator Job Descriptions, I explained how production DBAs are like fridge maintenance engineers, and development DBAs are like chefs.

In this 35-minute video, I explain how this plays into the concept of a “senior DBA” – you can’t just say “senior fridge user” any more than you can say “senior DBA.” There are a lot of kinds of database administrators, and they each have different job tasks.

In the video, I use this grid to lay out common job duties:

DBA Job Duties

Then in the video, I use a 10-question self-assessment test to check your skills across each job duty, figure out where you’ve got the best strengths, and guide your training to focus on one of the most common job descriptions.

For more videos like this:

Sizing SQL Server for AWS

Let’s skip the shenanigans and get right down to it – you’re responsible for SQL Server and someone in your company wants to move things into Amazon Web Services (AWS). You’ve got SQL Server setup covered thanks to our helpful SQL Server setup checklist and you’re confident you’ve got hardware under control, but things are different up in the clouds.

The question on your mind is “What’s the best way to size a SQL Server in the cloud?”

The Basics: Sizing the Server

Amazon gives you a set list of server sizes to choose from. It’s up to you to pick the right server for the job. The available choices can be overwhelming at first glance, but it gets simpler when you consider different classes of SQL Server and map them to different families of AWS instances.

By the way: we call servers instances in the cloud. Yes, it’s confusing. No, we can’t change it.

Instances are grouped into instance types – you can think of an instance type as a general category of performance. This is going to be your starting point for deploying SQL Server. Don’t worry, though, you can usually change the instance type after you’ve configured SQL Server.

Possible SQL Server instance types:

  • M3 General Purpose – These are basic servers with up to 8 cores and 30GB of memory. M3 instances won’t win any awards for speed, but they are cheap.
  • R3 Memory Optimized – These instances ship with up to 32 cores (2 sockets) and 244GB of memory. Two local SSDs are thrown in – seems like a good place for TempDB to me!
  • I2 IO Optimized – Sounds great at first, right? All the memory of the R3 with 6.4TB of local SSD. At only $7 an hour, it’s a steal!

So, which one do you need? Remember that you don’t want to just choose the biggest and baddest instance right off the bat – there’s a noticeable cost associated with any of these instances.

We’ll get back to picking an instance type after covering a few other details.

Support Level

In many organizations, DBAs are lucky to have tiered support. Mission-critical applications, like POS systems or credit card processing, get put in the highest tier of support. Internal applications may end up in a second tier. And, finally, legacy applications or applications that are not critical in any way to the business end up in the third tier.

We can’t ask AWS to give us faster disks, SAN replication, or some other magical enterprise HA feature. But we can use this support information to figure out if this SQL Server requires the attention of a full-time DBA or if it can be offloaded to something like Amazon RDS. Making this decision is outside the scope of this article, but knowing that the option exists is enough.

  • What’s the support level required for this application?
  • Does the application require regular manual tuning?
  • What’s the support SLA?
  • Is there an incident response plan?
    • Has anyone ever reviewed the incident response plan?
    • Have you ever had to follow it? That is: does it actually work?

Networking

I bet you don’t even think about networking.

The first thing you need to know is: by default, AWS instances use a single network card (NIC) for all traffic. Storage and communication use the same pipe. If you want to use database mirroring and rebuild your indexes, you may be in for a nasty surprise when mirroring drops in the middle of a rebuild.

Second: not all instance types are networked equally. Look in the Networking Performance column to get a vague idea of what’s going on. “Moderate” equates to 500 Mbps, “High” is 1000 Mbps, and “Low”/”Very Low” shouldn’t be entering the equation for a SQL Server.

Watch the storage and network throughput of your SQL Server to verify how much total throughput you need.
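From the SQL Server side, sys.dm_io_virtual_file_stats gives you the storage half of that picture. A quick sketch – the numbers are cumulative since the instance last started, so sample twice and diff for a rate:

SELECT DB_NAME(vfs.database_id) AS database_name,
       SUM(vfs.num_of_bytes_read) / 1048576 AS mb_read,
       SUM(vfs.num_of_bytes_written) / 1048576 AS mb_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
GROUP BY vfs.database_id
ORDER BY SUM(vfs.num_of_bytes_read + vfs.num_of_bytes_written) DESC;

Network throughput you’ll still need to watch from the OS side, with something like Performance Monitor.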

For SQL Server it’s better to guarantee a certain level of both network and disk performance. AWS offers what are called EBS-Optimized Instances. These EBS-Optimized Instances use a separate network to communicate with storage in AWS. Traffic on the storage network won’t cause problems for traffic on the regular network.

Use EBS-Optimized Instances for your most demanding applications.

Other applications may not require as much throughput, but you should adjust maintenance to prevent the shared network from becoming a bottleneck.

Network considerations are:

  • What is typical network throughput for the application like?
  • Do you need separate storage and networking pathways?

Storage

I’m going to repeat myself: Use EBS-Optimized Instances for your most demanding applications. You can change this after the instance is up and running but it will require a restart. Plan carefully.

For I/O-intensive workloads, AWS offers Provisioned IOPS (PIOPS). The PIOPS option lets you decide on a guaranteed number of IOPS (+/- 10%) available to the system using that disk. PIOPS disks have an increased cost associated with them – they’re 2.5 times more expensive per GB than regular storage, and you pay $0.10 / IOP / month.
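To put rough numbers on that (using the $0.10 figure above – check current AWS pricing before committing to anything): a volume provisioned at 2,000 IOPS adds 2,000 × $0.10 = $200 per month in IOPS charges alone, before you pay for a single gigabyte of the more expensive storage.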

Stripe drives with caution. In a conventional data center, it’s a best practice to use RAID to configure large volumes and to stripe data across many disks for additional performance. Although RAID striping is possible in AWS, keep in mind that any RAID volume is only as fast as the slowest disk in it. Since all permanent storage in AWS is network attached (and subject to unpredictable latency), think very carefully before setting up RAID volumes.

Storage is somewhat limited in AWS, too. Each AWS disk (EBS volume) can be up to 1TB in size. Up to either 16 or 25 disks can be attached to an instance (this depends on some driver limitations). At a maximum, 25TB of data can be stored on an AWS instance using EBS volumes. I bet you were worried for a minute, right? Anyone wanting to store more than 25TB of data will need to scale out across multiple servers.
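Not sure where you stand today? A quick sketch against sys.master_files totals up your current data and log footprint (backups aren’t included – size those separately):

SELECT type_desc,                              -- ROWS = data files, LOG = log files
       SUM(size) * 8 / 1024 / 1024 AS size_gb  -- size is stored in 8KB pages
FROM sys.master_files
GROUP BY type_desc;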

The storage considerations are:

  • How much storage do you need for data, logs, and backups?
  • How fast does that storage connectivity need to be?
  • How much are you going to access that storage in any given second?
  • Do you need to guarantee the accesses per second?
  • What is the rate of data growth?
  • What is the rate of data retention?
  • What is the total volume of data being stored?

Picking an AWS Instance Type for SQL Server

So how should you go about picking an AWS instance type for SQL Server?

Take a look at your memory requirements for SQL Server and compare them to the instance types on offer in AWS. This may exclude several instance types immediately.
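A couple of quick checks on the existing server help here – a sketch, not a full memory assessment:

-- What the current server is allowed to use
SELECT value_in_use AS max_server_memory_mb
FROM sys.configurations
WHERE name = N'max server memory (MB)';

-- What the buffer pool is actually holding right now (this can take a while on servers with lots of RAM)
SELECT COUNT(*) * 8 / 1024 AS buffer_pool_mb
FROM sys.dm_os_buffer_descriptors;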

After answering the storage questions, determine if you need provisioned IOPS, dedicated storage connectivity, or even 10Gb networking. This further narrows down the choices you have for instances.

Once you’ve reached this point, you should have several choices of instance size available.

Put It All Together:

  • Network
    • What is typical network throughput for the application like?
    • Do you need separate storage and networking pathways?
    • Are features in place that require additional network throughput?
      (This might be AlwaysOn Availability Groups, database mirroring,
      or log shipping.)
  • Storage
    • How much storage do you need for data, logs, and backups?
    • How fast does that storage connectivity need to be?
    • How much are you going to access that storage in any given second?
    • Do you need to guarantee the accesses per second (IOPS)?
    • What is the total volume of data being stored?
  • Memory
    • What are the memory requirements of the existing SQL Server?
    • What is the use case for this SQL Server? OLTP? DSS?
    • What other services will run on this system?
    • Do you need Enterprise Edition?
  • CPU
    • What is the current CPU count?
    • What is the current CPU speed?
    • Do you require any Enterprise Edition features?
  • Support
    • What is the support agreement for this SQL Server?
    • What’s the support level required for this application?
    • Does the application require regular manual tuning?
    • What’s the support SLA?
    • Is there an incident response plan?
      • Has anyone ever reviewed the incident response plan?
      • Have you ever had to follow it? That is: does it actually work?
  • Instance Size: If you still have choices left to make between one of several instance sizes, ask a few more questions:
    • What is our current rate of activity growth? (e.g. as data volume doubles, will user activity double?)
    • How much will this instance cost over 1 year? 3 years?
    • Is a reserved instance a possibility (taking into account your growth rate)?

When you put it all together, sizing a SQL Server in AWS is a lot like sizing an on-premises SQL Server, just with a few extra constraints around sizing and capacity.

Bonus questions:

  • Performance Profile
    • Is there a performance SLA?
    • What are the top wait types of an existing SQL Server?
    • What aspects of the application can be changed?
  • Doing Less Work

Ignorable AWS Instance Types

Instance types we can completely ignore:

  • G2 GPU – This is for people who do math on graphics cards.
  • C3 Compute Optimized – Instances with a lot of CPU, some local SSD scratch space, and not a lot of memory. Great for HPC, not great for SQL Server.
  • HS1 High Storage Density – These monsters can have up to 24 2TB local drives delivering 2.6 gigabytes per second of disk throughput. At first glance, they look like a dream. However, the data stored on the local disks will be erased if you stop and then start your AWS instance. You could try to use AlwaysOn Availability Groups, but a total AWS Availability Zone or Regional outage could lead to massive data loss. These are primarily intended for clustered computing like Hadoop or distributed data warehouses (GreenPlum).

Capturing Deadlocks in SQL Server

What’s a deadlock? Well, let’s say there’s a fight going on between Wonder Woman and Cheetah, and, in the same room, a fight between Batman and Mr. Freeze. Wonder Woman decides to help Batman by also attempting to throw her lasso around Mr. Freeze; Batman tries to help Wonder Woman by unleashing a rope from the grappling gun at Cheetah. The problem is that Wonder Woman already has a lock on her opponent, and Batman has his. This would be a superhero (and super) deadlock.

When a deadlock occurs in SQL Server, two or more tasks are running and holding locks on data. Then, each task requests to lock the data the other task is already holding. Both tasks wait for the other to give up. Neither does. SQL Server could let this showdown continue indefinitely, but it won’t. It picks one task – usually, whichever will be the least expensive to roll back – as the victim, and that task is killed.
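In T-SQL terms, the classic recipe is two sessions taking locks in opposite order – a contrived sketch with made-up table names:

/* Session 1 */
BEGIN TRAN;
UPDATE dbo.Heroes   SET status = 'busy'   WHERE id = 1;  -- locks Heroes
-- ...meanwhile, Session 2 runs its first UPDATE...
UPDATE dbo.Villains SET status = 'caught' WHERE id = 2;  -- now waits on Session 2

/* Session 2 */
BEGIN TRAN;
UPDATE dbo.Villains SET status = 'caught' WHERE id = 2;  -- locks Villains
UPDATE dbo.Heroes   SET status = 'busy'   WHERE id = 1;  -- now waits on Session 1: deadlock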

How do I know if it’s happening to me?

You might notice slowness in your application. Users might complain about slowness or error messages. And, you’ll see a message in the SQL Server Log that says “Transaction (Process ID 103) was deadlocked on resources with another process and has been chosen as the deadlock victim.”

Your server has encountered a deadlock.

How can I capture more information about it?

You have several options: you can enable a trace flag to write more information to the log, you can capture deadlock graphs using Profiler or Extended Events, and you can track the number of deadlocks occurring using Performance Monitor.

Use trace flags to write to the log

Two trace flags can be enabled to capture more information in the log: 1204 and 1222. Trace flag 1204 lists the information by node; trace flag 1222 lists it by process and resource. You can enable both simultaneously. To turn them on server-wide – the -1 enables them globally rather than just for your current session – use the command

DBCC TRACEON (1204, 1222, -1);
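If you want to confirm the flags actually took effect globally, check with:

DBCC TRACESTATUS (1204, 1222, -1);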

Here’s a small sample of what would appear in the log:

tf 1222

No one wants to read through that, manually parsing it to figure out what database it occurred in, what tasks were involved, and which was the victim. Wouldn’t it be great if this was represented graphically?

Capture a deadlock graph with Profiler or Extended Events

It is – you just have to know where to look for a deadlock graph. In SQL Server 2005, 2008, and 2008 R2, I prefer to use a Profiler trace; in SQL Server 2012 and 2014 I rely on Extended Events.

Profiler

When you set up a Profiler session to gather deadlock information, select the “Deadlock graph”, “Lock:Deadlock” and “Lock:Deadlock Chain” events. Make sure you go to the Events Extraction Settings tab and select the option “Save Deadlock XML events separately”. This ensures that the .xdl files – the graphs – will be saved as separate files. When you start the session, if a deadlock occurs, you will see “Deadlock graph” captured as an event.

deadlock graph

The task with the blue X on it is the victim. By holding your mouse over the process oval, you can see what statement was being executed by that task. The Key Lock rectangles will help you find the object and index that the locking and blocking occurred on.

Extended Events

In the future, Profiler will be removed from SQL Server. Extended Events (EE) is taking its place. You have two options for getting deadlock information using EE. First, there is the system_health session that, by default, runs continuously. You can mine this for deadlocks that have happened in the past. You can also set up your own session to capture only deadlock information (and any other relevant events).

If using the system_health session, filter on “xml_deadlock_report”. If setting up your own session, capture the same event. If you are using the GUI available in 2012 and 2014, you can open the file and view the deadlock graph on the “Deadlock” tab.

system_health
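If you’d rather not rely on system_health, a dedicated session only takes a few lines. A minimal sketch – the session and file names are just placeholders:

CREATE EVENT SESSION capture_deadlocks ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (SET filename = N'capture_deadlocks.xel')
WITH (STARTUP_STATE = ON);

ALTER EVENT SESSION capture_deadlocks ON SERVER STATE = START;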

If parsing the results using T-SQL and XQuery, you would use the .query() method to extract the deadlock XML, like so:

SELECT DeadlockEventXML.query('(event/data[@name="xml_report"]/value/deadlock)[1]') AS deadlock_graph

This generates the XML, which you would then save as an .xdl, then close and re-open with SSMS to view the graph.
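If you’d like something you can run end-to-end against the system_health ring buffer, it looks roughly like this – the derived-table aliases are mine:

SELECT DeadlockEventXML.query('(event/data[@name="xml_report"]/value/deadlock)[1]') AS deadlock_graph
FROM (
    SELECT XEvent.query('.') AS DeadlockEventXML
    FROM (
        SELECT CAST(st.target_data AS XML) AS TargetData
        FROM sys.dm_xe_sessions AS s
        JOIN sys.dm_xe_session_targets AS st
            ON s.address = st.event_session_address
        WHERE s.name = N'system_health'
          AND st.target_name = N'ring_buffer'
    ) AS ring
    CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS x(XEvent)
) AS deadlocks;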

Track Performance Counters

Performance Monitor (PerfMon) is valuable for tracking many statistics. One of the metrics you can track for SQL Server is SQLServer:Locks – Number of Deadlocks/sec. This is the least informational of the methods mentioned here – it only gives you a count of how many are happening per second in your system. It can be helpful, however, to track this to see what times of the day produce the most deadlocks. You can then use the other methods mentioned here to research what is happening during that time to cause the deadlocks.
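You don’t even have to leave SSMS to grab that same counter – it’s exposed through a DMV, and despite the “/sec” name, the value is cumulative since the instance started:

SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Number of Deadlocks/sec'
  AND instance_name = N'_Total';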

Don’t be a victim!

Take charge of your SQL Servers! Become familiar with detecting deadlocks, finding out what caused them, and fixing them! For more tools and advice, visit Locking and Blocking in SQL Server.
