Blog

Our #SQLPASS Schedule (And a Downloadable Calendar)

Want to see our sessions in Seattle? Add all of our calendar events to your schedule:

Or if you’d rather add individual events:

Kendra Little – World’s Worst Performance Tuning Techniques
Wednesday 1:30PM – 2:45PM, Room 6A

Brent Ozar – CXPACKET and MAXDOP Lightning Talk
Wednesday 3PM-4:15PM, Room 606

Kendra Little – Why Does SQL Server Keep Asking for this Index?
Thursday 3PM-4:15PM, Room 6C

Brent Ozar – Developers: Who Needs a DBA?
Thursday 5PM-6:15PM, Room 6C

Doug Lane – From Minutes to Milliseconds: High-Performance SSRS Tuning
Friday 8AM-9:15AM, Room 6A

Jeremiah Peschka – Dynamic SQL: Build Fast, Flexible Queries
Friday 8AM-9:15AM, Room 6B

Jes Schultz Borland – Are Your Indexes Hurting or Helping You?
Friday 10:15AM-11:30AM, Room 618-620

The Very First Concern with SQL Virtualization: Licensing

Every single thing you do with a virtual SQL Server starts with one very important concern: how are you going to license it?

The 3 Common Ways to License Virtual SQL Servers

The software licensing section of my local library

Developer Edition, by the guest – Developer Edition has all the power of Enterprise Edition, but it’s way cheaper (like in the $50/server range) and you’re not allowed to use it for production purposes. Because the licensing is so cheap, just mix these VMs in with the rest of your guests on a shared pool of hosts. Developer Edition licensing is handled differently than other SQL Server editions, typically bundled with your developers’ MSDN subscriptions.

Enterprise Edition, by the host – When you have 5 or more guests, it usually makes sense to buy a dedicated pair of host servers in an active/passive configuration. For example:

  • Dell R720 Primary – 2 sockets, 4 cores each, 768GB memory – ~$25k USD
  • Dell R720 Secondary – same hardware config – ~$25k USD
  • 8 physical cores of SQL Server Enterprise Edition licensing and Software Assurance – ~$55k USD (you only have to pay licensing on the physical cores regardless of whether you enable hyperthreading)
  • Hypervisor, Windows, management software licensing, shared storage – left as an exercise for the reader

You can configure your hypervisor (VMware, Hyper-V, Xen) to use one of the hosts as the primary, and when it fails, automatically boot up the guests on the secondary R720. This way, you only have to license one of the two hosts, and you can run an unlimited* number of SQL Servers on this host. (* – Not really.) Even better, even though they may be small VMs, you can still use Enterprise Edition on these guests and get access to cool EE-only features like online index rebuilds.

As your SQL Server needs grow, you can license the Secondary and balance the VM load across both, and even start to add additional licensed hosts. The key, though, is to avoid running any non-SQL-Server loads here, because these are your most expensive host cores.

Standard Edition, by the guest – When you have 4 or fewer SQL Server instances, and they don’t need enough hardware horsepower to merit a dedicated host pool, then you’re usually best off licensing each guest with Standard Edition. This way, you can intermingle them on the same hardware that the rest of your virtual guests use.

At around $2k USD per core, it’s roughly 1/4 the price of Enterprise Edition, but you have to keep an eye on the number of virtual cores you’re using. You may read training material that says “just add vCPUs when you have performance pressure,” but this is expensive advice when you’re licensing each guest by the core. As you approach 16 vCPUs of Standard Edition licensing, you need to build a plan to migrate over to the host-based Enterprise Edition licensing approach. 28 vCPUs of Standard costs about the same as 8 vCPUs of Enterprise, but only the latter lets you run an unlimited number of guests, use over 64GB of memory on SQL 2012 and prior, do online index rebuilds, etc.
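
For a rough sense of where that break-even sits (using the ~$2k/core Standard figure above and the ~$55k for 8 Enterprise cores from the host example earlier):

28 vCPUs of Standard Edition x ~$2k/core              = ~$56k
 8 cores of Enterprise Edition + Software Assurance   = ~$55k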

How Licensing Affects Your Administration

If you need SSRS, SSIS, or SSAS, and you’re licensing hosts with Enterprise Edition, you simply deploy these other services on separate virtual servers. You never make these other services fight with the core SQL Server engine for the same memory. However, if you’re on Standard, you save money on licensing but spend extra time troubleshooting performance: to keep costs down, you’ll need to install these services on the same VM as the engine, and you’ll have to understand how to triage performance when they compete with the engine for memory.

When you design a high availability solution, the edition determines your choices as well. If you’re on Standard, and you need automatic failover for OS & SQL problems, you’ve backed yourself into a corner of either failover clustered instances or synchronous database mirroring. These solutions add complexity that shops without a DBA often can’t afford, so they end up going without OS/SQL protection and sticking with the simpler hypervisor-based failovers. On the other hand, once you’ve decided to license the hosts with Enterprise Edition, you can take advantage of AlwaysOn Availability Groups and asynchronous database mirroring without extra licensing costs – even for your smallest VMs.
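
To make that edition split concrete, here’s a rough sketch of the database mirroring side of the decision (the database and server names are made up, and this assumes the mirroring endpoints are already configured):

-- run on the principal once the mirror partner is established (hypothetical names)
ALTER DATABASE SalesApp SET PARTNER = 'TCP://mirror-server.yourdomain.com:5022';
ALTER DATABASE SalesApp SET PARTNER SAFETY FULL;  -- synchronous mirroring: available on Standard Edition
-- ALTER DATABASE SalesApp SET PARTNER SAFETY OFF; -- asynchronous mirroring: requires Enterprise Edition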

Knowing your licensing means faster decision-making on architecture, and that’s why you want to start your virtualization projects here. Licensing is usually the most expensive and most restrictive part of the project – get it right first.

To learn more about licensing SQL Server, check out the 2014 Licensing Guide PDF.

We’re Changing SQL Server Training Classes.

Until now, SQL Server training usually meant registering for a course, showing up, hearing a lecture from a podium (often from someone who’s only done demos on their laptop, not production systems), working through a few workshops, and then going home with a bunch of ideas. It was like trying to ride a rollercoaster by jumping on as it came speeding toward the platform, having a great time, and then jumping off at high speed again. No graceful transitions, just a shocking experience.

We wanted to do better.

We wanted to ease students into the experience, get them more involved with the material ahead of time, connect the dots between the demos and their real environment, and empower them to help each other with their studies. We wanted to mix up the training so that it was less about theory, and more about their real world environment. Finally, as students left the classroom, we wanted to ease them back into the real world in a way that empowered them to succeed.

We couldn’t accomplish all of those goals immediately – doing this kind of practical, realistic, and comprehensive training is really hard work, so we’ve been gradually experimenting with more and more tactics this year. Here’s what we’ve come up with so far.

What It’s Like to Attend Our Training

Kendra Teaching

We start mixing things up as soon as someone registers for one of our courses. For example, our How to Be a Senior DBA students were assigned a series of private pre-requisite training videos as soon as they registered for the course. They get started learning about RPO, RTO, and SQL Server’s options to meet their availability goals. (Behind the scenes, it took us years of work to get our e-commerce setup and training videos to this point, but the integration is definitely paying off now.)

On the first day of the in-person class, we started by discussing a couple of real-life case studies – then turned their own environment into a real case study. Each attendee used worksheets to describe their current production environment’s requirements to the person sitting next to them, and that person acted as a consultant. The team worked together to sketch out the right long-term solution for a company’s needs, and then they switched roles. Each student was able to sit on both sides of the issue – describing a problem, and building a solution for someone else.

In just one hour, this combination of pre-learning plus collaborative exercises accomplished three goals: it reinforced the student’s knowledge, helped them get a second opinion about their systems, and built a relationship with other people in the room. The relationship-building paid off throughout the course as the students carried on lively discussions at every break, over every lunch, and hanging around the flipcharts to talk shop. I wish I could say this was part of our giant master plan, but it was just a happy discovery that we made back in San Diego, and we’ve been using it ever since.

Another discovery this year was that students learned better when we assigned them script homework before the course. For example, before the SQL Server Performance Troubleshooting class, students were assigned scripts to run on their production servers and copy the results into Excel.

In Chicago, one of the students came up to me and said:

“I’m so glad you gave us the wait stats homework before class. I’d run the queries on several of my slow production servers, and I didn’t understand the results at the time, but now it all makes perfect sense. I understand why the data warehouse has PAGEIOLATCH and the web app has CXPACKET, and I know exactly what to do next.”

As we’d been covering wait stats, she’d been comparing her results with what we were discussing, and she was able to make a simple set of steps for her next maintenance window – all while sitting in class. Some of our students even used our lab scripts on their live production servers, fixing issues as we discussed them. (That might be taking the concept a little too far!)

Our Next Round of Training Experiments: Seattle

In our 2-day Seattle class, Make SQL Server Apps Faster, we’re running a couple more experiments.

Jeremiah Teaching

We’re starting off with a day of lectures on performance tuning. We cover how to think like SQL Server, how to use wait stats to find out why SQL Server is slow, tuning with the plan cache, beating anti-patterns, and building the best indexes – and we close with a challenge for attendees.

We weren’t content with that. People keep asking for more hands-on sessions like “Watch Brent Tune Queries” – you really want to see our thought processes at work. So that’s exactly what we’ve built: attendees think through the problem and sketch out their answers, then spend the day watching us solve it – me doing query tuning, Kendra doing index tuning, and Jeremiah working through execution plans. At the end of the day, we judge their answers together.

We’re also offering a package deal – some attendees have said they want to watch recordings of the sessions later to reinforce what they’ve learned. (They already get the demo scripts, but sometimes it’s better to watch us doing it.) For an extra $200, the attendees get our 7-hour Developer’s Guide to SQL Server Performance videos that cover about half of what we’re teaching in-person, and we’ve extended similar deals to our 2014 training class attendees too.

For 2015, our new week-long courses represent another new experiment. This year’s attendees kept telling us, “As long as I’m out of the office, I might as well be out for the whole week.” Sounds good! So we’re taking more time to dive deeper into more topics.

We’re excited about how we’re doing SQL Server training differently than anybody else on the market, and we can’t wait to share it with you.

Great new Microsoft KB article: Recommended SQL Server Updates and Trace Flags

Got a high performance SQL Server 2012 instance? Check out the brand new KB article for recommended updates and configuration options for SQL Server 2012 and SQL Server 2014 used with high-performance workloads.

Make sure you expand the following plus sign – this is where all the good stuff is stored:

Expand this.

After expanding it, you’ll get a huge list of categorized advice for trace flags, cumulative updates, MAXDOP settings, and much more:

Detailed performance advice

I haven’t read through the details on this yet, but I’d note that this advice is focused on high performance workloads. Don’t go enabling trace flags without understanding what they do and their side effects.
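
If you want to see which trace flags are already enabled on an instance before you start changing anything, a quick check looks something like this:

DBCC TRACESTATUS(-1);  -- lists the trace flags currently enabled globally on this instance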

I continue to be impressed by the documentation Microsoft is putting out. Books Online keeps getting better and better, and this is the coolest KB article I’ve seen in a long time. Way to go, Microsoft!

How to Tell If You’re Good At SQL Server Transactional Replication (Quiz)

Let’s play the quiz! We’re using the honor system: be honest with yourself.

Five Simple Questions

  1. When I say the first name “Hilary”, you immediately know his last name is ____.
  2. Have you successfully set up replication so that you can add and remove articles without running a snapshot for everything – and have you actually done this before? (Yes or No) _____
  3. Do you have monitoring set up that notifies you immediately if a subscriber is behind? (Yes or No) ____
  4. How many books on SQL Server replication do you own (which you have read)? _____
  5. Do you know exactly how long it would take to re-initialize all your publications? (Yes or No) ____

This is totally stable!

Score Your Answers

  1. One point if you thought of this man.
  2. One point for Yes
  3. One point for Yes
  4. One point if the answer is 1 or higher.
  5. One point if your answer is “I know exactly how long, and I wish I didn’t know.”

Here’s how to interpret your score:

  • 1-3 points: You’re just beginning, but you cared enough to take the quiz. You can get there.
  • 4 points: You’re good at replication, AND you’re honest ;)
  • 5 points: You’re an expert

Learn More

Ready to learn more about replication? We’ve written a lot about it.

I’m Speaking at kCura Relativity Fest 2014 about SQL Server

I’m very proud to announce that kCura selected me to speak again this year at Relativity Fest, the annual gathering of e-discovery professionals. I’ve been doing a lot of Relativity work (here’s a recap of my blog posts about it) and I really like working with the company and the product. Heck, I even attended end-user training recently because I wanted to be able to better speak the language of the lawyers.

Here are my sessions this year:

Building a Reliable SQL Server for Relativity

Your users want Relativity to be up at all times. What’s the first step? How much work is involved? What will it all cost? Microsoft Certified Master Brent Ozar will give you a simple worksheet to get management and the IT team on the same page, and then show you how to turn those specs into a rough project plan and budget. You’ll learn how to choose between virtual and physical servers, explain different high-availability methods, and understand why clustering is such a no-brainer choice.

Why is My SQL Server Slow?

You’re a DBA who’s struggled with Perfmon metrics and Profiler. You’re facing a sea of confusing numbers, and you don’t know where to focus first. Microsoft Certified Master Brent Ozar will give you a friendly introduction to wait statistics—the dramatically easier way to find your SQL Server’s bottleneck. He’ll show you how to collect the right data with free scripts, then interpret confusing results like CXPACKET and PAGELATCH_EX.

Sound good? Join me!

I’ll also be hanging out during the entire Fest, and it’s your chance to pick my brain at length for free. But even if you don’t like me (and who could blame you), if you work with Relativity, you should attend Fest. It’s a great way to meet hundreds of people who share your job and your challenges. Head on over and register for Fest – they usually do sell out early.

Should I Run sp_recompile After I Create An Index?

Making index changes in SQL Server is tricky: you immediately want to know if the new index helped your performance and whether it’s being used, but SQL Server execution plans and their related statistics can be insanely confusing.

It can be useful to run sp_recompile after you create an index, but not necessarily for the reason you might think.

It’s easier to show this than just write it out.

Let’s Fix a Slow Query By Creating an Index

Let’s say the biggest, baddest query on our SQL Server is from a stored procedure called dbo.kl_RecentlyCreatedAnswers. Our free tool, sp_BlitzCache, calls this procedure out for being our #1 CPU user:

exec sp_BlitzCache @top=5, @results='narrow';
GO

sp_BlitzCache results before creating the index

I can fix that. I design and test this index, and I deploy it to production. It’s totally going to fix this bad execution plan!

CREATE NONCLUSTERED INDEX itotallyfixedit
ON [dbo].[Posts] ([CreationDate],[PostTypeId],[Score])
INCLUDE ([Id])
GO

I’m excited to see how awesome things are, so I immediately run sp_BlitzCache again. But here’s what I see:

sp_BlitzCache results immediately after creating the index

Wait a second. But I just… It hasn’t even changed… What the heck, SQL Server?

Why Is My Terrible Plan Still In Cache?

Creating the index doesn’t cause SQL Server to find related plans in the cache right away and flush them out. My execution plans will hang out in the cache until one of these things happens:

  • The query runs again. SQL Server sees that the schema on the table has changed, decides it needs to reconsider what to do, and recompiles the execution plan.*
  • The old plan gets “aged” out of cache if it isn’t used again (maybe pretty fast if there’s a bunch of memory pressure)
  • The plan is cleared by DBCC FREEPROCCACHE, taking the database offline, a SQL Server restart, or a settings change that impacts the plan cache

*The fine print: Books Online has a list of causes of recompilation here – note that creating an index on the table isn’t explicitly listed among them. However, the amazing Nacho Portillo recently blogged on this after looking at the source code, and he indicates that creating an index does flip a ‘schema changed’ bit that should reliably trigger a recompile. He also mentions that there’s really no way to query all the plans that are still in the cache but are basically ‘invalidated’ due to the metadata change. Sorry, rocket scientists.
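
While you can’t see which plans have been invalidated, you can at least check what’s still sitting in cache for a given procedure. Here’s a rough sketch using the procedure from this example:

SELECT ps.cached_time, ps.last_execution_time, ps.execution_count
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID('dbo.kl_RecentlyCreatedAnswers');
-- cached_time shows when the current plan was compiled;
-- execution_count keeps accumulating until the plan leaves the cache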

But My Plan Is STILL In Cache. Sort Of. Remember When I Said This Was Confusing?

Once the query runs again, I see something different. It did automatically decide to use my new index!

sp_BlitzCache results after the query ran again

Wait a second. Something’s weird. Compare the average executions and CPU for the stored procedure (line 1) and the statement in it (line 2). They don’t seem to match up, do they?

Here’s what happened: the stored procedure ran again. The statement detected the schema change and recompiled. But the *whole* stored procedure didn’t recompile, and it’s showing me stats for 13 executions (not just the 10 since the index change). So my old performance metrics are all mixed up with my new performance metrics. I’m not loving that.

sp_recompile Can Help

Confusing, right? Because of this issue, you might want to run sp_recompile against the stored procedure after making an index change, even if SQL Server has already decided to use the new index. This forces the whole procedure to get a fresh plan and start collecting fresh execution statistics the next time it runs.

You could also take a heavier hand and run sp_recompile against the whole table, but do that with care: it requires schema-level locks and can cause long blocking chains if lots of queries are reading and writing from that table.
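
For this example, those two options would look something like this:

EXEC sp_recompile N'dbo.kl_RecentlyCreatedAnswers'; -- marks just this procedure for a fresh plan on its next run
-- EXEC sp_recompile N'dbo.Posts';                  -- heavier: marks everything that references the table (needs a schema lock)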

Remember: even with sp_recompile, the execution plan stays in cache until it runs again (or is evicted for other reasons). The benefit is just that it will give you a “fresher” view of the execution stats for the whole stored procedure.

Fact: It’s a Little Messy

The main thing to know here is that creating indexes won’t drop or flush plans out, so don’t be surprised if you see old plans in execution plan analysis after you make index changes. This isn’t a completely tidy process; sometimes things are a little bit messy.

If you’re actively looking at execution plans in your cache, then running sp_recompile after you create an index can help ensure you’re looking at consistent data. But use it with care and monitor for blocking – don’t leave it unattended.

This example used a downloaded copy of the StackOverflow database. Learn how to get your own here.

Developers: You Don’t Need a DBA. I’ll Prove It.

That’s right, you heard me, and I’ll prove it on a really big stage. I’m honored to announce that I’ll be the first speaker at this year’s 24 Hours of PASS: Summit Preview Edition.

On Tuesday, September 9th, join me for my newest session: “Developers: Who Needs a DBA?” Here’s the abstract:

You store data in SQL Server, but you don’t have enough work to keep a full-time DBA busy.

In just one hour, you’ll learn the basics of performance troubleshooting and index tuning. I’m a recovering developer, and I’ll teach you the basic care and feeding of a Microsoft SQL Server 2005, 2008, 2012, or 2014 instance, plus give you scripts to keep you out of trouble.

Register now for free. While you’re there, check the boxes for some of the other 24 back-to-back SQL Server training sessions from community volunteers who’d love to share their knowledge with you. See you there!

Generating Identities

The only thing you ever need to use for database identity is an IDENTITY, right? Well, maybe. There are a lot of different options and they all have different pros and cons.

IDENTITY columns

The default way to identify objects in SQL Server is to use an INT or BIGINT column marked as an IDENTITY. This guarantees relatively sequential numbers, barring restarts and failed inserts. Using identity columns puts the responsibility for creating and maintaining object identity in the database.

SQL Server caches IDENTITY values and generates a new batch whenever it runs out. Because the values are cached in memory, you can see jumps in the sequence after SQL Server is restarted. That same in-memory caching is also what makes it possible to insert data rapidly – as long as the disks are fast enough.
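
Here’s a minimal sketch of letting the database own identity (the table name is invented for illustration):

CREATE TABLE dbo.Orders (
    OrderId      BIGINT IDENTITY(1,1) PRIMARY KEY, -- SQL Server generates the value at insert time
    CustomerName NVARCHAR(100) NOT NULL
);

INSERT INTO dbo.Orders (CustomerName) VALUES (N'Contoso');
SELECT SCOPE_IDENTITY(); -- the identity value generated by the insert above, in this scope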

Sequences

Sometimes the application needs more control over identity. SQL Server 2012 added sequences. A sequence, unlike an identity value, is a separate object in the database. Both application and database code can read from the sequence – multiple tables can share a sequence for an identity column or separate sequences can be created for each table.

Developers using a sequence can use the CACHE option to keep a specific number of sequence values in memory. Or, if the application needs minimal gaps in the sequence, the NO CACHE clause should be used.
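
A quick sketch of what that looks like (the sequence name is made up):

CREATE SEQUENCE dbo.OrderNumbers AS BIGINT
    START WITH 1
    INCREMENT BY 1
    CACHE 50;  -- hand out values from an in-memory block of 50; use NO CACHE to minimize gaps

-- application code, database code, or multiple tables can all draw from the same sequence
SELECT NEXT VALUE FOR dbo.OrderNumbers;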

The Problem with Sequential Identities

Both IDENTITY and SEQUENCE values keep identity generation squarely in the database and, by using integral values, they keep the value narrow.

You can run into problems with sequential inserts on very busy systems – they can lead to latch contention on the trailing pages of the clustered index. This issue can be resolved by spreading inserts across the table using a GUID or some other semi-random clustering key. Admittedly, most systems are never going to run into this problem.
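
Here’s a sketch of the semi-random clustering key approach (hypothetical table; note that random keys trade the hot-page problem for index fragmentation):

CREATE TABLE dbo.Events (
    EventId   UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Events_EventId DEFAULT NEWID(), -- random values spread inserts across the index
    EventData NVARCHAR(MAX) NULL,
    CONSTRAINT PK_Events PRIMARY KEY CLUSTERED (EventId)
);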

GUIDs for Object Identity

Some developers use GUIDs as a way of managing object identity. Although database administrators balk at this, there are good reasons to use GUIDs for object identity.

GUIDs let the application generate object identity. By moving object identity out to the application layer, users can do work in memory and avoid multiple round trips to the database until they’re ready to save the entire set of data. This technique gives tremendous flexibility to application developers and users.

There’s one other thing that a well-designed application gets from this technique – independence from the database. An application that generates its own identity values doesn’t need the database to be online 24/7; as long as some other system is available to accept writes in lieu of the database, the application can still function.

Using GUIDs for object identity does have some issues. For starters, GUIDs are much wider than other integral data types – 16 bytes vs 4 bytes (INT) or 8 bytes (BIGINT). This is a non-issue for a single row or even for a small database, but at significant scale this can add a lot of data to the database. The other issue is that many techniques for generating sequential GUIDs in the application (see NHibernate’s GuidCombGenerator) can still run into GUID collisions.

Integral Generators

What if you could get the best of both worlds? Applications generating unique identities that are also sequential?

The point of identity generation is to abstract away some portion of identity from data attributes and provide an independent surrogate value. GUIDs can provide this, but they aren’t the perfect solution. Identity generators like flake or rustflakes promise roughly sequential identity values that are generated in the application layer and are unique across multiple processes or servers.

The problem with an external identity generator is that it’s an extra piece of code that developers need to manage. External dependencies carry some risk, but these are relatively safe items that require very little effort to implement and maintain.

The Solution

There’s no right solution, there’s only a solution that works for you. You may even use each solution at different points in the lifecycle of the same product. It’s important, though, for developers and DBAs to be aware of how identity is currently being handled, the issues that can arise from the current solution, and ideas of how to handle it going forward.

Why Archive Data?

The data story so far

Meet Margot. Margot is an application developer who works for a small company. Margot’s application collects and generates a lot of data from users, including their interactions with the site, emails and texts that they send, and user-submitted forms. Data is never deleted from the database, but only a few administrative users need to query historical data.

The database has grown considerably because of this historical data – the production database is around 90GB but only 12GB or so is actively queried. The remaining data is a record of user activity, emails, text messages, and previous versions of user data.

Margot is faced with an important decision – how should she deal with this increase in data? Data can’t be deleted, there isn’t budget to upgrade to SQL Server Enterprise Edition and use table partitioning, and there’s a push to move to a cloud service to eliminate some operational difficulties.

Using Partitioned Views to Archive Data

A Partitioned View

One option that Margot has read about is “partitioned views” – a method where data is split into two or more tables with a view over the top. The view provides easy access to all of the information in the database, while splitting the data across multiple tables lets DBAs store it in different ways – e.g. compressing the older tables, or placing them on separate filegroups and tiered storage.
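
A minimal sketch of the idea (table, column, and date values are invented; the CHECK constraints are what tell SQL Server which underlying table holds which rows):

CREATE TABLE dbo.UserActivityCurrent (
    ActivityId   BIGINT NOT NULL PRIMARY KEY,
    ActivityDate DATE   NOT NULL CHECK (ActivityDate >= '2014-01-01')
);
CREATE TABLE dbo.UserActivityArchive (
    ActivityId   BIGINT NOT NULL PRIMARY KEY,
    ActivityDate DATE   NOT NULL CHECK (ActivityDate < '2014-01-01')
);
GO
CREATE VIEW dbo.UserActivity
AS
SELECT ActivityId, ActivityDate FROM dbo.UserActivityCurrent
UNION ALL
SELECT ActivityId, ActivityDate FROM dbo.UserActivityArchive;
GO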

There’s a downside to this approach – all of the data is still in one database. Any HA solutions applied to the live portion of the data set will have to be applied to the entire data set. This could lead to a significant cost increase in a hosted/cloud scenario.

Archiving Data with a Historical Database

Archive this!

The second option that sprang to Margot’s mind was creating a separate archival database. Old data is copied into the archival database by scheduled jobs. When users need to run historical reports, the queries hit the archival database. When users need to run current reports, queries are directed to the current application database.
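
A simplified sketch of one of those scheduled job steps (the database names, table, and 90-day retention window are all invented; a real job would wrap this in a transaction and batch the deletes):

-- copy rows older than 90 days into the archive database, then remove them from production
INSERT INTO ArchiveDB.dbo.UserActivity (ActivityId, UserId, ActivityDate, Details)
SELECT ActivityId, UserId, ActivityDate, Details
FROM   ProductionDB.dbo.UserActivity
WHERE  ActivityDate < DATEADD(DAY, -90, GETDATE());

DELETE FROM ProductionDB.dbo.UserActivity
WHERE  ActivityDate < DATEADD(DAY, -90, GETDATE());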

Margot immediately noticed one problem – what happens when a user needs to query a combination of historical and current data? She’s not sure if the users are willing to accept limited reporting functionality.

Archiving Data with Separate Data Stores

One active database. One archival octopus.

A third option that Margot considered was creating a separate database for the data that needed to be kept forever. Current data would be written to both the live database and the historical database. Any data that didn’t ever need to be in the current database (email or SMS history) would only be written to the historical database.

Although this made some aspects of querying more complex (for example, how could row-level security from the primary database be applied to the historical database?), Margot is confident that this solves the majority of the problems they were facing.

This solution would require application changes to make querying work, but Margot and her team thought it was the most flexible solution for their current efforts: both databases can be managed and tuned separately, plus the primary database remains small.

Other Ideas?

Not every database needs to scale in the same way. What ideas do you have to solve this problem?
