Blog

Extended Events were introduced with SQL Server 2008. Extended Events is a replacement for Server Side Traces and so much more – it’s a lightweight way to look at SQL Server performance, events, deadlocks, locking, blocking, and more. In this webcast, Jeremiah Peschka provides a high-level introduction to Extended Events. You’ll learn about how Extended Events are structured, how to find out what’s available, and where to get started.

Find out more at our Extended Events resource page.

“If You Can’t Do ____, You’re Not a Production DBA”

Last Updated February 9, 2017

Professional Development

Recently I saw a tweet that said, “If you can’t read a query execution plan, you’re not a production DBA.”

I love execution plans, and there are really great reasons to learn to read execution plans, but I disagree completely.

Database Administration is not all about performance

Performance is only one component to database administration. Let’s take Jill as an example.

Jill is a SQL Server DBA who focuses on High Availability and Disaster Recovery for a large company. She works to keep her knowledge and skills sharp on backups and restores, Windows Failover Clustering, and High Availability Groups. She designs mission critical systems to meet high uptime requirements. She also designs and runs a lab where she proactively causes outages and troubleshoots them, corrupts databases and repairs them. She uses these experiences to build processes for front-line engineers to respond to production incidents and trains them to keep systems online as much as possible. Jill is excited about learning more about virtualization and SAN replication in the next year and has created an aggressive learning plan.

Jill’s pretty awesome, isn’t she? Is Jill magically not a Production DBA if she looks at an execution plan and doesn’t know what it means?

If you’re just getting into database administration, we don’t recommend that you start with performance as your primary focus. We recommend that you start at the base of the database Hierarchy of Needs. As you work your way up, decide where to specialize.

Don’t de-legitimize DBAs who are Different than You Are. Talk to Them!

It’s great to love what you do. Just remember that database administration is a really broad field, and the way you DBA isn’t necessarily the way everyone has to DBA. (Sure, DBAing is a verb. Why not?)

Let’s not fall into the trap of thinking “I’ve got to have this one skill so I can get into the cool club.” Instead, let’s get excited about the fact that there’s so much to learn that we all have to make tough choices. Let’s seek out the people who choose differently than us, and learn from them, too.

Developer’s Guide to Understanding DBAs [Video]

Last Updated April 18, 2014

Doug Lane

You’re a developer, and you want to know:

What do DBAs value?
What do DBAs need from me?
What does my DBA want to help me with?
How big should that server be?
Is the DBA judging my code?
What happens if I get in trouble with my DBA?

Learn the answers in this 30-minute video:

Liked this? Register for our free upcoming webcasts, and check out our video archive of past webcasts.

Happy Third Birthday to Us, Presents for You

Last Updated February 12, 2017

Company News, SQL Server

Three years ago this month, we turned this blog into a consulting company. This year, we’re … well, we’re still not adults.

It’s been an incredible three years. Sure, it feels just “normal” now, but when we look at the numbers, suddenly it feels like a big deal compared to where we were in 2011:

Over 250 SQL Critical Care® patients
1,322 training video customers
Thousands of in-person training attendees at dozens of classes and pre-cons
190 YouTube videos watched by 635,471 people
23,782 email subscribers
3,333,741 unique web site viewers
2 awesome full time employees
Microsoft Silver Partner
3 company retreats

Let’s celebrate. On Tuesday, April 29th, for exactly one day only, discount code Happy3rd gets you 33% off our training videos and in-person classes. Enjoy!

Advice to an IT newcomer

Last Updated April 15, 2014

Jes Schultz

Women in Technology (yes, Clippy is a woman, too)

We recently got the following question for Kendra and Jes in the Brent Ozar Unlimited® mailbox:

Six months ago I stumbled into the IT world. Do you have any advice for someone (a woman) starting off in the IT industry, specially someone without a computer science degree? I really enjoy working with databases and data. I would love to one day be a Database Administrator and get into business intelligence.

JES SAYS…

There has been a lot written about women in technology – positive and negative. I’m happy to say that my experience as a woman in technology has been incredible. The SQL Server community has been accepting, helpful, and nurturing. If anyone, man or woman, came to me, asking how to succeed, I’d give them this advice.

Ask questions. Constantly learn. There is no such thing as too much knowledge. The most successful people I know in IT – not just the SQL Server world – are lifelong learners. They read books and blogs, attend user group meetings and conferences, and learn new programming languages in their spare time. Build your own computers. Know how to set up a network. You don’t have to be an expert at every facet of technology, but knowing enough to talk to other people in “their” language will go a long way.

Speaking of user groups…find the closest user group, and attend it regularly. Don’t have one nearby? Start one, or join a virtual user group (like those at http://www.sqlpass.org/PASSChapters/VirtualChapters.aspx). Not only will you learn about things you may not be exposed to at work, you’ll have a chance to build a network. I can’t tell you the number of questions I’ve gotten answered through this channel, or the number of people I know that have gotten new (better!) jobs this way.

Never, ever be afraid to say, “I don’t know, but I can find the answer.” People will respect you far more if you are honest. Don’t try to be a know-it-all. If you haven’t dealt with a particular technology or situation before, acknowledge that, make note of what the person is asking for, and research it later. You’ll learn something new, and you won’t get caught giving bad – or wrong – answers.

Don’t think, “I’ll never know as much as him/her.” Yes, there are some people in IT that started building computers or robots or software before they could drive a car. Instead of thinking that you will never know as much as they do, remind yourself how many years of experience they have – and how they can help you. Ask to pick their brains. Ask them what books they read or what tools they use. Learn from them.

Most of all, don’t fall victim to the so-called “impostor syndrome”. Someone always appears smarter, faster, more organized, more accomplished, less stressed, or hasn’t spilled coffee all over herself yet today. Don’t let that invalidate where you started from and what you have accomplished. Keeping a blog – even if it’s private – that you can go back and reference over the years is a great way to value yourself. I know I’ve seen a dramatic change in my writing over five years, from both a technical and editorial perspective.

Good luck! Being in IT – and especially the SQL Server field – is exciting and rewarding.

KENDRA SAYS…

Remind yourself that you’re qualified to be a DBA. You mention that you don’t have a computer science degree. Great news: that’s not required. There’s no degree or certification that is the Official DBA Training Program or Proving Ground.

I had a period where I felt I wasn’t “legitimately a DBA” because I also worked with some C# and drove some non-SQL software development processes. But I was doing some really cool things with SQL Server administration, too. I should have felt awesome about being a DBA, regardless of my job title.

Never feel that your background or your current job have to fit a specific mold for you to be a “real” DBA. There is no mold!

Remind yourself that you are designing your own career as you go. Just like there’s no set educational and certification “solution” to end up with all the right skills, job progression for DBAs is usually not linear. It’s a rare person who starts with a “Junior DBA” job title and works their way up to “Mid Level DBA” and then “Senior DBA”.

Instead, most people these days struggle to find the right training, the right mentor, and discover if they want to specialize (Virtualization? Performance Tuning? Data modeling?), be a generalist, or dive off into uncharted waters (Hadoop? Different Platforms? Business Intelligence?). There are many paths, and there are new paths all the time.

Expect to repeatedly redefine your interests and redesign your career. Make sure that every six months you have a conversation with your manager about what kind of work makes you excited, and where you’d like to be in a year or two.

Remind yourself that you do great stuff: and write it down. For a few years, I had a formal review process where I was regularly required to write out my accomplishments. And you know what? That was great for me! Each item on the list seemed small, but when I put it all together it gave me a new view of myself.

This may be difficult, but it’s worth it. Keeping track of the great things you do boosts your confidence and makes you ready when opportunity comes around.

GET INSPIRED

Is this advice only for women? Heck no! Sometimes it’s just nice to ask advice from someone who’s part of a group you also identify with and hear their perspective.

Want to learn more about how to build a great career working with data? Hop on over to Brent’s classic post, “Rock Stars, Normal People, and You.”

Going, Going, Gone: Chicago Performance Class Selling Out

Last Updated April 15, 2014

Your application wants to go fast, so I guess that means our tickets go fast.

For our upcoming Chicago training classes in May, the 3-day SQL Server Performance Troubleshooting class is just about sold out. (There’s still about 15 seats left for our How to Be a Senior DBA training class, though.)

If you missed out this round, we’re doing the same classes in Philadelphia this September. Read more about the classes here, or read about how our San Diego attendees liked it earlier this year.

Don’t hold out for additional cities or dates – this is our entire 2014 training calendar. We’re working on our 2015 lineup as we speak, though – let us know what you’re looking for, and you might just influence our training class decisions.

SQL Server 2014 Buffer Pool Extensions

Last Updated April 15, 2014

SQL Server 2014 contains some interesting new features. Although SQL Server Standard Edition is limited to 128GB of memory, teams deploying on Standard Edition have an option to fit more of their working set in low latency storage – SQL Server Buffer Pool Extensions.

How SQL Server Normally Deals With Data

During SQL Server’s normal operations, data is read from disk into memory. At this point, the data is clean. Once the data is changed, it is marked as dirty. Eventually the dirty pages are written to disk and marked as clean; clean pages may be flushed from memory when the data cache (the buffer pool) comes under pressure. At this point the data is gone from memory and SQL Server has to read it from disk the next time a user runs a query that requires the data.

As long as you have less data than you have memory, this isn’t a problem. As soon as you have more data than you have memory, you’re at the mercy of physical storage.

Right now, some of you are probably saying “So what? I have an awesome EMC/3PAR/Hitachi VX6.” You also probably have SQL Server Enterprise Edition and a pile of unused RAM sticks. This blog post isn’t for you. Go away.

The rest of you, the 99%ers, listen up.

Speeding Up Data Access with Buffer Pool Extensions

SQL Server 2014 Buffer Pool Extensions are our new secret weapon against not having enough memory. Like most secret weapons, there’s a lot of hype surrounding Buffer Pool Extensions (BPE).

The idea behind BPE is a lot like the idea behind virtual RAM (better known as swap space): fast, low latency persistent storage is used to replace a portion of memory. In the case of BPE, SQL Server will use the disk space to store clean buffers – specifically BPE will hold unmodified data pages that would have been pushed out of RAM.

To see just how fast this was going to perform, I create a test instance of SQL Server and decided to find out.

Test Methodology

I ran SQL Server through a relatively boring test harness – TPC-C running through HammerDB. The database was created at a scale factor of 444 warehouses – this yields 44GB of data on disk, or near enough. SQL Server 2014 RTM was installed on Windows Server 2012 on an Amazon i2.8xlarge instance. To find out more about how the instance was physically configured, you can check out the instance type details page.

SQL Server was set up in a fairly vanilla way:

Max degree of parallelism was set to 8
Max server memory was left alone
tempdb was given 4 data files located on a local all SSD RAID 0 of four drives
A second all SSD RAID 0 of four drives was reserved for BPE

Tests were run multiple times and the results of the first test were discarded – many changes during this process could clear the buffer pool as such, the first test results were assumed to be anomalous. The remaining results were averaged to produce the following chart:

It's faster than not having enough RAM. — It’s faster than not having enough RAM.

Conclusions

Having an appropriately sized buffer pool was far more effective than allocating considerable space to buffer pool extensions. BPE improved performance by 42.27%. This is not an insignificant performance gain, but BPE is no substitute for memory.

BPE will shine for customers deploying in dense, virtualized environments where memory is constrained but SSD is cheap and plentiful. Given the near ubiquity of SSD, and high speed consumer grade SSD being available for as little as $0.50 per GB, BPE may seem tempting. It may even provide some respite from performance issues. However, BPE is no substitute for RAM.

Collecting Detailed Performance Measurements with Extended Events

Last Updated April 4, 2014

Analyzing a workload can be difficult. There are a number of tools on the market (both free and commercial). These tools universally reduce workload analysis to totals and averages – details and outliers are smeared together. I’m against using just averages to analyze workloads; averages and totals aren’t good enough, especially with the tools we have today.

Collecting Performance Data with Extended Events

A note to the reader SQL Server Extended Events let us collect detailed information in a light weight way. These examples were written on SQL Server 2012 and SQL Server 2014 CTP2 – they may work on SQL Server 2008 and SQL Server 2008R2, but I have not tested on either version.

We need to set up an Extended Events session to collect data. There are three Extended Events that provide information we want:

sqlos.wait_info
sqlserver.sp_statement_completed
sqlserver.sql_statement_completed

In addition, add the following actions to each event, just in case we need to perform deeper analysis at a later date:

sqlserver.client_app_name
sqlserver.client_hostname
sqlserver.database_id
sqlserver.database_name
sqlserver.plan_handle
sqlserver.query_hash
sqlserver.query_plan_hash
sqlserver.session_id
sqlserver.sql_text

In a production environment you should enable sampling. Sampling is just a predicate that filters out random events, typically by using the modulus of the session identifier (e.g. SPID modulo 5 = 0 will sample 20% of activity). Enabling sampling makes sure that you don’t collect too much data (yes, too much data can be a bad thing). During a routine 5 minute data load and stress test, I generated 260MB of event data – be careful about how often you run this and how long you run it for.

To get started, download all of the scripts mentioned in this post.

Reporting on Extended Events

Go ahead and set up the Extended Events session and run a workload against that SQL Server for a few minutes. If you’re analyzing a production server, copy the files to a development SQL Server or to a SQL Server that has less workload – you’ll thank me later. Once you’ve collected some data in the Extended Events session, stop the session, and run the results processing script. On my test VM, processing the data takes a minute and a half to run.

The data processing script creates three tables materialize XML data to speed up processing; shredding XML takes a long time. Relevant data is extracted from each of the different events that were collected and persisted into the processing tables. If you’re going to be doing multiple analysis runs across the data, you may even want to put some indexes on these tables.

Once the data is loaded, run the analysis query. This doesn’t take as long as the processing script, but it does take some time; it’s worth it. The analysis script collects a number of statistics about query performance and waits. Queries are grouped into 1 minute time blocks and metrics around reads, writes, duration, and CPU time are collected. Specifically, each metric has the following statistics built up:

Total
Average
Standard Deviation
Minimum
Maximum
Percentiles – 50th, 75th, 90th, 95th, and 99th

The same thing happens for each wait as well – each wait is time boxed on the minute and then both signal and resource waits are analyzed with the same statistics as query duration metrics.

In both analyses, analysis is performed on a query by query basis. At the end of the analysis we you get a multi-dimensional view of the data by time and query. It should be easy to perform additional analysis on the data to create broader time windows or to analyze the entire dataset at once.

Want to see what the output looks like? Check out these two screenshots:

Query Performance by the Numbers

Query Waits

Why Produce All of These Metrics?

All of these metrics give us insight into how these queries really run; an average just doesn’t help. Standard deviation alone lets us be aware of the variability of a particular metric – high standard deviation on a particular wait type means that we have a lot of variability in how long we wait on a resource. We also collect percentiles of all of these different metrics to help understand the distribution of data.

With this data at our disposal, we can make a better analysis of a workload. Now we can identify variations of a query that are producing bad plans, taking too long, or just reading an excessive amount of data. Or, better yet, if query performance is constant, we know that your code is just plain awful.

How’s that for a parting thought?

Five Reasons a Production DBA Should Learn to Read Execution Plans

Last Updated August 7, 2019

Jes Schultz

Execution Plans

While chatting with attendees before my Don’t Fear the Execution Plan webcast, a question was posed: “As a DBA who mainly worries about backups, index fragmentation and checkdb, should I also be looking at execution plans?”

YES! Here’s why.

Performance is important. A production DBA’s main focus should be the safety and security of the data, but the performance of servers is also something to be concerned with. Companies pay a lot of money in hardware, support, and licensing costs for these servers. Being able to squeeze every bit of performance from them will save the company money – which helps everyone in the company in the long run.
Be a superhero. Your first response to an ongoing, persistent performance problem may be a hardware fix. Add more memory, add more disks, ask the SAN team to give you dedicated LUNs, beg for a new server. But at some point, it is going to be worth your time – and the company’s money – to tune bad queries. By being the person that can identify the bad queries and provide ways to improve them, you are putting on your superhero cape and saving the day.
You’re being proactive, not reactive. We all need to be reactive sometimes – that’s a facet of the job. But being proactive – identifying top resource-consuming queries on a server, identifying problem spots in the queries, and suggesting fixes – makes us look better. We show we are bringing value to the company.
Grow your knowledge of SQL Server. You never know when you may need the information. A new developer may have been able to ask the DBAs questions at her last job; you may need the skill at your next job. You will also learn things along the way about SQL Server internals, and that’s great knowledge to have.
Increase communication between DBAs and developers. If your developers already use execution plans to tune queries, you will be able to speak their language. If they don’t, you can teach them as you learn – and there is no better way to learn than to teach something! Breaking down communication barriers is a goal my clients typically have. It involves some work and willingness to give, but will make your job better.

The SQL 2014 Cardinality Estimator Eats Bad TSQL for Breakfast

Last Updated April 5, 2014

Sometimes I run into a query plan that just shocks me. It’s like looking up and suddenly seeing an adult dressed as a squirrel riding a unicycle down a busy city street. You have to stop and ask yourself, “Did that really just happen?” (I live in Portland, Oregon, so yeah, I’ve seen that.)

A while back I blogged about how to write a query to demonstrate a big memory grant against the AdventureWorks2012 database. The trick was to stuff the query’s joins full of functions so that the query optimizer became baffled about how many rows might come out of the join. The query is terrible, but it’s useful for demonstrating some problematic situations with memory on a test server.

Recently I dug out the query to set up a demo on SQL Server 2014 and something strange happened: it saw through the awfulness of my query. It made my horrible TSQL fast.

Let’s feed my terrible query to SQL Server 2014

First, make sure the new cardinality estimator will be used for your query. You do this by setting the database compatibility level to 120:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=120;
GO

1 2	ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=120; GO

Now we’ll run this terrible, awful, no good query:

SELECT e.[BusinessEntityID],
       p.[Title],
       p.[FirstName],
       p.[MiddleName],
       p.[LastName],
       p.[Suffix],
       e.[JobTitle],
       pp.[PhoneNumber],
       pnt.[Name] AS [PhoneNumberType],
       ea.[EmailAddress],
       p.[EmailPromotion],
       a.[AddressLine1],
       a.[AddressLine2],
       a.[City],
       sp.[Name] AS [StateProvinceName],
       a.[PostalCode],
       cr.[Name] AS [CountryRegionName],
       p.[AdditionalContactInfo]
FROM   [HumanResources].[Employee] AS e
       INNER JOIN [Person].[Person] AS p
       ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))
       INNER JOIN [Person].[BusinessEntityAddress] AS bea
       ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))
       INNER JOIN [Person].[Address] AS a
       ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID]))
       INNER JOIN [Person].[StateProvince] AS sp
       ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID]))
       INNER JOIN [Person].[CountryRegion] AS cr
       ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode]))
       LEFT OUTER JOIN [Person].[PersonPhone] AS pp
       ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID]))
       LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt
       ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID]))
       LEFT OUTER JOIN [Person].[EmailAddress] AS ea
       ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID]));
GO

SELECT e.[BusinessEntityID],

p.[Title],

p.[FirstName],

p.[MiddleName],

p.[LastName],

p.[Suffix],

e.[JobTitle],

pp.[PhoneNumber],

pnt.[Name] AS [PhoneNumberType],

ea.[EmailAddress],

p.[EmailPromotion],

a.[AddressLine1],

a.[AddressLine2],

a.[City],

sp.[Name] AS [StateProvinceName],

a.[PostalCode],

cr.[Name] AS [CountryRegionName],

p.[AdditionalContactInfo]

FROM [HumanResources].[Employee] AS e

INNER JOIN [Person].[Person] AS p

ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))

INNER JOIN [Person].[BusinessEntityAddress] AS bea

ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))

INNER JOIN [Person].[Address] AS a

ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID]))

INNER JOIN [Person].[StateProvince] AS sp

ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID]))

INNER JOIN [Person].[CountryRegion] AS cr

ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode]))

LEFT OUTER JOIN [Person].[PersonPhone] AS pp

ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID]))

LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt

ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID]))

LEFT OUTER JOIN [Person].[EmailAddress] AS ea

ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID]));

Shazam, it finishes using only 63 ms of CPU time. It asks for a 27MB memory grant and estimates 290 rows (quite accurately).

The execution plan contains warnings that “Type conversion in expression … may affect “CardinalityEstimate” in query plan choice”, but wow, it really did a remarkably good job with this!

Using OPTION(QUERYTRACEON 9481) to test Compatibility Level 110

You can go back to the old cardinality estimator with SQL Server 2014 in two ways: You could change the whole database’s compatibility level back to 110 like this:

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=110;
GO

1 2	ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=110; GO

But that changes it for everything in the whole database. We might just want to see how THIS query would behave using the old cardinality estimator, but still on SQL Server 2014. You can do that by adding OPTION(QUERYTRACEON 9481) to the very end of our gruesome query:

SELECT e.[BusinessEntityID],
       p.[Title],
       p.[FirstName],
       p.[MiddleName],
       p.[LastName],
       p.[Suffix],
       e.[JobTitle],
       pp.[PhoneNumber],
       pnt.[Name] AS [PhoneNumberType],
       ea.[EmailAddress],
       p.[EmailPromotion],
       a.[AddressLine1],
       a.[AddressLine2],
       a.[City],
       sp.[Name] AS [StateProvinceName],
       a.[PostalCode],
       cr.[Name] AS [CountryRegionName],
       p.[AdditionalContactInfo]
FROM   [HumanResources].[Employee] AS e
       INNER JOIN [Person].[Person] AS p
       ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))
       INNER JOIN [Person].[BusinessEntityAddress] AS bea
       ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))
       INNER JOIN [Person].[Address] AS a
       ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID]))
       INNER JOIN [Person].[StateProvince] AS sp
       ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID]))
       INNER JOIN [Person].[CountryRegion] AS cr
       ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode]))
       LEFT OUTER JOIN [Person].[PersonPhone] AS pp
       ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID]))
       LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt
       ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID]))
       LEFT OUTER JOIN [Person].[EmailAddress] AS ea
       ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID]))
OPTION(QUERYTRACEON 9481);

SELECT e.[BusinessEntityID],

p.[Title],

p.[FirstName],

p.[MiddleName],

p.[LastName],

p.[Suffix],

e.[JobTitle],

pp.[PhoneNumber],

pnt.[Name] AS [PhoneNumberType],

ea.[EmailAddress],

p.[EmailPromotion],

a.[AddressLine1],

a.[AddressLine2],

a.[City],

sp.[Name] AS [StateProvinceName],

a.[PostalCode],

cr.[Name] AS [CountryRegionName],

p.[AdditionalContactInfo]

FROM [HumanResources].[Employee] AS e

INNER JOIN [Person].[Person] AS p

ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))

INNER JOIN [Person].[BusinessEntityAddress] AS bea

ON RTRIM(LTRIM(bea.[BusinessEntityID])) = RTRIM(LTRIM(e.[BusinessEntityID]))

INNER JOIN [Person].[Address] AS a

ON RTRIM(LTRIM(a.[AddressID])) = RTRIM(LTRIM(bea.[AddressID]))

INNER JOIN [Person].[StateProvince] AS sp

ON RTRIM(LTRIM(sp.[StateProvinceID])) = RTRIM(LTRIM(a.[StateProvinceID]))

INNER JOIN [Person].[CountryRegion] AS cr

ON RTRIM(LTRIM(cr.[CountryRegionCode])) = RTRIM(LTRIM(sp.[CountryRegionCode]))

LEFT OUTER JOIN [Person].[PersonPhone] AS pp

ON RTRIM(LTRIM(pp.BusinessEntityID)) = RTRIM(LTRIM(p.[BusinessEntityID]))

LEFT OUTER JOIN [Person].[PhoneNumberType] AS pnt

ON RTRIM(LTRIM(pp.[PhoneNumberTypeID])) = RTRIM(LTRIM(pnt.[PhoneNumberTypeID]))

LEFT OUTER JOIN [Person].[EmailAddress] AS ea

ON RTRIM(LTRIM(p.[BusinessEntityID])) = RTRIM(LTRIM(ea.[BusinessEntityID]))

OPTION(QUERYTRACEON 9481);

Retesting the query with the old cardinality estimator… Ouch! The query uses 84,109 ms of CPU time and the execution plan is back to a world of confusion, thinking it’s going to have to handle a kazillion rows:

Hey there new cardinality estimator, I’d like to get to know you better.

I’m not saying the new cardinality estimator will be better at every query, that it won’t have any regressions, or that you should start putting functions around all your joins.

But it’s pretty remarkable when the optimizer takes code that you wrote to be as terrible as possible, and suddenly makes it fast. Sign me up for more of that.

Updated “How to Think Like SQL Server” Videos

Last Updated April 7, 2014

On my East Coast user group tour last month, I presented my How to Think Like the SQL Server Engine course to a few hundred folks. I’m always trying to learn and adapt my delivery, and I noted a lot of attendee questions this round, so I updated the course.

This is my favorite course I’ve ever done. In about an hour and a half, I cover:

Clustered and nonclustered indexes
Statistics, and how they influence query plans
How sargability isn’t just about indexes
T-SQL problems like table variables and implicit conversions
How SQL turns pages into storage requests

If you’ve always wanted to get started learning about SQL internals, but you just don’t have the time to read books, this course is the beginning of your journey.

Attendees give it rave reviews, but this one is one of my favorites:

@BrentO I came to that presentation thinking I was too cool to take notes then ended up furiously trying to keep up with taking notes.

— Andrew Notarian (@anotarian) April 4, 2014

Go check it out and let me know what you think.

Why Index Fragmentation and Bad Statistics Aren’t Always the Problem (Video)

Last Updated February 13, 2017

Index Maintenance, SQL Server

Do you rely on index rebuilds to make queries run faster? Or do you always feel like statistics are “bad” and are the cause of your query problems? You’re not alone– it’s easy to fall into the trap of always blaming fragmentation or statistics. Learn why these two tools aren’t the answer to every problem in these two videos from Kendra Little.

Does Rebuilding Indexes Make Queries Faster in SQL Server?

Learn if rebuilding indexes is a good way to help queries run faster in SQL Server in this 13 minute video.

Not convinced that fragmentation isn’t really your problem? Read Brent’s post, “Stop Worrying about Index Fragmentation“.

Why are Statistics Always the Problem? (Or ARE They?)

Once you’ve moved beyond an addiction to rebuilding indexes, you may find that you become an obsessive Statistics Blamer. Learn why you might mistakenly think your stats are “bad” when something else is really the problem in this 12 minute video.

Want to learn more about statistics, parameter sniffing, and hints? Read “Optimize for… Mediocre?” and “Recompile Hints and Execution Plan Caching“.

SQL Server 2014 Licensing Changes

Last Updated February 9, 2017

Licensing

159

With the release of SQL Server 2014, we get to learn all kinds of new licensing changes. While I don’t work for Microsoft legal, I do have a PDF reader and a web browser. You can follow along in the SQL Server 2014 Licensing Datasheet… if you dare.

Server + CAL Licensing is Still Around

It’s only for Standard Edition and BI Edition.

Microsoft are highly recommending that VMs be licensed as Server + CAL (rather than per core). This can make a lot of sense when there are small, single application SQL Servers that cannot be consolidated for security reasons. Having a number of Server + CAL license for 1 or 2 vCPU instances can be much more cost effective than having a large number of core based licensed.

Of course, it makes even more sense to just license the entire VM host…

How much money would it take to give you assurance?

Standby Servers Require Software Assurance

Prior to SQL Server 2014, many shops were able to deploy a single standby server without licensing SQL Server. Log shipping, mirroring, and even failover clustering allowed for an unlicensed passive node, provided that the passive node didn’t become the primary for more than 28 days.

That’s gone.

If you want to have a standby node, you’ve got to pony up and buy software assurance. Head over to the SQL Server 2014 Licensing Datasheet; at the bottom of page three, it reads “Beginning with SQL Server 2014, each active server licensed with SA coverage allows the installation of a single passive server used for fail-over support.” The passive secondary server doesn’t need to have a complete SQL Server license, but Software Assurance is a pricey pill to swallow. In short, Software Assurance (SA) is a yearly fee that customers pay to get access to the latest and greatest versions of products as well as unlock additional features that may have complex deployment scenarios.

In case you were confused, high availability is officially an enterprise feature. Note: I didn’t say Enterprise Edition. I mean enterprise with all of the cost and trouble that the word “enterprise” entails in our modern IT vernacular.

All Cores Must Be Licensed

You heard me.

To license a physical server, you have to license all of the cores. Don’t believe me? Check out this awesome screenshot:

License ALL THE CORES

It’s even more important to consider alternatives to having a number of SQL Servers spread throughout your environment. SQL Server consolidation and virtualization are going to become even more important as SQL Server licensing changes.

Finding new ways to analyze, tune, and consolidate existing workloads is going to be more important than ever before. Your ability to tune SQL Server workloads is going to be critical in successful SQL Server deployments. The days of worrying when the server hit 25% capacity are fading into history – as licensing costs increase, expect server density and utilization to increase, too.

Standard Edition has a new definition

“SQL Server 2014 Standard delivers core data management and business intelligence capabilities for non-critical workloads with minimal IT resources.” You can read between the lines a little bit on this one – SQL Server Standard Edition isn’t getting back mirroring or anything like it. In fact – SQL Server Standard Edition sounds an awful lot like the database that you use to run your non-critical ISV applications, SharePoint, and TFS servers.

Software Assurance Gives You Mobility

If you want to move your SQL Server around inside your VM farm, you need to buy Software Assurance. VM mobility lets teams take advantage of VMware DRS or SCOM VMM. This isn’t new for anyone who has been virtualizing SQL Servers for any amount of time. What is explicitly spelled out, though, is that each VM licensed with SA can be moved frequently within a server farm, or to a third-party hoster or cloud services provider, without the need to purchase additional SQL Server licenses.”

In other words – as long as you’re licensed for Software Assurance, those SQL Servers can go anywhere.

SQL Server 2014 Licensing Change Summary

Things are changing. DBAs need to take stock of their skills and help the business get more value from a smaller SQL Server licensing footprint. Realistically, these changes make sense as you look at the broader commercial IT landscape. Basic features continue to get cheaper. More resources are available in SQL Server 2014 Standard Edition, but complex features that may require a lot of implementation time, and Microsoft support time, come with a heavy price tag.

What happens to in-flight data compression in an emergency?

Last Updated January 29, 2015

Jes Schultz

Data compression can have many uses and advantages, but it also has its drawbacks. It’s definitely not a one-size-fits-all strategy. One of the things to be aware of is that initial compression of a table or index can take quite some time, and will be resource-intensive. It also is an offline operation, so the object can’t be accessed while it’s being compressed or uncompressed. (Clarification: by default, an ALTER TABLE statement is an offline operation. You can declare it online, but, as David notes in the comments, “Although the operation is ONLINE, it’s not completely idiot-proof.”)

So what would happen if your SQL Server service or server restarted while you were in the middle of (or, as it usually goes, 90% of the way through) compressing an index? Let’s investigate.

I have a table that is 1.1 GB in size, with a 1.0 GB nonclustered index.

Table name	Index name	total_pages	PagesMB
bigTransactionHistory	pk_bigTransactionHistory	143667	1122
bigTransactionHistory	IX_ProductId_TransactionDate	131836	1029

I need to reduce the size of the nonclustered index, so I decide to compress it. After using sp_estimate_data_compression_savings, I determine that I will benefit more from page compression.

I apply page compression, and allow my server to work.

ALTER INDEX IX_ProductId_TransactionDate
ON dbo.bigTransactionHistoryPage
REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX IX_ProductId_TransactionDate

ON dbo.bigTransactionHistoryPage

REBUILD WITH (DATA_COMPRESSION = PAGE);

Now, let’s say there is an emergency during this operation. Perhaps a component in the server breaks; maybe the data center loses power. (Or I restart the SQL Server service.) Uh-oh! What happened?

When the service is running, I check SSMS. I see the following error.

Msg 109, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. 
(provider: Shared Memory Provider, error: 0 - The pipe has been ended.)

Msg 109, Level 20, State 0, Line 0

A transport-level error has occurred when receiving results from the server.

(provider: Shared Memory Provider, error: 0 - The pipe has been ended.)

What happened to the data? Is compression all or nothing, or is it possible that some pages are compressed and others aren’t? I first check the index size.

Table name	Index name	total_pages	PagesMB
bigTransactionHistory	pk_bigTransactionHistory	143667	1122
bigTransactionHistory	IX_ProductId_TransactionDate	131836	1029

Nothing has changed – the index is not one byte smaller. This tells me the operation was not successful.

I also check the error log to see what information it can provide.

There are 2 transactions that are rolled back. As I am the only person in this instance right now (the benefits of a test environment), I know those were my transactions.

SQL Server has treated my data compression operation as a transaction. If there is a restart at any point, the operation will be rolled back to maintain data integrity.

Are Table Variables as Good as Temporary Tables in SQL 2014?

Last Updated February 13, 2017

In-Memory OLTP (Hekaton), SQL Server, TempDB

There’s a couple of new features in SQL Server 2014 that provide options for how you work with temporary objects. Will inline index creation or memory optimized temporary tables forever change the way you code? Let’s take a look!

Inline Index Creation

SQL Server 2014 brings us a TSQL improvement called “inline specification of CLUSTERED and NONCLUSTERED indexes.” This feature works in Standard Edition and applies to many types of table. This improves the functionality of table variables. But does it make table variables as good as temp tables?

First, let’s take a look at the feature, then take it for a test drive.

Creating a table with an inline nonclustered index

Here’s a simple look at the new feature– when I create a table (nothing temporary about it), I can name and define an index on multiple columns. In this case, I’m creating a nonclustered index on the columns j and k:

CREATE TABLE dbo.IndexCreation (
	i int identity
		constraint pkIndexCreation primary key clustered,
	j char(10)
		index ixIndexCreation_jk nonclustered (j,k),
	k varchar(100)
);
GO

CREATE TABLE dbo.IndexCreation (

i int identity

constraint pkIndexCreation primary key clustered,

j char(10)

index ixIndexCreation_jk nonclustered (j,k),

k varchar(100)

);

I also have the option to put all my constraint and index create statements at the end of the table, like this:

CREATE TABLE dbo.IndexCreation (
    i int identity,
    j char(10),
    k varchar(100),
    constraint pkIndexCreation primary key clustered (i),
    index ixIndexCreation_jk nonclustered (j,k)
);
GO

CREATE TABLE dbo.IndexCreation (

i int identity,

j char(10),

k varchar(100),

constraint pkIndexCreation primary key clustered (i),

index ixIndexCreation_jk nonclustered (j,k)

);

What about temp tables?

The same syntax for inline index create listed above works just fine for me on temporary tables in my CTP of SQL Server 2014.

This is a good thing for some people! One of the issues with temporary tables in stored procedures is that creating an index on the temp table can prevent caching of the temporary item. For most people, this is no big deal, but for some frequently run stored procedures, it might make a difference.

Creating the nonclustered indexes on the temp table at creation time (instead of afterward) can improve temp table caching in stored procedures.

Before you rejoice, there’s some fine print. If you change your temp table syntax to use inline index creation and it enables caching, you might run into issues where statistics aren’t updated on your temporary tables when you think they would be. (The short version: Statistics are also cached for temp tables, and Gremlins keep from updating very frequently.) Just test carefully if you’ve got very frequently run stored procedures you’re modifying.

Inline index creation on table variables

The new syntax works on table variables, too! This means that with SQL Server 2014 and higher, you can create non-unique nonclustered indexes on table variables. You can even set the fillfactor option. (Note: I’m not saying you should use a lower fillfactor– I was just surprised that option was available for table variables.)

DECLARE @IndexCreation TABLE (
    i int identity primary key clustered with (fillfactor=95),
    j char(10)
        index ixIndexCreation_jk nonclustered (j,k) with (fillfactor=95),
    k varchar(100)
);
GO

DECLARE @IndexCreation TABLE (

i int identity primary key clustered with (fillfactor=95),

j char(10)

index ixIndexCreation_jk nonclustered (j,k) with (fillfactor=95),

k varchar(100)

);

With the new SQL 2014 syntax, you will also have the option to create a non-unique clustered index on a table variable:

DECLARE @IndexCreation TABLE (
    i int,
    j char(10),
    k varchar(100),
    index cxIndexCreation_i clustered (i),
    index ixIndexCreation_jk nonclustered (j,k)
);
GO

DECLARE @IndexCreation TABLE (

i int,

j char(10),

k varchar(100),

index cxIndexCreation_i clustered (i),

index ixIndexCreation_jk nonclustered (j,k)

);

So, are table variables as good as temp tables now?

Well, sorry, not generally.

First, there’s some limitations to the “inline index create” feature. You can’t inline every kind of index you might want — you can’t add included columns or filters to indexes created with an inline create statement. Since you can add indexes with those to a temporary table after it’s created (and you CANNOT do that with a table variable), it has an advantage.

But temporary tables still have another advantage.

Statistics help temporary tables

When looking at the new feature, I wondered if these new indexes declared inline might secretly allow some sort of populated statistic on table variables– which hasn’t worked before.

But, unfortunately no. Even using the inline indexes, table variables do NOT get statistics to help the optimizer psychically predict how many rows will come back based on a specific predicate value. That can cause you to get less efficient execution plans. Let’s look at an example.

First up, the temp table:

CREATE TABLE #TestMe  (
	i INT IDENTITY PRIMARY KEY CLUSTERED,
	City VARCHAR(60),
	StateProvinceID INT,
	INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)
);
INSERT #TestMe (City,StateProvinceID)
SELECT City, StateProvinceID
FROM AdventureWorks2012.Person.Address;
GO

--This gets a nested loop plan
SELECT City
FROM #TestMe
WHERE StateProvinceID=1;
GO;

CREATE TABLE #TestMe (

i INT IDENTITY PRIMARY KEY CLUSTERED,

City VARCHAR(60),

StateProvinceID INT,

INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)

);

INSERT #TestMe (City,StateProvinceID)

SELECT City, StateProvinceID

FROM AdventureWorks2012.Person.Address;

--This gets a nested loop plan

SELECT City

FROM #TestMe

WHERE StateProvinceID=1;

GO;

For the temporary table, SQL Server uses statistics associated with the nonclustered index to estimate that it will get 25 rows back (which is right on). Based on this it decides to seek to the rows in the nonclustered index, then do a nested loop lookup to fetch the City column from the clustered index. It does 52 logical reads:

Now let’s run the same code, but with a table variable with an inline index:

DECLARE @TestMe TABLE (
	i INT IDENTITY PRIMARY KEY CLUSTERED (i),
	City VARCHAR(60),
	StateProvinceID INT,
	INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)
);

INSERT @TestMe (City,StateProvinceID)
SELECT City, StateProvinceID
FROM AdventureWorks2012.Person.Address;

--This gets a clustered index scan
SELECT City
FROM @TestMe
WHERE StateProvinceID=1;
GO

DECLARE @TestMe TABLE (

i INT IDENTITY PRIMARY KEY CLUSTERED (i),

City VARCHAR(60),

StateProvinceID INT,

INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)

);

INSERT @TestMe (City,StateProvinceID)

SELECT City, StateProvinceID

FROM AdventureWorks2012.Person.Address;

--This gets a clustered index scan

SELECT City

FROM @TestMe

WHERE StateProvinceID=1;

Oddly enough, it gets a clustered index scan. It estimates that only one row will be found — that’s because for table variables, statistics associated with the nonclustered index still can’t be populated. So it doesn’t know to estimate the 25 rows, and it just guesses one.

With a one row estimate, I thought the optimizer would surely go for the nested loop lookup. Looking up just one row is easy, right? But instead it decided to scan the clustered index. (Note: The nonclustered index is functional– it will use it if I omit the City column from the query. But it does not seem to want to do a key lookup from it in my testing here. Wacky!)

Finding: Inline Index creation is the most awesome for temp tables

I like the new syntax for its added flexibility. I do wish it allowed included columns, but for indexes with just key columns it can make for tidier, more compact code.

This feature doesn’t fix all the flaws of table variables– it’s interesting that I didn’t get a nested loop lookup in my testing, which makes me wonder if the optimizer has as many options with the indexes on table variables.

The ability of temporary tables to have column and index related statistics still gives them a great advantage in most situations. Using the inline index creation script on temporary tables in stored procedure to improve caching is a nice little bonus. Even with the gotcha I linked to above about statistics updates on temp tables, I think this feature makes the case for temporary tables even stronger.

What about Memory Optimized Table Variables?

First off, this feature is more expensive. SQL Server 2014 adds in new “memory optimized” tables, AKA “Project Hekaton”. This feature is only available in Enterprise Edition. I won’t come close to covering the whole feature here– I’ll just scratch the surface of one of its uses: the “Memory Optimized Table Variable”.

The first thing of note is that memory optimized tables DO support statistics— so does that mean that a Memory Optimized Table Variable might have them? Let’s take a look!

First, let’s enable memory optimized objects

To test this feature out, I need to make some changes to my test database:

ALTER DATABASE [IndexTest] ADD FILEGROUP [ImaNewFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA
GO
ALTER DATABASE [IndexTest] ADD FILE (name='imanewfilegroup_1', filename='C:\MSSQL\Data\imanewfilegroup_1')
	TO FILEGROUP [ImaNewFileGroup];
GO
ALTER DATABASE [IndexTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT ON;
GO

ALTER DATABASE [IndexTest] ADD FILEGROUP [ImaNewFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA

ALTER DATABASE [IndexTest] ADD FILE (name='imanewfilegroup_1', filename='C:\MSSQL\Data\imanewfilegroup_1')

TO FILEGROUP [ImaNewFileGroup];

ALTER DATABASE [IndexTest] SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT ON;

Memory_Optimized = ON !

Now I can start creating my memory optimized table variable. First, I must create a table type with the table variable’s definition. I’m not going to test this in a natively compiled stored procedure just yet, so I’m leaving off the identity (it’s not supported in this scenario). I also use some special collations and don’t allow nullability on some columns to get the whole thing to work.

CREATE TYPE TM1 as TABLE (
	City VARCHAR(60) COLLATE Latin1_General_100_BIN2,
	StateProvinceID INT NOT NULL,
	INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)
) WITH (MEMORY_OPTIMIZED = ON)

CREATE TYPE TM1 as TABLE (

City VARCHAR(60) COLLATE Latin1_General_100_BIN2,

StateProvinceID INT NOT NULL,

INDEX ixTestMe_StateProvinceID NONCLUSTERED (StateProvinceID)

) WITH (MEMORY_OPTIMIZED = ON)

All right, now that our table type exists, we can put it to use! Let’s populate it and query it and check out what plan we get:

DECLARE @TestMe TM1;

INSERT @TestMe (City,StateProvinceID)
SELECT City, StateProvinceID
FROM AdventureWorks2012.Person.[Address];

SELECT City
FROM @TestMe
WHERE StateProvinceID=1;
GO

DECLARE @TestMe TM1;

INSERT @TestMe (City,StateProvinceID)

SELECT City, StateProvinceID

FROM AdventureWorks2012.Person.[Address];

SELECT City

FROM @TestMe

WHERE StateProvinceID=1;

There’s a little surprise in the execution plan:

Estimated rows is still 1– statistics still aren’t working here. But magically we get an index seek on the nonclustered index instead of a scan on the clustered index. This is remarkable because back in the normal world of non-memory-optimized indexes, the City column would not be in that nonclustered index– we only asked for it to contain StateProvinceID!

Memory Optimized Indexes are Always Covering

In the brave new world of Memory Optimized tables, indexes have a motto: Always Be Covering. We’re not in Kansas any more Toto, and there’s no such thing as a Key Lookup or Bookmark Lookup anymore.

This means a couple of things: for these tables, it really doesn’t matter that you don’t have the INCLUDE syntax on inline index creation. Nonclustered indexes will always have all the columns, so why bother with an INCLUDE clause? (I’m guessing that’s why it’s not possible in the new syntax!) Also, memory optimized indexes may be larger than you’d assume because they cover everything.

Summing Up

For 99% of the use cases I see in the wild these days, temporary tables beat table variables easily because their ability to have statistics improves query optimization dramatically. That remains true with the enhancements in SQL Server 2014.

For the 1% of cases where stored procedures using temporary objects are run at high rates of frequency, there are a few new choices that are attractive. They both have gotchas and require testing, but they could both turn out pretty well:

Option 1: Inline Nonclustered Indexes on Temp Tables: This works in Standard Edition, so I think this will be the most widely adopted. Instead of creating a nonclustered index after the temp table exists, you can improve caching of your temp tables by creating nonclustered indexes with the temp table itself. Just make sure you don’t get into trouble with out of date statistics.

Option 2: Memory Optimized Table Variables: For you lucky people with Enterprise Edition, you can test these out. You won’t want to do exactly what I did above — you’ll probably want to use natively compiled stored procedures. You’ll definitely want to test at scale and make sure you can monitor performance of these new structures at load. That may be a little tricky: you don’t get actual execution plan or cost information for natively compiled procedures. Given the fact that the memory optimized table variables didn’t show any statistics in my testing, in complex implementations you could end up with some strange execution plans. (This post hasn’t covered questions like “How much memory are these structures using?” “How much memory do you have left?” “What happens if you run out?”) In short: there’s much to learn here– don’t rush in without making sure you can support it.

Microsoft Cloud Rebranded as Microsoft Pale Blue

Last Updated April 1, 2014

It’s time to learn another new set of acronyms.

Effective today, Microsoft’s as-a-service brand is changing names again. As recently as last week, the product’s name had been changed from Windows Azure to Microsoft Azure, but industry observers noted that Microsoft’s web pages actually referred to a different name – Microsoft Cloud.

“Our research found that the primary barrier to Azz..As..Accu..cloud adoption was pronounciation,” said an inside source. “No one could say the damn word correctly, and nobody wanted to look stupid, so they just recommended Amazon Web Services instead.”

Thus the new name, Microsoft Cloud – but it ran into more branding problems right away, said the source. “We tried to trademark our virtual machines and databases, but you-know-who had a problem with our names, MC-Compute and MC-Database. People kept calling them McCompute and McDatabase. It probably didn’t help that our combination program was called the Value Menu.”

Enter the New Brand: Microsoft Pale Blue

The new Microsoft Pale Blue logo

Satya Nadella, Microsoft’s new CEO, picked the name himself. “Microsoft Pale Blue captures the wide-open possibilities of the empty sky. Everybody knows that blue is the best color for logos, so why not take it to the next level? Let’s use the color name as our brand.”

“Microsoft has learned to play to their core strength – product rebranding,” said industry analyst Anita Bath. “Nobody goes through product names like they do. Metro, Vista, Zune, PowerWhatever, Xbone, you name it, this is a company that understands brands are meaningless.”

Nadella has realigned Microsoft’s organizational structure to support the new mission. “Developers are building more and more applications with cloud-based services and Javascript. We have to understand that it’s the right combination for today’s agile startups.” The new Pale Blue & Javascript division will be led by John Whitebread, a developer widely known in the community as the beginning and end of this kind of work.

“We’re also announcing new datacenters – or as we call them, Pale Blue Regions – in China, North Korea, and Iran,” said Microsoft spokesperson Pat McCann. “We don’t believe politics should stop people from having access to the best technology, and we’re committed to aggressively growing our regions. Anytime we see new cloud demand, we’ll open a PBR.”

Today’s announcements did not include any numbers about customers or revenues, however, and questions remain. A few European reporters at today’s announcement asked Nadella if he thought security concerns around Microsoft reading customer data or NSA back doors might be barriers to cloud adoption. Nadella paused for a moment, then said, “No, no way. That can’t be it. It’s gotta be the brand name.”

Refactoring T-SQL with Windowing Functions

Last Updated February 14, 2017

Doug Lane

SQL Server, T-SQL, Videos

You’ve been querying comparative numbers like Year To Date and Same Period Last Year by using tedious CTEs and subqueries. Beginning with SQL Server 2012, getting these numbers is easier than ever! Join Doug for a 30-minute T-SQL tune-up using window functions that will cut down dramatically on the amount of code you need to write.

Looking for the scripts? Grab them below the video!

Script 1: Create Windowing View

--******************
-- (C) 2014, Brent Ozar Unlimited (TM)
-- See https://www.brentozar.com/go/eula for the End User Licensing Agreement.

--WARNING:
--This script suitable only for test purposes.
--Do not run on production servers.
--******************

/* Script 1: Create Windowing View
   Note: This script requires the AdventureWorks2012 database,
   which can be found here: https://msftdbprodsamples.codeplex.com/releases/view/55330
*/

USE AdventureWorks2012
GO

/* Duplicates the data with a DueDate field set to [year] - 1. */

CREATE VIEW vWindowing
AS
    ( SELECT    CustomerID ,
                CAST(SUM(TotalDue) AS DECIMAL(18,2)) AS TotalDue ,
                DATEPART(mm, DueDate) AS DueMonth ,
                CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1
                     WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2
                     WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3
                     WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4
                     ELSE -1
                END AS DueQtr ,
                DATEPART(yy, DueDate) AS DueYear
      FROM      sales.SalesOrderHeader
      GROUP BY  CustomerID ,
                DATEPART(mm, DueDate) ,
                CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1
                     WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2
                     WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3
                     WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4
                     ELSE -1
                END ,
                DATEPART(yy, DueDate)
      UNION
      SELECT    CustomerID ,
                CAST(SUM(TotalDue) AS DECIMAL(18,2)) AS TotalDue ,
                DATEPART(mm, DueDate) AS DueMonth ,
                CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1
                     WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2
                     WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3
                     WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4
                     ELSE -1
                END AS DueQtr ,
                DATEPART(yy, DueDate) - 1 AS DueYear
      FROM      sales.SalesOrderHeader
      GROUP BY  CustomerID ,
                DATEPART(mm, DueDate) ,
                CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1
                     WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2
                     WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3
                     WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4
                     ELSE -1
                END ,
                DATEPART(yy, DueDate) - 1
    )

	GO

--******************

-- See https://www.brentozar.com/go/eula for the End User Licensing Agreement.

--WARNING:

--This script suitable only for test purposes.

--Do not run on production servers.

--******************

/* Script 1: Create Windowing View

Note: This script requires the AdventureWorks2012 database,

which can be found here: https://msftdbprodsamples.codeplex.com/releases/view/55330

USE AdventureWorks2012

/* Duplicates the data with a DueDate field set to [year] - 1. */

CREATE VIEW vWindowing

( SELECT CustomerID ,

CAST(SUM(TotalDue) AS DECIMAL(18,2)) AS TotalDue ,

DATEPART(mm, DueDate) AS DueMonth ,

CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1

WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2

WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3

WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4

ELSE -1

END AS DueQtr ,

DATEPART(yy, DueDate) AS DueYear

FROM sales.SalesOrderHeader

GROUP BY CustomerID ,

DATEPART(mm, DueDate) ,

CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1

WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2

WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3

WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4

ELSE -1

END ,

DATEPART(yy, DueDate)

UNION

SELECT CustomerID ,

CAST(SUM(TotalDue) AS DECIMAL(18,2)) AS TotalDue ,

DATEPART(mm, DueDate) AS DueMonth ,

CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1

WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2

WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3

WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4

ELSE -1

END AS DueQtr ,

DATEPART(yy, DueDate) - 1 AS DueYear

FROM sales.SalesOrderHeader

GROUP BY CustomerID ,

DATEPART(mm, DueDate) ,

CASE WHEN DATEPART(mm, DueDate) BETWEEN 1 AND 3 THEN 1

WHEN DATEPART(mm, DueDate) BETWEEN 4 AND 6 THEN 2

WHEN DATEPART(mm, DueDate) BETWEEN 7 AND 9 THEN 3

WHEN DATEPART(mm, DueDate) BETWEEN 10 AND 12 THEN 4

ELSE -1

END ,

DATEPART(yy, DueDate) - 1

)

Script 2: The Old Way of Querying

--******************
-- (C) 2014, Brent Ozar Unlimited (TM)
-- See https://www.brentozar.com/go/eula for the End User Licensing Agreement.

--WARNING:
--This script suitable only for test purposes.
--Do not run on production servers.
--******************

/* Script 2: The Old Way of Querying
   Note: This script requires the AdventureWorks2012 database,
   which can be found here: https://msftdbprodsamples.codeplex.com/releases/view/55330
*/

USE AdventureWorks2012
go

SET STATISTICS IO ON
GO

/* --------------------------------------- */
-- The old way to blend YTD and detail. 
-- Pretty simple so far with just one join.
/* --------------------------------------- */

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
, SUM(vsum.TotalDue) AS YTDTotalDue
FROM dbo.vWindowing AS vdet
JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth
WHERE vdet.CustomerID = 11091
GROUP BY vdet.CustomerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
ORDER BY vdet.DueYear, vdet.DueQtr, vdet.DueMonth

/* --------------------------------------- */
-- That's great, but what if you need other metrics?
-- How do we put QTD and YTD in the same T-SQL statement?
/* --------------------------------------- */

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue
, SUM(vsum.TotalDue) AS YTDTotalDue
FROM dbo.vWindowing AS vdet
LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear 
																		AND vsum.DueMonth <= vdet.DueMonth
WHERE vdet.CustomerID = 11091
GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue
;

/* --------------------------------------- */
-- Looks good. What about same YTD last year?
/* --------------------------------------- */

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue
, SUM(vsum.TotalDue) AS YTDTotalDue
, SUM(vsum2.TotalDue) AS LastYTDTotalDue
FROM dbo.vWindowing AS vdet
LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear 
																		AND vsum.DueMonth <= vdet.DueMonth
LEFT JOIN dbo.vWindowing AS vsum2 ON vsum2.CustomerID = vdet.CustomerID AND vsum2.DueYear = vdet.DueYear - 1 
																		AND vsum2.DueMonth <= vdet.DueMonth
WHERE vdet.CustomerID = 11091
GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue
;

/* --------------------------------------- */
-- And now we have problems. What WILL give us
-- the Last YTD number we want?
/* --------------------------------------- */
;
WITH va AS
	(SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
	, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue
	, SUM(vsum.TotalDue) AS YTDTotalDue
	FROM dbo.vWindowing AS vdet
	LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth
	WHERE vdet.CustomerID = 11091
	GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue
	)

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
	, vdet.QTDTotalDue
	, vdet.YTDTotalDue 
	, vsum.QTDTotalDue AS SameQTDLastYearTotalDue
	, vsum.YTDTotalDue AS SameYTDLastYearTotalDue
FROM va AS vdet
LEFT JOIN va as vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueMonth = vdet.DueMonth AND vsum.DueYear = vdet.DueYear - 1

/* --------------------------------------- */
-- That's great but now we're four joins deep.
-- How much deeper dare we go? Roll 2d8 to find out.

-- Roll		Joins
-- -----	-----
-- 2-6		6
-- 7-10		8
-- 11-15	10
-- 16		Save vs. Traps or suffer 1d6 + 2 damage.

-- And how's the execution plan looking?
/* --------------------------------------- */

/* --------------------------------------- */
-- Let's add the TotalDue for everyone over that same period
-- ...if we dare.
/* --------------------------------------- */
;
WITH va AS
	(SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
	, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue
	, SUM(vsum.TotalDue) AS YTDTotalDue
	FROM dbo.vWindowing AS vdet
	LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth
	WHERE vdet.CustomerID = 11091
	GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue
	)
	, vAll as
	(SELECT vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue
	, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue
	, SUM(vsum.TotalDue) AS YTDTotalDue
	FROM dbo.vWindowing AS vdet
	LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth
	GROUP BY vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue
	)

	SELECT * FROM vAll
	-- ...and so on...

/* --------------------------------------- */
-- I'll just stop here and say 
-- it's a whole lotta work.
/* --------------------------------------- */

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

--******************

-- See https://www.brentozar.com/go/eula for the End User Licensing Agreement.

--WARNING:

--This script suitable only for test purposes.

--Do not run on production servers.

--******************

/* Script 2: The Old Way of Querying

Note: This script requires the AdventureWorks2012 database,

which can be found here: https://msftdbprodsamples.codeplex.com/releases/view/55330

USE AdventureWorks2012

SET STATISTICS IO ON

/* --------------------------------------- */

-- The old way to blend YTD and detail.

-- Pretty simple so far with just one join.

/* --------------------------------------- */

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, SUM(vsum.TotalDue) AS YTDTotalDue

FROM dbo.vWindowing AS vdet

JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth

WHERE vdet.CustomerID = 11091

GROUP BY vdet.CustomerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

ORDER BY vdet.DueYear, vdet.DueQtr, vdet.DueMonth

/* --------------------------------------- */

-- That's great, but what if you need other metrics?

-- How do we put QTD and YTD in the same T-SQL statement?

/* --------------------------------------- */

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue

, SUM(vsum.TotalDue) AS YTDTotalDue

FROM dbo.vWindowing AS vdet

LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear

AND vsum.DueMonth <= vdet.DueMonth

WHERE vdet.CustomerID = 11091

GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue

;

/* --------------------------------------- */

-- Looks good. What about same YTD last year?

/* --------------------------------------- */

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue

, SUM(vsum.TotalDue) AS YTDTotalDue

, SUM(vsum2.TotalDue) AS LastYTDTotalDue

FROM dbo.vWindowing AS vdet

LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear

AND vsum.DueMonth <= vdet.DueMonth

LEFT JOIN dbo.vWindowing AS vsum2 ON vsum2.CustomerID = vdet.CustomerID AND vsum2.DueYear = vdet.DueYear - 1

AND vsum2.DueMonth <= vdet.DueMonth

WHERE vdet.CustomerID = 11091

GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue

;

/* --------------------------------------- */

-- And now we have problems. What WILL give us

-- the Last YTD number we want?

/* --------------------------------------- */

;

WITH va AS

(SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue

, SUM(vsum.TotalDue) AS YTDTotalDue

FROM dbo.vWindowing AS vdet

LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth

WHERE vdet.CustomerID = 11091

GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue

)

SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, vdet.QTDTotalDue

, vdet.YTDTotalDue

, vsum.QTDTotalDue AS SameQTDLastYearTotalDue

, vsum.YTDTotalDue AS SameYTDLastYearTotalDue

FROM va AS vdet

LEFT JOIN va as vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueMonth = vdet.DueMonth AND vsum.DueYear = vdet.DueYear - 1

/* --------------------------------------- */

-- That's great but now we're four joins deep.

-- How much deeper dare we go? Roll 2d8 to find out.

-- Roll Joins

-- ----- -----

-- 2-6 6

-- 7-10 8

-- 11-15 10

-- 16 Save vs. Traps or suffer 1d6 + 2 damage.

-- And how's the execution plan looking?

/* --------------------------------------- */

-- Let's add the TotalDue for everyone over that same period

-- ...if we dare.

/* --------------------------------------- */

;

WITH va AS

(SELECT vdet.customerID, vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue

, SUM(vsum.TotalDue) AS YTDTotalDue

FROM dbo.vWindowing AS vdet

LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth

WHERE vdet.CustomerID = 11091

GROUP BY vdet.CustomerID, vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue

)

, vAll as

(SELECT vdet.DueMonth, vdet.DueQtr, vdet.DueYear, vdet.TotalDue

, SUM(CASE WHEN vsum.DueQtr = vdet.DueQtr AND vsum.DueMonth <= vdet.DueMonth THEN vsum.TotalDue ELSE 0 END) AS QTDTotalDue

, SUM(vsum.TotalDue) AS YTDTotalDue

FROM dbo.vWindowing AS vdet

LEFT JOIN dbo.vWindowing AS vsum ON vsum.CustomerID = vdet.CustomerID AND vsum.DueYear = vdet.DueYear AND vsum.DueMonth <= vdet.DueMonth

GROUP BY vdet.DueYear, vdet.DueQtr, vdet.DueMonth, vdet.TotalDue

)

SELECT * FROM vAll

-- ...and so on...

/* --------------------------------------- */

-- I'll just stop here and say

-- it's a whole lotta work.

/* --------------------------------------- */

Script 3: The New Way of Querying

--******************
-- (C) 2014, Brent Ozar Unlimited (TM)
-- See https://www.brentozar.com/go/eula for the End User Licensing Agreement.

--WARNING:
--This script suitable only for test purposes.
--Do not run on production servers.
--******************

/* Script 3: The New Way of Querying
   Note: This script requires the AdventureWorks2012 database,
   which can be found here: https://msftdbprodsamples.codeplex.com/releases/view/55330
*/

USE AdventureWorks2012
GO

SET STATISTICS IO ON
GO

/* --------------------------------------- */
-- Remember the old way got us roughly one 
-- table join per metric? 
--
-- Hold on to your seat.
/* --------------------------------------- */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue
FROM dbo.vWindowing
WHERE CustomerID = 11091
ORDER BY DueYear, DueMonth

/* We'll start slow by just adding QTD */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr 
						ORDER BY DueMonth 
						ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS QTDTotalDue
FROM dbo.vWindowing
WHERE CustomerID = 11091
ORDER BY DueYear, DueMonth

/* + YTD */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS QTDTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDTotalDue
FROM dbo.vWindowing
WHERE CustomerID = 11091
ORDER BY DueYear, DueMonth

/* What are we waiting for -- let's really see what this can do! */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS QTDTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID) AS AllTimeTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotalTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS RemainingFutureTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
	/SUM(TotalDue) OVER (PARTITION BY CustomerID) AS PctRemainingFutureTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr ORDER BY DueMonth) AS ThisQtrTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth) AS ThisYearTotalDue
, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS Rolling12TotalDue
, TotalDue/SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr) AS PctOfThisQtrTotalDue
, TotalDue/SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear) AS PctOfThisYearTotalDue
, LAG(TotalDue, 1) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth) AS PrevMonthTotalDue
, LAG(TotalDue, 3) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth) AS SameMonthPrevQtrTotalDue
, LAG(TotalDue, 12) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth) AS SameMonthPrevYearTotalDue
, AVG(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDAvgTotalDue

/* ------------------ */
-- If I want to see how this customer stacks up to others, 
-- I can remove the where clause, add these columns, and filter at the client

--, AVG(TotalDue) OVER (PARTITION BY DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDAvgTotalDueAllCustomers
--,	AVG(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
--		/ AVG(TotalDue) OVER (PARTITION BY DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS PctVarYTDAvgTotalDueVsAllCustomers
/* ------------------ */

FROM dbo.vWindowing AS va
WHERE CustomerID = 11091
ORDER BY CustomerID, DueYear, DueMonth

--******************

-- See https://www.brentozar.com/go/eula for the End User Licensing Agreement.

--WARNING:

--This script suitable only for test purposes.

--Do not run on production servers.

--******************

/* Script 3: The New Way of Querying

Note: This script requires the AdventureWorks2012 database,

which can be found here: https://msftdbprodsamples.codeplex.com/releases/view/55330

USE AdventureWorks2012

SET STATISTICS IO ON

/* --------------------------------------- */

-- Remember the old way got us roughly one

-- table join per metric?

-- Hold on to your seat.

/* --------------------------------------- */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue

FROM dbo.vWindowing

WHERE CustomerID = 11091

ORDER BY DueYear, DueMonth

/* We'll start slow by just adding QTD */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr

ORDER BY DueMonth

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS QTDTotalDue

FROM dbo.vWindowing

WHERE CustomerID = 11091

ORDER BY DueYear, DueMonth

/* + YTD */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS QTDTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDTotalDue

FROM dbo.vWindowing

WHERE CustomerID = 11091

ORDER BY DueYear, DueMonth

/* What are we waiting for -- let's really see what this can do! */

SELECT customerID, DueMonth, DueQtr, DueYear, TotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS QTDTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID) AS AllTimeTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotalTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS RemainingFutureTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)

/SUM(TotalDue) OVER (PARTITION BY CustomerID) AS PctRemainingFutureTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr ORDER BY DueMonth) AS ThisQtrTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth) AS ThisYearTotalDue

, SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS Rolling12TotalDue

, TotalDue/SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear, DueQtr) AS PctOfThisQtrTotalDue

, TotalDue/SUM(TotalDue) OVER (PARTITION BY CustomerID, DueYear) AS PctOfThisYearTotalDue

, LAG(TotalDue, 1) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth) AS PrevMonthTotalDue

, LAG(TotalDue, 3) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth) AS SameMonthPrevQtrTotalDue

, LAG(TotalDue, 12) OVER (PARTITION BY CustomerID ORDER BY DueYear, DueMonth) AS SameMonthPrevYearTotalDue

, AVG(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDAvgTotalDue

/* ------------------ */

-- If I want to see how this customer stacks up to others,

-- I can remove the where clause, add these columns, and filter at the client

--, AVG(TotalDue) OVER (PARTITION BY DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS YTDAvgTotalDueAllCustomers

--, AVG(TotalDue) OVER (PARTITION BY CustomerID, DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- / AVG(TotalDue) OVER (PARTITION BY DueYear ORDER BY DueMonth ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS PctVarYTDAvgTotalDueVsAllCustomers

/* ------------------ */

FROM dbo.vWindowing AS va

WHERE CustomerID = 11091

ORDER BY CustomerID, DueYear, DueMonth

How to Add Nonclustered Indexes to Clustered Columnstore Indexes

Last Updated February 13, 2017

Bad Idea Jeans, Columnstore Indexes, Indexing, SQL Server

SQL Server 2012 introduced nonclustered columnstore indexes, but I never saw them used in the wild simply because once created, they made the underlying table read-only. Not a lot of folks like read-only tables. (Bad news, by the way – that limitation hasn’t disappeared in 2014.)

SQL Server 2014 brings clustered columnstore indexes, and they’re totally writeable – you can insert, update, or delete into them. They’re best suited for wide data warehouse fact tables that have lots of columns, but your queries might only want a few of those columns, and they might pick any of the columns for filtering. These types of tables are notoriously difficult to index, and columnstore indexes can give you dramatic performance improvements here.

Books Online says you’d better be sure your access patterns all benefit from columnstore indexes, because you can’t add any nonclustered indexes to your columnstore tables. The CREATE CLUSTERED COLUMNSTORE INDEX syntax page explains that the clustered columnstore “is the only allowable index on the table,” meaning you can’t add non-clustered indexes.

Or can you?

Allow me to demonstrate what I call the Locke Technique:

CREATE TABLE dbo.SimpleTable(
    ProductKey [int] NOT NULL,
    OrderDateKey [int] NOT NULL,
    DueDateKey [int] NOT NULL,
    ShipDateKey [int] NOT NULL);
GO
CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON dbo.SimpleTable;
GO

CREATE VIEW dbo.vw_SimpleTable WITH SCHEMABINDING
AS SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey
FROM dbo.SimpleTable;
GO

CREATE UNIQUE CLUSTERED INDEX CLIX_ProductKey ON dbo.vw_SimpleTable (ProductKey);
GO
CREATE NONCLUSTERED INDEX IX_OrderDateKey ON dbo.vw_SimpleTable(OrderDateKey);
GO

CREATE TABLE dbo.SimpleTable(

ProductKey [int] NOT NULL,

OrderDateKey [int] NOT NULL,

DueDateKey [int] NOT NULL,

ShipDateKey [int] NOT NULL);

CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON dbo.SimpleTable;

CREATE VIEW dbo.vw_SimpleTable WITH SCHEMABINDING

AS SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey

FROM dbo.SimpleTable;

CREATE UNIQUE CLUSTERED INDEX CLIX_ProductKey ON dbo.vw_SimpleTable (ProductKey);

CREATE NONCLUSTERED INDEX IX_OrderDateKey ON dbo.vw_SimpleTable(OrderDateKey);

Presto – you can have as many nonclustered indexes as you need – and technically, in a strange twist, the clustered index on the view is a nonclustered index on the table as well.

These have the same drawbacks as any other nonclustered index: slower inserts/updates/deletes on the underlying table, more complex query tuning, more space requirements, and so on. They also have even more drawbacks because the schema-binding view means you can’t alter the columnstore table without first dropping the nonclustered indexes and the view.

After You’ve Been a DBA, What Do You Do?

Last Updated February 9, 2017