A while back, we posted about how SQL Server index fragmentation doesn’t matter. That blog post generated a lot of heated discussion and has served to make people aware that fragmentation may not be the source of their woes. Sometimes, though, there are data access patterns where fragmentation really does matter.
Some data access patterns lead to tables and indexes that will always be fragmented, no matter what you do. If you have frequent data changes in the middle of a table or index, you can see heavy fragmentation.
What’s this pattern look like? Check it out!
CREATE TABLE process_list ( transaction_id INT IDENTITY(1, 1) PRIMARY KEY , status_flag BIT , last_update DATETIME2 , transaction_type INT , transaction_desc VARCHAR(25) ); GO CREATE INDEX IX_giant_index_process_list ON dbo.process_list (transaction_id, status_flag); GO TRUNCATE TABLE process_list ; GO /* Random string generation code taken from: http://social.technet.microsoft.com/wiki/contents/articles/21196.t-sql-random-string.aspx */ /* insert another 1000 rows */ DECLARE @string_max_length TINYINT = 25; INSERT INTO process_list (status_flag, last_update, transaction_type, transaction_desc) SELECT 0 , GETUTCDATE() , v.number % 4 + 1 , x.txt FROM master.dbo.spt_values v JOIN ( SELECT TOP 1 LEFT(REPLACE(CAST (NEWID() AS NVARCHAR(MAX)), '-', ''), ABS(CHECKSUM(NEWID())) % @string_max_length + 1) AS txt FROM SYS.OBJECTS A CROSS JOIN SYS.OBJECTS B ) AS x ON 1 = 1 WHERE v.type = 'P' AND v.number < 1001; /* Look at table/index size. The table gains 9 pages. The index only gains 1 page from its previous state */ SELECT OBJECT_SCHEMA_NAME(o.object_id) AS schema_name, o.name, i.name, ddps.reserved_page_count AS reserved_pages FROM sys.objects AS o JOIN sys.dm_db_partition_stats AS ddps ON ddps.object_id = o.object_id JOIN sys.indexes AS i ON i.object_id = ddps.object_id AND i.index_id = ddps.index_id WHERE o.name = 'process_list' ; -- Change the status of 70%ish of transaction to TRUE UPDATE process_list SET status_flag = 1 WHERE transaction_id % 10 < 7 ; /* Get rid of processed rows */ DELETE FROM process_list WHERE status_flag = 1 ; /* Look at table/index size. No page size changes... */ SELECT OBJECT_SCHEMA_NAME(o.object_id) AS schema_name, o.name, i.name, ddps.reserved_page_count AS reserved_pages FROM sys.objects AS o JOIN sys.dm_db_partition_stats AS ddps ON ddps.object_id = o.object_id JOIN sys.indexes AS i ON i.object_id = ddps.object_id AND i.index_id = ddps.index_id WHERE o.name = 'process_list' ; GO 10 /* And now we rebuild the table Before the rebuild: index - 25 pages table - 57 pages If you just run ALTER TABLE process_list REBUILDema: index - 25 pages table - 25 pages After the rebuild with the index: index - 7 pages table - 25 pages */ ALTER INDEX ALL ON dbo.process_list REBUILD ; SELECT OBJECT_SCHEMA_NAME(o.object_id) AS schema_name, o.name, i.name, ddps.reserved_page_count AS reserved_pages FROM sys.objects AS o JOIN sys.dm_db_partition_stats AS ddps ON ddps.object_id = o.object_id JOIN sys.indexes AS i ON i.object_id = ddps.object_id AND i.index_id = ddps.index_id WHERE o.name = 'process_list' ; GO
TL;DR – The code in this example inserts a large number of rows at the end of the table. 70% of rows are marked as processed and are then deleted. This causes a lot of free space or, as you call it, fragmentation.
The only way to get rid of this fragmentation is to rebuild both the non-clustered index and the clustered index. For this example, the defragmentation reduces the size of the index by more than 3x (from 25 pages to 7 pages) and the table consumes just over 50% less space.
For workloads with a lot of random deletes (or updates that change the size of a row in a way that causes data movement), index defragmentation is necessary. Without index fragmentation, these database will continue to grow in size and result in the usual problems that we associate with fragmentation.
What Doesn’t Cause Perpetual Fragmentation
At this point you might be saying “Ah ha! You’ve proved that I should always defragment my indexes! Constantly! ZOMGWTFBBQ!!!11!11!shiftone!!!!!”
For some workloads, you can still avoid index fragmentation – if you’re adding data to the tailing end of the table and deleting data from the beginning of the table, you won’t need to defragment your indexes. Ghost record clean up should take care of deleted rows in this case.
For most workloads, your best bet is still to analyze SQL Server wait statistics and understand the basic problem before you start defragmenting indexes on a regular basis.
The original idea for this blog post comes from Hemant K Chitale’s Index Growing Larger Than The Table. Even though SQL Server handles this situation differently from Oracle, the underlying pattern is still interesting for database practitioners.
The random length string code was adapted from T-SQL: Random String.
If you don’t license all of your SQL Server’s CPU cores, you need to pay particular attention to your server hardware.
Say you’re using an HP DL580 with four CPU sockets, each of which has a Xeon processor plugged in with 10 cores. That’s a total of 40 real cores, or 80 logical processors if you’ve enabled hyperthreading.
In SQL Server’s log at startup, if you’re using an upgraded core-based license capped out at 20 cores, you’ll see a message like this:
SQL Server detected 4 sockets with 10 cores per socket and 20 logical processors per socket, 80 total logical processors; using 40 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
To understand the performance implications, it helps to look at a diagram of the server’s system board:
The 4-3-2-1 across the top are the four Xeon processors, each of which has 10 cores (20 with hyperthreading.)
The 8-7-6-5-4-3-2-1 across the bottom are the memory banks. Memory is not shared across all CPUs – each CPU has its own local banks of memory, a critical concept in NUMA – non-uniform memory access. To learn more about NUMA, check out my MCM video on the SQLOS.
Here’s what that concept means in practice for our server – say we’ve got 1TB of total memory:
Each processor gets its own local memory.
And when SQL Server says, “Hey, you’re only licensed for half of these cores, so I’m only going to turn on half of them,” it will only use the memory in half of the NUMA nodes. In this case, we’re licensed for 40 logical processors, so we have to dig further into DMVs and logs to figure out which NUMA nodes got turned on, and whether or not our memory is even available to us. See, if memory is hooked up to a NUMA node that isn’t enabled, that memory won’t be accessible to SQL Server.
Similar problems exist when you run Standard Edition on a server with more than 4 CPU sockets. Sounds crazy – most of us don’t use servers that large – but this situation pops up a lot when VMs are misconfigured with 8 virtual CPUs, each of which is presented to the guest as its own CPU socket.
How to Fix It
One option is to license all of your sockets and cores. (Hey, listen, you’re the one who wanted a solution.) Don’t buy hardware with more cores than you actually need because CPUs with more cores typically have slower clock speeds (often by 1/3 or 1/2) compared to CPUs with less cores.
Another option is to use affinity masking to tell SQL Server, “These are the specific cores I want you to use,” and manually load balance the work across all NUMA nodes.
sp_BlitzCache™ is a fantastic tool for finding the worst queries in your SQL Server. Did you know that it has a number of advanced options available to you? These options can help you dig into your worst queries, slice and dice the data, and get back exactly what you need, without wading through hundreds of results.
Only/Ignore Query Hashes
Version 2.3 of
sp_BlitzCache™ introduced two parameters so you can ignore individual query hashes. Reader Brian Baker had the idea of filtering out a single stored procedure by supplying all of the query hashes for a stored procedure. There’s one problem – there’s no easy way to filter an entire procedure (well, not without making
sp_BlitzCache™ really complex).
Here’s how we’d go about answering Brian’s question:
Step 1: sp_BlitzCache™ in expert mode.
EXEC dbo.sp_BlitzCache @results = 'expert' ;
Find the stored procedure that you want in the plan cache and then scroll all the way to the right until you get just past the
Query Plan column. There you’ll find the
Plan Handle. This lets us find a stored procedure. Copy it to your clipboard and then past it into the following script:
/* Step 1: Run sp_BlitzCache @results = 'expert' */ EXEC dbo.sp_BlitzCache @results = 'expert' ; /* Step 2: Grab the plan_handle of the plan you want to examine. */ DECLARE @plan_handle AS varbinary(64) ; SET @plan_handle = 0xEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE ; WITH XMLNAMESPACES('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS p) SELECT query_hash FROM ( SELECT n.value('@QueryHash', 'nvarchar(max)') AS query_hash FROM sys.dm_exec_procedure_stats AS deps CROSS APPLY sys.dm_exec_query_plan(deps.plan_handle) AS deqp OUTER APPLY deqp.query_plan.nodes('//p:StmtSimple') AS q(n) WHERE deps.plan_handle = @plan_handle ) AS x WHERE query_hash IS NOT NULL ;
Alright, now that we have our query hash(es), you can plug them into
EXEC sp_BlitzCache @only_query_hashes = '0x6A26E5C5E3825653' ;
Or, if you feel like ignoring that particular set of statements:
EXEC sp_BlitzCache @ignore_query_hashes = '0x6A26E5C5E3825653' ;
And, there you have it – how to zoom in on a single procedure, or completely ignore it.
Only Look at Queries or Procedures
This is much simpler, to narrow down the scope of
sp_BlitzCache™, you can use the
@query_filter parameter. This parameter can be one of three values: procedures, statements, all. The default is all and, by default,
sp_BlitzCache™ will look at stored procedures, statements, and triggers. By specifying either “procedures” or “statements”, you can force
sp_BlitzCache™ to narrow down its scope.
This can be particularly helpful if you have encapsulated your business logic into stored procedures and you want to know where to focus your tuning, rather than worrying about individual statements.
Rapidly Re-Running sp_BlitzCache™
Have you ever run
sp_BlitzCache™ and noticed that it takes a while on production systems? And then you realized, after you wandered off and got coffee, that you forgot to use the
I’ve done that, too.
sp_BlitzCache™ now ships with a
@reanalyze feature. By default,
sp_BlitzCache™ will analyze your plan cache fresh every time the query is executed. By using
@reanalyze = 1, you’re telling
sp_BlitzCache™ to skip collecting data and immediately report on the information that it already has. This is great when you want to export to excel, send the report to your co-workers, and then immediately get to work on the results. It also lets you re-sort the already collected data in a different ways.
There You Have It!
Three fancy features in
sp_BlitzCache™ that you can use to make your life easier and more fun.
What are you waiting for? Download it right now!
8:25AM Pacific: Goooood morning Seattle! It’s time for day 2 of the PASS Summit, and this morning we’ve got a technical keynote lined up. I’m a much bigger fan of these than the day 1 marketing keynotes.
Dr. Rimma Nehme from the Microsoft Gray Systems Lab is talking Cloud Databases 101. Here’s the abstract:
Cloud computing has emerged as an extremely successful paradigm for deploying applications. Scalability, elasticity, pay-per-use pricing, and economies of scale are some of the major reasons for the successful and widespread adoption of cloud. A cloud database is a type of database service that is built, deployed and delivered through a cloud platform. In this talk, I’ll give rapid introduction to cloud databases: what they are, where they came from, how they are implemented, and give a rough taxonomy. We’ll look at some of the specific examples of the cloud databases running out there in the wild and the latest state-of-the-art cloud database research.
I’ll be liveblogging it this morning, taking notes. I’m not going to transcribe it line-by-line because these technical keynotes are typically much more … technical … and you can watch the session live on PASS TV anyway.
Refresh the page every few minutes to see the updated notes at the bottom of this page.
8:31AM – The SQLPASS PASSion Award goes to Andrey Korshikov, who’s done all kinds of Russian event organizations in the last few years. Sounds really cool.
8:35AM – The next PASS Summit will be back in Seattle again on Oct 27-30th, 2015.
8:36AM – Dr. Rimma Nehme taking the stage.
8:42AM – The clicker isn’t working. Who cares. Confession: before the keynote, she said to me, “You’d better go easy on me.” I told her she has nothing to fear because I love technical keynotes. Yes, dear readers, I’m a softie for keynotes that aren’t trying to sell me things, only teach me things. That’s why we try to do the presentations we do as a company, too. Help by teaching.
8:44AM – Cloud computing characteristics: on-demand self-service, location transparent resource pooling, ubiquitous network access, rapid elasticity, and measured service with pay per use. (See, this is how you kick ass with a technical keynote – that one slide alone teaches everyone in the room something, and it’s accessible at all levels. That slide is awesome.)
8:46AM – Quick & easy deployment means that you can be agile with shorter development cycles and no need to wait for provisioning. Technically, you can do this on-premise in large corporations, but the beauty of the cloud is that it brings large-company IT scale without the red tape and red P&L statements.
8:48AM – “Storage costs are rapidly approaching zero.” For example, Amazon Prime’s free unlimited photo storage is a great example of this – companies can make enough off the storage of your data that they don’t mind eating the storage costs. Think of it as banks that can sit on your money and make money with it – cloud utilities can sit on your data and use it for interesting revenue purposes. While that makes a lot of the geeks in the room feel a little dirty, the public seems to feel it’s a fair tradeoff.
8:50AM – Showing Microsoft’s container-based data center in Chicago.
8:53AM – Dr. Nehme is explaining how container-based data centers improve power efficiency by letting servers run hotter, plus cool them down with more ecological methods like evaporative cooling. Again, this is how you do a killer database keynote – show concepts that are very interesting to your audience regardless of their expertise. Most of the audience has something in common, but not enough expertise on that topic. Play to that, and you can keep them entertained and educated.
8:57AM – Explaining PaaS, IaaS, SaaS etc by breaking them down into infrastructure, platform, software, and applications (Office). I’d love to see a pop quiz to know how many people know the differences between those. That’s our job as data professionals – we do need to know these – and I think a lot of us don’t. (I’m talking about you, dear reader.)
9:00AM – Explaining them with Pizza as a Service as an example.
9:08AM – Dr. Nehme’s focusing on virtualizing SQL Server in the cloud with shared resources. My notes will be less frequent here because I think the topic is covered online pretty well in a lot of resources (not to mention her talk) and I’m not going to add a lot of value here. Nothing against the talk – this actually gets to the heart of what I mentioned earlier, that so many folks need to understand more about services.
9:16AM – “Container-based hardware is 99.9% reliable, but we need 99.99% software.” That means you have to expect whole containers and data centers to fail, and assume you’ll need redundant data centers with the same services and databases. It’s thinking about reliability at a much higher level – for example, you might not bother with redundant power supplies or fans in a server. You can save a lot on power and cooling for the whole data center.
9:18AM – To achieve reliability in this mixed scenario (99.9 vs 99.99), Azure DBs have a primary database, plus three replicas. (It’s not explicitly stated on the slide, but you would assume there would be one or two in the same data center, plus one or two in other data centers or regions, based on how much you want to pay for geo-redundancy.
9:20AM – Each SQL DB node has one database, and multiple tenants are in that same database. They’re just logically separated – they all share the same data file and log file, but that’s not visible to you as a consumer.
9:24AM – “The number of DBAs is significantly smaller than the demand.” This is totally true – all of us are overworked, and there’s plenty more DBA jobs. The cloud (just like virtualization) is designed to offload work from you.
9:26AM – “Do the cost/benefit analysis.” This is one of the tough things for DBAs – most of us aren’t able to precisely identify the costs with on-premise databases. The cloud makes that way easier.
9:27AM – Dr. Nehme is wrapping up, and thanking Dr. DeWitt, who just took the stage. Great job!
We had so much fun with this last year that we’re doing it again. Starting at midnight Eastern (11PM Central,
6AM UTC 5AM UTC) on Friday, November 28th, we’re going to offer a day of special deals on our training videos and our in-person classes:
Coupon Doorbuster90 will take 90% off the cost of any of our 4-5 day in-person training classes. Instead of $3,195-$3,995, you’ll be paying just $319.50-$399.50. This sale will be good for only the first ten people – you’ll need to set an alarm and move fast to get a deal this good.
Coupon Doorbuster50 will take 50% off everything – our videos and training classes – for the first fifty buyers, or until 6AM Eastern, whichever comes first.
We’ll be posting more deals throughout the day, but we wanted to give you the heads-up on this so you could start making your plans and convincing the boss. During the day, you’ll want to be following @BrentOzarULTD or watching our Facebook page to know when the deals come out – they’ll be first come first serve, announced every 4 hours.
Good luck, and we’ll see you when the doors come busting open!
Four years ago, we put on the FreeCon: a free invitation-only pre-conference event before the PASS Summit here in Seattle. The FreeCon was like a pre-conference session focused on your career and the SQL Server community. The goal was to spark fires, encourage people to reach for new levels of community contribution, and build connections to help people succeed.
Looking back at the pictures, it’s amazing to see the progress everybody made in their careers since that event. Part of the magic of the FreeCon was identifying the right mix of community leaders and up-and-coming contributors, new bloggers and experienced authors, outstanding presenters and people who’d never gotten up on stage before. The original FreeCon attendees have since gone on to become vendor evangelists, PASS Board of Directors members, published authors, MVPs, and MCMs.
We ran a couple more Free-Cons since, and it’s always been one of the most fulfilling events. It’s so fun to help people achieve new levels of success in their careers and the community.
In 2015, we’re taking it up a notch.
We’re gathering 50 people together for FreeCon 2015 on the Monday before the PASS Summit in Seattle, WA. It’s your chance to share ideas about blogging, presenting, careers, and technology with some of the most interesting people in the database community – and we don’t mean us, we mean the rest of you.
Wanna take part?
Apply before November 30th. All closed! The winners and … uh, runners-up – will be emailed.
10:03AM – Ranga back onstage to finish things up. Thanking Pier 1 for letting them share the exploratory work. And we’re out!
10:00AM – Power BI adding a new authoring and editing mode. James removes the pie chart and replaces it on the fly. Now that is knowing your audience – nice job.
9:58AM – Demoing updated Power BI dashboards. Man, these screens are always so gorgeous – Microsoft is fantastic at reporting front end demos.
9:56AM – Oops. Looks like somebody is running out-of-date Apple software that needs an update. Note the screen.
9:54AM – “I’ve been watching the Twitter feed, and I kinda have a pulse of where people are sitting.” Talking about why Microsoft is moving to the cloud. “I can ship that service every single week.” It gives Microsoft agility.
9:50AM – Microsoft’s James Phillips takes the stage by briefly recapping his background with Couchbase. (That name has come up a few times today.) “Data is a bucket of potential.” Love that line.
9:49AM – Demoing a phone app to locate items in, you guessed it, Pier 1. Was there some kind of sponsorship deal here? I do love the opportunity to tell a single story though. Just wish they’d have tied it to a single individual’s journey through the store through the entire keynote.
9:44AM – “Using 14 lines of SQL code, you can do real-time analytics.” Uh, that is the easiest part of the entire demo. How about putting Kinects in stores, building the web front end, etc?
9:40AM – Demoing a browser-based SSIS UI for Azure Data Factory.
9:37AM – Sanjay Soni taking the stage and explaining in-store analytics demos using Kinect sensors to watch where customers go using a heat map.
9:35AM – Apparently the “wave” trick had absolutely nothing to do with the session – maybe some kind of ice-breaker? This keynote has now gone completely surreal. He’s now moved on to machine learning. No segue whatsoever.
9:32AM – Joseph Sirosh, machine learning guy at Microsoft, starts by having the community applaud themselves. And now he’s teaching us to do the wave. Uh, okay.
9:30AM – Demoing a restore of the local portion of the database. Even restoring a database got applause. Tells you how desperately rough that first stretch of the keynote was.
9:26AM – Demoing stretch tables: 750mm rows of data in Azure, plus 1mm rows of data stored locally on the SQL Server. It’s one table, with data stored both in the cloud and local storage. You can still query it as if it was entirely local.
9:22AM – Showing a demo of a Hekaton memory-optimized table, plus a nonclustered columnstore index built atop it. This is genuinely new.
9:20AM – “What if you could run analytics directly on the transactional data?” Doing a live demo of a new way to run your reports in production.
9:18AM – Ranga announces a preview coming at some point in the future. “And that’s the announcement.” Literally one clap, and it was from the blogger’s table. This is bombing bad.
9:15AM – Uh, big problem here. Ranga just said Stack Overflow is using Microsoft SQL Server’s in-memory technologies. That is simply flat out wrong – Stack does not use Hekaton, Buffer Pool Extensions, or anything like that. Just plain old SQL Server tables. Very disappointing. If you’re at PASS, look for Nick Craver, one of their database gurus who’s here at the conference.
9:12AM – Now talking about something near and dear to my heart: how Stack Overflow is pushing limits.
9:10AM – To recap the demo, Azure SQL Database does geo-replication and sharding.
9:06AM – The utter and complete dead silence in the room tells a bigger story than the demo. This is just not what people come to PASS to see. If you think about the hourly rate of these attendees, even at just $100 per hour, this is one expensive bomb.
8:58AM – Quite possibly the most underwhelming demo I’ve ever seen. If you want to impress a room full of data professionals, you’re gonna need something better than a search box.
8:56AM – Pier 1 folks up to demo searching for orange pumpkins.
8:52AM – “The cloud enables consistency.” Errr, that’s not usually how that works. Usually the cloud enables eventual consistency, whereas on-prem enables consistency. I know that’s not the message he’s aiming for – he’s talking about the same data being available everywhere – but it’s just an odd choice of words.
8:49AM – Discussing Microsoft’s data platform as a way to manage all kinds of data sources. This is actually a huge edge for Microsoft – they have the Swiss Army knife of databases. Sure, you could argue that particular platforms do various parts much better – you don’t want to run a restaurant on a Swiss Army knife – but this is one hell of a powerful Swiss Army knife.
8:45AM – Ranga’s off to a really odd start. He starts by talking about Women in Technology and says “We’ll do our best,” and then segues into an odd story about his wife refusing to use online maps. Really, really awkward.
8:41AM – Microsoft’s T.K. Ranga Rengarajan taking the stage.
8:39AM – Watching several minutes of videos about people and stuff. No clapping. Hmm.
8:34AM – Tom’s a natural up on stage, totally relaxed.
8:33AM – Thanking the folks who help make the event possible: sponsors and volunteers.
8:31AM – Hands-on training sessions for 50 people per workshop are available here at PASS. Good reason to come to the conference instead of just playing along online – convince your boss by explaining that you can’t really get this hands-on experience anywhere else as easily.
8:27AM – PASS has provided 1.3 million hours of training in fiscal year 2014. (Would be interesting to see that broken out as free versus paid, and national versus local user group chapters.
8:22AM – PASS President Tom LaRock taking the stage and welcoming folks from 50 countries, 2,000 companies.
8:18AM – From the official press release: “The PASS Summit 2014 has 3,941 delegates as of last night and 1,959 pre-conference registrations across 56 countries for a total of 5,900 registrations.” The registrations number is always a little tricky because it counts pre-con attendees multiple times, but that delegates number is phenomenal. Nearly 4,000 folks is a lot! This is far and above the biggest Microsoft data event.
8:16AM – Folks are coming in and taking seats.
8:00AM Pacific – Good morning from the Blogger’s Table at the Seattle Convention Center. It’s day 1 of the PASS Summit 2014, the biggest international conference for SQL Server professionals. A few thousand data geeks are gathered here to connect, learn, and share.
The keynote session is about to start, and here’s how this works: I’m going to be editing this blog post during the keynote, adding my thoughts and analysis of the morning’s announcements. I’ll update it every few minutes, so you can refresh this page and the news will be up at the top.
You can watch the keynote live on PASS TV to follow along.
Here’s the abstract:
Evolving Microsoft’s Data Platform – The Journey to Cloud Continues
Data is the new currency and businesses are hungrier than ever to harness its power to transform and accelerate their business. A recent IDC study shows that business that are investing in harnessing the power of their data will capture a portion of the expected $1.6 trillion dollar top line revenue growth over the coming four years. SQL Server and the broader Microsoft data platform with the help of the PASS community are driving this data transformation in organizations of all sizes across the globe to capture this revenue opportunity.
In this session you will hear from the Microsoft Data Platform engineering leadership team about recent innovations and the journey ahead for Microsoft’s data platform. Learn first-hand how customers are accelerating their business through the many innovations included in SQL Server 2014 from ground breaking in-memory technologies to new highly efficient hybrid cloud scenarios. See how customers are revolutionizing their business with new insights using Power BI and Azure Machine Learning and Azure HDInsight services. Learn about the investments were are making Microsoft Azure across IAAS and PAAS to make it the best cloud hosting service for your database applications.
Ignore the first paragraph, which appears to be written for salespeople attending the Microsoft Worldwide Partner Conference, not the
Professional Association for SQL Server PASS Summit. The second paragraph – not to mention the neato changes at Microsoft lately – offer a glimmer of hope for us geeks. If Microsoft wants to win market share away from Amazon’s huge lead, they’re going to have to bring out features and services that compete. The terms “investments” and “journey ahead” implies that we’ll be hearing about future features in Azure and SQL Server vNext.
Let’s see what they’re announcing today – and remember, like I wrote last week, my new perspective today is a chillaxed 1960s car show attendee. Bring on the flying cars.
For the latest updates, refresh this page and check the top.
I spend most of my day tuning SQL Server to make it go faster. I’m usually called in after the fact, when the app has become intolerably slow.
One of the first things I ask is, “What’s changed?”
Nobody ever knows with any sense of accuracy.
I understand – until now, SQL Server hasn’t shipped with any kind of change detection or tracking for common execution plan problems.
How We Manage Queries and Plans in SQL Server 2014
A good performance tuner uses tools like sp_BlitzCache®, Opserver, and Plan Explorer to identify their top resource-using queries and examine their execution plans. They’re intimately familiar with those plans and how they look today, and the tuner makes ongoing efforts to improve the shape of those plans.
Those tools look at execution plans exposed in dynamic management views and functions, the internal instrumentation tables of SQL Server. Unfortunately, those views clear out whenever SQL Server is restarted. Or the plan cache is cleared. Or statistics get updated. Or you come under memory pressure. Or …you get the picture.
If a query suddenly gets a bad plan and rockets to the top of the resource utilization charts, the tuner examines the root cause of the variation, but she’s often unable to see the prior version of the plan. Sometimes a growing amount of data in one of those tables will influence the optimizer into picking a different execution plan, or maybe someone made an ill-advised sp_configure change. Ideally, we work on the query, statistics, indexes, and sp_configure settings to get the plan back where it needs to be.
The hard part here is that we often have no idea what the plan looked like before. Sure, if we’ve got the budget to get fancy, we install performance monitoring software that tracks the execution plans of all our queries over time.
Even when we know what the plan looked like before, it’s not always easy to get SQL Server to change the execution plan. We end up using tricks like plan guides and hints to get the plan we want. I used to see plan guides as a tool of the devil, but I’ve lived long enough to see myself become the villain.
The SQL Server Query Store: Putting Your Plans on Layaway
Enter a recently declassified session at the PASS Summit. On Wednesday, November 5, Conor Cunningham will unveil the Query Store:
Have you ever come in to work only to have the boss come tell you that your main site is down and the database is “broken”? Fixing query performance problems can be rough, especially in high-pressure situations. Microsoft has developed a feature to help customers gain significantly easier insight into production systems and to be able to quickly fix cases where a new plan choice from the query optimizer has undesired performance consequences. This talk introduces the Query Store, explains the architecture, and shows how it can be used to solve real-world performance problems. It will now be possible to ask questions like “show me what query plans have changed since yesterday” and to quickly ask the optimizer to “go back” to the query plan that was working fine for you previously.
This is where things get a little tricky for me as an MVP. If I have any advance knowledge of something, I can’t confirm or deny it, and I certainly can’t blog about it. Buuuut…I can read the abstract, put on my thinking cap, and talk about it in terms of what’s already public.
Reading that abstract, you can infer that:
- SQL Server’s new Query Store will cache queries and execution plans even after they’ve changed (like due to a statistics change on a table)
- These changes may even persist beyond a SQL Server restart (meaning they’ll have to be written to disk somewhere)
- The current and prior plans will be exposed in a DMV for you to query (meaning you might be able to roll your own alerts when a plan changes so you can check out whether it’s better or worse)
So the questions you might ask would be:
- Will this functionality work with read-only databases? Think AlwaysOn Availability Group readable secondaries, or servers used as log shipping reporting boxes.
- Will it work with plans that are not normally cached? Think trivial plans or Optimize for Ad Hoc Workloads.
- What happens if you guide a query into an execution plan, and changes to the database mean the plan is no longer valid? Think dropping indexes.
- Will you be able to see what the query would look like without the frozen plan? Think about adding new indexes or updating stats on a table, and wanting to see whether the new plan would be better or worse without endangering running queries.
- If the Query Store data is written into the database itself, will the execution plans flow through to other servers? Think AlwaysOn AG secondaries and development servers that are restored from production. This is especially tricky if the feature is considered an Enterprise Edition feature, and thereby restricts the ability to restore the database onto a Standard Edition box like compression & index partitioning.
And of course, will it be Enterprise-only or included in Standard Edition, and what will the release date be? Those last ones are outside of Conor’s control, obviously, but you should still ask them. And then tell me what the answer is when you find out, because I don’t know either.
This Is Gonna Be Big.
I love features like this because everybody wins. It doesn’t require changes to existing applications, it doesn’t require attention out of the box, and it just gives more tools to performance tuners like me.
I don’t usually recommend that PASS attendees sit in forward-looking sessions that cover features that may not be in your hands for quite a while. In this case, I’m making an exception: attend Conor’s session. He’s one hell of a smart guy, and he has incredibly elegant ways of answering questions with honesty and insight. I’m publishing this blog post a little early because I want you to start thinking about the feature and how you’d use it in your line of work. That’ll prep you to ask Conor better questions, and you’ll get the most out of this great opportunity.
Enjoy it, jerks, because I’m giving a lightning talk at the same time. I swear, if you do performance tuning and I see you in my talk instead of Conor’s, I’m gonna make you take over my talk so I can go watch Conor.
Update 10:30AM Pacific: after a discussion on Twitter, Argenis Fernandez and I put some money down. If Query Store is fully functional in Standard Edition, I’ll donate $250 to Doctors Without Borders. If it’s only fully functional in Enterprise Edition, Argenis will donate $250. Everybody wins! Well, except Express Edition users.
We give away a lot of stuff – scripts, setup checklists, e-books, posters, you name it.
But we kept hearing a theme from folks: “Wow, I’ve seen one of your tools before, but I had no idea there were so many others!” In order to get everything, they had to go all over the place in our site.
To fix that, we’ve got a new easy button: our free SQL Server download pack. Now when you get anything, you’ll get everything in a single zip file, plus get email notifications whenever there’s a new version.
Enjoy, and hope we make your job suck just a little less.
Every year, when I attend the PASS Summit conference, I liveblog the announcements. Let’s take a quick walk down memory lane:
- SQL Server 2014 CTP2, the last preview before the official release
- You could deploy in-memory OLTP (Hekaton) without rebuilding your application (which just simply wasn’t true)
- AlwaysOn AG secondaries in Windows Azure
- Backup to Windows Azure
- Data files in Windows Azure
- Office365 PowerQuery – the ability to use plain English queries against your data
- In-memory OLTP (Hekaton) feature
- Hadoop integration coming at some point
- SQL 2012 Parallel Data Warehouse
- HDInsight and PolyBase
So, looking back – which of these features are you using?
Yeah, me too. And at first, I thought that sucked.
But I’ve Been Doing Keynotes Wrong.
In preparation for this year’s keynotes, I looked back and got really frustrated. Every year, new features were trotted out, and I got really frustrated because they didn’t make any real-world sense. I blogged angrily about how these features wouldn’t ever catch on with the mainstream.
But Microsoft is playing a different game.
Microsoft is treating keynotes the way car manufacturers treated car shows back in the 1950s-1960s. It wasn’t about showing people what they’d be buying in the showroom next year – it was about a race to the top of technology, showing off that your brand was thinking farther ahead than anybody else.
Nobody’s actually putting their on-premise SQL Server’s data files in Windows Azure. It’s surreal, the same way flying cars are surreal. But who cares? This year, I’m embracing the wild and the fantastic. Bring on the turbine-powered databases that don’t even allow me to change indexes. Show me iPad-based BI that will probably never ship. For one keynote, I’m going to suspend disbelief and just enjoy the make-believe.
And I’ll be tweeting my reactions from @DBAreactions.