We surf the web a lot, so every Monday morning in our email newsletter, we share our favorite links from that week. It’s not just about SQL Server, and it’s not just about new links – sometimes it’s inspiring career advice, sometimes it’s classic computer white papers. In this 30-minute webcast, we shared our favorites from 2013. Enjoy!
SQL Server has amazing instrumentation. Dynamic management views and functions give you great insight into what’s happening in your SQL Server and what’s slowing it down. Interpreting these views and functions takes a lot of time, but with practice and skill you can use them to become great at performance tuning.
But nothing’s perfect. Some of these DMVs and DMFs have bugs or column names that can be misleading. If you take them at face value, you can end up with egg all over your face.
sys.dm_os_sys_info – hyperthread_ratio column
This DMV is great for quickly checking the last startup time of your SQL instance and finding out if it’s virtualized. But don’t trust sys.dm_os_sys_info to tell you whether or not hyperthreading is enabled on your processors.
The “hyperthread_ratio” column is simply an indication that you have multiple cores per processor and does NOT tell you whether or not they are using hyperthreading.
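To see what the DMV actually gives you, here's a simple query against the documented columns – cpu_count divided by hyperthread_ratio gives you the number of physical sockets, not whether hyperthreading is on:

SELECT sqlserver_start_time,
       virtual_machine_type_desc,
       cpu_count,
       hyperthread_ratio,
       cpu_count / hyperthread_ratio AS physical_socket_count -- sockets, not hyperthreading
FROM sys.dm_os_sys_info;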
sys.dm_index_usage_stats – the whole concept of “usage”
This DMV is trying to tell you the truth, but almost nobody understands what it means. It tells you how many times a query used an index in its execution plan – but it doesn't tell you exactly how many times the index was accessed, only the number of operators referencing it when the plan was run.
A simple example: I run a query that does a nested loop lookup to retrieve a value from the index PK_Address_AddressID. The query runs once, and in that one run it executes the lookup 30 times. This is counted as one “use” of PK_Address_AddressID, even though 30 lookups were executed.
If a single index appears more than once in an execution plan (due to a subquery, union, etc), then it’ll be counted once for each operator in the plan.
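To see those counters for yourself, here's a typical query against the DMV – just remember that every number below counts plan executions that referenced the index, not individual reads or writes:

SELECT OBJECT_NAME(ius.object_id) AS table_name,
       i.name AS index_name,
       ius.user_seeks,
       ius.user_scans,
       ius.user_lookups,
       ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
  ON i.object_id = ius.object_id
 AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID();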
TL;DR: index_usage_stats does NOT show you the number of "reads" or "writes" an index is really serving up.
sys.dm_exec_trigger_stats – execution_count and last_execution_time columns
Which triggers on your server are running the most and using the most resources? The sys.dm_exec_trigger_stats DMV seems like it’s perfect to answer that question, but beware.
There's a bug, not fixed until SQL Server 2012, where execution_count increments due to completely unrelated actions. (I've seen the info from this DMV be wonky on 2008R2, and I've validated I can't reproduce this on SQL Server 2014, but I haven't personally tested on 2012.)
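If you want to eyeball your own numbers (keeping that bug in mind on affected versions), a starting point looks like this:

SELECT OBJECT_NAME(object_id, database_id) AS trigger_name,
       execution_count,
       last_execution_time,
       total_worker_time,
       total_elapsed_time
FROM sys.dm_exec_trigger_stats
ORDER BY execution_count DESC;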
Side note: isn’t it cool that Microsoft makes the Connect bug data public? I’m not sure that I ever would have figured out what contributes to inflating the execution counts on my own!
sys.sysindexes – rowmodctr column
It can sometimes be useful to estimate how many changes have occurred since statistics were last updated on a column or index. This gives you a quick way to guestimate if those stats are maybe not-so-fresh.
Good news on this one– the column isn’t perfectly true, but Books Online has a great rundown of its issues. It lets you know that SQL Server doesn’t really use this counter for its own purposes, and that it’s only roughly accurate at best.
And then it lets you know that SQL Server doesn’t expose column modification counters, so this maybe-pretty-close-guestimate counter is still perhaps better than nothing.
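With those caveats in mind, a rough guesstimate query looks something like this:

SELECT OBJECT_NAME(id) AS table_name,
       name AS index_or_stat_name,
       rowmodctr,
       STATS_DATE(id, indid) AS stats_last_updated
FROM sys.sysindexes
WHERE rowmodctr > 0
ORDER BY rowmodctr DESC;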
I'm a huge fan of SQL Server's performance counters, whether you're querying them via sys.dm_os_performance_counters or looking at them with perfmon.exe.
But there’s a whole host of misleading and just plain outdated counters that will lead you astray. Learn which perf counters to beware in Jeremiah’s post on perfmon.
SQL Server’s Instrumentation is Great
And honestly, so is SQL Server’s documentation. The challenge is that there’s a massive amount to document– and Books Online can’t cover all the nuances of everything. Keep using those DMVs– just keep an eye out for the gotchas.
Say you run an online store, and on each item’s page, you need to show related items that people purchased. Let’s take an Amazon page for my favorite rechargeable AA batteries:
In a perfect world, we would cache this data in the web/app tier – but back here in the real world, sometimes our developers build stored procedures to fetch this kind of data, and the stored procedure ends up getting called way too often.
To solve it, let’s build a caching layer into our stored procedure.
Here’s the pseudocode of how our stored procedure usually works:
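(A minimal sketch – the dbo.OrderItems and dbo.Items tables and the join logic here are just placeholders for whatever heavy lifting your real procedure does:)

CREATE PROCEDURE dbo.usp_GetRelatedItems
    @ItemID INT
AS
BEGIN
    /* The expensive part: big joins, aggregations, sorts */
    SELECT TOP 10
           oi2.ItemID AS RelatedItemID,
           i.ItemName AS RelatedItemName
    FROM dbo.OrderItems oi1
    JOIN dbo.OrderItems oi2
      ON oi2.OrderID = oi1.OrderID
     AND oi2.ItemID <> @ItemID
    JOIN dbo.Items i
      ON i.ItemID = oi2.ItemID
    WHERE oi1.ItemID = @ItemID
    GROUP BY oi2.ItemID, i.ItemName
    ORDER BY COUNT(*) DESC;
END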
I’m using a really simple stored procedure, but this approach works best when you have a complex stored procedure that does a lot of heavy lifting – calculations, querying big tables, aggregations, etc.
Instead of directly doing all the heavy work, check this out:
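(Again a sketch, assuming the cache objects described below:)

ALTER PROCEDURE dbo.usp_GetRelatedItems
    @ItemID INT
AS
BEGIN
    /* No heavy lifting here - just hand back whatever the cache has */
    SELECT RelatedItemID, RelatedItemName
    FROM Cache.dbo.GetRelatedItems
    WHERE ItemID = @ItemID;
END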
I’ve introduced a new table here – Cache.dbo.GetRelatedItems. I created a new database called Cache, and I keep it in simple recovery mode. I can use a totally different backup strategy for this database – perhaps not even backing it up at all.
The Cache.dbo.GetRelatedItems table has exactly the same columns that are normally returned by usp_GetRelatedItems, plus the input field. In this case, our stored procedure has an input field of ItemID, and it returns RelatedItemID and RelatedItemName, so the table would look like the sketch below.
Rocket surgeon readers will note that I added an ID field to give the table something unique. They will also notice that I didn’t add a DatePopulated or DateCached field – depending on my business needs, I would probably just truncate this table every night. In the case of related items in an e-commerce store, I might only truncate it every week, and I’d want to do it right before a really low load period so that the cache could gradually refresh. This outright-truncation technique is less efficient for refreshing the cache, but it minimizes the locking required by deletes. In a caching setup, I’m worried about concurrency here.
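Here's a sketch of that cache table – the exact data types are guesses based on the description above:

CREATE TABLE Cache.dbo.GetRelatedItems (
    ID              BIGINT IDENTITY(1,1) NOT NULL, -- just something unique
    ItemID          INT NOT NULL,                  -- the stored proc's input field
    RelatedItemID   INT NOT NULL,                  -- the columns the proc normally returns
    RelatedItemName VARCHAR(200) NULL
);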
When implementing a solution like this, I usually do a lot of A/B performance testing to find the right clustered index for the table. Typically each caching table has no non-clustered indexes, and has just one clustered index designed to produce the fastest range scans for the number of parameters for the stored proc. (Before somebody posts a comment asking for that strategy, no, I’m not blogging that, because it’d take me a day, and I’m lazy.)
If you choose not to back up the Cache database, your code should not rely on the existence of objects in it. It should start by checking to see if the Cache.dbo.GetRelatedItems table exists, and if not, create it. That way if you fail over to your DR site and the Cache database is empty, your queries won’t fail.
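That check is just a guard clause at the top of the proc – something like this, reusing the table definition sketched above:

IF OBJECT_ID('Cache.dbo.GetRelatedItems') IS NULL
    CREATE TABLE Cache.dbo.GetRelatedItems (
        ID              BIGINT IDENTITY(1,1) NOT NULL,
        ItemID          INT NOT NULL,
        RelatedItemID   INT NOT NULL,
        RelatedItemName VARCHAR(200) NULL
    );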
I've over-simplified the stored procedure a little, though – something actually has to populate the cache table. There are two ways I could do it: externally, like a SQL Agent job or an SSIS process, or internally – inside the stored procedure itself. Let's code that:
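Here's a rough version with the cache population inlined – again, the source tables and join are placeholders:

ALTER PROCEDURE dbo.usp_GetRelatedItems
    @ItemID INT
AS
BEGIN
    /* If the cache has nothing for this item, do the heavy lifting once and save it */
    IF NOT EXISTS (SELECT 1 FROM Cache.dbo.GetRelatedItems WHERE ItemID = @ItemID)
    BEGIN
        INSERT INTO Cache.dbo.GetRelatedItems (ItemID, RelatedItemID, RelatedItemName)
        SELECT TOP 10
               @ItemID,
               oi2.ItemID,
               i.ItemName
        FROM dbo.OrderItems oi1
        JOIN dbo.OrderItems oi2
          ON oi2.OrderID = oi1.OrderID
         AND oi2.ItemID <> @ItemID
        JOIN dbo.Items i
          ON i.ItemID = oi2.ItemID
        WHERE oi1.ItemID = @ItemID
        GROUP BY oi2.ItemID, i.ItemName
        ORDER BY COUNT(*) DESC;
    END

    /* Everybody reads from the cache */
    SELECT RelatedItemID, RelatedItemName
    FROM Cache.dbo.GetRelatedItems
    WHERE ItemID = @ItemID;
END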
We start by checking the cache table for results, and if we don’t find any, we add them.
I’m cringing as I write this because I can hear the screams of performance tuners. Yes, I’m adding additional write load on the SQL Server – keep in mind that I only use this approach when I’m faced with:
- A very work-intensive but read-only stored procedure
- Called very frequently (hundreds or thousands of times per minute)
- Whose results change less than once a day (or where we’re not concerned about real-time accuracy)
- A business that needs immediate speed and can’t wait for developers to implement a caching layer
As soon as I deploy a solution like this and the business pain goes away, I immediately start working with the developers on a better long-term solution. This solution is an emergency band-aid to get the business up and running, but it still incurs load on the SQL Server for writing the caching results, getting locks, and running the stored procedure. This is when I start talking to the developers about caching in the app tier, and here are my favorite resources on that:
Our mysterious employee #2 – where did he come from? How did he get into SQL Server? What does he want to learn in his first year at work with us? The answers to those questions, and more, will reveal themselves in this webcast recording:
The artist formerly known as SQL Azure is a cloud service somewhat akin to Microsoft SQL Server.
When it first came out, it had a lot of challenges – a small subset of T-SQL commands and datatypes, inability to take backups, and absurdly small database sizes.
But much like Emma Watson, when your back was turned, WASD matured into a surprisingly capable platform. However, most of us still aren't using it, preferring to run SQL Server on-premises.
This leads to an interesting question: what would Microsoft have to do to get you into Windows Azure SQL Database?
The latest version of sp_Blitz™ alerts you if you haven’t been using the WITH CHECKSUM parameter on your backups. This parameter tells SQL Server to check the checksums on each page and alert if there’s corruption.
But what about corrupt backups? Books Online says:
NO_CHECKSUM - Explicitly disables the generation of backup checksums (and the validation of page checksums). This is the default behavior, except for a compressed backup.
CHECKSUM - Specifies that the backup operation will verify each page for checksum and torn page, if enabled and available, and generate a checksum for the entire backup. This is the default behavior for a compressed backup.
Hmmm, let’s see about that. In my SQL Server 2014 lab environment, I shut down my primary replica, then busted out the hex editor XVI32 to edit the data file by hand, thereby introducing some corruption on a clustered index.
After starting the replica up again, I ran a normal compressed backup:
BACKUP DATABASE [AdventureWorks2012] TO DISK = N'\\DC1\SQLCLUSTERA\MSSQL\Backup\AW20131202_248_NoChecksum' WITH NOFORMAT, INIT, NAME = N'AdventureWorks2012-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
The backup completed fine without errors – even though compressed backups are supposed to run WITH CHECKSUM by default.
Then I ran a compressed backup and manually specified the CHECKSUM parameter:
BACKUP DATABASE [AdventureWorks2012] TO DISK = N'\\DC1\SQLCLUSTERA\MSSQL\Backup\AW20131202_256_Checksum' WITH NOFORMAT, INIT, NAME = N'AdventureWorks2012-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10, CHECKSUM
That time, the backup stopped with an error:
10 percent processed.
Msg 3043, Level 16, State 1, Line 4
BACKUP 'AdventureWorks2012' detected an error on page (1:3578) in file '\\dc1\SQLClusterA\MSSQL\Data\AdventureWorks2012_Data.mdf'.
Msg 3013, Level 16, State 1, Line 4
BACKUP DATABASE is terminating abnormally.
Conclusion #1: Compressed backups don’t really check checksums. No idea if that’s a bug in the code or in the Books Online article.
But the plot thickens – this particular database is also part of an AlwaysOn Availability Group. One of the cool benefits of AGs (and also database mirroring) is that when one of the replicas encounters corruption, it automatically repairs the corruption using a clean copy of the page from one of the replicas. (After all, I didn’t use a hex editor on the secondary – only on the primary’s data file, so the secondaries still had a clean copy.)
After running the first backup (compressed, but no checksum), I queried sys.dm_hadr_auto_page_repair, the DMV that returns a row for every corruption repair attempt. The DMV held no data – because a backup without checksum doesn’t actually detect corruption.
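(The check itself is nothing fancy – just select straight from the DMV:)

SELECT database_id, file_id, page_id, error_type, page_status, modification_time
FROM sys.dm_hadr_auto_page_repair;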
After running the second backup (compressed, with checksum), I queried sys.dm_hadr_auto_page_repair again, and this time it successfully showed a row indicating which page had been detected as corrupt. However, the backup still failed – but why?
The clue is in the Books Online page for sys.dm_hadr_auto_page_repair – specifically, the page_status field’s possible values:
The status of the page-repair attempt:
2 = Queued for request from partner.
3 = Request sent to partner.
4 = Queued for automatic page repair (response received from partner).
5 = Automatic page repair succeeded and the page should be usable.
When I first queried the DMV, the page’s status was 3 – request sent to partner. My primary had asked for a clean copy of the page, but because my lab hardware is underpowered, it took several seconds for repair to complete. After it completed, I ran the backup again – and it completed without error.
A few things to take away here:
- Automatic page repair is automatic, but it’s not instant. When you’ve got corruption, a query (or backup) can fail due to corruption, and then magically succeed a few seconds later.
- Unless you're doing daily DBCCs (and you're not), use the WITH CHECKSUM parameter on your backups as long as you can stand the performance hit. Just doing compression alone isn't enough.
- No, I can’t tell you what the performance hit will be on your system. Stop reading blogs and start doing some experimenting on your own.
SQL Server has this fancy feature called constraints. Database constraints are used to restrict the domain (the range of allowed values) of a given attribute. That’s just a funny way of saying: through a set of carefully crafted rules, we control the shape of our universe.
Our Test Table
We're going to be testing with the following table:
CREATE TABLE sandwiches ( id INT IDENTITY(1,1), title VARCHAR(60), price MONEY );
The Unique Constraint
A unique constraint guarantees that one and only one row in the table can have a specific value. If our application requires that there be only one sandwich called "The Super Big Elvis", we need some way to make sure that the database can't contain two "The Super Big Elvis" sandwiches.
A naive approach would involve application code checking the database for the existence of data. This approach has several problems:
- Every application that connects to the database must either duplicate this logic or else use a centralized service to check.
- There’s no guarantee that another application won’t check for data and save faster.
Instead of requiring our applications to handle this integrity, we can push the uniqueness requirement down to the database using a unique constraint:
ALTER TABLE sandwiches ADD CONSTRAINT UQ_sandwiches UNIQUE ( title );
There's one important gotcha with SQL Server: only one NULL value is allowed per column (or set of columns) in the unique constraint. If you had a table with two columns, a and b, there are only three possibilities for rows that involve NULL: (NULL, NULL), (a, NULL), and (NULL, b).
If you need a uniqueness constraint to ignore NULL values, then you'll need to create a unique filtered index like this:
CREATE UNIQUE INDEX UX_customers_full_name ON customers(first_name, last_name) WHERE first_name IS NOT NULL AND last_name IS NOT NULL;
Now we can at least make sure that the uniqueness applies to people with both a first name and last name.
The Primary Key Constraint
A primary key is a logical construct – it’s a set of attributes that cannot be duplicated by any other row in the table. The primary key doesn’t have to be a single column, it can be a composite key – as long as the columns uniquely identify a single row it’s a valid primary key.
Don’t confuse a primary key with a clustered index, though. In SQL Server, the default behavior is to create a primary key as a clustered index – the clustered index defines the physical order of data in the table. This has led to a slew of iffy advice like “never use natural primary keys” or “don’t use GUIDs as primary keys”.
When thinking about primary keys as constraints, concern yourself first with identifying the unique characteristics of the rows for your application. Follow that up with physical optimization decisions once you’re ready to implement the physical model.
ALTER TABLE sandwiches ADD CONSTRAINT pk_sandwiches PRIMARY KEY (id);
Aww yeah, we've got a primary key. The default behavior for SQL Server is to create a clustered index under the PK if one doesn't exist already. If you need to use a different index as the clustered index (e.g. the PK is a GUID and the clustered index is an integer), just make sure that you create the clustered index before you create the PK, or that you specify the primary key as a non-clustered index:
ALTER TABLE sandwiches DROP CONSTRAINT pk_sandwiches;
ALTER TABLE sandwiches ADD unique_id UNIQUEIDENTIFIER DEFAULT(NEWID());
GO
CREATE UNIQUE CLUSTERED INDEX CX_sandwiches ON sandwiches(unique_id);
ALTER TABLE sandwiches ADD CONSTRAINT pk_sandwiches PRIMARY KEY NONCLUSTERED (id);
What do PKs buy us? A primary key gets you the same thing as a unique constraint with the added benefit of saying, “This is the primary identifier for this row. Other attribute combinations may be unique, but this combination of attributes identifies this entity in the database.”
The Check Constraint
Check constraints give you more flexibility than the other forms of constraints. Both unique constraints and primary key constraints operate on a list of columns that we supply – they are limited to simple combinations of columns. Check constraints, however, give you something more.
On the surface, a check constraint is just as simple as any other constraint – the statement in the check constraint has to evaluate to true. Seems simple, right?
The upside of this approach is that “any statement” means that any valid T-SQL statement can be part of the check constraint… so long as the T-SQL statement doesn’t rely on access to another table. So, we can do something like this:
CREATE FUNCTION fn_check_sandwich_name ( @sandwich_name VARCHAR(60) )
RETURNS BIT
AS
BEGIN
    IF LTRIM(RTRIM(@sandwich_name)) = 'Big Mac'
        RETURN 0;
    RETURN 1;
END;
We want to avoid lawsuits with a certain fast food chain, so this function should keep us safe. In order to use the function as a check constraint, we can just do the following:
ALTER TABLE sandwiches WITH CHECK ADD CONSTRAINT CK_sandwich_name CHECK (dbo.fn_check_sandwich_name(title) = 1);
Let’s try it out:
INSERT INTO sandwiches (title, price) VALUES ('The Big Ozar', 12.88);
INSERT INTO sandwiches (title, price) VALUES ('Big Mac', 2.99);
The first row will be successfully inserted into the sandwiches table. The second insert attempt will fail – the title matches the failure condition in the function.
There's at least one scenario that this doesn't catch: NULLs. With the code in its current state, we can insert NULL in the title column of the sandwiches table:
INSERT INTO sandwiches (title, price) VALUES (NULL, 9999.99);
Functions used as check constraints should explicitly check for NULL inputs; without taking NULL into account, it's possible to create a check constraint that will allow incorrect data to end up in a table. Although these data rules should be pushed down to the underlying schema, where it's easy to declare a column as NOT NULL, any check constraints that operate on incoming data need to make sure they account for NULL.
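A NULL-aware version of the function might look like this – a sketch with a hypothetical name, and you'd still want the column itself declared NOT NULL in the schema:

CREATE FUNCTION fn_check_sandwich_name_safe ( @sandwich_name VARCHAR(60) )
RETURNS BIT
AS
BEGIN
    /* Treat NULL as a failure so the constraint can't be sidestepped */
    IF @sandwich_name IS NULL
        RETURN 0;
    IF LTRIM(RTRIM(@sandwich_name)) = 'Big Mac'
        RETURN 0;
    RETURN 1;
END;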
About That World…
Using a combination of table constraints, we can build a complete understanding of the totality of the data in the database without having to execute a single query. Constraints let us push universal rules about the total state of data down into the database where all applications can take advantage of them and where those rules only need to be maintained in one place.
At SQL Rally Amsterdam, I sat down with Matan Yungman for a half-hour interview for the SQL Server Radio Podcast. The podcast is usually in Hebrew, but Matan was nice enough to let me stick with English, heh.
Matan and I had a great time. He came with a list of really good questions, and he even threw some of our own interview questions back at us, hahaha.
During the show, he also interviewed Adam Machanic and Denny Cherry. I think they did it in Hebrew, but I’m still waiting for the download to finish to find out for sure. (I do know that my interview went in first though – I started listening to it and the buffering was just a little on the slow side here in the US.)
You can listen to the podcast here.
As a DBA, you constantly seek ways to optimize database performance. One area worth understanding is TempDB, a widely misunderstood system database that has a critical impact on how your SQL Server performs. Learn more in this 20-minute video that covers what TempDB is for, how SQL Server uses it, how best to set it up, and how to tell if it's a performance problem:
You can learn more on our TempDB resources page.
It seems like a stretch. After all, plumbers get called when sinks are running backwards and when toilets and drains are clogged. But, really, the users and developers do the same thing with DBAs that you do with a plumber – when something is broken, we get the call.
Unclog Your Database Fast
The main thing that plumbers do is fix immediate problems. When this is your job, you have to focus on diagnosing the problem, understanding the ramifications of different solutions, and coming up with the right solution. In databases, as in plumbing, we’re almost never called on to fix problems when we can do this at our leisure – there’s a mess that needs to be cleaned up quickly.
The first thing that we need is a set of tools. Don't go scrolling down looking for scripts – you won't find any. The first tool that you need is the ability to listen.
You might recall when Brent had to call a plumber to fix his garbage disposal. Based on Brent’s description, the plumber had a good idea of the problem and knew that there was a simple solution: just give the disposal a good crank. Afterwards, the plumber gave Brent advice on how to keep the problem from happening again. Brent happily wrote a check and the plumber made $1000 an hour for those 10 minutes. The plumber was able to use his listening skills to quickly diagnose a problem, pick the right tool, solve the problem, and leave the customer with tools to prevent the problem in the future. As a plus, by listening, he knew that he didn’t have to bring heavy tools upstairs, just a tiny wrenchette.
As a DBA, you need a similar set of tools – when there’s a problem you need to be able to understand what the problem is, come up with a solution, implement the solution, and watch to make sure things are working correctly. But you can’t just focus on the immediate problem; take some time after you’ve solved the immediate problem to focus on the system as a whole.
The Bigger Picture
While you’re busy unclogging an index, have you ever found yourself saying “You know, if this database design was different, this wouldn’t be an issue”? You’re not alone – your plumber wants your kids to stop flushing toys down the toilet. There’s always something else that needs to be fixed; whether it’s another quick fix or a larger architectural change.
Plumbers also have a different job: sometimes a plumber gets handed a set of plans for a building and is told to make it work. It doesn’t matter that the right parts may not be on hand, there’s a deadline. Walls need to be put in place and if the plumber isn’t done on time, the entire schedule is going to slip. In these situations, the plumber has to deal with the constraints of reality. There are some things that you can do without all of the parts in place – you can put plumbing in place for a washer and dryer without the appliances being on site. But you can’t put the washer and dryer in place and expect them to work without any plumbing.
When you're designing a new system from the ground up, you don't have the luxury of taking all the time in the world to build things The Right Way™. You focus on filling the customer's specs with the equipment that you have on hand. Don't have a particular type of pipe fitting? You'll either have to wait or devise a solution. This is how things end up different in the real world from the way they were devised on paper – implementation details are frequently more than just minor details. Those database architects who really screwed things up? They weren't incompetent, they just had a deadline.
You call your plumber to fix clogs that are happening right now. Likewise, management usually brings you in because there's a clogged database. You're there to fix a problem (unclog the database) and get back to your regularly scheduled day. It's tempting to bring up all of the design flaws but, unless you're asked, you probably shouldn't. Write them down and keep a list of database design issues in your wish list. You'll feel better about it.
While you’re writing down all the big problems that you noticed in the database, make sure you write down how this problem could have been prevented. If it’s conducting stress tests to find missing indexes, talk to the development team about the tools they can use. If the developers need some query tuning help, get in there and help them with their query writing skills.
Fixing the Problem with Great Service
Users, management, and developers don’t want to know what they’re doing wrong. They want to know how it can be made right. Sure, there are problems; nobody is happy about it, but the users know that everything can’t be made perfect.
When you help the users make things better, show them the problem. Take the time to explain why it’s a problem – “the dimensions of this toy don’t mesh with the dimensions of your plumbing, it’s only natural that it get stuck”. Then explain how the problem can be prevented going forward.
Great plumbers don’t just explain how to solve a problem, they deliver great customer service while they’re doing it. A great plumber doesn’t make you feel like an idiot because your kids decide to have a toy soldier flushing contest. A great plumber fixes the problem, tells you it happens to everyone, and makes a joke with your kids about how flushing toys is how he became a plumber.
What About the Long Term Problems?
Every time you look at a system, write down all of the long-term fixes that should be put in place. Keep a list of them around and build out a plan of attack for the best way to fix them with minimal disruption. If you do a good job of solving the immediate problems, there's a good chance that someone is going to bring you back to fix problems in the future.
Don’t think you need to be silent, though. Tell the users about things you can easily fix right now. If it isn’t much effort to add an extra index during an outage window, suggest the index and go on your merry way. The last time I had a plumber over, he suggested that I throw ice in the disposal once a week in order to keep the blades sharp and to free up any gunk that was floating around in there. Did he try to sell me another disposal? No. Is that plumber going to get my business the next time there’s a problem? You bet.
You’re a plumber. Be happy that you’re unclogging a drain today. Solve the problem you’re brought in to solve and there’s every chance you’ll get to implement the plumbing the next time around.