I love living in the city
Blog posts about people’s favorite data sets seem to be popular these days, so I’m throwing my hat in the ring.
NYC has been collecting all sorts of data from all sorts of sources. There’s some really interesting stuff in here.
Another personal favorite of mine is MTA turnstile data. If you’re a developer looking to hone your ETL skills, this is a great dataset, because it’s kind of a mess. I actually had to use PowerShell to fix inconsistencies with the older text files, which I’m still recovering from. I won’t spoil all the surprises for you.
Of course, there’s Stack Overflow.
You can’t go wrong with data from either of these sources. They’re pretty big. The main problem I have with Adventure Works is that it’s a really small database. It really doesn’t mimic the large databases that people deal with in the real world, unless you run a script to make it bigger. The other problem with Adventure Works is that it went out of business a decade ago because no one wanted to buy yellow bikes. I’ve been learning a bit about Oracle, and their sample data sets are even smaller. If anyone knows of better ones, leave a comment.
Anyway, get downloading! Just don’t ask me about SSIS imports. I still haven’t opened it.
Thanks for reading!
Here’s a handy flowchart for whenever you find yourself worrying about fragmentation:
*Talk to your doctor about whether fragmentation is a problem for you. Seek help for index maintenance lasting longer than four hours. Serious side effects may occur if all index maintenance is ignored completely.
Brent says: pop quiz: what are you doing more often, rebuilding indexes or running DBCC CHECKDB?
This year, we’re trying something new: every Monday, we’re going to show you one module out of our online training classes.
It’s the whole module, not a sneak peek, and it’s available the whole week. But you’d better not put it off – this post will disappear the following week, and get replaced by a new post.
This week, it’s Jeremiah Peschka explaining the different kinds of joins in SQL Server execution plans:
UPDATE 1/30 – Expired!
(If you don’t see the video above, you’re reading this somewhere that doesn’t support video embedding. You’ll need to read the post on our blog.)
Wanna see more? Check out our How to Read Execution Plans course. We cover what an execution plan is, how to read them, predicates, aggregates, grouping, windowing, parallelism, the plan cache, and common problems in execution plans. For $299, you get 18 months of access to stream it on your desktop, laptop, tablet, or phone.
The short version: configure one volume/drive for TempDB. Divide the total space by 9, and that’s your size number. Create 8 equally sized data files and one log file, each that size. Presto, the drive is full and your TempDB is configured for easy performance.
The long version is a little more complicated.
If you’re on SQL Server 2012, get on SP1 Cumulative Update 10 or newer. Microsoft made performance improvements to SQL Server 2014 so that it writes to TempDB less frequently. Because you were so good this year, they even backported it to SQL Server 2012, too. Get on this level – and by this level, I mean the latest updates shown on SQLServerUpdates.com.
Create one volume/drive/array for TempDB. You want this to be a separate volume from your user database & log files – not just for performance reasons, but also for logistical management reasons. Any user can dump a ton of stuff into TempDB without your control, and they can even run TempDB out of space. If TempDB lives on the same volume as your user data and log files, and the drive runs out of space, you can have a tough time restarting SQL Server. However, if TempDB is on its own volume that runs out of space, no big deal – just restart the instance and you’re back in business.
If you’re using a SAN-backed cluster, use local SSDs for TempDB. Since SQL Server 2012, Microsoft has fully supported using local solid state drives even in a cluster. This has three benefits: the local SSD is often cheaper, it’s often faster, and it keeps your SAN path traffic freed up for your valuable user data file & log file reads and writes.
Create 8 equally sized data files. This helps you avoid the GAM/SGAM/PFS page contention issues described in Microsoft KB 2154845. In your journey through the blogosphere, you may have read that at one point you were supposed to create one data file per core. That’s no longer the case – just start with 8, and only add more when you’re facing the page contention issues described in that KB article.
Size them (and the log file) to fill up the drive. If the drive’s dedicated to TempDB anyway, why leave free space? You don’t want SQL Server pausing user activity to stop to grow the log file out, and you don’t want to have to hassle with possibly uneven data file growth sizes, and you haven’t got time for the pain. Just set it up and walk away with confidence.
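Put together, the whole setup might look something like this sketch. The T: drive letter, the 10GB file sizes, and the file names are just examples for illustration – size yours to fill your own dedicated drive, and note that FILEGROWTH = 0 is what keeps SQL Server from pausing to grow the files later:

```sql
-- Resize the two files that ship with SQL Server, then add data files 2-8.
-- T:\, the 10GB sizes, and the logical file names are example values.
ALTER DATABASE tempdb MODIFY FILE
    (NAME = tempdev, FILENAME = 'T:\tempdb.mdf', SIZE = 10GB, FILEGROWTH = 0);
ALTER DATABASE tempdb MODIFY FILE
    (NAME = templog, FILENAME = 'T:\templog.ldf', SIZE = 10GB, FILEGROWTH = 0);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 10GB, FILEGROWTH = 0);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 10GB, FILEGROWTH = 0);
-- ...repeat the ADD FILE statement for tempdev4 through tempdev8...
```

Restart the instance afterward so the moved files take effect on the new volume.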
And while you’re setting up the server, check out our free SQL Server Setup Checklist.
Unless you’re on SQL 2000, don’t worry about scheduling log backups during full backups
Log backups during full backups won’t truncate the transaction log
You want to keep taking log backups in case your full backup fails
The first time I ever set up backups
Was, unfortunately, using a maintenance plan. All of the databases were in simple recovery. It was largely used for staging and tenderizing data during ETL. No log backups need apply.
Fast forward a bit, and I’m setting up backups for a server where losing 30 minutes of data could set a project back several hours. We’re now hiring log backups.
So there I was, dutifully creating extravagant maintenance plans, pondering the miracle of the differential backup, and the grand eloquence of log backups. They were running every 10 minutes, those log backups.
Every 10 minutes.
Even during full backups.
I’m a developer and what is this?
- What is SQL going to do with those?
- Do I have to restore them?
- Should I pause log backups during full backups?
- Will this break something?
- How much Laphroaig do I have to drink to forget I thought about this?
This was confounding to me. So I did some digging. Back in the SQL 2000 days, this could have gotten weird. But I was, thankfully(?) on 2005. Heh. Yeah. 2005. I know.
After 2005, you can totally, absolutely, 100% take log backups during full backups. You can take them and nothing will break and you won’t have to restore them (unless your full backup fails).
Wanna see a demo?
Of course you wanna see a demo. This one is a little more complicated than usual. It will require several-SSMS-tab technology.
ALTER DATABASE [StackOverflow] SET RECOVERY FULL
GO

BACKUP DATABASE [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow.bak'
WITH CHECKSUM, COMPRESSION, STATS = 1
I usually keep my SO database in simple, because I do horrible things that I don’t want to fully log. Fun fact: if you switch from simple to full recovery model and don’t take a full backup, you’re basically still in simple recovery. You should think about that for a minute and then take steps to avoid getting fired. Backing up SO takes me about 3.5-4 minutes.
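One way to check for that "pseudo-simple" state, for what it's worth: sys.database_recovery_status exposes a last_log_backup_lsn column, and if it's NULL, the log backup chain hasn't started yet and the database is still behaving as if it were in simple recovery. A sketch:

```sql
-- A NULL last_log_backup_lsn means no log chain has started yet
-- ("pseudo-simple"), even if recovery_model_desc says FULL.
SELECT [d].[name],
       [d].[recovery_model_desc],
       [drs].[last_log_backup_lsn]
FROM [sys].[databases] AS [d]
JOIN [sys].[database_recovery_status] AS [drs]
    ON [drs].[database_id] = [d].[database_id]
WHERE [d].[name] = 'StackOverflow';
```

Take a full backup and the column gets populated.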
In another window, kick off some log backups 1 minute apart. Note that doing this won’t overwrite log backups; it will stack them all within a single file.
You can verify this behavior by running the RESTORE HEADERONLY command at the end of this block. If you want to restore a particular backup out of a file with multiple backups in it, you use the position column and specify it with FILE = [n], which you can read more about here.
BACKUP LOG [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
WITH COMPRESSION, CHECKSUM, INIT, FORMAT

WAITFOR DELAY '00:01:00.000'

BACKUP LOG [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
WITH COMPRESSION, CHECKSUM

WAITFOR DELAY '00:01:00.000'

BACKUP LOG [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
WITH COMPRESSION, CHECKSUM

WAITFOR DELAY '00:01:00.000'

BACKUP LOG [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
WITH COMPRESSION, CHECKSUM

WAITFOR DELAY '00:01:00.000'

BACKUP LOG [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
WITH COMPRESSION, CHECKSUM

WAITFOR DELAY '00:01:00.000'

BACKUP LOG [StackOverflow]
TO DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
WITH COMPRESSION, CHECKSUM

RESTORE HEADERONLY
FROM DISK = 'D:\Backup\Crap\StackOverflow\StackOverflow_log.trn'
Just so you know I’m not pulling any shenanigans, let’s generate some log activity. This will dump 100 rows into a dummy table every 30 seconds. It is neither pretty nor elegant.
USE [StackOverflow]
GO

IF OBJECT_ID('dbo.YouBigDummy') IS NOT NULL
    DROP TABLE [dbo].[YouBigDummy]

CREATE TABLE [dbo].[YouBigDummy]
(
    ID INT IDENTITY(1,1),
    [Name] VARCHAR(50) DEFAULT 'You',
    CreateDate DATETIME DEFAULT GETUTCDATE()
)
GO

INSERT dbo.[YouBigDummy] ( [Name], [CreateDate] )
DEFAULT VALUES
GO 100
WAITFOR DELAY '00:00:30.000'
GO
INSERT dbo.[YouBigDummy] ( [Name], [CreateDate] )
DEFAULT VALUES
GO 100
WAITFOR DELAY '00:00:30.000'
GO
INSERT dbo.[YouBigDummy] ( [Name], [CreateDate] )
DEFAULT VALUES
GO 100
WAITFOR DELAY '00:00:30.000'
GO
INSERT dbo.[YouBigDummy] ( [Name], [CreateDate] )
DEFAULT VALUES
GO 100
WAITFOR DELAY '00:00:30.000'
GO
INSERT dbo.[YouBigDummy] ( [Name], [CreateDate] )
DEFAULT VALUES
GO 100
WAITFOR DELAY '00:00:30.000'
GO
INSERT dbo.[YouBigDummy] ( [Name], [CreateDate] )
DEFAULT VALUES
GO 100
WAITFOR DELAY '00:00:30.000'
GO

TRUNCATE TABLE [dbo].[YouBigDummy]
When that’s all done, you can run something like this to see what happened. You’ll probably have to replace the date. I wrote this, like, two or three weeks ago by now.
SELECT [b].[database_name],
       [b].[backup_start_date],
       [b].[backup_finish_date],
       [b].[type],
       [b].[first_lsn],
       [b].[last_lsn],
       [b].[checkpoint_lsn],
       [b].[database_backup_lsn]
FROM [msdb].[dbo].[backupset] AS [b]
WHERE [b].[database_name] = 'StackOverflow'
AND [b].[backup_start_date] >= '2015-12-02 17:30:00.000'
ORDER BY [b].[backup_start_date];
When you take a full backup, the first thing it does is issue a checkpoint. That’s why the full and all subsequent log backups have the same checkpoint LSN. The first four log backups all have the same database backup LSN because they occurred during the full backup. That doesn’t change until the full is done.
For toots and snickers, I ran this all a second time, and cancelled the full backup halfway through. The full backup issued a new checkpoint, so the checkpoint LSN changes, but the database backup LSN never changes, because it got canceled. That means taking log backups during full backups is totally useful. If your full backup fails for whatever reason, these things keep the chain alive.
If the third time is to be a charm, and it is, the same thing occurs as the first run. New checkpoint LSN, and the database backup LSN runs through until the backup finishes. You can verify that by looking at the start and end times columns.
If you still don’t believe me
Just look at sys.databases while you’re running a full backup.
SELECT [name],
       [log_reuse_wait],
       [log_reuse_wait_desc]
FROM [sys].[databases]
WHERE [name] = 'StackOverflow';
The result is acknowledged, though not documented, here. This does indeed mean that log truncation will not occur during a full backup even if you take log backups. It will happen when you take the first log backup after the full finishes. You may want to consider this when scheduling maintenance items that may hog up log space alongside full backups.
Backups are beautiful things. You should take full ones, and probably differential ones, and if you’re in full recovery model, definitely log ones. How often you take them is up to you and your boss. Or maybe their boss. But it’s definitely not up to you. Unless you’re your boss.
Log backups during full backups won’t hurt anything, and may end up helping things if your full backup fails, and you need to restore something.
Log backups during full backups will not truncate the log. That has to wait until the first log backup after the full finishes.
HW – if all you do is rack & stack hardware, you can manage thousands of boxes. You see this at Facebook/Google type scales.
OS – when you’re responsible for troubleshooting operating system problems (as opposed to simply re-imaging the box from scratch), you can manage hundreds of boxes or VMs.
SQL – if you have to install, configure, and troubleshoot standalone instances of SQL Server, you can manage 50-100 instances.
HA/DR – when you have to design, build, and troubleshoot high availability features like clustering, Availability Groups, replication, etc, then you can manage 10-50 SQL instances. (Here, it’s important to know that we’re talking about boxes critical enough to have a high availability solution.)
PERF – if you have to tune queries and indexes for performance, you can do a great job on 1-5 instances of SQL Server. Doing a good job of performance tuning means understanding the underlying data model of the databases you’re tuning, plus having a good relationship with the application developers.
These numbers aren’t absolute: tools like training and automation can raise them, and a lack of those tools can make matters worse.
Brent says: remember a couple months back when we announced that we were hiring a SQL Server Triage Specialist? We’re really proud to welcome Angie Walker to our funky little medical team. When you bring us in for a SQL Critical Care®, Angie might be the first person you work with to make sure the server’s in a safe place. Let’s hear how she got here….
Angie says: WHERE TO START…?
As with many IT professionals I’ve met, I didn’t intend to find a career in IT, but when an opportunity came along a few years back, I jumped at the chance. After a couple of years, I moved to a Junior DBA position and I’ve been learning SQL Server ever since. When I first started, I couldn’t tell you what a database was. I remember the first time I forgot the WHERE clause on an UPDATE statement, blindly deploying code into production, and asking why we take backups.
The first session of the first SQL Saturday I attended was so packed, I sat on the floor, and watched Kendra present “How to Find Bad Queries, Fast”. I didn’t know what she was talking about half of the time (What’s Perfmon? Activity Monitor is the graphs, right? She has pink hair!), but I hoped that if I took enough notes, I might be able to absorb some of what I heard. And I keep going to SQL Saturday and SQL user groups for that exact reason. I might not understand today, but one day it might click when I see the issue or feature firsthand.
The world of SQL Server is vast, murky, and sometimes scares me, but knowing where to look for help has been vital. I’ve been following Brent Ozar Unlimited for almost three years; signing up for the Accidental DBA 6 Month Training Plan, taking advantage of Kendra’s “DBA Job Interview Question and Answer Kit” for my first DBA interview, trying (and failing) to score killer Black Friday deals last year, and of course, following the blog every day.
So when I saw the open Triage Specialist position, I knew I HAD to apply. The morning I woke up, checked my email, and saw Kendra Little was emailing me, I was bouncing around as if I’d drunk three cups of coffee. I mean, I read the blog and use the tools, and even though I know they’re real people, the possibility of working alongside some of the most well-known folks in the business seemed too good to be true! When I met with Kendra she was so nice and I thought it went really well, but when she said it would take about two weeks to hear back, I had to stop myself from thinking that she was trying to let me down easy. I didn’t know just how badly I wanted the job until I got the invite for a second interview. It didn’t matter that I didn’t know everything about SQL Server – I could learn it, and who better to learn from than Brent Ozar Unlimited?! So I prepared myself for the interview with Kendra, Jeremiah, and Brent… And I must have done something right, because here I am!
I’m super excited for this new adventure as the newest member of the Brent Ozar Unlimited team!
SQL Server 2012 SP3 has just been released! Not only does this release contain bug fixes, it also contains a slew of performance tuning features.
Many of these features are about memory grants. This is NOT the memory used in the buffer pool/ data cache by the query — it’s the memory also known as Query Workspace Memory. Start reading more here.
Check these out:
KB 3107398 – Improvements to sys.dm_exec_query_stats
This is kind of a big deal. The sys.dm_exec_query_stats DMV will now report total, last, min, and max statistics for granted workspace memory, ideal workspace memory, degree of parallelism, and reserved threads. These columns are in SQL Server 2016 CTP3, but it’s really awesome that they’ve been backported down to 2012!
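Once you’re on the update, a query along these lines surfaces the new columns. This is just a sketch – I’m using the column names as they appear in SQL Server 2016 CTP3:

```sql
-- Top memory-grant consumers, using the columns added in 2012 SP3.
SELECT TOP (10)
       [qs].[max_grant_kb],
       [qs].[max_ideal_grant_kb],
       [qs].[max_dop],
       [qs].[max_reserved_threads],
       [qs].[execution_count],
       [qt].[text]
FROM [sys].[dm_exec_query_stats] AS [qs]
CROSS APPLY [sys].[dm_exec_sql_text]([qs].[sql_handle]) AS [qt]
ORDER BY [qs].[max_grant_kb] DESC;
```

A big gap between max_ideal_grant_kb and max_grant_kb is a query worth a closer look.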
KB 3107397 – “Actual Rows Read” Added to Query Execution Plans
This is really exciting! It’s hard for people to read execution plans. You may have a very large index seek or scan that reads a ton of rows, and it has a hidden filter in the seek or scan. But it may be hard to diagnose because you only see the number of rows to come OUT of the seek or scan operator. This lets you know not only the rows that exit the operator, but how many it had to consume, too. It’s sorta like having a little bit of STATISTICS IO in your execution plan!
KB 3107401 – New Query Hints for min_grant_percent and max_grant_percent
Previously, if you identified that a query was asking for a problematic memory grant, your options were pretty limited – code changes, index changes, Resource Governor, or server-wide settings. These hints let you tune this for a single query. Check out the KB – there’s a safety net built into the feature: if the minimum required grant is higher than max_grant_percent, the query will still get the minimum.
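For example, capping a single query’s grant might look like this sketch. The 10 percent cap is an arbitrary example value – test against your own workload before trusting it:

```sql
-- Cap this query's workspace memory grant at 10% of the maximum
-- available grant. The 10 here is just an example number.
SELECT [o].[name], COUNT(*) AS [rows]
FROM [sys].[objects] AS [o]
GROUP BY [o].[name]
OPTION (MAX_GRANT_PERCENT = 10);
```

MIN_GRANT_PERCENT works the same way from the other direction, for queries that chronically under-ask and spill.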
KB 3107172 – hash_spill_details Added to Extended Events to Diagnose tempdb Spills, Improvements to sort_warning and hash_warning
Ever seen sort or hash spill warnings in execution plans and wondered how much of a difference they were making to the runtime of the plan? That may be less of a mystery with these improvements.
KB 3107173 – query_memory_grant_usage added to Extended Events
This XEvent allows tracing for the ideal memory grant for queries, what they actually got, and the degree of parallelism they used.
What About SQL Server 2014?
I don’t believe any of these are currently available for SQL Server 2014, but looking at past release patterns it’s likely they’ll be available soon.
PSA: Don’t Forget to Test Your Service Packs
Even the best laid plans can sometimes go awry.
We sometimes help clients interview DBAs and Developers for their team. Requirements vary by position, but sometimes it’s important to test a candidate’s TSQL skills.
While I can test candidates thoroughly on strategies for TSQL (when they’d use certain constructs, the pros and cons of various approaches), if what you care about is skills writing complex TSQL there’s one surefire test: a lab.
The first step is to pick your dataset and environment, then dream up the TSQL query you’d like your candidate to write. Next, test it with one of your own staff members and make sure the problem is suitable for the job you’re trying to fill.
But you may not want to give candidates access to your production, or even development data — after all, they don’t work for you yet.
Good news: there’s an easy way to do this online! Check out the Stack Exchange Data Explorer. You can assign the candidate a query to write right in the tool.
How to make it work:
- This only works for in-person interviews. Because human nature.
- You’re going to need a moderator and at least 30 minutes, because the candidate will need to figure out the schema.
- A history of queries is kept in the tool. You’ll need to let them know they can’t search that history, and it’s up to you if you’re going to let them use Books Online (hey, we all use it in the real world).
If you don’t want to deal with the hassle of the query history online, you can always download a copy of the Stack Overflow database and set up your problem on a locally installed test instance of SQL Server, too.
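To give a feel for the level of challenge that works well, here’s a hypothetical assignment against the Stack Exchange Data Explorer schema – Posts and Users are real tables there, but the specific question is made up: “show me the ten users with the highest total answer score.” A candidate’s answer might look something like:

```sql
-- Hypothetical interview challenge: top 10 answerers by total score.
-- In the Stack Exchange schema, Posts.PostTypeId = 2 means "answer".
SELECT TOP (10)
       [u].[DisplayName],
       SUM([p].[Score]) AS [TotalAnswerScore],
       COUNT(*) AS [Answers]
FROM [dbo].[Posts] AS [p]
JOIN [dbo].[Users] AS [u]
    ON [u].[Id] = [p].[OwnerUserId]
WHERE [p].[PostTypeId] = 2
GROUP BY [u].[DisplayName]
ORDER BY [TotalAnswerScore] DESC;
```

Watching how a candidate discovers the PostTypeId values and the join keys on their own is half the value of the exercise.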
The secret test: if your candidate likes writing TSQL, they’re going to love the challenge and it will energize them.
Query Store is so cool
Billed as a flight data recorder for SQL Server, the Query Store is a repository of execution plan information, like the plan cache, except a bit more actionable. And it has a GUI.
You can read all about what it does and what you can do with it around the internet. You can be suitably impressed and enchanted by the promise of data that’s persisted between restarts, being able to quickly and easily address plan regression issues, and so forth.
I’m here to bum you out when you’re done with all that.
You can’t use it in tempdb
On tempdb? Whatever.
What? Really? With all the stuff that goes on there? Aside from the many selfish demo reasons I have for wanting it, I know lots of people who make tempdb their default database for user logins so they don’t go messing things up in the real world, or more fragile system databases. I’m also aware of several vendor apps that dump users into tempdb and then use fully qualified names to point them to application data.
I found this all out by accident. I had just nabbed CTP 3 of SQL Server 2016, and was settling in for an evening of staring contests with execution plans and new features.
tragedy struck, in the form of an error message:
ALTER DATABASE tempdb SET QUERY_STORE = ON;

Msg 12438, Level 16, State 1, Line 1
Cannot perform action because Query Store cannot be enabled on system database tempdb.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.
And yeah, it’s an official no-go
I’m all disillusioned and stuff, man.
This is going to be frustrating out in the wild for a lot of people. Not just fancy-pants camo shorts consultants who like to make demos where they treat tempdb like it owes them money. Real people. Who pay real money, for real servers.
Anyway, I started a Connect item to get this changed. Vote for it. Or don’t. Just don’t vote more than once, or if you’re dead.