The latest version of sp_Blitz™ alerts you if you haven’t been using the WITH CHECKSUM parameter on your backups. This parameter tells SQL Server to verify the checksum on each page as it’s backed up, and to throw an error if it finds corruption.
But what about corrupt backups? Books Online says:
NO_CHECKSUM - Explicitly disables the generation of backup checksums (and the validation of page checksums). This is the default behavior, except for a compressed backup.
CHECKSUM - Specifies that the backup operation will verify each page for checksum and torn page, if enabled and available, and generate a checksum for the entire backup. This is the default behavior for a compressed backup.
Hmmm, let’s see about that. In my SQL Server 2014 lab environment, I shut down my primary replica, then busted out the hex editor XVI32 to edit the data file by hand, thereby introducing some corruption on a clustered index.
After starting the replica up again, I ran a normal compressed backup:
BACKUP DATABASE [AdventureWorks2012] TO DISK = N'\\DC1\SQLCLUSTERA\MSSQL\Backup\AW20131202_248_NoChecksum' WITH NOFORMAT, INIT, NAME = N'AdventureWorks2012-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
The backup completed fine without errors – even though compressed backups are supposed to run WITH CHECKSUM by default.
Then I ran a compressed backup and manually specified the CHECKSUM parameter:
BACKUP DATABASE [AdventureWorks2012] TO DISK = N'\\DC1\SQLCLUSTERA\MSSQL\Backup\AW20131202_256_Checksum' WITH NOFORMAT, INIT, NAME = N'AdventureWorks2012-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10, CHECKSUM
That time, the backup stopped with an error:
10 percent processed.
Msg 3043, Level 16, State 1, Line 4
BACKUP 'AdventureWorks2012' detected an error on page (1:3578) in file '\\dc1\SQLClusterA\MSSQL\Data\AdventureWorks2012_Data.mdf'.
Msg 3013, Level 16, State 1, Line 4
BACKUP DATABASE is terminating abnormally.
Conclusion #1: Compressed backups don’t really check checksums by default, despite what Books Online claims. No idea if that’s a bug in the code or in the Books Online article.
But the plot thickens – this particular database is also part of an AlwaysOn Availability Group. One of the cool benefits of AGs (and also database mirroring) is that when one of the replicas encounters corruption, it automatically repairs the corruption using a clean copy of the page from one of the replicas. (After all, I didn’t use a hex editor on the secondary – only on the primary’s data file, so the secondaries still had a clean copy.)
After running the first backup (compressed, but no checksum), I queried sys.dm_hadr_auto_page_repair, the DMV that returns a row for every corruption repair attempt. The DMV held no data – because a backup without checksum doesn’t actually detect corruption.
After running the second backup (compressed, with checksum), I queried sys.dm_hadr_auto_page_repair again, and this time it successfully showed a row indicating which page had been detected as corrupt. However, the backup still failed – but why?
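If you want to watch this on your own servers, the DMV is straightforward to query. Here’s a minimal sketch – the DB_NAME() wrapper is just for readability:

SELECT DB_NAME(apr.database_id) AS database_name,
       apr.file_id,
       apr.page_id,
       apr.error_type,
       apr.page_status,
       apr.modification_time
FROM sys.dm_hadr_auto_page_repair AS apr;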
The clue is in the Books Online page for sys.dm_hadr_auto_page_repair – specifically, the page_status field’s possible values:
The status of the page-repair attempt:
- 2 = Queued for request from partner.
- 3 = Request sent to partner.
- 4 = Queued for automatic page repair (response received from partner).
- 5 = Automatic page repair succeeded and the page should be usable.
When I first queried the DMV, the page’s status was 3 – request sent to partner. My primary had asked for a clean copy of the page, but because my lab hardware is underpowered, it took several seconds for repair to complete. After it completed, I ran the backup again – and it completed without error.
A few things to take away here:
- Automatic page repair is automatic, but it’s not instant. When you’ve got corruption, a query (or backup) can fail, then magically succeed a few seconds later once the repair has completed.
- Unless you’re doing daily DBCCs (and you’re not), then as long as you can stand the performance hit, use the WITH CHECKSUM parameter on your backups. Just doing compression alone isn’t enough.
- No, I can’t tell you what the performance hit will be on your system. Stop reading blogs and start doing some experimenting on your own.
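If you want a quick starting point for that experiment, one throwaway approach is backing up to the NUL device so disk write speed doesn’t skew the comparison. This is a timing-test trick only – a NUL backup produces nothing restorable:

BACKUP DATABASE [AdventureWorks2012] TO DISK = N'NUL' WITH COMPRESSION, STATS = 10;
BACKUP DATABASE [AdventureWorks2012] TO DISK = N'NUL' WITH COMPRESSION, CHECKSUM, STATS = 10;

Compare the elapsed times each command reports and you’ve got a rough feel for the checksum overhead on your hardware.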
SQL Server has this fancy feature called constraints. Database constraints are used to restrict the domain (the range of allowed values) of a given attribute. That’s just a funny way of saying: through a set of carefully crafted rules, we control the shape of our universe.
Our Test Table
We’re going to be testing with the following table:
CREATE TABLE sandwiches ( id INT IDENTITY(1,1), title VARCHAR(60), price MONEY );
A unique constraint guarantees that one and only one row in the table can have a specific value. If our application requires there is only one sandwich called “The Super Big Elvis”, we need some way to make sure that the database can’t contain two “The Super Big Elvis” sandwiches.
A naive approach would involve application code checking the database for the existence of data. This approach has several problems:
- Every application that connects to the database must either duplicate this logic or else use a centralized service to check.
- There’s no guarantee that another application won’t run the same check and save its own row first – a classic race condition.
Instead of requiring our applications to handle this integrity, we can push the uniqueness requirement down to the database using a unique constraint:
ALTER TABLE sandwiches ADD CONSTRAINT UQ_sandwiches UNIQUE ( title );
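To see it in action, try inserting the same title twice (hypothetical prices included):

INSERT INTO sandwiches (title, price) VALUES ('The Super Big Elvis', 8.99);
INSERT INTO sandwiches (title, price) VALUES ('The Super Big Elvis', 9.99);

The first insert succeeds; the second fails with Msg 2627, a violation of the UQ_sandwiches unique constraint.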
There’s one important gotcha with SQL Server: only one NULL value is allowed per column (or set of columns) in the unique constraint. If you had a table with two nullable columns, a and b, there are only three possibilities for rows that involve NULL – (NULL, NULL), (a, NULL), and (NULL, b) – because SQL Server treats NULLs as equal when it checks uniqueness. If you need a uniqueness constraint to ignore NULL values, then you’ll need to create a unique filtered index like this:
CREATE UNIQUE INDEX UX_customers_full_name ON customers(first_name, last_name) WHERE first_name IS NOT NULL AND last_name IS NOT NULL;
Now we can at least make sure that the uniqueness applies to people with both a first name and last name.
The Primary Key Constraint
A primary key is a logical construct – it’s a set of attributes that cannot be duplicated by any other row in the table. The primary key doesn’t have to be a single column; it can be a composite key – as long as the columns uniquely identify a single row, it’s a valid primary key.
Don’t confuse a primary key with a clustered index, though. In SQL Server, the default behavior is to create a primary key as a clustered index – the clustered index defines the physical order of data in the table. This has led to a slew of iffy advice like “never use natural primary keys” or “don’t use GUIDs as primary keys”.
When thinking about primary keys as constraints, concern yourself first with identifying the unique characteristics of the rows for your application. Follow that up with physical optimization decisions once you’re ready to implement the physical model.
ALTER TABLE sandwiches ADD CONSTRAINT pk_sandwiches PRIMARY KEY (id);
Aww yeah, we’ve got a primary key. The default behavior for SQL Server is to create a clustered index under the PK if one doesn’t exist already. If you need to use a different index as the clustered index (e.g. the PK is a GUID and the clustered index is an integer), just make sure that you create the clustered index before you create the PK, or that you specify the primary key as a nonclustered index:
ALTER TABLE sandwiches DROP CONSTRAINT pk_sandwiches;
ALTER TABLE sandwiches ADD unique_id UNIQUEIDENTIFIER DEFAULT(NEWID());
CREATE UNIQUE CLUSTERED INDEX CX_sandwiches ON sandwiches(unique_id);
ALTER TABLE sandwiches ADD CONSTRAINT pk_sandwiches PRIMARY KEY NONCLUSTERED (id);
What do PKs buy us? A primary key gets you the same thing as a unique constraint with the added benefit of saying, “This is the primary identifier for this row. Other attribute combinations may be unique, but this combination of attributes identifies this entity in the database.”
Check constraints give you more flexibility than the other forms of constraints. Both unique constraints and primary key constraints operate on a list of columns that we supply – they are limited to simple combinations of columns. Check constraints, however, give you something more.
On the surface, a check constraint is just as simple as any other constraint – the expression in the check constraint has to evaluate to true. Seems simple, right?
The upside of this approach is that almost any valid T-SQL expression can be part of the check constraint… so long as it doesn’t rely on access to another table. So, we can do something like this:
CREATE FUNCTION fn_check_sandwich_name ( @sandwich_name VARCHAR(60) )
RETURNS BIT
AS
BEGIN
    IF LTRIM(RTRIM(@sandwich_name)) = 'Big Mac'
        RETURN 0;
    RETURN 1;
END;
We want to avoid lawsuits with a certain fast food chain, so this function should keep us safe. In order to use the function as a check constraint, we can just do the following:
ALTER TABLE sandwiches WITH CHECK ADD CONSTRAINT CK_sandwich_name CHECK (dbo.fn_check_sandwich_name(title) = 1);
Let’s try it out:
INSERT INTO sandwiches (title, price) VALUES ('The Big Ozar', 12.88);
INSERT INTO sandwiches (title, price) VALUES ('Big Mac', 2.99);
The first row will be successfully inserted into the sandwiches table. The second insert attempt will fail – the title matches the failure condition in the function.
There’s at least one scenario that this doesn’t catch: NULLs. With the code in its current state, we can insert NULL in the title column of the sandwiches table:
INSERT INTO sandwiches (title, price) VALUES (NULL, 9999.99);
Functions used as check constraints should explicitly check for NULL inputs; without taking NULL into account, it’s possible to create a check constraint that will allow incorrect data to end up in a table. Although these data rules should be pushed down to the underlying schema where it’s easy to declare a column as NOT NULL, any check constraints that operate on incoming data need to make sure they account for NULL.
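One way to close that gap at the constraint level – a sketch, assuming you’d rather have the function do the rejecting than declare the column NOT NULL – is to make the function return 0 for NULL input:

ALTER FUNCTION fn_check_sandwich_name ( @sandwich_name VARCHAR(60) )
RETURNS BIT
AS
BEGIN
    -- Without this check, LTRIM(RTRIM(NULL)) = 'Big Mac' evaluates to NULL,
    -- the IF is skipped, the function returns 1, and the row sails through.
    IF @sandwich_name IS NULL
        RETURN 0;
    IF LTRIM(RTRIM(@sandwich_name)) = 'Big Mac'
        RETURN 0;
    RETURN 1;
END;

With that change, the INSERT above fails the CK_sandwich_name check instead of sneaking into the table.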
About That World…
Using a combination of table constraints, we can build a complete understanding of the totality of the data in the database without having to execute a single query. Constraints let us push universal rules about the total state of data down into the database where all applications can take advantage of them and where those rules only need to be maintained in one place.
At SQL Rally Amsterdam, I sat down with Matan Yungman for a half-hour interview for the SQL Server Radio Podcast. The podcast is usually in Hebrew, but Matan was nice enough to let me stick with English, heh.
Matan and I had a great time. He came with a list of really good questions, and he even threw some of our own interview questions back at us, hahaha.
During the show, he also interviewed Adam Machanic and Denny Cherry. I think they did it in Hebrew, but I’m still waiting for the download to finish to find out for sure. (I do know that my interview went in first though – I started listening to it and the buffering was just a little on the slow side here in the US.)
You can listen to the podcast here.
As a DBA, you constantly seek ways to optimize database performance. One area worth a close look is TempDB, a widely misunderstood system database that has a critical impact on how your SQL Server performs. Learn more in this 20-minute video that covers what TempDB is for, how SQL Server uses it, how best to set it up, and how to tell if it’s a performance problem:
You can learn more on our TempDB resources page.
It seems like a stretch. After all, plumbers get called when sinks are running backwards and when toilets and drains are clogged. But, really, the users and developers do the same thing with DBAs that you do with a plumber – when something is broken, we get the call.
Unclog Your Database Fast
The main thing that plumbers do is fix immediate problems. When this is your job, you have to focus on diagnosing the problem, understanding the ramifications of different solutions, and coming up with the right solution. In databases, as in plumbing, we’re almost never called on to fix problems when we can do this at our leisure – there’s a mess that needs to be cleaned up quickly.
The first thing that we need is a set of tools. Don’t go scrolling down looking for scripts, you won’t find any. The first tool that you need is the ability to listen.
You might recall when Brent had to call a plumber to fix his garbage disposal. Based on Brent’s description, the plumber had a good idea of the problem and knew that there was a simple solution: just give the disposal a good crank. Afterwards, the plumber gave Brent advice on how to keep the problem from happening again. Brent happily wrote a check and the plumber made $1000 an hour for those 10 minutes. The plumber was able to use his listening skills to quickly diagnose a problem, pick the right tool, solve the problem, and leave the customer with tools to prevent the problem in the future. As a plus, by listening, he knew that he didn’t have to bring heavy tools upstairs, just a tiny wrenchette.
As a DBA, you need a similar set of tools – when there’s a problem you need to be able to understand what the problem is, come up with a solution, implement the solution, and watch to make sure things are working correctly. But you can’t just focus on the immediate problem; take some time after you’ve solved the immediate problem to focus on the system as a whole.
The Bigger Picture
While you’re busy unclogging an index, have you ever found yourself saying “You know, if this database design was different, this wouldn’t be an issue”? You’re not alone – your plumber wants your kids to stop flushing toys down the toilet. There’s always something else that needs to be fixed, whether it’s another quick fix or a larger architectural change.
Plumbers also have a different job: sometimes a plumber gets handed a set of plans for a building and is told to make it work. It doesn’t matter that the right parts may not be on hand, there’s a deadline. Walls need to be put in place and if the plumber isn’t done on time, the entire schedule is going to slip. In these situations, the plumber has to deal with the constraints of reality. There are some things that you can do without all of the parts in place – you can put plumbing in place for a washer and dryer without the appliances being on site. But you can’t put the washer and dryer in place and expect them to work without any plumbing.
When you’re designing a new system from the ground up, you don’t have the luxury of taking all the time in the world to build things The Right Way™. You focus on filling the customer’s specs with the equipment that you have on hand. Don’t have a particular type of pipe fitting? You’ll either have to wait or devise a solution. This is how things end up different in the real world from the way they were devised on paper – implementation details are frequently more than just minor details. Those database architects who really screwed things up? They weren’t incompetent, they just had a deadline.
You call your plumber to fix clogs that are happening right now. Likewise, management is usually bringing you in because there’s a clogged database. You’re there to fix a problem (unclog the database) and get back to your regularly scheduled day. It’s tempting to bring up all of the design flaws but, unless you’re asked, you probably shouldn’t bring them up. Write them down and keep a list of database design issues in your wish list. You’ll feel better about it.
While you’re writing down all the big problems that you noticed in the database, make sure you write down how this problem could have been prevented. If it’s conducting stress tests to find missing indexes, talk to the development team about the tools they can use. If the developers need some query tuning help, get in there and help them with their query writing skills.
Fixing the Problem with Great Service
Users, management, and developers don’t want to know what they’re doing wrong. They want to know how it can be made right. Sure, there are problems; nobody is happy about it, but the users know that everything can’t be made perfect.
When you help the users make things better, show them the problem. Take the time to explain why it’s a problem – “the dimensions of this toy don’t mesh with the dimensions of your plumbing, it’s only natural that it get stuck”. Then explain how the problem can be prevented going forward.
Great plumbers don’t just explain how to solve a problem, they deliver great customer service while they’re doing it. A great plumber doesn’t make you feel like an idiot because your kids decide to have a toy soldier flushing contest. A great plumber fixes the problem, tells you it happens to everyone, and makes a joke with your kids about how flushing toys is how he became a plumber.
What About the Long Term Problems?
Every time you look at a system, write down all of the long term fixes that should be put in place. Keep a list of them around and build out a plan of attack for the best way to fix them with minimal disruption. If you do a good job of solving the immediate problems, there’s a good chance that someone is going to bring you back to fix problems in the future.
Don’t think you need to be silent, though. Tell the users about things you can easily fix right now. If it isn’t much effort to add an extra index during an outage window, suggest the index and go on your merry way. The last time I had a plumber over, he suggested that I throw ice in the disposal once a week in order to keep the blades sharp and to free up any gunk that was floating around in there. Did he try to sell me another disposal? No. Is that plumber going to get my business the next time there’s a problem? You bet.
You’re a plumber. Be happy that you’re unclogging a drain today. Solve the problem you’re brought in to solve and there’s every chance you’ll get to implement the plumbing the next time around.
When we started building our classes for next year, we started by following our own advice. We wrote detailed attendee biographies defining exactly who should attend, what pains they were facing, and how they would use the training to solve the pains when they got back to the office.
We’re Guilty of Profiling
We got really detailed with the attendee bios – how many years they’d been working with SQL Server, what topics they were comfortable with, what topics they hadn’t even seen yet, what kinds of user group sessions they’ve attended, and more. Then we boiled down the attendee profile to a single sentence for each class:
How to Be a Senior DBA Class – You’re a DBA who’s ready to advance to the next level in your career, but you aren’t sure how to fully master your environment and drive the right architectural changes.
SQL Server Performance Troubleshooting Class – You’re a developer who needs to speed up a database server that you don’t fully understand.
I love this kind of profiling. For example, with the DBA class, we want the attendee to read the sentence and go, “Wow, that’s me – I’m a DBA and I don’t feel like I’m driving the architecture changes we need. In fact, I think the entire company is the one driving me, dictating what we’re doing, and I don’t agree with the direction we’re taking. How can I turn things around so that I’m the master of our database strategy?” So many DBAs feel like they’re constantly behind the 8-ball, being reactive instead of proactive, and they’re not happy about it.
With the performance troubleshooting class, we want the reader to say, “Whoa, that’s me – if I don’t get this app running faster, nobody else is going to do it for me, and I don’t even know where to start. I’ve tried adding indexes, I’ve tried adding NOLOCK, and I know how to run Profiler traces, but it doesn’t feel like we’re making real, significant progress.” Most of the developers I work with aren’t suffering from a lack of information – they’ve got an overwhelming flood of information, and they’re not sure which parts to believe and which parts to prioritize.
Then We Thought About Our Clients
We went back through our client findings from our SQL Critical Care™ sessions, and we categorized them into three piles:
- Shops with a lone DBA who was totally qualified to become a senior DBA, but just didn’t have any in-house mentoring
- Shops with developers who were facing performance issues, but couldn’t justify a full time DBA (and many of these happened to be software vendors or software-as-a-service companies)
- Shops where neither applied
The last day of our SQL Critical Care™ sessions is a set of customized training modules that cover how to get fast relief for their SQL Server’s pain points. We looked at the most common pain points for each of these categories, and then built the module list for our public training:
How to Be a Senior DBA Modules
Here are some of the modules we picked, and why:
Architecture Design for RPO/RTO – most solo-DBA shops don’t have enough variety to really get exposed to all of the different HA/DR options. We talk about the real-world pros and cons of each method so you can get a sanity check about whether you should build a cluster, use log shipping, or jump into AlwaysOn AGs.
The Secret Meaning of Performance Counters – you don’t have enough time to gather every metric and read the tea leaves to figure out what it means. We feel your pain; we have to rapidly performance tune slow SQL Servers too, and we’ll share our favorite counters and shortcuts.
What Every Senior DBA Must Know About Failover Clustering – this is the single most common high availability feature that we end up recommending to clients. It had a horrendous reputation years ago due to complexity and shakiness, but it’s better these days – as long as you follow some important rules.
Pushing Boundaries: Stories from the Field – You can learn real-world solutions from some of the crazy outlier clients we’ve got. We’re talking thousands of databases per server, tens of terabytes in a single data file, and over 100k batch requests per second, and yet there are tips that can help you with your “normal” servers.
SQL Server Performance Troubleshooting Modules
How to Think Like the Engine – yes, you can buy a video version of this class for $29, but we still use a condensed 45-minute version of it to kick off our troubleshooting class because it’s so important to get you into the frame of mind of SQL Server’s internals. We’re going to be talking about indexes, wait stats, sargability, and other related concepts throughout the following 3 days, so we’ve gotta get you started off right.
Why NOLOCK Isn’t Enough – you’ve already peppered all your queries with NOLOCK, but you’re still facing performance problems. Picking the right isolation level can make all the difference, but you need to know how to plan for it and test it.
X-Ray Glasses for SQL Server Indexes – did you pick the right clustering key? If you didn’t, how is it impacting performance? There’s a lot of advice out there about whether you should swap out GUIDs as your clustering key, and we’ll show you how to get the right answer for your app.
Developer’s Guide to Dangerous Queries – when you’re looking at T-SQL and a query plan, how can you tell if it’s harmless or horrific? It’s not enough to just look at estimated costs, either. You’ll learn how to spot the queries that are really giving the SQL Server engine fits, and how to reshape them.
Data-Driven Index Changes – after you’ve changed indexes, how can you numerically prove that things have gotten better? How do you know when to stop? We’ll give you tools to document it quickly.
The Result: Our 2014 Class Lineup
We spent hours planning out the modules, setting how much time we needed per module, putting them together around break schedules, and making sure they flowed together well. I’m really surprised by how much work it takes to do this right!
We went live with our training class pages, and then recently updated ‘em to make it more clear how the scheduling works. Go check it out now and let us know what you think of the updated pages.
SQL Server learning materials seem to live at two extremes. Blog posts are short and to the point, but they don’t cover material in depth. Books are long and detailed, but to quote a famous philosopher, ain’t nobody got time for that.
Enter two resources that have been out for quite a while. They aim to cover subjects from start to finish, but in a way that you can digest in an hour.
Microsoft Books Online (Yes, Books Online!)
Microsoft Books Online earned a terrible reputation for being the last place you’d ever want to look for help. Look up the sys.databases view, for example, and you get gems like this:
No context, no hyperlinks to more details for a particular setting. These aren’t even new settings – they’ve been out for years. Since at least SQL 2000, there’s been a Books Online page for ANSI NULLS that they could have linked to.
However, when you weren’t looking, Books Online went to finishing school. It’s now chock full of great explanations of concepts. When you’re looking to implement a feature, check out these explanations:
Books Online still tends to focus on features rather than tasks. For example, if you need to find out why your server is slow, heaven help you if your only resource is Books Online.
SQL Server Central Stairways
SSC’s Stairways series covers topics from start to finish with 5-15 tutorials from one or two authors. I love the consistency of these – you can settle in with one author and really dig into a topic with a logical flow. Think of it as an interactive book chapter, often with lots of demos you can run to illustrate concepts.
They’ve added stairways for T-SQL, indexes, transaction logs, PowerShell, replication, SSRS, and other good foundational topics. The existing stairways keep getting better as the authors add more posts.
Because why not?
Starting at midnight Eastern time (5AM GMT) on Friday, November 29th, we’ll post a new sale every four hours from our @BrentOzarULTD Twitter account. To give you an idea of what kinds of deals to expect, here’s the first one:
At midnight, all of our in-person training courses will have exactly two tickets available for just $495. First come, first served.
No trampling, people. Stay safe out there. Good luck!
Midnight Deal Update: Congratulations to Anne Hills, David Yard, Jeff Willett, Kiran Makkapati, Mark Parfrey, Oleg Bulay, Sam Bryant, and Sam Sparks for getting in on $495 training class tickets. Several of ‘em even bought two so that they could attend back-to-back classes in the same city – nice move, because those tickets went quick!
4AM Deal Update: Congratulations to Brian Han, Brian Moore, John Crawshaw, Manoj Badamikar, and Tae Jin Kim. We posted a 50% off coupon good for any online video course, but only for the first 5 takers, and they were the quickest on the draw.
8AM Deal Update: Coupon code BlackFriday200off was good for $200 off our in-person training classes until noon Eastern. Congrats to the buyers!
Noon Deal Update: Coupon code BlackFridayFree29 got Andre Ranieri, Brian Han, Brian Hendrickson, Eric Klovning, Javier Castrillon, Lily Chiu, Manoj Badamikar, Mark Wilkinson, and Unai Garcia Herguedas a free $29 credit in our training videos.
4PM Deal Update: For just one hour, if you register for the 2-day How to Be a Senior DBA class at regular price ($1,395), you get our 3-day SQL Server Performance Troubleshooting class in that same city for free. That means for $1,395, you get a full week of training class goodness! Register for the city you want, and after we’ve received your payment confirmation, we’ll send you registration information to sign up for the 3-day class for free. This is a manual process that won’t show anything special during your 2-day class registration process. You must register between 4PM and 5PM Eastern (GMT -5) today to be eligible for this deal, and travel/hotel/expenses are still your responsibility. Read about the classes, or register for the 2-day class in San Diego (Feb), Chicago (May), or Philly (Sept).
5PM Final Deal Update: We got such a good response to our Black Friday deals that we’re launching the last one now and leaving it up for the rest of the day. Until midnight Eastern time today, all our in-person training classes are half off. Enjoy!
Thanks for spending Black Friday with us. See you in San Diego, Chicago, and Philadelphia. Enjoy the holiday weekend, and thanks for having fun with us on Black Friday.
The best way to improve your skills as a database professional is to practice. If you’re like me, you might even take that practice outside of the workplace and do a bit of homework on the side. You can’t take a copy of your production database home with you, so what can you do? Thankfully, the fine people at Microsoft have put together a set of databases that you can download and play with.
Introducing the AdventureWorks OLTP Sample Database
AdventureWorks is a transactional database for a fictional bicycle and outdoor supply company. It’s not very exciting data and it’s not terribly big data, but it gives developers a highly normalized schema to work with for testing different querying techniques. How Microsoft managed to cram all of the relevant SQL Server features into a 215MB database is beyond me. All that aside, AdventureWorks is the first place that database professionals should go to get started practicing their skills.
There are a number of downloads available for AdventureWorks, but only a few that matter to the OLTP crowd. To get started, database professionals should download one of the following:
- AdventureWorks 2012 Data File – this is just an MDF file. You’ll need to drop it into a folder and tell SSMS to create a log file.
- AdventureWorks 2012 data and log file – there’s an MDF and an LDF zipped up in a file.
- AdventureWorks 2012 OLTP Script – these scripts will create AdventureWorks from scratch.
For a local database, go with one of the first two options – just drop the files in their appropriate folders, attach the database, and then you’re off to the races. If you want something for Azure, use the script file to create the database and deploy data.
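If you grabbed the data-file-only download, you can skip the SSMS attach dialog entirely. Here’s a hedged example, assuming you dropped the MDF into C:\Data – adjust the path for your server:

CREATE DATABASE AdventureWorks2012
    ON (FILENAME = 'C:\Data\AdventureWorks2012_Data.mdf')
    FOR ATTACH_REBUILD_LOG;

The ATTACH_REBUILD_LOG option tells SQL Server to build a fresh log file for you, which is exactly what the MDF-only download needs.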
Protip: There’s usually a CS option for each of these downloads. The CS suffix lets you know it’s the case-sensitive version. If you want to deal with international (case-sensitive) databases, pick the CS version.
A full list of AdventureWorks OLTP options is available at http://msftdbprodsamples.codeplex.com/. There are some lightweight data warehouse options available, but they are still relatively small. If you’re just getting started, those are a good option (especially since there’s an SSAS tutorial available).
Introducing the Contoso Data Warehouse
Let’s say you want to do something a bit more adventurous and you want to branch out into the bigger BI world that Microsoft has to offer. The AdventureWorks data sets are a great starting place, but the data set size doesn’t pose many challenges.
The Contoso Retail DW data set is several times the size of AdventureWorks and comes as a pre-built star schema. While it’s not the biggest database (my copy is around 1.6GB), ContosoRetailDW provides a large enough data set where aspiring database professionals can really start to push the limits of a local SQL Server.
ContosoRetailDW is also a good place to try out Enterprise Edition features like table partitioning and ColumnStore indexes. The data set is large enough that it’s possible to make use of these features but it’s still small enough that you won’t need enterprise grade hardware to accomplish these things in a reasonable amount of time.
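For example, a nonclustered columnstore index on the big fact table is a one-statement experiment. Here’s a sketch, assuming the FactOnlineSales table and columns from the Contoso download (on SQL Server 2012, this index makes the table read-only, which is fine for a lab):

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactOnlineSales
    ON dbo.FactOnlineSales (DateKey, StoreKey, ProductKey, SalesQuantity, SalesAmount);

Run an aggregate query before and after, and compare the runtimes and logical reads.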
What Are You Waiting For?
Database professionals, this is your chance to start exploring the features and functionality that SQL Server has to offer. The sample databases from Microsoft give you plenty of opportunities to work with different techniques for both OLTP and data warehouse and even familiarize yourself with Azure without having to come up with your own data.
Our Hierarchy of Database Needs training email plan has been a lot of fun. Thousands of SQL Server professionals have signed up to get an email in their in-box every Wednesday for 6 months. It’s like a part-time college course – the one you should have been given when you first got shuffled into this DBA job thingy.
Now, we’ve taken some of the content and turned it into a free 38-page PDF ebook.
It starts at the base of Ozar’s Hierarchy of Database Needs, covering backups and security, then moves up to capacity planning and performance.
It’s not meant to be doctoral-level – this is just the intro course that we all wish we’d gotten before management started screaming at us about why the database is so slow. And expensive. And unpredictable.
It’s like a prerequisite of things we want to make sure people know before they move up to our training classes.
Let us know what you think, and enjoy!