How To Hire Top Talent

You want to hire a fantastic DBA or database developer.

You want someone who can hit the ground running, who you can trust to maintain your current system and tune it like a pro. Plus you want someone who can see key improvements, communicate effectively, and drive change in your system.

You want it all.

It’s incredibly hard to find this candidate. First of all, you’re not the only one looking.

But guess what? You’re probably looking for the wrong person. And you’re almost certainly asking the wrong questions.

Newsflash: You’re Not Looking To Hire an IT Pro Who’s Done the Same Job Before

It’s hard to hire people who are superstars and don’t have something unique about them. Like a mask.

I’ve talked to many hiring managers about what they’re looking for and how they look for it– I’ve asked these questions both as a candidate and as someone who’s helping them find the right person. I typically hear this:

I’m looking for an IT pro who has five or more years of experience with our technology, is an expert in several of those areas, and who can communicate effectively.

This limits your field too much right from the start.

Managing and developing data environments is interesting because of the variety of work available. There are common patterns and practices, but each environment is unique and technology varies widely.

Say you want someone who’s worked with highly available internet applications. Is the person who’s already worked with a website just like yours and happens to be looking for a job now necessarily the best person?

Instead, consider this:

I’m looking for an IT pro who adapts quickly and has five years or more of experience managing or developing data environments. Strong production change experience is absolutely necessary.

The changes here are small, but they make a huge difference. You’re open to many more people, but you’ve still specified the particular types of experience that are absolutely required.

Maybe you’ll find someone who hasn’t worked on Windows in several years, but has an interest in your company. Maybe they’ve worked on related technology, but different components than the ones you use. Don’t shut them out. Instead, open your mind.

Break Out of the Recruiter Room: Ask the Right People “Who Is A Great Candidate?”

Your recruiting staff is overworked. They use common tools for hiring for different departments– they probably have one or two websites they post to regularly and some established channels for finding candidates.

You want more. You need to use smart alternatives to find great candidates. Here’s how you do that:

  • Write a good job description. The job description says a lot about your business and you as the hiring manager. Write it like you care. Think about the person you’re looking for, not just the position you’re trying to fill. Be clear if the job entails on-call duties. Mention if you have a flexible working option, and if relocation is available. Please– if you list requirements for “X years with Y product”, make sure you get the product name right and that someone can realistically have been working with it for that number of years. (I don’t have 10 years of experience with SQL Server 2003, and I never will.)
  • Cast your net wisely. Find out where your recruiters are listing, and supplement with other sites if you can. Not all recruiters post on StackOverflow.
  • Use your network. Let your network on LinkedIn, Facebook, and Twitter know that you’re hiring. Send emails to former colleagues who may have leads.
  • Reach out to community leaders. Drop a line to community leaders in the area you’re searching. Include the job description. Ask if they know anyone great who’s looking, and ask them for feedback on the job description. You can find these leaders on Twitter and in blogs. (We’re happy to help.)

You Won’t Hire a Great DBA or Developer By Asking Boring Interview Questions

If I had a dollar for every time I’ve been asked about the difference between a clustered and nonclustered index, I’d be blogging from a private island. Questions like this imply you’ve phoned it in and don’t really care about your candidate.

Don’t try to find experts with nit-picky questions, either. If you have Senior SQL Server DBAs on staff, they probably think your next Senior SQL Server DBA should be able to wax poetic on how to change collation or write a dissertation on the nuances between data types. Once you’ve worked with a product for a while and done certain things, you develop this mindset– everyone should know that. But this knowledge is typically based on how your own applications are developed.

These questions also turn off great people. Your best candidate may or may not be able to answer these immediately in extreme detail– but if they couldn’t, they could find the answer and test it out in a matter of minutes. So why not find that out?

Screen candidates by asking about their experience, not about your own experience. Become good at asking probing questions, and drive through the experience questions to find out what knowledge they have about the technology they were working on.

If you don’t have a big support team and you need someone to come up to speed immediately, don’t panic. Ask questions about times when the candidate needed to learn new skills quickly. Spend time on it– find out how much they remember and what they found difficult. You want someone who can learn, and who can describe hurdles they’ve overcome in learning in the past. Your great candidate is someone who knows that not everything comes easily, and they’ll tell you about that.

Use Timed Labs in an IT Job Interview

If it’s critical that your new hire already have certain skills, don’t try to determine that with Q&A.

Set up a VM with an environment for job candidates to work in. Build questions, or have someone build them for you. If candidates really need those skills immediately, they should be able to fix broken things in that environment, and do it under a time limit with someone sitting in the room with them.

If they’ll have the internet available to them while they work, make it available in the lab environment, along with any standard tools.

Call in an Interviewing Professional

What if you’re in a small company and you need to hire an expert in an area where you don’t already have one? How do you know the right questions to ask or the right lab to set up?

In this case, build the parts of your interview that cover team fit and interaction with related teams on your own. For the specialty area, rent an expert.

Be Consistent Across Job Interviews

This is a trick. Because guess what? In order to be consistent, you have to be prepared.

It’s good to be consistent for HR reasons, but it’s even better to be prepared with a set of good questions. And make sure that all your interviewers across different sessions know what they’re asking about, and have a set of prepared questions to ask as well.

You won’t end up having the same conversation over and over again, because good questions start conversations that you steer with good follow-up and probing questions.

Never forget: top candidates are interviewing you, too. They have choices and they’re judging on multiple criteria, including team dynamics, manager effectiveness, and whether you appear to have your SharePoint together.

Prepare your questions and be a little serious, but also be human and have some fun. Share interesting things about your business and your team. Show that you listen. Listening makes you interesting. Top talent often will pick the team that’s fun and interesting over the boring team that pays a little better.

Make sure that you do your research and read the candidate’s blog. Make a place in your interview structure where you ask the candidate about specific blog posts they’ve written. Ask why they did something a certain way, or what they might do differently now.

Are Your Change Management Styles Compatible?

There are cowgirls and there are bean counters, and there’s everything in between. Allow a little time before you start asking about the candidate’s change management process and style– because this candidate should be asking you about that.

Make sure that you cover this in the interview process at some point. You want to find out how the candidate handles change and if they can fit well with your process. If you have a really heavy process, it may drive them nuts, and cause you endless trouble. The opposite can also be true.

This is also a good place to find out how your candidate has handled errors in the past. If you’re looking for someone with experience handling critical downtimes smoothly and calmly, devote a lot of time to asking them about past situations and how they handled them. You may also include several hypothetical questions under a time limit, in either a written or verbal format.

Ask Candidates for Feedback

Let every candidate know at the end of their interview loop that you welcome feedback about the interview experience at any time. Make sure they have your card with your email address. If they send you feedback at any time, read it, keep it, and review it later. You may see patterns you want to change.

You Can Hire a Top DBA or a Superstar Database Developer

You can hire great talent. When you do, you’ll find they aren’t what you picture ahead of time.

They may have a varied work history. They may be significantly older or younger than you’d expect. They may not have the degree you’d assume. But they have a passion for data, an enjoyment of learning, a love of technology, and a level head on their shoulders when everything goes down– and that’s all that matters.


The $1,000/Hour Consultants


I’m not the most expensive consultant I know.  In fact, I’ve recently hired not one, but two separate consultants that charged $1,000 per hour.

Not actually my plumber.

$1,000/Hour Consultant #1: My Plumber

One evening, Erika reported that the kitchen sink disposal wasn’t working.  I walked in and did what any geek would do – I rebooted it.  I flipped the electric switch on and off, and when that failed to resuscitate it, I guessed there was a circuit breaker on the bottom of it.  Indeed there was, so I hit that button, but still nothing happened.

Now I’m not Bob Vila, but I understand the basics of mechanical repair.  I knew the disposal operated by spinning sharp blades at high speed, and that if something got jammed in the blades, they’d stop working.  I unplugged the electric cables, stuck a screwdriver in there, and said encouraging words.  Still no workie.  I even put my hand into the drain (I know, I know) and it didn’t feel anything like what I expected.

I gave up.  I hit Yelp, looked for the highest-rated local plumber, and called him.

Me: “My kitchen disposal stopped working.  I hit the switch, I reset the circuit breaker, and it’s not moving.”

Plumber: “Do you have the key?”

Me: “The what?”

Plumber: “Never mind.  After this I’ve got another job near you, and I’ll call you when I’m done with that.”

A couple of hours later, he called from the front door of my high-rise.  I went downstairs to let him in, and I noticed something odd.  His partner sat out in the van, not bothering to come in, and the plumber himself wasn’t carrying anything at all.

He went straight to the disposal, pulled out some kind of multitool from his pocket, and plugged it into the bottom of the disposal.  He put way, way, way more force into it than I would have done, and in a matter of seconds, the tool spun freely.  He reset the circuit breaker, plugged it back in, and voila – all done.  He ran a few loads of ice through the drain to make sure it was working.

Plumber: “Are you happy with the service and the result?”

Me: “Hell yeah.”

Plumber: “Alright, that’ll be $100.”

It took me longer to write the check out than it did for him to fix the disposal.  As I walked him out, I checked the clock and realized the whole service call took four minutes.

$1,000/Hour Consultant #2: My Exterminator

My new exterminator did me a similar favor last week.  He spent less than ten minutes waltzing through my condo, spraying the baseboards and the window frames.  As he went along, he pointed out areas where my caulk had dried up and retreated, leaving big openings for bugs.  “Go to Home Depot, get some clear caulk, and go right over the top of all your old stuff.  You’ll never need to call me again.  I don’t even need to spray if you don’t want me to.  Everybody in this building has that same problem.”

I thought for a moment.  “You knew this when I called you and gave you my address, didn’t you?” I said with a grin.

He smiled and nodded, and I wrote him a check for $150.  “I’ll tell you a secret – it’s not just your building, but every one of ’em built by this developer.  You’re all my clients sooner or later.”

Why They’re Worth the Money

Some people might get angry that a professional breezed in, fixed something in seconds, and then charged a lot of money.  Not me – I was absolutely ecstatic, because the plumber and the exterminator made it seem so effortless.  They knew exactly what the problem was, came armed with the solution, and didn’t waste anyone’s time.

Just because someone makes a task look easy doesn’t mean the task is easy for you and me.  I’ve struggled with even the most basic home repairs, and I just shake my head when a pro comes in with specialized knowledge.  Yes, I could have Googled how to free a stuck disposal, but I didn’t have my disposal’s special wrench key.  Even if I’d bought the key, I wouldn’t have known how much force to use, and I would have given up long before it worked.  I would have gone on to other troubleshooting steps, and probably ended up replacing the whole disposal, making a huge mess in the kitchen.  I’d have lost hours and probably some blood.

When I waltz into a SQL Server, SAN, or virtualization implementation that is just flat out busted, I know how the DBAs and sysadmins feel.  I feel that same way when my plumber comes in and fixes everything quickly.  I feel guilty, because as A Guy, I feel like I should know how to do this stuff.  I think that if I was being paid to be the Man of the House, I’d be fired.  I’m good enough to do the general stuff, but when something specialized breaks, I’m way out of my comfort zone.  I can struggle for days to fix it, or I can just call in a pro.  Sometimes money is the best way to solve these problems.


“What Were We Thinking?” Enter the Brent Ozar PLF Photo Contest

Humor

Before we set sail on SQLCruise Alaska, our good friend Crys Manson ( b | t ) captured a few moments of Brent Ozar PLF together in action in Seattle.

It was a normal enough afternoon for the team– there was a lot of jumping, some surfing on traffic cones, and we almost crashed a wedding by accident. But looking at the pictures, we found one moment that’s hard to explain.

Tell Us What We Were Thinking

To enter the contest, leave a comment on this blog post before 12 pm Eastern (that’s high noon) on Monday, July 11.

Your comment must list what Jeremiah, Brent, Tim, and Kendra were each thinking in the photo below. Pro tip: the captions all need to fit in the picture, so shorter captions may be more likely to win!

The fine print: Contest limited to US and Canada residents only because international shipping is such a pain in the PLF.  If you’re abroad, you can still enter, but you won’t get the prize – we’ll designate an honorary US/Canada winner on your behalf.

Fabulous Prizes!

We’ll immortalize the winning entry’s genius in a version of the picture which will appear in the post announcing the winner, and we may feature your thoughts in our presentation slide decks. Great fame could be yours.

But there’s more than fame: there’s fortune! We’re giving away a Brent Ozar PLF book pack– it’s a mix of books we love to read and books we loved to write:

Left to right you see Jeremiah, Brent, Tim and Kendra. We have something on our minds... but what is it?

Here’s an example from Thomas Duclos of Pragmatic Works for inspiration:

Captions by Thomas Duclos

A Drumroll Please… We Have a Winner!

Our winner is James Serra, who truly knew what we were thinking. Take a look:

James Serra -- how DID he know what we were thinking?

A trove of treasures is on its way to James. When reporters asked James for comment on his big win, he responded:

I would like to thank the developers of SQL Server Denali, who kept me up until 3am using the new version, which inspired a moment of creativity of what to write in the captions while I was dozing off at the keyboard. Also, Brent was my inspiration for starting my blog at JamesSerra.com.  And he was my inspiration for the way I dress, but I won’t go there.

Thanks, James, and thanks also for not going there. Really.

For those of you who know James, be careful: he probably knows what you’re thinking, too.

Thanks everyone for your entries!


Your Job Isn’t To Make Your Boss Happy


You might be surprised to learn that my goal as a consultant isn’t to deliver recommendations to my customers. Instead, my goal is to deliver recommendations that help my customer’s teams be better customers to each other after I’m gone.

This is a fancy way of saying my recommendations need to be practical. I don’t recommend that the DBA team change all the application code if that’s not within their control– I determine where the critical needs are, what range of options is available, and what’s viable in the short, medium, and long term given their internal customers and service providers. I provide data and tools to work effectively in the specific environment.

I don’t always speak the language of customers and services, because not everyone understands it. Good news: you do have customers at work, and understanding that will make you more successful.

You’ve Got A Lot of Customers Inside Your Own Company

Making internal customer and service provider relationships successful was one of the best skillsets I learned when I worked at Microsoft Corporation. I kept mission-critical SQL Servers highly available in a very large and complex environment, and I had a lot of customers. Some of my customers were people who paid Microsoft for services, but my other customers were application development teams, program managers, product managers (there’s a difference!), user support teams, and system engineers.

Some of these relationships were reciprocal, like with my application development teams. We made requests of each other, we delivered things to each other. We provided services for each other.

Thinking about customers is more than terminology. It’s a mindset, and it’s beneficial. These aren’t “just” coworkers. They aren’t simply people with problems. You’re part of a team that provides a service: the service has limits and exceptions, and the service needs to be defined. Your customers use the service, and by doing so they can make or break your job.

Here’s one secret I learned: even in a highly competitive environment, 99% of the time your customers want to help you succeed. No, really. They may be grouchy at times, and they may come off as unapproachable. But it’s almost always the case that if you succeed in providing a great service for them, their job will be better and their life will be easier.

Your Boss Is Not Your Customer

Not everyone is your customer. This is important to understand. Your mission is to provide services to your customers, but your boss is not your customer.

This can be hard for many of us to fully realize. We think that success at work is measured by making our boss happy. But the truth is, as in any human relationship, the more you focus on just making someone happy, the more likely they are to become dissatisfied with you. Instead, focus on being great at what you do: both the technical aspects and your customer service. Your manager is there to help you be successful.

This probably isn’t how your job works now. You don’t need to walk into your manager’s office and declare “I want to make a change in our relationship!” Instead, as you change your focus and behavior over time, talk to your management with that focus. Your team relationships will naturally shift as your focus changes.

Your Job Is To Be Great at Providing A Service. Even if the Service Ain’t Great.

Your boss has a lot to say about who your customers are and how you provide services. You have a lot to say about this too– you should make sure these are both defined well enough for you to be successful.

The road to success comes in having productive conversations with key customers.

Know this: saying yes all the time won’t make your customers happy. In fact, it’s likely to do the opposite because you won’t be able to deliver on what you agree to. Inside of a company, sometimes you just have a job where things will be consistently imperfect. Listening to your customers and responding intelligently usually will make them happy, even if the overall service they’re getting has problems.

SQL Server Database Team
It may be accurate, but don't lead with this.

Step 1: Be Known

You don’t have to be great at small talk, just be curious about how other teams work, and ask them about how they interact with your team. Be a good listener.

Reach out periodically to customers you work with, both the ones who complain or make requests, and the ones who are less interactive. Ask them what their experience is like.

Step 2: Be Responsive

Let me say first: this doesn’t mean answering all your customers’ emails right away. You don’t want to become everyone’s helpdesk (unless you actually work in the helpdesk). People understand you have a variety of tasks to do in your job, unless you give them the impression otherwise.

Make sure you answer people within a window that’s acceptable in your company– at most places it’s usually a business day or two for normal priority requests. Answer questions within this window all the time when you’re in the office. Be consistent about this, but make sure you establish the urgency early on. If something is critical you don’t want two weeks of email spread out on the issue because you always take a day to reply. If an email thread starts feeling like it’s taking forever, set up a meeting to polish things off in person.

Step 3: Practice Saying ‘No’ In Six Sneaky Ways

Don’t write a lot of long emails about why you can’t or won’t do things. If it’s a big deal, arrange to have a conversation in person, and then follow up with a brief email. If it’s something normal, don’t say no– try these:

  • Suggest Alternatives. Most times a workaround is all someone needs. “Have you tried using Blarghomatic for that? It’s not designed specifically for it, but it’ll get the job done.”
  • Ask questions. Find out what’s been done in the past, or if they know someone else who’s confronted that problem. Enlist the help of others who are in the situation. There’s nothing new under the sun, and frequently there’s not much new in the cubicle, either.
  • Find out the frequency of the problem. What’s a big pain for you to change may be a one-time issue on your customer’s end. Even if it’s critical, if it’s a one time thing you may act differently than if it’s going to come back to your inbox weekly.
  • Explain the cost. If the issue is just too expensive to the company to fix, explain the factors. This is one thing you want to do by phone or in person rather than in email– just send a summary follow up. (Avoid the forwarded message titled “Why it’s too expensive to ever make you happy”)
  • Share the load. If the issue is just time and resources to get something done, see if you can get help. If there are things the customer can do to make the job 10% easier, let them know. If you can borrow someone from another team to help move things to get the customer through a critical period, research that. Whenever you can, let the customer know what you’re looking into.
  • Brainstorm. Your customers have ideas, too! Oftentimes, they can solve their own problems, and just need you to help them see it.

You can make your customers into fans by doing these things instead of just saying no. It can make enough of a difference that they send the occasional email to your boss about that great thing you did yesterday in helping them make their problem less of a pain. Those emails do something important: they help your boss demonstrate that you and your team provide value. And they make your boss feel good.

Step 4: Be a Good Customer Within Your Company

This coin has two sides. I also learned to be a good customer at Microsoft. That means being clear with your service providers about what you need.

80% of this is simple thoughtfulness about reducing back-and-forth.

Here’s a simple example: does it help your SAN team if you create a single request and include the amount of space you need immediately, the number of LUNs and RAID level for each one, the estimated size in a year, and the information about the host servers and whether the HBAs are configured? Probably.

If you’re writing a new application, does it make it easier to deploy if you document what the components are and their purpose, what the estimated activity levels are for the first quarter, half, and year, what capacity is estimated, and what the margin of error is? Build time for figuring that out into your project plan, and make sure it gets delivered.

Find out how to make it easier for people to provide you services, and then do it. Do it a few times and people will respect you for it. Suddenly, you’re known as someone who really has their act together.

Martha Stewart Is Right: Thank Other People

The other 20% of being a good customer is giving people credit for doing good things. When someone has made a difference, write a quick email to them and copy their manager. Mention what they did and how it helped you and your team. Be specific, and make it a simple thank you. Don’t always pick the obvious person, and don’t try to be strategic about who you thank– you don’t want to be a suck-up, you just want to be honest.

You can also say this in person, of course, but if your organization is larger than 150 people, writing it in an email will make a real difference.

Making this a habit does a couple of things for you: it gives people positive feedback for helping you, and in a medium or large organization, it also makes you noticeable. It shows that you are an important customer and that you share your opinions. Just thanking people for helping makes you a more significant player.

If this is your first attempt at a thank you note, maybe you shouldn't send them.

What’s In It For You

The truth is, doing all of these things isn’t really extra work. If you practice them for three months, you’ll save loads of time thereafter, because people will know you and see you as an ally. They’ll understand more about the work you have to do, and they’ll appreciate when you can do something for them.

Improving your expertise at service delivery will make you a person and not a commodity. This is a huge benefit for DBAs, developers, network admins, and anyone in technology.

“Not having time” to handle the social parts of a technical job will only give you overblown projects with confused requirements, and emergency work because of customer needs that were discovered too late.

Best of all, if you become skilled at working with customers, you’ll have raving fans. You’ll develop a network of people who want to help you succeed.

At the end of the day, that not only makes your boss happy, that makes each day at work more pleasant. And eventually, it gets you everywhere you want to go.


Iomega StorCenter PX4-300d NAS Review: iSCSI Monster

Storage

I wanted to love this monster from the moment I read the spec sheets:

Iomega StorCenter PX4-300d NAS
  • Network attached storage device with 4 or 6 hot-swappable drive bays
  • Available empty (so you can bring your own drives, including SSDs)
  • iSCSI server with 2 network ports, jumbo frame support, VLANs
  • Official VMware ESX/ESXi, Hyper-V, Windows DFS compatibility
  • RAID 0/1/5/10
  • Cloud support – automatically copy files to Amazon S3, Mozy, or other Iomegas over the Internet
  • External USB3 drive support (for leftover USB hard drives)
  • Insane number of home media support options including BitTorrent downloading, Time Machine backups for Apples, Bluetooth, automatic uploads to Facebook/YouTube/Flickr, recording server for Axis/Panasonic/D-Link webcams
  • Quiet (31dBa max) and low power (based on an Intel Atom CPU and 2GB of RAM)

The full feature list is ridiculous – it’s absolutely everything I needed for my VMware lab and my backups.  At around $700 from Amazon for the diskless version, I couldn’t resist.  I wouldn’t recommend buying the versions that come populated with hard drives – if you ever need to replace the drives, you’re in for a rough time, and oddly, the bring-your-own-disk version doesn’t suffer from that issue.

Choosing Drives and RAIDs for the PX4 and PX6

As of this writing, the list of officially supported hard drives is pretty short:

Drives in a PX4/PX6 storage pool all have to be the same size and speed, and I didn’t really need performance, so I chose to go with four of the Hitachi 3TB drives.  With those drives, RAID 5 would give me 9TB of usable storage, and the possibly-faster RAID 10 would give me 6TB.  I wouldn’t recommend the 2TB drives for a reason that’ll be clear shortly.
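The capacity math above can be sketched in a few lines – this is a rough model only (the function name is mine, and it ignores filesystem and formatting overhead):

```python
def usable_tb(level, drives, size_tb):
    """Rough usable capacity for the PX4's RAID levels.

    A sketch only -- ignores filesystem/formatting overhead and
    treats RAID 1 as one mirror set across all drives."""
    if level == 0:
        return drives * size_tb            # striping: all capacity, no redundancy
    if level == 1:
        return size_tb                     # every drive mirrors the same data
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * size_tb      # one drive's worth of parity
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drives // 2) * size_tb     # striped mirrors: half the raw capacity
    raise ValueError("unsupported RAID level")

print(usable_tb(5, 4, 3))   # four 3TB drives in RAID 5  -> 9
print(usable_tb(10, 4, 3))  # four 3TB drives in RAID 10 -> 6
```

Note the RAID 10 check – we’ll come back to why that matters.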

PX6 users have 6 drives to choose from, so in that environment, it might make sense to go with four magnetic hard drives and two solid state drives.  This would allow two tiers of storage: a blazin’ fast SSD mirror, and a bigger/slower RAID 5 pool.  With VMware ESX/ESXi’s Storage vMotion, we can move virtual machines back and forth between the two RAID pools without taking the VM down for a reboot.  It’s a fairly inexpensive way to get tiered storage, but with just 128GB in the fast pool, I’m not sure how useful this would be in practice.

Pluggin’ in the N: Network Attached Storage

iomega PX4-300d Control Panel

The PX4-300d comes with the basics: power brick, one Ethernet cable, and the management tools CD.  After adding drives and plugging in power and Ethernet, the control panel displays the IP address it fetched from DHCP.  Fire up a web browser, go to that address, and you’re going to be impressed with the user interface.  You can even test drive the Iomega StorCenter control panel online.  (No, that’s not my Iomega, and no, you can’t delete data.  I tried.)

For the first minute or two, Iomega impressed the heck out of me.  The device could fetch its own firmware upgrades over the Internet, and yes, it was up-to-date.  I went into Drive Management, and sure enough, it offered a really easy GUI to configure RAID levels.  I picked RAID 10 and clicked Apply, but Iomega warned me that I hadn’t chosen any drives yet.  I’d missed the subtle little visual checkboxes inside each hard drive in the GUI, so…

I need to stop here and tell you a little about myself: I like to break things.

I didn’t say I like to take things apart and put them back together – oh no.  I just like to find new ways to break ’em.  I take an evil satisfaction out of knowing somebody, somewhere didn’t test their code.

It's web scale!
RAID 10 with 3 Drives?

So I did what I do – I checked three boxes, chose RAID 10, and clicked Apply.

As far as I know, there’s no way to do RAID 10 with 3 hard drives (it requires an even number of drives, at least four), but the PX4 happily started initializing the array.  I sat there absolutely dumbfounded – this wasn’t a small bug, but a giant, ugly, monstrous bug.  I had no idea what the PX4 might be doing to the disks behind the scenes or whether I’d have any redundancy whatsoever.

I wouldn’t be so freaked out about this if Iomega weren’t owned by EMC, one of the biggest, most reliable companies in the storage industry.  I had instant flashbacks to the click of death, the nasty sound Zip drives made before they crashed, a failure that resulted in a class action lawsuit against Iomega.  Was Iomega back to its old tricks of cutting costs at our peril, or was it up to EMC’s quality standards?  I took a deep breath, pretended I never saw it, and deleted the array.  I set up a four-drive RAID 10 array, decided to let one bug slide, and pressed on.

While the array initializes, the Iomega’s LCD display and banners across the top of the web UI both warn the user that performance is degraded – very nice touch.  You can continue to use the storage array during initialization, and even write data to it – again, nice touch.  However, the initialization process takes hours – I left mine running overnight – which means I wouldn’t recommend the 2TB drives.  A PX4 with 2TB drives is $1,165, and with the 3TB drives it’s $1,405 – just $240 more for 50% more storage.  Yes, you could upgrade the drives later, but it’s going to take a loooong time to swap out each drive one by one and let the array rebuild.

Some Server Changes Require Reboots

Restart Required

As the array initialized, I started wandering through the web control panel setting things up.  I changed the name to LittleBlackBox, and I was taken aback when the PX4 asked to reboot itself.  Really?  Just for a name change?  Okay, well, alright, nobody was on the array yet anyway.

Next up – configure remote access.  The PX4 will connect to a dynamic domain name service so you can use a simple DNS name to access your data from anywhere.  Sounds good, so I set it up and – you guessed it – reboot required.

Next up – configure iSCSI and jumbo frames.  This one at least didn’t require a complete NAS reboot, but it did warn that “This may take a few minutes to complete.  Anyone currently accessing this device will be disconnected.  Do you want to continue?”

The Problem with Rebooting an iSCSI NAS

NAS reboots aren’t a problem for some home users.  If iTunes stops working for a few moments, or my pictures stop uploading to the cloud, life goes on.

In a business or virtualization lab environment, it’s a massive problem.  VMware will be storing the virtual hard drives for several running machines on it.  In order to reconfigure many settings on the PX4/PX6, I have to:

  1. Shut down all of my virtual servers
  2. Shut down VMware
  3. Restart the Iomega and wait for it to come up
  4. Start up VMware
  5. Start up my virtual servers

This is a total showstopper in a corporate lab environment where multiple people might be playing with the device.  I simply wouldn’t be able to count on junior sysadmins not changing settings on the Iomega, triggering a reboot, and losing all of my virtual servers.  I would have to tightly control security settings on the NAS, but the only security option is a checkbox for administrative privileges: either you’re in, or you’re not.

That NAS is a Monster, M-M-M-Monster

Famous Philosopher

Philosopher L. Gaga once said in a poem entitled “Monster”:

I’ve never seen one like that before
Don’t look at me like that
You amaze me

While she was referring to crazy yet well-endowed gentlemen, I’m sure she would agree that the PX4-300d is just such a monster.  Its feature list is the size of a baby’s arm, but sometimes it’s got the business logic of a wiener.

The good news is that it’s got killer qualities for a virtualization lab – or perhaps even a small business’ virtualization needs.  The bad news is that it doesn’t have the stability or testing to back it up.

I was heartbroken.  I’d really wanted to recommend this storage device to small businesses for their staff to learn virtualization, storage vMotion, and perhaps even as a backup target.  With the current version of firmware, though, I just couldn’t do it.  With that in mind, I gave up on testing the network card failovers (which do work at first glance), multipathing (which doesn’t – more on that later), the speed differences with jumbo frames, and anything reliability-related.  For now, it’s just a good device for geeks to use in their labs and homes, so I’ll focus the rest of the review on that.

Device Features: Shares, Apps, Time Machine, BitTorrent, More

Onboard storage is set up in a hierarchy:

  • Drives are grouped together in pools (like 2 drives in a RAID 1 pair or 4 drives in a RAID 5).  Pools are a fixed size.
  • Pools are carved up into multiple volumes.  Volumes are a fixed size.
  • Volumes are carved up into multiple shares.  Shares are thin provisioned.
  • Applications are configured at the share level.

So in my PX4-300d, I have:

  • Storage pool – 4 drives in a RAID 10 for 6TB of usable space
  • “Media” volume – a 1TB volume that lives in my storage pool
  • “Torrents” share – a thin-provisioned share in the “Media” volume.  It starts at zero size, and can grow to whatever size of files I dump in there – up to 1TB.  I can access the Torrents share as a folder in Windows or on the Mac.
  • The Torrents onboard application added a Download folder in the Torrents share.  I can copy any torrent file into this folder, and the Iomega begins downloading it.
Torrent Download Details

The application user interfaces are designed by geeks, for geeks: they’re just good enough to be point-and-click usable, but not good enough that I’d want to walk family members through configuring or troubleshooting them.  For example, in the Torrent configuration screen, there’s a Port box.  You can manually configure a port or use the automatically supplied one, but you can’t tell if it’s actually working.  There’s no “test the router” functionality built in, so I have no idea if uploads are working until I start seeding a torrent.  There are edit boxes for Maximum Download Speed and Maximum Upload Speed, but there’s no scale – is it KB/sec or MB/sec?

The apps have just enough capabilities to say they work, but not much more than that.  Modern BitTorrent clients all have scheduling setups so that you can transfer more at night and throttle back the sharing during the day – not here.

The Amazon S3 sync app will upload your files to S3, but it won’t delete the ones that get deleted on the NAS, thereby making your storage bill steadily increase.  If you use it for, say, SQL Server offsite backups, you would want to reuse your backup file names in order to get a 7-day rotation in the cloud.  You’d run into a separate issue anyway – S3’s max file upload size is 5GB.  Any larger files just don’t get uploaded, and you don’t get an alert.  No upload/download throttling or time-of-day scheduling here either – your bandwidth will just slow to a crawl as new files are added to S3.
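If you’re scripting your own backups around a sync tool like this one, it’s worth screening for that size limit yourself before files silently vanish. A minimal Python sketch; the folder layout and function name are mine, not anything from Iomega’s app:

```python
import os

# S3's single-PUT upload limit: 5GB. (Multipart uploads can go far larger,
# but a naive sync tool that does one PUT per file is stuck at 5GB.)
S3_SINGLE_PUT_LIMIT = 5 * 1024 ** 3

def files_too_big_for_s3(folder):
    """Return the files under `folder` that exceed the single-PUT S3 limit."""
    too_big = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > S3_SINGLE_PUT_LIMIT:
                too_big.append(path)
    return too_big
```

Run something like this against the backup share before the sync kicks off, and at least you get a warning instead of silence.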

Time Machine File Sizes Not Updated

Time Machine works wonderfully as a backup target, but the user is presented with two cryptic fields: Apple Network Hostname and Ethernet ID.  The network hostname comes from your Mac: open System Preferences, click Sharing, click Edit, and it’s the part you can change.  The Ethernet ID comes from Terminal: type ifconfig, hit Enter, find en0, and copy the characters after “ether” on the next line.  The Iomega doesn’t tell you any of this.  The only thing it does tell you is the Time Machine backup folder’s size, a constant 173MB, and even that is wrong.  After several days of backups, the folder sizes still showed incorrectly on the Time Machine panel of the Iomega.

But it works.

At least, I think it works.

And that’s what makes this review so hard.  When the control panel tells me it’s configuring a RAID 10 array with 3 drives, I know it’s wrong.  A basic error like that shakes my confidence in the entire unit.  Same thing with the incorrect Time Machine folder sizes, and the wacko configuration screens.  If I can’t trust what it’s showing me, how can I trust what’s happening behind the scenes?  I’m not sure.  This doesn’t bother me too much for home media backups, but it would bother me a lot for running my virtual machines.

Is the Iomega PX4-300d the Best VMware Storage Server?

My home VMware lab: HP Microservers and the Iomega PX4-300d

Most business iSCSI network implementations involve a completely separate logical network for iSCSI – separate IP addresses on a separate subnet.  For example, my home network is on 192.168.37.x, so I might use 192.168.47.x for my iSCSI network.  The iSCSI traffic is kept away from the regular network.

Most business iSCSI setups also involve redundancy: the storage device is plugged into the network with at least two network patch cables plugged into two different network cards.  If either one fails, you’re still able to talk to the storage.

At first glance, the PX4 can handle either one of those requirements, but not both.  If you want two separate logical networks, then you probably want to isolate the iSCSI traffic completely.  You can configure one of the PX4’s network ports into your management network (like 192.168.37.x) and plug the other into an iSCSI network.  When you do that, you just lost redundancy.  If you choose instead to have both of the PX4-300d’s network jacks plugged into the same network, that network is going to need to see the outside world if you want to use any of the PX4’s home-media-savvy features like BitTorrent.

This is where the good monster comes in again.

This Good Monster Has VLAN Support

This NAS supports VLANs – multiple virtual network subnets attached to the same network card.  Imagine an incoming network packet that finds its way to your server, and it’s got a tiny part at the beginning saying, “I’m coming from Virtual LAN #1.”  Your operating system would need to understand that it’s attached to a certain subnet and route it to Virtual Network Card #1.  The next packet might come in saying, “I’m coming from Virtual LAN #2,” and your computer would understand that it’s part of your Virtual Network Card #2 – the one for iSCSI.

Granted, your one network cable is only so fast, and we can’t stuff 2Gb of traffic into a 1Gb patch cable.  However, this does let us segregate network traffic at the switch level so that your chatty iSCSI traffic doesn’t overwhelm your regular network, and your chatty BitTorrent traffic doesn’t overwhelm your iSCSI storage network.  While it is indeed delicious to get your chocolate in my peanut butter, it’s not nearly as delicious to get your BitTorrent in my iSCSI.

VLANs require a savvy operating system like VMware ESX/ESXi and a VLAN-capable switch.  I chose the $125 Cisco SLM2008-T because it supports VLANs and jumbo frames, has 8 ports, and it’s cheap.  Cisco switches have a reputation for being difficult to administer, but this one comes with a pretty good web control panel.  The full setup process – VLANs, jumbo frames, and the ESXi configuration – deserves a future blog post of its own, but suffice it to say that between Iomega’s web UI, Cisco’s web UI, and the vSphere Client user interface, you can be up and running with a very real-world-ish lab setup in under an hour – for some values of “running.”

Here comes the bad monster again – users can’t exclude network interfaces from iSCSI use.  Every time VMware rescanned the iSCSI adapters, the Iomega happily reported back that iSCSI services were available from every single IP address it owned.  I wanted to do iSCSI over just one network subnet, not both – no can do.  After pulling my hair out for a few hours, I threw in the towel and switched to NFS.  It worked the very first time I tried it, and it came with a nice perk: the StorCenter’s LCD display panel actually reflects the right amount of space used on NFS volumes.  iSCSI volumes show up as completely full even if there’s not a single file on ’em.  (That’s not the device’s fault, but users won’t know that.)

That’s probably a good lesson for StorCenter users: the features that require zero parameters are easy, and they work like a champ.

How Fast is This Cheap NAS?

Your mileage may vary. (Just one of dozens of test passes.)

Geeks love to tweak parameters.  I want to set the stripe size, caching, NIC load balancing, and flux capacitor voltage for every storage device.  That’s not what the StorCenter PX4/PX6 is about: it’s just, uh, storage.  Just as the other LifeLine apps don’t have much in the way of parameters, the iSCSI app checks just enough boxes to work – but that’s where the fun ends.

Most of my Windows-based iSCSI tests were able to saturate a 1Gbps Ethernet link with reads or with big sequential writes, and small or random writes ran around 10-60 MB/sec.  I was surprised, though, that no matter what tricks I played with multipathing or multiple iSCSI volumes striped together, I just couldn’t get this thing to blow past a single NIC’s throughput.  The lack of network card performance metrics on the Iomega made it a little tricky to troubleshoot, but after I checked the Cisco switch metrics, I got proof: the iSCSI traffic was only sent back through one network card on the Iomega – specifically, NIC #1.

I can write iSCSI data to both of the PX4’s network cards simultaneously.  They just won’t send data back from both simultaneously – at least, not when they’re both on the same subnet.  Maybe by getting even fancier with the multipathing setup and putting the two network cards on two different VLANs, in two subnets, I might be able to break past the 100 MB/sec read speed limit.  (For more about multipathing, check out my SAN Multipathing series.)

But you know what?  100MB/sec is fine for me.  See, if I really needed speed, I would have filled this thing with SSDs to begin with.  Solid state drives excel at random access, and that’s what every Iomega StorCenter NAS is going to end up doing – lots of random access.  Multiple VMs accessing the same LUN, BitTorrents being transferred, MP3s being played – these little devices are a party in a box.  The drives are going to be worked hard, and I bet for most people, the real speed limit will be the random access speed of the drives anyway.

But while troubleshooting storage performance multipathing, I ran into the bad side of the monster again.  I unplugged one of the Iomega’s two network cables, and I didn’t get any warnings whatsoever.  The PX4 didn’t show any errors on the LCD display or the web control panel, and it didn’t even notify me via the built-in email alerts.  I had to dig into the network screen to even figure out that one of the cables was unplugged.  Upon reconnecting the cable, the Iomega autonegotiated to 10Mbps (not 100, not gig, but 10) half duplex.  No warnings were shown on the control panel for that, either, and to make matters worse, you can’t set the Ethernet speeds.  It’s autonegotiate or nothin’.

Bottom Line: It’s a Little Monster Alright.

Iomega PX4-300d: 95% There

The Iomega PX4-300d NAS seduced me with its long, strong…feature list, and I’m willing to overlook the wild infestations of bugs for my lab at home.  It’s the first NAS I haven’t returned to the shop in less than 72 hours.  Like the philosopher said:

Look at him, look at me
That boy is bad, and honestly
He’s a wolf in disguise
But I can’t stop staring in those evil eyes

Who should buy the StorCenter PX4/PX6? Home users who want one quiet, capable box to sit in a closet handling backups, media, BitTorrents, uploading photos to the cloud, and other menial chores.  Yes, you could build your own more-capable solution for less money, but it won’t be as point-and-click easy as the Iomega.  IT, DevOps, and developer managers should also pick up one of these as a reward for a well-performing team.  Throw one under a cubicle, and for less than the price of a good workstation, the entire team has an iSCSI sandbox and shared MP3 storage off the domain, away from the corporate IT overlords.

Who shouldn’t buy it? I wouldn’t recommend the PX4/PX6 in a business environment for anything other than a lab – at least, not until versions of the firmware come out that fix the glaring UI bugs and stop users from rebooting the NAS.  I would be very uncomfortable to go into a small business as a consultant, sell them a shared storage setup using a StorCenter PX4/PX6, and walk away.

Can they fix these issues with firmware updates?  Yes, but only if they test the new firmware updates better than they’ve tested so far, and as of August 2011, it’s not looking good.  After applying a firmware update to fix the VMware multipathing issue, I received this disturbing email from Iomega:

Thank you for downloading the recent firmware update for your Iomega StorCenter px. After release, we identified an issue with growing or expanding storage pools with the v3.1.10.45882 update.  If you have not yet applied the update to your system, please wait. We will be releasing a new version of the firmware that resolves the issue soon.  If you applied the update, you should not experience any issues unless you expand or modify the size of a storage pool.  If you do experience any issues with a storage pool, please reply to Iomega Technical Support using this incident number….

M-m-m-monster…

If you decide to buy it, you can throw me a few coins by buying the Iomega PX4-300d via my Amazon link or the 6-drive version, the PX6.  And hey, it’s in stock for Prime members, so you could have it tomorrow – plenty of time to play with it over the holiday weekend. I’m not sayin’, I’m just sayin’.


Ozar’s Hierarchy of Database Needs: a 6-Month DBA Training Plan

Processes and Practices

You need to poop. Maybe not right now, but in general, you need to breathe, eat, drink, sleep, and poop.  If you can’t do any of those things, then the rest of your needs will take a back seat until you can accomplish the basics.

Psychology professor Abraham Maslow designed Maslow’s Hierarchy of Needs to illustrate how us meatbags prioritize our daily tasks.  It’s usually visualized with a pyramid:

Maslow’s Hierarchy of Needs

As a presenter, I find this pyramid really useful because I can’t do a good job of presenting if I’m thirsty or if I need to hit el baño.  I’ve learned that before I start teaching database stuff, I need to hit the bathroom, grab a Clif Bar, and down a bottle of water.  No, I don’t always have sex before I present.  Not to say it hasn’t happened.

I hereby propose Ozar’s Hierarchy of Database Needs:

Ozar's Hierarchy of Database Needs

1. Backups: You can’t lose more data than the business is comfortable losing.

Note that I didn’t say you can’t lose data.  You can, and you will, because safeguarding data is expensive.  Not all data is created equal, and businesses are comfortable losing certain kinds of data.

For example, I’ve worked with businesses that chose not to back up their development servers at all because they believed source code control was their first line of defense for developers.  They regularly refreshed development servers once a week from production.  If they lost a development server halfway through the week, everyone’s work was checked into source code control anyway, and the server could be rebuilt from scratch fairly quickly.

At the other extreme, I’ve worked with DBAs who knew full well their mission-critical databases weren’t well-protected from disaster.

The real problem isn’t backups – it’s communication.

Being a truly exceptional DBA means that your manager, plus every level all the way up the chain, is comfortable with your (in)ability to restore every system in the shop.  Your managers have a list in writing of all the company’s systems, a high-level overview of how frequently they’re backed up, how frequently the restores are tested, and how long a restore will take.  If you haven’t written a list like that, stop and do it now.  It’ll only take a few minutes, and it’ll pay off dramatically.  This list will cover your ass, and this list will get you the hardware you need to do the right kinds of backups.

This list will also free you from trying to be a perfectionist.  The business doesn’t need transaction log backups every 60 seconds on every server.  Learn to match up your job duties with the company’s financial needs so you can focus on what really matters.

Month 1: Learning Backups

Here’s what you need to focus on in the first four weeks: learn how your backup and restore options really work, write down how every system is backed up and how long a restore takes, test those restores, and put that list in front of management.

It seems like a lot of work, but if you spend a few hours per week on this, you can knock it out and move on to the next level of the pyramid. I can’t emphasize enough how important it is to get this level done right. Even though it seems like it can’t impact performance or capacity, it really does.

2. Security: Know who has access to the data.

This gets a little tricky with Active Directory groups because the Windows team can add & remove group members without the DBA knowing.

My favorite visual communication method for this is a grid of servers & databases down the left side, and a set of columns for permissions across the top:

Database Permissions Grid

The first time I present managers with this information, I dumb this grid down because it’s an overwhelming amount of information for readers to digest.  I want them to get the big picture: “We need to reduce the number of people who can get the DBA fired.”  After the first round of cleaning house, I add columns for the database backup and the physical database files, both of which are a serious security risk.  If someone can take a SAN snapshot of an unencrypted database, they can waltz right out the door with a copy of it.
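Building that grid is mostly a pivot over raw permission rows, however you collect them. A hypothetical Python sketch, assuming you’ve already exported (server, database, principal, permission) tuples from your audit queries; the input format and names are invented for the example:

```python
def permissions_grid(rows):
    """Pivot (server, database, principal, permission) rows into a grid:
    one row per server/database pair, one column per permission,
    each cell listing the principals holding that permission."""
    perms = sorted({r[3] for r in rows})          # column headers
    grid = {}
    for server, database, principal, permission in rows:
        cell = grid.setdefault((server, database), {p: [] for p in perms})
        cell[permission].append(principal)
    return perms, grid

# Invented sample rows, standing in for real audit output.
rows = [
    ("SQL01", "Sales", "DOMAIN\\WebApp", "db_datareader"),
    ("SQL01", "Sales", "DOMAIN\\Admins", "sysadmin"),
    ("SQL02", "HR",    "DOMAIN\\HRApp",  "db_owner"),
]
perms, grid = permissions_grid(rows)
```

Dump `grid` to a spreadsheet and you have the picture managers need: who can get the DBA fired, per server and database.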


Note that I didn’t say, “You have to control who has access to the data.”  Just as the business needs to decide the value of data, the business also needs to make security decisions.  Police don’t make the law – they just enforce it.  Likewise, the DBA needs to serve and protect the data.

3. Capacity: Know when you’re going to run out of space.

Every database in your datacenter is growing.  All of them.  They’re not all going to drain your storage dry tomorrow, but every single database has an expiration date – a day when it’s going to run out of space.  In consolidated environments with dozens, hundreds, or thousands of databases on the same server, this gets a lot harder to predict.
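Even a crude straight-line projection beats guessing. A back-of-the-envelope Python sketch, assuming you’ve been sampling database sizes over time; the numbers and function name are made up:

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until a database fills its drive.

    samples: list of (day_number, size_gb) measurements, oldest first.
    Uses a simple linear fit between the first and last sample;
    returns None if the database isn't growing."""
    (d0, s0), (d1, s1) = samples[0], samples[-1]
    growth_per_day = (s1 - s0) / (d1 - d0)
    if growth_per_day <= 0:
        return None
    return (capacity_gb - s1) / growth_per_day

# Example: grew from 100GB to 130GB over 30 days, on a 200GB drive.
remaining = days_until_full([(0, 100.0), (30, 130.0)], 200.0)
```

A real monitoring tool will do better than two points and a straight line, but even this tells you which servers to worry about first.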

To make matters worse, growth rates aren’t stable.  All it takes is one leftover BEGIN TRAN statement in someone’s SQL Server Management Studio and your transaction log files can grow out of control overnight.  The most out-of-the-box fix I’ve ever heard for this was Dave Stein‘s excellent suggestion to leave a 10GB backup file on your log file drive, and to leave explicit instructions with your sysadmins.  When the alerts come in that the log file drive is running out of space, delete that 10GB file and start taking emergency actions.  No, it’s not the most enterprise-oriented solution, and yes you should work towards something better, but this is a good start for small businesses.
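If you want to automate the canary-file trick, it only takes a few lines. A Python sketch under the assumption that a monitoring script can see the log drive; sizes are shrunk for illustration, and the original suggestion is just a plain 10GB file plus a sysadmin with delete rights:

```python
import os
import shutil

def create_canary(path, size_bytes):
    """Drop a placeholder file on the log drive; delete it later to buy time."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the file to size_bytes, zero-filled

def log_drive_low(drive, threshold_pct=10):
    """True when free space on `drive` drops below threshold_pct percent."""
    usage = shutil.disk_usage(drive)
    return usage.free / usage.total * 100 < threshold_pct
```

When `log_drive_low` fires, the runbook says: delete the canary, then start the real emergency actions while the freed space keeps the log file growing.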

Before you build your own scripts to alert you when the SQL Servers run low on drive space, step back for a moment and ask who else in the company needs this same solution.  Odds are your team needs it for the file servers, backup servers, mail servers, even the web and app servers.  Rather than reinventing the wheel, suggest that the entire team adopt a simple monitoring system.

Month 3: Learning and Implementing Capacity Monitoring

Make a list of things that bit you from behind recently:

  • Running out of space in the data or log files
  • A transaction that gets started, but not committed
  • Long-running Agent job
  • Blocking/deadlocking queries

Set up a development server where you can reproduce those problems at will. Then get demo versions of the monitoring programs you’re interested in, and see how they react to the problems. Are you able to troubleshoot the underlying problems much faster? If so, which one is more pleasant to use?

4. Performance: Know your current bottlenecks and how to fix ’em.

Once the basic infrastructure issues are addressed, you can finally start turning your focus toward real performance tuning. Most database professionals still use tools like Perfmon to check CPU percentages and look at Average Disk Queue Length for drive issues, but there’s a much better place to get started: our free First Responder Kit. We give you a checklist and free scripts to attack performance problems as they’re happening, live.

I didn’t say you had to fix the bottlenecks – I just said you need to know how to fix them.  The actual process of fixing them is a business decision: the business has to be willing to spend the money and time to make their server go faster.  Your responsibility is just to have the answer when the business asks, “Why is the server slow, and how can we make it faster?”

Month 4: Learn Performance Troubleshooting

Go through the First Responder Kit and learn how to use these tools:

  • Wait stats
  • sp_WhoIsActive
  • The First Responder Kit Checklist

This month, you’ll be building experience that will carry you through the rest of your career troubleshooting the toughest issues. Mastering these basic techniques is the key to getting started solving really fun problems.

5. Futureproofing: Preventing availability, security, and performance issues.

When the groundwork is taken care of, you can start looking forward and reacting to tomorrow’s problems instead of today’s.  It’s hard as hell to get here, but wow, the view is fantastic from the top of the pyramid.  When your phone isn’t constantly ringing, you’ve got the time to:

  • Plan upgrades and offer budget options to the business
  • Review T-SQL code before it goes live
  • Learn and grow your skills
  • Train your coworkers and give back to the community

These are the most fulfilling parts of the database professional’s career.  Not only do they make you feel better, they pay benefits forward.  For example, when you’ve got time to learn automation, you can write scripts to save yourself even more time – thereby giving you even more learning time!  When you can offer budget options for the business to make an old, unreliable server go away, you’re more likely to make it happen.  By writing training presentations for coworkers and communities, you force yourself to learn more details.

Month 5: Learning How to Future-Proof the Database Server

Start by making a list of business problems. I’m not talking about “we’re experiencing locking on the Sales table” or “the CPU use is too high” – think about the problems that the business users face. Go walk into the offices where someone is interacting with both the customers AND the database server, and ask them what they’re worried about.

Don’t just ask them what bothers them about the database – ask them what they wish they could do better, or what keeps them up at night. You’ll hear problems like this:

  • “We’re getting ready to add dealers in 3 new cities, and our reports are going to get bigger.”
  • “I’m worried about what happens if this office gets hit by a hurricane this season.”
  • “The vendor isn’t being good about giving us support on this app, and we’re thinking about changing to another vendor.”

Then figure out what SQL Server features, versions, and editions can help them solve their problems – or maybe the answers involve hardware or software changes, too. When you start examining what the business will be doing six months or a year from now, you can be ahead of the curve and fix problems before they get ugly.

Peace of mind: knowing where every database is on the pyramid.

No, that’s not a line on the pyramid because what the database needs is separate from what you need.  See, I used to blame myself if every server wasn’t at the very top of this pyramid.  I got frustrated if I knew how to fix it but I couldn’t get the time/money/permission to make it happen.  After a few years of using the Getting Things Done philosophy at work, I came to terms with the fact that I’ll always have more work than I can handle.  My job is to do the things the business values most, and I just won’t have the time and resources to address everything on every server.

Your job isn’t to bring every database to the top of the pyramid.  Your job is to do what you can with the time you have, in the order the business wants things done.  Make a spreadsheet list of your servers (not databases yet – let’s start easy) and add columns for each level of the pyramid – Backups, Security, Capacity, Performance, Future-Proofing.  Today, you probably have servers where you focus on performance, but you’re not even sure if the backups are good.  It’s time to fix that.

Month 6: Give Yourself a Report Card

After you’ve put in all this work, stop and gauge your success. Make a list of before & after scenarios for each level of the pyramid. When you started, what did the database environment look like? What does it look like now? What work and learning did it take to get here?

Show the business the progress you made over the last several months, and explain that you were able to do it all for free (thanks to the help of friendly bloggers, ha ha ho ho.) This is great evidence for you to get a raise or get a training budget.

Subscribe to Our Free 6-Month Training Plan

We want to train you on this stuff for free. Subscribe to our emails, and every Wednesday for 6 months, you’ll get a lesson with some of the best free resources from around the web. We’ll start with backups, then work our way up to the top of the DBA pyramid. After six months, you’ll feel confident using the word DBA in your job title.

This 6-month series includes links to the best scripts, free training videos, and blog posts from around the web to train you fast and give you homework. You even get coupons for our training videos.

Get started now.


Nine Reasons Developers Should Learn SQL


Let’s face it, there are a lot of cool new things you could be learning right now. It seems like there’s a new technology coming out every 12.8 seconds. Why the hell would you want to spend your free time learning a crufty old language like SQL? My reasons, let me show you them.

It’s portable

Computer science students are taught hundreds of techniques and theoretical constructs while they earn their degree. A lot of that information isn’t directly applicable in day to day programming tasks, but it introduces students to fundamental constructs that they can move between platforms – they have a common vocabulary and tool kit that they can take with them wherever they go.

Although every database vendor implements their own extensions, with every new version the vendors are moving their databases to be in line with the ANSI/ISO SQL standard. Standards compliance, while tricky, makes it possible to take your knowledge from platform to platform. If you learn SQL, you’ll be prepared to move from one database to another.

It never changes

My good friend and business partner jokes that he became a DBA because SQL hasn’t changed in 35 years. This is, largely, true. Vendors have implemented their own extensions that eventually make it into the SQL standard, but the core of SQL doesn’t change. Once you understand the basics of SQL and relational theory (it’s not that hard), you’ll find that you can continue to build on that knowledge and add features and functionality that you were relying on other tools or developers to implement.

It’s an easy place for performance gains

There are only a few places to implement performance gains in an application – the presentation layer, the application layer, and the storage layer. Let’s face it, your code is already well written and well tuned; getting any performance gains there is going to be like getting blood from a stone. The database, on the other hand, is an easy place to make a few simple changes (add an index, change a query slightly) and see tremendous performance improvements. Having spent a considerable portion of my career as an application developer staring at a profiler, I can attest to this. It’s possible to pry performance improvements out of application code, but modern frameworks and toolkits are typically so well-written that the database is usually a better place (read: an easier place) to find low-hanging fruit for performance improvements.
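You can see the “add an index” effect in miniature with any engine that exposes query plans. An illustrative Python sketch using SQLite, with an invented table and index; the point is the plan change, not the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

def plan(sql):
    """Return SQLite's query plan description for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 42"
before = plan(query)  # without an index: a full table scan
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
after = plan(query)   # with the index: a seek on ix_orders_customer
```

On a table with millions of rows, that plan change is the difference between seconds and milliseconds, and it’s a one-line fix.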

It’ll make you a better developer

The Pragmatic Programmer challenges developers to learn a new language every year; not because the landscape is constantly changing but because learning new languages exposes developers to new paradigms. There are different ways of thinking about problems that can lend themselves well to different solutions (I’ve recently learned a lot from diving into functional programming). Learning SQL will teach you to think in sets rather than iteratively. In the long term, this will change the way you think about working with data and lead to improvements in your database code.
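To make “sets rather than iteration” concrete, here’s a toy comparison using SQLite from Python (the table and numbers are invented for the example). Both functions produce the same rows, but the set-based one describes the whole change in a single statement and lets the database do the work:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO employees (salary) VALUES (?)",
                 [(40000,), (55000,), (62000,)])

# Iterative thinking: fetch each row, test it, write it back one at a time.
def raise_salaries_iteratively(conn):
    for emp_id, salary in conn.execute("SELECT id, salary FROM employees").fetchall():
        if salary < 60000:
            conn.execute("UPDATE employees SET salary = ? WHERE id = ?",
                         (salary * 1.1, emp_id))

# Set-based thinking: describe the entire change as one statement and let
# the database apply it to every qualifying row at once.
def raise_salaries_set_based(conn):
    conn.execute("UPDATE employees SET salary = salary * 1.1 WHERE salary < 60000")

raise_salaries_set_based(conn)
totals = [round(s) for (s,) in conn.execute("SELECT salary FROM employees ORDER BY id")]
```

On three rows the difference is academic; on three million rows, the row-by-row version means millions of round trips while the set-based version is one.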

Improve communication across teams

Have you ever tried to talk to someone who spoke your language but spoke a wildly different dialect? Communicating across a language barrier can be difficult outside of work, but it can be outright maddening when the success of a project depends on it. Learning SQL will give you a leg up when you’re communicating your goals to the DBA team. They won’t have to decipher your meaning and you can tell them exactly what you need. In the end there will be less miscommunication, things will get done faster, and you’ll no longer be “that frustrating developer.”

Job security

I hate to say it, but learning SQL may mean that you get to keep your job when Ted in the next cube gets canned during layoff season. The more skills you have and the more job functions you can perform, the more valuable you become to your current employer (and to a future employer). If you can work with the database team and the development team, you’re now a valuable asset that both teams depend on for success.

It’s really not that hard

Contrary to popular belief, SQL is not a difficult language to learn. It’s a different way of thinking, that’s for sure, but it’s not difficult. There are only a handful of commands, operators, and data types documented in the ANSI/ISO standard. While vendors may add their own features, there’s a compact core of knowledge that you can learn and apply everywhere you go.

Know when it’s not appropriate to do something in the database

The database is a phenomenal tool for solving many problems, but it’s also a horrible tool for solving even more problems. To put it another way: you probably shouldn’t be digging a trench with a hammer. By learning and understanding SQL, you’ll be able to make better decisions and move poorly performing code out of the database. In fact, you’ll be able to spot these problems before they’re even problems.

Once you understand SQL, you’ll have a much better grasp on the limitations of an RDBMS. You’ll know which portions of an application can safely live in a database and which will need to be moved further up the stack to a different layer. Some data validations should live with the data, some shouldn’t. Understanding how SQL works will help you determine which rules should remain in the database.

Simplify troubleshooting

Live applications are notoriously difficult to troubleshoot. The more complexity and layers involved, the harder it is to find the cause of a problem. A good understanding of SQL makes it possible to rapidly isolate problems that really do live in the database – or to rule the database out and move on to the other layers of your application.

Want to Learn More?

Where do you go if you want to learn more about SQL?


How to Get Budget Approval for Conferences

#SQLPass
22 Comments

If you’ve never been to a conference before, the PASS Summit and Dev Connections events seem like things that happen to Other People.  This year, I wanna help make you one of us.  You’re a geek, right?  Let’s break this down into numbers.

Lightning Talks at the PASS Summit

What It Costs to Attend a Conference

People usually think conference attendees fall into two buckets: people who paid everything, and people whose company picked up the tab.  Truth is, there’s a lot of gray area, and when you’re getting ready to attend for the first time, you want to aim for that gray area.  You don’t want to pay the full costs plus take vacation time to attend.  We need to make your attendance as easy-to-digest for everybody involved, so let’s break out the three costs to attend:

1. Your salary for the week. I bring this up first because it’s your negotiation tool.  Companies love paying your salary because it’s already built into the budget.  They don’t have to do any extra approvals or paperwork to get you paid.  It’s already happening.  Your goal is just to keep it going during the conference week.

2. The registration fee. Registration for a 3-day conference is around $1,500 for early-bird discounts and around $2,000 for late registrations.  Focus on the $1,500 number because we want to get you registered as fast as possible.

3. The travel & hotels. Companies hate paying for travel, and managers hate asking companies to pay for travel.  There are a lot of opportunities here to save money by staying at cheaper hotels and eating fast food value meals.  Add these two things together, and this is actually a huge benefit for your negotiations.  Watch how this works…

How to Pitch It To Your Manager

Think like a boss: cost #1 is already covered, and they want to avoid touching cost #3.  Let’s make negotiation easier by starting from a place they might just approve immediately:

“Good morning, your highness, how’s it going?  Wow, you look fabulous this morning.  That combover takes fifteen pounds off for sure.  Just fifty more to go!  Comb harder.  Hey, I was thinking about going to training next quarter.  It’s not available locally, and I know it’s hard to get travel approved, so how about if the company just pays for the $1,500 registration and I’ll pay the travel?  Nobody needs to know it’s out of town.  Here’s the list of topics they’re covering, and I’ll brief everybody on what I learned when I get back.”

See what I did there?

  • You erased the money and political problems of travel
  • You didn’t even mention your salary – it’s an assumed win for you
  • You’ve reduced the entire discussion down to $1,500 for a training class, which isn’t unreasonable
  • You didn’t say it was a conference – I know, it’s a little slimy, but negotiations are usually slimy
  • You gave a list of topics the manager wants his staff to learn more about
Don't tell your boss about this part.
Conference Downtime

Now that I’ve got your attention, I have to admit that I’ve skipped a few things.  Before you talk to the boss, you need to come armed.  When you walk into that office, have the following ready:

A printed list of the most high-value sessions you’re going to attend. Don’t just print out the entire session list and dump it on your boss’s desk – cull through it to find the sessions that give the most value to your manager.  There’s no need to tell him that you’ll be attending those professional development sessions or the after-hours vendor parties.  Just keep it to 10 session abstracts max, one page front & back, and for every session, have one sentence about what the business will gain from you attending that session.

The registration link for the session. There is a slim chance your manager will say, “Sure, let’s do it,” and you want to be able to strike while the iron is hot.

Your next half-hour free. He may ask you to fill out some budgetary approval paperwork, and you’re going to need to track down the right people to do it.  Offer to do everything for him – just get the contact name of the person in accounting who manages this kind of thing, and hound that person.  Introduce yourself as the guy who makes sure their servers run.  (I play dirty.)

The Three Most Common Ways to Say No

Here’s some of the most common management objections to sending you to a conference, and how to neutralize them:

“I can’t have you gone for a week.” I’ll have my cell phone with me the entire time.  If there’s an emergency, I’ll leave the conference and walk across the street to my hotel where I’ve got free WiFi, and I’ll spend as long as it takes to fix the problem.  Work always comes first.  (If they continue to object, remind them about the last vacation you took.)

“We’ve already used up our training budget for this year.” Well, I really want to learn this stuff to do my job better.  How about we split the costs – I’ll eat the training cost as long as I don’t have to take vacation for the week?

“Let’s talk about this in a couple of months.” Only if you agree that the company will pay registration.  If I have to pay registration out of my own pocket, I need to have an answer this week because registration cost goes up.

After The Boss Says Yes

Optionally, Buy a Kilt

Register as fast as possible.  Ideally, you’ll get someone with a company credit card to pay the registration fee, but if not, use your own card and submit an expense report right away.  You don’t have to wait for the event to send an expense report.  You worked hard to get to yes for this cost, and it’s the single biggest barrier standing between you and a week of happiness.  Knock ‘er down fast.

During the registration process, you’ll be asked if you want to attend any pre-conference or post-conference sessions.  These are day-long events taught by a single instructor, so they tend to go much deeper into a single topic.  At the PASS Summit 2011 in Seattle, for example, I’m giving my day-long session on Virtualization and SAN Basics for the DBA.  These pre-cons typically cost around $400 and I think they’re a huge bargain.  You don’t have to sign up for these right away, but if you can get the company to pay for one, I’d highly recommend it.  It’s usually an easy sell to bosses, too: “I can spend a day learning about virtualization and SAN storage for just $400.”  Bam, out comes the credit card.

Block that week out in your group calendar.  When people try to schedule team meetings, software releases, projects, whatever, you want to be able to say, “No, I’m going to training that week, and I’ve had it blocked out in my calendar since June.  It’s already paid for.”

Book your airfare.  My personal favorite travel booking tool is Bing Travel because it has nifty sliders to change your arrival and departure times.  I would recommend flying in at least one day early, preferably two if you can afford it.  You’ll either decide to attend a pre-conference session, or you’ll want to join in as other geeks roam around town doing things like photo walks and tweet-ups.  It’s a great way to meet your fellow Twitterers in a relaxed setting, and it pays huge dividends in your career.

Reserve your hotel room, but don’t pay for it.  Hotels will let you reserve a room with a credit card, thereby locking in your room and your rate, without charging your card.  It’s called guaranteeing the room.  You usually have to cancel by 6pm on the day of the arrival, but that’s plenty of time.  By prepaying your room, you get a discount, but you lose flexibility.  If you get the chance to share a room with someone else and they’ve already prepaid their room, you’re out of luck.

Finally, when it works, post a note here in the comments.  Your fellow geeks are just as scared as you are to approach The Boss, and they need encouragement to know it can be done.


Resolving Conflicts in the Database

Replication
11 Comments

Anyone who has worked with replication has run into problems caused by conflicting data. I’ve lost enough sleep over replication to know that having an effective plan for conflict resolution is important to keeping replication running and healthy.

What Causes Conflicts?

There are a few ways that conflicts can show up in a database. In strict transactional replication, it’s possible for there to be a conflict when a row exists in one place and not another.

Let’s say we have a row in Sales.SalesOrderHeader that is present on both our publisher and subscriber. During some maintenance to delete incorrect orders, the row is incorrectly deleted from the subscriber. An update to the order is correctly written on the publisher. When the command to update the record is run on the subscriber, there’s a conflict. The command contains instructions to modify a row that doesn’t exist. Replication will be brought to a screeching halt while operations staff figure out how to get the data back onto the subscriber.

In a peer-to-peer replication scenario, conflicts can happen in even more ways than with transactional replication. Many situations are caused by similar scenarios – data is inserted on both servers, data is deleted from both servers, or updates happen to a row that isn’t present. Peer-to-peer replication can also suffer from a situation where rows are different on all servers. SQL Server’s peer-to-peer replication can be configured to use optimistic concurrency; row versions are added to make sure that SQL Server is updating the version of a row it thinks it should be updating. If the versions don’t match, then there’s a conflict that needs to be taken care of.

To look at it a different way, let’s say we have two SQL Servers, one in Los Angeles and one in Charlotte. There’s a user in each city; Alfred is in LA and Barney is in Charlotte. Both Alfred and Barney start updating information about the same customer (customer ALFKI, version 1). Alfred finishes quickly, clicks “Save”, and goes out for coffee. In the background, Alfred’s write is sent to the SQL Server in LA where the row version for ALFKI is compared to Alfred’s row version. Since everything is correct, Alfred’s data is saved. Meanwhile, Barney is a bit slower, but he gets his data finished and he clicks “Save”. Alfred’s data hasn’t replicated to Charlotte yet, so Barney’s save goes through the same way.

At this time, the databases on opposite sides of the country both contain a row for customer ALFKI that has a version number of 2. The problem is that we don’t know which version is correct. When replication kicks off from one server to another (let’s say from LA to Charlotte), the originating server is effectively going to say “Here’s version 2 of row ALFKI.” The receiving server (Charlotte) is going to respond “Uh oh, we got a problem: I also have a version 2 of row ALFKI.” At this point, replication will come to a screeching halt until someone can come in and fix the problem.
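The version check that trips up Alfred and Barney is easy to sketch. This isn’t SQL Server’s replication machinery – just the optimistic concurrency idea in miniature, using SQLite from Python with an invented customers table: write only if the version you read is still the version on disk, and bump the version on success.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO customers VALUES ('ALFKI', 'Alfreds Futterkiste', 1)")

def save(conn, cust_id, new_name, expected_version):
    """Write only if the stored version still matches what we read;
    bump the version so any later stale write will fail the check."""
    cur = conn.execute(
        "UPDATE customers SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, cust_id, expected_version))
    return cur.rowcount == 1  # 0 rows touched means someone beat us to it

alfred_saved = save(conn, "ALFKI", "Alfred's edit", 1)  # version 1 -> 2
barney_saved = save(conn, "ALFKI", "Barney's edit", 1)  # stale version: conflict
```

Alfred’s write succeeds and moves ALFKI to version 2; Barney’s write, still carrying version 1, touches zero rows, and the application knows it has a conflict to resolve instead of silently overwriting Alfred.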

Let’s Hug It Out

Unfortunately, conflict resolution isn’t as easy as talking about the problem – there’s no way for SQL Server to figure out the best way to solve the problem. There are a few strategies that can be employed to make the process easier.

  • Manual intervention
  • Logging conflicts
  • Master write server
  • Last write wins
  • Write partitioning

Manual Intervention

This is the default way that SQL Server handles replication conflicts. It’s up to operations staff to figure out how the conflict happened and how it needs to be resolved. If there are a number of conflicts, they all need to be resolved before replication can continue. This is a DBA’s worst nightmare and why many DBAs try to stay away from replication.

The more critical the business, the less likely manual intervention is a possibility you want to consider; 3:00AM is unfriendly at the best of times, much less when a critical application is down.

Log Your Conflicts

Rather than require immediate manual intervention, it’s possible to modify the replication stored procedures to log that a conflict has occurred and keep on merrily replicating data.

This approach has some advantages over manual intervention. As conflicts are logged, they can be written to a queue where conflict resolution software can kick into action and attempt to resolve the conflicts automatically before alerting operations staff. Depending on the type of data modifications being made, it’s possible that there could be few or no true conflicts in your application. Conflict detection and resolution logs can be periodically audited to ensure that there are no major problems, or to tune the algorithms to automatically correct a wider range of scenarios.
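A sketch of the log-and-continue idea, again in SQLite with made-up table names: instead of halting when a replicated command finds no row to touch, record the failed command in a conflict table and move on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE conflict_log (cmd TEXT, reason TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'open')")

def apply_replicated_update(conn, order_id, new_status):
    """Apply a replicated command; if the target row is missing,
    log the conflict and keep replicating instead of halting."""
    cur = conn.execute("UPDATE orders SET status = ? WHERE id = ?",
                       (new_status, order_id))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO conflict_log VALUES (?, ?)",
                     (f"UPDATE orders {order_id} -> {new_status}",
                      "row missing on subscriber"))
        return False
    return True

applied = apply_replicated_update(conn, 1, "shipped")   # row exists: applied
logged  = apply_replicated_update(conn, 99, "shipped")  # missing row: logged
```

Replication keeps flowing, and the conflict_log table becomes the work queue that resolution software (or a well-rested human) drains later.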

Master Write Server

This approach makes your network topology look like an airline’s hub-and-spoke system: one lone master server handles all writes. There’s no possibility of write conflicts because all writes happen in the same location. The downside of this approach (as mentioned previously) is that there’s only so much optimization that can be done before a server is overloaded by writes, locks, and replication latency.

Last Write Wins

The last write wins approach assumes that writes are always issued in the correct order and the last write sent to the database is assumed to be correct. Instead of relying on any kind of conflict detection, conflicts are ignored. This approach works well when all data is written in an append-only manner – there are few updates, only inserts of new data. It can also work well when O/R-M code is used to update only the columns that have changed. In these cases, the likelihood of conflicting writes can be greatly reduced.
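Last write wins fits in a few lines. This hypothetical sketch assumes each replica tags its copy of a row with a write timestamp; resolution is just “take the newest” – no detection, no logging, no human at 3:00AM:

```python
# Each replica reports its copy of a row along with a write timestamp.
# Under last-write-wins there's no conflict detection at all: the copy
# with the newest timestamp silently overwrites every other copy.
def last_write_wins(versions):
    return max(versions, key=lambda v: v["written_at"])

la_copy        = {"row": "ALFKI", "email": "alfred@example.com", "written_at": 1002}
charlotte_copy = {"row": "ALFKI", "email": "barney@example.com", "written_at": 1005}

winner = last_write_wins([la_copy, charlotte_copy])
```

The catch is hiding in that timestamp: if the servers’ clocks drift, “last” is no longer meaningful, which is why this strategy is safest for append-only data where true conflicts are rare to begin with.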

Write Partitioning

To be quick about it, write partitioning is a mechanism to ensure that all writes for a single row will always happen on the same server. There are multiple ways write partitioning can be performed. In short, a partitioning key is created by applying a hashing function to a row or application entity. This partitioning key is used to find the server responsible for that chunk of data.
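A minimal version of that routing function, with invented server names: hash the row’s key and use the hash to pick a server. Because the same key always hashes to the same value, all writes for a given row land on one server and can never conflict.

```python
import hashlib

SERVERS = ["sql-la", "sql-charlotte", "sql-chicago"]

def partition_for(key):
    """Hash the row's key and map it to a server. The same key always
    lands on the same server, so its writes can never conflict."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

server = partition_for("ALFKI")  # every write for ALFKI goes here
```

The weakness of plain modulo hashing is what happens when SERVERS changes: adding a fourth server reshuffles almost every key to a different server, which is exactly the problem consistent hashing was invented to solve.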

Fully Partitioning Data

The last way to manage conflict resolution is to fully partition data. Peer-to-peer replication scenarios assume that all data belongs on all servers. What if that isn’t a necessity? It’s always possible to split up the data across multiple servers. Rather than write to multiple master servers and wait for merges to occur, it’s possible to spread data out over multiple servers with very little shared data. This removes most places for conflicts to occur and allows the load to be spread out across several servers.

Which Way is Right for Me?

There is no easy answer for this: it depends on both the application and network architecture. Requirements are very different when writes can occur in multiple data centers as opposed to a single data center. Some applications (recording sensor data) will never run into potential write conflicts, while others (content management systems) have the potential for write conflicts.


Should SQL Server Denali Run on Windows XP?

6 Comments

Microsoft’s Dan Jones wants your opinion.  Right now, here’s the status of the next version of SQL Server:

The current support matrix for OSes is as follows:

  • Windows Vista SP2 or later
  • Windows Server 2008 SP2 or later
  • Windows Server 2008 R2 SP1 or later
  • Windows 7 SP1 or later

The installer is going to block installation on unsupported OSes and it will block unsupported upgrade paths. If the current support matrix for OSes, the current support matrix for upgrade, or the installation blocks will delay your adoption of SQL Server called “Denali”, please provide this feedback to us!

You can provide feedback on his blog post.  I know there are a lot of companies out there who’ve decided to stay on Windows XP, and this can be an issue if you’re developing apps that run on local copies of SQL Server like Express.  Now’s your chance to have your voice heard.


Scaling SQL Server: Growing Out

SQL Server
2 Comments

Scaling up is hard, scaling out is even harder. We all know that we can scale reads by adding some kind of replication or read-only copies of databases or using a massive caching layer. What happens when we have to scale writes?

Borrowing Brent’s grocery store metaphor, a database that is effectively scaled for reads will have ample parking, the aisles will be wide and well lit, and there will be plenty of high capacity shopping carts available for everyone to use. There are even 20 staffed checkout lines. The problem is that goods coming into the store only enter the store through a single loading dock that has space for one truck. Shoppers can get in or out quickly, but there’s no way for new groceries to get to the shelves.

If we were to redesign the grocery store to handle more shipments, we’d need to ask ourselves:

  • How often do we need new shipments? Do we need more shipments or more shipments at the same time?
  • Where do those shipments need to go in the store?
  • Do we need to put similar items together on the truck?

Scaling Out For More Produce

To get more data into SQL Server, we need more places to write. There comes a time when the hardware in a server can’t handle any more writes, no matter how much hardware is thrown at the problem. Writes also have the distinct problem of issuing locks and hanging on to them until the lock passes out of scope (be that an individual statement or an entire transaction).

We can approach the problem a few ways:

  • Writing to one server and reading from others
  • Writing to many servers and reading from many servers
  • Finding a new job

Since this is an article about solving the problem, we can rule out finding a new job right away. That leaves us with two possible solutions.

Write Primary

This is a tried and true way to solve the problem of scaling writes. Since a mix of reading and writing to a single server can cause a lot of locking and blocking, much of the locking and blocking can be eliminated by moving reads to a separate server.

Moving reads to a new server solves the problem of writing quickly enough to our primary server, but it just spreads the problem out to all of the read replicas. Instead of having to troubleshoot locking and blocking problems on a single server, someone will have to troubleshoot locking and blocking problems combined with whatever mechanism we’re using to move data to the read servers. Instead of having one problem, we now have a lot of problems. And, to put a big red cherry on top of this sundae of problems, eventually there will be problems scaling writes on a single write server.

Sharing the Load

Instead of writing to a single server, it’s possible to write to many servers. SQL Server supports peer-to-peer replication, MySQL can be sharded through a driver, PostgreSQL has the Postgres-XC project, and other databases have their own solutions. Clearly this is a problem that people have approached before.

A major difficulty with something like peer-to-peer replication is determining where to write data. Sometimes this is done via a discriminator column. Some attribute of the data is used to determine which server will receive the data. This could be first name, user name, email address, or an arbitrary value assigned at object creation. Schemes like this work well with a finite number of servers and an even distribution of data. Natural keys can have data skew based on regional and linguistic preferences. Splitting the alphabet into even chunks won’t yield an even distribution of data.

The other option is to randomly partition the data. We could assign a random number when data is initially written, but random numbers aren’t entirely random.

If you can’t trust randomness what can you trust? Math.

Hashing algorithms make it easy to take an input of any length and produce an identifiable output of a fixed length. The best part is that some hashing algorithms have the interesting side effect of evenly distributing data (roughly) across that fixed-length output range. Combined with a scheme that minimizes how much data moves when servers are added or removed, this is usually called consistent hashing. However the consistent hash is generated, it’s easy to divide the total range of hashed values into multiple buckets and spread writes out across many databases. This has been discussed at length throughout computer science and in multiple papers, including Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web and Dynamo: Amazon’s Highly Available Key-value Store.
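Here’s a toy consistent-hash ring, with made-up server names. Servers and keys share one hash space; a key belongs to the first server point clockwise from its hash, and each server is placed at many points (“virtual nodes”) so the buckets stay roughly even:

```python
import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: a key belongs to the first server
    point at or after its hash, wrapping around at the end of the range."""
    def __init__(self, servers, replicas=100):
        # Place each server at many points on the ring for even distribution.
        self.ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers for i in range(replicas))
        self.points = [point for point, _ in self.ring]

    def server_for(self, key):
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["sql-la", "sql-charlotte", "sql-chicago"])
before = {k: ring.server_for(k) for k in ("ALFKI", "ANATR", "BERGS", "BOLID")}

# Adding a server moves only the keys that fall into its slices of the ring;
# with plain modulo hashing, nearly every key would have been reassigned.
bigger = HashRing(["sql-la", "sql-charlotte", "sql-chicago", "sql-seattle"])
after = {k: bigger.server_for(k) for k in before}
```

This is the same mechanism described in the Dynamo paper, boiled down: the ring answers “which database owns this write?” without any central coordinator.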

Where Does the Load Split?

Implementing a combination of consistent hashing and peer-to-peer replication isn’t easy. The decision to route a write has to be made somewhere.

  • In the application A layer of code is written to determine how a write will be routed.
  • On the wire Instead of routing writes in our application, writes are sent to the appropriate server by a router that performs packet inspection looking for a hash value.

Both approaches have their merits. Routing in the application can make this process as easy as deploying a new DLL or configuration file. A determined developer could write code to look for new databases and partition writes accordingly. On the flip side, routing writes on the wire makes this routing transparent to any calling applications – no code will need to be changed to add additional databases. Router configuration changes will be required, but the changes can be performed at any time rather than waiting for a regular application deployment.

Specialty Stores

Just as there are specialty stores in the real world, it may become necessary to create special purpose data stores and separate the data. This is frequently called sharding and is a different approach to spreading out load. Rather than assume that all databases need the same data, sharding assumes that the data can be separated across servers by some sharding key. In applications where there is no reporting that performs aggregations across all of the data, sharding can be an easy way to scale out read and write performance. Additional servers are added and new data can be written to them. Problems arise when one shard becomes heavily loaded and data needs to be moved to a new shard. This is a tricky operation at the best of times and requires careful planning.

Where Do I Split the Load?

Designing a scale out strategy isn’t easy. There are many factors to consider, and knowing how to scale out writes is just one thing to think about before designing your overall architecture.


Fixing SQL Server Management Studio’s Tab Text

88 Comments

I hate the way SSMS tabs look by default.  Check this out:

SSMS Tab Defaults

That’s nearly useless.  The tabs are wide, but they still don’t show useful information.  Even worse, I’ve got more tabs than I can fit in the window, so the rest hang out in a meaningless dropdown.

To fix it, click Tools, Options, and go into Text Editor, Editor Tab and Status Bar.  Check out the Tab Text options:

Tools, Options

You can uncheck the database name, login name, and server name because those are shown in the status bar anyway.  Then, if you wanna get fancy, change the status bar location to Top – it’s right above the Tab Text options.  Voila:

SSMS Tabs Fixed

When I’m working, I save my queries in c:\temp with a short descriptive name.  If I’m performance tuning, I’ll save them as Before.sql and After.sql, or maybe Index1.sql and Index2.sql.  Presto, I can easily switch tabs without playing the guessing game.


How to Design Multi-Client Databases

When you’re building an application for lots of clients, there are two common ways to design the database(s):

  • Option A: Put all clients in the same database
  • Option 2: Build one database per client

(There are also hybrids, and I actually prefer one of the hybrid approaches, but I’m keeping this simple for the sake of this post.)

I’ve got clients on both sides of this religious war, and last week I had the misfortune of taking two of them on opposite sides to lunch.  After hearing them passionately debate their choices, I had to put together a blog post talking about the pros and cons of each solution:

Lots of Clients, One Database
The Traditional Solution

Putting All the Clients in the Same Database

It’s simple: just add a Client table at the top of the schema, add a ClientUsers table to make sure people only see their own data, and away we go.
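Sketched out, that schema looks like this – the table and column names are invented for illustration (SQLite via Python so it runs anywhere), but the pattern is the one described above: every tenant-owned table carries a ClientId, and every query filters on it so users only ever see their own client’s rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients     (ClientId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ClientUsers (UserId   INTEGER PRIMARY KEY,
                          ClientId INTEGER NOT NULL REFERENCES Clients,
                          Login    TEXT NOT NULL);
CREATE TABLE Invoices    (InvoiceId INTEGER PRIMARY KEY,
                          ClientId  INTEGER NOT NULL REFERENCES Clients,
                          Total     REAL NOT NULL);

INSERT INTO Clients     VALUES (1, 'Acme'), (2, 'Initech');
INSERT INTO ClientUsers VALUES (10, 1, 'wile'), (20, 2, 'peter');
INSERT INTO Invoices    VALUES (100, 1, 49.99), (200, 2, 150.00);
""")

# Joining through ClientUsers scopes every query to the caller's client.
def invoices_for_login(conn, login):
    return conn.execute("""
        SELECT i.InvoiceId, i.Total
        FROM Invoices i
        JOIN ClientUsers u ON u.ClientId = i.ClientId
        WHERE u.Login = ?""", (login,)).fetchall()

rows = invoices_for_login(conn, "wile")  # only Acme's invoices come back
```

The whole multi-tenancy story rides on that WHERE clause – which is exactly why the “easier security management” point in the per-client-database column below exists.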

Easier schema management. When developers deploy a new version of the application, they only have to make schema changes in one database.  There’s no worries about different customers being out of sync or on the wrong version.

Easier performance tuning. We can check index usage and statistics in just one place, implement improvements easily, and see the effects immediately across all our clients.  With hundreds or thousands of databases, even the smallest change can be difficult to coordinate.  We can check our procedure cache contents and know for certain which queries or stored procedures are the most intensive across our entire application, whereas if we’re using separate databases per client, we may have a tougher time aggregating query use across different execution plans.

Easier to build reports and an external API. If we need to grant access to our entire database for outsiders to build products, we can do that easier if all of the data is in a single database.  If the API has to deal with grouping data from multiple databases on multiple servers, it adds development and testing time.  (On the other hand, that “multiple servers” thing starts to hint at a restriction for the one-database-to-rule-them-all scenario: one database usually means all our load impacts just one database server.)

Easier high availability & disaster recovery. It’s really, really simple to manage database mirroring, log shipping, replication, and clustering if all we have to worry about is just one database.  We can build a heck of an infrastructure quickly.

Each Client Gets A Database
The Trendy Solution

Putting Each Client in its Own Database

Many applications start their lives as internal apps for a single company’s needs.  They realize the application is really valuable, so they start selling access to outsiders.  It’s simple: when your second client signs on, just add another copy of the database and away we go.

Easier single-client restores. Clients are unreliable meatbags.  (Except mine – they’re reliable meatbags.)  They have all kinds of “oops” moments where they want to retrieve all of their data back to a point in time, and that’s a huge pain in the rear if their data is intermingled with other client data in the same tables.  Restores in a single-client-database scenario are brain-dead easy: just restore the client’s database.  No one else is affected.

Easier data exports. Clients love getting their hands on their data.  They want the security of knowing they can get their data out anytime they want, avoiding the dreaded vendor lock-in scenario, and they want to do their own reporting.  With each client’s data isolated into their own database, we can simply give them a copy of their own database backup.  We don’t have to build data export APIs.

Easier multi-server scalability. When our application needs more power than we can get from a single server, we can divide up the databases between multiple servers.  We can also spread out the load geographically, putting servers in Asia or Europe to be closer to clients.

Easier per-client performance tuning. If some clients use different features or reports, we can build a specialized set of indexes or indexed views just for those clients without growing everyone’s data size.  Granted, there’s some risk here – by allowing schema differences between clients, we’ve just made our code deployments a little riskier and our performance management more difficult.

Easier security management. As long as we’ve properly locked down security with one user per database, we don’t have to worry about Client X accessing Client Y’s data.  However, if we just use a single login for everyone, then we haven’t really addressed this concern.

Easier maintenance windows.  In a global environment where customers are scattered around the globe, it’s easier to take customers offline for maintenance if we can do it in groups or zones.

What’s My Verdict?

(No, I wouldn't actually eat this.)
And you thought cake couldn’t get any better.

There’s no one right choice: you have to know your own company’s strengths and weaknesses.  Let’s take my two clients as examples.

Company A excels at hardware performance tuning.  They’re really, really good at wringing the very last bit of performance out of hardware, and they don’t mind replacing their SQL Server hardware on a 12-18 month cycle.  (They refresh web servers every 4-6 months!)  Their Achilles’ heel is extreme compliance and security requirements.  They have incredible auditing needs, and it’s just easier for them to implement bulletproof controls on a single server, single database than it is to manage those requirements across thousands of databases on dozens of servers.  They chose one database, one server, many clients.

Company B excels at development practices.  Managing schema changes and code deployments across thousands of databases just isn’t an issue for them.  They have clients around the world, and they’re processing credit card transactions for those clients around the clock.  They need the ability to spread load geographically, and they don’t want to replace servers around the world every 12-18 months.  They chose one database for each client, and it’s paying off as they start to put SQL Servers in Asia and Europe for their offshore clients.


SQL Server Locking and You!

Did you know that SQL Server’s locking has a name? It’s called two-phase locking. If we’re really getting specific about it, SQL Server uses what’s called strong strict two-phase locking, or SS2PL. We’ll get there in a few minutes; right now, let’s take a look at what makes up two-phase locking.

But First, Some History

The earliest reference to two-phase locking (2PL) that I can find is in Bernstein and Goodman’s 1981 paper Concurrency Control in Distributed Database Systems. The authors examine multiple 2PL techniques for synchronizing transactions, alternatives using timestamp ordering, and integrated concurrency control methods combining 2PL and timestamp ordering, before mentioning some other also-rans in an appendix. (This may also be the first use of the now-tired example involving bank balances and transactions.)

Even though their paper focuses on distributed databases, it’s still valuable because it sets up a common vocabulary for things to come. There’s a lot of theory in here. It’s interesting, but it’s still mathematical theory, and most people glaze over when they see that sort of thing.

The Basics of Two-Phase Locking

2PL works by being explicit about who is doing what to whom. Or, in clearer terms:

(1) Different transactions cannot simultaneously own conflicting locks; and (2) once a transaction surrenders ownership of a lock, it may never obtain additional locks

Locks conflict if they’re not compatible with each other – that is, if both locks are on the same resource and at least one of them is a write lock. (Astute readers will notice this means readers won’t block readers.) Anyone who has used SQL Server for a while will be familiar with what happens when we try to acquire a conflicting lock: we wait. Sometimes we’ll wait for a good long while and the lock will eventually be released. Sometimes we would end up waiting forever (a deadlock) if SQL Server’s deadlock monitor didn’t step in and kill off one of the processes.
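That compatibility rule is simple enough to sketch in a few lines. This is a deliberately simplified model with only shared (S) and exclusive (X) locks – real SQL Server has many more lock modes and a bigger compatibility matrix:

```python
# Simplified lock-compatibility model: only shared (S) and exclusive (X) locks.
# Two locks conflict when they cover the same resource and at least one is X.
def conflicts(lock_a, lock_b):
    resource_a, mode_a = lock_a
    resource_b, mode_b = lock_b
    if resource_a != resource_b:
        return False                 # different resources never conflict
    return "X" in (mode_a, mode_b)   # a write lock conflicts with everything

# Readers don't block readers...
print(conflicts(("row1", "S"), ("row1", "S")))  # False
# ...but a writer blocks everyone on the same row...
print(conflicts(("row1", "S"), ("row1", "X")))  # True
# ...and writers on different rows don't conflict at all.
print(conflicts(("row1", "X"), ("row2", "X")))  # False
```

When SQL Server sees a conflict like the second case, the second transaction simply waits for the lock to be released.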

Why Is It Called Two-Phase Locking?

This process is called two-phase locking because there are two distinct phases. The two rules above hint at them: a transaction can either be acquiring locks (the growing phase) or releasing locks (the shrinking phase); once it starts releasing locks, it can never go back to acquiring them.

In reality, this works more like the following:

  1. A statement is issued by an application.
  2. SQL Server compiles the statement and determines the types of locks that are needed to most efficiently satisfy the query.
  3. Once all locks are acquired, the transaction is in a ready state.
  4. SQL Server will begin operations and release locks as appropriate.

If you’re really playing along at home, at some point you’ll figure out that lock acquisition and release varies by isolation level and results in the various phenomena that you see in each of the isolation levels. If you noticed that on your own, give yourself a gold star. If you didn’t, you’re normal.
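The growing/shrinking rhythm above can be sketched as a toy model. This is an illustration of the 2PL rule itself, not how SQL Server’s lock manager is actually implemented:

```python
# Toy model of the two 2PL phases: a transaction may acquire locks (growing
# phase) until it releases its first lock; after that (shrinking phase),
# acquiring anything else would violate two-phase locking.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips to True on the first release

    def acquire(self, resource):
        if self.shrinking:
            raise RuntimeError("2PL violation: can't acquire after releasing")
        self.locks.add(resource)

    def release(self, resource):
        self.shrinking = True    # the growing phase is over for good
        self.locks.discard(resource)

txn = TwoPhaseTransaction()
txn.acquire("row1")
txn.acquire("row2")
txn.release("row1")          # first release ends the growing phase
try:
    txn.acquire("row3")      # illegal under 2PL
except RuntimeError as e:
    print(e)
```

The stricter variants discussed below (S2PL and SS2PL) just tighten the rules about *when* the shrinking phase is allowed to start.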

Deadlocks

Deadlocks are, sadly, a byproduct of a 2PL mechanism. Most of the literature talks about things like transaction graphs (or waits-for graphs) and edges. The Great Triumvirate illustrates this perfectly.

Original source via http://www.flickr.com/photos/charmainezoe/5307975564/
Deadlocks, circa 1836

Daniel Webster is grooming his eyebrows in a fashion that almost became known as ‘the Webster’. He’s impatiently waiting for Henry Clay to finish with the funny pages, but he won’t release his comb until he has them. Henry Clay is reading what passes for the funny pages in 1836 (hint: it’s the New Yorker). Henry is well pleased with himself, but he’s waiting on John C Calhoun to relinquish the volumizer before he will give up the funny pages. John C Calhoun is volumizing his hair to lofty heights, but he really wants Daniel Webster’s eyebrow comb. Everyone is waiting on something from everyone else, and nobody will give up first. This is a deadlock. Okay, maybe that’s not really a deadlock, but it’s better than a waits-for graph.

A combination of techniques can be used to determine which transactions will be killed off. SQL Server typically picks the transaction that’s the least expensive to roll back as the deadlock victim. SQL Server avoids some problems, too, by not attempting to retry transactions.

A Waits-For (or Deadlock) Graph
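The Great Triumvirate’s standoff is exactly what deadlock detection looks for: a cycle in the waits-for graph. Here’s a small sketch of the idea – my own toy detector, not SQL Server’s deadlock monitor:

```python
# The Webster/Clay/Calhoun standoff as a waits-for graph: an edge A -> B means
# "A is waiting on a resource that B holds." A cycle in this graph is a deadlock.
def find_cycle(waits_for):
    visited, on_path = set(), set()

    def visit(node):
        if node in on_path:
            return True          # back-edge: we've walked into our own path
        if node in visited:
            return False
        visited.add(node)
        on_path.add(node)
        if any(visit(n) for n in waits_for.get(node, [])):
            return True
        on_path.discard(node)
        return False

    return any(visit(n) for n in list(waits_for))

graph = {
    "Webster": ["Clay"],     # wants the funny pages
    "Clay": ["Calhoun"],     # wants the volumizer
    "Calhoun": ["Webster"],  # wants the eyebrow comb
}
print(find_cycle(graph))                              # True - a deadlock
print(find_cycle({"Webster": ["Clay"], "Clay": []}))  # False - plain blocking
```

Plain blocking (a chain with no cycle) resolves itself when the head of the chain finishes; a cycle never resolves until someone kills a victim.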

Strict Two-Phase Locking

So far, we’ve only talked about 2PL, but I said SQL Server is SS2PL. Between the two sits Strict Two-Phase Locking, or S2PL – like 2PL, but with a few more rules.

To be considered S2PL, a transaction has to follow the rules of 2PL (sound like normalization rules?). In addition, a transaction may only release its write locks after it has ended – either committed or rolled back. Interestingly, nothing is directly said about read locks in S2PL; they can be released during the transaction once they’re no longer needed.

Strong Strict Two-Phase Locking

SS2PL (called S2PL in Concurrency Control and Recovery in Database Systems) requires that all locks – read and write – are only released after the transaction is finished and has been committed or rolled back. SS2PL provides serializability: database transactions appear as if they are atomic and occurring in complete isolation from one another. Serializable transactions are interesting because for a database to truly be serializable, it should be possible to process transactions in any order, as long as the effect is the same as that of some serial order (not any serial order, just some serial order).

Why the Devil Should I Care?

Locks are the primary way that SQL Server manages concurrency, and that’s a limitation inherent to the database. New CPUs, a SAN, more disks, solid state hardware, and more RAM will not remove locking, blocking, and deadlocks from the database. New CPUs, a SAN, more disks, solid state hardware, and more RAM will make locking, blocking, and deadlocks happen faster. They may happen so fast that you don’t really notice the problem until it’s growing out of control.

Combine a healthy knowledge of how locking operates with a working knowledge of isolation levels and some allegedly insurmountable application problems can be resolved through simple changes in the data layer.


Top 10 Keys to Deploying SQL Server on VMware

Virtualization

The most successful deployments of virtual SQL Servers have a few things in common.  When a company is doing all of these things, odds are they’re going to be very happy with their virtual SQL Servers:

10. Use vSphere 4.1, not earlier versions. vSphere 4.1 introduced major improvements in multi-CPU scheduling that vastly improve performance.  SQL Server will break queries out across multiple virtual CPUs, and it expects all of those vCPUs to perform identically.  vSphere 4.1 helps make that happen.  You can read more about the vSphere CPU scheduler here.

9. When using blades, avoid overcommitting switch uplinks. Virtualization works great with blades – but only when the SAN and network teams monitor utilization rates between the blade chassis switches and the rest of the network.  All too frequently, the SQL Server is thrown into a blade chassis with dozens of other servers, all competing for the same small amount of storage and network bandwidth leaving the chassis.  Everyone looks at their monitoring systems and can’t understand why the backups, big queries, and maintenance jobs run so slowly, and the culprit is the hidden small uplink.

8. Avoid 1Gb iSCSI for storage. While this cheap and easy-to-use storage works great for most servers, it doesn’t work as well for SQL Server.  If you’re happy with the speed of SQL Server running on local storage, you can be in for a rude surprise with 1Gb iSCSI.

7. Test storage performance before deployment. I’m partial to SQLIO, which might be the worst-named tool in history.  It has absolutely nothing to do with SQL Server – it doesn’t require SQL Server, and it doesn’t mimic SQL Server.  It just flat out hammers IO using whatever parameters you pass in.  Using a test file of the same size as the expected database, the storage should consistently exceed 100MB/sec whether it’s random or sequential, reads or writes.  Higher numbers than that may be required due to the application design or end user expectations, but that’s a good foot-in-the-door number.

6. Compensate for slow storage with memory. If you can’t get the storage performance you need, you can help by caching as much of the database as possible in memory.  If the projected database size is 50GB but the virtual server only has 16GB of memory, then it’s not going to be able to cache the entire database.  Perhaps the users won’t be querying old data, in which case you might be able to get by with less.

5. Ensure CPU power saving is turned off. While Intel’s latest Xeon processors provide impressive power savings, they won’t ramp up to full processor speed unless the CPU is under heavy load.  SQL Server will rarely push the processors that hard, which means they stay slow – sounds good in theory, but in reality, every query takes 70-100% longer than it did pre-virtualization.  If you care about query performance, turn this setting off in the hardware BIOS.  You can read more about the power-saving CPU issue here.

4. Coordinate VMware reservations and shares with SQL Server settings. VMware vSphere has a great set of options to ensure that a guest OS gets the right amount of resources for its needs.  These settings need to be set in concert with SQL Server’s min and max memory settings.  There’s plenty of bad advice out on the web saying things like, “Just disable the balloon driver and give SQL Server all the memory” – that’s not right either.  There’s a happy medium in letting SQL Server and VMware cooperate to balance resources across multiple guests, but it only works when these settings are aware of each other.

3. Use Resource Pools to track SQL Server licensing. Microsoft’s virtualization licensing for SQL Server is notoriously complex.  Starting with SQL Server 2008 R2, only Datacenter Edition ($60k/CPU socket) provides unlimited virtualization rights.  Enterprise Edition ($30k/CPU socket) provides just four VMs.  Tracking these closely with VMware Resource Pools can result in huge cost savings.  If SQL Servers are allowed to move to any host in the VMware cluster, then a licensing audit can produce staggering costs.

2. Use VMware HA for high availability. If your users can tolerate a SQL Server outage of 2-3 minutes, VMware HA is much easier to manage than a SQL Server cluster.  If your users require less than 30 seconds of downtime, consider implementing a physical SQL Server cluster instead.  SQL Server clusters are tricky enough to manage on their own, and doing them inside VMware adds an impractical level of management on servers that can’t be down for more than 30 seconds.

1. Virtualize small SQL Servers first. Start by gaining experience with 1-2 vCPU servers with under 16GB of memory.  As the company’s sysadmins grow accustomed to how SQL Server uses CPU and memory in a virtual environment, they’ll be more confident virtualizing larger servers.  If the DBAs and sysadmins don’t get along when trying to pin down performance problems on smaller servers, they’re going to be very adversarial when dealing with larger servers.
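To make #7 concrete: the whole idea behind a tool like SQLIO is to hammer a file with reads or writes and report throughput. Here’s a tiny Python sketch of that idea – illustrative only, not SQLIO, and the file size, block size, and duration are arbitrary stand-ins (a real test would use a database-sized file on the actual storage):

```python
# Illustrative IO benchmark sketch: hammer a test file with random-offset
# reads and report MB/sec. This is NOT SQLIO - just the concept.
import os
import random
import time

FILE = "testfile.dat"
SIZE = 16 * 1024 * 1024   # 16MB test file (use a database-sized file for real)
BLOCK = 64 * 1024         # 64KB reads, a common SQL Server IO size
SECONDS = 1

with open(FILE, "wb") as f:           # build the test file up front
    f.write(os.urandom(SIZE))

bytes_read = 0
start = time.time()
with open(FILE, "rb") as f:
    while time.time() - start < SECONDS:
        f.seek(random.randrange(0, SIZE - BLOCK))  # jump to a random offset
        bytes_read += len(f.read(BLOCK))

elapsed = time.time() - start
print(f"{bytes_read / elapsed / 1024 / 1024:.0f} MB/sec random read")
os.remove(FILE)
```

Note that a small file like this will be served mostly from the OS cache, which is exactly why real storage tests use a test file as big as the expected database.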

Wow – That’s a Demanding List!

I know what you’re thinking: “This outside consultant is greedy and expects everybody to dump tons of money into their SQL Servers.  He’s asking for the moon.  Nobody does this in real life.”

Let’s rewind back to the beginning of the recommendations where I said, “When a company is doing all of these things, odds are they’re going to be very happy with their virtual SQL Servers.” You can certainly skimp on some of these items and still stand a pretty good chance of being happy with your SQL Server performance.

The more items you skimp on, the worse your chances become.  If you implement a virtual SQL Server with 32GB of memory trying to cache 250GB of databases on RAID 5 storage, hooked up via 1Gb iSCSI, with no prior experience virtualizing SQL Servers of that size, odds are you’re going to be miserable.  Users will scream and complain, and you’ll bring in an outsider who will track the problems back to these types of recommendations.  Keep my contact info handy.

Our Training Videos: VMware, SANs, and Hardware for SQL Server DBAs – our 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview:

https://www.youtube.com/watch?v=S058-S9IeyM

Buy it now.


I Think You’re Exceptional. No, Really.

Professional Development

A funny thing happened on my way to becoming a judge in the 2011 Exceptional DBA Awards.  But to get there, I gotta tell you a story about my dark past.

I used to be a developer.  No, wait, it gets worse: I coded web pages in Classic ASP, aka VBscript.  My idea of reusable code was stuff that I could copy/paste into multiple web pages.  I didn’t have a QA team – our idea of testing was to run it a few times on our own machines, and then keep an eye on it after we deployed it to production.  To me, the word exception will always remind me of exception handling, one of the many things I didn’t do well as a developer.

The bad news was that my code sucked, but the good news was that I knew it.  I wasn’t destined to be a developer, and I started transitioning into database administration instead.  I loved SQL Server, hardware, and performance tuning, so I began interviewing for jobs that would let me focus on that.

Along the way, I had a job interview that didn’t go well.  It was a company with offices in multiple states, and part of the interview involved a phone discussion with the head DBA in another state.  The DBA asked me, “If you wanted to monitor for exceptions and problems on your SQL Servers, how would you do it?”

I answered immediately without so much as a thought: “I’d buy an alerting package from a vendor.”

The DBA drilled down deeper, asking me how I’d build an alerting system with things like DMVs and SSIS, but I refused to budge.  I explained that my code simply sucked, and that it would take me way longer to build a collection utility – and even when it was done, it’d be garbage.  The DBA asked if I would use any of the freely available scripts at places like SQLServerCentral.com, and I could hear the DBA’s anger and dissatisfaction through the phone at my replies.  I just didn’t want to hassle with rolling my own software.

Needless to say, I didn’t get the job.  However, that experience pushed me harder to hone my skills. In my epic post Rock Stars, Normal People, and You, I documented my grueling struggle to take control of my databases, my career, and my life.  It was a long, hard road, and it all started with one simple thing:

Guess which one of us is not in IT.
Working in IT for a wine company is hard. Really.

I spent time in the SQL Server community.

I started reading blogs, watching webcasts, and attending presentations.  As I got my confidence up, I realized I could give back too, so I started presenting my own stuff.  The more I gave back, the more addictive it became.

I know you’re exceptional too – because you’re here.

When I go out to conferences, SQLSaturdays, and user groups, I ask people what blogs they read.  Most of ’em don’t have time to read blogs.  You’re already in the minority just by being here – you’re taking time out of your day to advance your knowledge.  You’re not getting paid to read my blog.  You’re doing this because you love what you do, and you love to learn.  That’s exceptional.  It’s outside the norm.  Sure, I know, you think you’re just one of thousands here in the online community, but it’s easy to forget that simply by being here, you’re already ahead of the curve.  You’re ahead of the tens of thousands of database professionals who never take the time to read blogs, watch webcasts, or attend presentations.

Red Gate just launched their annual Exceptional DBA Awards contest to honor people like you.  Yes, you.  And I want you to go enter.  Stop comparing yourself to the people up on the podium, and start comparing yourself to the people who never even bother to show up to free conferences in their own town.  You’re exceptional.  Stop thinking of it as blowing your own horn, and start thinking of it as being proud of what you do.  I know you love what you do, because you’re here reading this blog.

Trust me – I’m judging the contest.

I’m honored to say that Red Gate invited me to join Brad McGehee, Rodney Landrum, and Steve Jones in judging the 2011 Exceptional DBA Awards. We’re looking for people who don’t just do their jobs, but they go above and beyond – spending time reading blog posts, answering questions online, and giving back to others.

I’m really honored to be a judge because it’s a champagne moment for me.  See, several years ago, when I was struggling through that job interview with The DBA?  That guy was Rodney Landrum.

He was right to turn me down, because at the time, I wasn’t exceptional.  I didn’t read blogs.  I didn’t know DMVs well.  I didn’t give back to the community.  But you, dear reader, are already far ahead of where I was.  You’re exceptional, and it’s time you threw your hat in the ring.

Go visit ExceptionalDBA.com today and learn more about this year’s contest.


What Makes a Good Conference Session or Keynote?

Writing and Presenting

One of our FreeCon Chicago brainstorming exercises was to talk about what makes a good training session, conference session, or keynote speech.  I started it by asking a few questions.

And yes, these are the cool kids.
Sign of a bad session: all the cool kids would rather stay out in the hallway.

Does a successful session require a packed room? I was so happy to hear the attendees answer, “No.”  A packed room has absolutely nothing to do with the success of a session: it has to do with the conference schedulers picking the right room size for a given topic, the abstract selection committee picking the right abstract for the audience, and the speaker’s marketing ability in getting the word out.  You can’t judge success by the population in the room at the end of the session, either, because many attendees won’t leave mid-session out of sheer politeness.

Does a successful session require demos? The attendees universally answered, “NO!”  The best explanation I’ve seen comes from a post Jeremiah shared in our community newsletter: Why how is boring and how why is awesome by Benjamin Pollack.  No, most audiences don’t really want to watch you click and type and fix typos, but even if they did, conference rooms are horrible, awful places to watch demos.  You can’t see the screen well, you can’t take notes fast enough, and you need a step-by-step reference that you can follow along later anyway.  That’s not to say presentations with demos aren’t successful – indeed, they can be.  It’s just that demos aren’t required to be successful.

Does a successful session require slides? Again, the answer was simple: “NO!”  We talked about some all-demo sessions that were spectacular, Buck Woody’s sessions where all he used was a whiteboard, and panel discussions.  We even liked the Actor’s Studio style where a session is nothing more than a very well-conducted interview.

I have presentations that are on both extremes: my Virtualization & SAN Basics presentation is 100% slides, and my Blitz: SQL Server Takeovers presentation is 100% demos.  Every now and then, a fellow presenter will come up to me afterwards and say (with more than a little disdain), “I noticed that you didn’t use any (slides/demos).  Do attendees ever leave bad feedback about that?”  I totally understand their point of view because presenters are used to certain delivery mechanisms, but instead of the tools, we need to focus on the storytelling.  It’s a tough concept for us technology people to get because our very business is tools.  Instead, we have to take a step back and ask the audience what they’re really here for.  At FreeCon, the answer from the attendees was loud and clear.

A successful session requires one thing: engagement. Attendees have to feel that they’re interacting in some small way.  They want eye contact from the presenter, but much more than that, they want to feel a sense of belonging and bonding with both the presenter and their fellow audience members.  They want a session that engages their brain, shows them something interesting and new, and gives them something to talk about.

Because they serve beer.
My Kind of Keynote

Think about how you engage at the ball game. Whether it’s our kids playing soccer or a visit to a baseball/basketball/football/drinking game, we engage.  We talk back to the announcer on the loudspeaker, we yell at the players, and we share our feelings with the people sitting next to us.  If we’re lucky, we interact directly with the players by catching balls or catching their eye as we sit courtside.  We build up rituals like the seventh inning stretch.

Presentations are spectator sports. We pay for tickets (sometimes), root for the home speaker, share our thoughts on Twitter, and hope to catch a thrown t-shirt.  Engagement gets harder as audiences get bigger, but it’s still possible.  As I walked into my “Tuning T-SQL Step by Step” presentation at Connections Orlando this spring, I realized my session had been moved to the developer track, not the typical SQL Server track.  When the big room filled up, I took a show-of-hands poll to see the mix of developers versus database administrators.  Since it turned out to be a diverse audience, I engaged the audience throughout the presentation by pitting them against each other.  I’d say things like, “Well, you developers know how those DBAs are – they’re control freaks, aren’t they?”  I tried to pick on (and promote) both sides evenly so that everyone in the room would feel like I’d taken their side at least once.

If it’s a blowout or a bad session, we vote with our feet. When I first started going to conferences, I heard the experienced veterans say the same thing over and over: “I’m skipping the morning keynotes – they suck.”  I understood the motivation – many of us partied late into the night – but in my wide-eyed naïveté, I showed up each morning hoping to see the home team knock it out of the park.  Unfortunately, many of the keynotes I’ve attended have just plain sucked.

Which brings me to a question for you: what should we tell new speakers?  What makes a good session or keynote?


PASS Summit 2013 Location: Charlotte, NC!

#SQLPass

This morning, the Professional Association for SQL Server announced that the annual Summit will be held in Charlotte, North Carolina.  For the last 5 years, it’s been held in Seattle, and PASS members have protested.  We wanted the Summit to move around from place to place to make it easier for other people to attend and to let us see some different scenery.  (I mean really, how many years can we try to talk our spouses into going to Seattle in the fall when it’s cold and rainy?)  This year, it’s moving!  I’m so happy to hear this.

You can read the announcement at SQLPass.org.


An Inside Look at Our PASS Summit Submissions

#SQLPass

The Professional Association for SQL Server (PASS) holds a summit every year where thousands of database professionals gather to learn about the latest developments from Microsoft, meet their fellow community members, and drink Jägermeister. This year, there’s a new way to help shape the PASS Summit – you can vote on the sessions you’d like to see.

ROOOOOXANNE…you don’t have to put on your red light….

Here are the sessions we’re submitting this year, along with some personal notes about why we’d like to present ’em.

“And Then It Got Even Worse”… Great Mistakes to Learn From (Panel Discussion- Vote)

Brent says: Nobody calls me when things are going well. When my phone rings, the poop has already hit the fan. My favorite disaster was when we found out that the redundant air conditioners…weren’t. This wouldn’t be so bad except that it was in the middle of summer in South Florida. It was a Sunday, so it took us a while to drive into the office. We tried opening the datacenter doors to help cool things off, but then we realized that since it was Sunday, the office air conditioners were turned off too. Good times. In that story, I’ll talk about why RAID 5 still isn’t enough for serious data protection.

Tim says: I was one of those accidental DBAs everyone talks about 12 years ago. Then things got busy, really busy. Now I find myself going back and fixing all the poor decisions made by the DBA who was learning on the job: Me.

Jeremiah says: I’ve written my share of bad code, recovered downed servers on 2 hours of sleep, and had every SQL Server upgrade go horribly wrong. It’s good to learn from your own mistakes, but it’s even better to learn from someone else’s.

Kendra says: It’s funny, when everything goes terribly wrong, you’ve got to stay very calm but also think fast. More than once I’ve had a situation go south where I thought of a story someone told me– and it saved the day.

BLITZ! The SQL – More One Hour SQL Server Takeovers by Brent (Vote)

Brent says: “You’re the database guy, right? We’ve got this SQL Server that’s been under Johnny’s desk. Everybody’s using some app on here. It’s yours now. Kthxbai.” DAMMIT, that sucks. I hate those moments as a DBA. Suddenly I had a completely new – or rather, completely old and busted – server that I had to manage. No backups, no documentation, no security, all problems. I built my Blitz script to take over servers faster and easier, and last year at the 24 Hours of PASS, I shared it with you in an all-demo session. Today, I’m a consultant, and I have to rapidly assess SQL Servers several times a week. I’ve built a new version of the script that does it harder, better, faster, stronger, and if this session is picked, I’ll give it to you at the Summit.

Tim says: I’ve been using variations on this script for a couple years now. It continues to save my bacon and make me look like a superstar in the office.

Jeremiah says: I had a collection of scripts for years that did something almost like what the Blitz script can do. I can’t say enough great things about this session.

Kendra says: ZOMG, he’s giving this away?

I don't blame him
How It Looks for Us (yes, Robert Davis is hiding his face for shame because he’s in Brent’s session)

Consolidation is NOT a 4-Letter Word by Tim (Vote)

Tim says: I’ve been given a VM farm to fill with SQL Servers. So how do I decide what instances and databases I’m virtualizing and consolidating? I USE THE MATHS! Dynamic Management Objects, system objects, and the like are all tools I’m using in the process. I’m not going to talk about how to consolidate. This is about how to come up with the numbers to work your load balancing Mojo.

Brent says: I got a four-figure bonus from my boss’s boss after I showed how to consolidate a bunch of crappy old servers onto one brand new server. I saved over half a million dollars in licensing, made my own job easier, and paid off my Jeep. How cool is that?

Jeremiah says: When I was a production DBA, I frequently said “The less I have to manage, the better.” Consolidation made my job much easier because I had fewer licenses to worry about, fewer servers, and fewer things that could break.

Kendra says: Consolidation is terrifying. What if it all goes horribly wrong in six months, which is when you can’t get more hardware? What you need is a method to figure things out.

How StackOverflow Scales with SQL Server by Brent and Kyle Brandt (Vote)

Brent says: I remember back in 2007 when I went to my first PASS: I loved the session on MySpace’s SQL Server infrastructure. I’ve been helping out with StackOverflow’s databases for a couple of years now, and one day I was just thinking to myself, “Wow, this stuff’s getting pretty big. Maybe we should talk about it at PASS?” I asked Kyle Brandt (of Server Fault – @KyleMBrandt) if he’d like to co-present a session with me about the magic behind the scenes.

Kendra says: This session is so full of win. I’ve worked with Brent and Kyle together and they’ve mastered a complex and dynamic environment. They’ve faced common challenges that happen to many companies, and they’ve managed to conquer the problems without breaking the bank.

No More Bad Dates: Best Practices for Working with Dates and Times by Kendra (Vote)

Brent says: I remember the first time my company grew into multiple time zones. Suddenly GETDATE() on one server didn’t match GETDATE() on another server, and all our reporting tables had to be reworked. Whoops. <sigh> Now what? Do I store it in UTC? Do I change all my server clocks to UTC, or do I use offsets? Why is this stuff so hard?

Jeremiah says: Temporal data has always been tricky to deal with. The new data types make life much easier for DBAs, developers, and end users alike.

Kendra says: After working with T-SQL for many years, I realized there are a lot of intricacies to dates and times that you don’t see until you’ve mucked things up a bit. It all seems easy from afar, but once you get your hands dirty it’s a different story. Who knew language settings were a big deal for dates? Who knew computed columns with dates were picky? What do you mean, TIMESTAMP isn’t a stamp with a time? I pulled together all the wackiness, gotchas, and oh-look-at-thats into this talk to save *you* some time.

Performance Through Isolation: How Experienced DBAs Manage Concurrency by Kendra (Vote)

Concurrent something-or-others

Brent says: WITH (NOLOCK) only gets you so far, buddy. Time to cowboy up.

Tim says: Is ANTI-SERIALIZABLE an isolation level?

Jeremiah says: I like it when I get the rows in wrong the order, or twice, or twice, because of isolation I like it when I get level tricks.

Kendra says: Each time I give this presentation I’m amazed at how important it is to manage transaction isolation properly in an active environment, and how easy it is to misunderstand how things work– I’ll give you steps to manage concurrency that will save you loads of trouble.

Rewrite Your T-SQL for Great Good by Jeremiah (Vote)

Jeremiah says: Rewriting T-SQL is not the most glamorous way to make your database faster, it’s not always the easiest way to make your database faster, but it’s frequently the most rewarding way to make your database faster. Over the course of my career I’ve written a lot of bad T-SQL. I’ve re-written even more bad T-SQL to make applications run faster. Sometimes I even had to rewrite the code in ways that didn’t make sense to me as a developer but it made perfect sense for the database. I learned a lot but I always felt like there should be more information about this topic.

This session focuses on real problems I’ve faced, real patterns I’ve uncovered, and real solutions I’ve used to make things faster.

Brent says: Just because your T-SQL runs doesn’t mean it runs fast. Most of the time, this is completely okay, but every now and then you need to rewrite a working query to make it fly. That’s where Jeremiah comes in: we do this constantly. Learn from people who make a living doing this.

Tim says: So that trigger I recently ran across, the one with a WAITFOR DELAY added to resolve a timing issue with a GUID Primary Key in a child table, may have been bad code? (And unfortunately I’m not making that up for the sake of humor. Sadly, they had removed the GUID Primary Key over two years ago but left the unnecessary WAITFOR DELAY behind. Easily fixed with a couple of hyphens.) I’m sure Jeremiah will lift heavier weights in this session.

Kendra says: This isn’t about putting lipstick on a pig– it’s about making your pig glow with health. Jeremiah will teach you more than tricks or how to use flashy features. Instead, he’ll teach you how to systematically improve your application performance.

Rules, Rules, and Rules by Jeremiah (Vote)

Jeremiah says: I like digging into something and figuring out how it works. When I first started working with databases, I thought they were a pretty easy way to store and retrieve data. The more I learned, the more I became fascinated by the way they worked. I discovered that the most fascinating part of a database wasn’t really the database itself; it was the underlying software. To keep learning and understanding more about how databases work, I dug into the computer science that makes them tick.

There’s a lot of theory out there, some of it’s only applicable when you’re writing a database, but a lot of it can be applied and can help you make your life as a database professional easier. If you’re half as interested in this as I am, you’ll get a kick out of this session.

Brent says: if you like Dr. DeWitt’s keynotes diving into the technology side of databases, but you’re not quite sure how to relate that back to your own job, this is the session for you. Jeremiah likes reading database source code in his spare time, and he can show you how things like drive rotational speeds influence your schema designs.

Kendra says: Jeremiah’s unusual because he’s been a full time developer, a full time DBA, and a full time consultant. This has given him a big-picture view of systems from the spindles to the compiler, but with a practical twist. This is a great talk, and it’ll set you up to engineer better applications.

SELECT Skeletons FROM DBA_Closet by Tim (Vote)

Brent says: Experience is a fancy word for someone who’s made a lot of mistakes. Tim’s experienced, and he’ll talk about his educational opportunities so you can be experienced too – but without all the burn marks.

Tim says: I was the King of Cursors and Prince of Linked Servers when I was learning by failing successfully earlier in my career. This public trousering hopefully provides some redemption. If not, then it wouldn’t be the first time I’ve made a fool out of myself in public.

Jeremiah says: I can’t wait to learn about all of the skeletons that Tim is going to bring into the open, mainly so I can laugh along with him at the mistakes we’ve both made.

Good coffee, great conversations, amazing friends.

Seven Strategic Discussions About Scale by Kendra (vote)

Brent says: Want to amp up your career? Like reading stories on HighScalability.com? Kendra’s got experience with huge systems and a talent for communicating. If you’re frustrated with a crappy app design, she can help you convince your fellow developers, DBAs, and the people holding the checkbook. It’s really, really hard to find good training on big, high-performance systems, and you can get it for free in this session.

Jeremiah says: The best way to learn about scaling is to do it wrong and then do it right. The next best thing is to learn from someone who’s worked on massive databases and who is willing and able to share that information with you.

Kendra says: After years of experimentation, I’ve figured out how to sell a good idea: honestly, but with great data and timing. For any given issue, there are concrete things you can do to be more persuasive. Here’s how to find the right change for your systems, and the data which will get your message through.

Storage Triage: Why Your Drives Are Slow by Brent (Half-Day Session – Vote)

Brent says: This year, PASS let us submit 4-hour sessions for the first time. I’m really excited about this because there are some sessions I just can’t cover in an hour. Very often when I’m called into clients, the SAN administrator is getting thrown under the bus. I need to quickly figure out who’s really at fault – is storage slow, or is storage slow because we’ve got memory or query problems? In this session, I’ll explain my diagnosis decisions to help relieve storage pains, and I’ll give attendees a poster with my decision tree.

Jeremiah says: I wish I’d had a session like this when I started learning about storage. Heck, I still wish I could go to a session like this. Storage is a critical component of SQL Server performance – getting it right is key.

Kendra says: Trust me: you want to be friends with your SAN admin. This talk will set you up so you only bring your SAN administrators problems that they can solve, with data that makes sense. If you can do that regularly then you’ll have an ally on your side when things go wrong, and that can really save the day.

The Database is Dead, Long Live the Database by Jeremiah (Vote)

Jeremiah says: We’ve all supported application features that we knew shouldn’t be in a relational database. Sometimes, you just have to suck it up and do what you can to keep that SQL Server running, right? It turns out that there are a lot of other ways to get things done and some of them work a lot better than storing data in SQL Server. This is something that I learned the hard way while performance tuning different applications – you need to pick the right tool for the job. I’ve struggled with trying to make a solution fit into a relational database and perform well. Sometimes, it just doesn’t work out.

You’re going to love learning about the mistakes I’ve made and how I’ve solved them, sometimes in novel ways.

Peanut Gallery
Brent and Kendra at the keynote bloggers’ table (or as we call it, the peanut gallery)

Brent says: You’ve got that one app you keep swearing at because it performs horribly in SQL Server. It uses too much space, it tries shredding XML in every query, or it never gets queried, period. Turns out it shouldn’t be in SQL Server to begin with, and Jeremiah can help you see the alternatives. They’re a lot easier (and cheaper) than you think.

The Other Side of the Fence: Lessons Learned from Leaving Home by Jeremiah (Half-Day Session – Vote)

Jeremiah says: I worked at Quest for a while as an Emerging Technology Expert. My job was to take a look at different databases and investigate how they could benefit businesses. I made some mistakes and had to re-learn a lot before I could start making sense of it all. There’s more to it than learning new features and ways of doing things – by getting outside of my comfort zone, I had to make sure that I really understood how SQL Server operated in order to compare and contrast it to other databases.

Unlike The Database is Dead, Long Live the Database, this talk is all about different features in SQL Server, how they work, and why they got that way.

Brent says: Jeremiah likes to experiment. He’s a SQL Server guy who’s been playing doctor with several other database platforms. I love talking to him about how SQL Server does stuff because he can relate it in terms of how PostgreSQL or Hadoop do it. I don’t have the time to become an expert on those other platforms, but the things he teaches me about them help open my eyes about how I can do things differently in SQL Server.

Kendra says: Jeremiah and I talk about different database platforms often. Working with different platforms is like travel: you understand your home in a deeper way after you’ve been elsewhere.

The Periodic Table of Dynamic Management Objects by Tim (Vote)

Brent says: Tim’s poured a lot of work into making DMVs easier to understand. His introduction for this session says it all – “It’s time to have fun… WITH SCIENCE!” I love this session idea.

Tim says: I’m having a lot of fun with what started as a way for me to exercise my creative demons. We’ll go over the namesake poster and learn some interesting tricks using Dynamic Management Objects. We’ll also play some games to reinforce what you’ve learned. Like the abstract says: “It’s time to have fun… WITH SCIENCE!”

Jeremiah says: I was so excited when Tim swore me to secrecy and showed me an early draft of his poster. There are a lot of DMVs, and Tim figured out a great way to communicate meaningful information about them all.

Kendra says: You know how you thought Chemistry class was going to be all math, but then you got to light things on fire and blow some things up in the parking lot? You wanna be here for this one.
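If you haven’t played with Dynamic Management Objects before, here’s a small sketch of the kind of trick involved (the DMV names are real; this is a common pattern for seeing what’s running right now):

```sql
-- What's executing right now, and what SQL is each session running?
SELECT r.session_id,
       r.status,
       r.wait_type,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50;  -- skip most system sessions
```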

Top 10 Crimes Against Fault Tolerance by Kendra (Vote)

This raccoon was the secret to winning Quiz Bowl

Brent says: When I first talked to Kendra before SQLCruise 2010, I loved hearing about the scale and flexibility of the servers she was working with. Multi-terabyte databases behind load balancers for constant uptime? Your ideas are intriguing to me and I wish to subscribe to your newsletter. If she’s that serious about data warehouse uptime, I’d love to see her tricks for everyday databases!

Kendra says: This talk is all about the important things that are much easier to do at the beginning. When you’re first designing an application or a service and you’re creating new databases, there are ways you can vastly reduce your recovery time down the road. As time passes, it gets much harder to add these things in. This is a list you can live by.

Virtualization and SAN Basics for DBAs by Brent (All-Day Pre-Con – Vote)

Brent says: I’ve given this all-day session at SQLbits in the UK and Connections in Orlando to rave reviews, and my clients frequently bring me in to give this same session to their DBAs, VMware admins, and SAN admins. I help get everyone on the same page so they don’t try to throw each other under the bus.

Tim says: Brent has presented on this topic on SQL Cruise and each time I walk away with more things to try back in the “Real World”.

Jeremiah says: Every time I talk to Brent about SAN or virtualization, I come away feeling smarter than I was before.

Kendra says: This session gives you the foundation you need to make it through every day comfortably– it’s like permanent clean underwear. We all need that, don’t we?

Sound Interesting? Vote For Us!

You can vote for the sessions you’d like to see here. Vote before midnight Pacific on May 20th.


Twitter #SQLHelp Hash Tag Dos and Don’ts

6 Comments

If you’d like to get quick SQL Server help, the #SQLHelp hash tag is a fun way to get it.  My original “How to Use the #SQLHelp Hash Tag” post hit a couple of years ago, and it’s time for a followup.  Read that post first, and then come back here for some basic guidelines.

Don’t use #SQLHelp to promote your blog. Congratulations on writing an informative post, and we’re sure it’s got some useful information in it, but the #SQLHelp hash tag is for people who are asking questions.  Unless your blog post was written to answer a question currently live on #SQLHelp, please refrain from tweeting about your blog.

Do answer a #SQLHelp question with a product if that’s the solution. Vendors build products to solve pain points, and sometimes those pain points surface as #SQLHelp questions.  If the answer is a product – whether it’s a free one or a paid one – then feel free to mention it and provide a link.  If you’ve got personal experience with the product, that’s even better.  If you’re a vendor, you might wanna disclose that in your tweet.

Don’t demo #SQLHelp at conferences by saying, “Say hello, #SQLHelp!” Immediately, dozens of users around the world will reply to you, and the #SQLHelp hash tag will become unusable for half an hour or more.  Rather than saying Hello World, ask the audience to give you a question, and then post that question on #SQLHelp.

Do suggest that long discussions move to a Q&A web site. Sometimes questions need a lot more detail than we can get in 140 characters.  If you notice a discussion turning into a long back-and-forth conversation, helpfully suggest that the questioner read my tips on writing a good question and then create a post on whatever site you prefer.

Don’t post jobs to #SQLHelp. Use the #SQLJobs hash tag instead.

Do thank people who give you #SQLHelp. This is a group of volunteers who love to lend a helping hand.  It’s like getting consulting help for free around the clock.  High five ’em if they helped you get through your day easier.