Monthly Archives: November 2008

Winter in Whitehall



Winter in Whitehall

Whitehall in a nutshell right here: the dog’s wearing two coats because it’s so cold she won’t go outside otherwise, there’s ice on the water already in late November, all the boats are gone, I’m wearing gloves, and I’m drinking a cappuccino from McDonald’s because it’s my only coffee option here. There’s a small coffee shop in Montague, but it doesn’t open until long after I stop wanting coffee.

The grass is still visible, but that should change by the end of the day – we’re supposed to get 6″ of snow today.

We’re having a great time, though!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

22″ USB monitor for $150

Breaking news – Newegg’s running a pretty good Black Friday sale, but one particular item really stands out.

The Asus VW223B 22″ LCD monitor is $150.  That’s a decent price for a 22″ monitor, but the DisplayLink USB connectivity makes it a truly killer deal.  You can connect this monitor to a PC using not DVI, not HDMI, but plain old USB.  It uses a software driver called DisplayLink to make the magic happen.  You can connect up to 6 monitors to your laptop or Mac Mini, even when those devices don’t support multiple monitors.

Here’s a review of the Asus VW223B talking about the USB connectivity and DisplayLink.

Note that this is NOT an ordinary flat panel monitor with a USB hub – this is a monitor where the actual video signals go over the USB connection.  No VGA cables, no DVI cables, etc.

Obviously, this isn’t a high-end gaming video adapter by any means, but it makes a great rig just to show your TweetDeck, email, browser, etc.  I bought one for my Macbook Pro just to show TweetDeck – it’s a great piece of software, but it’s a screen hog.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

SQL Server and Cloud Links for the Week

Bringing this one out a little early due to the holiday weekend.  Things I found interesting this week:

SQL Server Links

Summer Camp for DBAs is Over – Tjay Belt explains how Twitter made the PASS Summit conference much more enjoyable for him.

SSIS Twitter Suite – Andy Leonard and Jessica Moss wrote this toolkit to read and write to Twitter from SQL Server Integration Services.

SQL Server Magazine podcast – Sheila Molnar interviewed Questies Billy Bosworth, Kevin Kline, Douglas Chrystall and yours truly for about fifteen minutes.  We talked about some of the Microsoft announcements at PASS and industry trends.

SQL Server 2008 upgrade guide is out – I get a lot of “what’s-the-easiest-way-to-upgrade-X” questions and this guide has those answers.

Microsoft Visual Studio Team System 2008 Database Edition GDR RTM – Ever wonder why they call it “Data Dude” for short?  Every time I look at this product’s full name, all I can think of is the Pizza Crepe Taco Pancake Chili Bag from Taco Town:

Cloud & Virtualization Links

The Death of the DBA Part 2 – Jason Massie analyzes the cloud database announcements that keep comin’ and suggests that DBA jobs will be down 25% in 10 years.  I don’t think that’s unreasonable, but it’s predicated on a lack of SQL Server growth.  I think SDS as a product will conquer deployments that wouldn’t normally see SQL Server deployed, so I think there could be a net growth of jobs.  Just guessing though.

VMware ESX beats Hyper-V performance – In independent testing, ESX smoked Hyper-V by 125% on SQL Server performance testing.  I don’t hold this as the final answer – or even a good answer – but more like the opening salvo in a round of overdue conversations.

The Junk Drawer

FAIL and WIN stamp sets – funny gift set for people who deal with paperwork and lolcats.

Smart Car powered by a Hayabusa – take a super-light tiny car and stuff a massively powerful motorcycle engine in it.  Toss in a death wish, mix vigorously with power slides and you get this video.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Steel Cage Blogmatch Part Two: The SQL

Jason Massie and I had a steel cage blogmatch about whether to configure a SAN with shared or dedicated drives, and we both passed out for exhaustion.

This debate is starting up again but with a slightly different spin: virtualization.  In an environment with Microsoft Hyper-V or VMware ESX, more than one SQL Server may have its virtual drives on the same physical hard drives.

If we have two SQL Servers, are we better off keeping their virtual hard drives on separate physical hard drives, or using one larger pool of hard drives shared between the two?  Microsoft’s technical article “Running SQL Server 2008 in a Hyper-V Environment: Best Practices and Performance Considerations” attempts to answer that burning question with Figure 16 on page 24:

Shared Disks vs Dedicated Disks

Shared Disks vs Dedicated Disks

This graph shows two things:

  1. Nobody spell checked the graph title, and
  2. Dedicated drives were faster.

However, they’re only 3.5% faster on average.  Management of the shared disk approach is much more than 3.5% easier, so I’d have to vote for the shared drive approach, much to Jason Massie’s glee.

Plus, keep in mind that your servers probably won’t all be under full load at all times.  During periods where only one server is under heavy load, that server should achieve higher performance throughput since they’ll have more idle spindles available to them.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Appearing on the Mind of Root show

On Wednesday night at 8:30 PM CST, I’ll be appearing on the live Mind of Root show with Steve Murawski:

We will have Brent Ozar, SQL Server Expert for Quest Software, joining us to talk about becoming a DBA (Database Administrator), a sysadmin’s guide to working with DBA’s, and tips for sysadmins who have become “Accidental DBA’s”.  We’ll also have a discussion about DBA’s and scripting, and some news from the PASS (The Professional Assoication for SQL Server) Summit.

This should be fun: I did network administration and programming before I focused on SQL Server, and I tend to pimp SQL Server database administration as a great career path.  We’ll see if I can sell the audience on ditching their Wintel admin tasks and learnin’ to love the T-SQL.

I’ll have my Twitter stream open during the live show, so if you want to ask me anything live or just make fun of my radio-announcer-voice (“Suuuunday Sunday Sunday!”) then give me a tweet.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

PASS Interview with Tom Casey, BI Thoughts

Microsoft’s keynotes and sessions catered to the business intelligence sector of the audience.  In Tom Casey’s keynote on Thursday, one slide said:

“You are on your way to becoming a BI expert.”

I know what you’re thinking, dear reader.  If you’re a DBA, whether it’s a production DBA or a development DBA, you’re thinking, “That’s crazy.” And if I hadn’t been at PASS, I would have agreed with you.  Take me, for example – I’m an engine guy. I don’t know jack about SSIS, SSAS or SSRS, and all I really care about is making the engine run faster and find ways to manage it easier.

But if someone needs to implement a large BI solution with a big data warehouse, my knowledge will help get the job done because I’m using exactly the same tools that the largest data warehouses will be running on.

The Background on DATAllegro

Microsoft recently acquired a company called DATAllegro that offered a scale-out database solution. If you needed to grow your SQL Server, you could mash together several cheap rack-mount servers, and those servers would cooperate to run the database together. When you executed a query, the work was spread out across all of the servers, and the results were returned faster than a single big server could do it.

DATAllegro’s stuff ran on Linux and open source databases, but they built it in a way that the OS and the database didn’t really matter: with coding work, a different database platform could be substituted in. That’s exactly what Microsoft has done since acquiring DATAllegro: they ripped out the open source under the covers, stuffed in Microsoft products, called it Project Madison and they’re relaunching it in the first half of 2010.

Microsoft demoed this at PASS by using a multi-node system, running a big query on it, and showing how all of the nodes crunched away to deliver the query results.  All of the nodes are running SQL Server.  I don’t know the specifics of how it’s managed – like, do you connect with SQL Server Management Studio, do you manage the nodes individually, etc – and from the sounds of it, those things are still in flux.  But forget the specifics for a second and think about the general idea.

Dr. David DeWitt’s keynote on Friday talked about the challenges of building a parallel query processor: how do you partition the rows, how do you handle partition skew, how do you make sure the nodes get roughly the same amount of work assigned, etc.  It was mesmerizing to me as an engine DBA because really, at the heart of it, I’m taking the same engine concepts I’ve been using for years and extending the size of the engine.

Before:

  • A query was broken up into multiple threads
  • Each thread executed on a different processor
  • Each thread may have been hitting different disks on the SAN

With Project Madison:

  • A query is broken up into multiple threads (by the Control Node)
  • Each thread executes on a different node
  • Each thread will definitely hit different disks, probably local to the node

It’s really not all that different.  The jump from data warehousing to Project Madison isn’t large at all, and the jump from small OLTP databases to Project Madison is about the same as from OLTP to data warehousing.  It’s big, but it’s doable as long as you have mentors or senior DBAs around.

So I was listening to DeWitt’s presentation saying to myself, “I can do this.  I bet I could build one of these in my lab the day it comes out.  This won’t be rocket science.  How fast can I get this, and how much will it cost?”

Interview with Tom Casey, SQL Server GM

I had the chance to talk to Tom Casey and one of my questions was about how this will be licensed.  In my mind, I was thinking about how easy it would be to sell this to a customer (not that I do sales) if they’re already on Enterprise Edition.  A data warehouse is probably already using partitioning, and if it’s easy enough to scale out, I could see moving over to Madison fairly easily as long as the licensing isn’t too prohibitive.  Unfortunately, it’s too early to tell about licensing or CTP preview dates.

Microsoft’s vision for the cloud is to offer the same services in the cloud that they offer at a customer’s site.  Microsoft SQL Data Services is the ground layer that they’ll be building on, and gradually offering more and more capabilities over time.  Same pricing/licensing questions pop up here too: it’s just too early to tell.

I asked about the jaw-dropping visual graphs that were demoed in PASS keynotes, especially the moving-timeline growing-circles showstopper.  Tom said Microsoft is looking more at visual communication of data, and that they’re doing some animation work to convey trends.  I asked about Silverlight, and he said yes, that’s a possibility too, but it’s too early to tell.

Security, the thorn in the side of cloud pushers everywhere, seems to have an easier, more basic answer.  Tom’s guidance was to think of the application as the security container: SQL Server already has the ability to do pretty robust security at the instance level.  You can put multiple databases on the same server, tighten down security, and each user will only be able to hit the databases that you want them to hit.  Sure, we end up with a bunch of SA’s because people write crappy applications that demand SA, but in theory, there won’t be such a thing as SA in the cloud.  You will be locked down at the application level, because the service is designed from the ground up to be multi-tenant.

This ties into the Fabric concept and DAC packs: my guess would be that a DAC pack has self-contained security, and you won’t be able to deploy a DAC pack that needs server admin requirements.  Otherwise, you could deploy a malicious DAC pack that got SA control on the server, and that’d be a nightmare.  If DAC packs don’t get that level of security, then a DAC pack could be deployed either to a local instance or to the cloud (once the cloud services catch up to local capabilities).

Or to a Project Madison farm, depending on licensing.  I’m probably crazy to ask for this, but I would love to be able to manage my single instances, clusters, Project Madison farms and cloud-based applications all with a single tool and a single concept.  Especially now that…

I’m Going for the Master Certification

This is still up in the air, but it looks like I’ll be making a run at the Microsoft Certified Master program.  The folks at Quest are encouraging me to give it a shot, and I’d be an idiot to turn down an opportunity like that.

I think now is the perfect time to do it: databases are about to make a few rapid changes.  The cloud is coming, Project Madison is coming, and the Fabric is coming, and if you get in now and nail down your ground level of knowledge, you’ll be able to get the inside track from Microsoft on training with these advanced technologies.  Call me crazy, call me greedy, but this seems like a great chance, and I’ll take it.

That also means I’ll be heads-down in Microsoft exams for the next couple of months.  I haven’t taken a Microsoft test since 1999, and the Master prerequisite includes a whole slew of certs!  Wish me luck.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Wanna go to #SQLPass? Send this to your manager.

Dear Mr. or Mrs. Manager,

Someone’s sending you this because they want to go to the PASS Summit in Seattle.

I’m Brent, a guy who’s active in the community and I’ve been to a couple of summits, and they want me to explain why it’s worth the money.

What’s the PASS Summit and How Much Does It Cost?

It’s a weeklong conference in Seattle run by the Professional Association for SQL Server. It costs around $2,500 plus airfare.

Your employee usually won’t have to pay for breakfast, lunch, or dinner. PASS provides breakfast and lunch during the day, and vendors like Quest host dinner events each night. Since your employee is savvy enough to find my blog entry, they’ll know where the parties are, because I post ‘em on my blog in an effort to save them money and get them out of their hotel rooms at night.

Will They Be Reachable if Something Breaks?

Your employee will be surrounded by thousands of SQL Server professionals. You might HOPE that something breaks this week, because they’ll get the best SQL Server support possible at any time of the year! They’ll be able to turn to their left and right and get amazing advice.

How Does The Company Benefit?

Make a list of the toughest SQL Server problems your company is facing – whether it’s how to design a solution, how to make a query faster, how to implement a new architecture, you name it. Think in terms of things you’d love to hire expensive consultants to do, but you don’t have the money.

Make your employee document each of the problems in writing, and put each problem on a separate page. If applicable, show schema designs, diagrams, or queries. Print out several copies of each problem.

Your employee can take these issues to some of the brightest minds in the SQL Server industry like:

Microsoft has an area in the exhibit hall dedicated just to customer issues. Anyone can come in, sit down, and start talking with Microsoft developers, support engineers and architects about the issues they’re having or things they’d like to implement. Bringing documentation of these problems helps make it easier to get detailed answers: the DBA can scribble down thoughts about the problem while the discussion is going on, and they’ll be more likely to remember every aspect of the issue.

Brilliant SQL Server minds come to the PASS Summit to give presentations. Tjay Belt said it best when he said, “I could walk into any SQL Server person’s office in the world, point at books on their bookshelf, and say that I’ve talked to the authors in person at the PASS Summit.” Some of them consult for hundreds of dollars per hour, and your employee can get face time with them for free at the PASS Summit.

Vendors like Quest Software have a big presence at the show, and it’s not just sales and marketing. Vendors send their best technical minds to network with customers, other vendors and with Microsoft. I personally look forward to the PASS Summit because I get to spend some face time with my own coworkers that I don’t get to see too often! While it’s not the right place to bring support questions, it’s a great time to ask architecture or implementation questions.

If you really want to take it to the next level, have your employee set up a lab environment on their laptop with SQL Server and a copy of the databases they need to work with. I’ve seen users pop open their laptop, hand it to a SQL Server expert and say, “Here’s what my database looks like, and I need to find a way to do _______. Have you seen something like this before?”

Who Goes to the PASS Summit?

It’s for SQL Server professionals: developers, database administrators and business intelligence people. PASS tries to balance the content across those disciplines.

At a DBA-focused breakfast panel at PASS 2008, I asked for a show of hands to see how many instances each DBA managed:

  • 0-10 instances – about 10% of the attendees
  • 10-50 instances – about 40% of the attendees
  • 50-100 instances – about 40% of the attendees
  • Over 100 instances – about 10% of the attendees

Can They Just Watch Videos Online?

First, the company won’t benefit as much because your employee won’t be able to ask questions to help with your specific environment. Maybe you’ll get lucky and someone else will have asked the same question during the session – but probably not. The speakers have to leave the stage quickly after each session so that the next speaker can set up, so the speakers tend to walk out into the hallway and spend 15-20 minutes answering questions from attendees. Those questions aren’t videotaped.

Second, your employee won’t build personal relationships with the experts that can help solve their problems. At each summit, I meet dozens of people that ask me for help later through the year. I’m more likely to drop what I’m doing and help them if I know them personally, whereas if it’s someone I’ve never seen before, I probably won’t spend as much time digging into their questions. The experts get more questions than they can handle. Sending your employee there in person gets them to the front of the queue all year long.

Is There a Way to Go for Free?

Yes: their registration fee is waived if they present a session, although they still have to pay for hotel and airfare.

Only about 1 out of 4 presentation submissions are accepted (and yes, I was turned down.) There’s a ton of top-notch talent competing for slots. If you want your employee to make the cut, they need to build up their presentation skills first by presenting at their local PASS chapter and at regional SQL Server events.

Each presentation will take them a day or two to build. If you want to help them get a speaking slot, you can help by dedicating one day per month to building a presentation.

Building these presentations forces them to get better at their job because they have to document something in a way to educate others. You want your employees to be better at documentation and cross-training, right? This sounds like a two-for-one deal: they get better at documentation, and they save you money toward their PASS costs.

There’s another way to get registration fees waived, and it’s a sure bet instead of a risk: become a PASS chapter leader. The downside is that it requires significantly more time, because it takes 4-5 hours a week to lead a PASS chapter. They have to entice speakers for the monthly meetings, promote the group, talk to other chapter leads, and so on. It takes more commitment out of their workday time, but if you want them to attend for free, that’s a sure way to save money.

Why Don’t They Pay Their Own Way?

Some attendees do. As their manager, though, there’s something you need to think about before you let an employee go on their own dime.

Your employee will be networking with other SQL Server professionals – folks who are working for other companies who picked up the tab. During the summit, everyone talks to each other about their jobs, their companies and their bosses. A lot of these SQL Server people will be trying to hire more help, and they’ll be talking about their benefits.  Benefits like the ability to go to the PASS Summit for training.

I’m not sayin’, I’m just sayin’ – you might want to write this into the 2009 budget as a part of their benefits and training package.

Update 5/2009: More Reasons from Steve Jones

Steve Jones of SQLServerCentral adds more reasons why you should go to #SQLPass.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Why does utility support suck so bad?

I’ve gone through two hellish customer support experiences lately with utility companies, and I just had a third.  Want to know what it’s like to order phone service from Charter?  After you go through the web site, put in your info, pick your plan and hit Submit, you get a live chat window.  The formatting is a copy/paste out of their chat window, typos and all:

Thank you for choosing Charter Chat Live! A Customer Care representative
from Sales Support will be with you shortly.

You have been connected to CVW Russell E .

brent ozar: Hello, you need anything from me?

CVW Russell E : Hello, brent. My name is Russell. Thank you for contacting Charter Communications.  How may I help you with your service order?

brent ozar: I just placed my order and this popped up.

CVW Russell E : I would be happy to help you with your order. Then I’d like to go over your services with you to ensure your needs are take care of, okay?

brent ozar: OK, go fast, I’m on my way out to lunch. Thanks!

CVW Russell E : Phone orders take a while to enter.

CVW Russell E : To assist you I will have to first verify your intormation. I’ll need ALL of the following information. please.
Your full name:
Service Street address:
Service Apt (If applicable):
Service City:
Service State:
Service Zipcode:
10 digit Phone number:

brent ozar: 6161 S Shore Drive, Whitehall, MI 49461

CVW Russell E : Your full name:

brent ozar: I don’t have a phone number. That’s kinda why I’m ordering phone service.

brent ozar: Brent Ozar

CVW Russell E : For security verification purposes I’ll need the last four digits for your SS#, please. (Numbers only, please)

brent ozar: XXXX

CVW Russell E : Do you want to be able to make international calls? (free to block)
Do you want to block third party and collect calls? (free to block)
Do you want your name and number listed? ($2/month for unlisted)

brent ozar: Intl – no. Block collect – yes. Name listed – yes.

CVW Russell E : Your Charter installed telephone equipment must not be moved without first contacting Charter. If it is relocated, in the event you dial 9-1-1, your location information will not be transmitted properly and you will be required to supply the emergency operator with the address of the emergency.  In the event of a power outage, 911 services will be available only if you purchase and install a back-up battery directly from the device manufacturer, Arris. You can order a back-up battery online at www.arrisistore.com or by calling 877-587-6554.  Do you have any questions?

brent ozar: No.

brent ozar: Are we done?

CVW Russell E : To ensure that your rights are protected, FCC mandates that you have the ability to verify through an independent third party your selection of Charter telephone as your service provider. However, we do need to speak to you.  We are available at 1-888-438-2427, within the next 2 hours.

brent ozar: For what?

CVW Russell E : FCC mandates.

brent ozar: What are FCC mandates?

CVW Russell E : It’s required. You can call the number to get that done, when you have more time. It is preffered within the next two hours though.

brent ozar: And what do I need to do?

brent ozar: I’m calling now and it’s asking for my home number.

CVW Russell E : Either call in and ask to do the TPV, or give me a phone number to call.
brent ozar: Sure, call 281-433-4054.

CVW Russell E : It has to be after you and I are done chatting, I’m not done the order they need access to it.

brent ozar: OK, call me now at 281-433-4054.

CVW Russell E : What ten digit contact phone number may we reach you at?
(Digital phone orders must use a different number than that house line)  You may recieve a courtesy call twice on the installation date, by an automated system.  Please don’t hang up, you’ll be asked to confirm your appointment.

brent ozar: 2814334054

CVW Russell E : 112608 @ 3pm – 5pm is the soonest I can get someone there to install Telephone.
Appointment requirements:
-Access to all needed areas of the building
-English speaking person who is over 18 years old (must show proof of age, and must sign the work order)
-Pets may need to be secured
Are you available during that time?

brent ozar: Yes.

CVW Russell E : Your order number is 42535159.  Just to recap we are setting up Digital phone at $29.99/month for 6 months. (Taxes not included)  Telephone installed on 112608 @ 3pm – 5pm.  The charges will show on the next bill, your bill will also be prorated to sync with your billing cycle. Our service will be billed one month in advance.  If you need to make changes to the services or appointment time on your order, please call us at 888-438-2427.  Enjoy your service and welcome to the Charter family..

brent ozar: OK, great, thanks!

CVW Russell E : Do you still want me to call or will you call back whne you have more time, it takes a few minutes to complete.

brent ozar: Yes, please call me now, I’m waiting.

brent ozar: 281-433-4054

(several minutes later without any chat activity)

CVW Russell E : Is there anything else I may help you with today?

brent ozar: No, do you need to call me first?

brent ozar: My phone still hasn’t rang.

CVW Russell E : I haven’t called you yet.

(What is it about utility companies that makes their service suck so bad?  This is my biggest fear about utility pricing.  Forget security – that pales next to the thought of having to call my utility computing vendor for support…)

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

PASS Session on Analytics by Donald Farmer (#sqlpass)

George Box said, “Essentially, all models are wrong, but some are useful.”  Donald illustrated this point with a brief history lesson.  Early models of the universe said the sun revolved around the earth, and with that model, marvelous things were possible in architecture, celestial navigation and science.  Even though the model was wrong, it was still extremely useful.  Data modeling holds to that same statement: your models will never be right, but your key is to figure out which ones are useful enough.

Donald holds four things to be important: data mining must be actionable, innovative, trustworthy and seamless.  I think the innovative part is what keeps data mining in the back closet: data mining gives companies an edge, and when they get good at it, they don’t necessarily show it off for fear of losing their newfound competitive advantage.

He noted that last year, Microsoft’s BI lifecycle charts had steps for integration, reporting and analytics.  The new ones include steps for data entry, because that’s also a part of the process.  This points to how Excel is being integrated into the process because end users have data of their own that they weren’t necessarily integrating into the data mining lifecycle.  The IT team might not be able to integrate it in time, and the user wants to go go go.  The users want to take our historical data in the data warehouse, toss in some new data (like maybe about a new ad campaign) and make predictions about what’s going to happen.

He explained that clustering is the science of finding bad data by looking for outliers.  For example, your income data might fall into a model like this:

  • Young people have low income
  • Middle-aged people have high income
  • Older people have low income as they go into retirement

If Britney Spears comes in to apply for a loan, her data might be an outlier.  In data mining, you need to figure out if it’s valid data or not.  Malcolm Gladwell’s recent book Outliers wasn’t mentioned in the speech, but for a pop science version of data mining, take a look at it.

Donald showed how to use analytics to predict outliers in a web form and explained that you don’t have to hard-code the rules to find outliers.  For example, you don’t have to hard-code ages and income ranges.  That’s helpful in case your business changes dramatically – like if you merge with a bigger company with more customers, or move geographic ranges – you don’t have to recode your hard-coded business rules.  The database engine just uses the analytics to determine the rules.

In SSIS, you can add a data mining query step which is essentially making a prediction.  It calls out to an SSAS mining model to guess what your missing values are (or to create additional values) as part of the data flow.  You might have a sales promotion to be emailed to customers aged 25-35, but not all of your source data has the customer’s age.  A data mining task in SSIS could fill those gaps in your data.

He demoed the Excel Table Analysis Tools with the passenger manifest from the Titanic.  Very funny.  Later on, after the session, he happened to run across me and some other DBAs sitting in a hallway with our Macbooks open.  One question led to another, and next thing you know, he had his laptop open and we were data mining the Titanic survivors to see if older women were more likely to survive than younger women.  (As it turns out, the answer was no.)

Jeremiah Peschka of Facility9.com summed it up when he turned to me during the session and said that he wanted to go find out as much as he possibly could about data mining.  I’m fall into that same cluster, so to speak – data mining isn’t my “job” either, but it has so many cool benefits that I have to figure out how to integrate it into my workflow.  That was essentially the message in Donald Farmer’s session: predictive analytics works best when it’s an integral part of our daily jobs, built into the tools we use every day.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

PASS Summit Keynote Notes (#sqlpass)

Updates

It’s Friday, it’s 8:30 AM, and it’s time for another keynote.

8:30 AM – Bill Graziano just rode out on a trike!

8:40 – Two sessions will be repeated today due to high demand. At 10:45 in room 604 they’re repeating the clustering session, and 2:30 in room 6E is SQL Server on SAN.

8:45 – The PASS Board Election results are in! The new board members are Douglas McDowell, Lynda Rab and Andy Warren! That means Tom LaRock, Pat Wright and Louis Davidson didn’t make it. Here’s some blog links from each candidate’s experiences:

8:47 – now announcing winners of the SQL Heroes contest.

8:50 – The winner: BIDS Helper with a perfect score of 55! I wish I could tell you the rest, but the slides rolled off the screen within 30 seconds. Damn. Update: thanks to Adam Machanic, they’re:

  1. BIDS Helper – I’ve heard this mentioned repeatedly over the week by presenters
  2. Extended Events Manager
  3. SSISUnit
  4. CDC Helper
  5. QPee tools by Jason Massie! Woohoo!

8:55 – Dell reviewing the Microsoft BI platform reference architecture with the entire BI stack including PerformancePoint.

9:00 – Bill Graziano back up onstage to introduce Dr. David DeWitt from Microsoft.

9:03 – David’s great – he’s identifying all the reasons why his presentation will suck: he doesn’t have a motorcycle, he doesn’t have new presentations, etc! Hilarious. Love it.

9:10 – He’s going through the basics of why linear scaleup is a good solution, and how the numbers work.

9:12 – Ebay has a couple of >1 petabyte warehouses with 85 disk drives for storage. As a former SAN guy, I’m guessing I heard those numbers wrong, because that sounds ridiculously impossible. (Update – that numbers is 8500 disk drives, which also sounds ridiculous but on the other extreme, hahaha.)

Dr David DeWitt Onstage

Dr David DeWitt Onstage

9:15 – DATAllegro works because it’s shared-nothing: no shared memory, no shared SAN, etc. Runs with commodity gear like a bunch of rackmount servers connected via gigabit networks or Infiniband. This same hardware model is used by all the major search engines.

9:20 – Interesting – he’s getting ready to talk about design alternatives to shared-nothing cluster approaches. I like it when a vendor is honest and says, “This is another way to do it other than what we’re talking about.” He’s covering horizontal partitioning now.

9:23 – Brilliant. He’s using animated pictures to show how the different partitioning methods work when they’re loading data. As he shows round-robin partitioning, the loaded rows populate onto the different visual database servers. It occurs to me that he’s never really left education – he understands how to convey complicated information to a wide audience.

Dr David DeWitt Explains Partitioning

Dr David DeWitt Explains Partitioning

9:30 – He’s explaining about how you scale out joins in shared-nothing systems and how relational operators affect the joins. This is amazing stuff.

9:32 – With Project Madison, reading execution plans is going to be a lot more interesting: we’re going to need to pay more attention to how each individual thread works, because the load really needs to be balanced. Right now, if you have a parallel operation broken out across 8 threads, for example, you’re not too concerned whether one thread finishes faster. When each thread is running on a separate machine, each with a separate set of data, you need to pay close attention to what kinds of data you’re putting on each machine. This is mesmerizing for engine guys like me.

9:38 – He’s talking about the role of indexes. I’m not even going to try to blog this from here on out. If you’re into query execution plans, if you’re into data warehousing or if you’re into scale-out architectures, you definitely want to watch the recording of this presentation. He’s brilliant and it has a lot of impact on people who will use Project Madison.

9:47 – I said I wasn’t going to keep blogging, but man, this is really, really important. Your indexing strategies, especially your clustered index strategy, is going to be dramatically different. There’s some similarities with OLTP partitioning – scenarios where you partitioned in order to gain concurrency benefits – but man, you gotta be paying attention. I can easily see where a Project Madison implementation will benefit from having a person specifically dedicated to determining the correct partitioning keys for a month or more, just testing and benchmarking. Schema changes will also require careful planning to make sure the data lands on the right nodes for querying speed.

9:52 – When the engine has to join two large tables that are partitioned in different ways, it’s called Table Repartitioning. Each node splits its part of the table out into N parts (N being the number of nodes) and then splits that data out to all of the other nodes. It’s done by splitting, shuffling and combining. Networking is fast enough that it’s not a bottleneck, they say. Can’t wait to see that part in action – technologically impressive!

9:55 – Thinking about concurrency problems, I’m amazed that this stuff works. It’s shuffling data between nodes in order to recombine for a query, and yet you could have inserts/updates/deletes going on at the same time too. I’d love to play with benchmarks to see when Madison starts to pay off: for example, if you have a 16-core box with 128gb of memory, how many Madison nodes would you need to keep up when there’s heavy insert/update/delete activity going on? I’m not saying Madison doesn’t pay off – I’m sure it will – but it’s a neat theoretical question to ask. (But it depends on the activity and the schema, of course – it’d cost a lot to find the answer, and the answer would only be valid in limited scenarios.)

9:57 – This has awesome backup implications. It’s been said before, but it bears repeating: on a scaled-out Madison system, the dimensions will be copied across all of the nodes, but fact tables will be spread out and not copied everywhere. Now, for the new part: I wonder how backups will work. With data spread out so far, the amount of throughput coming from Madison to the backup system will be insanely, screamingly fast. Toss in a compression product like LiteSpeed, and wow, I bet I could back up terabytes in minutes. I’m going to need a lot of storage speed on the target end just to keep up!

10:00 – Partition skew = an unbalanced amount of data in specific partitions (now nodes). The engine can either has functions (almost random) or range partitioning (manual setup). Since the node with the most data has the slowest response time, and that determines the query’s total execution time, then you have to pay attention to partition skew or your queries will slow down.

10:02 – DeWitt’s Gray Systems Lab is working with the DATAllegro team to build a new “world-class parallel optimizer” to make it easier to work around partition skew, among other things. Wow – brain power. “Microsoft intends to become the premier supplier of scalable database systems for data warehousing.” Interesting – that wouldn’t appear to put them in competition for lower-end scale-out systems.

10:03 – Done. Wow, he was awesome for engine guys like me! Off to the sessions…

Other bloggers took some great notes during the keynote:

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube