Tag Archive: stackoverflow

Data Mining the StackOverflow Database

StackOverflow released a public dump of their database this morning. Jeff Atwood and the guys believe that if you, the community, are putting the work into this huge body of knowledge, then you should be able to have rights to use it.

This is a great dataset to show off one of my favorite toys from the Microsoft SQL Server Data Mining team. In this half-hour video, Tom LaRock and I will walk you through data mining the StackOverflow user list to find out more about the users and see what makes the rockstar high-reputation users different from the worker bees like me.

If this looks interesting to you, here’s what else I’ve been doing with the StackOverflow data:

Now, back to what I did in the video – let’s talk about the tools I used.

Microsoft’s Free Data Mining Tools

For today’s demo, I’m using SQL Server Analysis Services installed on my desktop. Relax – it’s really easy. Literally just install SQL Server 2005 or 2008 Developer Edition, check the box for Analysis Services, and use the defaults. You don’t have to know what you’re doing in order to get it up and running, and it just runs in the background as a service. After you’re done playing around, you can stop the service and set it to manual to prevent it from sapping your system resources. Go into Control Panel, Administrative Tools, double-click on the SQL Server Analysis Services service, and change the startup type to Manual.

Depending on your version of SQL Server and Excel, you’ll need one of these free plugins from Microsoft:

If you want to avoid the whole SQL Server Analysis Services thing altogether, you can also use Microsoft’s free SQL Server Data Mining in the Cloud plugin. Be aware that it’s a technical preview, not a fully supported & released product. Their cloud servers can (and do) go down. Also know that your data is going into the cloud, which has its own ramifications as I’ve discussed in my previous cloud data mining tutorial.

What’s Coming Next: SQL Server 2008 R2 with BI in Excel

In the next version of SQL Server, Microsoft will deliver business intelligence to end users through Excel. At the Professional Association for SQL Server Summit last November, Donald Farmer demoed slicing and dicing of huge spreadsheets with real-time analytics that previously would have required some pretty hefty hardware.

Excel 2007 has a million-row limit, but the forthcoming version will not. Some of the StackOverflow export tables like Votes have more than a million rows, so we can’t yet data mine those using Excel as a front end, but we can play with the Users table today.

Subscribing or Downloading My Podcasts

If you have an MP3 player or a portable video player and you want to download my podcasts automatically, you can subscribe to the SQLServerPedia podcast feeds here:

You can also download this video to watch it later:

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

ServerFault.com – Like StackOverflow, but for IT

I hate forums and newsgroups.  I can’t do a better job of explaining why than Tom LaRock did in his post Why I Dislike Newsgroups, including this bullet point list:

  1. It takes too long to get an answer, especially if you need an answer quickly.
  2. Sometimes, people are quite rude.
  3. Most times, the answers are flat out wrong.
  4. Many questions are not being asked in the right forums.
  5. Moderators spend far too much time moving questions between forums.
  6. End users get frustrated when their questions are moved.
  7. You do not know who you can trust.
  8. You can review threads later, but have no idea which answer was correct.

Amen.

The Solutions: StackOverflow.com and ServerFault.com

StackOverflow is for programmers, and ServerFault is for IT workers (sysadmins, Exchange guys, SharePoint folks, network people, rack monkeys, etc).  They both work the same way:

  • A user submits a question.
    • Other users can comment on the question, thereby encouraging the asker to clarify their question or improve it.
    • Users can tag the question with mutiple tags.  This replaces the old group-based forums where you had to move a question around between multiple groups trying to find the right user base to answer it.  A question might involve C#, SQLServer and SQLServer2008, for example, and tagging with all three gets the right audience involved.
    • Users can vote the question up or down.  Highly rated questions get more attention by floating to the top of the question list, and poor questions (like not enough information or inflammatory questions) sink down toward the bottom of the list.
  • A user submits an answer.
    • Other users can comment on the answer, and vote it up or down.  High-ranking answers move to the top of the answer list.
    • Users with high reputation counts can edit and improve the answer.
  • The questioner accepts an answer. This moves it to the very top of the answer list.
  • Other people search the web for similar questions, and end up at StackOverflow or ServerFault.  They see an elegantly arranged list of answers with the best answers at the top.

StackOverflow and ServerFault make it easy for users to get their questions answered and make it easy for answerers to find relevant questions to work on.  Even better, the reputation system rewards everybody’s work: every time a question, comment or answer is voted up or down, it helps record who’s doing good work and gives their work more credibility in the site.

Long-term, I think both of these sites will function like resumes for programmers and IT workers.  If someone wants to work for me, and they can point to a long history of answering questions on sites like this, with steadily increasing reputations and good answers, that’s better than a letter of reference.  It shows intelligence plus dedication to the community.

Getting Started with StackOverflow and ServerFault

StackOverflow has been open for a while, so to get started you just surf over to StackOverflow.com and register for an account with your OpenID.  OpenID is what the old Microsoft Passport system always aimed to be, a single login good for anywhere, except that you actually control your own OpenID.  You can run an OpenID on your own web site, like I do at BrentOzar.com, or you can use one from Google, AOL or Yahoo, among other providers.

ServerFault is brand spankin’ new, and it’s in a semi-private beta for the next week or so.  Go to ServerFault.com and give the secret password “alt.sysadmin.recovery” to register for an account with your OpenID.

And by the way, the race is on – the current high-scoring ServerFault user is Stefan Plattner, aka @SPlattne on Twitter, and he’s positively smoking both me and K. Brian Kelley, and even Jon Skeet for that matter.  If you’re going to compete, use TweetDeck and set up a search column for serverfault OR stackalert.  That term combination will catch alerts from the Twitter bots @ServerFault and @StackAlert, which tweet whenever there’s a new question.  Using a separate column, rather than actually following these accounts, will keep your friend stream clean.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Getting Help With A Slow Query

StackOverflow users often ask, “I’ve got this query that runs slow – can you help?”  Here’s a few tips to get better, faster answers.

Get the query execution plan.

DBA Porn

DBA Porn

In SQL Server Management Studio, when you’re looking at the query editor window, click Query, Display Estimated Execution Plan.  It’ll show a fancy-pants flow chart diagram thingy that gets experienced DBAs all excited.

Right-click anywhere on that diagram and click Save As.  Save it to your local machine somewhere, then upload it to a free file-sharing service like Drop.io or Filedropper.com.  These services let you upload a file, and then share that link with the public.

This diagram does not include the data inside the query (like your customers or sales data), but it does include information about your database schema (tables, indexes, views).  If your schema is vitally secret,well, frankly, you need to hire a DBA, because you’re also probably not encrypting anything.  But I digress.

In your StackOverflow question, include the link to this query plan.

Capture Perfmon data.

Go to my Perfmon tutorial and follow the instructions to capture performance metrics data about your SQL Server.  This will help database administrators find out what parts of your server are the current performance bottleneck.  Capture data for 15 minutes before the query runs, while the query is running, and 15 minutes after the query runs.  Stop the perfmon logging, save the CSV file to your machine, and then upload it to a file sharing service as well.

In your StackOverflow question, include the link to the Perfmon data, and also point out what time the query was executing at.

This file does not include any identifying data about your database other than the server name, so it’s pretty safe to post online.

What happens if…

If a query takes 5 minutes to run, then how long does it take to run if you immediately execute it again with exactly the same parameters?  If it takes 5 minutes again, then we’ve got a different set of problems than if it takes a few seconds the second time.

If the query runs slowly from your application, then try running the exact same query from SQL Server Management Studio.  If you’re running a query with dynamic SQL, like sp_executesql, try running it just by itself.  Copy the string out, paste it into a new query window, and run it.  Is it faster?  Does the execution plan look different?  If so, grab that and upload it as well.

If the query uses parameters, and it’s slow with some parameters but not others (like some customers run fast, but other customers run slow), then it might be a plan problem or a statistics problem.  Run it with both sets of parameters (the fast and the slow) and include both sets of execution plans with your question.

Check your Windows event logs.

On the SQL Server, go into Control Panel, Administrative Tools, Event Viewer, and look at the System and Application logs.  If you see any alerts that aren’t “Informational”, you might have a problem on the server.  Examples include RAID array rebuilds due to a failed hard drive, memory failures, SAN controller errors, or an application that keeps crashing.

Sometimes, even Informational messages point to things that are making your query run slow.  For example, I’ve seen instances where people complained about slow query speeds off and on through the day, and I found out that the antivirus software on the server was doing frequent definition updates.  After each update, Symantec/Norton Antivirus does a scan of memory and a few files, which briefly brings performance to a crawl.  This doesn’t show up as an error, but just as an informational message, but it’s a problem for you.

I wouldn’t post these on the internet, though, because they can include some detailed information about your server.  If you’ve got questions about a specific event, it’s best to take a screen shot of that one event’s details and just post that.

Follow up with your question fast.

After you post the question on StackOverflow, set yourself a one hour timer with your phone, your computer, your microwave, whatever.  After an hour, go back to the site, and answer any and all comments about your question that have popped up since.  People will often need more information to solve your problem, and you want to catch ‘em while you’ve still got their attention.  Continue this for the rest of the first day (revisiting hourly) and then set yourself a to-do item in your task list to go back each morning and follow up again.

I’ve seen so many questions that get answered, but the original questioner never revisits the site to find out what the problem was.  Even if you solve it on your own, at least go in there and make a note of that so that other people aren’t pulling up your question over and over to read through it.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Okay, forget the shirt idea.

A couple of weeks ago, I blogged about starting a web site where you could get your own custom Twitter shirt.  Here’s what the visits looked like:

Social-Shirt.com Visits

Social-Shirt.com Visits

I got roughly a thousand visits in two days when I posted the blog entry, and after that – silencio.  Got a lot of retweets on Twitter, lots of people saying they were interested, but….

My Twitter Shirt

My Twitter Shirt

Nobody actually bought one!

Well, other than me, as shown here.  I offered free ones to a few Twitter celebrities, and nobody was interested even in free ones!

Interestingly, Zazzle (the t-shirt vendor) lists shirts by how hot they’re selling, and my Twitter shirts were consistently in the top 1/4 of the list of Twitter shirts – even though I hadn’t sold a single one.  Evidently nobody else is getting rich off these either.

I loved this as an experiment.  I had a business idea, and I was able to get it off the ground in minimal time with minimal skills and minimal investments – roughly $10 for the domain name, and around $2 for the traffic on Amazon S3.

I learned that people aren’t quite ready to wear their Twitter profile on their chest.  This makes sense, as a lot of us have our own picture as our avatar.  I tried selling shirts without the picture, too, but that just didn’t have the pizzazz factor.

So I imploded it, slapped WordPress on there, turned it into a Twitter shirt portal, and did some links to Zazzle.  I left it up until the domain expires, but I’ll be surprised if anybody ever buys a shirt off there.

Except me.  You’ll recognize me at user group meetings because I’ll be wearing my Twitter profile or StackOverflow score when I’m not presenting.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

StackOverflow adds “bounty” for answering questions

I thought StackOverflow was addictive before, but now it’s electronic crack.

The latest feature is a reputation bounty.  If you want a question answered faster, you can offer a portion of your own reputation score as a bounty.  The question stays open for 7 days, and it’s on the Featured Questions tab of the site.  (You’d better believe the top answerers are watching this tab closely.)

When you accept someone’s answer, they get the bounty.

If after 7 days you haven’t accepted one, the top-voted answer gets the half of the bounty.

Thank God they’re not using real money, only reputation scores.  If they used real money, I’d probably stay up all night hitting refresh, waiting for new questions to come in.  High Score!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

New StackOverflow database server coming

SQL Server for StackOverflow

SQL Server for StackOverflow

Today, Jeff Atwood blogged about StackOverflow’s new SQL Server hardware: a Lenovo RD120, the artist formerly known as an IBM x346.  Notice the big heart on the box.  Servers run faster with love.

Next, I’m aiming to do log shipping from the box pictured here over to Amazon S3.  The StackOverflow transaction logs will be copied up to cloud-based storage in S3.  That will enable us to do log shipping to the cloud without actually running a SQL Server on the other end until it’s absolutely necessary, thereby avoiding the expensive cloud-based SQL licensing (around $1/hour at the moment).  When disaster strikes, we’ll turn on a SQL Server in EC2, apply the logs, and be up and running quickly.

Well, I dunno about running, but we’ll be able to get up and walking, I suppose – EC2 disk performance isn’t exactly stellar.  A couple of months ago, I ran SQLIO performance tests on Amazon EC2 to see how it performed, and the results were horrific.

In fact, EC2 performance doesn’t even come close to what Jeff was able to get with relatively cheap off-the-shelf hardware. You can replicate those same disk performance tests yourself on your own gear, although be aware that it hammers the server for about a day.

Stay tuned and I’ll be documenting how to log ship SQL Server into the cloud.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

T-shirts, get your hot fresh Twitter t-shirts here…

Twitter Shirt

Twitter Shirt from Social-Shirt.com

Whenever a group of people from Twitter meet up, we all ask each other the same question:

“Who are you on Twitter?”

Enter Social-Shirt.com, where for $17.55 (til Jan 20, see the last paragraph), you can get a t-shirt with your Twitter profile on it.

They also have shirts for StackOverflow, Identi.ca and FeedBurner.  I just ordered a few of them, and I’ll let you know what I think of the quality when they come in.

If they suck, I’ll slap the owner of Social-Shirt.com personally.  But not too hard, because, uh, it’s me.

I’ve had this idea banging around inside my empty head since the last Houston Twitter meetup.  I kept telling myself I was going to learn enough Python or PHP or whatever to slap it together and host it on Google Apps or Amazon EC2.  But here’s the problem: we’re talking about a site targeted at Twitter users, and you know what that means – it’s gonna fail.

Sites Can’t Handle Twitter’s Load.

Cool sites for Twitter users spring up all the time.  They get popular, and then all of a sudden, wham, they fall over under the load.  We all groan, and we say we’ll come back later.  Maybe we do, and maybe we don’t.

Meanwhile, the site owner is sweating bullets, throwing money and time into infrastructure trying to figure out how to handle the load or how to make the application run faster.  The fun of launching a business turns into sheer terror.

So why would I build something like that, especially with near-zero web application skills?  The last things I coded on the web were in Classic ASP, and I had no desire to revisit that again.

Built With HTML, Hosted on Amazon S3

I found a shirt vendor that had an API.  You can pass it a query string with stuff that you want to go on the shirt (like the user’s Twitter name, avatar, etc) and they build the shirt on the fly.  All I had to do was build some HTML forms, and presto, I was in the shirt business.

There’s a lot of drawbacks – for example, I’d originally wanted the user to type in their Twitter name and I’d build the rest of the shirt for them.  I’d call the Twitter API, pull the info I needed, parse it, build the shirt, etc.  But realistically, me with my limited web app skills, I can’t get there from here quickly.  I’ve got so many other things on my plate that I just wasn’t going to be able to bang that out.

So I gave up.  I built it in plain old HTML, made the users type in their own profile info, hosted it on Amazon S3, and called it done.  Besides, if I have a choice between putting money into the platform, or putting money into promotion, guess which one is going to yield more?  My money’s on promotion.

Speaking of Which: They’re On Sale for $17.55!

You get $4.40 off each shirt if you enter the promo code 440SHIRTSALE during checkout before January 20, 2009 at 11:59pm PST , bringing the shirts down to $17.55.  Pretty good deal, actually.

If you buy one, let me know what you think of it!  I’m already giddy with excitement for mine to come in.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

Tony Davis disses StackOverflow.com

How many of you work only with Microsoft SQL Server?  No other technologies – not Windows, not IIS, not Visual Studio, not Java, not JQuery, just purely SQL Server and absolutely nothing else?

Show of hands?

Not too many, I bet.  Heck, my job title is SQL Server Expert, but on any given day, I’m all over the place – working with everything from SAP to Visual Studio Data Dude to Microsoft DPM to Quest tools like Toad and LiteSpeed.  These days, database guys have to wear a lot of hats.

Next question: when you have a technical problem, how many times can you say, “I know for sure that this is due to SQL Server, and nothing else – not SAP, not BusinessObjects, not Crystal Reports, not any of my third-party tools, just flat out SQL Server?”

Well, if you ask a developer, sure, it’s always a database problem, but in reality there’s a lot of places we have to look before we can point the blame at SQL Server.  So when you have a technical question that you need to post, where do you want to post it?

  1. A pure Microsoft SQL Server forum
  2. The application vendor’s forum (SAP, BusinessObjects, etc)
  3. A forum that handles all technical questions for all software

I used to believe that pure Microsoft SQL Server forums like SQLServerCentral were the best way to go, because that’s where the best SQL Server experts hung out. Over time, though, I’ve realized that my problems have lots of causes, and they’re not always SQL Server.  Often I need help from experts in other fields, and I want to use the same forum instead of having to research the right forum every time.  Quick, what’s the right forum to post a BusinessObjects question?  Do you have an account there?  How long would it take you to set one up?  What if you post to the wrong forum that nobody ever visits?

Enter StackOverflow – Tagging, Not Groups

That’s where StackOverflow comes in.  You can post a question with a lot of components – say, a Crystal Report running in IIS called by Visual Studio querying a SQL Server database – and tag your question with several keywords like CrystalReports, IIS, VisualStudio, etc.  Experts in each field will see your question, contribute responses and help you get to the right answer no matter where the underlying problem lies.

Want to see it in action? Check out this StackOverflow question about LINQ to SQL, which ended up getting answered by Jeremiah Peschka.  If this question had been posted in a purely SQL Server forum, the answer might not have been as good, because not so many DBAs have LINQ experience.

Not everybody sees the universal-forum approach as a good thing.  Yesterday, Tony Davis wrote a blog post called Building Technical Communities where he argues:

“Whereas one can argue that everyone’s opinion is of equal value, it is more difficult to believe that expertise is so widely distributed.”

I disagree – in fact, I find experts everywhere.  And even worse, I find people who proclaim to be experts when they don’t know their rear from their elbow, but they just happen to have a high number of posts on a given site, so they get a lot of credibility.  If I ask a question in a purely SQL Server forum, I might get somebody who professes to know a lot about Crystal Reports, and says it’s a Crystal Reports problem – but he might not have an idea.  Sure, he’s got a lot of posts, but what does that actually mean?

Web 2.0 != Bad Advice

Tony goes on to suggest SQLServerCentral’s forums are a better place to get your answers because:

“Recently, I’ve read about several cases of people getting misleading advice from one of the numerous user-powered medical websites that have sprung up. I’m certain that the same thing happens in technical communities, and also that the “web2.0″ style ones are far more prone to it than traditional forums.”

We definitely agree that bad advice happens in technical communities, but I’m not sure where he’s believing that “web2.0 style ones” are more subject to it.  I’ve found quite the opposite – people with high reputation rankings on StackOverflow didn’t just get there by posting a lot, but rather by providing answers that the community valued and uprated as legit.

But speaking of ratings, Tony even goes so far as to slam StackOverflow’s post rating system:

“In a forum, approval or disapproval takes the form of a discussion (a thread) where you’re required to state your case clearly, and with proof, and so is subject to true peer review. You cannot correct someone else’s advice anonymously. Hitting a “tick” or “thumbs down” button requires no such effort and plays to the “herd instinct”: applaud the “leader” when others do so, and “go in for the kill” when you spot a straggler.”

I’m confused.  If I read Tony’s criticism correctly, he’s saying that just marking a post’s rating without fully justifying your case is wrong – but look here:

That would be the pot calling the kettle black….

Side note – yes, I’ve been pimping StackOverflow a lot lately, and yes, I even have an entry on their “about” page now, but no, I’m not getting paid.  As the Editor-in-Chief of SQLServerPedia.com, I have every reason to keep my mouth shut and not tell you about StackOverflow.com, because I should want you to only come to my own site in order to get your answers.

I pimp StackOverflow.com because if I was a DBA wearing a lot of hats, I’d want to know about it.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

StackOverflow’s SQL 2008 FTS issue solved

Whew – the Microsoft folks really came through on this one and tracked down the problem pretty quickly.  I want to personally thank Denis Churin, Nikita Shamgunov and Sundaram Narayanan for their detailed investigation into the issue and helping us get it fixed. I’m going to explain the issue to help anybody else get through it in case they run into similar problems.

How SQL 2008′s Full Text Search is Different than 2005

Let’s start with the basics: in SQL Server 2005, the full text data lived outside of SQL Server and wasn’t subject to transactional locking.  If you inserted a gazillion records into a table that had a full text index on it, the indexes were rebuilt without worrying about simultaneous inserts and locking.  It wasn’t like a traditional SQL Server table.

In SQL Server 2008, the full text indexes were moved inside the database server and became subject to transactional locking.  By default, the indexes are updated automatically whenever SQL Server determines they need to be updated.  When that merge process happens, SQL Server needs to obtain some locks on the indexed table.  Ideally, it grabs the locks when there’s a brief period of no load, does its merge work, and lets go of the locks.  The merge process can be quite brief (well under a second) as long as the amount of data hasn’t changed dramatically.

In a heavily transactional environment when there’s a whole lot of inserts/updates/deletes on the indexed table, the DBA may need to restrict those merge activities to only a certain time window of the day.  SQL Server 2008 gives you that ability to do merges manually, but I wanted to avoid that on StackOverflow.com.  Every time I put in a manual job into a solution, it requires manual maintenance, alerting and corrective actions, and I don’t take that lightly.

Where The Problem Comes In: Convoys of Queries

Imagine this scenario:

  1. A full-text select query is issued that looks like it’ll finish extremely quickly
  2. The SQL Server doesn’t see much activity in the full text table (only selects, not inserts/updates/deletes) so it kicks off a merge
  3. The select query doesn’t finish as quickly as the engine expected, and the merge can’t start until it obtains the locks it needs
  4. More full-text queries come in (could be selects, inserts, updates, deletes) that need to obtain locks

Those newly issued queries in step 4 are suddenly delayed while waiting for query #1 to finish.  The impact on full text performance varies depending on how long it takes query #1 to finish – might be milliseconds, might be seconds.

Denis Churin, Nikita Shamgunov and Sundaram Narayanan (the Microsoft heroes) suspected this might be our performance problem at StackOverflow, so they had us grab a memory dump and a database backup at the exact moment we were having performance problems.  They looked at the memory dump and isolated a single particular full-text query that was confusing the engine.  The engine was building an execution plan for it that didn’t work well, and instead of taking milliseconds, it took seconds (as many as 50 seconds).  During that time, performance went into the toilet.

They rewrote the query in a different way the engine would analyze better, and when that query was changed, presto, the performance problems disappeared.

There’s a QFE coming in a few weeks that will let the merge thread run without blocking other queries, but for now, we’re in good shape.

Diagnosing The Problem in Your Environment

Sundaram gave us this query to help troubleshoot when a long-running query is blocking other queries.  This helps identify the issue when a full-text select query suddenly blocks the SQL Server 2008 full text merge thread, and you can look at that query to make it run faster.  I haven’t tested this in depth, but these guys have proven to be much more qualified than me, so I have a hunch it’ll work, heh:

declare @temp int
declare @parent int
declare @final int
set @parent = 0
while (@parent = 0)
BEGIN
select @parent=blocked from sys.sysprocesses where lastwaittype=’LCK_M_SCH_M’ and waittime > 30000
WAITFOR DELAY ’00:00:01′;
END

WHILE (@parent <> 0)
BEGIN
set @final = @parent
select @temp=blocked from sys.sysprocesses where spid = @parent
set @parent = @temp
END
select * from sys.sysprocesses where spid = @final
select * from sys.dm_exec_requests ER
CROSS APPLY sys.dm_exec_sql_text(ER.plan_handle) ST
where ER.session_id = @final
select * from sys.dm_exec_requests ER
CROSS APPLY sys.dm_exec_query_plan(ER.plan_handle) ST
where ER.session_id = @final

And now, finally, I can start performance tuning that server!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube

SQL 2008 upgrade & tuning for StackOverflow.com

I’ve mentioned Jeff Atwood of CodingHorror.com a few times over the years here, and it bears repeating: he writes a great blog for developers, and DBAs need to read it too.  I also follow him on Twitter, and a couple of weeks ago he mentioned he was having problems deciphering execution plans:

Holy cow – I could actually help the world’s most dangerous programming blogger with something!  I fired off a Tweet and started helping him read execution plans.  One thing led to another, and next thing I know, I’m tuning StackOverflow‘s SQL Server for him.

Now, I’ve done a fair bit of performance tuning, but I should have known that tuning is a little different in Jeff’s world.  The easiest way to explain it is by relaying the first thing out of somebody’s mouth when we start performance tuning:

  • Data warehouse manager – “The nightly loads are taking 6-7 hours a night, and we need to get that number down.”
  • Web site manager – “Our queries are timing out after 30-60 seconds.”
  • Jeff Atwood – “This query is taking 150ms, and I want it faster.”

Gulp.

And I should mention that Jeff’s written blog entries like:

Nooo pressure.  No pressure.

But wait, there’s more.  Usually, when I go into a shop that’s never had a DBA, the server is a mess.  Tables, views, field naming conventions and formats all over the place, no consistency, nobody knows if anything’s actually in use, etc.  Not here.  The schema on this thing was tighter than the Pope’s poop chute, as they say.  (Really, they do.  “They” being my parents.)

There is almost no low-hanging fruit here.  Well, I mean, there’s a little, but we’re not talking big fruit.  Berries.  And they’re eight feet up.  I got all the way down to comparing specific query plans on 2005 vs 2008 to find out why exactly one table was in the wrong join order on the execution plan to save 80ms on a query.  I fixed it with SQL Server’s trace flag 2301 to get it to spend more time building the execution plan, but when we went live, the server couldn’t handle the load on queries using sp_executesql, so I had to rip that back out.  Dammit, I want my 80ms back, and I gotta figure out a way to get it.

Anyway, I’m proud to say that last night I helped Geoff Dalgas upgrade StackOverflow to SQL Server 2008, do more performance tuning.  If StackOverflow is slow today, it’s all me.

Update 10/30 – it was in fact slower, but it wasn’t me.  I blogged about the problems we’re having with SQL Server 2008 full text search performance.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More Posts - Website

Follow Me:
TwitterFacebookLinkedInGoogle PlusYouTube