This month’s Redmond Mag glows about some new features in SQL 2008, and yes, it does have a lot of cool tricks up its sleeve. But before you go upgrading your servers to get those new features, there’s one thing you need to know.
New versions of SQL Server are not always faster for every query.
This may come as a surprise to you, but every version of SQL Server has areas that require manual tweaking in order to be as fast as the last version. I worked with Jeff Atwood and the guys at StackOverflow this past weekend to move them onto SQL Server 2008, and we had a nasty surprise. Jeff summed up the issues with the SQL 2008 upgrade on his blog, but I’ll cover it here from a DBA perspective.
The app and SQL 2005 were on the same box (which tells you a lot about the performance of his code, because that site is doing pretty well) and they got a new box running SQL 2008. We restored the databases over to the 2008 box for testing, ran queries, and compared performance.
Full text search results were slower, but we didn’t catch just how slow they were because we focused on the queries that were currently running slow on the 2005 box. Some of the queries we didn’t test had been running great on 2005, but were dead slow on 2008.
Why slower? Different execution plans.
We did catch a second issue: on a particularly slow set of queries, SQL 2008 was building a different execution plan than 2005 did. The plan worked – but not for the kind of load StackOverflow has. I narrowed down the differences and was able to overcome them with trace flag 2301, which lets SQL Server spend a little more time building a good execution plan. By spending more time compiling the plan up front, and saving that plan in cache, we got phenomenally better results. Query times went from 190ms on SQL 2005 to 40ms on SQL 2008. Hubba hubba. All systems go.
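If you want to experiment with this yourself, the trace flag can be flipped on and off at runtime without a restart – a minimal sketch, assuming you’re testing on a non-production box:

```sql
-- Enable trace flag 2301 globally (-1) until the next restart.
-- To make it permanent, add -T2301 to the SQL Server startup parameters.
DBCC TRACEON (2301, -1);

-- Verify it took effect
DBCC TRACESTATUS (2301);

-- Back it out if plans get worse under your real workload, as they did for us
DBCC TRACEOFF (2301, -1);
```

Keep reading before you run off and do this, though – the flag burned us badly once real traffic hit.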
Denny Cherry, a performance tuning guru with a long history at MySpace.com and Awareness Technologies, asked me why I didn’t manually set up query execution plans for them. If it was my server and I was Jeff’s full-time employee, that’s exactly what I’d do. Problem is, if you don’t have a full-time DBA to watch the server and identify what the right (and wrong) execution plans are, you introduce an element of mystery. I can imagine what would happen three months down the road: performance would go to hell in a handbasket as schemas, queries and indexes changed over time. Jeff wouldn’t know what things were the fault of the engine, versus what things were the fault of the DBA who’d changed these settings a while back. So I had to pick a solution that wouldn’t require StackOverflow to incur a huge new payroll expense.
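For the record, “manually setting up execution plans” here means plan guides, which both SQL 2005 and 2008 support via sp_create_plan_guide – you pin a hint onto a statement without touching the application’s code. A hypothetical sketch only: the statement text, parameter, and hint below are made up for illustration, not StackOverflow’s actual queries.

```sql
-- A hypothetical plan guide that attaches an OPTION hint to a specific
-- statement shape. When the optimizer sees matching SQL text, it applies
-- the hint automatically - no application changes required.
EXEC sp_create_plan_guide
    @name   = N'StabilizeUserPostLookup',          -- made-up name
    @stmt   = N'SELECT Id FROM Posts WHERE OwnerUserId = @UserId',
    @type   = N'SQL',
    @module_or_batch = NULL,
    @params = N'@UserId int',
    @hints  = N'OPTION (OPTIMIZE FOR (@UserId UNKNOWN))';
```

And that’s exactly the mystery I’m talking about: three months later, nobody remembers this guide exists, and it silently shapes plans while the schema drifts underneath it.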
Went live with 2008, -T2301 killed us.
We went live with SQL 2008, rebuilt the indexes & stats, turned on the site (now hosting IIS on a separate box, mind you) and immediately the server slowed to a crawl. I figured it’d take a few minutes to get a good set of execution plans built, but the server just wasn’t recovering. Doing diagnostics on the server, I discovered that queries using sp_executesql were just knocking the server over. Ahhh-ha! Those were dynamic SQL strings, and those would probably get new execution plans built every time. The trace flag -T2301 failed us there, so we had to rip it back out.
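If you need to confirm this kind of problem on SQL 2008, the query_hash and query_plan_hash columns (new in 2008) in sys.dm_exec_query_stats make it easy to spot statement shapes that are burning CPU across lots of plan compiles. A rough diagnostic sketch, not the exact query I ran that night:

```sql
-- Find query shapes eating the most CPU, along with how many distinct
-- plans the engine has built for each shape. A high plan_count with
-- high CPU is a hint that compilation itself may be hurting you.
SELECT TOP 20
    qs.query_hash,
    COUNT(DISTINCT qs.query_plan_hash) AS plan_count,
    SUM(qs.execution_count)            AS executions,
    SUM(qs.total_worker_time) / 1000   AS total_cpu_ms  -- worker time is in microseconds
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
ORDER BY total_cpu_ms DESC;
```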
How much would you pay to avoid a scenario like this? $19.95? $29.95? But wait, there’s more!
After ripping out the trace flag, the server stabilized around 20-30% CPU, but those numbers were too high for a weekend night. When they came up to full load during the week, the server fell over, averaging 100% CPU for a few minutes at a time. The problem query was doing a union between three full text searches, but before you scream bloody murder about a union, even running the three searches independently was taking 60-70% of the time they took when unioned together. We were screwed. The guys had to make a change to their application and cache data on the web server’s hard drive in order to sustain their normal load.
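To give you the general shape of the problem query – and only the shape, because these table and column names are made up, not StackOverflow’s schema:

```sql
-- Hypothetical sketch: three full text searches UNIONed together.
-- Each CONTAINSTABLE call is a separate full text query; on 2008 the
-- combined statement cost far more than the sum of its parts.
DECLARE @SearchTerm nvarchar(100);
SET @SearchTerm = N'deadlock';

SELECT p.Id
FROM Posts AS p
JOIN CONTAINSTABLE(Posts, Title, @SearchTerm) AS ft ON p.Id = ft.[KEY]
UNION
SELECT p.Id
FROM Posts AS p
JOIN CONTAINSTABLE(Posts, Body, @SearchTerm) AS ft ON p.Id = ft.[KEY]
UNION
SELECT c.PostId
FROM Comments AS c
JOIN CONTAINSTABLE(Comments, Text, @SearchTerm) AS ft ON c.Id = ft.[KEY];
```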
Ugh. As a DBA, that’s a failure when the app guys tell me that. This is an application that used to live fine on a single box, and now, even with SQL 2008 on its own hardware, the app guys have to work around a weakness in SQL 2008. Ouch. I take that pretty personally.
The lesson: capture a full trace before you upgrade SQL Server.
The lesson: before you upgrade, capture a full trace of your server’s load and replay it against the new version. Analyze before-and-after duration and CPU numbers for both versions, and identify the list of queries that run slower. Examine how often they actually run in production, and think about how that’s going to affect your load. This was my own failure – after working with the guys at StackOverflow and seeing how tight their queries were, it seemed like the slowest queries on SQL 2005 were still in pretty good shape. Unfortunately, hidden below the surface were queries that ran in 50-75ms on SQL 2005 but ballooned to over 1 second on SQL 2008 – and went much higher under load.
Furthermore, a simple trace replay still won’t give you the full picture, because traces don’t throw the same amount of load at the replay server in the same amount of time. In a web server scenario, you may have a hundred queries come in simultaneously, and you want to see exactly how the new server will be affected by that – but replaying a trace with Microsoft’s native tools won’t give you that answer. For that, you need a benchmarking tool like Quest Benchmark Factory or HP Mercury LoadRunner – something that can capture a load and then replay it with the same velocity and bandwidth.
Do I like SQL Server 2008? Yeah. But do I wish we could have avoided what happened this week with StackOverflow? Hell yeah.
Does no one know yet why full text searching is slower in 2008? I really hope Microsoft has a rational reason (and please don’t say security).
So are you making a case for not needing a full time DBA? 🙂
@Sire – it’s faster in most scenarios, but not in Jeff’s scenario. I won’t go into specifics, but it’s not his fault. 2008’s full text moved into the engine, which means there’s some overhead inside the engine for things like logging. Before, it was handled by the OS, which was faster but sucked for administration.
@Aaron – I know, right? But it’s one of those cases where they only need a DBA like four hours a week, maybe eight max.
“The lesson: before you upgrade, capture a full trace of your server’s load and replay it against the new version.”
This seems like a very good idea, but I am a little hazy on the details. What exactly do you mean by this, and how would you go about doing it?
Hi, Mick. You use the SQL Server Profiler tool to capture traces of your activity. Here’s a couple of links:
Be aware that you don’t want to do this on a production server unless you know SQL Server inside and out. It’s pretty easy to capture an overwhelming amount of data and crash a heavily active server.
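On a busy box, a server-side trace scripted with the sp_trace_* procedures is much lighter than pointing the Profiler GUI at production. A minimal sketch – the file path is hypothetical, and for a real upgrade baseline you’d add more events and some filters:

```sql
-- Minimal server-side trace capturing completed batches and RPCs.
-- Event 10 = RPC:Completed, event 12 = SQL:BatchCompleted.
-- Column 1 = TextData, 13 = Duration (microseconds), 18 = CPU.
DECLARE @TraceId int;
DECLARE @MaxFileSize bigint;
SET @MaxFileSize = 512;  -- megabytes per trace file

-- Path is hypothetical; SQL Server appends .trc automatically
EXEC sp_trace_create @TraceId OUTPUT, 0, N'D:\Traces\upgrade_baseline', @MaxFileSize;

EXEC sp_trace_setevent @TraceId, 10, 1, 1;
EXEC sp_trace_setevent @TraceId, 10, 13, 1;
EXEC sp_trace_setevent @TraceId, 10, 18, 1;
EXEC sp_trace_setevent @TraceId, 12, 1, 1;
EXEC sp_trace_setevent @TraceId, 12, 13, 1;
EXEC sp_trace_setevent @TraceId, 12, 18, 1;

EXEC sp_trace_setstatus @TraceId, 1;  -- 1 = start; 0 = stop, 2 = close & delete
```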
SQLCat just released “Best Practices for Integrated Full Text Search (iFTS) in SQL 2008” – don’t know if it can help or not.
It seems from the link below that the problem may have been resolved – can you please share the specifics of the problem and the fix?
I intend to migrate our database to SQL 2008, but the problem you faced makes me think twice about it…
You mentioned “manually set up query execution plans”. Is this accomplished with query hints, or is there a more explicit method of setting the plans?
PM – Yep, I blogged about it here:
Clayton – I’d love to explain it in a comment box, but it’s a pretty big topic. I’d recommend Grant Fritchey’s new book, “SQL Server 2008 Query Performance Tuning Distilled.” I’m in the midst of reading it as we speak, and it’s chock full of good stuff.