Hi. I’m a former developer who’s moved into database administration, and here’s what I wish somebody had told me when I got started.
7. SQL functions rarely perform well.
Good developers like to reuse code by putting it into functions, and then calling those functions from multiple places. That’s a great practice in the app tier, but it has huge performance drawbacks in the database tier.
Check out Paul White’s excellent post on Forcing a Parallel Query Plan – in particular, the list of things that produce a serial zone in the plan. Most functions will cause your query to go single-threaded. Sad trombone.
If you do want to reuse code, consider stored procedures and views instead. (Granted, they can come with their own performance drawbacks, but I’m just trying to get you started on the right foot as quickly as possible here, and functions are a broken foot.)
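As a sketch of the trap (table and column names here are made up for illustration), the classic offender is a scalar function called once per row:

```sql
-- Hypothetical example: a scalar UDF gets invoked row by row,
-- and its mere presence forces the whole query plan serial.
CREATE FUNCTION dbo.GetCustomerName (@CustomerId INT)
RETURNS NVARCHAR(100)
AS
BEGIN
    RETURN (SELECT Name FROM dbo.Customers WHERE Id = @CustomerId);
END
GO

-- Slow: the function runs once for every row in Orders.
SELECT o.OrderId, dbo.GetCustomerName(o.CustomerId) AS CustomerName
FROM dbo.Orders AS o;

-- Faster: the same logic as a plain join can be optimized and go parallel.
SELECT o.OrderId, c.Name AS CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c ON c.Id = o.CustomerId;
```

The two queries return the same data, but only the second one lets the optimizer do its job.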
6. “WITH (NOLOCK)” doesn’t actually mean no locking.
At some point in your career, you’re going to start using WITH (NOLOCK) on everything because it gets your query results faster. That’s not necessarily a bad idea, but it can come with some surprising side effects.
When you query a table – even WITH (NOLOCK) – you take out a schema stability lock. No one else can change that table or indexes until your query is finished. That doesn’t sound like a big deal until you need to drop an index, but you can’t because people are constantly querying a table, and they think there’s no overhead as long as they use WITH (NOLOCK).
There’s no silver bullet here, but start by reading about SQL Server’s isolation levels – I bet READ COMMITTED SNAPSHOT ISOLATION is an even better choice for your app. It gets you consistent data with fewer blocking hassles.
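If you want to experiment with it, turning RCSI on is a one-liner – a sketch, with MyAppDb as a placeholder database name (test in development first, since row versioning changes behavior and adds TempDB load):

```sql
-- Turn on Read Committed Snapshot Isolation for one database.
-- MyAppDb is a placeholder name; try this in dev before production.
ALTER DATABASE MyAppDb
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;  -- rolls back in-flight transactions so the change can complete
```

After this, plain READ COMMITTED readers get row versions instead of shared locks, so readers stop blocking writers (and vice versa).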
5. Use 3 connection strings in your app.
I know, you’ve only got one SQL Server today, but trust me, this is worth it. Set up three connection strings that all point to the same destination today, but down the road, when you need to scale, you’ll be able to set up different database servers to handle each of these:
- Connection for Writes & Realtime Reads – this is the connection string you’re already using today, and you think that all data needs to come from here. You can leave all of your code in place, but as you write new code or touch existing pages, think about changing each query to one of the below connections.
- Connection for Data 5-15 Minutes Old – this is for data that can be slightly stale, but still needs to be from today.
- Connection for Data as of Yesterday – for management reports and trending data. If you run an online store, you might pull reviews from here, for example, and tell users that their reviews take a day to publish.
That first connection string is the toughest one to scale; we don’t have a lot of options in SQL Server to scale out multiple servers that handle writes. (We do have options – they’re just painful to implement and manage.) The lower-tier connection strings 2 and 3 are much, much easier and cheaper to scale. For more about this technique, check out my 3 Favorite Connection String Tips.
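In a .NET app, the groundwork can be as simple as three entries in the config file that all point at the same server today – server and database names below are placeholders:

```xml
<!-- All three point to the same server today; repoint them as you scale out. -->
<connectionStrings>
  <add name="WritesAndRealtimeReads"
       connectionString="Server=SQL01;Database=MyApp;Integrated Security=true;" />
  <add name="ReadsUpTo15MinutesOld"
       connectionString="Server=SQL01;Database=MyApp;Integrated Security=true;" />
  <add name="ReadsAsOfYesterday"
       connectionString="Server=SQL01;Database=MyApp;Integrated Security=true;" />
</connectionStrings>
```

The payoff comes later: repointing the second and third strings at a log-shipped or replicated copy is a config change, not a code change.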
4. Use a staging/apptempdb database.
Your app probably uses the database for some scratch work – processing, sorting, loading, caching, etc. It wouldn’t break your heart if this data disappeared, but you’d like to keep the table structures around permanently. Today, you’re doing this work in your main application database.
Create a separate database – call it MyAppTemp – and do your work in there instead. Put this database in simple recovery mode, and only back it up once daily. Don’t hassle with high availability or disaster recovery on this database.
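The setup is just a couple of statements – a sketch, with MyAppTemp as the placeholder name from above (file sizes and paths left at defaults; adjust for your environment):

```sql
-- Create the scratch database and put it in simple recovery
-- so its transaction log stays small and isn't part of log backups.
CREATE DATABASE MyAppTemp;
GO
ALTER DATABASE MyAppTemp SET RECOVERY SIMPLE;
```

Then point your scratch-work tables and processing jobs at MyAppTemp instead of the main app database.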
This technique accomplishes a lot of really cool scalability stuff. It minimizes the changes to the main app database, which means you get faster transaction log backups and differential backups for it. If you’re log shipping this database to a disaster recovery site, your important data will arrive faster – and not be impeded by all the scratch work. You can even use different storage for these different databases – perhaps cheap local SSD for MyAppTemp, keeping your shared storage connection free for the critical production stuff.
3. Yesterday’s articles and books are often wrong today.
SQL Server has been out for over a decade, and a lot has changed over the years. Unfortunately, the old material isn’t updated to cover what’s happening today. Even today’s material from reputable sources is often wrong – take this critique of Microsoft’s Performance Tuning SQL Server guide. Fellow Microsoft Certified Master Jonathan Kehayias points out a bunch of really bad advice that comes straight from a Microsoft document.
When you read something that sounds like good advice, try the Anti-Doctor-Phil strategy. Dr. Phil preaches that you should love every idea for fifteen minutes. Instead, try hating it – try to disprove what you read before you put it into production. Even when advice is generally good, it might not be good advice for your own environment. (Yes, that includes my advice too.)
2. Avoid ORDER BY; sort in the app instead.
To sort your query results, SQL Server burns CPU time. SQL Server Enterprise Edition goes for about $7,000 per CPU core – not per processor, but per core. A two-socket, 6-core-each server rings up at around $84k – and that’s just the licensing costs, not the hardware costs. You can buy a heck of a lot of application servers (even ones with 256GB or more of memory) for $84k.
Consume all of the query results as fast as possible into memory in your app, and then sort. Your application is already designed in a way that you can scale out multiple app servers to distribute CPU load, whereas your database server…is not.
UPDATE: I’ve gotten a lot of comments wailing about how the app only needs ten rows of a ten million row dataset. Sure, if you’re doing TOP 10, you’ll need an order by – but how about reworking the query to avoid juggling so much data? And if the data sounds like too much for the app server to sort, it’s probably causing work on the SQL Server too. We talk about how to find those queries in the webcast listed at the bottom of this post. Also, keep in mind that I said “Avoid ORDER BY”, not “Never use ORDER BY”. I use ORDER BY myself too – but if I can avoid that work in the very expensive data tier, I’ll avoid it. That’s what avoid means.
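As one example of that rework (hypothetical table and column names): if an index already delivers rows in the order you need, a TOP stops being a big sort and becomes a cheap read of a few pages:

```sql
-- With an index on dbo.Orders (OrderDate DESC), this reads roughly
-- ten rows' worth of pages instead of sorting the whole table.
-- Names are hypothetical.
SELECT TOP 10 OrderId, OrderDate, TotalAmount
FROM dbo.Orders
ORDER BY OrderDate DESC;
```

The ORDER BY is still in the query, but the engine isn’t doing the sorting work – the index is presorted.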
(This part here is where the MySQL and PostgreSQL guys start screaming about how you can avoid licensing costs altogether with open source databases.) (This part here is where you would expect me to have a witty retort, but I don’t.)
1. SQL Server has built-in zero-impact instrumentation tools.
SQL Server’s dynamic management views (DMVs) can tell you all kinds of killer stuff like:
- Which SQL statements are causing the most load on your server
- Which indexes are wasting space and slowing down inserts/updates/deletes
- How fast storage is responding to requests on a database-by-database level (and even more fine-grained than that)
- Where your server’s bottleneck is, like CPU, disk, network, locking, etc
All you have to know is where to look, and we’ll show you in the How I Use the First Responder Kit class.
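To give you a taste of what’s in there, here’s a sketch of a top-CPU-queries query using DMVs that ship with SQL Server 2005 and later (the offset arithmetic pulls out just the offending statement from each batch):

```sql
-- Top 10 cached statements by total CPU time, from the plan cache.
-- total_worker_time is reported in microseconds.
SELECT TOP 10
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset END
          - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```

Keep in mind the plan cache is cleared on restart, so these numbers reflect activity since the server (or the plan) last came up.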
I disagree with #2 (“Avoid ORDER BY; sort in the app instead”) without any stipulations. If you are presenting a paged view of data, you should ABSOLUTELY sort on the db side, else you have to send the entire result set over the wire, sort and filter on the app server, and THEN present the sliced results. That can choke a network with unnecessary traffic if the result set is particularly large (which is why you present paged views of data in the first place). In most other cases, I’d agree with your advice.
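For what it’s worth, the db-side version of that paged view is trivial on SQL Server 2012 and later – a sketch with hypothetical names:

```sql
-- Page 3 of a paged view, 20 rows per page, sorted server-side.
-- OFFSET/FETCH requires SQL Server 2012+; names are hypothetical.
SELECT OrderId, OrderDate, TotalAmount
FROM dbo.Orders
ORDER BY OrderDate DESC
OFFSET 40 ROWS FETCH NEXT 20 ROWS ONLY;
```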
Tyler – why not consume the entire dataset down to the app tier and cache it there? Then you can avoid repeatedly forcing SQL Server to build the list and pluck out specific rows for you.
No, on second read, I also start screaming in rage over this. (MS SQL of course.)
Sort on the DB. It’s designed for this type of operation.
Indexes have a sort order. Time your query without sort and order, time it with; you’ll see it usually costs 0ms extra. 0ms.
You’ll never convince me that this “don’t order by or sort on db” is a good rule of thumb. The opposite is true. The rule of thumb should say: “sort your queries on the SQL server.” Some exceptions apply, such as very lightweight desktop apps.
That’s interesting that your queries don’t cost extra to sort – but I’d point out that milliseconds isn’t the only measurement of work in SQL Server. There’s also TempDB writes if your data is sorted on disk. I won’t convince you, though, and that’s fine, I understand. Thanks for the comment anyway!
Totally agree with Tyler and NNM and I’ve been doing data since either just before or just after Noah launched the ark (I forget). If you’re consuming a lot of resource on ORDER BY then your query isn’t properly tuned. If it’s a frequent query to provide, for example, paged views, then a) the query should be indexed to retrieve the rows pretty much instantaneously and b) there’s a concept called prefetch or read ahead (depending on DBMS) which means the DBMS will do asynchronous I/O for you to get the next few rows into a buffer ahead of the user requesting them, but it won’t necessarily go and read the results of the entire query until the app asks for them (the user clicks next page).
DB2 does this superbly and doesn’t even retrieve the entire result set if the app doesn’t request it (implements cursors perfectly and has a brilliant user agent thread architecture). SQL Server is less adept – the last time I checked it still retrieves the entire result set, but it doesn’t pass it across to the app until requested, and it does prefetch.
So if your user is typically going to look at the first 20 or 30 rows of say a 20 million row dataset, then that is the work that the DBMS will do before providing data – an indexed read of the first 20 or 30 rows with the next few pages of data being read ahead of the user’s request so he/she doesn’t wait for I/O – and that is what the user will have to wait for.
Versus reading 20 million rows into the app, usually across the network, and then sorting them? Uh-uh.
Caroza – several interesting things in that comment that I want to address separately.
“If you’re consuming a lot of resource on ORDER BY then your query isn’t properly tuned.” – In a perfect world, we’d tune every query. Of course, back here in the real world, that’s just not a realistic option.
“there’s a concept called prefetch or read ahead” – This works for a single table, but not two or more tables that are joined together. The database engine has to re-join the rows every time the query is issued. SQL Server doesn’t cache query results – it caches tables. Every time you issue the query, SQL Server has to rejoin the data together and sort it for you.
“DB2 does this superbly” – That’s great to hear.
“So if your user is typically going to look at the first 20 or 30 rows of say a 20 million row dataset” – Then of course we’d be selecting TOP XX, not the entire dataset, and the ORDER BY would make sense. However, this next part:
“I/O – and that is what the user will have to wait for.” – If you’re querying multiple tables, SQL Server has to put the results together before it can start delivering the data to you. Unless you use the FASTFIRSTROW hint, you don’t get results one at a time – SQL Server builds the dataset and then outputs it for you.
‘“If you’re consuming a lot of resource on ORDER BY then your query isn’t properly tuned.” – In a perfect world, we’d tune every query. Of course, back here in the real world, that’s just not a realistic option.’
No, but you tune the ones which give you the most pain. So presumably if you’re trying to optimise resource usage on a query, we’re allowed to assume that it’s one which is hurting and thus a candidate for tuning.
‘“there’s a concept called prefetch or read ahead” – This works for a single table, but not two or more tables that are joined together.’
Your suggestion was to move all the data to the app and sort there instead of doing an ORDER BY. Now you’ve introduced a joined query, which you didn’t specify in your article. Are you going to do the join in the app as well? Unless you’ve got a particularly difficult business requirement for a query or your database is badly designed, very often the indexes which support a join will support an order by as well, so it would make even less sense to sort in the app in this case.
The point about the paging query was that the user could potentially request all the rows although he/she will almost never do this in practice. How you position/whether you requery for subsequent pages is an app design issue. But if you’re doing the sorting in the app you have no option but to make the dbms retrieve everything first, regardless of the size of the total result set relative to what the user might actually want to see, and then pass it across the network, and then sort it in the app, before the user sees anything. This does not make any sense.
“No, but you tune the ones which give you the most pain.” – Why wait until they cause you pain? Why not start with good practices and avoid the pain to begin with? That’s what my tuning tips are about. I totally understand if that’s not your style, and that’s okay.
“Now you’ve introduced a joined query, which you didn’t specify in your article.” – Again, I understand if your queries only hit a single table. I just don’t see a lot of those kinds of queries, but our experiences appear to be different.
“Are you going to do the join in the app as well?” – No.
Sounds like we’ve just got different backgrounds. Your approach is totally legit for your background – if you’re dealing with single-table queries and you want to performance tune only after things become a problem, then you’re doing the right thing. Thanks for the comments!
I’m working with an app that uses loads of views, and a good many of them have an ORDER BY clause. When looking at the query plan, I often see that the sort cost is 25% and sometimes more. On top of this, these views are not usually directly selected from. Rather, they are joined as part of a larger query making up an even bigger view with multiple joins to views like what I described, and yet another ORDER BY on top of that. Too many times these views don’t have a WHERE clause in the statement the app issues, so far too many rows get returned. In addition to the ORDER BY cost, all these views run at a high DOP, which makes it worse, and some of these views call UDFs, though I got rid of most of that.
Get only the data you need and sort that on the app side, and all should be well. The problem is when a developer doesn’t understand that ordering a result set of hundreds of thousands of rows is problematic for the server.
Although it’s a relatively old post, I want to comment and be the Devil’s (Brent’s) advocate.
What if your app is flexible enough and allows the user to sort the dataset in every possible way (by clicking on the column name on the results table)? How can you handle the different ORDER BY options with relevant indexes?
You just can’t.
Or your app is not flexible enough…
Some of you may be DBAs and some of you might be decent programmers.
He never said don’t use ORDER BY. Think about the app, the database, how busy it is. There are tons and tons of record sets that go back to an app for reference, combo boxes, etc.
All of these queries can be sorted in the app code.
With a busy application or site, this equates to better performance in the long run.
A Database is designed to be a data work horse. You should NEVER have to sort data retrieved from a database on the application side. ALWAYS let the database do the work. Sorting in the code in the application will just give the webserver yet another task to perform. You can argue this till the cows come home if you want but if you think it’s better to sort on the application side then you are wrong. Been doing this for 25+ years now and have tried both ways.
Julian – I’ve reread this comment a couple of times, but I haven’t found any supporting evidence in it. I do appreciate that you swing both ways, though.
It depends on the size of the dataset: meanwhile, there’s a third way, especially if you’re passing data to a client via a web service: in cases like that, it sometimes does make sense to make the sorting part the client’s problem. But If you’re passing a large dataset, then usually the SQL server is better at sorting the data: it’s what it’s designed to do, after all. Indexing the thing properly does mitigate the load on the server to a great extent.
It’s an old post, but it might help anyone referring here. If you really want to sort in the db, then first get the dataset, wrap it in a derived table, and then apply the ORDER BY. It sorts the results and not the nasty copies in tempdb.
Brent – I still disagree. Pulling down millions of rows to the app tier makes for a LOT of network traffic (albeit in one large burst), and if you expand the app tier by, say, 50 servers you then have to prime all of those caches, which is again, a LOT of traffic. Those caches also have the potential to get out of sync. The DB, with its tried-and-true indexes, is *perfectly* suited to do this work. It’s simple, it’s efficient, and you don’t have to worry about hitting a cache that is outdated, priming new caches, large bursts of network traffic, etc.
Caching some data on the app tier makes sense in many cases, but for just a paged view on (if you do it right) an index? I don’t see the need to add complication for that.
Tyler – instead of caching in 50 different app servers, you’d use a cache tier like Redis, Memcached, or Windows AppFabric Velocity. That’s just one cache to rule them all.
Using ORDER BY DESC is the only thing that affects performance. If it’s in ascending order, there is no performance issue.
Ramesh – uh, no. Why do you think it wouldn’t cost anything to order by a field when there’s no index, or if multiple tables have been joined/grouped together? How do you think SQL Server’s going to put that data in order?
As a full-time developer working with 2008 R2, I have a slightly different perspective:
#7 In our system inline UDFs are as fast as it goes; there is no performance penalty. Also they are convenient to use: if I add a column to the result set, it does not break the code that invokes the UDF.
On the other hand, if I reuse a select wrapped in a stored procedure, there is a performance penalty. Besides, capturing the output of a stored procedure requires me to write redundant boilerplate code. On top of that, the boilerplate code is brittle – if I add a column to the output of a stored procedure, all the modules that use it are broken and need to be fixed.
As such, in our 2008 R2 system an inline UDF is the most common way to reuse code.
I have no experience with 2012, and do not plan to upgrade any time soon. Are there any changes that caused inline UDFs to become slow?
#6 Enabling READ COMMITTED SNAPSHOT in 2005, 2008, and 2008 R2 may introduce hundreds of subtle bugs caused by concurrency, eroding our users’ confidence, annoying and causing lost time for both users and developers. In our 2008 R2 system SNAPSHOT is a safer choice.
#5 In our Agile environment, developing things that we do not need yet is almost always counter-productive. Although it is possible that we might need those three connection strings later, there are hundreds of other things that we anticipate. We cannot protect against them all.
Quite likely some un-anticipated change will happen first. If that is the case, we shall end up wasting time, maintaining these three strings even though we do not need them yet, and might never need them at all.
As such, Agile teams avoid spending time on features that are not necessary right now – it is cheaper to add them when they are actually needed.
I agree that the change in EE costs is huge. Yet I think there are many alternative solutions. Given the amount of money involved, migrating to PostgreSQL, for example, may be cheaper than changing all the client apps to add sorts as you are suggesting.
If we go for that alternative, we can also benefit from faster/easier troubleshooting, doing it ourselves with a debugger, and from very fast hotfixes.
What do you think?
AK – here’s my thoughts:
#7 – if your data is small enough to only need one CPU core, then you won’t notice a performance hit from UDFs. Of course, as it grows, you won’t be able to take advantage of parallelism, and that’s where the problem arises. When you say “if I reuse a select wrapped in a stored procedure, there is performance penalty” can you elaborate on that? About versions – the UDF problem has been present throughout SQL Server’s history. Check out the link I included in that section for more details.
#6 – That’s fine. The links I included give you more information about choosing the right isolation level for your app, and snapshot can be a great choice too. Be careful when you say “may introduce hundreds of subtle bugs caused by concurrency” – I’d argue that the solution to that would be serialization, and nobody wants that answer.
#5 – Yeah, I get called in a lot to fix performance problems created by Agile teams. 😀
#2 – If you can change database back ends faster than you can move ORDER BY into the application, then by all means, go for it. Seems surprising though – you’re probably going to have to change the queries themselves if you change back ends.
I can attest to UDFs. At first I used them for simplicity sake, but eventually my procedures that used those functions started performing badly. I finally took time to get rid of them and I got a 93% performance increase……93%!
When talking about Functions you need to distinguish between table-valued / scalar-valued and aggregate functions. I have found large improvements in performance by using table-valued functions with input parameters compared to views when querying large tables.
As always test in your own case.
I agree with darkdusky here. Actually the whole proposition “avoid functions, use views or stored procedures instead” is weird. In most cases where you could use a view or SP instead of a function, the function does not imply performance penalties compared to the alternative! SPs ARE super heavy, and single-statement tabular functions are in no way slower than views.
Tec-goblin and darkdusky – would you mind posting links to back up what you’re saying, as I did? Thanks!
“Yeah, I get called in a lot to fix performance problems created by Agile teams. :-D”
– LoL! Ain’t that the truth, bro?
“#5 – Yeah, I get called in a lot to fix performance problems created by Agile teams”
You have just described 95% of my business.
“Yeah, I get called in a lot to fix performance problems created by Agile teams. :-D”
Haha! Sometimes I think developers use the term Agile to just quickly push out code and then know they’ll fix whatever is broken during the next scrum. Seriously, having three connection strings in a configuration file is “counter productive”?
Neither is it actually productive if you haven’t already changed the application code to make use of the connection strings. The idea of splitting up the data is nice, but reserving some config settings doesn’t help.
Expertise in one area does not make us experts in another one, even it they are related. For example, being fluent in German does not make us fluent in English, and vice versa, even though these two languages are related.
Similarly, no amount of expertise in software development makes us competent enough to advise DBAs on administering databases. And vice versa: no amount of expertise in database administration makes us competent enough to advise developers on software development.
Regarding your following statement “having three connection strings in a configuration file is “counter productive”, you have omitted the key component. let me add it back: “having three connection strings in a configuration file, *if we do not need them now*, is counter productive”.
Why? This is a clear violation of the YAGNI principle: You Are Not Gonna Need It. I encourage you to read about it on the wiki, as well as books by Kent Beck, Martin Fowler, and Bob Martin. These books are not a substitute for real life experience, but they might help you understand our point of view.
Violating YAGNI is like smoking a cigarette – doing it just once won’t kill you, but it is a very bad habit.
Regarding your statement that you are “called a lot to fix performance problems”, you are observing “survivor’s bias”. IT projects have a dismal success rate. Most do not make it. Most projects fail early and as such never need to scale, and do not need consultants to speed them up.
Only the projects that succeed need and can afford scaling up and such. You can read about it more in the books I have suggested.
AK – great thoughts. I’ll touch on a few of them:
“And vice versa: no amount of expertise in database administration makes us competent enough to advise developers on software development.” – That’s an interesting thought. I would hope that software developers would want to hear how they can improve how their app works with a database, though. Software development doesn’t happen in a vacuum, right?
“This is a clear violation of YAGNI principle: You Are Not Gonna Need It.” – If you develop applications on a database platform that costs $7k per CPU, you actually need these tips when you go live. It’s not premature optimization – this is just good basic groundwork. Your book recommendations are quite good (and I’ve read a few of ’em) and I’d encourage you to check out my book as well.
“Survivor’s bias” – you’re absolutely right there. I tend to be involved in projects that have gathered momentum and are used by paying customers, also known as survivors. If your goal is to design a – well, dead app – then you’re totally fine skipping this advice. As Jeff Atwood has repeatedly blogged, though, performance is a feature, and building speed in from the beginning will get you more users.
But of course, if that’s not your goal, I totally understand not taking basic precautions for performance. It sounds like you’re doing just fine without it, and that’s cool too. Thanks!
When you say inline UDF, are you talking about an inline table-valued function (ITVF)? These do perform well, and are optimized similarly to views (with the addition of parameters). ITVFs lie between views and stored procs on the complexity and reusability scale and are generally no more harmful than a view (unless there is some aggregation which stops the optimizer from being able to do its job as well as it might because you’ve re-used something which you shouldn’t really have). Because they can be referred to as easily as a table or view, ITVFs are highly reusable.
I’m pretty sure Brent is specifically referring to scalar UDFs (and multi-statement table-valued functions), which a lot of developers will use to make modular calls to transform data or calculate a result from a few columns.
Scalar UDFs do perform very poorly on large sets compared to the same code repeated inline (if possible).
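To make the distinction concrete, here’s a hedged sketch with hypothetical names – the scalar UDF hides a per-row call, while the inline table-valued version is expanded into the calling query just like a view:

```sql
-- Scalar UDF: invoked once per row, and blocks a parallel plan.
CREATE FUNCTION dbo.LineTotal (@Qty INT, @Price MONEY)
RETURNS MONEY
AS
BEGIN
    RETURN @Qty * @Price;
END
GO

-- Inline table-valued function: a single RETURN with one SELECT,
-- inlined into the caller's plan like a parameterized view.
CREATE FUNCTION dbo.LineTotals (@OrderId INT)
RETURNS TABLE
AS
RETURN
    SELECT OrderLineId, Qty * Price AS LineTotal
    FROM dbo.OrderLines
    WHERE OrderId = @OrderId;
```

The BEGIN…END body is the tell: multi-statement and scalar functions run as opaque black boxes, while the single-statement inline form lets the optimizer see through it.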
I think some comments might be coming from the perspective of small data. I have handled large databases which started off with no penalties on using inline UDFs, for instance, but as the data increased, it became very difficult to keep the UDFs and the views… I guess we might all think different after we handle large amounts of data
This is total garbage, from the suggestion of stored procs over UDFs (LOL, how reusable!) to ordering result sets in the application layer. Yeah, I’ve got 100,000 results and the end-user only needs to see 10 of them. Sorting at the app layer is definitely the right way to go, heck you might as well do your joins there as well! To hell with indexes, it’s too much work for an RDBMS to handle. Since we’re doing so much work on disposable app servers, you can completely throw out step #5 and do caching at the app server as well.
Hmmm – you can’t reuse stored procedures?
2. Avoid ORDER BY; sort in the app instead.
To sort your query results, SQL Server burns CPU time.
What if my SQL server is a monster; a beast. The web server is just barely surviving as it is.
I’d shoot the dev that reads this and starts sorting on iis.
When you say SQL Server is a beast, go back to the licensing cost issue. If you can avoid spending $84k or more on the SQL Server, that buys a lot of IIS boxes. Powerful ones at that. Don’t put all your money in the database server alone – you end up looking like the roid rage guys that only work out their upper bodies.
The licensing costs are why SQL Server 2008 (and 2008R2) servers are extremely likely to remain un-upgraded for a very long time — and I wouldn’t be surprised to see an ongoing de facto market for purchasing 2008R2 instead of 2012 for new SQL Server installations. I suspect that somebody from the Vista marketing team has found a new home coming up with pricing strategies for SQL Server 2012.
As to over-building that database server, I work in an environment where we’re authorized our own hardware for a database server, but a new application server gets a cookie-cutter spec of a VM with 2 nominal CPUs, 4 GB of RAM and a slice of disk space on an over-committed SAN — and that’s after documenting that the existing app server is running effectively maxed out for 3/4 of the business day. So applied business logic means that I’ll do sorting and heavy lifting where I can provide adequate resources to do it — and that goes double if it means not sending 1.5 million records over a wire for consuming large chunks of that 4 GB of RAM.
William – that’s hilarious! You’re allowed to pick any hardware you want for a database server that costs $7k per core, but you have to suffer with tiny app servers? That’s some funny priorities. I bet if you were in charge, you’d turn that around – I certainly would. Have you thought about putting together a holistic proposal to show management how much money you could save by right-sizing the tiers? When I was a full time DBA in a similar situation, I got a five-figure bonus by right-sizing the SQL Servers. I saved the company millions on licensing.
I’m just an employee on a fixed-price contract. The SAN, network, and VM infrastructure are operated by other contractors, so it’s highly unlikely any proposal by me would be taken seriously by them.
Besides, I rarely get asked to assist in speccing hardware — the last time I injected myself in that decision process, the purchase price for a server dropped from over $20K to somewhere around $5K, if I remember correctly. Nor was any bonus offered when I did the legwork to justify converting a finicky Oracle RAC configuration to a single-instance server based on the actual workload.
William – remember, just because you’re on a contract today doesn’t mean you’ll always be on it. Show value so that they keep you around, and show architectural value so that when people move to other companies, they’ll bring you in for more architectural services. (And that’s where the money is.) Check out the book Secrets of Consulting. Hope that helps!
One of my cardinal rules for staying employed is to always make sure that management sees me as providing more value than expense, preferably by a couple of orders of magnitude. I can honestly say that the cost savings I provided in my “spare time” during my first tour in Iraq were more than enough to pay for my salary since then (including 2 more trips over where I was getting paid hourly for 12×7 work week plus an “unfriendly neighborhood” uptick), most of which time I had a conveniently vague job description (a defined deliverable or two, plus “other stuff as needed”).
But I will definitely check out your reading recommendation — if nothing else, it may help me avoid more of those awkward moments when I stubbornly attempted to speak unpleasant truths to power.
“and I wouldn’t be surprised to see an ongoing de facto market for purchasing 2008R2 instead of 2012 for new SQL Server installations.”
Except that now that SQL 2012 is released, you cannot buy a SQL Server 2008 R2 license. You can only buy a SQL 2012 license, and apply that to an install of SQL 2008 R2 if you so desired. The only way you could still get SQL Server 2008 R2 licensing prices is if you have an active EA or other agreement with MS, and then that only lasts for the duration of the EA.
Upgrading from an older version to a newer version won’t cost you a dime if you have SA with your SQL Server licenses. You would simply do a “true up” where you convert your per-processor licenses to per-core licenses (assuming you’re using per-proc licenses).
Oh, and last time I looked, a quad core processor system would cost exactly the same with a SQL 2012 Per-Core license as it would have on a SQL Server 2008 R2 Per-Processor license. Same with dual Quad Cores. It changes prices when you have processors with more than 4 cores – 6,8,10, etc. Then you’re paying a premium for those additional cores. I’m not aware of that changing.
Jason – right, but it’s getting pretty tough to find a 4-core server. Try buying a new HP DL380 or DL580 with 4 cores per processor.
Is it? I’ve been pushing our Infrastructure manager to buy me new Cisco servers with 10 Core procs, so I can maximize my license true-up when our EA is renewed early next year!
Regarding #7: My experience is that functions are awful. They're good for code reuse, but they can seriously hurt performance. I use them, but tentatively, and mostly only for non-time-critical work.
I avoid them when I can, but real life dictates sometimes you’ll need such things.
#2: Not too convinced about this. I'd say it's very much a case-by-case basis. At times I encourage ordering in the code; other times, the result sets don't even come from a DB but from a search engine (Lucene, for example); and at times, ORDER BY is definitely the better approach.
I think what people react to is the 'certainty' behind the statement, more so than the suggestion itself – at times it can definitely be better to sort in the application layer.
Also – my own input regarding #4 and #5:
Remember to use a different user name / login for each of the various connection strings and for the staging/production databases.
It'll not only help you keep track of rights, but it'll also help you debug a process, since the username alone tells you which one you're looking at.
A common 'mistake' I see in such configurations (especially staging/production setups, coupled with a similar development setup, which can mean loads of active connections) is reusing the same login/user name across the board – and then struggling to debug which application is actually blocking which. 🙂
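To make the "one login per connection string" advice concrete, here's a minimal T-SQL sketch. All the names are hypothetical, and on pre-2012 instances you'd use `sp_addrolemember` instead of `ALTER ROLE`:

```sql
-- One login per tier, so sys.dm_exec_sessions / sp_who2 immediately
-- shows which connection string a session (or a blocker) came from.
CREATE LOGIN MyApp_Writes  WITH PASSWORD = 'UseAStrongPasswordHere1!';
CREATE LOGIN MyApp_Reads   WITH PASSWORD = 'UseAStrongPasswordHere2!';
CREATE LOGIN MyApp_Reports WITH PASSWORD = 'UseAStrongPasswordHere3!';
GO
USE MyAppDB;
GO
CREATE USER MyApp_Writes  FOR LOGIN MyApp_Writes;
CREATE USER MyApp_Reads   FOR LOGIN MyApp_Reads;
CREATE USER MyApp_Reports FOR LOGIN MyApp_Reports;

-- Rights tracking falls out of the separation for free:
ALTER ROLE db_datareader ADD MEMBER MyApp_Writes;
ALTER ROLE db_datawriter ADD MEMBER MyApp_Writes;
ALTER ROLE db_datareader ADD MEMBER MyApp_Reads;
ALTER ROLE db_datareader ADD MEMBER MyApp_Reports;
```

Now a blocked-process report showing `MyApp_Reports` as the blocker points you straight at the reporting connection string, no guesswork required.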
Your post is great. Do not forget that every situation is unique.
For example, it is true that overusing or misusing the ORDER BY clause is a vice. If my returned data set is small, sorting it later saves CPU time. On the other hand, if I have a web application, I can't afford that – I need to use better indexes.
Also, if I return a lot of data, I won't burn CPU time on the server, but I will significantly slow down the network. All users of the system will be affected.
There are no perfect solutions.
It is probably for this reason that your point "3. Yesterday's articles and books are often wrong today." came up. Each situation is unique. What we read is often true, but false in our context. Remember the algebra of sets from elementary school: if A is a subset of B and C is also a subset of B, that does not imply that A is C.
Nothing personal, but the only thing I don't consider complete garbage in this post is number 1, "SQL Server has built-in zero-impact instrumentation tools."
I think the other observers have argued successfully against them. Seriously, inexperienced SQL Server developers and administrators should consider this article bad advice from the word go.
Dan – can you elaborate on why you’d consider it bad advice?
I am really surprised by some of the comments on this article.
My guess: some people are worried about their jobs being stolen. A bit like at my current client site, where they have the idea that writing all the business logic in stored procedures is the way to go, while DBAs are only authorized to run scripts.
Great post! I think what everyone commenting here seems to be missing is the universal SQL Server byline that I imagine Brent has internalized so much he didn’t write it out here – It Depends®
No technique is best for every situation, but these ARE good general things to consider. The NOLOCK thing especially since a lot of devs seem to use it religiously and then get upset when schema locks occur.
You can of course find any number of situations in which every bullet point here would NOT be great advice, but that’s nitpicking. You should generally obey the posted speed limits but if you are rushing someone to the hospital or being pursued by an avalanche it’s OK to break the rules.
Similarly if you are choosing 10 rows out of 10m then by all means ORDER BY in the DB. If you are choosing 900 rows out of 1000 then you should bring the whole result set back to the app layer.
Great and sound advice. These are excellent guidelines, and what others don't seem to understand is that they aren't hard and fast rules that can never be deviated from. Certain environments may not allow particular points to be followed 110% (e.g., a third-party application whose code you can't alter to sort from the app tier). But that doesn't mean they're not worth mentioning and explaining.
To all those writing comments with haphazard attempts to disprove Brent's guidelines: take these as good lessons – lessons learned from a data professional who has seen more poorly designed environments and dreadfully implemented practices than most ever will. Nothing will ever replace good planning and knowledge of your environment (otherwise a robot could do this job).
Great stuff, thanks for the great read!
My pleasure, sir, glad you liked it!
Brent, you are dodging the crazy like Neo. Keep up the good work! Commenters, dude is pretty smart, so think twice, type once.
Definitely dodging like Neo, but a little more passive aggressive than Neo…..
Interesting points on both sides and I like any other pieces of advice, ‘Your mileage may vary depending on actual driving conditions/situations’.
In the real world, the SQL Server should do all of the database tasks – that's what it is designed to do. The app and app server should only display and fetch the data. This is another article written with advice for a niche application. I hope I never have to sweat the nanoseconds SQL Server took to order my data. It's not worth the time to sweat these small details when the database and app can be up and working without a developer trying to shave nanoseconds off a single query in an app. There are many ways to skin the cat, so to speak. Most database designs are simpler and designed properly in the first place. Your experience may vary.
James – you write that “The app and app server should only display and fetch the data”. That’s certainly a valid take. Other design approaches include things like MVC, or separating the business logic from the data storage.
“I hope I never have to sweat out the nano seconds the SQL server took to order my data.” – Hey, no problem, if you do, that’s when I come in. I’m a SQL Server professional who performance tunes databases and code. When people call me in, it’s because someone underengineered an app, and it’s falling over due to load. That’s why I published this advice – I’ve been there, and it’s a doggone shame when I have to say, “80% of this SQL Server’s work is involved in just sorting data.”
Hey Brent, I think this is great advice and I’m going to send out this link to all the devs in my department. I’m a little bit surprised at all the negative feedback especially related to functions and ordering.
I know that I personally made quite the impressive "code reuse" set of UDFs with cursors and recursion to replicate some API algorithms for a product of ours (for QM use). After I got over my own awesomeness and my code-reuse ideas, a year later I rewrote them as CTEs, and the execution time dropped from several minutes (or a timeout) to about 10 seconds.
I agree about ORDER BY too, especially in views. Why oh why would you ever put an ORDER BY in a view? I know it's useful in "GUI design," but after that it's just wasted processor. Whatever the last caller specifies determines the sort, so all the sorting in between is worthless. (I spent some time last week removing sorts from some views. Given the several seconds it shaved off some run times, I'm very dubious of the "zero" performance hit.)
Thanks for the tips!
Order by in a view is thrown away by the optimizer, and as such it is never a “waste of processor.” A waste of characters, yes. I don’t think that’s the “ordering in the database” that Brent wants you to think is evil.
You, sir, appear to be correct. The views in question had a TOP 2000000 and an ORDER BY, and I thought the sort was being forced, but a look at the execution plan says otherwise. I would guess I was probably noticing a plan-caching "speedup". I need to come up with a better method of actually testing query execution efficiency. Thanks!
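For anyone who wants to verify this for themselves, here's a quick sketch (table and view names are made up). The `TOP … ORDER BY` pattern inside a view is legal syntax, but the optimizer does not treat it as a guaranteed sort for queries against the view – only an outer ORDER BY guarantees order:

```sql
-- Hypothetical demo objects.
CREATE TABLE dbo.Widgets (WidgetID int PRIMARY KEY, WidgetName varchar(50));
GO
CREATE VIEW dbo.SortedWidgets
AS
    SELECT TOP 100 PERCENT WidgetID, WidgetName
    FROM dbo.Widgets
    ORDER BY WidgetName;   -- compiles fine, but is NOT a guaranteed sort
GO
-- Rows can come back in any order here (check the plan: often no Sort operator):
SELECT WidgetID, WidgetName FROM dbo.SortedWidgets;

-- Only this guarantees ordered results:
SELECT WidgetID, WidgetName FROM dbo.SortedWidgets ORDER BY WidgetName;
```

Comparing the two execution plans is the better test Patrick mentions: the sort either shows up as an operator with a real cost, or it doesn't happen at all.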
Hi Brent.. my2cents along…
7. SQL functions rarely perform well.
Nothing else to add ..
Of course, as some users are commenting, the implications vary a lot, but as a rule of thumb this makes more than sense.
6. “WITH (NOLOCK)” doesn’t actually mean no locking.
This can’t be stated often enough
5. Use 3 connection strings in your app.
I especially like this approach. I have gone with 2 so far. Unfortunately, it's always quite a hassle to convince a developer team to prepare for something further down the road.
4. Use a staging/apptempdb database.
Couldn’t agree more: I always like to take databases apart for maintenance and security reasons
3. Yesterday’s articles and books are often wrong today.
Unfortunately, trying to prove something (or the opposite) takes time and thought.. which often tends to be avoided.
2. Avoid ORDER BY; sort in the app instead.
This is a real “it depends”. I think the different reasons for either one have been elaborated on throughout the comments 😉
1. SQL Server has built-in zero-impact instrumentation tools.
I slightly disagree with the "zero (and always) impact". There are some DMVs/DMFs that actually should not be run when resources are low – e.g., reading the plan cache.
In general, of course, they are very lightweight and everyone should know about them – and I think that's your main point 🙂
Thanks for the good list.. and the feed for a lively discussion 🙂
Brent, I respect you a great deal and use your information all the time, so I say this with all due respect: #2 doesn't make a lot of sense in real-world scenarios. I can see sorting in memory if you're getting 30 items for a drop-down list, but I have seen many developers fall into the trap of thinking they can do everything in memory. Many systems use XML columns to store data, and that eats memory like it's a contest. Also, as others have mentioned, depending on the scale of the application you could have millions of rows in the database. Normally an application serves data in small chunks, like pages. I think most would agree that when an application is set up correctly, you shouldn't need to go to page 2 or 3 of a result set very often. So having all that data in memory is a waste if it isn't ever going to be served to a user.
Ben – so, if we’re not going to show the user all that data, why are we querying it? If a system uses XML to store data, why are we making SQL Server sort that?
To be clear, we wouldn’t query it. I am on board with others who are talking about the paging scenario. But the pages always have a default sort order which is optionally set by the user. So to get the correct page of data you have to sort it in the database.
The XML column wouldn't be sorted, but it is a column in the result set, so it gets returned.
Ben – then why not pull the result set down onto the app tier just once, sort it there, and repeatedly serve the same data out of cache? Why keep hitting the database server over and over for each page of results?
Even better, use that same cached data to serve result requests from multiple users, and avoid multiple hits to the database.
Of course, I’m not suggesting things like that for lists of bank balances, but for lists of, say, stock trades over the last 10 seconds, cache once and serve repeatedly.
In my experience, permissions have made it more difficult to cache things intelligently. We do use the cache, but working out which 10 items you have permission to see, sorted in a given order, gets more specific for each user – and the database does a great job serving small pieces of that. In addition, you normally want your apps to manage their own caching, which means we've already created objects out of all that data, and that has additional overhead.
On another note, on the XML columns I would love to hear your opinion on having them in individual tables versus having one table that those tables reference. So, for example, you may have a Person table with an xml column containing a collection of their favorite links. But you find you are using other XML columns for similar purposes (metadata) throughout the database. I am leaning towards having one table with an XML column, type, and id to be referenced by the others.
A really nice article – food for thought to wrap my mind around, and it will make me a better dev.
A contribution to the previous comment on ORDER BY – how about the following scenario: imagine some idiot performs a search on a huge table (think millions of rows) that returns the entire table. You can't pull the entire table onto the app device, so you use TOP 1000. You do your sorting on the app, not on the DB, and your users will automatically make wrong assumptions – for example, that since 'alan' comes after 'abe', there is no 'adam' in your database.
It happened to me in the past. Your users, who have no IT mind at all, WILL treat this as a bug, not a feature, no matter how well you explain it.
MrPerq – if you only got the top 1000 rows from the database then you sorted there already. No need to sort again.
Typically in search environments, users don’t need real time results for the entire DB. In that case, you can either search in cache, or tell users their search returned more than X results and they’ll need to get more specific.
Except ORDER BY is overloaded, and sometimes "TOP 1000" is actually achieved using a CTE and ROW_NUMBER. The display sort isn't always the same as the TOP filter – for example, show me the top 10 finishers of a race, listed from 10th place to 1st.
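The race example can be sketched like this (table and column names are hypothetical). ROW_NUMBER picks the top 10 finishers by one ordering, and the outer ORDER BY presents them in the opposite order, so the filter sort and the display sort really are two different things:

```sql
WITH Ranked AS
(
    SELECT RunnerName,
           FinishTime,
           -- ranking order: fastest time = place 1
           ROW_NUMBER() OVER (ORDER BY FinishTime ASC) AS FinishPlace
    FROM dbo.RaceResults
)
SELECT FinishPlace, RunnerName, FinishTime
FROM Ranked
WHERE FinishPlace <= 10       -- the "TOP 10" filter
ORDER BY FinishPlace DESC;    -- display order: 10th place first, winner last
```

If the app tier only ever receives these 10 rows, re-sorting them there is trivial; the expensive part (ranking millions of rows) already happened in the database.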
It’s funny; I would have normally placed a caveat by point #7. I’ve always believed that inline table-valued functions (not their ugly stepsisters, the multi-statement TVF or the scalar function) would perform reasonably well since the optimizer should treat them as a parameterized view. Oddly enough, I’ve been recently struggling with a project that uses a lot of inline TVF’s, and I’ve discovered that the optimizer doesn’t handle parameters well. In other words, I get very inconsistent performance when I do:
DECLARE @var int = 1;
SELECT col FROM inlineTVF(@var);
SELECT col FROM inlineTVF(1);
Same plan, but the first version takes seconds more than the latter when delivering actual data to the application.
The performance difference only became noticeable when we scaled up the database from a few thousand test subjects to a million test subjects; also, note that these TVF’s are complex. The whole experience has made me very gun-shy about functions again; I’m now refactoring them to use views, which seem to interpret parameters/non-parameters consistently.
I also wanted to mention (before someone else did) that when testing the two sample conditions I described above, I did rule out plan re-use ( I flushed the plan cache between executions); the timing was still the same. On complex inline TVF’s with lots of data and with supplied parameters (as opposed to a view or the same TVF with hard-coded inputs), the supplied parameter version was always slower.
Stuart: Get friendly with OPTION(RECOMPILE).
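For readers following along, the suggestion applied to Stuart's example looks like this (same hypothetical `inlineTVF` name; a sketch, not a cure-all – RECOMPILE trades compilation CPU for a plan built around the actual runtime value):

```sql
DECLARE @var int = 1;

SELECT col
FROM inlineTVF(@var)
OPTION (RECOMPILE);  -- lets the optimizer sniff @var's runtime value,
                     -- so the plan can match the hard-coded version
```

The hint forces a fresh compile on every execution, so it's best reserved for the handful of statements where parameter-sensitive plans actually hurt, not sprinkled everywhere.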
I'm going to jump in on the #7 point too. I was pretty surprised to see blanket hate on functions when I've seen other people I respect present deep analysis of why inline single-statement functions work well when used properly (hi Adam!), and I've had good success with them in the past. Maybe it was just an "if you don't know how to use them properly" general-audience approach? I worry that without that nuance, the point risks propagating another myth – that all types of functions are created equal, when they clearly are not.
For the record, no complaint on the other points.
I think something that should have made the list is use of SQL Profiler (even though it is being deprecated, it still exists now). SQL Profiler has saved me countless hours of application debugging time as an application developer and as a database developer when I show this to app developers they really see the value in it.
I have another issue concerning connection strings. Particularly that 3rd connection string is about report data. Reports shouldn’t be handled the same way as CRUD, period. There are AWESOME tools for reporting in SQL Server (SSAS Cube, SSAS Tabular, COLUMNSTORE indexes), but they all require a very different way of programming and a different database, thus a different data access layer.
Actually complicating things for developers who don’t understand databases nor BI anyway is not the solution (I’m just thinking of what happens when Entity Framework sees two connection strings in a transaction and kicks in DTC, such fun!). Separate the concerns and have amazing gains in performance.
this article should be called “7 Things Wannabees Should Know About SQL Server”.
as a SQL professional I disagree on all points except maybe #5
#7 is not always true. Real SQL developers know there are good practices and bad practices. Wannabes look for rules of thumb.
#6 Do you think users query the database more because they think NOLOCK has no impact? They do it because they need to. And the example is ridiculous; you won’t be able to drop the index as long as the table is in use regardless of the NOLOCK hint.
#4 There is such a database, called TempDB. It gets rebuilt every time SQL Server starts.
#3 I object to the word "often". Sometimes… maybe, but often? Have you compiled statistics on the good vs. wrong advice in said books?
#2 "Order by" will definitely require a sort. But so do "distinct", "group by", and other such constructs. Therefore it can sometimes happen that ORDER BY reuses a previous sort, with no penalty. Again, professionals should not rely on rules of thumb.
#1 nothing is zero impact. The overhead is built into SQL Server.
Legion – thanks for stopping by! A few thoughts:
“#7 is not always true.” – Did you notice the word “rarely”? Rarely means that it is possible that functions will perform well – it’s just rare. I think we both agree here.
“#6 Do you think users query the database more because they think NOLOCK has no impact?” – Yes, and I’ve had developers tell me that they just queried the database every time for simple configuration data because they thought NOLOCK meant there was no impact.
"#4 There is such a database called TempDB." – Unfortunately, in TempDB, you're competing with the database engine itself. Things like the RCSI version store are actually built in TempDB now.
“#3 I object to the word “often”.” – Okay, cool.
“Again, professionals should not rely on rules of thumb.” – Hmm, I think we disagree there. Professionals get started with a set of ground rules, and then vary from those when requirements dictate. If you make careful, reasoned decisions every time you do anything, you’ll never deploy anything. 😀
Here's another one: if you're writing CLR that returns large datasets, don't use "context connection=true" connection strings and build arrays to return. Instead, use a trusted (external) connection, then use "yield return" to return each element in the IEnumerable function: this streams the data to the output instead of building a large structure in RAM and then iterating through it. You need "Enlist=false" in the connection string (which isn't supported by context connections, hence the external connection). Your connection string looks like this: "Data Source=.;Initial Catalog=MyDB;Integrated Security=sspi;Enlist=false". Try it: it's brilliant…
Dan – errr, wow, that’s a bad idea. Pre-SQL2012, CLR runs outside of SQL Server’s memory space. If you do that frequently, you can end up with horrendous memory pressure.
…well (a) I’m running 2012; (b) there are some things you need CLR for, especially if they involve things like reading data from a web service somewhere. It works fine for me, at any rate, and streaming data that way is preferable to filling up RAM with 30,000 records and then returning the data set.
Note that I’m not suggesting using CLR to replace normal stored procedures – I’m saying that if you have no choice but to use CLR for something like this, then that’s the way to return large datasets.
Friends shouldn’t let friends do stuff like using the CLR on the database server to execute web API’s and the like. This kind of stuff belongs in the application tier.
Not when you’re using an off-the-shelf application it doesn’t. If you can’t modify the application but you still want it to react in real time, then you have little choice in the matter.
Nope, no extreme memory pressure with that technique. Weren’t you paying attention when I taught SQLCLR at the MCM class??? 🙂
Now that I’ve passed, I can honestly say, “No.” 😀
Well, at least you’re honest when you pull an opinion from some nether region instead of reality 🙂
How about this instead. Don’t use CLR for DML at all. That’s not what it’s there for.
What do you use it for?
I use it to manually force parallelism. This really smart guy showed me how to pass a query to CLR and then branch out multiple individual queries. That way I can leave MAXDOP set to 1 and yet – waaaaaait a minute, is this a trick question? 😉
I use it for things like, “Hit the USPS API to get all shipping options and prices for this order, then return that as a dataset” or “When someone ships this order from our (off-the-shelf thus unmodifiable) ERP client, notify our shipper via FTP to tell them to come pick it up” and so on.
@Brent: Hey, that’s a good trick. I think I’ll steal it.
@Dan: I was specifically asking Randy, since he made a blanket statement about what it’s not there for, but provided no input into a “valid” use case. (Me? I use it for “DML” all the time. I guess I’m not valid. Does that make me an invalid?)
Ah, yes… I also find it’s nice in terms of the way it can encapsulate itself in a transaction: for example, if someone cancels an order, use CLR from the update trigger on the table to go void the transaction at authorize.net as well: since the whole thing is encapsulated in a single transaction, if something fails, the whole thing is backed out.
I can’t tell if you’re being facetious here or not. Why would you want all that business logic buried inside your SQL Server? You’re using CLR (.NET code), so why not use it on a cheap and infinitely scalable application server rather than an expensive and very limited scalable database instance?
Brent’s article was a list of suggestions to developers on how to start out an application that will scale well. As a developer turned DBA, and one who lived through a horrible growing experience when a small little side project at a fortune 100 company was promoted and re-deployed to over 100,000 employees… I can relate to everything Brent has suggested. If you’re building small little one-off systems that will never be scaled to use by large numbers of concurrent users, then do what you want.
@Jason: The question of “business logic and where it belongs” is always a fun one to consider. First you need to succinctly define business logic. Then you need to figure out why coupling it with the data is akin to “burying” rather than “centralizing.” And of course you need to figure out why it might — or might not — present a scalability concern.
Added to this is the fact that there are various workload-specific considerations around scalability. A 100,000 user OLTP application has very different scalability concerns than, for example, a large analytic data warehouse. Both need to be scaled, but it’s a totally different exercise.
One size does not fit all.
I agree. The database server should not be treated like an application server. All this CLR code should be packaged up and pulled out of the Database Server into its own service which you could create (and scale). I say this with scalability as the long term goal.
@David – so would it be fair to say that Microsoft made a huge mistake ever implementing SQLCLR integration, and an even bigger mistake using it for several core SQL Server features? Should SQL Server itself require external services and application servers?
Just saw @Dan's later responses on his use of the CLR, which is what I was questioning. In the context of supporting and extending COTS software, I can see a valid use for it. I would worry about that type of use in a homegrown application, though. I think CLR has its uses, but I would always strongly question and philosophize about its use for business logic when an n-tier architecture is available.
I agree, and it’s really only something which should be used if you can’t solve the problem in an n-tier environment. But the fact that you can encapsulate a remote transaction in a local trigger such that if it fails, so does the local update – that’s very cool. It’s also useful if you have multiple applications written in different languages on different operating systems all using the same database, and for which the same rules should apply: it removes the problem of having to write the same thing over and over again for different systems.
Great post… Not so sure about most of the comments. I will just sit here and watch.
/me loves “FREE ENTERTAINMENT”
It's a very good site – I'm impressed with the article.
Well done Brent. Most epic can-o-worms opening of 2013 so far. 😀
Surprised you didn't mention DISTINCT queries. I often have to tune away developers' overuse of DISTINCT because they don't take the time to understand their join conditions.
“SQL Server has built-in zero-impact instrumentation tools” – some Microsoft licensing wiz obviously forgot to make this an Enterprise Edition only feature 🙂
No – do you use "NoSQL"?
This post was helpful the first time I read it, but it’s even better when all the objections are covered.
But also: Can we send you another internet horde and watch you vanquish them? It’s like watching a chessmaster play a simul against “experts” 🙂
HAHAHAHA, thanks, sir. I think I’ll take a breather before I take on another horde, heh!
Quick question for #4. Should stored procs to move data from staging to the main db be in the staging db or the core one?
Another great article. You really shook up a hornet’s nest. I would be interested to read the comments if you tackled something controversial.
With regards to functions, there is a time and a place. There have been a few occasions where I needed a function that Oracle provided but SQL Server did not, so I wrote one (GREATEST), or where a function was the only way to satisfy a user's need when they were using an ad hoc tool. My rule is to always look at what the function is supposed to do and how often it will be called, and then look at alternatives – so the initial answer to implementing a function is NO, and it becomes yes only after no other (read: better) alternative can be found.
Also, sometimes devs may not know that SQL Server already has a built in function or they do but don’t see the performance impact of not using it directly. As an example I removed a UDF that did LTRIM() and RTRIM() on a value and this thing was called all over the place. The creator of that UDF clearly knew the database offered those trim functions but it seemed logical to put them in a UDF and call that from everywhere. By getting rid of that UDF I was able to reduce execution times tremendously.
And always comment your function with the rationale for its existence and don’t skimp on details.
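For illustration, the kind of wrapper UDF Ernest describes might have looked like this (object names are hypothetical). Calling it per row hides trivial work inside an opaque scalar function and, per point #7, can also force a serial plan:

```sql
-- The well-intentioned wrapper (avoid):
-- Rationale comment, per Ernest's advice: trims both ends of a string.
-- Exists only because the author wanted one call site for trimming.
CREATE FUNCTION dbo.TrimBoth (@s varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    RETURN LTRIM(RTRIM(@s));
END;
GO
-- Slower: the scalar UDF is invoked once per row.
SELECT dbo.TrimBoth(CustomerName) FROM dbo.Customers;

-- Faster: inline the built-ins directly.
SELECT LTRIM(RTRIM(CustomerName)) FROM dbo.Customers;
```

Removing the wrapper is exactly the kind of change that looks cosmetic but, on a function "called all over the place," pays for itself many times over.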
Love it, and good advice. All the commentators complaining about #7 and #2, keep in mind this is good advice as a default. If you know better, change it to your situation, but if you don’t, use it. If a developer doesn’t know much about SQL do you want them writing functions everywhere? You don’t, because you’ll get more scalar ones than TVFs.
As for #2, I've argued for this over the years and rarely been successful. If you can join and ship the data back to the middle tier/app, then you can sort it there. It's not that hard, though it is a little development work. It's not the single query that's the issue – it's the thousands of queries running constantly. I saw this bring down an intranet app that was getting more than 25k simple queries/sec for the same data, sorted. Once the database server died, they rewrote the queries (mostly menus and common "news"-type data) to run rarely and cache the data at the app servers.
Agree on multiple connection strings. It’s not hard, and it allows you to move to AlwaysOn or other technologies later.
On another note, I’d like to read more about the MyAppTempDB. Maybe a blog on how/where you use this?
thanks a lot for your article 🙂
Would you mind if I translate it into Russian for Russian IT guys?
Thank you in advance.
Tatiana – glad you liked it! Unfortunately no, we don’t allow our work to be hosted elsewhere. If you’d like to translate it into Russian we’d be glad to host a copy here, though. You can email us at Help@BrentOzar.com to talk about that. Thanks!
But, sorting and aggregation and other large group operations are the whole reason you spent that money on a SQL Server in the first place. If all you needed was some tables on a disk, and you bought SQL Server, you spent too much. So, I’m gonna continue to make the SQL Server do my heavy lifting, so I can make the web server run faster. In a two-tier environment, my experience says that’s the way to go if you want to have a responsive site.
Furthermore “send all the rows to the app server and have it filter and sort” is a waste of network time. If you send only the rows that the app actually needs, you save network load, and in my experience the wait time for that can be huge.
I work with accounting applications, and before that it was telemetry data (we’ve been collecting 100 rows per second, for 5 years, that kinda stuff) – it would take several days to send all those rows from the SQL Server to the application server. We set up our clustered indexes so the data is already grouped and sorted for our most common needs, but in the uncommon cases, we do sorting and filtering of financial and telemetry data on the SQL Server – there is just no other way.
I think that people who are complaining about what you said must not understand that for situations like mine, your message just doesn’t apply, and it’s still fine to use sorting on the SQL Server. That is, I have the experience to realize where you can apply what you said and where you can’t.
I came here looking for info on NOLOCK – it annoys the hell out of me.
Jasmine – if sorting and aggregations are the reason you bought a SQL Server, maybe we bought the wrong thing. Database servers are a persistence layer, not an application layer.
Databases are *so much more* than a simple storage location. If you’re paying for a RDBMS like SQL Server, and you’re using it as a simple storage device, you’re not using a whole lot of capability that you paid for. I’m not saying you’re wrong – ORDER BY can be extremely costly, and I’ve seen it used when it wasn’t needed, causing an unnecessary slow-down. What I’m saying is exactly what you are saying in your comment – this is not always the case and developers must be smart – don’t eschew ORDER BY completely, it’s useful in some cases. What I attempted to add to your comment is, yes, there’s some cases (and I gave specifics) where the performance impact is lowered by using the sorting and filtering capabilities of the database. You paid for those capabilities – use them when it’s appropriate. If you’re never going to use those features, get a cheaper database, consider using file-based storage, or consider using noSQL or something else. If you don’t need relational database capabilities, don’t pay for that.
I don't think there is any one best way to do sorting and aggregation. I try not to use ORDER BY or GROUP BY when I can avoid them. I would send the record sets the application needed over to the app server and sort them there using .NET's DataTable/DataView. If you use JSON, you can also read the SQL data into JSON and work with it there.
You’ve missed one of the easiest things regarding proper security management and integrated security. Application Roles. I’d be willing to bet only 1 or 2 developers out of 100 have any idea about this feature, and maybe only a handful of DBAs too.
John – That’s a great idea. Why don’t you write about the 7 things developers should know about SQL Server security?
Ha! Only limiting it to 7 would be a trick.
Nothing to add on the SQL Server side… I would add that good developers and database application designers necessarily have a sufficient grasp of database fundamentals and relational modeling. Contrary to the usual trends and fads, a good database education will make you a better developer for life.
Actions to take into consideration include the following:
> A good knowledge of relational modeling is important to establish predictable behavior in any SQL engine. Even without mastering every level of the 800-pound gorilla named normalization, developers should at least have a basic grasp of the advantages of maintaining business logic in the data layer.
> Moving away from procedural programming to set-based thinking is a guarantee of making the most of any SQL engine: I have very few examples where procedural thinking outperforms set-based thinking in terms of resource consumption, simplicity, or robustness.
> Correcting misconceptions in design is important to creating good applications. Getting away from cookbook approaches to design is very important to establishing predictable applications. This point mainly concerns result correctness via key selection and the treatment of NULL values.
Two important points to note about functions, specifically TVFs:
1. The optimizer makes guesses as to the number of rows that come back from them; they fall into what the optimizer team refers to as “out of model” scenarios. For simple statements that select straight from a TVF this works fine, but for more complex ones, where the plans are larger, these guesses ripple throughout the rest of the plan.
2. If you use table variables within a TVF and modify their contents, this will cause the optimizer to always generate a serial plan.
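Those two behaviors trace back to how the function is written. Here's a hedged sketch of the two shapes (table and column names are made up): the multi-statement form, with BEGIN/END and a table variable, gets guessed row estimates and a serial zone; the inline form is just a parameterized SELECT the optimizer can expand into the calling query.

```sql
-- Multi-statement TVF: the BEGIN...END body hides real row counts from
-- the optimizer, and modifying the table variable forces a serial plan.
CREATE FUNCTION dbo.GetCustomerOrders_Multi (@CustomerID INT)
RETURNS @Result TABLE (OrderID INT, OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @Result (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
    RETURN;
END;
GO

-- Inline TVF: a single RETURN SELECT that gets expanded into the outer
-- query, with real estimates and a chance at parallelism.
CREATE FUNCTION dbo.GetCustomerOrders_Inline (@CustomerID INT)
RETURNS TABLE
AS
RETURN
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
```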
Just my little contribution; sadly, I'm not very articulate, so I hope you'll bear with me. 🙂
#7: I've seen this first-hand. Initially the function worked fantastically, until the number of rows being affected hit a magic number; when it did, the function immediately ground the query to a halt. Ever since, I've told our developers that if you're only going to work with a couple of thousand or so rows, a function is fine, but once you hit around 10k rows, you might see some rapid performance decreases.
#2: This seems to be a point of contention here, but again, I've seen similar issues with ordering, and I've often suggested the same thing to our developers (though I didn't have much concrete evidence). As a sort of workaround, and I'm sure this is awful practice, we've found great gains by using stored procedures that create a temp table of the results we want, and then ordering and returning the contents of the temp table.
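That temp-table workaround, sketched with hypothetical table and column names, looks something like this:

```sql
-- Sketch only: filter the heavy query into a temp table first, then
-- sort just the small filtered set instead of the whole base table.
CREATE PROCEDURE dbo.GetOpenOrdersSorted
AS
BEGIN
    SELECT OrderID, CustomerName, OrderTotal
    INTO #Results
    FROM dbo.Orders
    WHERE Status = 'Open';          -- the expensive filtering happens here

    SELECT OrderID, CustomerName, OrderTotal
    FROM #Results
    ORDER BY OrderTotal DESC;       -- the sort touches only the filtered rows
END;
```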
Thanks for your input!
It's the best article on SQL Server I have ever read in all my years as a .NET developer.
I am not a SQL Server specialist, but I do have a question, or rather a challenge, for you.
My client wants server-side pagination for grids (10 records per page), and sorting too.
Server-side pagination was easy to implement, but sorting!!!
My initial plan was to sort the data in the app after I get the records (10 per page) into the app.
But my client wants the sort applied to the total record set (say 1,000 records), and then the result shown in the app.
Now, per your advice, I should not use ORDER BY, but then how will I implement it?
As of now I have used ORDER BY in the stored procedures.
I am waiting for your valuable suggestions.
Thanks in advance.
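For reference, my paging procedures follow the standard OFFSET/FETCH pattern from SQL Server 2012 onward; this is just a sketch, with made-up table and column names:

```sql
-- Sort the full set in the database, return only the requested page.
-- @Page, @PageSize, and dbo.Orders are placeholders.
DECLARE @Page INT = 3, @PageSize INT = 10;

SELECT OrderID, CustomerName, OrderDate
FROM dbo.Orders
ORDER BY OrderDate DESC
OFFSET (@Page - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
```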
Priyadarshi – thanks, glad you liked the post. Reread that answer about ORDER BY, and you’ll notice that the answer is already in there. 😀 Enjoy!
On the whole, sound advice. It's a constant battle between the DBAs and the app developers as to the best way to do things. Each has valid points, and rather than argue and battle, it would be great if we could respect each other's expertise enough to take advice on board.
My advice to the DBAs out there: show the dev team the evidence. You have tools in SQL Server to show them the bad queries, functions, etc. Use the DMVs, query plans, hell, even Profiler traces. Convince them of your argument with facts and figures: “If we change this function to a stored procedure/view, this would be your performance gain,” and so on.
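As a quick sketch of the DMV route: this query against the plan-cache DMVs lists the most CPU-hungry cached statements, which is exactly the kind of evidence that backs up the argument.

```sql
-- Top 10 cached statements by total CPU (worker time is in microseconds),
-- using sys.dm_exec_query_stats plus sys.dm_exec_sql_text.
SELECT TOP (10)
    qs.total_worker_time / qs.execution_count AS avg_cpu_us,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```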
The problem with that is the expectation that the DBA should spend all their time proving what they know. Not exactly efficient. Now and then this would be fine, but not every recommendation needs a demonstration or a committee meeting. Sometimes people just need to listen, and that street runs in both directions.
We have historical data that we need to access, but since we replaced our SQL Server, the SQL connection strings are incorrect.
We need help changing this info; can someone explain how to make the changes so we can do this?
Jay – unfortunately, personalized one-on-one troubleshooting is a little beyond the scope of what we can do in an unrelated blog post comment. You might consider posting this on something like http://stackoverflow.com, but include as much info as you can about the error message you’re getting, what the old connection string looked like, what changes need to be made, etc.
Also my comment:)
Usually the end user requests the ability to sort on every column inside some grid, for example. Well, you can't put an index on every column to get an efficient sort on the SQL Server side, and you can't tell the end user they may sort on only one or two columns because of optimization. So what now?
A query almost never touches only one table; in the real world you have 10 or 20 tables joined together in one query result. Even after using a WHERE clause, you can have 10 million rows. Transferring all those rows to the app just to show the user 20 rows at a time is nonsense. I have seen something similar: when the user clicked search, it took about 5 minutes to show the results (4.5 GB of data was sent to the web server).
A user almost never clicks next page in real life (or maybe once or twice). Usually the user changes the search criteria and clicks the search button again. At that point you have to go back to the database, since you have a different WHERE clause, and the data cached in the app is already obsolete. In the very rare case where the new search criteria is a child of the previous one and the user is satisfied with obsolete data, you can use the app-cached data (unless you have some system for updating the cache with changes, but I can't imagine the maintenance of all those data sets).
So you have to go back to SQL to refresh and get a new data set of some GB every time. Can you imagine?
So I decided to sort all my data in the database. It is much easier, and in my case still much more efficient than sorting in the app. One trick I usually use when the result has too many rows: I limit it to 5,000 rows, for example, and then sort only those 5,000. I also send the user a message to use more filters. I also have partitions, and by default the WHERE clause touches only one partition (well, the user can change that).
Is there any other concept in SQL Server for getting a more efficient sort in the database?
Is there a good example of how to sort millions of rows in an app cache – something like NCache or AppFabric, but with a worked example? Any other ideas?
Simon – you’ve got a lot of great questions in here, so I’ll start with an easy one. You wrote:
“One trick i usually do, when result has to many rows – I limit to 5000 rows for example and then use sort only on this 5000.”
When you’re sorting in the database, how do you get the database to only sort 5000 rows of your result set?
Well, it is like this. Let's say you have a grid with orders. By default the user wants to see the open orders, so I keep those in a different partition, and the number of them is usually so small that sorting by any column is no big deal. If the user changes the filter to, say, all orders since the beginning of the company, there are millions of them. So I send the user a message like: “There are more than 5,000 records. Please use a filter!” At this point you can show the user the last 20 rows in the default order (clustered index or similar), or show them nothing. Since the user is usually searching for specific data, what is the point of showing them the first page (20 rows, for example) out of 20 million? What is the chance that those 20 rows are the ones they are searching for?
So the user must use a filter to reduce the number of rows to something more logical, more practical to read and work with. This could be 5,000, 50,000, or just 500, depending on the scenario. And when you have 5,000 rows in some temp table, it takes a couple of milliseconds to sort them by any possible column (even a combination of them) and return the current page of 20 records (for example).
I'm not sure this way is the best one; it is just an option.
I would like to know if it is possible to use a “prefetch or read ahead” scenario here – I don't know much about it.
And I would like to hear how others are solving this kind of problem. My boss wants to use the app layer more, because an app license is free compared to a database license, but I can't imagine how I would do that. If you have the data in the app layer, you must do all of the SQL optimizer's jobs yourself (joins and the like). It looks impossible to me, or like years of work. Yes, at least the sorts could live there, but I don't see the way. An example would be nice (and not one with a couple of rows and one table – that is not the usual case in many apps – but some real-world scenario).
Simon – great, getting closer! Let’s focus in on exactly what you’re saying.
How do you only sort 5000 records in the database server?
It sounds like you’re creating a temp table, and then only inserting 5000 records into that table. Does your select statement look like this:
INSERT INTO #MyTempTable
SELECT TOP 5000 …
ORDER BY …
And if that’s the case, you do realize you’re sorting the entire result set to get the top 5,000, right?
Or does your select statement look like this:
INSERT INTO #MyTempTable
SELECT TOP 5000 …
Meaning, you’re putting a random 5000 records into the temp table, and then you’re turning around and sorting them by some arbitrary field? In that case, do the users really understand they’re getting random records, not sorted records? I’ve never seen users that were okay with getting random data back, but your mileage may vary.
Yes, it is :
INSERT INTO #MyTempTable
SELECT TOP 5000 …
This part could be ordered by the default index order; then you don't have a sort operation at all. Usually I take the latest ones (by date or identity).
Then you can sort the temp table by customer name, for example, and the user gets a message that the sort is on the last 5,000 records: please use a filter! In many cases a sort on the last 5,000 records (or on a partition, say records not older than 1 year, where the sort is not so expensive) is fine with users. Otherwise they must use a filter. What is the point of viewing 20 sorted records out of millions? It usually doesn't make sense anyway. Only in 1% of cases – for example, when I want to see the most expensive orders among 20 million – and that is why I won't sort all of them. But this can be easily solved with a filter – for example, (value > 5000 €) – and you get fewer than 5,000 orders and see the most expensive ones. You know what I mean?
And how do you deal with this now?
Simon – bad news there. Order isn't guaranteed without an ORDER BY statement, so the data isn't always ordered by the clustered index.
So you're sorting a random result set. I'd make sure your users are actually okay with that – that's not typical behavior for an app.
Sorry, I was not clear enough.
1. You can show the user just a message that they must use a filter.
2. You can show the user the last 20 records, with a message that they must use a filter and that the sort is on a subset of the data (5,000 rows, 50,000 rows, the last year, some partition, or similar).
In 90% of cases or more, users are satisfied with one of these options.
The last 20 records you get, for example, with:
“ORDER BY ID DESC”
If the ID column is already sorted inside the index, then there won't be a sort operation in the execution plan.
All other sorts are on a subset, which is fast enough. Any other suggestions?
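To make the claim concrete, here is a minimal sketch (hypothetical table): with the clustered index keyed on ID, the ORDER BY ID DESC is satisfied by a backward scan of the index, so the plan contains no Sort operator even though the query contains an ORDER BY.

```sql
-- Hypothetical table: the clustered index stores rows in ID order.
CREATE TABLE dbo.Orders
(
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    CustomerName NVARCHAR(100),
    OrderDate DATETIME
);

-- Backward range scan of the clustered index; the ORDER BY is honored
-- by index order, so no Sort operator appears in the plan.
SELECT TOP (20) ID, CustomerName, OrderDate
FROM dbo.Orders
ORDER BY ID DESC;
```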
Simon – note that in the BOL example, there’s an ORDER BY in their query.
Sorry, but you’re not hearing what I’m saying, so I think it’s best if we agree to disagree. Best of luck on your journey, though!
There is an ORDER BY in the query, but it is not executed – that is the point. You always get the last 20 rows without needing a sort, since the data is already physically sorted that way (if you have a design like this).
So how do you deal with this sort in the app when, for example, you have a result set with 4 GB of data? Can you describe it a little more? Thanks.
Simon – again, you’re not getting it. You only get guaranteed order with the ORDER BY. Good luck on your journey, sir.
Is there a chance to find out who created a particular stored procedure?
Brent – I totally agree with you that the ORDER BY should be implemented on the application side. I just had a query where the ORDER BY cost 74% of the execution plan, and the ORDER BY clause will kill your tempdb. True, my logical reads and CPU time didn't take a big hit, but I got lucky and the table was small. If the table had been large, my tempdb would have been out of business. I resolved the issue by having a basic conversation with the developer, and it turned out the ORDER BY clause wasn't needed.
I didn't read all of the comments, but I have to say that #7 is true but should never be true. It IS true that a great many people have no clue how to make functions that run as fast as inline code. That's also the reason why it should never be true: using inline table valued functions, you should almost always be able to make a function that runs as fast as inline code. It's not casually called an “INLINE” table valued function.
Of course, a lot of folks confuse those with mTVFs (multi-statement table valued functions). The key is: if your function has the word BEGIN in it, it's NOT a high-performance inline table valued function.
If anyone is interested, here’s an introduction to how to make your functions run faster.
Jeff – yeah, I totally agree. The way I usually explain it to people is, “If you could move your function into a view, then it’ll inline. Now, to make sure nobody comes behind you and hoses your work by adding statements, let’s future-proof it – just use a view.”
(It’s not a hard-and-fast rule to always use views by any means, but it’s one of those eye-opening statements I make to challenge developers on the decisions they’re making. I love a good function.)
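The “move it into a view” test above can be sketched like this (table and column names are hypothetical): if the reusable logic is a single SELECT, it works as a view, and a view can't quietly grow BEGIN/END statements later.

```sql
-- The reusable logic as a view: callers treat it like a table, and the
-- optimizer inlines it into their plans.
CREATE VIEW dbo.ActiveCustomers
AS
SELECT CustomerID, CustomerName
FROM dbo.Customers
WHERE IsActive = 1;
GO

-- Example caller: the view's definition is expanded into this query's
-- plan, just like an inline TVF would be.
SELECT o.OrderID, ac.CustomerName
FROM dbo.Orders AS o
JOIN dbo.ActiveCustomers AS ac
    ON ac.CustomerID = o.CustomerID;
```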
I went back for a re-read on this. In retrospect, #3 is only partially true. Articles written “today” are also known to be flat-out wrong, and articles written more than a decade ago can actually be worth more now than ever before, because they explain simple methods rather than the more complicated and frequently less effective ones introduced by the latest round of “too cool for school” bright, shiny objects that should never have been allowed to hit the streets.
ORDER BY is a much more complex issue:
Do you need the ORDER BY?
Does the ORDER BY have a direct impact on the data you want to receive?
Is the ORDER BY going to be supported by appropriate indexing and/or indexed views?
What stresses does the DB server suffer from: I/O, network, disk, memory, CPU?
How potent are the clients?
What network bandwidth do they have?
Do you have an app server sitting between client and DB, or a two-tier architecture?
Do you have split entry and MIS databases?
ORDER BY can make or break a database and the applications that depend on it – simplistic decisions either to order or not to order are deeply stupid.
SQL Server is like any other technology, even a hammer or a saw.
Those who know how to use one can create beautiful things.
Those who don't? Well, they get a lot of busted fingers and cuts…
Hey! I resemble that remark!
Lots of good advice here, but I think that people should take rule #2 and then immediately apply rule #3!
One problem with rules of thumb is that they get used religiously even when they shouldn't be. Sorting at the client may be your first inclination. However, I have to argue with developers who are terrified of sorting in the database and the cost it seems to incur. Generally (can you spot a dangerous generalisation coming down the track!), a database can often do a lot of filtering, aggregation, and refinement before the ORDER BY clause applies, leaving an easily processable result set that an ORDER BY can sort at almost no cost.
Your client developers should absolutely avoid asking for a large result set that is ordered for their convenience before they aggregate and present it, and in that scenario I would wholeheartedly agree with you telling the developers to go and do their dirty work on the client.
As others have said, YMMV, beware rules of thumb – and apply rule #3 if you have the test data to go a different way.
Pardon? Is there a link between this discussion about SQL Server and an article about DNS settings in MacOSX?
Like a drunk Hawaiian, you’re talking to Spam 🙂
I'm glad someone else is talking about sorts. Obviously every situation is different, but back in the day, the old-timers were emphatic: don't sort in the DB if it can be avoided. Why spend resources there when your app can sort it, and the DB can use those resources to service other requests? I said this on Stack Exchange and got my first-ever downvote. Good luck 🙂
MSSQLServer is a crap. Amen.
I like to think you’re Italian and you said that in Wario’s accent. “MSSQLServer, it’s-a-crap! Amen!”
Oh, grandpa Brent. “CRAP” is what the kids type to each other on their internet devices, and it’s short for “Cool! Really awesome program!”.
I am new to data warehousing and SQL Server. I am working on creating a DWH where I load the data into a staging DB, and when I load it into the final DB I apply all the UDFs I have created to the data, as per below.
I was not processing anything in staging, to keep the load quick. Is it quicker to apply the UDFs while the data is in staging, or should it be done when loading the data into the final DB?
Below, facility_cd is a float value, and I pass it to a function to get the corresponding description. The table it gets the description from is in the final DB.
udf_replace_special_char is a simple function that replaces a few special characters with NULL.
In general, what is the better practice? Should I be updating this in staging and then loading the data into the final DB after all the conversions?
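Per point #7 above, one thing worth testing regardless of where the cleanup runs is replacing per-row scalar UDF calls with inline expressions during the load. A sketch only: the database names, table names, and character list below are all guesses, not the original code.

```sql
-- Hypothetical version of the scalar UDF described above; it runs once
-- per row during the load.
CREATE FUNCTION dbo.udf_replace_special_char (@s NVARCHAR(200))
RETURNS NVARCHAR(200)
AS
BEGIN
    RETURN REPLACE(REPLACE(@s, '#', ''), '~', '');  -- characters are guesses
END;
GO

-- Inlining the same REPLACE calls into the INSERT...SELECT avoids the
-- per-row function-call overhead, whether it runs in staging or final.
INSERT INTO FinalDB.dbo.Facility (facility_cd, facility_desc)
SELECT s.facility_cd,
       REPLACE(REPLACE(s.facility_desc, '#', ''), '~', '')
FROM StagingDB.dbo.Facility AS s;
```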
Hi, Nancy. For questions & answers, head on over to a Q&A site like https://dba.stackexchange.com.
As I understand it, every call to a stored procedure in a SQL database from, say, a C# application requires an open and a close of the connection to the database, done internally. Is this correct?
Now, if the role of the Unit of Work and Repository patterns is to have only one connection to the database that is kept open and used repeatedly in many places, doesn't the use of stored procedures kill the use of that one connection opened and used repeatedly?
No, that is not correct.
OK, explain this to me then. A C# application calls a stored procedure in the database; the open and close operations are done via the C# application. It still requires an open operation. I guess a close operation is not needed; the connection can remain open for further use.
Never mind, I had my head up my ass; just omit the question I asked previously.
They deleted your SO post 🙁
Too bad, I was curious.