If you could give just one piece of advice about writing good queries, what would it be?
I asked Twitter yesterday because I wanted to make sure I didn’t miss anything in a new training course I’m working on, and the replies were fantastic. Here were some of my favorites:
As you’re writing a query, give a play by play in the comments explaining what you’re doing and why. A. You will catch more of your mistakes.
— BD Softley (@BigfootSql) May 3, 2021
Understand execution plans, look at actual ones and analyse.
— Stefán Jökull Sigurðarson – CCP Ghostrider (@stebets) May 3, 2021
anything that makes your life easier makes the optimizer's job harder
— Erik Darling Data (@erikdarlingdata) May 3, 2021
Don’t let perfect be the enemy of good. Write queries up to the point that they do what you want in an acceptable amount of time and using an acceptable amount of resources.
— Mike Scalise (@themikescalise) May 3, 2021
Write two. Pick the better one, if you can work out why it's better.
— Data Quizard (@TimBSpeaks) May 3, 2021
Do less
— George Stocker (@gortok) May 3, 2021
It formats the query or it gets the hose again.
— Cody Konior (@codykonior) May 3, 2021
Document your code.
— Dustin Mueller Staying the F home (@sqlcheesecake) May 3, 2021
Write it, give it to the person next to you. If they can understand its intent, then everything else is easy.
(Because now, there is an infinite amount of help out there should you run into problems…But if no-one can understand it…its a tough road ahead)
— Connor McDonald ??? ? lots of ?? (@connor_mc_d) May 3, 2021
this is my second one, couldn’t resist giving one more:
comment the “special tricks” employed in a queryeg why is there an ‘optimize for’, was a redundant filter added intentionally, etc https://t.co/DCcAZDZvmy
— L_ N___ (@sqL_handLe) May 3, 2021
SET STATISTICS IO, TIME ON;
— Dave Mason (@BeginTry) May 3, 2021
Understand that your query is only half of what creates the program to fetch your results. The rest is what’s there on the server – statistics, indexes, the capabilities of operators, etc.
— Rob Farley (@rob_farley) May 3, 2021
Share your query online with the title "the best way to write queries”.
Next day you will have a good selection to copy paste.— ?ister?agoo #Blazor (@mistermag00) May 3, 2021
What about y’all? Leave your most important tip in the comments, and read the Twitter thread for tons of gold nuggets.
33 Comments. Leave new
Format the damn thing.
Try to use new built-in functions or stored procedures which each new release of the SQL Server or any other DB for that matter seems to be coming up with.
This is how you get Merge commands! Is that what you want :P?
Keep it simple, the simpler the query, the better it will run and the easier it is to modify.
Result sets are not ordered so if you care about the order then you must use an Order By.
“SQL Server, the second most expensive place to sort things”
Seems you haven’t sorted in SSIS before.
Think relational, not algorithmic.
Document Why it’s written the way it is. Any business rules that it implements, any “clever” code, anything that another person would need to answer the question, “What was he thinking when he wrote this?” Most good SQL devs/engineers will be able to figure out What the code does, or at least what the steps in it do, but Why is critical. Most likely, the person asking “What was he thinking?” will actually be future-you asking, “What was I thinking when I wrote this?” Make life easier on future-you and you’ll end up with less to regret and more to be happy about.
If you’re joining tables, put the damn table name (or alias if using those) before each column name. Don’t assume everyone knows which column goes in which table.
Second this!
Learn and appreciate niche features; use them when appropriate.
Things you once thought of niche features will become everyday tools. For me personally, a whole new world was opened up each time when I learned about windowing functions, grouping sets, and (un)pivot functions.
The fastest path to the records you want is to eliminate the ones you don’t as quickly in the process as possible.
Yes!
The query optimiser will _probably_ make sensible decions. But for clarity I much prefer
SELECT things
FROM
(SELECT OneRow FROM SmallLookupTable WHERE Name=’TheOneImLookingFor’) tinyset
INNER JOIN
(SELECT LoadsOfStuff
FROM
[Loads of joined enormous tables])
to
SELECT Things
FROM
[Loads of massive tables]
[…continues for several pages]
WHERE
[complicated logic, continues for lines and line] AND
[Actually I only wanted one row out of the lookup table]
If you’re constantly using DISTINCT (or indiscriminate GROUP BY), UNION or CTEs, you’re…. doing it wrong. And, as Cody puts it so well above: it gets the hose again.
These are all incredibly useful in their place. Too often they’re used out of place.
DISTINCT because for “some reason” the resultset (or subset) is returning duplicates. Why are there duplicates? Do you understand the cardinality of the tables involved?
GROUP BY… again, because duplicates are being returned.
UNION to filter the same set using an OR in the WHERE. Use ORRRRRR, that’s what it’s FORRRRR . UNION hits the set (e.g. table) twice. Or even more times….
CTEs because the logic is “complicated”. You can only use a CTE in a single SELECT statement. No, this is not a reason to stick in a UNION to make a single SELECT statement possible. Every time someone else scrolls up 4 pages to find your ****verd****te CTE and figure out how it interacts with the logic 4 pages lower, a puppy dies. And fantasies of vengeance grow another inch towards realisation. If you use multiple CTEs all referring to each other, spanning 4 pages, then 2000 puppies and 1500 fairies die, the Virgin Mary cries, and the hitman is already turning the corner of your street.
Put your logic all in the same place so someone else can see what you’re doing.
Iced the hit man, told Mary to get over it, clapped real hard to bring the fairies back to life, ¯\_(?)_/¯ not sure what to do about the puppies, someone should have spade or neutered their pet like Bob Barker recommends; 4 pages of CTE’s referencing themselves for the win. ;-P
Good recommendations though.
Regarding the union: It depends. If you want to filter on column A or B and both have an index, it may be better to do the union as SQL Server sometimes prefers to ignore both indexes with an or. Then the or becomes much more expensive than the union.
Nice tip, I didn’t know that!
Check for new features (or older ones) that can do what you want with ease.
Heh…
I have 4 basic rules for code writing… “Make it work, make it fast, make it pretty… and it ain’t done ’til it’s pretty”.
I have one rule for scheduling… “If you want it real bad, you’ll probably get it that way”.
When it comes to choosing two of the three items of “good, fast, or cheap”, there’s only one choice… “good”… because if you always practice “good”, fast and cheap will naturally follow.
Shout out to my homie, MB. Friends don’t let friends ‘SELECT *’.
Notable moments over the past few weeks:
this is what it is doing… from the start of the select * to the load of the truncated table
-MB
the query is
SELECT *
FROM [Table]
-MB
i don’t know why it takes so long now, it only uses 4 columns. it’s a select * but that shouldn’t matter
-MB
Avoid scalar functions whenever possible – inline table-valued functions work much better and faster.
Prefix all your fields with table names or aliases, and if using aliases, make sure the alias name conveys the intent of the table name. (i.e., don’t ever use anonymous aliases – aa, bb, cc, x, t, etc.).
When returning wide result sets, try to get the record keys in an efficient inner query, then add any fields that aren’t part of the query criteria at the end – don’t make the system shuffle gigabytes of data when your inner query could return a simple set of record keys.
If you are doing something more than once, that something should be a function.
If you coded something well, it should feel elegant, and vice-versa.
“When returning wide result sets, try to get the record keys in an efficient inner query, then add any fields that aren’t part of the query criteria at the end”
Oh, this. This, a million times. Work with keys, work with key, work with keys… Get a resultset with just the key(s) you need. Check that the population is correct and unduplicated. Then decorate it.
This avoids monstrosities like
SELECT
PersonID,
MIN(FullName),
MIN(AddressLine1),
MAX(TaxPayerReference), — Ticket number BLARG666: MIN didn’t work for [reasons], shove MAX at it and hope
AVG(AgeInYears)
FROM
[I can’t be bothered to work out why the monstrous procession of INNER, OUTER and CROSS JOINS, UNIONs etc, which sits here, returns duplicates]
GROUP BY PersonID
But you have know what the unique keys are. This is just one of the many reasons why heaps are evil. A table with a PK is self-documenting. Whatever other nonsense is going on in there, the PK column is guaranteed unique.
Only one thing? Collect most of above suggestions, put them in the Coding Standards of your company, and tell everyone to:
“Follow the Coding Standards.”
It will not work, but at least you tried…
HAHAHA, I was so excited until that last line, and then I remembered that yes, you’re right. Dang.
Heh… you should still be excited… it helps consultants, trainers, a book writers busy forever. 😀 It’s like a volcano… it’s always there just waiting to happen and, when it does, you don’t know which direction it’s going to go but it’s always a spec-hackular opportunity. 😀
And pardon the typo’s… I’m only on my second cup o’ joe! 😀 I think I’m legally dead until my third one. 😀
Coding Standard: Do not code before ingesting at least three cups of coffee.
Or three Mountain Dews, whatever your poison may be. 🙂
I’ve compiled the list below from all suggestions until now! Great advices from and for all! Congrats!
1. Write neat code.
2. Pay attention to the indentation.
3. Leave a comment where necessary – think about other team members who’s gonna read and understand your code.
4. Splitting in two separate SPs (not too many) can make it easier to read/understand the main logic/control flow
5. Understand the “what” and the “why”, aka the business requirements, before you start coding.
6. Before running UPDATE or DELETE, rewrite the query to a SELECT query and make sure that the outcome is what you would expect.
7. Know your data
8. Know your indexes
9. Know your query plan – always start with your FROM and JOINs, then WHERE, then any aggregate clauses, and only at the end worry about the SELECT clause fields (COUNT(*) or COUNT(1)).
10. If you need to deconstruct for simplicity, do it by chaining CTEs not using temp tables (unless reusing data and not just simplifying)
11. Do not abuse DISTINCT, only use it when you really need it and only after you’ve tested your query with COUNT(*) to make sure you are pulling only the necessary records – in other words, only use DISTINCT when it becomes obvious you need it to eliminate duplicates, which should be rare in most cases.
12. Keep it short and simple as possible – a future maintainer (maybe yourself) will thank you.
13. What isolation level can you get away with. Are dirty reads OK?
14. Do not use more joins than necessary nor more columns than required.
15. Avoid “Select * FROM”.
16. Make sure you are on the test server before you press F5.
17. If you have some joins, just don’t start writing like a dev maniac: think about the best possible way to filter data on each join so each one will need less IO than the previous one.
18. Think about its reusability.
19. Anything that makes your life easier makes the optimizer’s job harder
more often you can, review code and comments the next day
Understand how data is stored and how the engine does its thing. There are many ways to get the data that you want, and in most cases, there is a “best” way.