What Developers Need to Know About Designing Databases
Understand that changing database design at a later date is harder than changing code. Getting it right the first time is important #dellsql
— Gail Shaw (@SQLintheWild) October 24, 2013
One of my favorite tips, and I never thought of it that way. Code is agile, and databases are brittle. It seems easy to refactor the database in the beginning, but as your app grows and more stuff interfaces with it, life gets a lot harder.
@BrentO @SQLrich #RuleNo1: Tell your engine what you konw [ and you may have to dig deep to realise you (should) know ]
— Johan Bijnens (@alzdba) October 24, 2013
Johan means you should use the right data types, like using date or time when you’re just storing one of those. Plus, if you know a field is nullable or unique or relates to a field in another table, tell the database by setting up constraints or foreign key relationships. This can actually make your queries faster because SQL Server uses this knowledge to build better execution plans.
@BrentO @GregRobson_UK Use Date if you don't need time (Or an INT date in a warehouse) #dellsql
— Mark Stacey (@MarkGStacey) October 24, 2013
For example, DATE is a narrower field than DATETIME, requiring less storage, so you can pack more rows per page on your date indexes.
https://twitter.com/RealSQLGuy/status/393377820799680512
NVARCHAR is Unicode, which SQL Server uses 2x the storage space for as compared to just plain VARCHAR. This means you can cache half as many rows in memory, and your storage IO takes twice as long. If you’re not seriously going to store Unicode, stick with VARCHAR. (Then of course right-size your fields as opposed to using MAX, which can be difficult to index and may get stored off-row depending on data size and config options.) On a related note:
https://twitter.com/SirSQL/status/393383421491564544
While database vendors will say, “Sure, you can use our database as a file server!” remember that the licensing on databases is a lot more expensive than licensing on a file server. Plus, these big files hit the transaction log, making your smaller/faster transactions compete for log file access, and slowing everything down.
Not all types are good for primary keys #dellsql
— Rui Ribeiro (@ruirrib) October 24, 2013
Primary keys (and clustering keys) should be unique, narrow, static, and ever-increasing. Don’t go redesigning an existing database to change this, because it’s probably not your biggest bottleneck, but when starting from scratch, keep these guidelines in mind.
@BrentO @SQLRich If your first inclination is to use a trigger, stop. You very likely viewing the problem wrong.
— David L. Penton (@dpenton) October 24, 2013
I wrote triggers a lot too when I was a developer, but as I moved into the database side, I realized how tough they are to scale. They’re synchronous, and they can cause concurrency problems. Consider moving to an asynchronous solution that doesn’t do as much locking/blocking.
https://twitter.com/RealSQLGuy/status/393378719856742400
Narrow one-field indexes are usually less useful because you’re selecting more than one field. Plus, every index you add is like another copy of the table. Insert a record into a table with ten nonclustered indexes, and you’ll be doing eleven (or more) writes to disk each time.
@BrentO For the love of god, don't run SSMS index suggestions blindly #dellsql
— Nick Craver (@Nick_Craver) October 24, 2013
SQL Server Management Studio’s index suggestions have no concerns whatsoever about adding overhead to your inserts/updates/deletes, and they’re often supersets or subsets of existing indexes.
You can't design indexes based on the table structure alone. You need to know the common queries that run against it. #dellsql
— Gail Shaw (@SQLintheWild) October 24, 2013
When you first get started with a new database, don’t go crazy adding indexes with guesses about what will get queried. Indexes are easy to add later – wait to see how the queries actually shape up, and which ones get run the most often.
@BrentO Standard warnings about premature optimization apply – sort of. Don't let your design be too brittle on day 1.
— Vicky Harp (@vickyharp) October 24, 2013
What Developers Need to Know About Writing Queries
@BrentO Know thy craft. Read up on declarative languages. Understand that you need to context switch from your appdev language. #dellsql
— Vicky Harp (@vickyharp) October 24, 2013
This is one of my favorite tips. SQL is really easy to learn, but the underlying server code behaves very differently than you might expect. We’ve gotta think set-based. On a related note:
@BrentO Just like you had to graduate from imperative to OO, you need to graduate up to set-based programming. It's worth it. #dellsql
— Stuart Miller (@mecheph) October 24, 2013
@BrentO @SQLRich you are NOT smarter than the query optimizer. Hints are the exception, not the rule.
— Kendal Van Dyke (@SQLDBA) October 24, 2013
@BrentO @sqlrich There is no substantive difference between being careless with isolation levels and writing code which is not thread safe.
— Vicky Harp (@vickyharp) October 24, 2013
Learn more about isolation levels.
@BrentO @SQLRich In SQL, complexity isn't a sign of your ninja-like skills. Simplicity is better, for the optimizer and reader of your SQL.
— Richie Rump (@Jorriss) October 24, 2013
@BrentO @SQLRich Stop parsing strings, splitting lists and formatting dates in T-SQL. There are much more efficient ways to do these things.
— Aaron Bertrand (@AaronBertrand) October 24, 2013
SQL Server licensing is about $2k USD per core for Standard Edition, and $7k per core for Enterprise Edition. These are usually the most expensive cores you’ve got in the shop. Aaron’s guidance is based on minimizing the amount of CPU clock cycles we need to burn in the database tier, and thinking about moving those to the easier-to-scale-out app tier.
@BrentO @sqlrich Embrace set based logic. Learning new ways of thinking is the joy of being a developer, not the curse.
— Vicky Harp (@vickyharp) October 24, 2013
beware of outer joins #dellsql
— Rui Ribeiro (@ruirrib) October 24, 2013
Sometimes the business design requires outer joins, but beware that you can get much worse execution plans as your data grows.
@BrentO @SQLRich Avoid shorthand and other lazy syntax that may save you microseconds in productivity now, but cause others headaches later.
— Aaron Bertrand (@AaronBertrand) October 24, 2013
@BrentO @SQLRich In SQL, complexity isn't a sign of your ninja-like skills. Simplicity is better, for the optimizer and reader of your SQL.
— Richie Rump (@Jorriss) October 24, 2013
@Jorriss @BrentO @SQLRich 1 of my fav sayings: "Complexity is the enemy of Stability." Also, there's a fine line between clever and kludge
— Todd Klindt @ Home (@ToddKlindt) October 24, 2013
Very surprised nobody has mentioned stress testing code and schema design yet #dellsql #ItWorkedOnMyBox
— Lord Richard Douglas (@SQLRich) October 24, 2013
Databases perform very differently at scale, whether it be scale of data or scale of underlying hardware or scale of simultaneous queries.
What Features, Commands, Datatypes, Etc. Should Be Generally Avoided
@BrentO @SQLRich NEVER use the construct [DateColumn] BETWEEN 'date here' AND 'another date here' (you are likely missing data)
— David L. Penton (@dpenton) October 24, 2013
If you say BETWEEN December 1st and December 31st, SQL Server sees that last parameter as the very beginning of December 31st. Any data from, say, 1AM on December 31st and afterwards is going to be excluded.
use the principle of least privilege. You don't need db_owner, much less dbo or sa. #dellsql
— K. Brian Kelley (@kbriankelley) October 24, 2013
@BrentO Simplicity is often the best method. Getting fancy with things like query hints rarely improves query performance #dellsql
— Bob Pusateri (@SQLBob) October 24, 2013
Simplicity is a big theme here. Manhandling the engine around with hints usually results in worse performance overall.
If you're going to nest views, look at and understand the query plan that will result. #dellsql
— K. Brian Kelley (@kbriankelley) October 24, 2013
We’re not saying nested views are wrong – sometimes they can work really well – but KBK is pointing out that you may end up dragging a bunch more tables into your query inadvertently.
https://twitter.com/SirSQL/status/393387459842560000
Always good advice – but it’s advice for managers rather than developers, right?
@BrentO Don’t kill yourself trying to force a seek – they’re not always the most efficient way to access your data. #dellsql
— Aaron Bertrand (@AaronBertrand) October 24, 2013
Sometimes, a scan is easier – especially for grouping.
NOLOCK is *not* always (or even usually) the answer. #dellsql
— K. Brian Kelley (@kbriankelley) October 24, 2013
For more details, watch our There’s Something About NOLOCK video.
The Best One-Line Things to Improve Databases and Apps
Learning SQL Is an opportunity to expand your skill set. Embrace it like you run to the flavor of the week language/ORM #dellsql
— Bill Fellows (@billinkc) October 24, 2013
That’s actually how I got started in database administration – I got tired of learning the Language Du Jour because I’m not all that bright. I have so much respect for developers who can rapidly learn new languages and frameworks and then deploy them effectively. I switched to databases because I could learn one language that’s been the same for decades, and it’s even used by multiple back end platforms. I’m lazy. On a related note:
@BrentO check your design against t-sql code smells. They are different than other languages #dellsql
— Merrill Aldrich (@onupdatecascade) October 24, 2013
Data integrity. A database that stores garbage is useless. Apps can enforce integrity but database should as well. #dellsql
— Gail Shaw (@SQLintheWild) October 24, 2013
@BrentO Actually watch performance, have some idea of what queries are running and how they're performing #dellsql
— Nick Craver (@Nick_Craver) October 24, 2013
Nick is from StackExchange, the guys who built Opserver, an open source monitoring system. They’re brilliant about watching the top resource-intensive queries on the database server, and you should be too. When something new pops up, you need to know why it’s suddenly using more resources than you expect.
@BrentO Dev suggestion – Understand how indexes work. Understand how to read a query plan and work from it. #dellsql
— K. Brian Kelley (@kbriankelley) October 24, 2013
Query execution plans are road maps for how SQL Server will process the data for your query. Learn more about those.
@BrentO If this thing grows and grows and grows, what's the thing most likely to break first and what will I need to do about it? #dellsql
— Stuart Miller (@mecheph) October 24, 2013
You don’t have to know how to FIX the bottlenecks, necessarily, but you have to know what’s going to be the pain point as you scale.
Include DBAs early on in a development project and keep them in the loop for better results. #DellSQL
— K. Brian Kelley (@kbriankelley) October 24, 2013
The best DBAs are partners, not predators. When you find a good one, they’re willing to help make you look like a star.
@BrentO be conscious that as you push a database to the limits, fewer rules are absolutes, don't be afraid to seek expert advice #dellsql
— Nick Craver (@Nick_Craver) October 24, 2013
I’m all about this. By the time somebody brings me in for advice, the best practices guidelines aren’t usually working for them, and they need to learn how to bend the rules.
Live Video of the Twitter Chat
Here’s the 50-minute video of the Twitter chat where we built this list, plus covered a lot more tips:
I apologize for the audio – WebEx’s audio codecs are horrendous. (I’ve got a Blue Yeti microphone.)
Sign Up for the Next One:
What DBAs Need to Know About Monitoring
Page life expectancy. Disk queue length. Page splits. There’s so much old bogus advice out there around SQL Server monitoring, and it’s time to update the rules. Join Microsoft Certified Master Brent Ozar as we collaboratively write an e-book with new advice on monitoring. You can contribute too – we’ll be working together over Twitter using #DellSQL.
We’ll discuss and share:
- What metrics every DBA should monitor
- Which metrics are meaningless
- Our favorite free DMV scripts
1 Comment. Leave new
I missed this event on twitter, so here are my contributions:
Scalar Functions in a select statement are executed for every row, using SQL Server as inefficiently as possible, so a 10ms process done for 1,000,000 rows is the cause of that 10 second long query no one likes. They work great in OO, not in SQL. It’s a different language and needs to be respected as such.
Indexing should be understood better, specifically looking at the data logically and finding reusable indexes. That almost always helps more then two “perfect” special-use indexes that compete for cache and both need updated. You talk up reusable code, and that just saves your time. This is saving server time on inserts/updates/deletes, using cache more effectively, reducing the load and amount of data on disk, and reducing the size of backups.
Ask the DBA for help, and don’t be ashamed to do so. Keep in mind that you’d expect the DBA to ask you for help if we were fooling around with OO code. You’re respecting specialties to make future maintenance easier for the DBA, not harassing someone who doesn’t want to be bothered.
Document where procs are used when you create them. We tend to tune things because they were on our list of top 20 most expensive queries then go back and ask you to figure out where it’s being used. I would prefer to go straight to QA and say “I just improve this screen, can you prove my work with real-life examples?”
Ask us what we did to improve performance and why we did it. It’s almost giving us permission to brag, so we love it. On your side, you learn how we turn something slow into something fast so that you can improve at making something fast from the start.
Bribe us. When we help you with something big, buy us lunch. When you want to impress your users with blazing fast performance, offer to watch our kids for a night. Remember, a bribable DBA is a controllable DBA. The guys at Brent Ozar unlimited can be controlled with Bacon, the girls can be controlled with post-run cupcakes, and I personally prefer to answer your questions over a Hibachi Lunch-and-Learn session. Know the person you’re asking to help and you’ll get better help.