“Concurrency is hard”
When designing for high concurrency, most people look to the hardware for answers. And while it’s true that it plays an important role, there’s a heck of a lot more to it.
Like, every other major database vendor is optimistic by default. Even the free ones. SQL Server is rather lonesome in that, out of the box, you get the Read Committed isolation level.
Everyone else gives you something close to Read Committed Snapshot Isolation. This is why you don’t see too many Oracle bloggers wasting precious internet on the perils of NOLOCK.
- Avoids reader/writer blocking from the start
- Makes reporting on OLTP data easier
- Can alleviate some deadlocking scenarios
- Can drive tempdb crazy
- If your transactions depend on short blocks for queuing, you may need to use locking hints
- Those orphaned “begin tran” queries have more repercussions
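If you want to kick the tires on optimistic locking, here's a minimal sketch. The database and queue table names are placeholders, not anything from a real setup:

```sql
-- Turn on Read Committed Snapshot Isolation for a database.
-- ROLLBACK IMMEDIATE boots other sessions so the change can take;
-- schedule this for a quiet moment.
ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

-- If a queuing pattern depends on blocking, hint those specific
-- queries instead of staying pessimistic everywhere.
SELECT TOP (1) QueueID
FROM dbo.WorkQueue WITH (UPDLOCK, READPAST)
ORDER BY QueueID;
```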
Perhaps the un-sexiest bit of databases: normalization! This plays a huge part in concurrency, though.
If you choose to store all of your data in overly wide tables, a lot more queries rely on them simultaneously. If you’ve followed my advice on optimistic locking, you’ve bought yourself a little reprieve, but all those modification queries still have to get along.
Wider tables are also a lot more difficult to index. The number and variety of queries that touch wide tables in different ways often lead to many wide nonclustered indexes to compensate.
When you spread data out into well-normalized tables, it becomes easier to spread out writes, and to index for specific query patterns.
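To make that concrete, here's a hypothetical before-and-after sketch; the table and column names are made up for illustration:

```sql
-- Instead of one wide table carrying customer, order, and shipping
-- columns, split the data so modifications to different entities
-- don't fight over the same rows and pages.
CREATE TABLE dbo.Customers
(
    CustomerID   int IDENTITY PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    Email        nvarchar(320) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY PRIMARY KEY,
    CustomerID int NOT NULL REFERENCES dbo.Customers (CustomerID),
    OrderDate  datetime2(0) NOT NULL,
    TotalDue   decimal(18, 2) NOT NULL
);

-- Narrow tables let you build narrow indexes for specific patterns.
CREATE INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID, OrderDate);
```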
I can’t recommend Louis Davidson’s book on database design enough.
I know it’s been said everywhere on the internet, and anyone heavily invested in high-end design and implementation will have a runbook for SQL Server setups.
But still, this stuff matters! Let’s take a look at a couple things Microsoft has made better, and a couple things they haven’t.
Having tempdb and instant file initialization set up correctly from the get-go can have huge impacts down the line, especially with trace flags 1117 and 1118 becoming the default tempdb behavior starting with SQL Server 2016. Back when we talked about optimistic isolation levels, we talked a little about tempdb. All the row versions that keep readers and writers from getting gummed up go to tempdb. You’re gonna want this going blazing fast.
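If you're on an older version, or just standing up a new box, here's a sketch of the usual tempdb tune-up. File names, sizes, and paths below are placeholders:

```sql
-- Pre-2016 only: 1117 grows all files in a filegroup together,
-- 1118 allocates uniform extents. DBCC TRACEON lasts until restart;
-- make them startup trace flags for keeps.
DBCC TRACEON (1117, 1118, -1);

-- Multiple equally sized tempdb data files spread allocation
-- contention around.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 1GB);

ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2,
              FILENAME = 'T:\TempDB\tempdev2.ndf',
              SIZE = 8GB,
              FILEGROWTH = 1GB);
```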
Having instant file initialization turned on is a no brainer (unless you’re covered in tin foil for non-fetish-related reasons). End users typically don’t like having pauses in query processing while waiting for files to grow. Letting SQL and Windows negotiate this with IOUs behind the scenes makes data file growths a whole heck of a lot less painful — especially if you’re the kind of person who lets their files grow by percentages. You know, 10% of 4 GB is a whole lot less than 10% of 400 GB.
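On SQL Server 2016 SP1 and up, you can check whether instant file initialization is on right from a DMV; on older builds, look for the “Perform volume maintenance tasks” right in Local Security Policy instead:

```sql
-- instant_file_initialization_enabled shows up in this DMV
-- starting with SQL Server 2016 SP1.
SELECT servicename,
       instant_file_initialization_enabled
FROM sys.dm_server_services;
```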
- Max Degree of Parallelism (MAXDOP)
- Cost Threshold for Parallelism (CTFP)
I know these are a bit personal, especially if you’re on SharePoint (forgive me for speaking such words), but here’s why they impact concurrency.
You only get a certain number of threads to work with, and they’re based on your CPU count.
You’ve got to factor a lot into how many threads you need:
- Number of Users
- Number of parallel queries
- Number of serial queries
- Number of databases and data synchronization (think Mirroring, AGs, backups for Log Shipping)
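A quick way to eyeball your thread budget, and whether you've already blown it:

```sql
-- Total worker threads the instance gets (based on CPU count
-- when 'max worker threads' is left at 0).
SELECT max_workers_count
FROM sys.dm_os_sys_info;

-- Nonzero numbers here mean tasks have already had to sit around
-- waiting for a worker thread to free up.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'THREADPOOL';
```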
If you leave MAXDOP and CTFP at the defaults, you’re letting just about every query use every core.
You’re looking at running out of threads pretty quickly, if you’ve got any kind of user base. Nothing kicks concurrency to the curb like THREADPOOL waits.
This is SQL Server we’re talking about, and all those cores are doggone expensive. If a query wants to use more than one, it better be worth it.
Those defaults have been around since Justin Timberlake was country the first time.
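Changing them is easy; picking the right numbers is the hard part. The 8 and 50 below are placeholder starting points, not gospel:

```sql
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Cap how many cores a single parallel query can chew through.
EXEC sys.sp_configure N'max degree of parallelism', 8;

-- Raise the bar a query's cost has to clear before going parallel.
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
```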
As we speak, I’m kicking 128 GB of RAM. I don’t even have a real job.
If you’re sizing production servers, or Tituss Burgess forbid, VM hosts, and you think 256 GB of RAM is a good number, you have lost your mind to oblivion.
SQL Server 2016 SP1 Standard Edition can use 128 GB of RAM to cache data alone. If your database isn’t that big today, it will be soon. And you know you’re working on a data warehousing project, too.
On top of that, it can use:
- 32 GB for columnstore
- 32 GB for Hekaton
And it can use any memory in your server above the 128 GB mark for query processing, DMV and plan caching, backups, and just about anything else.
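And while you're in there, don't let SQL Server strangle the OS. A sketch, with a made-up number for a 256 GB box:

```sql
-- Leave headroom for Windows and anything else on the server;
-- 230000 MB is an example figure, not a formula.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max server memory (MB)', 230000;
RECONFIGURE;
```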
Thanks to Bob Ward (through whom all things are possible, except learning how to use the debugger) for confirming my suspicions about the memory grant part.
Stepping back a bit to design — you better choose those data types carefully if you want to make the most out of your memory.
Running out of memory means waiting on memory. Much like THREADPOOL, this is not what you want your queries waiting on.
Having data cached in memory and having enough memory for queries to run without spilling oodles of noodles to disk is important.
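If you want to catch queries stuck waiting on grants, this DMV query shows anyone parked in the memory grant queue:

```sql
-- grant_time IS NULL means the query is waiting on a memory grant
-- (racking up RESOURCE_SEMAPHORE waits), not running.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
WHERE grant_time IS NULL;
```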
Babel of Bloggers
Concurrency isn’t the domain of any one aspect of the database. You have to put a lot of thought into it at every level of design.
There are other ways to shoot yourself in the foot, whether it’s instance stacking, putting all your clients in one database, or putting hundreds of client databases on one server.
These are the kinds of design decisions people seem determined to make.
While they’re cheaper and easier in the beginning, they’re often expensive and difficult to undo later.
Like these butt implants I got.
Thanks for reading!