Ever wonder what’s “normal” out there for SQL Server adoption rates, hardware sizes, or numbers of databases? Let’s find out in the spring 2020 version of our population report.
Our SQL ConstantCare® service lets users send data in about their SQL Servers and get health & performance advice via email. Users can opt in to share their anonymized data to get better recommendations overall.
This data isn’t a full picture of the SQL Server ecosystem. Lemme be really clear: this is by no means a random sample of the full population. It’s just a picture of the kinds of servers where people want an inexpensive second opinion. It doesn’t include giant corporations, hospitals with PII, or tiny businesses who just shove a SQL Server in the corner and then never take a second glance at it (and if my experience is any indication, never back it up.)
Even though it isn’t a full picture, with 3,878 database servers sharing data as of May 30 (up from 3,570 in the last report), there’s still a lot of useful information in here for the community. Understanding what servers are out there will help you put context around where your own servers rank, and helps the community better build training material that’s relevant for most users.
This population report is just my attempt to help clear up the fog of war. Microsoft simply doesn’t give us any adoption information (even though SQL Server phones home by default.) If you believed what you saw onstage at conferences and in Microsoft’s changing certification programs, you’d think no one still bothers with self-hosted SQL Server, but the few who do have already migrated to SQL Server 2019 running on Kubernetes and Linux. This report is my humble attempt to keep Microsoft honest.
This time around, I built this post in an Azure Data Studio notebook live in front of a streaming audience, so the charts and tables look a little different. You can download the notebook to play with it yourself (but can’t run the queries, of course), and you can watch the video for that one at the end of the post.
Let’s start with a simple question:
What versions are the most popular?
Woohoo! This quarter, SQL Server 2019 adoption outpaced SQL Server 2008!
Thoughts on reading that chart:
- SQL Server 2019 still hasn’t caught up to 2008R2.
- If you add together 2008 & 2008R2, almost 10% of the servers out there are no longer under support. (That number is actually slightly higher, since we don’t monitor 2000 or 2005, and every time I talk versions in front of a group, folks talk about the 2000 and 2005 instances they still have cowering in shame in a closet somewhere.)
- SQL Server 2017 adoption continues to rise, but frankly, 2016 is still the juggernaut: 1 in 3 servers runs 2016.
- You’ll notice that the graph format is different this time around – that’s because I’m using Azure Data Studio’s built-in charts. The experience isn’t great – hopefully that’ll improve over time as well.
I’m going to continue publishing the numbers in this format (only showing the current population) for the first year of the survey, but on the first year’s anniversary, I’ll switch to graphs that show trending and year-over-year changes.
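If you want to see where your own servers land on that chart, a quick sanity check is just a few standard SERVERPROPERTY calls – nothing exotic, this works on any supported version:

```sql
-- Which version, patch level, and edition is this instance running?
-- ProductVersion's major number maps to the marketing name:
-- 15 = 2019, 14 = 2017, 13 = 2016, 12 = 2014, 11 = 2012, 10.50 = 2008R2, 10 = 2008.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,   -- RTM, SP1, etc.
       SERVERPROPERTY('Edition')        AS edition;
```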
How about development? Are people maybe testing their apps on newer versions?
What versions are people using in development?
SQL Server 2019’s adoption rate for development servers doubled from the last quarter, and that’s a great sign. People are starting to develop more against 2019. However, 2017’s growth wasn’t as strong – I think it basically means people are using 2019 as their default new dev box, but it doesn’t bode well for a rapid adoption of 2019 in production over the next couple of quarters. People just aren’t testing on it yet, and testing is utterly critical for 2019 given its radical differences in execution plans (batch mode on rowstore, inlined scalar functions, etc.)
I think about this chart a lot when I’m designing training. When is the right time to start building developer training material for SQL Server 2019? It’s a tricky timing act: you need to be skilled-up before you adopt the new version, and I need to have the training material ready to go before then.
However, if I focus on building 2019-specific training material right now, it could be 6-12-18 months before it really starts to pay off. Judging by this chart, I’d be better off building 2017-specific training material since there are a lot of people who still haven’t adopted it yet. I use 2019 in my class lab VMs, but I still focus on 2017 compat mode (and only require the bring-your-own-VM students to use 2017 minimum, not 2019.)
For the rest of the census report, I’m going to focus on just the production servers.
How much data do servers have?
For quick, easy reporting (and because accuracy here has relatively limited value), I’m going to use user database data file sizes, not contents, and not log files.
Keep in mind that this is the total data file size for all user databases on the server, and these numbers really help to keep the world in perspective.
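If you want to measure your own server the same way, here’s roughly the kind of query involved – a sketch that sums data file sizes (not contents, not logs) across user databases, matching the methodology above:

```sql
-- Total data file size for all user databases on the server.
-- sys.master_files reports size in 8KB pages, so size * 8 / 1024 = MB.
SELECT SUM(CAST(size AS BIGINT)) * 8 / 1024 AS total_data_mb
FROM sys.master_files
WHERE type_desc = 'ROWS'   -- data files only, no log files
  AND database_id > 4;     -- skip master, tempdb, model, msdb
```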
Over half of all servers host less than 225GB of data.
Over 2/3 of servers are smaller than the Stack Overflow demo database, and that doesn’t even have nonclustered indexes. That’s why I love using that database for training: it challenges a very good chunk of the audience.
Consistent with the last population report, about 15% have a terabyte or more of data, with some of them having dozens of terabytes. That’s useful because it gives me over 500 servers’ worth of metadata to help guide folks in larger tiers. Performance advice is dramatically different at that scale: it’s so much harder and more expensive to throw memory and indexes at problems. Heck, just creating an index on a 1TB table can represent a big challenge for some shops.
How much hardware do production servers get?
Again, really helps set expectations: the majority of servers out there are pretty doggone small. You can go a really, really long way with just a few CPU cores, especially if you’re diligent about query tuning. At $2,000 per core for Standard Edition, you wanna avoid throwing cores at a problem if you can.
About 2/3 of the servers have less CPU power than a modern pro-grade laptop. That’s not a number to be ashamed of – that’s a number to celebrate! You can do a lot with a little. As with last quarter’s report, only about 10% of servers have over 24 cores.
Memory isn’t licensed – although Standard Edition has a ~128GB cap – so how much memory do folks have at the OS level:
I’m specifically looking at memory for the entire server OS here, like how much Windows sees.
There are several ways you can look at SQL Server memory, like target (how much it wants to use) or total (how much it’s currently using.) I kept it simple and I’m just looking at OS memory because I want to get a feel for the hardware population overall, not how people are configuring SQL Server, or whether their workloads are causing total memory to rise to the target counter.
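For reference, all three of those numbers come out of standard DMVs. Here’s a rough sketch pulling OS memory alongside the target and total counters so you can compare them on your own box:

```sql
-- OS-level memory (what Windows sees) next to SQL Server's
-- target (how much it wants) and total (how much it's using).
-- All of these DMV columns report in KB, so divide by 1024 for MB.
SELECT (SELECT total_physical_memory_kb / 1024
        FROM sys.dm_os_sys_memory) AS os_memory_mb,
       (SELECT cntr_value / 1024
        FROM sys.dm_os_performance_counters
        WHERE counter_name = 'Target Server Memory (KB)') AS target_memory_mb,
       (SELECT cntr_value / 1024
        FROM sys.dm_os_performance_counters
        WHERE counter_name = 'Total Server Memory (KB)') AS total_memory_mb;
```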
When you think about these metrics, remember that over half of all production servers host <225GB data files. If you handed me a server with, say, 200GB of data and asked me how much memory to assign it, I think 32-48GB is actually a pretty realistic number. I might even call it generous – keeping in mind that RAM is still ridiculously overpriced in the cloud, and it’s tied to core counts.
Now, look at the metrics: half of the servers host <225GB of data, and half of servers have <48GB RAM. That makes decent sense. 0-15GB RAM sounds terrifying, but remember that 1,049 servers are hosting <25GB of data.
When you’re at these really low database sizes, I can see why people would be interested in a container-based solution: the overhead of 1,049 servers each running Windows is pretty ugly. Containers might remove some of that overhead.
Bonus: what unusual wait types are showing up?
One of the fun things with gathering performance metrics on thousands of servers is that we get to see some pretty nifty edge cases:
Ooo, neato! PVS_PREALLOCATE, you say? SHRINK_CLEANER_SYNC? Time for me to go spelunking through the raw data to see what I can learn, and how I can help customers make their servers faster and more reliable.
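If you want to go spelunking on your own server, the raw material is sys.dm_os_wait_stats. Here’s a simplified sketch – the real filter list (like the one sp_BlitzFirst uses) excludes dozens of harmless wait types, but this gives you the flavor:

```sql
-- Top cumulative waits since the last restart, skipping a few
-- well-known benign background waits. Simplified for illustration.
SELECT TOP 10
       wait_type,
       wait_time_ms / 1000 AS wait_time_seconds,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'CHECKPOINT_QUEUE', 'BROKER_TO_FLUSH')
ORDER BY wait_time_ms DESC;
```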
Watch Me Build This Post with Azure Data Studio
I need to warn you a little, though, dear reader: I’m using Azure Data Studio here, and I had some rather frustrating experiences with the graphing functionality. It’s a good overview of what it’s really like to use this product hands-on to build posts like this, though:
Brent, does PostgreSQL have analytical functions? That might be the way to display the numbers of databases according to file size. I’m guessing something like:
SELECT COUNT(file_size) OVER (ROWS BETWEEN … FROM TABLE_NAME …
DBAugie – that’s a great question! I don’t really wanna focus this blog on learning PostgreSQL though – for questions like that, I’d rather steer you toward the documentation or other sites.
Regarding the server version: I’m stuck on SQL 2016 Enterprise, since one of my predecessors decided NOT to buy Software Assurance, and there is no other upgrade option from Microsoft besides paying the equivalent of a nice house for completely new SQL 2019 Enterprise licenses and throwing the old 2016 licenses into the trash bin…
Thomas – well, the good news is that 2016 Enterprise was a pretty good version. You still have a lot of good years left in ‘er before you need to move on and find another job with better predecessors. 😉
I have some interesting experience related to Managed Instance in Azure.
There are 2 server tiers: “General Purpose” and “Business Critical”.
The difference is that for General Purpose, the database files live on separate storage hardware accessed over the network (HTTPS protocol), while on Business Critical, the database files live on the same server that hosts the instance (local files).
So Business Critical is much faster under heavy load, but it costs much more.
Vasyl – yep, that’s explained pretty clearly in the documentation here:
Microsoft is really up-front about where the data & log files are stored, but…folks just don’t read the documentation. Go figure.