This Isn’t A Trick Question
Hopefully it’ll get you thinking about your tables, and how they’re designed. One of the most consistent problems I see with clients is around wide tables.
I don’t mean data types, I mean the number of columns.
Going back to Michael Swart’s 10% Rule, if your tables have > 100 columns in them, you’re likely going to run into trouble.
What Makes Them Bad?
They’re nearly impossible to index efficiently:
- Queries will hit them in many different ways
- WHERE clauses will be unpredictable
- SELECT lists will vary wildly
When indexes pile up to support all these different queries, locking and blocking will start to become larger issues.
While some of them can be solved with optimistic isolation levels, writer on writer conflicts are really tough to avoid.
First Sign Of Problems: Prefixed Columns
Do you have columns with similar prefixes?
If you have naming patterns like this, it’s time to look at splitting those columns out.
I took the Users and Posts tables from Stack Overflow and mangled them a bit to look like this.
You may not have tables with this explicit arrangement, but it could be implied all over the place.
One great way to tell is to look at your indexes. If certain groups of columns are always indexed together, or if there are lots of missing index requests for certain groups of columns, it may be time to look at splitting them out into different tables.
Second Sign Of Problems: Numbered Columns
Do you allow people multiple, optional values?
The problems you’ll run into here will be searching across all of those.
You’ll end up with queries like this
WHERE Tag1 = '...'
OR Tag2 = '...'
OR Tag3 = '...'
OR Tag4 = '...'
OR Tag5 = '...'
Which can throw the optimizer a hard curve ball, and make indexing awkward.
This should also most likely be broken out into a table of its own that tracks the Post Id and Tag Id, along with a table that tracks the Ids of each Tag.
A wider index across a narrower table is typically less troublesome.
Third Sign Of Problems: Lists In Columns
This should be obvious, and has a similar solution to the problem up there.
Your queries will end up doing something like this:
FROM dbo.Posts AS p
WHERE p.Tags LIKE '%...%'
Which can’t be indexed terribly well, even if you go our of your mind with trigrams.
I See Tables Within Tables
If you have tables with these patterns, it’s time to take a really close look at them.
If you’re totally lost on this, Check out Louis Davidson’s book on relational design.
Stuff like this is easy to sketch out, but often difficult to get fixed. It requires application changes, moving lots of data, and probably dropping indexes.
It’s totally worth it when you get it done though, because it makes your tables far easier to index and manage.
You’ll need far fewer insanely wide indexes to compensate for bad design, and you’ll have way less head scratcher missing indexe requests to sort through.
Thanks for reading!
I once found a DB with 352 Columns in the “main table”.
It still haunts me.
We see a slight problem on your SQL Server. SPID 666. We want it killed. I did not say this…
Good info, thanks Erik. I also enjoyed Swart’s post.
As I sit here and type this, our DW guys tell me that it is totally OK to have a table with 1140 columns in it.
I actually spit coffee on my monitor reading that. Nice way to start a morning. (Well, reading about it, not dealing with it.)
I salute you, sir.
Thanks… I am going to need all the help I can get.
The spice must flow™…
(nice Dune image reference).
[…] Erik Darling has a few tips to see if you’re not properly normalizing your tables: […]
If all database architects understood this the world would be a better place.
There’s an application that’s heavily used at our shop with column names up to 45 characters. I won’t name the application (sounds like StayBehind) but the character names often aren’t unique until past the 33rd character. Our main ERP systems limit column names to 32 characters. So views have to be created for each table with this issue when it’s used to replicate data each way. That also means that with so many tables that qualify, all tables now involved have views created in the application. So with 25 databases (one for each region). it’s a lot of views. Did I mention the columns change and the tables have columns added fairly frequently?
It’s roughly 1800 views at this point and increasing. Fortunately, upgrades don’t seem to wipe out the views, at least so far.