T-SQL & Development

Do You Have Tables In Your Tables?

By Erik Darling · August 28, 2018 · 11 comments

This Isn’t A Trick Question

Hopefully it’ll get you thinking about your tables, and how they’re designed. One of the most consistent problems I see with clients is around wide tables.

I don’t mean data types, I mean the number of columns.

Going back to Michael Swart’s 10% Rule, if your tables have > 100 columns in them, you’re likely going to run into trouble.

What Makes Them Bad?

They’re nearly impossible to index efficiently:

Queries will hit them in many different ways
WHERE clauses will be unpredictable
SELECT lists will vary wildly

When indexes pile up to support all these different queries, locking and blocking will start to become larger issues.

While some of them can be solved with optimistic isolation levels, writer on writer conflicts are really tough to avoid.

First Sign Of Problems: Prefixed Columns

Do you have columns with similar prefixes?

If you have naming patterns like this, it’s time to look at splitting those columns out.

I took the Users and Posts tables from Stack Overflow and mangled them a bit to look like this.

You may not have tables with this explicit arrangement, but it could be implied all over the place.

One great way to tell is to look at your indexes. If certain groups of columns are always indexed together, or if there are lots of missing index requests for certain groups of columns, it may be time to look at splitting them out into different tables.

Second Sign Of Problems: Numbered Columns

Do you allow people multiple, optional values?

The problems you’ll run into here will be searching across all of those.

You’ll end up with queries like this

SELECT *
FROM dbo.Yuck
WHERE Tag1 = '...'
OR    Tag2 = '...'
OR    Tag3 = '...'
OR    Tag4 = '...'
OR    Tag5 = '...'

SELECT *

FROM dbo.Yuck

WHERE Tag1 = '...'

OR Tag2 = '...'

OR Tag3 = '...'

OR Tag4 = '...'

OR Tag5 = '...'

Which can throw the optimizer a hard curve ball, and make indexing awkward.

This should also most likely be broken out into a table of its own that tracks the Post Id and Tag Id, along with a table that tracks the Ids of each Tag.

A wider index across a narrower table is typically less troublesome.

Third Sign Of Problems: Lists In Columns

This should be obvious, and has a similar solution to the problem up there.

Your queries will end up doing something like this:

SELECT 'stuff'
FROM dbo.Posts AS p
WHERE p.Tags LIKE '%...%'

SELECT 'stuff'

FROM dbo.Posts AS p

WHERE p.Tags LIKE '%...%'

Which can’t be indexed terribly well, even if you go our of your mind with trigrams.

I See Tables Within Tables

If you have tables with these patterns, it’s time to take a really close look at them.

If you’re totally lost on this, Check out Louis Davidson’s book on relational design.

Stuff like this is easy to sketch out, but often difficult to get fixed. It requires application changes, moving lots of data, and probably dropping indexes.

It’s totally worth it when you get it done though, because it makes your tables far easier to index and manage.

You’ll need far fewer insanely wide indexes to compensate for bad design, and you’ll have way less head scratcher missing indexe requests to sort through.

Thanks for reading!

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

11 comments

emanuele.meazzo1 Student since 2017

August 28, 2018 at 10:03 am

I once found a DB with 352 Columns in the “main table”.
It still haunts me.

Reply
Chandram Venigalla VIP Student since 2017

August 28, 2018 at 11:52 am

Thanks, Erik

Reply
Bagface Lymphnode

August 28, 2018 at 12:02 pm

We see a slight problem on your SQL Server. SPID 666. We want it killed. I did not say this…

Reply
Robert Rice Student since 2019

August 29, 2018 at 5:47 am

Good info, thanks Erik. I also enjoyed Swart’s post.

Reply
Kevin

August 29, 2018 at 7:12 am

As I sit here and type this, our DW guys tell me that it is totally OK to have a table with 1140 columns in it.

Reply
1. Brent Ozar
  
  August 29, 2018 at 7:29 am
  
  I actually spit coffee on my monitor reading that. Nice way to start a morning. (Well, reading about it, not dealing with it.)
  
  I salute you, sir.
  
  Reply
  1. Kevin
    
    August 29, 2018 at 7:50 am
    
    Thanks… I am going to need all the help I can get.
    
    Reply
John G Hohengarten VIP Student since 2017

August 29, 2018 at 10:23 am

The spice must flow™…
(nice Dune image reference).

Reply
Normalization Code Smells – Curated SQL

August 30, 2018 at 7:05 am

[…] Erik Darling has a few tips to see if you’re not properly normalizing your tables: […]

Reply
Kevin Billings

August 30, 2018 at 12:14 pm

If all database architects understood this the world would be a better place.

Reply
Kevin

September 3, 2018 at 10:44 am

There’s an application that’s heavily used at our shop with column names up to 45 characters. I won’t name the application (sounds like StayBehind) but the character names often aren’t unique until past the 33rd character. Our main ERP systems limit column names to 32 characters. So views have to be created for each table with this issue when it’s used to replicate data each way. That also means that with so many tables that qualify, all tables now involved have views created in the application. So with 25 databases (one for each region). it’s a lot of views. Did I mention the columns change and the tables have columns added fairly frequently?

It’s roughly 1800 views at this point and increasing. Fortunately, upgrades don’t seem to wipe out the views, at least so far.

Reply