Do You Have Tables In Your Tables?

This Isn’t A Trick Question

Hopefully it’ll get you thinking about your tables, and how they’re designed. One of the most consistent problems I see with clients is around wide tables.

I don’t mean data types, I mean the number of columns.

Going back to Michael Swart’s 10% Rule, if your tables have > 100 columns in them, you’re likely going to run into trouble.

What Makes Them Bad?

They’re nearly impossible to index efficiently:

  • Queries will hit them in many different ways
  • WHERE clauses will be unpredictable
  • SELECT lists will vary wildly

When indexes pile up to support all these different queries, locking and blocking will start to become larger issues.

While some of them can be solved with optimistic isolation levels, writer on writer conflicts are really tough to avoid.

First Sign Of Problems: Prefixed Columns

Do you have columns with similar prefixes?

Iffy Kid

If you have naming patterns like this, it’s time to look at splitting those columns out.

I took the Users and Posts tables from Stack Overflow and mangled them a bit to look like this.

You may not have tables with this explicit arrangement, but it could be implied all over the place.

One great way to tell is to look at your indexes. If certain groups of columns are always indexed together, or if there are lots of missing index requests for certain groups of columns, it may be time to look at splitting them out into different tables.

Second Sign Of Problems: Numbered Columns

Do you allow people multiple, optional values?

Maybe So.

The problems you’ll run into here will be searching across all of those.

You’ll end up with queries like this

Which can throw the optimizer a hard curve ball, and make indexing awkward.

This should also most likely be broken out into a table of its own that tracks the Post Id and Tag Id, along with a table that tracks the Ids of each Tag.

A wider index across a narrower table is typically less troublesome.

Third Sign Of Problems: Lists In Columns

Poor Tags

This should be obvious, and has a similar solution to the problem up there.

Your queries will end up doing something like this:

Which can’t be indexed terribly well, even if you go our of your mind with trigrams.

I See Tables Within Tables

If you have tables with these patterns, it’s time to take a really close look at them.

I was not here

If you’re totally lost on this, Check out Louis Davidson’s book on relational design.

Stuff like this is easy to sketch out, but often difficult to get fixed. It requires application changes, moving lots of data, and probably dropping indexes.

It’s totally worth it when you get it done though, because it makes your tables far easier to index and manage.

You’ll need far fewer insanely wide indexes to compensate for bad design, and you’ll have way less head scratcher missing indexe requests to sort through.

Thanks for reading!

Previous Post
Building SQL ConstantCare®: Updating ConstantCare.exe
Next Post
What Were Your Game-Changing Discoveries in SQL?

11 Comments. Leave new

  • emanuele.meazzo1
    August 28, 2018 10:03 am

    I once found a DB with 352 Columns in the “main table”.
    It still haunts me.

  • Chandram Venigalla
    August 28, 2018 11:52 am

    Thanks, Erik

  • Bagface Lymphnode
    August 28, 2018 12:02 pm

    We see a slight problem on your SQL Server. SPID 666. We want it killed. I did not say this…

  • Good info, thanks Erik. I also enjoyed Swart’s post.

  • As I sit here and type this, our DW guys tell me that it is totally OK to have a table with 1140 columns in it.

  • John G Hohengarten
    August 29, 2018 10:23 am

    The spice must flow™…
    (nice Dune image reference).

  • […] Erik Darling has a few tips to see if you’re not properly normalizing your tables: […]

  • Kevin Billings
    August 30, 2018 12:14 pm

    If all database architects understood this the world would be a better place.

  • There’s an application that’s heavily used at our shop with column names up to 45 characters. I won’t name the application (sounds like StayBehind) but the character names often aren’t unique until past the 33rd character. Our main ERP systems limit column names to 32 characters. So views have to be created for each table with this issue when it’s used to replicate data each way. That also means that with so many tables that qualify, all tables now involved have views created in the application. So with 25 databases (one for each region). it’s a lot of views. Did I mention the columns change and the tables have columns added fairly frequently?

    It’s roughly 1800 views at this point and increasing. Fortunately, upgrades don’t seem to wipe out the views, at least so far.

Menu
{"cart_token":"","hash":"","cart_data":""}