Research Paper Week: Automatic Indexing in Azure SQL DB

Before I give you the link to the next research paper in this week’s series, I wanna give you a few questions to think about:

  • If you had to build something to automatically add indexes to SQL Server databases, where would you start?
  • What data would you use to make your decisions?
  • How would you test your decisions without implementing them in production?
  • How would you apply your decisions?
  • What’s the worst that could happen if your decisions were wrong?
  • How would you find out if your decisions were wrong?
  • What might your users do that would break your hard work?

Now, think about the same points – but for automatically dropping indexes. What data might you base your decision on? What’s the worst that could happen? How would you find out if the worst had happened, and could you correct the situation?

Seriously, print out this blog post or copy/paste those questions into an email, send it to yourself, and just think about it when you’re standing in a line or riding the train. Don’t cheat by reading the white paper first – spend some serious spare time thinking about how you’d answer those questions.

This paper covers those answers.

After you’ve spent a day or two thinking about the answers, read this: (PDF) Automatically Indexing Millions of Databases in Microsoft Azure SQL Database by Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek R. Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri.

Automatically Indexing in Azure SQL DB

This easy-to-read 12-page paper is wonderfully candid about how Azure’s auto-indexing does what it does, the challenges it faces, how it’s succeeded, and times when it’s failed.

I don’t want to give away any spoilers here, but trust me when I say that if you do a lot of index tuning – especially graduates of my Mastering Index Tuning class – you’re going to recognize the challenges in this work, and you’re going to identify a lot with how Clippy runs into problems. Indexing is seriously complex, and there’s no magic answer. Azure SQL DB faces the same problems you do, and it’s making a really cool set of first steps to improve the situation – but with tools you might not expect.

I’ll talk spoilers in the comments as folks write in – feel free to leave spoilers in the comments, and avoid the comments before you’ve taken a day or two to think about your answers, and then read the white paper. This thought exercise really is worth the time. (I say this because I’ve thought a lot about these problems over the years!)

Previous Post
Research Paper Week: Query Execution in Column-Oriented Database Systems
Next Post
Research Paper Week: In-Memory Multi-Version Concurrency Control

5 Comments. Leave new

  • >> What’s the worst that could happen if your decisions were wrong?

    New index – frequent Deadlocks
    Remove Index – massive table scans leading to memory starvation, thread exhaustion, IO falling behind, long blocking chains >> application outage

    Reply
  • We did just that 10 years ago at DBSophic with Qure Optimizer.
    We didn’t rely on the optimizer’s ‘missing index’ recommendations which we found to be “Less than optimal” (politically correct phrasing).
    Instead, we collected a trace, parsed all queries, bounded them to the underlying objects and evaluated access patterns.
    Then we came up with a list of potential indexes, implemented them, and replayed the entire workload again against a copy of production.
    This requirement was probably the main downfall of the product…
    Again, we couldn’t rely on estimated plans as the pain is always when the optimizer gets it wrong.
    That’s why we executed each and every query from the workload, parsed the actual execution plan and recorded execution metrics several times.
    We recorded which of the new indexes were used for each query, and how it affected performance in all aspects.
    Afterwards, we did several iterations, removing ‘bad indexes’ and running the workload again.
    This was an exhaustive process, that on large database could take days to complete (another downfall of the product).
    What we didn’t have at our disposal, are millions of different workloads to learn from like Azure has today…

    Reply
    • Y’all were just ahead of your time! Today, you could do it with spot instance bidding on AWS at a dramatically reduced price. The cloud makes all kinds of neat business models possible.

      Reply
  • […] Research Paper Week: Automatic Indexing in Azure SQL DB […]

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.