Research Paper Week: Constant Time Recovery in Azure SQL DB

Let’s finish up Research Paper Week with something we’re all going to need to read over the next year or two. I know, it says Azure SQL DB, but you boxed-product folks will be interested in this one too: Constant Time Recovery in Azure SQL DB by Panagiotis Antonopoulos, Peter Byrne, Wayne Chen, Cristian Diaconu, Raghavendra Thallam Kodandaramaih, Hanuma Kodavalla, Prashanth Purnananda, Adrian-Leonard Radu, Chaitanya Sreenivas Ravella, and Girish Mittur Venkataramanappa (2019).

This one covers a new feature for both Azure SQL DB and SQL Server that:

  • Stores the version store in the user database file(s) rather than TempDB, which
  • Facilitates near-instant transaction rollbacks, even for huge transactions
  • Lets you clear the transaction log much faster, even when transactions are open

Those of you who are familiar with the magic of Read Committed Snapshot Isolation (RCSI) and its use of the version store in TempDB are probably slapping your heads right about now going, “Ah, it’s obvious!” You’re going to be able to hit the paper running – no need to keep reading.

For the rest of you, I’ll try to condense the relevant parts of Kendra’s intro to RCSI here in a few sentences. When you enable RCSI or SI, SQL Server automatically starts storing versions of rows in TempDB. When you modify a row (and therefore hold a lock on it), SQL Server stores the original unmodified row over in TempDB. That means if someone wants to read (not write) the row you currently have locked, SQL Server can simply hand them the unmodified copy out of TempDB and skirt around your lock. Writers don’t block readers, which is what makes RCSI so massively popular for fixing blocking issues.
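If you want to try that on a test box, here’s a minimal sketch, assuming a hypothetical database named YourDatabase, of turning RCSI on and confirming the setting:

```sql
-- Minimal sketch, assuming a hypothetical database named YourDatabase:
-- turn on Read Committed Snapshot Isolation so readers are handed row
-- versions from the version store instead of waiting on writers' locks.
-- Note: this change waits until it can get exclusive access to the database.
ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON;

-- Confirm the setting took effect.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'YourDatabase';
```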

The downside of RCSI is more work happening in TempDB, which causes two problems: slower TempDB access and larger TempDB files. For years, folks have asked for the ability to use a separate TempDB per user database to work around problems like this, but Microsoft went in a totally different direction and decided to let you store the version store in the user database instead.
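To get a rough idea of how much of your TempDB the version store is eating today, here’s a sketch using sys.dm_tran_version_store_space_usage (available in SQL Server 2016 and later):

```sql
-- Sketch: how much TempDB space each database's row versions are using.
-- Requires SQL Server 2016 or later.
SELECT  DB_NAME(database_id) AS database_name,
        reserved_page_count,
        reserved_space_kb
FROM    sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;
```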

This is way better than separate TempDBs (for this purpose) because having the original rows inside the user database opens up new possibilities for faster transaction rollback and quicker transaction log truncation, as the authors go into in the paper.
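If you’re curious whether long-running transactions are what’s holding your log hostage today – the scenario that faster log truncation is aimed at – here’s a quick sketch, again assuming a hypothetical database named YourDatabase:

```sql
-- Sketch: check what's currently preventing log truncation.
-- ACTIVE_TRANSACTION is the case that constant time recovery's
-- more aggressive log truncation is meant to help with.
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';
```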

If you’re considering enabling Constant Time Recovery in SQL Server 2019 (when it ships), you should read the paper to understand the risks and drawbacks of a feature like this. This is the kind of feature that will put the cutting in cutting edge: I would fully expect to see a lot of rollback, concurrency, and backup/restore bugs fixed in the early Cumulative Updates for 2019. I’m not saying it’s a bad feature – it certainly looks cool – but there are lots of risks when you make a change this big.
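For the record, the feature shows up in the SQL Server 2019 previews under the name Accelerated Database Recovery. Here’s a sketch, assuming the preview syntax sticks around for RTM:

```sql
-- Sketch, assuming the SQL Server 2019 preview syntax (the paper's
-- constant time recovery surfaces there as Accelerated Database Recovery).
ALTER DATABASE YourDatabase
    SET ACCELERATED_DATABASE_RECOVERY = ON;

-- The persistent version store now lives inside the user database;
-- this DMV reports how large it has grown.
SELECT  DB_NAME(database_id) AS database_name,
        persistent_version_store_size_kb
FROM    sys.dm_tran_persistent_version_store_stats;
```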

If you enjoy this paper, you’ll probably also enjoy Socrates: The New SQL Server in the Cloud. It explains how Azure SQL DB Hyperscale works, and boy, does it have intriguing new ways of handling data and log files.

I hope you enjoyed Research Paper Week – I certainly did – and I want to stop for a moment to thank all those academics and research professionals out there who’ve done such a fantastic job putting very complex thoughts down into easier-to-understand words for the rest of us. These papers involve hundreds or thousands of hours’ worth of research, action, writing, and testing, and they’re all available to you for free. How awesome is that?


10 Comments

  • Mitchell Wheat
    August 24, 2019 4:41 pm

    File format is BibTeX? What happened to .pdf? Love the fact that it’s MS Research and the latest Windows 10 isn’t able to open a BibTeX file by default. Why is that something MS does over and over?

  • Johannes Vink
    August 25, 2019 12:10 pm

    Interesting read about Socrates! I am not sure if that research is of “normal” quality, but I had to read it twice to find how they tested and what the specifications were. They tested two configs, 8 and 16 cores. But the result tables show… an average? A median? Only one run? Nothing about scalability was tested? Since it is one of the only official sources on the architecture of Hyperscale it was interesting, but the test results were… mediocre. On par with Azure SQL Database. Whoops. Admittedly, it looked like the preview specs, but still.

    I also have not found a metric on the scalability of the storage. For example, for SQL DB it is 3/6 MB/s ingestion (Gen4/Gen5) per core. But with Hyperscale the compute and storage are more separated… only the statement that it can scale up to 100 MB/s. Which my USB thumb drive is also capable of.

    • That one’s less research, and more just a paper. I wouldn’t focus as much on the test numbers – in the day of the cloud, that stuff can (and does) change at the drop of a hat (both for better and for worse) at any time with no warning.

      • Johannes Vink
        August 26, 2019 5:21 am

        That is what I thought indeed.

        The numbers are relevant to me. I am trying to figure out what kind of tier/spec we should choose. Which is a challenge, as there are not many actual figures published.

        It all falls in the category: test it yourself. I’ve got some interesting figures by now. Not published though. Yet.

    • I think it’s pretty amusing MS is even bothering to do performance research like this when their database-as-a-service performance is so abominable that you could beat it running SQL with very bad settings and indexing on ten-year-old desktop hardware and spinning disks.

      • woops, commented on the wrong blog

      • Johannes Vink
        August 26, 2019 10:15 pm

        Assuming you run on anything else besides Azure VMs. The throughput of an Azure VM is heavily limited. A specialized database VM, the L8s v2 (8 cores, 64 GB memory), has 160 MB/s of throughput. And that is for log, data, and writes/reads together. Ah, you say: this is for NVMe local drives, not for MS SQL databases? Well… other VMs are even worse.
