Rollback: What happens when you KILL a session?

SQL Server
53 Comments

It’s Friday afternoon. You, the DBA, are looking forward to a long weekend. You’re going to walk out the door in 30 minutes and enjoy some family time. The phone rings and you pick it up. On the other end, Joe Developer says, “There’s a query that’s been running for a long time. Can you kill it?”

You log into SSMS and use sp_who2 to confirm that the user has an open transaction. You issue the (infamous) KILL command.

Twenty-five minutes later, you’re still sitting there, waiting for the transaction to roll back. You’re sweating. You really want to leave for the weekend. Another ten minutes pass, and finally, it completes.

What happened? Why did SQL Server take so long to kill the process? What happens in a rollback situation?

What Happens in a Transaction

When a transaction is being committed, each step is written into the transaction log. Each value that is being inserted, updated, or deleted is recorded so that if there is a failure before the log is written to the data file, the appropriate rollback or roll forward can take place. The transaction may take and release locks on tables or indexes.

Want to see this in action? You can use the undocumented fn_dblog command to view the entries in the log.

What Happens in a Rollback

When we issue a KILL, every step of the transaction that has been written to the transaction log must be undone. The object the transaction was changing must be put back to the state it was in prior to the change. Was a record inserted? It needs to be deleted. Was a record updated? The previous value must be restored. Was a record deleted? It needs to be added back in. You can use fn_dblog to see this in action, too.

If SQL Server is simply undoing what was once done, why does it sometimes take longer?

If the transaction had to take a lock on a row or table to perform the operation, it will have to take that lock again – and other processes might be using that object now. Also, consider that a rollback is mostly single-threaded. If the transaction ran using four cores initially, and the rollback is now only using one, it will take more time.

Try going down backwards, hopping on one foot. I dare you!
Try going down backwards, hopping on one foot. I dare you!

Imagine this: you’ve decided to climb a tower with 10,000 stairs. You get to stair 9,999 and decide you don’t want to complete the climb. You can’t wish yourself to the bottom step – you have to go back down. But now, you’re tired – and have to do this single-threaded. Imagine having to hop down the stairs backwards, on one foot.

Tracking Progress

You are able to track the progress of a rollback. After you issue a KILL command, run KILL [Session ID] WITH STATUSONLY. This will show you approximately how much time is remaining in the rollback – but other processes in the database could affect that.. Consider the information here a guideline, not a hard-and-fast time.

Another thing to keep in mind: what you might not have known when you issued that KILL on Friday at 4:30 pm is that the transaction was going to finish at 4:32 pm. But, you had to wait until 5:05 for the rollback to complete. Always be cautious when using this command!

Solving the root cause

A one-time KILL issued to solve an “oops” may not need further investigation. But if you find yourself having to issue these commands for the same application, at the same time each week, it’s time to dig deeper. You want to work with the owners of the application or process to determine why this keeps happening, repeatedly. If it’s a long-standing process or transaction, what’s changed to make it take so long now? Perhaps you’re doing a mass insert or update, and could break it into chunks. If it’s a new process, look at what else is going on at that time. Could the statements be written more efficiently? Could it be scheduled at another day or time?

Previous Post
When Do You Fire the DBA?
Next Post
Finding One Problem Query With Extended Events

53 Comments. Leave new

  • Andrew Notarian
    March 18, 2014 7:57 am

    Why can’t SQL Server just jump down the middle of that spiral staircase so I can go home?

    Reply
    • Jes Schultz Borland
      March 18, 2014 8:05 am

      Ha! Because it would end up on crutches and in a walking boot, just like me this week! (True story. I hurt my foot while running last Friday.)

      Reply
  • I’ve noticed when I use KILL WITH STATUSONLY towards the end of the process I will see 0%. Sometimes that 0% will hang on for what feels like hours (sometimes literally). Any idea why?

    Reply
    • Andrew Notarian
      March 18, 2014 9:32 am

      This blog post jinxed me. I am staring a 0% 0 seconds rollback right now.

      Reply
      • Jes Schultz Borland
        March 18, 2014 10:07 am

        I’ll be honest, no, I don’t know what SQL Server is doing when it sits at 0%. You’ll need to do further investigation – how many processes are active? Is there blocking? Are there locks on objects the transaction needs yet?

        Reply
      • Shakti Singh
        June 13, 2016 2:32 am

        Hi Jes/Team,

        I would like to know, How a Kill statment rollback all the changes, I mean does it need to write more logs into the log files or need to change old logs (which are already logged into the log file).

        Reply
  • I wish I’d read this about 3 months ago. I was loading a fact table from a query that took 7 hours so I started it at 9 pm, figuring it would be done well before the 6am cutoff. I checked on it at 6 and it was still running so I killed it. The rollback took something like 18 hours and I was not a popular guy that day. I learned firsthand that optimization is still important even on a “1-time” query and that the concept of running a large import in multiple small batches is not just for paranoid folks.

    Reply
  • In fact it was last Friday mid-day when a developer was loading data into Dynamics AX – the SPID/process had seemed to have lost its mind, as it bloated the trans log to >100GB, filled that drive, etc. The data file doubled in size to ~1TB. I think the developer stopped their processing because I didn’t kill it, but that too started the awful rollback/recovery, at which time the database was obviously unavailable. It was fun watching 100GB of trans log statements roll back – took a good four hours (which I thought was rather quick). I typically watch the progress inside the SQL Server Logs but I’ll keep the statement in mind in the event it happens again one day. Thanks Jess!

    Reply
  • Good post Jes!
    Btw, have you ever encountered a session with “Killed/Rollback” status, sitting there for ever and the only way to get rid of it is to recycle SQL Services(IIRC, I encountered this with 2005 and 2008 in the past)? Any thoughts around this is much appreciated…

    Reply
    • I can hardly remember being on pre-2008 R2 versions so I don’t recall personally, but I have not had to do so with 2008 R2 and newer versions. Did you see progress on the rollback in your SQL Logs or was it just stuck at a certain percent for a really long time?

      Reply
  • Hi,

    can you use the start_time and percentage_complete columns of the DMV sys.dm_exec_requests for the SPID that’s issued the KILL command (BOL suggests so)?!

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    SELECT r.percent_complete
    , DATEDIFF(MINUTE, start_time, GETDATE()) AS Age
    , DATEADD(MINUTE, DATEDIFF(MINUTE, start_time, GETDATE()) /
    percent_complete * 100, start_time) AS EstimatedEndTime
    , t.Text AS ParentQuery
    , SUBSTRING (t.text,(r.statement_start_offset/2) + 1,
    ((CASE WHEN r.statement_end_offset = -1
    THEN LEN(CONVERT(NVARCHAR(MAX), t.text)) * 2
    ELSE r.statement_end_offset
    END – r.statement_start_offset)/2) + 1) AS IndividualQuery
    , start_time
    , DB_NAME(Database_Id) AS DatabaseName
    , Status
    FROM sys.dm_exec_requests r
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) t
    WHERE session_id > 50
    AND percent_complete > 0
    ORDER BY percent_complete DESC

    Thanks
    Ian

    Reply
  • Hi – What if we detach and attach the DB in case the roll back takes too long. Will it harm the database or cut-short the whole process?

    Reply
    • You cannot detach during a rollback. You can’t do much at all in fact – I just ran into a four hour rollback recently and you are pretty much going to sit and watch it until it’s done.

      Reply
      • Jes Schultz Borland
        January 30, 2015 3:28 pm

        I agree with Allen. You can’t detach a database when replication or mirroring is set up, or when database snapshots exist – so, when sessions exist. If a session is in rollback, it exists.

        Reply
        • I’ve seen people try to shut down the computer the instance is on as a way to short circuit a rollback but that not only risks corrupting your systems but the rollback is still going to happen as part of the start up process.

          Reply
          • Yeah, that essentially the same as restarting the instance, at which time it will continue with the rollback. There is some risk involved, but it doesn’t get you anywhere. Now, while I won’t promote this by any means, one person on #sqlhelp (and I trust him) said I could stop the instance and just move/remove the data/log files (I didn’t care if I lost it altogether – it wasn’t critical) and fire it up again. Unfortunately in my case I had an FCI and it just failed over 🙂 But, the stuck rollback did start going.

  • i am doing a import of huge oracle table.
    using “imp”
    it is taking huge time as the data is of millions…
    i had killed it.
    as everyone knows rollback started….
    and as this is also going long time…i abruptly closed the application.

    Now waht happens to my this rollback?
    will it occupy some space in terms of transaction logs?
    any action needed now?
    how to release any space it occupied?

    Reply
    • SQL Server will have to complete the rollback regardless, so that means time to sit back and relax while it does so. The transaction log is read from during rollbacks – it should not grow any larger due to the rollback. Once it’s complete, you can do transaction log maintenance if you want to right-size the log back to what it should be – that’s a whole other process.

      Reply
      • I might be little bit clear in my question.

        This is in Orcale DB and not SQL server.
        Also, i had already closed the session abruptly. So i am asking for cure not prevention.
        (issue already happend)
        I think the tranasaction log is really huge in this case…
        So as rollback already killed.
        what happens to the log?
        who need to clean that log?
        oracle automatically look after it?
        or DBA has to do something?

        Reply
        • Kendra Little
          March 16, 2015 9:37 am

          Hi Santhosh,

          This post is on SQL Server and most of its readers are SQL Server people. Have you tried asking on a site like dba.stackexchange.com? Make as much of the issue as clear as possible in the header, and put that it’s oracle in the title.

          Kendra

          Reply
  • Hi Jes,

    I have a server in which there are multiple DBAs (coz it is a shared environment).
    Someone has killed some transactions and I can see the logs in the SQLErrorlog.
    For Instance:
    Process ID 241 was killed by hostname , host process ID 51292.

    Is there any way to find who killed the statement and what was the statement ?

    If we cannot find out past data, is there any mechanism whereby we can set up an audit and track the kill commands for the future??

    Please help!

    Reply
  • Chaitanya Gadiraju
    September 16, 2016 6:13 am

    Hi,

    My situation is too different. I also faced same problem.

    transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.

    My situation was, one User executed update query, that table having 70 Lakh rows and after that he requested to kill that process. so killed.
    That bad user has created that table is in master database (production server— total db’s size 700 GB) and even We do not have backups also due to disk space.

    I do not know what i have to do as i am only one sql dba.
    If i restarted the server, it may impact.

    So,Finally i decided to restart the services with fear. But before that i started master database backup.
    after started backup, That rollback was cleared automatically.

    Thank god.

    Reply
    • There are times where the rollback gets stuck on stupid and you have to restart the instance. I’ve had this happen at least three times. When you bring the instance back up, the DB sits in a “recovering” state while the rollback actually happens.

      Reply
  • Having similar issue. On a UAT DB (thankfully) developer run a massive delete 10s millions rows of a audit table. Run for 14 Hours then just decided to kill as not prepared to wait further, So roll back kicked in. Been running for over 1 day now. Estimate completion is 0% and estimated time is 0 seconds yet this is doing something. Monitoring row count in audit table using sys.partitions shows constant row count growing and I’ve disable any triggers populating audit table. I’ve tried to find out if there is a bug in the Estimated status but find no reference to it. Log is 145GB and looks to be doing only 10m rows in 5/6 hours. Could be here a while. Especially as red rollbacks are typically single threaded compare to the delete which could have been multithreaded. Anyone else experience this.

    Am reluctant to restart as don’t want DB to come up in recovery as we can work around audit table. I’ve seen the rollback hang before in another instance but could see it was doing nothing – no CPU, IO, reds and writes increase in sp_whoisactive – where this one is…

    Reply
    • Paul, I had this happen yesterday with a fairly large rollback – it sat at 0% and all of the sudden it completed. Sometimes letting it go for a while is fine – the estimates are not real accurate at times.

      Reply
      • Hi Allen – Sounds good – its not urgent as UAT and time of year – so just going to let this run, as long as can see its still doing something. Will be good to see how long it actually takes to roll back…

        Reply
        • What I watched was the data usage w/in the data file – it should be filling up again. But yeah, be patient because if you fail over or restart, the DB will be “in recovery” and in some cases that stinks sitting and waiting forever w/ your DB now unavailable.

          Reply
  • Guys ,
    I faced similar problem but the process I killed was using the cursor, for Eg. Cursor getting values from Table A and then running delete on Table B with “where” clause from Cursor values . Running in batch of 10000 rows with begin and Commit transaction, there are more than 20m rows to be delete but I killed process ID after 100 batches, meaning it deleted around 10m rows.
    My understanding was SQL will roll back only the last uncommitted batch (10k rows) but I noticed it took a while and at the end it did rolled back everything from beginning.
    Any inputs much appreciated.

    Reply
  • I am pretty sure that the SQL Server rollback ‘code’ was designed and implemented by a contractor who simply wanted the benefit of lots of extra ‘contract hours’ for himself (and implicitly for all the other contractors worldwide). The solution to all the zillions of hours lost whilst waiting for SQL Server ‘to get its sh*t together’ is to employ your DBA as a permie or get Gates to refund our lost/wasted lives out of his £trillions??? That way you can be sure if a 0% 0hours rollback happens it will SURELY be by accident as your DBA won’t want to stare at a screen for yonks instead of being with his/her family….

    Just a thought!

    Reply
  • EXCELENT POST!!! Thanks!

    Reply
  • Your synopsis was exactly my scenario today.
    Thanks for the post and for the KILL [Session ID] WITH STATUSONLY.
    Awesome!

    Reply
  • […] for over 30 minutes… and then they cancelled it! But wait, it’s stuck in rollback. Yes, kids, rolling-back a transaction can take even longer than the transaction itself, due to rollbacks […]

    Reply
  • Tanmay Dhamnekar
    September 14, 2020 10:11 am

    During the Killed/Rollback statement, will it cause block to other access the database.
    I have query which running from 12 hrs and now I need to cancel it. I am in worry that, if Killed/Rollback statement will block to other processes as well during it overall completion.

    Reply
  • Great article as usual, thank you Brent. Is there a possibility for data loss when killing a spid?

    Reply
  • Question, we had a really bad query get on our system and the ETL process doing and later table switch caused it to be killed. It is pulling data from our DW database but is all temp tables and only selects from the user database… So even though it appears to be tempdb it still goes into a rollback. We eventually chose to restart SQL and it recovered fine. Since there was no updating to the “real” database, why does it need to try and rollback?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.