Blog

It’s Friday afternoon. You, the DBA, are looking forward to a long weekend. You’re going to walk out the door in 30 minutes and enjoy some family time. The phone rings and you pick it up. On the other end, Joe Developer says, “There’s a query that’s been running for a long time. Can you kill it?”

You log into SSMS and use sp_who2 to confirm that the user has an open transaction. You issue the (infamous) KILL command.

Twenty-five minutes later, you’re still sitting there, waiting for the transaction to roll back. You’re sweating. You really want to leave for the weekend. Another ten minutes pass, and finally, it completes.

What happened? Why did SQL Server take so long to kill the process? What happens in a rollback situation?

What Happens in a Transaction

When a transaction is being committed, each step is written into the transaction log. Each value that is being inserted, updated, or deleted is recorded so that if there is a failure before the log is written to the data file, the appropriate rollback or roll forward can take place. The transaction may take and release locks on tables or indexes.

Want to see this in action? You can use the undocumented fn_dblog command to view the entries in the log.

What Happens in a Rollback

When we issue a KILL, every step of the transaction that has been written to the transaction log must be undone. The object the transaction was changing must be put back to the state it was in prior to the change. Was a record inserted? It needs to be deleted. Was a record updated? The previous value must be restored. Was a record deleted? It needs to be added back in. You can use fn_dblog to see this in action, too.

If SQL Server is simply undoing what was once done, why does it sometimes take longer?

If the transaction had to take a lock on a row or table to perform the operation, it will have to take that lock again – and other processes might be using that object now. Also, consider that a rollback is mostly single-threaded. If the transaction ran using four cores initially, and the rollback is now only using one, it will take more time.

Try going down backwards, hopping on one foot. I dare you!

Try going down backwards, hopping on one foot. I dare you!

Imagine this: you’ve decided to climb a tower with 10,000 stairs. You get to stair 9,999 and decide you don’t want to complete the climb. You can’t wish yourself to the bottom step – you have to go back down. But now, you’re tired – and have to do this single-threaded. Imagine having to hop down the stairs backwards, on one foot.

Tracking Progress

You are able to track the progress of a rollback. After you issue a KILL command, run KILL [Session ID] WITH STATUSONLY. This will show you approximately how much time is remaining in the rollback – but other processes in the database could affect that.. Consider the information here a guideline, not a hard-and-fast time.

Another thing to keep in mind: what you might not have known when you issued that KILL on Friday at 4:30 pm is that the transaction was going to finish at 4:32 pm. But, you had to wait until 5:05 for the rollback to complete. Always be cautious when using this command!

Solving the root cause

A one-time KILL issued to solve an “oops” may not need further investigation. But if you find yourself having to issue these commands for the same application, at the same time each week, it’s time to dig deeper. You want to work with the owners of the application or process to determine why this keeps happening, repeatedly. If it’s a long-standing process or transaction, what’s changed to make it take so long now? Perhaps you’re doing a mass insert or update, and could break it into chunks. If it’s a new process, look at what else is going on at that time. Could the statements be written more efficiently? Could it be scheduled at another day or time?

↑ Back to top
  1. Why can’t SQL Server just jump down the middle of that spiral staircase so I can go home?

    • Ha! Because it would end up on crutches and in a walking boot, just like me this week! (True story. I hurt my foot while running last Friday.)

  2. I’ve noticed when I use KILL WITH STATUSONLY towards the end of the process I will see 0%. Sometimes that 0% will hang on for what feels like hours (sometimes literally). Any idea why?

    • This blog post jinxed me. I am staring a 0% 0 seconds rollback right now.

      • I’ll be honest, no, I don’t know what SQL Server is doing when it sits at 0%. You’ll need to do further investigation – how many processes are active? Is there blocking? Are there locks on objects the transaction needs yet?

  3. I wish I’d read this about 3 months ago. I was loading a fact table from a query that took 7 hours so I started it at 9 pm, figuring it would be done well before the 6am cutoff. I checked on it at 6 and it was still running so I killed it. The rollback took something like 18 hours and I was not a popular guy that day. I learned firsthand that optimization is still important even on a “1-time” query and that the concept of running a large import in multiple small batches is not just for paranoid folks.

  4. In fact it was last Friday mid-day when a developer was loading data into Dynamics AX – the SPID/process had seemed to have lost its mind, as it bloated the trans log to >100GB, filled that drive, etc. The data file doubled in size to ~1TB. I think the developer stopped their processing because I didn’t kill it, but that too started the awful rollback/recovery, at which time the database was obviously unavailable. It was fun watching 100GB of trans log statements roll back – took a good four hours (which I thought was rather quick). I typically watch the progress inside the SQL Server Logs but I’ll keep the statement in mind in the event it happens again one day. Thanks Jess!

  5. Good post Jes!
    Btw, have you ever encountered a session with “Killed/Rollback” status, sitting there for ever and the only way to get rid of it is to recycle SQL Services(IIRC, I encountered this with 2005 and 2008 in the past)? Any thoughts around this is much appreciated…

    • I can hardly remember being on pre-2008 R2 versions so I don’t recall personally, but I have not had to do so with 2008 R2 and newer versions. Did you see progress on the rollback in your SQL Logs or was it just stuck at a certain percent for a really long time?

  6. Hi,

    can you use the start_time and percentage_complete columns of the DMV sys.dm_exec_requests for the SPID that’s issued the KILL command (BOL suggests so)?!

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    SELECT r.percent_complete
    , DATEDIFF(MINUTE, start_time, GETDATE()) AS Age
    , DATEADD(MINUTE, DATEDIFF(MINUTE, start_time, GETDATE()) /
    percent_complete * 100, start_time) AS EstimatedEndTime
    , t.Text AS ParentQuery
    , SUBSTRING (t.text,(r.statement_start_offset/2) + 1,
    ((CASE WHEN r.statement_end_offset = -1
    THEN LEN(CONVERT(NVARCHAR(MAX), t.text)) * 2
    ELSE r.statement_end_offset
    END – r.statement_start_offset)/2) + 1) AS IndividualQuery
    , start_time
    , DB_NAME(Database_Id) AS DatabaseName
    , Status
    FROM sys.dm_exec_requests r
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) t
    WHERE session_id > 50
    AND percent_complete > 0
    ORDER BY percent_complete DESC

    Thanks
    Ian

  7. Pingback: (SFTW) SQL Server Links 21/03/14 • John Sansom

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php