It’s Friday afternoon. You, the DBA, are looking forward to a long weekend. You’re going to walk out the door in 30 minutes and enjoy some family time. The phone rings and you pick it up. On the other end, Joe Developer says, “There’s a query that’s been running for a long time. Can you kill it?”
You log into SSMS and use sp_who2 to confirm that the user has an open transaction. You issue the (infamous) KILL command.
Twenty-five minutes later, you’re still sitting there, waiting for the transaction to roll back. You’re sweating. You really want to leave for the weekend. Another ten minutes pass, and finally, it completes.
What happened? Why did SQL Server take so long to kill the process? What happens in a rollback situation?
What Happens in a Transaction
When a transaction is being committed, each step is written into the transaction log. Each value that is being inserted, updated, or deleted is recorded so that if there is a failure before the log is written to the data file, the appropriate rollback or roll forward can take place. The transaction may take and release locks on tables or indexes.
Want to see this in action? You can use the undocumented fn_dblog command to view the entries in the log.
What Happens in a Rollback
When we issue a KILL, every step of the transaction that has been written to the transaction log must be undone. The object the transaction was changing must be put back to the state it was in prior to the change. Was a record inserted? It needs to be deleted. Was a record updated? The previous value must be restored. Was a record deleted? It needs to be added back in. You can use fn_dblog to see this in action, too.
If SQL Server is simply undoing what was once done, why does it sometimes take longer?
If the transaction had to take a lock on a row or table to perform the operation, it will have to take that lock again – and other processes might be using that object now. Also, consider that a rollback is mostly single-threaded. If the transaction ran using four cores initially, and the rollback is now only using one, it will take more time.
Imagine this: you’ve decided to climb a tower with 10,000 stairs. You get to stair 9,999 and decide you don’t want to complete the climb. You can’t wish yourself to the bottom step – you have to go back down. But now, you’re tired – and have to do this single-threaded. Imagine having to hop down the stairs backwards, on one foot.
You are able to track the progress of a rollback. After you issue a KILL command, run KILL [Session ID] WITH STATUSONLY. This will show you approximately how much time is remaining in the rollback – but other processes in the database could affect that.. Consider the information here a guideline, not a hard-and-fast time.
Another thing to keep in mind: what you might not have known when you issued that KILL on Friday at 4:30 pm is that the transaction was going to finish at 4:32 pm. But, you had to wait until 5:05 for the rollback to complete. Always be cautious when using this command!
Solving the root cause
A one-time KILL issued to solve an “oops” may not need further investigation. But if you find yourself having to issue these commands for the same application, at the same time each week, it’s time to dig deeper. You want to work with the owners of the application or process to determine why this keeps happening, repeatedly. If it’s a long-standing process or transaction, what’s changed to make it take so long now? Perhaps you’re doing a mass insert or update, and could break it into chunks. If it’s a new process, look at what else is going on at that time. Could the statements be written more efficiently? Could it be scheduled at another day or time?