Let’s Build a Better KILL. What Do We Need to Consider?
When there’s a performance emergency, a lot of us perform the same manual tasks over and over. We run sp_BlitzWho or sp_WhoIsActive, look for blocking, long-running or out-of-control queries, and kill them.
I’m going to build a new sp_kill for the First Responder Kit to make that easier, faster, and safer, and I want your help with what I should consider as part of the design. Read through my ideas below, and then leave your thoughts in the comments. Feel free to chime in on other people’s thoughts and ideas as well.
Who is sp_kill for?
This stored proc IS targeted at people who would otherwise just restart the whole server.
This stored proc is NOT targeted at DBAs who want to pull on the latex gloves to do careful analysis first.
This is a little tricky because a lot of you are the latex gloves type. As you read through this, I want you to think about being the only DBA caretaker at a small company. You’ve gone on vacation in Mexico, unreachable by your peers, and they’re facing a performance emergency that’s rendered the entire company inoperative. You want to give them the safest, easiest tool that might help alleviate the problems. It’s going to be their last-ditch effort before they restart the server out of desperation.
What problems might they be facing?
A read-only query got a bad query plan. Someone’s running a read-only SELECT, not a SELECT INTO, and it’s not part of a larger transaction (like something that might have been modifying data earlier). It’s been running for more than X seconds (or minutes, which should be a parameter) and shows no signs of finishing anytime soon. It’s using a lot of resources on the server (like high CPU, logical reads, memory grant, or TempDB allocations), and we want to kill it to free up those resources.
Someone left a transaction open. The person or app started a transaction, but their session is sleeping, and it hasn’t sent in a command in the last X seconds (or minutes, which should be a parameter). Maybe it was a human in SSMS who forgot to write COMMIT or ROLLBACK, or maybe they started running a transaction and then locked their workstation to go to lunch. Maybe it was a poorly written app that started a transaction but then crashed in a frozen state. We should examine the query’s writes and locks to see how widespread they are, trying to figure out how bad it’s going to be if we kill their transaction and roll it back.
A query might already be rolling back. And if it’s been going on for longer than, say, 30 seconds, we might not wanna take any action at all. We might just wanna say, sorry, we can’t perform any more kills right now, because there’s already something rolling back and we’re concerned about the server’s situation.
A job is running at an unusual time. Perhaps a nightly index maintenance or CHECKDB job kept going into business hours, or perhaps someone’s manually running a reporting Agent job in the middle of the day.
A user is doing something they shouldn’t. Perhaps someone’s running a query in SSMS.
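Most of these conditions can be spotted in the DMVs. As a rough sketch of the open-transaction case (the 60-second threshold and column list are illustrative placeholders, not the final design):

```sql
/* Sketch: detect sleeping user sessions holding open transactions.
   The 60-second threshold stands in for the X-seconds parameter. */
SELECT s.session_id,
       s.login_name,
       s.host_name,
       s.program_name,
       s.last_request_end_time,
       s.open_transaction_count
FROM sys.dm_exec_sessions AS s
WHERE s.is_user_process = 1
  AND s.status = 'sleeping'
  AND s.open_transaction_count > 0
  AND s.last_request_end_time < DATEADD(SECOND, -60, GETDATE());
```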
What do we want to happen?
- Log who’s running sp_kill, and when
- Log all of the running queries, their metrics, and query plans – so we can troubleshoot in more detail later
- Suggest which query to kill (and optionally, depending on the parameters, kill it for us)
- Log which query was killed, at what time
It should take these parameters:
- SPID (optional) – if we already know which SPID we want to kill (we’re going to log everything anyway)
- LoginName, AppName, DatabaseName, HostName (all optional) – if we want to kill all queries from a specific login or database, like a troublesome user
- LeadBlockers (yes/no, optional) – if we just want to kill all lead blockers
- ExecuteKills (yes/no, default no) – if no, we’re just going to log everything and return a table with a list of kill commands for the user to manually execute. If yes, we’re going to perform the actual kills.
- OrderBy (duration, reads, writes, probably default duration, optional) – if you want to kill queries one by one until performance gets back to normal. Really only useful for ExecuteKills = no, for folks doing the killing manually.
- OutputDatabaseName, OutputSchemaName, OutputTableName (optional) – for logging what was killed, by whom (the caller), when, etc
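Putting the parameters above together, a call might look something like this (parameter names and values are hypothetical, since the design isn’t final):

```sql
/* Hypothetical usage of the proposed parameters - names not final. */
EXEC dbo.sp_kill
    @LoginName          = 'ReportUser',
    @DatabaseName       = 'Sales',
    @LeadBlockers       = 'no',
    @ExecuteKills       = 'no',   /* default: just log and list KILL commands */
    @OrderBy            = 'duration',
    @OutputDatabaseName = 'DBAtools',
    @OutputSchemaName   = 'dbo',
    @OutputTableName    = 'KillLog';
```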
These parameters were suggested by Vlad Drumea:
- SPIDState – S = only kill sleeping SPIDs, R = only kill running SPIDs, empty string = kill SPIDs regardless of state
- OmitLogin – Kill all SPIDs except ones belonging to the login name specified here, empty string = omit none
- HasOpenTran – If set to Y will target sessions with open transactions, can be combined with @SPIDState = ‘S’ to target sleeping sessions with opened transactions that might be caused by SET IMPLICIT_TRANSACTIONS ON. empty string (default) = 0 open transactions
- ReqOlderThanMin – Kill SPIDs whose last request start time is older than or equal to the value specified (in minutes)
Ideally, the queries should be extremely lightweight – single-threaded, with capped memory grants via query hints – to make it as likely as possible that they can run even without using the DAC.
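For example, the proc’s internal diagnostic queries could cap their own footprint with hints like these (a sketch of the idea, not the final queries; MAX_GRANT_PERCENT requires SQL Server 2012 SP3 or later):

```sql
/* Keep sp_kill's own diagnostic queries lightweight:
   single-threaded, with a tiny capped memory grant. */
SELECT r.session_id, r.start_time, r.status, r.command, r.blocking_session_id
FROM sys.dm_exec_requests AS r
OPTION (MAXDOP 1, MAX_GRANT_PERCENT = 1);
```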
So, what other things should we think about as part of the design?
61 Comments
Definitely a what-if, especially if there is going to be a massive rollback. I would be amused if there was analysis that could be done to say “you know what, you’d be better off bouncing the server and having crash recovery, because you’ve got 200GB of log to undo”.
I’d also like it to “keep running” and give me updates on progress.
Oh, and the cherry on top: send an email to the offending user with their P45 if it’s the 3rd time they’ve run a query you’ve had to kill.
Hahaha, the email.
Continuing to run is interesting – I think it might make more sense to call sp_BlitzFirst in that scenario instead, since it’s designed to give ongoing performance checks (like warnings about queries rolling back, numbers of blocked queries, threadpool issues, etc.) I don’t currently have an “ongoing” mode for sp_BlitzFirst, and I think that would make pretty good sense! I like it.
This is a neat idea!
I wrote something fairly basic a while back for mass killing of SPIDs, and, if it helps, I’ll add options for it here (I’ve excluded the ones you already cover):
SPIDState – S = only kill sleeping SPIDs, R = only kill running SPIDs, empty string = kill SPIDs regardless of state
OmitLogin – Kill all SPIDs except ones belonging to the login name specified here, empty string = omit none
ForDatabase – Kill only SPIDs hitting this database, empty string = all databases
HasOpenTran – If set to Y will target sessions with open transactions, can be combined with @SPIDState = ‘S’
to target sleeping sessions with opened transactions that might be caused by SET IMPLICIT_TRANSACTIONS ON.
empty string (default) = 0 open transactions
ReqOlderThanMin – Kill SPIDs whose last request start time is older than or equal to the value specified (in minutes)
Ooo, those are fantastic parameters! Good call!
Including the database name as a parameter could be really useful. In the past, I’ve run into situations where a database was stuck in single-user mode, and I couldn’t grab the single connection without shutting down all applications. It hasn’t happened recently, but having the ability to kill all connections for a specific database would definitely help in those scenarios.
Yes, that’ll definitely be a part of it! Optional filters for database name, app name, server name, login name, etc.
Default to no rollback – make it an explicit parameter: if whatever is being killed has writes, raise an exception that requires the admin to opt in to the rollback. It may get weird if there are spills, but that would require extra thought after the Python developer who just thrashed the 30GB production database with 300GB of transactions.
I hear you, but if you’re calling something that kills queries, you should expect rollbacks. (Same with the original KILL command.)
Maybe include the TSQL to perform the kill, in the output table. This would be useful for manual execution when ExecuteKills=No, and great for auditing when ExecuteKills=Yes.
You know, I actually didn’t think about a table result – I was just going to dump them to the messages tab – but I think an output table makes more sense because I can imagine this being called from something. Good call.
The native “KILL WITH STATUSONLY” command does not seem to work most of the time. It would be awesome to have that one included, so we can track how long a rollback will take. Once again, gracias Brent!!
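For reference, the commands in question look like this (the session ID is a placeholder; STATUSONLY only reports progress while that session is actually mid-rollback, which is why it so often returns nothing useful):

```sql
/* Report rollback progress for a killed session
   (only returns progress while the rollback is in flight): */
KILL 59 WITH STATUSONLY;

/* Some rollbacks also surface progress in the DMVs: */
SELECT r.session_id, r.command, r.percent_complete, r.estimated_completion_time
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE 'KILLED%';
```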
Unfortunately I’m not aware of a way to do that.
The procedure name should be sp_BlitzKill (or sp_Blitzkrieg?) in alignment with the other First Responder Kit procedures, right?
A couple other situations that can cause the performance emergency: Threadpool is one. Exceeding the connection limit (the 32K user connection limit) is the other. So it should be lightweight enough to leverage the DAC, not that I want your target audience having that capability. Do we want the procedure to be able to issue a shutdown command as a preparatory step for a server restart and if so would it need to account for Windows or linux hosts?
I hadn’t even thought about the name yet, hahaha. Funny.
That’s a good point about the DAC – the queries should probably all be single threaded and use max memory grant hint percents to keep them lightweight.
I definitely don’t wanna do a shutdown though only because there’s no way to restart via T-SQL, only shutdown. I don’t wanna leave the server without it coming back up, heh.
I’ve needed to build processes similar to this and a common request is to exclude (or include) sessions that are running specific query text. Examples might be a snippet of code, or just a table/view name.
Even if a particular app or user is known for being a bit lax with COMMIT and ROLLBACKs, you can still protect some of their more vital stuff.
It might be a bit too cumbersome for sp_kill, but I ended up creating a user-accessible lookup/exclusion table to store rules (e.g., X appname, %X% sqlscript, rollback if running longer than X minutes) so key developers can control it even when I’m unreachable.
The ability to leverage environmental profile specifics to allow greater intelligence would be awesome. Would that additional administrative overhead incur a burden? Yes. Would it be worthwhile? Probably on a case by case situation.
Holy smokes – yeah, that’s interesting, but it’s pretty far beyond the scope of what I’d wanna do in v1. I try to avoid building stuff that requires a configuration table just because it means the query can only be run in a specific database, for example.
We have alerts and other mechanisms to help prevent this, but sometimes you gotta go Friday the 13th on sessions.
Some of our common kill scenarios are:
– tempdb transaction log is full due to a long-running transaction (have to find the oldest transaction and kill that)
– a DDL change happened on the AG primary and a session on the read-routed secondary is blocking the apply (so kill the blocker)
– Queries that get a bad plan, but that takes the surgical gloves
– Any weird users inside the database doing bad queries, but good luck trying to genericize that
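For the tempdb scenario, finding the session behind the oldest active transaction might look roughly like this (a sketch, not production code):

```sql
/* Sketch: find the session behind the oldest active transaction
   in tempdb (database_id = 2), then decide whether to kill it. */
DBCC OPENTRAN ('tempdb');

SELECT TOP (1)
       st.session_id,
       dt.database_transaction_begin_time
FROM sys.dm_tran_database_transactions AS dt
JOIN sys.dm_tran_session_transactions AS st
  ON st.transaction_id = dt.transaction_id
WHERE dt.database_id = 2
ORDER BY dt.database_transaction_begin_time;
```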
“Any weird users inside the database doing bad queries, but good luck trying to genericize that”
Easy on my part. Users only come through known applications (e.g. SSMS). So: an option to kill all sessions from a certain application. Since I also only give read access to users, we are good there. Maybe combine with an option for hostname as well. Consider making it a positive/negative filter – e.g. all sessions from app A except the ones coming from server X.
Kill all sessions from a specific app or host – sure, that part’s easy.
Positive/negative filter – that would be beyond what I’d want to code personally, but once the initial proc is done, that’s certainly something that we’d welcome a pull request for!
Longest running transactions make perfect sense, yeah, and same with lead blockers.
For anything destructive, I like to have a check to make sure it’s running where it’s supposed to be running. A check for the server name would be a good start.
I’m not sure how I’d know where things are supposed to be running though. Can you give me examples of the parameters you’d pass in?
What about adding an Ordering parameter to sort the generated KILL commands? For example, if your system is struggling w/ I/O, sort the resulting KILL commands from highest reads/writes to least. If your CPU has hit the ceiling, sort the resulting KILL commands from highest to least CPU consumed. If waiting tasks are through the roof, sort by number of blocked processes from most to least. That sort of thing.
Oh that’s really interesting! Because then you could kill them one at a time until the pain went away. I like it.
In the scenario with a read-only query getting a bad plan…if query store is enabled, could we have a parameter that uses sp_query_store_remove_plan (and potentially other query store procs) to clean that up in addition to killing the offending query? That wouldn’t really work unless THIS particular query doesn’t use typical parm values and there’s a history of good plans among several other “what ifs”. If it were ME, I’d not use this much because I’d manually manage this to have more control, but since the question was what to do if I wasn’t there…
Great – yeah, that’s pretty far beyond the scope of what I’d want to do – remember the target audience we’re thinking about.
Great idea. This is exactly what we need in my manufacturing environment. Will it also be able to kill replication job SPIDs that are taking longer than usual, e.g. the Distribution Cleanup: distribution job?
Suren – interesting question. How would you identify them?
Only want to be called sp_BlitzKillEmAll
Call it whatever you want. As for my databases, it’s probably going to be called sp_Order66
I have two things to consider adding: 1) Checking for one user running the same query multiple times. 2) Sometimes a user knows they have a run-away query and need to have something killed. Would a parameter-based option to kill all spids for a specific user be too much?
Both of those make sense!
You should also include a DROP USER for those pesky SQL Server Management Studio query users. And Python users, where the default connection uses IMPLICIT_TRANSACTIONS and they don’t think to COMMIT. No matter how often you tell them to change the connection properties, they code something new and forget.
I’m kidding (about the DROP USER).
I did once have an auto-kill procedure to boot those AWAITING COMMAND blockers after they’d been blocking for X minutes. We ignored them until they started blocking for a certain length of time.
In general note the danger of SNAPSHOT ISOLATION and uncommitted transactions. Even without them blocking, sooner or later you are going to run out of tempdb version store space. I guess that is a condition to monitor and KILL them under the right circumstances.
I know it’s unlikely, but a “have we already tried to kill this and nothing is happening” status would be somewhat useful. I’ve had a couple of those pop up over the years where the only actual option we have is to bounce the instance. Happily, that’s an infrequent occurrence, but it has happened.
I appreciate the ideas above, as well as a table-driven option with the KILL command as a column we can grab as needed.
The more common occurrence we hit these days is in “sp_dbrestore” where the proc tries to kill open connections, but sometimes just can’t for various reasons or not in the time expected. I’ve resorted to forcing a db offline, doing the restore, then making sure the db is online (in case the restore attempt fails). It works, but is a bit of extra effort at times.
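For reference, the force-offline approach described above is roughly this (the database name is a placeholder):

```sql
/* Forcibly disconnect everyone and take the database offline,
   rolling back any open transactions ([StuckDb] is a placeholder): */
ALTER DATABASE [StuckDb] SET OFFLINE WITH ROLLBACK IMMEDIATE;

/* ...run the restore... */

ALTER DATABASE [StuckDb] SET ONLINE;
```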
It’s good to log everything: the start of sp_kill with its parameters, tagged with some kind of GUID so the related entries can be tied together as a single call (if someone calls it in parallel from another connection, you can link the log entries to the call they came from); the start of each kill (with its parameters); when each kill completed, so you can see the timing; and the end of the sp_kill procedure.
Of course, log all errors too, and any call with meaningless parameters.
Also, a parameter for whether to continue after an error or not – because, for example, “Msg 6106, Level 16, State 2, Line 10: Process ID 999 is not an active process ID” may or may not need to terminate sp_kill.
The name sp_kill is taken:
https://www.sqlservercentral.com/scripts/sp_kill
Sorry – the name is taken.
My goodness, it’s a shame names have to be unique. Signed, The Only Guy in the World Named Brent
A scenario I encounter has to do with linked servers.
If you kill the SPID on server A but it is waiting on a linked server query on server B, it won’t die until you also kill the server B query.
Perhaps a switch to also kill the linked server queries it is waiting on?
That’s really interesting. Should we treat them differently, or the same as any other query?
The same I guess?
It is the same query – SQL Server is just having to run another query on another server to fulfil the query on server A.
It makes no sense to keep it open, plus it stops people from having to hunt around.
In the case of “a read-only query got a bad query plan,” should we also remove the execution plan from the cache? Otherwise it might be reused soon after killing the query.
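For reference, evicting a single plan from the cache can be done along these lines (the LIKE filter is a placeholder for however you identify the offending query; with Query Store enabled, sp_query_store_remove_plan is another option):

```sql
/* Evict one bad plan from the plan cache. The filter below is a
   placeholder - replace it with how you identify the query. */
DECLARE @plan_handle varbinary(64);

SELECT TOP (1) @plan_handle = cp.plan_handle
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS t
WHERE t.text LIKE N'%YourProblemQuery%';

IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);
```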
I am wondering if there is a way to bypass the strict permissions for the KILL statement. This would be useful in many scenarios I encounter. It doesn’t have to be a runtime parameter or option, but it would be helpful to explain the best way to make this possible in advance for non-sysadmin/processadmin users. In the environment I work in, which consists of small and medium-sized businesses, the “fire-men” can restart the server (which is easy for them), but logging in to the SQL instance as a privileged user is not.
Yes indeed! The documentation will need to explain the concept of certificate signing like we do for sp_BlitzFirst.
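The module-signing approach works roughly like this sketch (every name and the password are placeholders, and it assumes the proc lives in master; see the sp_BlitzFirst documentation for the full pattern):

```sql
/* Sketch: let non-sysadmins run a proc that issues KILL, via
   certificate signing. All names and the password are placeholders. */
USE master;
GO
CREATE CERTIFICATE sp_kill_cert
    ENCRYPTION BY PASSWORD = N'ReplaceWithAStrongPassword!1'
    WITH SUBJECT = 'Signs sp_kill';
GO
ADD SIGNATURE TO dbo.sp_kill
    BY CERTIFICATE sp_kill_cert
    WITH PASSWORD = N'ReplaceWithAStrongPassword!1';
GO
CREATE LOGIN sp_kill_login FROM CERTIFICATE sp_kill_cert;
GRANT ALTER ANY CONNECTION TO sp_kill_login;  /* the permission KILL needs */
GO
GRANT EXECUTE ON dbo.sp_kill TO [YourFirstResponder];  /* placeholder user */
```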
Lots of great suggestions so far. I second the request for a what-if parameter – if I were to run the kill command with these other parameters, what would I then be killing? It gives you a way to fine-tune the other parameters.
With the list of parameters already mentioned, a way to both include and exclude them would be nice (thinking à la Ola Hallengren’s backup solution), so the same parameter could be used to target a specific database or leave out a specific database.
About the whatif – that’s the ExecuteKills param.
For the just-suggested kills: provide the final KILL command, but keep in mind that I may run it more than 5 minutes later – so it may be a good idea for it to check whether it’s still the same query / user / state / whatever.
That’s an interesting challenge. What if the user specified to kill everything in one database, but new queries have popped up in other spids since we gave them the list of commands?
(And by the way, GREAT design consideration! This is a really fun one to think about.)
[…] Let’s Build a Better KILL. What Do We Need to Consider? (Brent Ozar) […]
A parameter to have sp_kill only target SELECT queries would be nice, so you can avoid killing queries writing data.
This gets a little tricky because I can’t identify what queries are doing in stored procedures, select-into, etc.
I have some code that I started on sort of like this: https://erikdarling.com/in-which-i-share-a-piece-of-code-that-im-not-proud-of/
Not sure if anything in there would be useful for what you’re working on, but there it is.
The one thing I’d probably consider around rollbacks is if ADR is active. It would still mean losing uncommitted data, but at least it would get lost quickly. Heh.
Never seen the SELECT 1/0 within an EXISTS? I know it works, unless you run the SELECT itself and get the divide-by-zero error.
Also, I tried some of the bits, and it selects ALL the replication log readers to kill. It appears the log readers like to sleep with an open transaction.
Ooo, thanks sir!
I think building a new tool would be really useful. I can see sp_whoisactive has not had any updates for a while, although it is still a very handy tool, and I am surprised how many people are not aware of it.
It would be handy if it was more useful with default/no parameters, such as telling you which query is causing the blocking, as I find that is often not really understood. Some applications I work with tend to leave lots of sleeping transactions open which are doing nothing but make the list of what is currently running harder to work through. Then once there is blocking, you end up with a few source blockers that are themselves being blocked by another query, with only ever really one real culprit.
Telling you the source of the block would be of interest too, such as whether it is a transaction left open or a query inserting data with a lock, and maybe a hyperlink that tells you more about why this causes a lock and how to resolve it if it keeps happening.
A nice short name is going to help here, but I wonder if, calling it sp_Kill, people might be scared to execute it in case it causes someone to suddenly collapse at the water cooler or the main system to go down. Perhaps sp_WhyIsItSlow!
This isn’t about investigation at all. This is literally about killing queries. You might have missed the first 5 paragraphs – reread those.
Sorry, I did read them, but I think it was more about the part where it says “suggest which query to kill” – with the previous tools, that bit has not always been as instantly clear as it could have been.
I know sp_whoisactive has switches where you can order by the blocking, but it is hard to remember the extra bits in an emergency.
Perhaps it can present the issues more clearly than past tools, then provide output that says: to proceed with the recommended kill, run this command again with “Y” on the end. Saying that, though, I suppose the recommended kill could change between runs unless it is stored in a table – and storing the number of times this same process/report has been killed is useful.
Like all your tools and information though, I am sure it will be of great help and I am sure I will use and recommend it once it comes out in the new year! I always recommend your resources to IT departments in the colleges where I work.
Happy Christmas and thanks for all that you do.
* Add a switch or parameter that will only kill queries that are NOT INSERT, UPDATE, or DELETE queries.
* Add a parameter that only kills queries that contain a specific bit of code or table name.
Adding a hierarchical tree view (similar to Oracle’s) would help identify head blockers.
Highlighting the host process ID would allow tracing a SPID back to the application.