This week’s series on error handling has been an eye opener for many of us. We’ve laughed. We’ve cried. We’ve screamed in horror. We’ve read the documentation.
I don’t blame you. This is a really confusing topic, and for many of us, it’s the first time we’ve actually looked at the mechanics of how transactions, TRY/CATCH, and error handling work. I know it’s the first time because a while back, I asked you how often your newly written code checks for errors. Most of the time, we’re not doing it at all.
That was enlightening (and a bummer) for sure, but I followed it up with another poll. When we DO check for errors, where are we doing it – in a TRY/CATCH block, in some other form of T-SQL, or in the app code?
When you write new T-SQL queries today, where is the error handling done?
— Brent Ozar (@BrentO) December 24, 2021
When you think about the results, remember that this is nowhere near a valid survey – it’s just a quick question to open a discussion. This poll is actually why I wrote the blog post series – I had a pretty good hunch people didn’t understand how TRY/CATCH really works, and given that most of you aren’t using it, I figured I should talk about how complex it is.
The responders aren’t a wide sample. Most of my audience consists of database people, not pure application developers. Also keep in mind that if you’re writing C# code that accesses the database via Entity Framework or some other ORM, you’re not writing T-SQL queries either. In that case, error handling would obviously be in the app code, not TRY/CATCH or other forms of T-SQL.
I had to have the “I don’t check for errors” answer given how widespread the no-checking answers were on the prior poll. It helps remind the rest of my audience what kind of real-world code is out there, and it gives the YOLO crowd a way to answer without pestering me for replies.
Twitter polls only let you pick one answer, not mark-all-that-applies, and many folks responded with replies saying that they sometimes check in different places (or multiple places) depending on the kind of queries they’re writing. That’s great! And given what we saw with the complexity today, you can see why it’s helpful to try to catch errors in more than just one place.
As an industry, I think we’ve got some work to do around improving our handling of errors, transactions, and database failovers. I’m not saying every query needs to automatically retry deadlocks, handle database failovers, and survive out-of-drive-space issues, but there are a few easy things we can do to make our apps more graceful and fault-tolerant.
“I’m not saying every query needs to automatically retry deadlocks, handle database failovers, and survive out-of-drive-space issues”
For some reason, most of my clients seem to think “computers” just handle this sort of thing automatically. I kind of wish they did. Aren’t these the kinds of problems cloud databases were supposed to make go away? 😉
At one point, I tried to build out a template for stored procs to handle errors “appropriately” in SQL. I had read, re-read, and re-re-read Erland’s posts. I went so far as to create signed certificates for procs (security, dontcha know), created a COMMON database with all the logging procs there, and added a whole bunch of checks for transaction count, state, etc. I had the ability to toggle logging and verbosity. It was great. It was huge. It was such a PITA, because if you try to nest procs with transactions, it’s even worse. In the end, I moved away from it, because it just became too much, and no one could understand it all. If you go this route, I strongly suggest you do no nest store proc calls, because then you get into a big tangle of WTF is going on. And may the universe help you if you have to do something crazy like rollback some issues, but not others.
I really appreciated this “Quiz week” series. I always use try-catch in all my T-SQL, but didn’t realize the edge cases with XACT_ABORT and the need to do the BEGIN TRANSACTION inside the TRY block.
(Why would Microsoft make the XACT_ABORT default = “OFF”???)
Maybe next week a session on @@TRANCOUNT, and I can find out what I have been doing wrong there?
I try to follow the philosophy of try/catch/log for all our procs, for reporting procs that’s sufficient for us. For procs that change data we use vanilla (unnamed transactions) and try/catch/rollback/log. All our procs get created with set nocount on and set xact_abort on at the top and we have a scalar function that gets called in the catch block to record the error message, proc name, line number … etc. Most web applications where the sql interface call is in a try/catch will also detect the error and can take the appropriate action (OOPS! Page). About 90% of our code falls within these guidelines.
We have a very complicated chunk of legacy code that was written before 2005 which doesn’t do any of this. I’m afraid to touch it, and it runs on auto-commit (like SSMS). If something bad happens a business analyst emails me and we rerun the process or heaven forbid drag all the data into another environment and look for the problem.
For every new project we start now, the first question we ask ourselves as developers is how are we going to handle catching errors and reporting them. It makes life easier, but there’s always that legacy code that gives me indigestion.
Really enjoyed following along this series. Coming from mainly working in other databases, it was really helpful.
As one of the less pure application developers following Brent, most of my error handling is in the application layer (Java) as mentioned. The nice thing with the JDBC spec is that we have to have some level of error handling to compile our code, and frameworks like Spring Boot handles most error handling/transactions fairly well. Error handling is generally something I need to coach for all layers of the application stack.
We also have static analysis tools to help more gaps in error handling (far from perfect). Looking SonarQube (popular option that also “supports” T-SQL), I’m not seeing any error handling rules (https://rules.sonarsource.com/tsql). I’m wondering if there’s any other analysis tools that do check for lack of error handling in T-SQL? Or potentially how easy would it be generate some static analysis rules could help identify lack of basic error handling?
At the company that employs me, some database developers have set up a general procedure as a frame. That frame does, among other tasks, the exception handling. All the procedures that do the real work are dynamically called by the frame procedure. The developers of procedures do not need to care about error handling. The error handling is consistent which simplifies the monitoring. Changes in error handling can be done centralized.
The solution has been stable and reliant for about ten years.
The thing that really frustrates me is the inability to have the kind of error handling you detail in your posts with XACT_ABORT, and also do anything graceful or clever with errors. Ideally you want to be able to have some non-vital part of a process fail, and have the rest of the process just keep going (which seems like you can do it with nested transactions and savepoints and try…catch blocks), but that bascally becomes impossible once XACT_ABORT is on because once any error is raised the outer transaction becomes doomed and you have to roll the whole thing back.
I’ve been aware for a little while that I need to read more on the subject, and go through Erland Sommarskog’s work on the subject in a lot more detail (I’ve only skimmed it so far), but it’s not made it to the top of my professional development to do list yet.
Yeah, that is indeed frustrating. I wish TRY/CATCH worked for all error severities, and it sucks that we have to do the turkey sandwich approach.
But who’s got the time to read the documentation and do it right, amirite? No sense in that – best to just start writing mission-critical code without understanding how the product works. I’m sure you’ll get around to that someday. 😉
Part of the problem is that XACT_ABORT is not clearly mentioned in the online Microsoft TRY…CATCH documentation. I read the TRY…CATCH documents and thought I understood it, but XACT_ABORT is only mentioned in an example of how to use XACT_STATE. Something so critical to correct error handling should have been highlighted there.
AND – they show BEGIN TRANSACTION outside the TRY…CATCH block in their “Using Try…Catch in a Transaction” example. So, following Microsoft’s published online documentation leads to incorrect error handling.