“Not to write any procedure over 50 lines”

In Joe Celko’s Stairway to Database Design series, he writes:

The rules of thumb for T-SQL are not to write any procedure over 50 lines (one page)

This seems so radical at first, but it has precedent in NASA’s 10 Rules for Developing Safety-Critical Code:

  1. Avoid complex flow constructs, such as goto and recursion.
  2. All loops must have fixed bounds. This prevents runaway code.
  3. Avoid heap memory allocation.
  4. Restrict functions to a single printed page.
  5. Use a minimum of two runtime assertions per function.
  6. Restrict the scope of data to the smallest possible.
  7. Check the return value of all non-void functions, or cast to void to indicate the return value is useless.
  8. Use the preprocessor sparingly.
  9. Limit pointer use to a single dereference, and do not use function pointers.
  10. Compile with all possible warnings active; all warnings should then be addressed before release of the software.

See that #4? If your T-SQL was safety-critical – if folks’ lives depended on the accuracy and speed of your query results – then NASA would suggest you break it down.

Don’t get me wrong – I can’t possibly work that way, nor do I think many of you can, either. As I write this, the current sp_Blitz is 9,210 lines long. sp_Blitz is a great example of something that, were it mission-critical, should be refactored into much smaller, more easily testable chunks. But I still love the 50-line suggestion because it gives us pause before we tack another hundred lines onto an already-ginormous spaghetti monster.
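As a hedged sketch of what that refactoring might look like, one common approach is to turn a monolithic procedure into a thin orchestrator that calls small, single-purpose procedures. All of the object names below are hypothetical – they are not from sp_Blitz or any real codebase:

```sql
-- Hypothetical sketch: instead of one 9,000-line procedure,
-- a short orchestrator delegates to small, testable helpers.
-- Each helper procedure stays under ~50 lines and does one thing.
CREATE PROCEDURE dbo.usp_ProcessOrders
AS
BEGIN
    SET NOCOUNT ON;

    EXEC dbo.usp_ValidateOrders;  -- reject rows that fail business rules
    EXEC dbo.usp_PriceOrders;     -- apply pricing and discounts
    EXEC dbo.usp_ShipOrders;      -- hand validated, priced orders to shipping
END;
```

Each helper can then be tested and profiled in isolation, which is exactly the property NASA’s rule #4 is after.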


22 Comments

  • Whew, thank goodness carriage returns are frequently optional.

  • This calls that. That calls another. Another calls yet another. Yet another calls just another. Just another calls are we done yet? How many nests can one bird have?
    🙂

    • It’s turtles all the way down….

      • If talking Turtles all the way down, include a reference to Sound & Fury!
On subject: when the business logic is in the procs, the nesting-within-nesting of logic can be challenging. I’m currently supporting 3rd-party business logic in procs, and it’s failing real-world performance.

  • One page you say? Alright time to get a 36″ plotter
    https://store.hp.com/us/en/pdp/hp-designjet-t830-36-in-multifunction-printer

As Joe Celko says in his original SQL for Smarties: normalize until it hurts, de-normalize until it works. I think the same can be said for spinning out webs of procedure-call confusion. I use that as a goal (I use 60 lines – meh) but don’t force myself into knots to always achieve it.

Having spent most of my career maintaining other people’s code, I’ve found this “limit to so many lines” coding style to be a double-edged sword. It’s difficult to reverse engineer (documentation? HA!) the specific code you need to enhance, correct, or blow up when you’re having to traverse this procedural minefield. But once you find it, it’s oh so nice when it comes time to retest and deploy – hoping that some other little nugget of logic (it’s always those damn reused variables that get ya!) doesn’t rear its head, so that now you get to play whack-a-mole with the code, trying to keep the introduced bugs away.

I’m struggling with this a bit – I understand the premise, but safety-critical systems tend to be written in very verbose languages like COBOL, Fortran, Pascal, assembly, etc. Associated databases will be highly normalized, necessitating highly verbose queries. Admittedly I have never worked on something like this, but I do support one extremely complex application where related data are written in several separate procedures for performance reasons and for ease of troubleshooting by devs who frequently make updates to the application. There are times when a set of writes doesn’t complete, normally due to a novel form of bad data that is semantically correct but logically incorrect. No one is at risk of dying, but it confuses the hell out of anyone who sees it in the app. The app could write it to a staging table to validate first, but at some point all of it will have to go through, and there is opportunity for it to fail after staging as well.

Years ago I wrote a monster in PL/SQL for an overnight batch process because the app layer (Visual Basic) at the time (2005, I think) could not perform the same (so I was told). If it had been 5 years later, I would have pushed back and told them to write it in the app layer (C# using .NET 3.5). Why? Well, by that time the devs had adopted a test-driven development framework with build servers and managed source code. This would have allowed any enhancements to be rapidly tested through the automated test suite, thus lowering the cost of ownership and (arguably) delivering something more robust than what I could roll and test in the same time period. Business logic encapsulated in database objects such as functions, triggers, and procs should really be avoided in the first place. Database maintenance, diagnostics, metrics collection – i.e., non-business-logic-type stuff – is fine, ’cos it’s the DBAs who will be owning and looking after that stuff as part of their job.

Functions and triggers – absolutely. But won’t all the queries appear as poor-performing ad-hoc queries if they aren’t run in stored procedures? “Optimize for ad hoc workloads” helps a little bit but can’t perfectly fix it. Running it all in the app layer also allows the app devs to get away with unacceptable code.

We used to shake out poorly performing code in quarterly non-functional performance testing, or via client ticket if it got out before then. Agreed: sp’s can help with plan reuse, but that has an ownership overhead if using ORP/ORM.

  • I’m on board. [stares malevolently at 20,000+ line stored proc with “recompile” on]

T-SQL is really bad for writing classic “clear” code. It’s either beautiful or effective. Exclusive.
    Functions? Almost always a bad idea for performance reasons. So I can’t reuse them for “identical” or “similar” CTEs. I cannot reuse scalar expressions.
    Context. All modern languages reduce context to the minimum. I cannot pass temp tables to another proc effectively and safely. Every bit of code almost always depends on the whole database and server (tempdb) context.
    The procedure header, BEGIN TRANSACTION, COMMIT, and accurate exception handling – hey! – all that stuff takes 20-30 lines.
    Even in “simple” queries I cannot reuse code (the dynamic filter “WHERE id = @id OR @id IS NULL” is a really bad idea); I MUST write different queries.

    I just have to accept this if I want to write effective T-SQL.
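For readers who haven’t hit the catch-all-filter problem the comment mentions: the `@id IS NULL` pattern gives every call the same cached plan, whether `@id` is filtering one row or none. One commonly cited workaround is `OPTION (RECOMPILE)`, which compiles a fresh plan per execution using the actual parameter value, trading CPU for plan quality. The table and column names here are hypothetical:

```sql
-- The catch-all pattern the comment warns about: one cached plan
-- must serve both "filter to one Id" and "return everything."
SELECT u.Id, u.Name
FROM dbo.Users AS u
WHERE (u.Id = @Id OR @Id IS NULL);

-- A common workaround: recompile per execution so the optimizer
-- sees the runtime value of @Id and can pick an index seek or scan
-- appropriately (at the cost of compiling every time).
SELECT u.Id, u.Name
FROM dbo.Users AS u
WHERE (u.Id = @Id OR @Id IS NULL)
OPTION (RECOMPILE);
```

Dynamic SQL with `sp_executesql` is the other standard approach when the recompile cost is too high on a hot path.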

Not sure where to report this, but the page header seems to be out of place; there is a gap about two text lines tall between the top of the page and the header.

    Sorry if this is not the right place to report these things.

  • Wojciech Sawicki
    September 10, 2020 11:19 pm

If I remember correctly, 50 lines was what a text monitor could display. Back then, the whole procedure fit on one screen. Now I apply the principle: it should be understandable in one reading.

Even a simple, well-formatted SELECT will often exceed the 50-line “limit,” so you would either put all selected columns / all WHEREs / every JOIN with its join condition on a single line, which makes it almost unreadable, or ignore this rule.

    But maybe this rule is just badly defined, and it should be “not more than 50 statements in a function” (since in many languages most statements are single-lined) – that could be workable.

    On the other hand, there is still the problem of passing data from one procedure / function to another. Your options are those slow, ugly @tempTables with user-defined table types; global ##tempTables, where you have to ensure that nobody else is running a concurrent session of your procedure; or “temporary” tables in your original database, which can cause problems with permissions, log backups, locks, and of course concurrency again.

    I’m using the SSMSBoost add-in, which supports – even in its free version – (named) –#region / –#endregion sections. These let me structure and collapse / unfold my code, which massively increases readability and manageability (you don’t have to page down through 300 lines of preparation statements when they are all collapsed).
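To make the “passing data between procedures” complaint concrete, here is a hedged sketch of the table-valued-parameter option the comment alludes to. The type, procedure, and table names are hypothetical; note that TVPs must be declared READONLY, which is part of why some find them awkward:

```sql
-- Hypothetical sketch: a user-defined table type lets one procedure
-- pass a rowset to another without resorting to global ##temp tables.
CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
GO
CREATE PROCEDURE dbo.usp_GetUsersById
    @Ids dbo.IdList READONLY   -- TVPs are read-only inside the proc
AS
    SELECT u.Id, u.Name
    FROM dbo.Users AS u
    JOIN @Ids AS i ON i.Id = u.Id;
GO
-- Caller fills the table variable, then hands it across:
DECLARE @Ids dbo.IdList;
INSERT @Ids (Id) VALUES (1), (2), (3);
EXEC dbo.usp_GetUsersById @Ids = @Ids;
```

The READONLY restriction means the callee can’t add rows for the caller to read back, which is why session-scoped #temp tables (visible to nested procs) remain the pragmatic choice for two-way traffic.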

  • Good in theory, problematic in practice. It’s like the Ten Commandments. “Thou shalt not bear false witness” is all fine and dandy until your wife asks, “Do these jeans make me look fat?”.

It’s intended to make you consider what you’re coding. I really like Uncle Bob’s SOLID principles, and I try to find uses for them in SQL. The ‘S’ingle responsibility is addressed here, I think. Strive to make your code do one thing (only one reason to change it). It’s not always easy… sometimes you need to perform multiple operations on a single recordset, and SQL doesn’t make it easy to pass recordsets around. But it’s a guide – a suggestion. And I think a good one.

