Is NOLOCK Ever The Right Choice?

I’ve poked a lot of fun at NOLOCK

I mean, it sucks. You can get incorrect data, and it can throw errors if your data is particularly volatile. Generally, outside of testing a query or grabbing some sample data, I’m going to avoid it.

Crappy in every language

Crappy in every language

But what if…

What if it’s about all you can handle? Or your server can handle?

Picture a couple developers who started their app in the cloud, where they can’t get fancy with tempdb, fast disks aren’t in the budget yet, along with that beefier server with some extra RAM. They may not be able to turn on RCSI or SI at the drop of a hat; tempdb would keel over with the row versioning as part of a workload that already uses it pretty heavily.

They still need to run reports, either for users, or for higher ups at the company, and they can ask for them at any time. Caching the data when user activity is low and reporting against it when someone asks may raise some questions, like “why doesn’t my sales data show anything from today?”, or worse. You could invalidate the cache every X minutes, but that doesn’t help because then you need to re-run that reporting query every X minutes. That’s only moderately better than letting users query it at will.

Even with the right indexes in place for their workload, readers and writers will block each other, and long running reports can be painful.

What other options might they have? READPAST?

Informed choices

Ultimately, the goal should be to get your server, and your skills, to a place where they can handle optimistic isolation levels. Until you get there, you should know what NOLOCK actually does.

What everyone thinks: It doesn’t take out any locks
What it really does: Doesn’t respect other locks

This is where things go bad. You can get partial data, double data, and if you catch an update that gets executed as an insert and a delete, both versions of the row. That’s just for starters. And that’s tough news to break to people who just may well want accurate reports.

You might be able to get away with them just not noticing if things don’t line up. You might not. You could just ask them to re-run a report if they come to your desk with a hug full of dot matrix printer paper festooned with sticky note tabs, but if that happens enough times…

ISVs have it especially tough

Anyone with a credit card can install their software. Thinking about an application like kCura’s Relativity, where servers who have never met a DBA often end up with multiple terabytes of data, what would happen if a few cases got really lively? Picture tempdb with one file on a 100 GB C: drive. It ain’t pretty.

You can make all the best practice setup guides you want, but it doesn’t mean anyone’s going to follow them.

And so I ask you, dear reader

Are you ever okay with NOLOCK? Under what circumstances?

What options do DBA-less, development focused shops have?

Thanks for reading!

Brent says: I’m always okay with NOLOCK because I just pretend it’s another one of SQL Server’s incorrect results bugs.

, , ,
Previous Post
Hidden in SQL Server 2017 CTP v1.1: sys.dm_db_stats_histogram
Next Post
Ten Ways to Set MAXDOP

56 Comments. Leave new

  • Azure SQL DW (essentially Polybase) is READ UNCOMMITTED and that is all it supports.

    Most other “analytic platforms”, like HP Vertica, do this as well. I know your question wasn’t about these technologies, but frankly I don’t see the issue if you are aware of the architectural tradeoffs and you code your stuff to handle “dirty reads” as appropriate. Tricks like partition switching work wonders.

    I agree that you’d likely be nuts to do this on OTLP or a trickled-loaded DW where exact results were absolutely required.

    And it’s not just SI or RCSI tempdb issues forcing some shops from enabling it. Some older apps have transaction and concurrency semantics coded right in their procs. Too much for a reply but you can spot these code smells after a while.

  • I work in the IPTV world where data is constantly being updated and changed. There are times when we need to hit the production database to gather some quick information, and since TV is a vital piece to many people any interruption would not be welcomed. We wouldn’t want to interrupt someones purchase of a VOD while running a select query, so we want to ignore any other locks utilizing with (nolock) in these cases.

    • Have you considered using an optimistic isolation level instead? That sounds like a perfect use case for it, where you don’t want readers and writers blocking each other, and you probably want correct results, no?

  • I’ve seen (and built) reporting databases that were read-only. That seems like one of the few times where a (nolock) hint would be fine – no data is changing.

  • I still want to know why this NOLOCK was invented in the first place. If using it is a bad idea or not needed, where did it come from??

    • Well, rubbing alcohol is also useful, but you’re not supposed to mix it with a bunch of Tang and drink it to impress girls.

    • NOLOCK was more prevalent prior to the introduction of SI and RCSI when there was a risk of readers blocking writers.

  • I’m embarrassed for my shop to tell our nolock story

  • I’m not a big fan of NOLOCK but I have used it in some data warehouse scenario. I wouldn’t use it to get around blocking but to allow the storage engine to use an allocation order scan and push the SAN as much as possible with large read-ahead I/O requests. However, I would only do this if I knew 100% that the table being queried was idle.

  • – Anytime that we know the table is not hit with UPDATES/INSERTS during a certain time period.
    – Anytime you want to get the structure of the table and some sample but meaningful data for non critical tasks such as data preview.
    – Anytime that you are testing a query syntax, connectivity, or simply if a certain type of data exits.

    That’s all that comes to mind.

    • Same question as up above: if no modifications are happening, why do you need it?

      • Tony Robinson
        August 30, 2019 9:45 am

        Because it speeds up table reads. With NOLOCK on, my 25GB table reads in about 30 seconds, with NOLOCK off, the same read takes over 30 minutes. On very large tables, locking and unlocking each page of data during a select becomes a huge overhead.

        • Tony, seriously, hit the brakes on the comments. You’re missing something else obvious on your system. I can do a 25GB read in much less than 30 minutes – something else is going on on your system. If you’d like help, definitely click Consulting at the top of the site and we’d be glad to help.

  • I use READUNCOMMITTED to check on operations of badly written/very long running processes that keep transactions open for way too long. I never use NOLOCK. I have explained to everybody I know that these two do the same thing, but that READUNCOMMITTED is more honest about what it does, while NOLOCK is extremely misleading.

  • In my old company, we imported into the system 35 files. Each file had data which was by nature of the file, constrained to the file. Even though the database was multi-tenant database, a company within the database always came in on the same file everyday with newer credit card transactions. The company never crossed into other companies in the database or on the files, so knowing how the data worked and was processed, we used NOLOCK to process the data and push it on to other systems while importing the next file.

  • We use nolock when selecting from a highly volatile central error log table. We only need a few of the latest errors, and I have too many developers hitting it all the time with a select *. ReadUncommitted is probably a better choice, and I would like to send it out to all of their SSMS with a GPO… did M$ ever provide that tool?

  • I have a ISV application that has been up-sized from Access. The latest version finally dumped Access 2002 as the reporting engine. They tell us they use nolock hints to speed up reporting. Everything is hitting one database. It is Standard version so no partitioning and we expect current results from our reports. And the data updates are also accessing data that might be changing somewhere else.

    • This sounds like another great use case for optimistic locking. You trade wrong data with nolock for data that’s potentially a transaction behind with optimistic locking. You can even limit it to just the reporting queries with Snapshot.

  • My company relies upon a 3rd party developed database that heavily utilizes NOLOCK. I haven’t been able to get a good answer as to why this is. When I am developing custom code to work within this database (mostly for BI functionality within the application) should I use NOLOCK just for code consistency or abandon it as generally a bad practice?

    • If it’s a third party app, your choices are limited. Find out if they support RCSI. If they don’t, Snapshot Isolation might be an option, since you can limit it to just your queries. But you need to be able to test that thoroughly for both your code and your hardware. Check out the link up in the post about RCSI and SI for more information.

  • Well, I have stumbled accross some nolock stories as well. People thought it would be a good idea to get rid of some nasty deadlocks by applying NOLOCK hints. But essentially that’s misleaded as Erik wrote “What everyone thinks: It doesn’t take out any locks
    What it really does: Doesn’t respect other locks”. The worst argument for applying NOLOCK I heard was “We are running a web application thus it’s perfectly normal to show inconsistent data. Just let the users click the refresh button if they don’t like it.”

  • By the way we use RCSI already for a very long time.

  • Well, I am CONSIDERING using it, though to be fair, there may be other ways to perform this and my ignorance is pushing me towards NOLOCK. I have a few specific tables in my warehouse that are using Data Vault structures/methodology and are thus INSERT only (never any updates). The ETL in DV is very parallelizable so there’s a case when I may have two or more different sources wanting to insert business key data into the same HUB at the same time. My thinking is, if they are both running at the same time and trying to insert key “ABC” into the same table, they will both check to see if ABC exists and then both try to insert ABC. One will win and one will throw a constraint violation. I’m was thinking about NOLOCK to accomplish that because of some examples I read in a DV book way back when, though reading through the comments here, perhaps READ UNCOMMITTED is a better option?

    • You should read more carefully. There is no difference between nolock and read uncommitted.

  • We use NOLOCK on our simple monitoring for humans. We have to run a query across our tables to see if we have certain data yet, but don’t want to break our running applications. If the query errors out or gets bad data it just runs again in a couple minutes. Worst thing that happens is we get a false alert.

  • I worked for a large software development corporation in Redmond, WA where corporate DBA’s often require NOLOCK for all queries and ban those who do not comply. Typically the contract “DBAs” either have no clue how to monitor performance or have a job description that does not include this specific task. Inaccurate results tended to increase with the size of the row set returned by a query.

  • Austin SqlPowers
    December 20, 2016 1:49 pm

    Coming from non-sqlserver environments, its always interesting to see what topics are hot buttons for DBAs on a specific database technology. I find nolock hint one of those topics.

    It seems to me the response is largely dependent upon one’s point of view. If you maintain OLTP systems, the response is naturally negative. However in the data warehousing world where updates are highly controlled you will likely get a very different perspective. Both are actually correct. Nolock feature is by itself like any other tool – it can be a tool used for good or for evil. Good would be when you have a controlled update scenario where you are reading from say some stage tables that you know are no longer being updated when you read them. Dr Evil however likes to use nolock in a live OLTP environment causing integrity issues.

    The following link provides a nice summary of a simple performance test of nolock and some pretty emotional comments in response.http://www.sqlservercentral.com/articles/Performance+Tuning/2764/

  • When to use nolock:
    Getting schema info on the dev server when some fool has left a 3 hr select into running and touching the GUI table list in ssms hangs then times out, becase all Metadata queries are blocked waiting on the Sch-M.
    Using the reporting replica when someone has a large offline index rebuild (which locks the entire table).

    Other possible failure mechanisms :
    you don’t need a delete/insert mechanism to get duplicate results, you just need a page split to occur at the wrong time (and even deletes can theoretically cause a page split).
    A final possible result is rows that do not match your query being returned (plan is index seek/scan, then clustered/heap row lookup; a write can land between those).

  • Katherine Villyard
    December 20, 2016 2:50 pm

    I’ve used nolock for a specific purpose: Pulling older data from a logging database with nothing but sequential writes (like, it’s December 22 and I’m only pulling November 1-30) and writing to a second database to store locally (and maybe run the odd aggregate query against). I’m okay with using nolock for that.

    • Well, yeah, sure, you know what you’re getting and why. That’s not how most people approach nolock!

  • Sadly where I work it is the enforced standard that nolock *must* be used on every table / view in every query everywhere, period. Oh and every process also has to include SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. You know, just in case…Something. sigh.

    • Are they unaware of other options?

    • Sadly even more, that makes no sense. Tell them they’re violating ACID principles of database technology and are subjecting themselves to bad information. In other words, tell them that they must not care about accuracy because what they’re getting isn’t guaranteed to be accurate. And if they don’t have any clue what you’re talking about then its time to find another job.

    • Steve Rezhener
      August 23, 2017 6:31 pm

      Tony, perhaps we used to work at the same HELL place! 🙂
      We had to use a NOLOCK as a best practices standard (no. 1 in a peer review). I couldn’t talk them out of the no locking of the NOLOCKing, so I tried to save keystrokes for the Cold Fusion developers that were writing the procs and just use UNCOMMITTED. Got even worse. The same DBA that sold them on using NOLOCK everywhere wasn’t convinced that this is the same and the poor dev souls ended up writing both, NOLOCK and UNCOMMITTED.

  • I really like this summary, it gets right to the nub of the problem

    “What everyone thinks: It doesn’t take out any locks
    What it really does: Doesn’t respect other locks”

    NoLock is still out there in the wild scattered like confetti. Actually I think Microsoft missed a trick when it brought out RCSI which is a way to coerce transactions running as read committed to run under optimistic versioning concurrency instead. But by the time this came out way back when it was praxis to solve the locking and blocking problems with the liberal use of the NOLOCK confetti. What was needed was an extra layer in the transaction isolation model – let’s call it – RUCSI – which would coerce NoLock / ReadUncommitted transaction isolation transactions over to the optimisitic versioning transaction isolation level.

  • Excellent article and so close to home. We have an international system that has a very basic goal. It is fraught with no lock statements. And I mean everywhere … from reports to inline SQL code. I have sent e-mails to the main developer regarding the issues but to no avail (I am just a supporting developer). He also doesn’t believe me that indexes are good idea, as he says it cause more locking on the system. Not surprisingly data is lost and many WTF moments ensues from the users. Obviously the main developer can’t explain what happened (He also designed the WIndows application with an MVC model and believe he is writing web forms, and every new feautre has at least four fixes due to OO design applied wrong… but I digress). I have even pointed him to the design considerations of our own home grown production line system with extreme transaction throughput for multiple lines which was properly designed with indexing and hardly a no lock statement can be found. Woe is me…

  • Excellent article, thank you! Our devs used NOLOCK extensively on reports and application queries, as a cheap and quick trick to “tune” them (especially as they didn’t really know better). I managed to prove them that:
    1) their queries were running as slow with the NOLOCK as without it (if not slower in some bizarre cases)
    2) all you need usually to make the same query run 10 to 100-fold faster is one or 2 well chosen additional indexes (which hardly takes more effort or time than spamming NOLOCK everywhere then wait for your query to finish)
    3) even with NOLOCK making the query performance shine, it doesn’t scale up that well anyway
    Now we have correct, faster reports as well as a fast application, an SQL server under less pressure and every day fewer NOLOCKs in our code.

    The other “cheap” solution we considered for reports that would be pretty slow even with a nicely tuned up query, is maintaining a copy of the database in standby (restore “with standby” instead of “with no recovery”), with log shipping to keep it up-to-date every 10min. Since the whole database is read-only, a report that used to run for 2min in production would run in 20s in the restored database. You even get repeatable reads! One needs to handle reports trying to run while the transaction log is being restored, but it is otherwise a nice emulation of a near real-time read-only secondary without having to build a second server.

  • Alex Friedman
    January 9, 2017 7:20 am

    While I always advise to avoid NOLOCK, and fully understand all the possible consequences if you do use it, there are two general cases where it can be acceptable IMO:

    1. When you need to run a large report-style query and ballpark figures are good enough (and there’s no readable secondary replica…).

    2. When one of the applications using the DB iteratively runs a large report-style query to get an initial population to work on, and then goes on to do validations and actions on them one by one (without NOLOCK). So if it gets bad data or misses data it will fail on validations and/or be fetched in the next bulk.

    Thoughts?

    • 1. I agree. Not everyone is writing HIPAA queries or general ledger reports. For many use cases, “close enough” is good enough. Know your use case.

      2. is common pattern in apps written before RCSI where you couldn’t afford to take the locks early during a long-running process. Once/if you move to RCSI then rewriting these processes usually yields good perf improvements. But this is also a big reason cited for why an app CANNOT move to RCSI, which is bunk.

    • Wayne Hamberg
      June 19, 2018 3:23 pm

      How often do people work with entire HUGE datasets that are continuously changing? SO I ASK WHY ARE YOU QUERYING THE ENTIRE COLLECTION. Simply work with smaller collections. If you are querying 1 billion rows and expecting nothing to change might as well to expect all the politicians in world to agree that the sky is blue.

    • Tony Robinson
      August 30, 2019 9:39 am

      You say “ballpark figures” like using NOLOCK turns your data into complete garbage. It doesn’t. At the very worst you’ll get one extra row in your data, A row that you would have got anyway if you ran your query a few seconds later, or vice versa, you’ll miss out on a row that was just written as you were running the query, which you would have missed anyway if you ran it a few seconds earlier.

      But you shouldn’t be doing this anyway! If your boss wants to know, TO THE EXACT UNIT, how many widgets you’ve produced so far today, and the shop floor is still producing widgets right this second, then that’s a dumb question. He won’t be asking that, he’ll be asking how many were produced yesterday, or in the last hour, or in the last minute. Everyone understands that running calculations on live data which is changing before your very eyes is not going to get you anywhere…

      The only use-case I could come up with is this: You run a large factory producing widgets, your target is 10’000 widgets per shift, and each shift lasts 10 hours. If you miss your target, the world explodes.
      You need to know at any given moment how many widgets have been produced, so that you know if you’re running ahead of or behind schedule, so you get your SQL guy to interface with the live production DB and run a query on the “WidgetProduction” table and show a live count of how many widgets are produced, in real time.
      Your query runs every second, and updates a number shown on a big board in the middle of the factory floor. At exactly 9:45:03:000 the factory produced 3,042 widgets, but your query catches a row just in the process of being written at 9:45:03:100, and incorrectly shows 3,043. It remains incorrect for an entire second, when it runs again and updates to 3,056.
      I think the factory manager should probably be OK with this level of inaccuracy, and it’s arguable even whether it really was inaccurate.

      I struggle to think of a single use-case where using NOLOCK would cause a genuine issue.

      • Tony, Tony, Tony. Boy have I got some heartbreaking news for you. NOLOCK can be off by way, way more than one row:

        https://www.brentozar.com/archive/2019/08/but-nolock-is-okay-when-the-data-isnt-changing-right/

        • Tony Robinson
          August 30, 2019 9:56 am

          Only in theoretical situations. As I say I’ve been using it for more than a decade and never had any real issues whatsoever. I don’t doubt you can fabricate a situation using test data where it causes a problem, but for real-world queries it really doesn’t.

          even sp_who2 uses NOLOCK.

          In 99% of production environments, you have the tables being written to by automated processes separated out from the tables that you run queries on and use. Generally by three or four layers of separation, more often than not by totally separated hosts and servers.

          EG in the office I’m sitting in right now, all the customer payments get taken by an online card processing module, this writes transactions out to a payments table in real-time, and those get merged into an orders table every minute or so. Overnight, data from the orders table is moved into an OrderHistory table, and that OrderHistory table is copied over to the reporting database.

          This is an extremely common type of reporting setup. In this scenario, the only time a query on OrderHistory can ever be impacted by changing data is during the overnight job that copies the data. So as long as I’m not running my query at 3am, I won’t ever see any bad data. And actually I’d have to be extremely unlucky even if I was running it at 3am!

  • Wayne Hamberg
    June 19, 2018 3:11 pm

    I did a quick check on several of the stored procedures on our databases and I was amazed on how many system databases use NOLOCK. Even sp_who2. I have been taught not to use WITH (NOLOCK) for 20 years and I’m surprised on how frequently I still see NOLOCK.

    • Erik Darling
      June 19, 2018 3:13 pm

      Wayne — for DMV queries it’s a bit different. They’re not guaranteed to be point in time consistent anyway, and you don’t want to get held up behind modifications to them. This can be especially rough with index DMVs, where they can be heavily modified on busy systems.

  • Tony Robinson
    August 30, 2019 9:25 am

    I don’t understand the WITH(NOLOCK) hate.

    I’ve worked as a data analyst for more than 10 years, using SQL Server and various other flavours of SQL.
    I’ve used WITH(NOLOCK) for pretty much all my queries for the same amount of time.
    I’ve never, ever, ever come across a situation where using NOLOCK caused any kind of issue in my data. I’ve never had any missing or extra rows in my data because of it, never seen the “Could not continue scan” error you show above or had anything other than better SELECT performance from using it.
    On the very rare occasions that I do actually need to query live transactional tables where ROLLBACK and COMMIT are actually being used (I can count the instances of this on one hand), then sure, I didn’t use NOLOCK. But for general purpose data-mining, analysis, report building, whatever, it’s an indispensable performance tweak. My users are *ABSOLLUTELY NOT* prepared to sit there for 25 minutes while a report runs just in case someone happens to be changing a row in it, vs 2 minutes for a report with NOLOCK…

    On the (very large) tables I query, this single table hint is the difference between a query taking 2 minutes or 45 minutes.
    If I were to remove all the WITH(NOLOCK) from my overnight table-building jobs, they wouldn’t finish inside the maintenance window!

    Sure, if you’re building queries that run as part of software dealing with transactions, the type of queries that are triggered multiple times per second and generally return one or two rows or whatever, then NOLOCK probably isn’t appropriate for you. But for the vast majority of us SQL Server users, writing queries against large datasets in order to produce reporting and insight data, NOLOCK makes our lives so much better as we’re not waiting around for hours on end for the database to lock and unlock pages 5 million times, just in case someone were to write a row of data!

Menu
{"cart_token":"","hash":"","cart_data":""}