Page Life Expectancy Doesn’t Mean Jack, and You Should Stop Looking At It.

Page Life Expectancy is a Perfmon counter that’s supposed to track how long pages stay in memory before they get flushed out to make room for other pages that are needed instead.

Paul Randal has blogged this a few times, and rather than rehash ’em, I’d rather point you to a couple of his roundups:

And layer in a few of my own observations for context:

Page Life Expectancy goes up 1 for each second that you’re not under memory pressure. Restart your SQL Server instance and watch PLE: it starts at 1, and goes up by 1 for each second of uptime. 5 minutes of uptime = PLE 300. For the first 5 minutes, it’s not like your server’s under memory pressure – it just woke up, for crying out loud. Give him a minute. Well, 15-20, I guess, because PLE is near useless during that time span.

Page Life Expectancy drops can be triggered by confusing operations. By default, any one running query can get a memory grant the size of 25% of your buffer pool. Run a few of those queries at the same time, and your buffer pool gets drained – but PLE doesn’t necessarily drop. However, the instant an unrelated query runs and needs to get data that isn’t cached in RAM, your PLE will drop catastrophically. Which queries are at fault? The queries getting large grants, or the queries doing reads? “Yes.”
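To put rough numbers on that, here is a minimal Python sketch. The 64 GB buffer pool and the three concurrent queries are made-up values for illustration, and the function name is hypothetical; only the 25% default cap comes from the paragraph above.

```python
# Toy arithmetic only: buffer pool size and query count are assumptions.
GRANT_CAP_FRACTION = 0.25  # default per-query cap described in the post


def worst_case_grants_gb(buffer_pool_gb, concurrent_queries, cap=GRANT_CAP_FRACTION):
    """Return (GB handed out as grants, GB left for cached pages), worst case."""
    per_query = buffer_pool_gb * cap
    # Grants cannot exceed the memory that exists.
    granted = min(buffer_pool_gb, per_query * concurrent_queries)
    return granted, buffer_pool_gb - granted


if __name__ == "__main__":
    granted, left = worst_case_grants_gb(buffer_pool_gb=64, concurrent_queries=3)
    print(f"granted: {granted:.0f} GB, left for data pages: {left:.0f} GB")
```

With those assumed numbers, three big grants leave a quarter of the memory for cached data, so the next query that needs uncached pages is the one that gets blamed.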

Page Life Expectancy is a lagging indicator. A lagging indicator tells you about an emergency long after it has happened, and it doesn’t recover quickly once the emergency is over. Combine the two problems above – PLE only rising 1 per second, and PLE dropping at times that aren’t necessarily tied to the buffer pool draining – and by the time a low-PLE alert fires, you could have already missed the emergency. When you log in and look for the long-running queries, it’s already too late. Instead, you should be using leading indicators: things that tell you a problem is coming up.
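The slow recovery is easy to see in a toy model. The sketch below is not how SQL Server computes the counter; it just assumes PLE climbs 1 per second and collapses to 0 at a single moment. The function names and the alert threshold of 300 are hypothetical.

```python
def simulate_ple(total_seconds, flush_at, ple_after_flush=0):
    """Toy model: PLE climbs 1 per second and collapses once at `flush_at`."""
    ple = 0
    history = []
    for second in range(total_seconds):
        if second == flush_at:
            ple = ple_after_flush  # the cache gets flushed
        else:
            ple += 1  # no pressure: one more second of life
        history.append(ple)
    return history


def seconds_below(history, threshold, start=0):
    """Count the seconds at or after `start` where PLE sits under `threshold`."""
    return sum(1 for ple in history[start:] if ple < threshold)


if __name__ == "__main__":
    history = simulate_ple(total_seconds=3600, flush_at=1800)
    print("seconds alerting after the flush:", seconds_below(history, 300, start=1800))
```

In this model, a flush that lasts one second keeps a "PLE under 300" alert lit for the next 300 seconds, whether or not anything is still wrong.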

With that stuff in mind, I’ve removed the Page Life Expectancy warnings from sp_BlitzFirst altogether. I think they’re distracting y’all from the real leading indicator, wait stats – and of course, sp_BlitzFirst shows that, so I’d rather focus your attention there.
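Because wait times are stored as running totals (for example the `wait_time_ms` column in `sys.dm_os_wait_stats`), the useful number is the change between two samples. Here is a minimal Python sketch of that idea. It is not sp_BlitzFirst's implementation, the function name is hypothetical, and the sample figures are invented.

```python
def wait_deltas(before, after):
    """Return (wait_type, added wait ms) pairs for the sample window, largest first."""
    deltas = {
        wait_type: after[wait_type] - before.get(wait_type, 0)
        for wait_type in after
    }
    # Keep only waits that actually grew during the window.
    return sorted(
        ((wait_type, ms) for wait_type, ms in deltas.items() if ms > 0),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Invented cumulative wait times in milliseconds, taken a few seconds apart.
    before = {"PAGEIOLATCH_SH": 120_000, "RESOURCE_SEMAPHORE": 5_000, "CXPACKET": 40_000}
    after = {"PAGEIOLATCH_SH": 121_500, "RESOURCE_SEMAPHORE": 29_000, "CXPACKET": 40_000}
    for wait_type, ms in wait_deltas(before, after):
        print(f"{wait_type}: +{ms} ms")
```

Sorting by the delta rather than the lifetime total shows what the server is waiting on right now, which is what makes it usable as an early warning.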


20 Comments

  • Say it isn’t so!!!!!
    Next thing you’ll say is I shouldn’t be looking at disk queue length and dividing it by the number of spindles in my RAID-5 array

  • I do wish “best practice” came with a BBE date.

  • Grzegorz ?yp
    June 17, 2020 8:08 am

    What should be used instead of PLE to spot a memory pressure problem?

  • “Page Life Expectancy Low” Finding is removed? Please no.

    I have inherited a lot of old apps and procedures from former SQL admins which I will never understand. So when I find a “Page Life Expectancy Low” issue I throw RAM at the server.

  • Jeffrey Mergler
    June 17, 2020 9:35 am

    No complaints here, I didn’t even notice PLE was in BlitzFirst and there are plenty of scripts which give you PLE.

  • Vince Thomas
    June 17, 2020 11:01 am

    I agree. When we switched over to very fast SSD drives a couple years ago, I stopped worrying about PLE.

  • I remember, back in the day, it was all “buffer cache hit ratio is great”, then “no, BCHR is garbage – use PLE – it’s great”. Now “PLE is garbage, use wait stats”. Am I starting to sound old???

  • I take the point about PLE being a lagging indicator but so are wait stats aren’t they?

    We have PLE checks but they look at the direction of travel not the values. If it is going up or flat with ‘high enough’ value we don’t generate alerts. If it is flat with a low value or falling then we need to take a look.

  • The good guys of Solarwinds should read this. 🙂

    • Well, don’t blame monitoring tools: when I worked for Quest, we had that same discussion, but the product management team said, “We can’t remove that metric. Too many DBAs *THINK* it’s a useful metric, and if we don’t show it, they’ll think our tool sucks.” So I get why the vendors have to put it on there.

  • Haha yeah I don’t 🙂 Still love Solarwinds!

  • Heyo Brent, ironically timed article because I was just asking about this on StackExchange two weeks ago. One of the first things I looked into were wait stats for any indication of memory pressure. I noticed my top 2 wait stats were: “MEMORY_ALLOCATION_EXT” and “RESERVED_MEMORY_ALLOCATION_EXT”. Are these any indication of memory pressure, or should I be looking for other specific wait stats? (These two were my highest wait types by an order of magnitude more than the next highest.) Thanks.
