Office Hours Speed Round, Text Edition

Got questions for me? Post ’em at https://pollgab.com/room/brento and upvote the ones you’d like to see me cover. I filter out the ones that are too short for video answers, and here’s the latest batch:

Corrupted: Should DBCC CheckDB be run on secondary replicas in AG as well ? Is it recommended to attach corrupted database to Prod server, to check how your integrity check job will react ?

You should run it anywhere you want to fail over to. Read this.

Khaled Budajaja: Brent Ozar Office Hours is more informative then many hours spent on reading blogs and technical articles. Did you create this? I don’t recall any body else doing office hours. Many thanks Brent

Aww, thanks. It’s definitely not my idea though – it’s common for college professors to have open office hours.

David: Where can I find information about the master and msdb databases similar to what you present in Fundamentals of TempDB?

What are the pain points you’re having with those databases that you need to solve? Leave ’em in the comments.

Bill Hicks still lives: God created the Earth in 7 days. Do you think it’s possible to migrate from Azure SQL Database to CosmosDB or Table Storage in 7 days? Apart from the joke have you ever done that and what are the pros and cons? The goal is to have reports go faster.

CosmosDB or Table Storage for fast reports?!? That is not even wrong.

Unut: Is indexing bit fields like “IsActive” on large tables a futile effort when the large majority of records are active?

Generally, single-column indexes are less effective than multi-column indexes, regardless of the data type.

Khepri: Does FORCE ORDER query hint ever have any practical uses?

Yes, if you’re low on time, and it gets you across the finish line quickly, and you’re willing to live with the technical debt.

Not_a_DBA_but_I_play_one_on_tv: I restore a 600gb database backup to a test server every morning. Typically the restore takes 45 minutes to run. One day last week, it only took 15 minutes. Everything appeared to have run correctly, size didn’t change. What could cause that one restore to run quicker?

Less competition for shared hardware resources.

Unut: What types of issues do you like to use the SQL default trace for?

Check out the sp_Blitz source code.

James Adams: What are the powershell’ish things a senior SQL DBA should know?

I stopped teaching production database administration work years ago because I focus on development DBA work now. I’d start with DBAtools.io. I will note, though, that it’s interesting that a lot of these questions are still about production database administration. I just don’t wanna do that work anymore – I don’t find it enjoyable because it’s basically on-call break/fix stuff. Like, I could go the rest of my life without getting woken up because somebody ran a server out of drive space. It’s 2022. Quit treating drive space like it’s unobtainium.

sqldeo: Hi Brent Any suggestion on tempdb growth,not clearing the space even though developer dropping the temp table after doing ETL stuff SQl 2019, initials size same for all 8 files, auto growth enabled. db are aprox 5 TB in total size, snapshot iso. enable for 1 db. any suggestion?

Attend my Fundamentals of TempDB class.

T: Brent, currently we have SQL Server Failover Cluster with 2 nodes. We are planning to migrate it to Availability Group solution. Do we need to reinstall instances on both nodes or can we reuse it ?

Before going live on any HA/DR platform, you should test it repeatedly. That means you should build a new environment from scratch and test the bejeezus out of it before you go live.

Murad: What is the widest varchar column you would ever consider as a key for a new non-clustered index?

I wouldn’t set limits on that without knowing more about the business problem we’re facing. I can totally imagine scenarios where I’d index a couple of NVARCHAR(2000) columns, for example.

Milind: Hi Brent, how are you? How to do 2.5M+ records update without table lock? It is about account balance limit reset overnight daily. During midnight schedule, other processes are getting hampered due to table lock. How to do such operation without impact and possibly quick? Thxs

Read this series from Michael J. Swart.

Mekhit: Is there a recommended way to identify the hot (or not) columns for a given NC index?

There’s no instrumentation in SQL Server for that. You’ll be forced to use common sense. I know, terrifying, right?

Sothis: What are the signs that a given operator’s estimated number of rows was a hard coded estimate provided by the engine?

Take the estimated number of rows, divide that by the number of rows in the table, and look for suspiciously round numbers that don’t change regardless of the predicates you’re searching for.

Sothis: Is there a good way to determine the appx tipping point (row count) for index seek + key lookup vs index scan operation?

No.

Livnat: Any tips for tuning for “PARTITION BY” performance in TSQL?

Read these.

Waldemar: Do you have any query tuning / index tuning tips for linked server queries?

Don’t run them. Connect to the server that has the data you want. If you want to get something done quickly, you don’t turn to someone else and say, “Hey, can you ask that person over there to do this for me?” It’s just idiotic.

Nephthys: When passing multiple args to a stored procedure via a single delimited string and using STRING_SPLIT, is there a max number of args you recommend to not exceed for performance reasons? 

There are even gotchas with just one row. Read this.

Right Said Fred: What are the top 3 DB communities in order?

That’s a weird question – can you rephrase it for the kind of thing you’re looking for out of a community? I’m not sure if you mean places where general database people hang out, or where SQL Server people hang out, or where questions get answered, or where presentations happen, or what.

SQL Crooner: Can we safely ignore the page life expectancy alerts from Idera SQL DM?

I cover that in detail in my Mastering Server Tuning class. I wish I could do justice to it quickly, but that’s why I have training classes – some topics require deeper discussions and details.

Jose: What is your opinion of the various commercial apps that re-write your queries a hundred different ways to find the optimal query syntax?

Really cool, but usually really expensive.

Nephthys: Does clustered index fragmentation matter for INSERT performance?

Run an experiment with your particular table’s structure, your nonclustered index structures, and your typical insert statement to find out.

CakeAndEatItToo: When running a query through the Actual Execution Plan, is it more important to be looking at time taken or cost%?

Neither. Time to head to my free How to Think Like the Engine class.

Alberto: Hi Brent, I read that default isolation level for SQL Azure DB is Read Committed Snapshot. In your experience, what should we pay attention to when migrating from an on-prem with Read Committed isolation ?

Start here.

DBGeek: Like you, I use a Macbook Pro as my primary work computer. I use a virtual machine where I install VPN clients etc. But SSMS sometimes behaves really strangely on the Macbook’s screen resolution. What is your solution/workaround to handle high DPI settings?

In your hypervisor, change the Retina settings. Whatever you’ve got it set to, use the opposite one. Then, in Windows, don’t use the HighDPI settings at all.

Reshep: What are the top conventions used by your clients when creating different availability groups (business unit, region, etc)?

Business units and RPO/RTO.

Heryshaf: How do you know when insert contention is happening and what are the common ways of dealing with it?

Review your server’s top wait types. For common ways of dealing with it, check out my training classes.

Pietro: What naming convention do you recommend for clustered and non-clustered indexes? We currently use IX_TableName_Co1_Col2_Inc everywhere.

When you’re looking at an execution plan, why waste valuable pixels on IX? You know it’s an index. Why waste space on the table name? The table name’s already shown to the left of the index name. Just use the columns.

Ptahhotep: Do you have any kind of tab organization strategy in SSMS when many tabs are open?

Yes, save files and close them. The kinds of psychopaths who leave multiple SSMS tabs open are the same kinds of nut jobs who leave hundreds of browser tabs open instead of using bookmarks. They should be hunted down and eliminated before they harm others.

Tenenit: What data profiling queries do you like to run against client tables when identifying new index key candidates?

I don’t. The contents of the data don’t matter – the contents of the query do. For the mechanics of how that works, and why you should never select a table’s data to determine selectivity, check out my Fundamentals of Index Tuning class.

OMC: Is it possible to optimize a SQL table both for high INSERTS and SELECTS or must you pick one or the other?

That’s a great question, and it’s a sign that you’re ready for my Mastering Index Tuning class.

Chris May: How does the optimiser know which set of statistics to use for a given column if there are multiple statistics available? e.g. default system statistics, statistics as part of an index, manual created/partitioned statistics etc

I’ve heard someone say it’s the most recently updated one, but I have no idea if that’s true. I was this close to recording a video of me testing that hypothesis, but I ran out of time before heading down to Cabo for vacation.

Len: What are the pros / con’s of creating a clustered index on an identity int field vs a datetime field based upon GETDATE() ?

The datetime column will take up more space due to its data type size and the fact that SQL Server will have to add a tie-breaking uniqueifier behind the scenes. The clustering keys get added to all of the nonclustered indexes, too, so more space there as well.

Waldemar: What are the top areas of specialized consulting for SQL Server?

Watch this. I explain the different kinds of specialized consulting, what they do, and how to get into those careers. I know it’s kinda long, like 43 minutes, but if you don’t have 43 minutes to learn, well, we don’t have to worry about you getting a career as a specialist, do we? 😉

Horus: What is the killer feature Microsoft should add to SQL table partitioning that would lead to more wide scale adoption?

Why does it need to be adopted more widely? That feature is a pain in the butt. Instead, I wish they could build automatic sharding, automatically splitting data across multiple database servers for faster queries. That’s kind of a pipe dream for relational databases though.

Previous Post
Estimated and Actual Plans Can Have Different Shapes.
Next Post
[Video] Office Hours: Speed Round Edition

20 Comments. Leave new

  • “Ptahhotep: Do you have any kind of tab organization strategy in SSMS when many tabs are open?

    Yes, save files and close them. The kinds of psychopaths who leave multiple SSMS tabs open are the same kinds of nut jobs who leave hundreds of browser tabs open instead of using bookmarks. They should be hunted down and eliminated before they harm others.”

    But it is so much fun when you click the close button for a tab that is BEHIND the active tab, get the “Save Changes” prompt, click Yes anyway and then realize that all that code you have been working on for hours just vanished.

    Reply
  • Regarding that question on int vs datetime for a clustered index key. In my limited experience (about 2 years) as a dba, I’ve noticed that a numeric data type such as int, bigint, etc. are typically best for keys for performance. We had a table that we discovered was being inserted into quite often, and it turned out the primary key was a column with a unique identifier data type. Based on my understanding, the pages have to be shuffled around each time a new row is inserted, in order to remain ordered by that column. Long story short, performance wasn’t so great and there were a lot of rows so we modified it to use a bigint instead and that seemed to fix the issue. But that’s also an important point too that essentially you’re not just determining what gets used for the clustered indexes, but also any and all nonclustered indexes too, which many may not realize since it’s not usually explicitly defined.

    Reply
    • About the shuffling around of data – so does that mean you also stopped doing indexes on columns like customer name or location?

      Reply
      • I mean, we do have non clustered indexes for data like that. I figured that’s better than clustered indexes, but again, I am a noob.

        Reply
        • Why do you think that? Just curious.

          Reply
          • My thought is because the data in the nonclustered index is limited to what’s specified in the order by and include and the primary key (if one exists), rather than the entire table’s rows. Less data being moved around.

          • Interesting. So you have less nonclustered indexes than clustered indexes, too?

          • I feel like there’s a gotcha coming up but I’ll keep it going lol. Yeah I believe so.

          • If you have less nonclustered indexes than clustered indexes, that’s less than one index per table? I’ll take a pause on the discussion here then. When you get a chance, check out the How to Think Like the Engine class. I don’t think the data type on your clustered index is the real problem. Hope that helps!

  • My etl team uses Informatica and tidal to run and schedule loads nightly. Almost every load uses a temp table some more than one and they create a key before loading data and they do lots of merges. What is best practice for etl?

    Reply
    • The best practice is to read at least the first paragraph of each blog post I write, like the one above that you’re commenting on.

      I’m not asking you to read the *entire* post – I know that would be a really large amount of work, completely unfair for me to ask of you. Instead, I’m just asking you to read the first paragraph.

      I know – it seems hard, perhaps more work than you’ve done this entire day so far at work. Hang in there, though. I believe in you. You’ve got this!

      Reply
  • Sir Logs A Lot
    August 10, 2022 11:41 pm

    Zing! That’s going to leave a mark ?

    Reply
  • Brent, regarding the pipe dream you mentioned of splitting data across multiple database servers for faster queries, what do you think of ACID-compliant, distributed relational databases? CockroachDB seems promising, but I’ve never had to set up or maintain anything like that. Do any of them live up to the hype?

    Reply
    • Every morning, I wake up with the hope that today’s going to be the day that everyone who comments on one of my blog posts will at least read the first paragraph of that post, the one up at the top of the page, right underneath the title.

      Maybe I’m naive. Maybe I’m a sucker. But I am filled with the idealistic possibility that some day, some fine day, it will happen.

      It’s 7:17AM today my time, and you have already dashed my hopes for today, Alex Pixley. For the rest of today, I want you to walk around with the burden of knowing that you, Alex, were the one who broke my heart today. We couldn’t even make it to 8AM.

      Reply
  • I’m not asking a question, just making a comment. I read the first paragraph. 😉
    Just perplexed by the fact that you abhor linked server queries and call them “idiotic” but wish that SQL Server supported database sharding across multiple servers “for faster queries.” At a core level, I’m thinking a query across shards is effectively a linked server query. Lots of hand-waving involved, of course. But I’ve never used a sharded DB, much less looked at the implementation details of one.

    Reply
  • Comment_not_question_dontshoot
    August 17, 2022 8:44 pm

    Sharding accessible via TSQL is available in the product Azure Synapse

    Reply

Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.