Office Hours: Dodging Work Edition

Videos
8 Comments

Post your questions at https://pollgab.com/room/brento and upvote the ones you’d like to see me cover. Today, I’m dodging work, so I went through your questions while I waited for the coffee shop to open:

Here’s what we covered:

  • 00:00 Introductions
  • 03:28 Ivan: What are the top signs that a table has a poor clustered index?
  • 05:39 NotCloseEnough2RetirementToStopLearning: Hi Brent, it seems the job market wants and is paying more for data generalists (wider but less deep skillset) than data specialists (DBA, deep but less wide skillset). This seems to be limiting salaries and the number of Senior positions. Any thoughts or comments?
  • 07:52 Gennady: When should you use a table per year of data (and union them together in a view) vs using formal sql table partitioning?
  • 12:32 Mr. Ed: How would I determine if my IT team needs a DBA?
  • 14:28 Tom: Can you recommend a live T-SQL class?
  • 15:23 Jorriss: If you were a woodchuck, would you prefer brick and mortar structures?
  • 17:18 J.P.: Why should I learn noSQL if I know SQL? Is it only because of the speed?
  • 19:06 Rob: We have a 4.8 tb SharePoint database that is causing a lot of issues like backups due to the file size. Has anyone run across this issue?
Previous Post
[Video] Fundamentals of Stored Procedures at SQLBits
Next Post
Office Hours Speed Round: Text Edition

8 Comments. Leave new

  • patrick adams
    July 28, 2022 4:58 pm

    Question… Are there any recommendations for a replacement (tool) to the Microsoft Assessment and Planning Toolkit? I am about to start migrating databases off aging (2008 R2) sql server instances and servers. I did an inventory with MAP Toolkit and noticed that a 2019 server/instance were NOT in it.
    Turns out something in the new O/S or Sql Server edition dont allow the MAP TK, to do the nice work it did on these newer versions.
    Any suggestions would be welcome and appreciated. Thanks Pat

    Reply
  • Hi Brent,

    I really like this one because you touched on some of my favorite subjects… some of what Clustered Indexes really do for you and some of the aspects that I love about Partitioned Views over Partitioned tables.

    Thanks for what you do and thanks for sharing.

    Reply
  • Two consecutive raining days in Vegas? I suggest you buy a lawn mower! 😀

    Reply
  • John Barrow
    July 30, 2022 3:42 pm

    Hi Brent,

    Regarding the uses for CLUSTERED indexes you touched on (key lookup and scan), suppose I had the following two table definitions

    CREATE TABLE parent (
    id int IDENTITY (1,1) NOT NULL,
    name varchar(255) NOT NULL
    )

    CREATE TABLE child (
    auditId int IDENTITY (1,1) NOT NULL,
    toParent int NOT NULL,
    posted datetimeoffset NOT NULL
    CONSTRAINT dfPosted DEFAULT SYSDATETIMEOFFSET(),
    description varchar(255) NOT NULL,
    CONSTRAINT pkAuditId PRIMARY KEY NONCLUSTERED (auditId),
    CONSTRAINT fuToParent UNIQUE CLUSTERED (toParent, auditId)
    )

    And the only query requirement is that I want to optimise access to query all the children for a single specified parent. My thought was that if I use the clustered index to group all the children physically together on the disk for each parent then the query would perform faster with the resultant key lookups than if they were randomly spread over the table. The children are only ever created and never deleted. Now I realise that that there is an additional cost in forcing the clustered index to organise the records in this way but once created (once by an overnight process) then there would be a payoff over time as ‘create once, read thousands’. This idea gives great performance in an old fashioned network style database and was wondering whether SQL Server would benefit in the same way. I realise that I could use a covering index on the child (toParent, auditId, description, posted…) but as I want to retrieve most of the fields from the child table, this would create in effect a duplicate child table with inefficient long keys.

    Any thoughts of using CLUSTERED to collate children physically on the disk to aid future SELECT queries by minimising physical disk activity, assuming that the child table is too large to fully cache and the parent queries specify random ids (all values occur with equal probability)?

    Reply
    • Questions this detailed are beyond what I can do for free as part of Office Hours. You can click Consulting at the top of the site for personalized help, or for free advice on detailed questions, head to Q&A sites like https://dba.stackexchange.com. Hope that’s fair!

      Reply
      • Fair enough – thanks anyway, I was really just after a quick opinion or whether it is something that you might be interested in and was triggered by your answer to one of the questions in the Q&A video as I get the impression that you have an interest in the internal workings of MS-SQL Server.

        I have it on my list of interesting questions from my time developing databases (now retired) and will get around to investigating it at some point (just for my own curiosity) – hence no need for personalized help.

        In case you are interested 🙂

        I suspect that the cost of re-organising the physical rows each time a new row is added will be too detrimental to benefit any performance on querying and probably not detectable anyway. The situation I used to deal with had a small number of rows per parent (<1000) but 1 disk read covering the spread of all 1000 was obviously far faster than 1000 separate disk head movements. We used to run monthly 're-indexing' of the network database table to collate any new children.

        Reply

Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.