It’s Friday. I’m Not Answering These 14 Office Hours Questions.

Sometimes, I don’t care if questions get a lot of upvotes at https://pollgab.com/room/brento – I just don’t wanna answer ’em. They’re not necessarily bad questions, but I’m just not interested in them, or I’ve answered them repeatedly, or I don’t have a good answer.

Maybe y’all do, though, so I’ve numbered these. If you want to leave your answer in the comments, prefix it with the question number you’re answering.

  1. Neil: A friend has a table where the primary key is a guid and clustered index, which is leading to index fragmentation and slow inserts. Any advice on how to reduce the index fragmentation? The guid is created application side, and can’t use sequentialuids. Thanks.
  2. Aussie Greg: Hi Brent, Thought i’d take a splash and ask a question (*new pool humour) In management studio, how can I limit the database list to those a user has access to? I can see the DENY VIEW ANY DATABASE but don’t want to make users DBOs
  3. Heather: What is your opinion of SQL Test app (https://www.sqltest.org) vs SQL Query Stress app?
  4. Erzurum: What should Microsoft do if they want lure SQL Server on AWS users over to Azure?
  5. Arslan: What’s your opinion of entering confidential info in chat gpt? Will we see AI therapist chat bots?
  6. Jack: In SSMS, when viewing the index “Rebuild All” screen to see the index frag percentage, is that percentage “internal” or “external” fragmentation? Corollary question: what about the avg_frag columns from the sys.dm_db_index_physical_stats. Thank you for your time.
  7. The Kevin: I’m starting to look at “Stolen Memory” statistics, I took your classes years back and don’t recall this discussion. EG: I have an instance with about 20 databases and 50Gb of RAM. I see the bufferpool using 40GB and stolen Memory at 9GB. should I beg Paul White? <ducks>
  8. gotqn: In terms of security, is it OK to expose your database to tools like GitHub Copilot in Azure Data Studio? Someone will know that your email address column is not encrypted or a stored procedure is not parsing its input parameters when dynamic T-SQL is built.
  9. Dwayne: What is your opinion of the QUERY_OPTIMIZER_HOTFIXES = ON DB setting?
  10. Chuck: Does running a lengthy (days) D.W. export query using NOLOCK impact the transaction log less than without NOLOCK?
  11. Harrison: Sorry if this is too low-level- I am mostly wondering what you would look for on an internship application that would make a candidate stand out. (Types of projects, experience, etc)
  12. Venkat: What is the best way to get/print/log entire TSQL stack trace in a stored proc to identify the root sproc that initiated an sp invocation?
  13. Inquisitive App Developer: Why are wait types all in UPPPER_CASE?
  14. dbant: We have a large transaction table with 99% fragmentation that needs index rebuilding without facing any block and performance issues.There is a brief period when we do ONLINE rebuild where blocking will occur which we are not allowed. Are there any other way to accomplish this?
Previous Post
[Video] Office Hours: Github Copilot in Azure Data Studio
Next Post
How to Make Database Changes Without Breaking Everything

30 Comments. Leave new

  • Kendra Little
    June 23, 2023 1:31 pm

    Q5. Arslan: What’s your opinion of entering confidential info in chat gpt? Will we see AI therapist chat bots?

    A5: Don’t enter someone else’s confidential info into the ChatGPT website. Don’t enter your own confidential info into the ChatGPT website unless you want it to no longer be confidential. Note that the ChatGPT website is not the same as the Open AI service— it is possible to use the Open AI service under a data agreement that they will not use your prompts/data you enter for training, but you need to specifically arrange that. (So in other words, it is possible for applications to be built that can use personal or confidential data safely with these services.)

    AI Therapist chat bots already exist, and have for a while, actually! One that I really like is “woebot” – https://woebothealth.com/. It is a free app (at least for now, has been for a while though) that specializes in cognitive behavioral therapy style conversations.

    Reply
  • #2 – If the user is not sa, then the default behavior is that the user doesn’t see databases they don’t have access to. Is the user sa or part of any other role that is allowing them to see more databases than you want them to see?

    Reply
    • i_write_my_tsql_in_lower_case
      June 30, 2023 12:58 pm

      unfortunately the default is, that every user sees any database even if the user hasn’t access to them. To restrict the users view is to make them the database owner (not dbo) and set the DENY VIEW ANY DATABASE. That’s the only way.

      Reply
      • Wow i just tried it and you’re right – can’t do anything with the other databases, but sure can see them. Kinda surprised it’s designed that way.

        Reply
  • #1 – Jeff Moden has a lot of material on how to handle clustered index maintenance on GUIDs. Google that, or here’s a long video https://www.youtube.com/watch?v=jx-FuNp4fOA&ab_channel=KevinFeasel

    Reply
    • Ninja’d. Also found it after hearing Brent mention the black arts series in a recent office hours

      Reply
    • A shorter, more to the point video on the myth of Random Guid fragmentation heaped upon us by bad tests and supposed “Best Practice” Index Maintenance that isn’t and was never meant to be a best practice is available at the following video. Other issues like massively fragmenting ever-increasing indexes and destroying several myths around the use of REORGANIZE are included. Done correctly, you can actually go for months without Random GUID fragmentation even with very high daily insert rates.

      Here’s the link to the video. Watch out around the 15 minute marks… the added some sudden, loud ads that will make you want to throw your headset across the room.

      https://www.youtube.com/watch?v=rvZwMNJxqVo

      Reply
  • #14 – Does it really need to be rebuilt though? Is it having performance issues because of the fragmentation?

    Reply
  • #1 – I guess my comment got eaten because I posted a link. Just google Clustered GUID fragmentation by Jeff Moden. He has lots of stuff out there.

    Reply
  • Erin Stellato
    June 23, 2023 3:45 pm

    Q8. gotqn: In terms of security, is it OK to expose your database to tools like GitHub Copilot in Azure Data Studio? Someone will know that your email address column is not encrypted or a stored procedure is not parsing its input parameters when dynamic T-SQL is built.

    I recommend you review the privacy information for Copilot here (scroll down toward the bottom): https://github.com/features/copilot

    Note that currently, Copilot does not read code from your database, it relies on what you’ve copied into the editor. Further, it doesn’t retain that information. In the page referenced above, the last entry:

    Will my private code be shared with other users?
    No. We follow responsible practices in accordance with our Privacy Statement (https://docs.github.com/en/site-policy/privacy-policies/github-privacy-statement) to ensure that your code snippets will not be used as suggested code for other users of GitHub Copilot.

    Reply
  • 1. two ways to deal with it. The first: ignore the fragmentation, don’t rebuild your indexes, certainly don’t reorganize them, but do keep the statistics well updated. Eventually the app will reach a terminal level of fragmentation so there is enough space to insert new records or make expansive updates to variable width columns. The second: Rebuild the index to 70 or 80% fill factor then rebuild the index at 1% fragmentation. The low fill factor provides enough space for the randomly ordered guids to be inserted without splits. Look up jeff moden index dna on youtube for this strategy.

    9. generally use them unless they create a problem. Most queries will be faster and more efficient with them, a small number will be slower.

    10. No. nolock is only transaction isolation.

    14. you can rebuild each index, one at a time. schema locks are required even for an online index rebuild. Keep in mind online index rebuilds are much slower than offline rebuilds. If you can arrange a window, it could be easier to do offline rebuilds quickly than trying to do online rebuilds. you may also include the parameters wait_at_low_priority, max_duration and resumable so that they wait until the SQL server is not doing anything else until the window is over and then if it doesn’t complete, resume the index rebuild the next day.

    I would also look at page density of each index and see if the 99% fragmentation is even a problem. If your page density is still at least nominal, it may not be

    Reply
  • 1. I imagine that not having the GUID as the PK would avoid all the data moving in the clustered index so I would just create a sequence on the database side (app doesn’t need to know about it) and create a non-clustered unique index to avoid duplicate GUIDs that should be much faster to insert and reorganize

    PS: I’m just a curious developer that reads a lot and likes to try new stuff, so please be nice if my answer isn’t that good.

    Reply
  • 13: It just is easier that way.

    Reply
  • 5.) Don’t. Unless you have read, and fully understand, all of the terms and conditions of usage. I personally avoid entering any sensitive information or proprietary code, as I am sure of the legal implications of doing such things, nor do I know how secure their environment is.

    Reply
  • 4. They should TRY to match the same price/performance that AWS offers.

    5. At least while there’s not a private ChatGPT option (that will be paid for sure) it should be avoided at all costs, every input becomes part of its “library” and we have seen Amazon source code leaked there used as examples for answers. In the current state of “AI” it’s just a better Google, something that is capable of searching a huge amount of ingested data and use context between questions to keep answering or improving answers, IMHO it’s not capable of creating anything new from scracth so it’s not an Intelligence per se.

    13. If you ever programmed would you name CONSTANTS or ENUMS in any other way?

    14. Can you think of a table/index partition that would make sense? If the table is large enough for partitions to make sense, can you think of a partition scheme that would generate less fragmentation? Let’s say you use random GUIDs as PK but can separate transactions per date or something that would keep the older data organized just once and intact, being able to rebuild just the last partition when needed… I’m just trying to think out of the box ok?

    Reply
  • Q. What should Microsoft do if they want lure SQL Server on AWS users over to Azure?

    A. Microsoft should purchase AWS or Amazon.

    Reply
  • #3. I haven’t used SQLTest and there aren’t any dates in the change log. However, given the .Net 4.0 vs .Net 6.0 for SQLQueryStress I would be inclined to stick with SQLQueryStress. Since you are looking at blog posts here, also note that Brent has blogged about using SQLQueryStress (https://www.brentozar.com/?s=SQLQueryStress).

    Cheers

    Reply
  • I missed number 4.

    MS could make even the vaguest attempt at improving the quality of their support. They could stop introducing cloud products that they will abandon shortly thereafter, leaving entirely broken functionality in place after an organization has invested in it. They could improve the quality of their documentation. Not only is documentation not uncommonly entirely missing, when it is there, its not unheard of to have information in it that is entirely wrong. They could make their offerings directly price competitive instead of their current model of charging more and attempting to impose value with bundling and horrifically complicated purchase options that are necessary even at the SMB level to make it cost effective.

    Reply
  • Michael J Swart
    June 23, 2023 6:29 pm

    Question 6. Jack, you ask:

    > In SSMS, when viewing the index “Rebuild All” screen to see the index frag percentage, is that percentage “internal” or “external” fragmentation? Corollary question: what about the avg_frag columns from the sys.dm_db_index_physical_stats. Thank you for your time.

    For other folks, Jack is referring to the dialog you get to when in SSMS’s Object explorer, you right-click on a table’s indexes folder and select Rebuild All.

    Jack, that screen describes fragmentation with this:
    > You can analyze index fragmentation using Total Fragmentation. Total Fragmentation indicates the percentage of logical fragmentation of the index. For more information see Reorganize and Rebuilding Indexes in Books Online.

    Books Online is the old way of describing SQL Server’s online documentation. But you can get there directly from that dialog by pressing the “?” button.
    Or visit here: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver16

    That page defines logical fragmentation as:
    > out-of-order pages in the index

    The question you asked is whether this internal or external?
    Brent characterizes internal and external fragmentation at https://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/ :
    > Problem #1 – Internal Fragmentation: We’ve got a newly added page with hardly any stuff on it.
    > Problem #2 – External Fragmentation: The phone book pages are out of order.

    So to answer your question, external.

    You had a follow up question. What about sys.dm_db_index_physical_stats?
    https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-physical-stats-transact-sql?view=sql-server-ver16
    It’s described as “logical” fragmentation which is Brent’s “external”

    The docs are pretty clear and these answers didn’t take any work. I don’t know what you’re going to do with that info. I can see why this question made the list of questions Brent didn’t want to answer.

    Reply
  • Keith, you seem upset about documentation for some reason. Microsoft has a way for you to let them which articles are incorrect or misleading, at the bottom of every single article, using the Feedback option. Give it a try.

    Reply
    • what happened to the documentation
      June 26, 2023 6:26 pm

      The “for some reason” phrase implies Keith is wrong to be upset about the documentation being incorrect or misleading.

      Microsoft shouldn’t expect users to know better and to let them know their own documentation is wrong.

      The documentation used to be accurate and reliable in the good ‘ol days. Now it isn’t so much. It’s frustrating. Not just for Keith.

      Only Microsoft would know the source of what’s causing users to be upset “for some reason” at the documentation being inaccurate (e.g. – why isn’t the documentation as accurate these days as it used to be – what changed?)

      Reply
      • I have mixed feelings about this because three things can be true at the same time.

        • Microsoft can be improving the documentation, AND
        • The documentation can still be insufficient, AND
        • The onus can still be on Microsoft to improve their multi-million-dollar product’s documentation

        It really does feel like MS is putting in more effort, as demonstrated by the recent post about performance troubleshooting documentation. I wanted to write a blog post applauding those efforts, but … then I opened up the individual pages, and I was kinda horrified. For example, is this really how we expect users to troubleshoot slow queries in 2023? There’s no methodology here.

        Do I want to submit pull requests to write the documentation on how to troubleshoot slow queries? No – and if I did, I’d expect to get paid for that, because it’s a Herculean amount of work.

        Reply
  • Don't Bother Asking
    June 25, 2023 10:58 pm

    1. Try making the guid a nonclustered PK, and set some other more sequential column(s), even an identity column, as the clustered index. You’d probably also need to create a nonclustered index on the guid column, to compensate for any foreign key lookups.

    Reply
  • Thomas Franz
    June 26, 2023 7:52 am

    #14:
    – restore a copy of the database somewhere
    – create a empty copy of your table in your origin database (e.g. with suffix _new), consider using partitions, optimized indexes etc. on it
    – fill it with the rows in the restored db, usually it should be defragmented automatical (particularly when you create the nonclustered indexes after copying it)
    – get a short maintenance window
    – rename the origin table to _old
    – either copy all the changes to the new one and rename the _new (remove the suffix, so that it has now the origin name)
    – when you are doing log backups and putting the restored database into standby mode (instead of RESTORE WITH RECOVERY) you could apply more log backups to the restored DB after doing the long duration first copy job

    Or (much more to do)
    – select the max(id) or last_change_date (or whatever you can use to identify new records) from the _old and the _new table
    – create a view (with the name of the origin table) that selects all rows with id > @old_id from _new table (this are rows that will be added after your maintenance window)
    – and union all rows with id between @new_id and @old_id from _old (ids between the restore / copy and the start of the maintenance window)
    – and union all rows with id < @new_id from _new (ids copied from the restored DB)
    – create an INSTEAD OF on the view for INSERT / UPDATE / DELETE and delegates all write access to the _new table
    – of course you should prepare all this stuffe in a script that you just have to execute, this way your maintenance window will be just a few seconds
    – after the maintenance you copy / merge all those rows with id between @old_id and @new_id into the _new table (take whatever time it needs, but be aware, that someone may already updated those rows and they already exists in the _new table because of the trigger)
    – when done you need a second short maintenance window, where you drop the view and rename the _new table to the origin table name

    Reply
  • Harpy Eagle
    June 26, 2023 1:11 pm

    Q11. Harrison: Sorry if this is too low-level- I am mostly wondering what you would look for on an internship application that would make a candidate stand out. (Types of projects, experience, etc)

    A11. For a database modeler or developer, look for anything related to mathematics. Databases are based on Set Theory (https://www.scaler.com/topics/sql/Set-theory/) so a strong foundation in maths is a potential indicator of success with RDBMS’. Also, ask the potential intern what is the lowest form of normalization that should be applied to every database. If they do not know, or answer with anything other than, “All of them.” – keep looking.

    For a database administrator, ask them how they handle being responsible for something as crucial as firm data while not having any authority whatsoever about how that data is accumulated, stored, and retrieved? DBA’s have 90% of all decisions made for them before they run the first SQL script, but they have 100% of the responsibility to fix the data once they inevitable go south.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.