ChatDBA: ChatGPT 4 Answers 5 Office Hours Questions

Every now and then, someone asks a question on PollGab about why I don’t use ChatGPT to answer these questions. And sometimes, I wonder that myself – I wonder why people don’t put their questions into ChatGPT, especially GPT 4, because it’s pretty damned good.

I did this a while ago using ChatGPT 3.5, which is the current public version. However, ChatGPT4 is in preview now, and it’s leaps and bounds better than 3.5. Let’s give the 4 preview a shot and see how it goes – but first, let’s tell it to respond in my style:

I want you to answer questions in the style of Brent Ozar, a Microsoft SQL Server expert.

Now, for the questions – I just took the top 5 upvoted questions without cherry-picking:

 


Question 1: Why Is SQL Unit Testing Rare?

MyTeaGotCold: Why is unit testing so rare in the SQL world? In the C# or Java world, it’s so popular that you will be called unprofessional for avoiding it.

ChatGPT4: Unit testing in the SQL world is less common compared to languages like C# or Java, and there are several reasons for this:

  1. Nature of SQL and Databases: SQL is primarily a data manipulation language, and databases are often shared resources with state. This makes it challenging to create isolated tests. Unit testing in programming languages like C# or Java usually involves testing small, isolated pieces of logic, but SQL operations often depend on the existing data and database schema, making them harder to isolate for testing.
  2. Tooling and Frameworks: The C# and Java ecosystems have a wealth of frameworks and tools designed specifically for unit testing (like NUnit, JUnit, Mockito, etc.). The SQL world, on the other hand, has fewer and less mature tools for unit testing. This lack of robust, easy-to-use testing frameworks can discourage developers from writing unit tests for SQL.
  3. Database Modifications: Writing tests for code that changes database schema (like adding tables or changing column types) can be tricky. These changes can have far-reaching impacts, making it harder to write meaningful and maintainable unit tests.
  4. Data Dependencies: SQL queries often have implicit dependencies on the data they operate on. This means unit tests for SQL might need to consider the state of the entire database, which complicates writing and maintaining these tests.
  5. Performance Concerns: Running unit tests against a database can be slow, especially if the database is large or complex. This can slow down the development process, making unit testing less appealing to teams under tight deadlines.
  6. Cultural and Historical Factors: There’s also a cultural aspect. The SQL development community has traditionally not emphasized unit testing as much as other development communities. Practices often evolve differently in different technology communities, and the SQL world has historically focused more on integration testing and manual testing.

However, this doesn’t mean that unit testing isn’t valuable or possible in SQL. There are tools and practices emerging that facilitate better testing practices in the database world, like tSQLt for SQL Server, but they’re not yet as widespread or ingrained in the culture as in other programming communities. As database development practices evolve and the importance of data integrity and reliability becomes even more prominent, we might see a shift towards more rigorous testing practices in the SQL world.

Brent says: that’s way, way more detailed than I’d ever be, but it’s beautifully bang-on.

 


Question 2: How to Think Like MySQL and Postgres

DBA with Anxiety: hello brent!, Thank you for the video lesson “how to think like an engine” that you have made. the lesson is great!. do you have something similar but for MySQL or PostgreSQL? Or are there any resources regarding this that you recommend?

ChatGPT4: Hello! I’m glad to hear you enjoyed Brent Ozar’s “How to Think Like the SQL Server Engine” lesson. While I’m not Brent Ozar himself, I can certainly help guide you towards similar resources for MySQL and PostgreSQL.

As of my last update in April 2023, Brent Ozar primarily focuses on Microsoft SQL Server, so he might not have created a similar course specifically for MySQL or PostgreSQL. However, there are other resources and experts in the MySQL and PostgreSQL communities that offer similar insights:

  1. Official Documentation: Both MySQL and PostgreSQL have extensive official documentation that includes insights into how the database engines work. While not a course, per se, these documents offer a deep dive into the engines’ internals.
  2. Books and eBooks: There are several comprehensive books on MySQL and PostgreSQL that cover their internals and performance optimization. Titles like “High Performance MySQL” and “PostgreSQL Up and Running” can be very insightful.
  3. Online Courses: Websites like Udemy, Coursera, and Pluralsight offer courses on MySQL and PostgreSQL. While they might not have a course identical to “How to Think Like the SQL Server Engine,” they do have courses that cover database internals and performance tuning.
  4. Community Blogs and Articles: The MySQL and PostgreSQL communities have a wealth of experienced professionals who write blogs and articles. Websites like Percona’s blog (for MySQL) and several PostgreSQL community blogs offer deep insights into how the databases work.
  5. Forums and Q&A Sites: Platforms like Stack Overflow, DBA Stack Exchange, and specific community forums for MySQL and PostgreSQL are excellent places to learn from experienced database administrators and developers.
  6. Webinars and Online Talks: Look out for webinars and online talks by database experts. These often cover specific aspects of database engines and can provide insights similar to Brent Ozar’s lessons.
  7. YouTube Tutorials: There are many knowledgeable database professionals who share their expertise on YouTube, offering tutorials and explanations on both MySQL and PostgreSQL internals.

Remember, while the fundamental principles of how relational databases work are similar across SQL Server, MySQL, and PostgreSQL, each has its unique characteristics and optimizations. Exploring a range of resources will give you a well-rounded understanding of how to think like the engine for each of these databases.

Brent says: this answer illustrates one of the problems I have with ChatGPT. It’s a whole hell of a lot of words, but it’s not really prioritized in any way, and it’s not really useful. Let’s try rephrasing the question, but demand a more concise, prioritized answer, and focus on just a single database:

 

Brent: What are the top 5 resources to learn the internals of how PostgreSQL processes queries?

Learning the internals of how PostgreSQL processes queries is crucial for database administrators, developers, and anyone interested in this powerful open-source database system. Here are the top five resources that can help you gain a comprehensive understanding of PostgreSQL’s query processing:

  1. Official PostgreSQL Documentation:
    • The official PostgreSQL documentation is an invaluable resource for understanding its internals. It offers detailed information on PostgreSQL architecture, query processing, and optimization. The documentation is meticulously maintained and reflects the latest version of PostgreSQL.
    • Link: PostgreSQL Documentation
  2. “PostgreSQL 9.0 High Performance” by Gregory Smith:
    • This book, although focusing on an older version of PostgreSQL, provides deep insights into the performance aspects of PostgreSQL. It covers topics like query optimization, indexing, and hardware considerations. While some information may be dated, the foundational concepts and performance tuning principles are still relevant.
    • Available at most book retailers and online platforms like Amazon.
  3. Postgres Weekly:
    • Postgres Weekly is a newsletter that curates articles, news, and resources related to PostgreSQL. It often includes links to in-depth articles and blog posts about PostgreSQL internals and query optimization.
    • Link: Postgres Weekly
  4. PgCon Conference Talks and Videos:
    • PgCon is an annual conference for PostgreSQL users and developers. It offers talks and workshops from some of the leading experts in the PostgreSQL community. Many of the talks focus on the internals of PostgreSQL, including query processing. Videos from past conferences are usually available online.
    • Link: PgCon Archives
  5. “The Internals of PostgreSQL” Online Resource:
    • This online resource provides a detailed look into the internal structures and mechanisms of PostgreSQL. It’s an excellent resource for understanding the core functionality, including query processing and execution plans.
    • Link: The Internals of PostgreSQL

These resources provide a blend of official documentation, expert-written books, community-driven content, and conference materials, catering to a wide range of learning preferences. Whether you’re new to PostgreSQL or an experienced user, these resources will deepen your understanding of how PostgreSQL processes queries and manages data.

Brent says: okay, that’s WAY better. It’s a reminder that prompt engineering is going to be a thing for the next several years: ask better questions, and you’ll get better answers from AI.

 


Question 3: Two SSMSs, One Cup Computer

Stu: Hi Brent, I often use two SSMS apps open (using two screens) and have an alarming issue where they interfere with each other, sometimes exec’ing a query in one will also exec what’s in the current window on the other! I wonder if you’ve ever seen this? I can find nothing on it!

ChatGPT4: Hello! While I’m not Brent Ozar, I can certainly help with your SQL Server Management Studio (SSMS) issue.

The behavior you’re experiencing with SSMS, where executing a query in one instance seems to also execute a query in another, is indeed unusual and not a typical feature or bug that’s widely reported. Here are a few steps you can take to troubleshoot and potentially resolve this issue:

  1. Check for Keyboard/Mouse Software: Sometimes, software for advanced keyboards or mice can have features that synchronize actions across different windows. Make sure that no such feature is accidentally enabled.
  2. Update SSMS: Ensure that you are using the latest version of SSMS. Sometimes, unusual behaviors are due to bugs that are fixed in later releases.
  3. Inspect Extensions or Add-ins: If you have any extensions or add-ins installed in SSMS, try disabling them to see if the issue persists. An extension might be causing unexpected behavior.
  4. Check Grouped Windows Settings: In some operating systems, windows from the same application can be grouped, and actions may inadvertently affect all grouped windows. Check your OS settings to ensure that this is not the case.
  5. Isolate the Issue: Try to replicate the problem with different queries and in different databases. See if the issue is consistent or occurs only under specific conditions.
  6. Examine Keyboard Shortcuts: It’s a long shot, but check if there are any global keyboard shortcuts that might be triggering executions in both instances.
  7. Hardware Check: Although unlikely, there could be a hardware issue with your keyboard or mouse. Try using different input devices to see if the problem persists.
  8. Check for System-wide Automation Scripts: If you’re using any macro or automation scripts that run across the system, ensure they aren’t interfering with SSMS.
  9. Use Different User Profiles: Try running each SSMS instance under a different Windows user profile. This can help isolate the issue to a specific user configuration.
  10. Seek Community Assistance: Post the issue in SQL Server communities, such as Stack Overflow, SQL Server Central, or the Microsoft SQL Server forum. Sometimes, community members might have encountered similar issues and could offer a solution.

If none of these steps resolve the issue, consider reaching out to Microsoft Support or a professional IT service. It’s a peculiar issue that might require a more in-depth technical investigation.

Brent: the first 6 answers were good. It should have stopped talking after that.

 


Question 4: Big Table Scans

Aditya: Are there any negative consequences to clustered index scan operators where the clustered index is multi-terabyte in size but the onprem sql server only has 512gb RAM?

ChatGPT4: Dealing with a multi-terabyte clustered index scan on a SQL Server with only 512GB of RAM can indeed have several negative consequences, primarily related to performance and resource utilization. Here’s a breakdown of the potential issues:

  1. Performance Impact:
    • Slow Query Execution: Scanning a multi-terabyte clustered index is a data-intensive operation. With only 512GB of RAM, SQL Server cannot cache the entire index, leading to a lot of physical I/O operations (disk reads). This can dramatically slow down query execution.
    • CPU Overhead: Large index scans also put a significant load on the CPU, especially if the query involves sorting, filtering, or aggregating the data.
  2. Resource Contention:
    • I/O Bottlenecks: When SQL Server has to read a large amount of data from disk, it can create I/O bottlenecks, affecting not just the query in question but also other operations on the server.
    • Memory Pressure: The SQL Server Buffer Pool will be under pressure to manage the data pages from the large index. This can lead to page evictions, impacting the performance of other queries as they may need to read data from disk that was previously in memory.
  3. Cache Invalidation:
    • A large index scan can flush out a significant portion of the data and index pages from the cache (Buffer Pool), which are relevant for other queries. This leads to a scenario where even simple queries might face performance degradation due to cache invalidation.
  4. Blocking and Locking Issues:
    • Depending on the isolation level and the nature of the transaction, a large index scan can hold locks for extended periods, potentially leading to blocking issues in a multi-user environment.
  5. Impact on Maintenance Operations:
    • Maintenance tasks like index rebuilds or statistics updates on a multi-terabyte index can be very time-consuming and resource-intensive, potentially impacting the availability and performance of the database during these operations.
  6. Thermal and Power Considerations:
    • In a physical server environment, intensive disk I/O and CPU usage can increase power consumption and heat production, which might be a consideration in some data center environments.

To mitigate these issues, you might consider the following strategies:

  • Query Optimization: Review and optimize the queries to avoid full index scans when possible. Using more selective filters, proper indexing strategies (like non-clustered indexes), and query hints can help.
  • Partitioning: Consider table partitioning to break the large table into smaller, more manageable pieces. This can make scans and maintenance operations more efficient.
  • Hardware Upgrades: If feasible, increasing the RAM could help, but for multi-terabyte data, it’s also crucial to have fast storage (like SSDs) to mitigate the I/O bottleneck.
  • Archival Strategies: If applicable, archiving old data to reduce the size of the actively queried dataset can be beneficial.

Dealing with such large datasets often requires a combination of query optimization, proper hardware, and strategic database design to ensure efficient operations and acceptable performance.

Brent says: everything in that answer is great except partitioning. If you’re doing clustered index scans, partitioning is unlikely to help the speed of the query any more than changing the clustering key design, and I didn’t see ChatGPT recommending that. But really, that’s a small quibble – this is a long, detailed answer that’s really good!

 


Question 5: Are Distributed Transactions Bad?

Grigoris: What is your opinion and experience of using distributed transactions with SQL Server? Net good, Net bad, indifferent?

ChatGPT4: Using distributed transactions with SQL Server, or any database management system, can be a double-edged sword. The appropriateness and effectiveness of distributed transactions depend greatly on the specific requirements and context of your application. Here’s a balanced view based on common experiences and industry practices:

Pros of Distributed Transactions:

  1. Data Integrity Across Multiple Systems: Distributed transactions ensure that operations involving multiple databases or systems either all succeed or all fail, maintaining data consistency across these systems.
  2. Simplified Application Logic: They allow developers to write simpler application logic when dealing with multiple data sources, as the complexity of ensuring transactional integrity is handled by the transaction manager.
  3. Standardized Protocols: SQL Server’s support for distributed transactions usually follows industry-standard protocols (like MS DTC – Microsoft Distributed Transaction Coordinator), making it easier to integrate with various systems.

Cons of Distributed Transactions:

  1. Performance Overhead: Distributed transactions introduce significant performance overhead due to the coordination and communication required between different systems. This can impact the overall response time of the applications.
  2. Increased Complexity for System Maintenance: Configuring and maintaining distributed transactions, especially in a high-availability or distributed environment, can be complex. Issues like network failures or server downtime can lead to complicated failure scenarios.
  3. Risk of Locks and Deadlocks: They can increase the risk of locks and deadlocks, as transactions may hold locks across multiple systems, potentially leading to bottlenecks and performance issues.
  4. Scalability Concerns: Distributed transactions can be a limiting factor in scaling systems horizontally. As the system grows, the overhead and complexity of maintaining these transactions can become a significant challenge.

Brent says: it should have stopped here and declared a verdict, and it would have been fine. The next group of bullet points are really extraneous.

Situational Considerations:

  • Specific Use Cases: In some cases, such as financial systems where consistency across different databases is paramount, distributed transactions are almost essential.
  • Alternatives: Depending on the use case, other patterns like eventual consistency, compensating transactions, or using a single database (when possible) might be more suitable and efficient.
  • Technology Evolution: With the rise of microservices architectures, there’s a shift towards more localized transactions within a service boundary and using patterns like Saga for cross-service operations.

Conclusion:

  • Net Good or Bad? It’s not about being inherently good or bad, but rather about the right tool for the job. Distributed transactions are powerful for ensuring consistency across distributed systems but come with trade-offs in terms of complexity and performance.
  • Indifferent? Certainly not. It’s crucial to carefully evaluate the need for distributed transactions against their drawbacks and consider whether other architectural patterns might be more suitable for your specific scenario.

In summary, while distributed transactions can be essential for certain applications, they should be used judiciously and with a clear understanding of the implications they bring to system design, performance, and maintainability.

Brent says: that’s fair, but it’s also a word salad, which is a classic problem with ChatGPT’s iterations so far. Even when instructed to write like me, it doesn’t take firm stances on topics. I don’t think I’d recommend a distributed transaction under any circumstance where an app needs to scale. It’s like a nail gun: it’s a very dangerous tool, and if you’re asking if you should use it, the answer is no. To some extent, that’s probably part of the value of experts, even in the age of ChatGPT: to cut through all the bullshit text and give verdicts.

Previous Post
[Video] Office Hours: Live on Black Friday Edition
Next Post
Who’s Hiring in the Microsoft Data Platform Community? December 2023 Edition

22 Comments. Leave new

  • Question 4, bullet 6 “Thermal and Power Considerations”, really ChatGPT?
    While it is true it is just like saying, if you drive your car you’re going to use fuel.

    But it’s an interesting read non the less.

    Reply
  • Stephen Morris
    December 1, 2023 1:38 pm

    Re Question 2 revised :-
    I don’t know if something got changed while editing it together but you asked for MySQL & got PostgresSQL

    “Brent: What are the top 5 resources to learn the internals of how MySQL processes queries?

    Learning the internals of how PostgreSQL processes queries is crucial for…”

    Reply
    • Ah! You’re right – I actually asked MySQL first, and then as the answer came rolling out, I thought, “You know what, that isn’t really all that useful for my readers, lemme switch to just Postgres.” Fixed.

      Reply
  • Nice experiment you got there, Brent. I tried the same questions on the platform of my choice and see what I got.

    https://platform.straico.com/share/chat/656a0ab9b7d46994094be4b0

    🙂 Gene

    Reply
    • By platform of choice, I assume you mean ChatGPT 4, which is what your “platform” says it’s using. It says “MODEL: GPT-4 TURBO 128K” right at the top, which is the same thing I’m using. Cheers!

      Reply
      • Stuart Ainsworth
        December 1, 2023 5:44 pm

        But his is Turbo. -And not just any Turbo; it’s 128K Turbo. So it’s like 128.000 Turbo AI, so that’s like the super fastest kind of AI.

        Reply
    • I like this answers !! looks really more like Brent ??? liked the ACID is just not a chemistry term very Brent ?

      Reply
  • Don't mind me, I'm not a DBA
    December 1, 2023 5:40 pm

    Interesting, but without any questions for someone’s “friend”, I don’t think we’ve got a legitimate sample. And, while the answers may be adequate, they lack the sarcasm, wit, and stares over your beverage during those pauses while you ponder the ridiculous that make office hours so addictive.

    Reply
    • Hahaha, thanks. Yeah, that’s fair – I don’t think that stuff comes across in my writing as much as it does in the videos of Office Hours. Once they start analyzing my videos, though, we’ll really have something interesting!

      Reply
      • I know I’m really late but I’m finally catching on to what all the hype is about. That’s absolutely scary! Great stuff!

        Reply
  • Chat GPT Answering as Donald Trump regarding the rumor of Microsoft lowering boxed SQL Server Enterprise per core licensing by 50%.

    “Now, let me tell you about Microsoft and their SQL Server licensing. They’re always changing things, and nobody knows what they’re doing. But, I heard that they’re lowering the per core licensing price by 50 percent. Can you believe that? 50 percent! That’s a lot of money. But, you know what? I don’t believe it. I think it’s fake news. Nobody knows what’s going on with Microsoft. They’re always changing things, and nobody knows what they’re doing. But, if it’s true, then it’s a good thing. It’s a good thing for the people. But, I don’t believe it. Nobody knows what’s going on with Microsoft”

    Reply
    • Speaking of word salad… May I find the willpower to not blow my weekend playing with Chat GPT answering SQL questions as Donald Trump.

      Reply
  • Re: Question 5 and ChatGPTs tendency to not take a firm stand, I find sometimes you have to tell it to do so. Answering that question after telling it “Please take a firm stand on issues like Brent Ozar would”, gives this response (GPT 3.5)

    “Absolutely, let’s take a more definitive stance. Distributed transactions in SQL Server? Net bad.

    While they may seem like a convenient solution for maintaining consistency across multiple databases, the truth is they often bring more trouble than they’re worth. In the world of performance tuning and scalability, distributed transactions can be a nightmare. They introduce unnecessary complexity, increase the risk of deadlocks, and create potential points of failure that can cripple your system.

    It’s far better to explore alternatives such as local transactions or design your application to be resilient without relying on distributed transactions. They might seem like a quick fix, but in the long run, they can lead to more headaches than solutions. So, in the spirit of performance optimization and streamlined database management, my firm stance is to steer clear of distributed transactions in SQL Server whenever possible.”

    Reply
    • That’s pretty neat actually

      Reply
    • My guess – and this is just a guess, out at dinner with friends – but by telling it to take a firm stand, that prompt engineering did the heavy lifting there. You could have said “give a noncommittal answer like Brent Ozar would” and it would give a vague answer. The Brent Ozar part is less important there.

      Reply
  • I don’t hate the “hardware check” answer. I’ve had countless times where someone has opened a ticket like “Omg this button is doing something WAY different today than it did yesterday!” and it turns out their CTRL or ALT button is stuck.

    Reply
  • I loved reading this exercise (and learning a thing or two) but it’s no fun–we love Brent and your travel backgrounds and true spirit of the question, which sometimes, less face it, is just a stupid question (yes, there are some).

    At the end of the day, as my kids tell me videos of celebrities are fake, etc., the world has become a VERY scary place. What will people be asking it if it’s that on point!

    Reply
  • I don’t buy it. I never saw it refer to Clippy.

    Keep trying 🙂

    Reply
  • RE Question 1: Why Is SQL Unit Testing Rare?
    It doesnt need to be. If the Client systems implement unit testing then they should be calling the DB in the appropriate context.
    Having said that I recall at one job learning .net just so I could write NUnit tests for my stored procs. You just need setup and tear down procs to call before and afterwards so you can repeat your tests frequently without leaving a lot of test data lying around.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.