Query Exercise: Who’s Changing the Table?

Query Exercises
27 Comments

For this week’s Query Exercise, your challenge is to find out who keeps messing up the rows in the Users table.

Take any size version of the Stack Overflow database, and the Users table looks like this:

People are complaining that from time to time, the contents of the AboutMe column in some – but not all – of the Users rows are getting mangled. We’re not sure if it’s an app bug, an ETL problem, or someone goofing around in T-SQL, and we need you to find out.

At the same time, this is also a busy production database, and we want to minimize impacts to the end users. The site still needs to go fast.

For this exercise, I’m not expecting a specific “right” or “wrong” answer – instead, for this one, you’re probably going to have a good time sharing your answer in the comments, and comparing your answer to that of others. Feel free to put your queries in a Github Gist, and include that link in your comments. I’ll check back in next week with the approach I usually use with clients. Have fun!

Previous Post
It’s Time For Our Annual Anniversary Sale!
Next Post
Who’s Hiring in the Microsoft Data Platform Community? May 2024 Edition

27 Comments. Leave new

  • Like Frank Sinatra – I’d do it my way:

    https://straightforwardsql.com/posts/how-to-audit-data-modifications-with-surgical-precision/

    tldr;
    Create a trigger on the table the RETURNs when a condition is matched (and otherwise does nothing)
    Set up an extended event session on that RETURN statement of the trigger
    Get all the audit information I need including TSQL stack from the XE session

    Reply
    • Very nice post, sir! Nicely done.

      Reply
    • I really liked that solution.

      But when SP is created with encryption, trigger’s name for sp_statement_starting is empty. So your XE config will miss that event completely. Solution would be to filter on object_id of the trigger, but we need to ensure, that trigger is not dropped and re-created.

      Reply
  • – Enable DDL Auditing on the Database to capture all Changes

    – Enable CDC Or Temporal Table to capture the actual changes been done to data .

    Reply
  • I will do it for all the users who have DML access , In prod I believe there would be barely any users who will have DML access based on security standpoint apart from and applications accounts and admin group . So I will try enabling audit for Update for those application accounts only with the admin user group .

    There can be two possibilities, either we allow the users to modify the comment from application which is valid scene or someone from admin group is modifying it

    Triggering will also do the same thing based on the update condition . But both the way that extra validation happens against every query we hit to Server . So we will be using resources

    Please comment if I am missing something

    Reply
  • Connor O'Shea
    May 2, 2024 4:19 pm

    I would use an extended event to capture statements that contain AboutMe in the StackOverflow database. If I knew there were any stored procedures or views that aliased AboutMe to something else, I would include those predicates in my event filter, too.

    Since you mentioned that we’re not sure if it could be caused by an ETL process, using extended events is a better choice than triggers. ETL processes like SSIS don’t fire triggers by default, so it’s unlikely that a trigger you write to audit changes to the AboutMe column would get caught if it happened via ETL.

    https://gist.github.com/c0shea/282b41a722d464cc814064565668473d

    Reply
  • Mandy Burch
    May 2, 2024 4:53 pm

    We have to answer this question all the time. We have a SP that generates DML triggers and creates the audit tables for every table in our database that we care about. We have a purge process that prunes the audit records down to just 6 months or 15 months, depending on the table. We’ve been running that way for over a decade for our SQL Server on VMs and for a few years in our Azure databases. Yes, this means we have a two-phase commit for every DUI in the database, but that hasn’t caused us performance problems that we’re aware of. We keep the audit tables unindexed to make those inserts as quick as possible.

    Reply
    • We have a challenge just like that in the Mastering Server Tuning class too, so folks can see the overhead of the DUIs in that situation. I do like that kind of solution though!

      Reply
  • Eric Moore
    May 2, 2024 4:59 pm

    We use system versioning with a trigger to put in the username/login of who/whatever made the change. Also keeps a history of the before and after values. Works ok…but system versioning is pita sometimes. Also…gotta create a job to clean it up. And when refreshing dev and qa environments with prod, it’s a good idea to truncate those history tables for space issues.

    Reply
    • Yeah, exactly, that gets pretty ugly in systems like StackOverflow.Users – note the very hot columns LastAccessDate, Views, etc that constantly get updated.

      Reply
  • One thing I like to do is have an UpdatedBy and InsertedBy column on tables and guide people to use the system User_Name function to populate it.

    May not be a solution for this exact puzzle, but worth noting.

    Reply
    • Yeah, that’s not a bad idea, along with a date stamp column of the last update date, maintained by a trigger. That can help with ETL processes. Of course, as the table grows and you start to hit size issues, that can become a problem, but I do like the idea.

      Reply
  • Dave Vickery
    May 3, 2024 7:59 am

    Extended events on a live table make me nervous, so I’d start with a bit of old-fashioned detective work. Take a snapshot of the table, wait 24 hours and then find the values that have changed and their LastAccessDate. Hopefully, any ETL process involved would update the LastAccessDate field as it goes. If that’s the case, then the trail of 2am updates would point to that.
    What kind of mangling has taken place? Misplaced carriage return chars? Data that should be in another column?
    Armed with that information, send a snooty email to the admin team to see if they can identify the culprit.
    Failing all that (or if the values have changed without an updated LastAccessDate). then it would be time to run the extended events gauntlet.

    Reply
  • […] Query Exercise: Who’s Changing the Table? (Brent Ozar) […]

    Reply
  • Create a trigger on update, Insert or delete on the tables you want to track changes on

    Reply
    • John – let’s put a little more thought and effort into it. Do you need one in inserts? What about deletes?

      Then as you think more, think about the kinds of things that you’d want to capture – or ignore – in the trigger. I think you’ll find that the task is a lot more complex than you first guessed. Enjoy!

      Reply
  • […] moved up this post’s publication date after Mr. O posted this question. So, Dear Brent, if you’re reading this, you can consider it my humble submission as an […]

    Reply
  • […] challenge for this week was to find out who keeps mangling the contents of the AboutMe column in the Stack Overflow […]

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.