
Azure SQL DB Managed Instances: Transactional Replication

I knew Brent and Erik wouldn’t touch replication, so I figured I’d give it a whirl.

My good, old friend replication

I have a love-hate relationship with replication. Mostly hate due to latency and errors, but it does serve its purpose. Before Availability Groups came out, I used Transactional Replication to copy data from the production OLTP database to another server so that we could offload reports.

Will it work with a Managed Instance?

I immediately crashed and burned as I didn’t read the limitations of Managed Instances. My bad.

A Managed Instance can be a subscriber but not a distributor or a publisher. FINE.

I set up the distributor and a publisher on an Azure VM and then added the Managed Instance as the subscriber. I initially had some errors in Replication Monitor, but that was because I hadn’t touched replication in over 2 years and had forgotten some things that I once knew, such as the fact that replication needs a UNC path for the snapshot folder. Once the data was flowing between the two servers, I started pumping data into the publisher and watching Replication Monitor for latency and errors.

I inserted a tracer token to verify how fast data was making it to the subscriber: 4 seconds, not bad.
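For those playing along at home, posting and checking a tracer token looks something like this, run at the publisher in the published database (the publication name here is a placeholder):

```sql
-- Post a tracer token into the replication stream.
EXEC sys.sp_posttracertoken @publication = N'MyPublication';
GO
-- List the tokens we've posted, and grab a tracer_id from the output...
EXEC sys.sp_helptracertokens @publication = N'MyPublication';
GO
-- ...then check how long that token took to reach the distributor and subscriber.
EXEC sys.sp_helptracertokenhistory
    @publication = N'MyPublication',
    @tracer_id = 1; -- substitute a tracer_id from sp_helptracertokens
```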

Those who’ve used replication know that rebuilding a large index can cause excessive replication latency. I didn’t have a large index for this test, but I still wanted to see how this setup would do.

Latency was always under 20 seconds, and it recovered back down to 4 seconds fairly quickly.

Test, test, test

If you plan on using a Managed Instance as a subscriber in your replication topology, be sure you test for latency. Excessive latency is typically caused by a large transaction. The largest transaction on a particular database is often rebuilding the largest index. Start there for your latency testing.
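If you’re hunting for that largest index, here’s a quick sketch using sys.dm_db_partition_stats – rebuild the top result while watching Replication Monitor:

```sql
-- Find the biggest indexes on the published database:
-- rebuilding the top one is usually your worst-case replicated transaction.
SELECT TOP 10
    OBJECT_NAME(ps.object_id) AS table_name,
    i.name AS index_name,
    SUM(ps.used_page_count) * 8 / 1024 AS used_mb
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
GROUP BY ps.object_id, i.name
ORDER BY used_mb DESC;
```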


Azure SQL Managed Instances Brain Dump on CPU, Memory, HA/DR, and Backups

Managed Instances

Normally when we write blog posts, we try to explain something or tell a story. If you’re looking for a solid educational post, stop here, mark this one as read, and go on about your day.

This post is just a brain dump of unorganized notes from our experimenting with Azure SQL DB Managed Instances Preview. Buckle up.

Update 2018/05/05 – Microsoft’s Dimitri Furman shared how CPU and memory allocation works for Managed Instances.

CPU & memory configs

When you provision a Managed Instance (MI), you choose between 8, 16, or 24-core VMs. Let’s start with this: both the 8-core and 24-core MIs have ~170GB RAM. (I haven’t deployed a 16-core one.) Interestingly, they all appear to have 24 cores. I’m guessing Microsoft is keeping deployment simple for now.
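If you want to see what your own MI reports for itself, a quick sanity check:

```sql
-- What the instance thinks it has for CPU and memory.
SELECT cpu_count,
       physical_memory_kb / 1024 / 1024 AS physical_memory_gb
FROM sys.dm_os_sys_info;
```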

Screenshots for the 8 and 24 core MIs:

8-core MI with ~170GB RAM
24-core MI with ~170GB RAM

For the 8-core MI, here’s what sys.dm_os_schedulers looks like – it really does only have 8 online visible schedulers:

sys.dm_os_schedulers on an 8-core MI

That’s an awful lot of hidden online schedulers – but note that they all still tie to the same parent_node_id and cpu_ids (0-7).
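To count them yourself:

```sql
-- VISIBLE ONLINE schedulers run user queries; the rest are hidden or offline.
SELECT [status], COUNT(*) AS schedulers
FROM sys.dm_os_schedulers
GROUP BY [status]
ORDER BY [status];
```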

Yes, it still has a Dedicated Admin Connection (DAC), and yes, as of right now you can connect to it – and here’s where things get a little cool. Instead of just 1 query connected to the DAC, you can have at least 2. That’s new.

So how are they restricting us to 8, 16, or 24 cores? The conventional way would be affinity masking, but they did something different: if you go into Processors, only 8 cores show up as options under the 8-core MI. The core restriction is being enforced differently, and I’m guessing it has to do with these startup messages in the error log:

Affinity Agnostic configuration, AffinityAgnosticMaxCoreCount = 0

If you want to spelunk through error logs and see what’s new, here’s one Managed Instance error log, and here’s another.

In the opening server properties screenshots, the eagle-eyed amongst you will notice Win 2012R2 and SQL Server v12.0.2000.8. I wouldn’t put a whole lot of stock in these numbers right now – exact version numbers are starting to matter less here. More on that when I talk about feature switches.

You’ll also notice that Is Clustered = False, Is HADR Enabled = False. So with what kind of black magic are they replicating your data?

Enter the HADR Fabric.

Your Azure SQL Managed Instance consists of 5 servers:

sys.dm_hadr_fabric_nodes

This is why MIs have such tricky network requirements: this ain’t one server or service. Microsoft is creating, fixing, deleting, and redeploying VMs to maintain the service. Together they make up a cluster, but this isn’t a Windows cluster. All kinds of new DMVs here:

sys.dm_hadr_fabric_config_parameters

Here are the applications that show as running on these nodes:

sys.dm_hadr_fabric_applications

You don’t see SQL Server here directly, and you might also remember that in the Managed Instance restore tutorial, you saw RestoreService performing restores for me.

Look a little closer at Worker.CL, and he’s hosting the databases:

sys.dm_hadr_fabric_partitions

In this screenshot, my Managed Instance has 4 user databases, and they’re represented by 4 GUIDs. Erik wrote about how you’ll need these for some of the DMVs.

If I scroll across sys.dm_hadr_fabric_partitions to the right, there are some interesting columns:

Replica counts and quorum

Remember, my user databases are near the bottom of this list. If target_replica_set_size = 1 and min_replica_set_size = 1, you might guess that only one node is hosting my databases. (Especially with partition_kind_desc = singleton.) That would be putting a lot of eggs in one basket, but it doesn’t appear to reflect how many nodes are actually hosting the database. Remember during the restore tutorial, I showed how something akin to Direct Seeding is running in the background.

Based on the MSDB tables, it looks like you’re getting log backups every 5 minutes:

msdb.dbo.backupset
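Here’s a rough way to eyeball that cadence yourself:

```sql
-- type = 'L' means log backup (D = full, I = differential).
SELECT database_name,
       backup_start_date,
       DATEDIFF(MINUTE,
                LAG(backup_start_date) OVER (PARTITION BY database_name
                                             ORDER BY backup_start_date),
                backup_start_date) AS minutes_since_previous_log_backup
FROM msdb.dbo.backupset
WHERE [type] = 'L'
ORDER BY database_name, backup_start_date DESC;
```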

Which takes me back to work. I’m working on getting sp_Blitz to be fully compatible with Managed Instances, and I gotta go fix the backups query. (sigh) The server_name reported in these DMVs doesn’t match things like SERVERPROPERTY('ServerName'). No rest for the wicked.


How to Restore a SQL Server Database into Azure SQL DB Managed Instances Redux

Anything Brent Can Do

There are some things you can do with Managed Instances, and some ways you can look at data that you can’t do elsewhere.

For instance, you have the ability to start a restore ASYNC — that means if your session drops, the restore will continue headlessly in the background.

There’s nothing about this equation that requires you to be connected. You’re restoring to a cloud server from cloud storage. You’re a meaty third wheel.

The command looks something like this:
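Something like this, assuming you’ve already created a SHARED ACCESS SIGNATURE credential for the container (the storage account, container, and backup names are placeholders):

```sql
-- On a Managed Instance, RESTORE FROM URL runs in the background:
-- no WITH options allowed, and it keeps going if you disconnect.
RESTORE DATABASE StackOverflow
FROM URL = 'https://MYSTORAGEACCOUNTNAME.blob.core.windows.net/MYCONTAINERNAME/StackOverflow.bak';
```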

While it’s running, you can sorta-kinda check the status with this new DMV:
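If memory serves, the DMV in question is sys.dm_operation_status, queried in master – a sketch:

```sql
-- Run in master: one row per create/drop/restore operation.
SELECT operation,
       major_resource_id,   -- typically the database name (or GUID)
       state_desc,
       percent_complete,
       error_code,
       start_time,
       last_modify_time
FROM sys.dm_operation_status
ORDER BY start_time DESC;
```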

I say sorta-kinda because my restore of the Stack Overflow database hit 50% complete immediately… and then sat there.

Back like that

You can see all sorts of stuff in here:

  • Create database commands
  • Drop database commands
  • Restore database commands
  • Their status, including percent complete, success/failure
  • Any errors
  • Start time and last check-in from the process

What I need to look into more is how the session_activity_id ties into other DMVs.

Thanks for reading!


Azure SQL DB Managed Instances: New Wait Stats

Incidental

This is a long list that I haven’t had a chance to dig through yet — all I did was compare them to the waits occurring on my 2017 CU4 instance.

There are about 174 of them that I found, though some may just be generated by Hekaton, which I don’t have set up on my home servers.

Feel free to make notes on that Google Sheet — I’ve opened that up to the public.

Some of the more interesting ones:

  • BOOST_CPU_TASK
  • CLOUD_FABRIC_ENQUEUE
  • CLOUD_FABRIC_PAIRUP
  • CLOUD_FABRIC_RELEASE_ALL
  • CLOUD_FABRIC_WAIT
  • CPU_ALLOCATION_VERIFIER
  • FABRIC_PAIRING
  • SOS_WORK_DISPATCHER
  • TIERED_STORAGE_MIGRATION_TASK
  • TIERED_STORAGE_PERSIST_LRU_INFO_TASK
  • TIERED_STORAGE_REENCRYPTION_TASK
  • TIERED_STORAGE_SCANNER
  • WAIT_VLDB_DUMP_LOG_LOCK

I wonder what Microsoft considers a VLDB to be. Could this be the standard that we all abide by when talking about our VERY LARGE BIG HUGE TABLE problems?

Stay tuned!



Azure SQL DB Managed Instances: We’re All GUIDs

Six is having problems adjusting to his clone status

Some funny things happen when you create databases up in a Managed Instance.

For Instance (HEH!), you may expect this to yield some fruitful results, but it Manages (HAH!) to defy logic.

But we get blank results! Mon Dieu!

But YourMom will faithfully be in sys.databases for you to not call for 3 months.

We have Id’d your mom.

When we look that up in sys.master_files by database_id, we get back a Funky Bunch®
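The lookup in question looks something like this (YourMom being my scientifically named demo database):

```sql
-- Logical and physical file names for the database, by database_id.
SELECT d.name AS database_name,
       mf.name AS logical_file_name,
       mf.physical_name
FROM sys.databases AS d
INNER JOIN sys.master_files AS mf
    ON mf.database_id = d.database_id
WHERE d.name = N'YourMom';
```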

No Master Plan, No Master Files

Your databases automagickally get created with an XTP file. For those of you unfamiliar, that’s Hekaton, AKA In-Memory OLTP, AKA IMOLTP, aka Xtreme Transaction Processing.

You can just smell the Monster Energy drink pee from here.

On the plus side, this prevents you from turning Auto Close on.

On the down side, you uh might accidentally use Hekaton. Be careful out there!

No more problems on the way

People who write scripts to analyze database-level metrics across DMVs (ahem) will have to make big adjustments to tie some of this stuff together. At least database_id seems to be intact, but this could make some things awkward.

Before you wreck your mom

But then again, maybe the kind of people who write those kinds of reports aren’t yearning for Cloud VMs where you drop your database off and feel a slight pang of guilt before going on vacation.

Like a retirement home for databases.

Thanks for reading!

Brent says: I was working on getting sp_Blitz to work with these guids, and squirreled away in the documentation – in the “Known Issues” at the bottom of the page – is a note that they expect to change at least some stuff back to the database name before General Availability. I’m not sure if this means all of the DMVs, though (like msdb.dbo.backupset).

Azure SQL DB Managed Instances: Trace Flags, Ahoy!

When you talk to people from Microsoft

They’re all “aw shucks” about Trace Flags.

  • “Don’t really need’em!”
  • “Eh, haven’t used one in years.”
  • “Sometimes they’re good for troubleshooting, but…”

Then you hop on a managed instance (where you’re not allowed to turn Trace Flags on or off), and you see what they have turned on.

28 trace flags… in a row?
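That screenshot came from plain old DBCC TRACESTATUS:

```sql
-- -1 lists every globally enabled trace flag.
DBCC TRACESTATUS(-1);
```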

If you head over to Konstantin Taranov’s GitHub repo for Trace Flags, you can decode some of them.

Missing from the list:

  • 2591
  • 3447 (but 3448 is there, which is supposed to help fix an issue with hung Mirrored databases)
  • 3978
  • 4141
  • 5521
  • 7838
  • 8037
  • 8054
  • 8057
  • 8063
  • 8065
  • 9041
  • 9537
  • 9570
  • 9883
  • 9905
  • 9934
  • 9940
  • 9941

See where I’m going with this? Konstantin has a GREAT LIST but there are still 19 unknowns enabled on every Managed Instance.

Kinda makes you wonder what all these Trace Flags are up to behind the Redmond Curtain.

If you know what any of them do, feel free to leave a comment — or better yet, add an Issue for Konstantin so he can keep his list up to date.

Some of them are startup Trace Flags, so if you’re aiming to do some poking around, it may not help you to just flip them on.

-xcopy is the name of my xtc tribute band

Thanks for reading!


How to Restore a SQL Server Database into Azure SQL DB Managed Instances

You’ve built your first Managed Instance, and now you wanna play around with real live data. You’re going to need a backup in the cloud to do it – SSMS doesn’t have a drag-and-drop option.

Big picture, here’s what we’re going to do:

  1. Set up a container in the cloud to hold our backups
  2. In our regular SQL Server, set up security so we can back up to that cloud container, and then back up a database to it
  3. In our Managed Instance (MI), set up security to access the cloud container, and then restore the backup from it

For steps 1 & 2, follow Steve Thompson’s excellent checklist, Backup SQL Server to an Azure Storage Account. I’ve tried several other checklists, but as of 2018, this was the only one that worked for me out of the box. (He’s @Steve_TSQL and a nice guy.)

The restore syntax in Managed Instances is a little different.

To create the security credential, the syntax is a little different:
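Roughly like this – a SHARED ACCESS SIGNATURE credential named after the container URL:

```sql
CREATE CREDENTIAL [https://MYSTORAGEACCOUNTNAME.blob.core.windows.net/MYCONTAINERNAME]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'MYBIGLONGKEYSTRINGFROMTHEAZUREPORTAL';
```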

The portions you have to change in there are:

  • MYSTORAGEACCOUNTNAME
  • MYCONTAINERNAME
  • MYBIGLONGKEYSTRINGFROMTHEAZUREPORTAL

Then, to restore the backup, the syntax is a little different too:
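Roughly:

```sql
RESTORE DATABASE MYDATABASENAME
FROM URL = 'https://MYSTORAGEACCOUNTNAME.blob.core.windows.net/MYCONTAINERNAME/MYBACKUPNAME.bak';
```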

The portions you have to change:

  • MYDATABASENAME – the target, new database you’re creating
  • MYSTORAGEACCOUNTNAME
  • MYBACKUPNAME
  • MYCONTAINERNAME

You can also restore databases asynchronously – meaning, fire off the restore so that it runs even when you’re not connected.

Want to check progress? This is about to get technical.

If you don’t care about checking progress, you can bail here. Happy testing.

You can’t use WITH parameters on the restore, which means you don’t get status messages. To check restore progress, use sp_WhoIsActive, which works beautifully:
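Either sp_WhoIsActive, or if you want to go straight to the engine, something like:

```sql
-- sp_WhoIsActive shows the restore session;
-- percent_complete also lives in sys.dm_exec_requests.
EXEC sp_WhoIsActive;

SELECT session_id, command, percent_complete, start_time
FROM sys.dm_exec_requests
WHERE command LIKE N'RESTORE%';
```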

Restore in progress, part 1

In that screenshot, session_id 143 is me (Doctor) running the restore command, restoring StackOverflow from backup. If I scroll across to the right of sp_WhoIsActive’s output, I can normally check restore progress:

Restore progress, part 2

But it’s null.

Notice how there’s another restore running from 127 at the same time, restoring a guid-based database? With the wacko service name, and program_name RestoreService? Azure Managed Instances are the ones really doing the restore, and you can see the percent_complete is populated for their session. They’re doing more than just restoring locally, too. In the error log, while a restore runs, you’ll also see interesting messages about how Azure is setting up the replication for me to the other nodes:

Seeding the secondaries

And even system databases get new names – master is replicatedmaster, for example:

Replicatedmaster

And they really do mean replicated – as of this writing, you can create tables in master, load them full of data, and get your instance into trouble on allocated disk space. (You get 32GB by default, and master is included in that number.)

Want to go from cloud to on-premises?

You can go in the other direction, too: since both your on-premises SQL Server and Azure SQL Managed Instances can access the same Azure Blob Storage, you can take a copy-only backup in Azure SQL Managed Instance:
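The backup command is roughly this, reusing the same placeholder names and the credential from earlier:

```sql
BACKUP DATABASE MYDATABASENAME
TO URL = 'https://MYSTORAGEACCOUNTNAME.blob.core.windows.net/MYCONTAINERNAME/MYDATABASENAME.bak'
WITH COPY_ONLY;
```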

(This syntax requires me to have already created the credential as shown at the start of this post, a SHARED ACCESS SIGNATURE credential pointing to this URL.)

Then back on-premises, just do a restore pointing to that file and…awww….

SadTrombone.com

That’s fair – MIs are a newer version of SQL Server than what we have on-premises. I wouldn’t be surprised if we got access to a FeatureSwitch to make exporting data easier at some point in the future, though – Lord knows there’s enough backup feature switches in sys.dm_hadr_fabric_config_parameters:

DBCC WITHOUTATRACE

But that’s a story for another post, and GroupBy is about to start.

Speaking of feature switches, what trace flags are in use on Managed Instances?


How to Deploy Your First Azure SQL DB Managed Instance Preview

The Azure SQL DB Managed Instance public preview is open, although it may take a week or two for new applications to get their new VMs. To start your application process now, go into your Azure portal and Create an Azure SQL Managed Instance. There are preview terms at the top – fill out the form to accept that now, because they’re going through a big queue, and you’re not my first reader. (You’re still my favorite, though.)

Do that, then come back here and start reading – because you’ve got some planning and thinking to do. The preview deployment process has gotchas. Read the documentation carefully. It’s not hard by any means, but you have to be really detail-oriented. (The company Slack room was colorful yesterday.)

First, read the documentation tutorial, Create an Azure SQL Database Managed Instance. It looks like an easy pictorial walkthrough, but do not follow it yet. Read the rest of this first because there are some huge gotchas. You do not wanna screw this up because your first deployment is going to take 24 hours or more:

If your deployment lasts 4 hours or more, call Reddit, because you still have another 20 to go

Mine really did, too:

Kiefer Sutherland is frantically deploying behind the scenes

That pictorial tutorial is great, but keep reading – there’s details that matter.

How to configure a network for Azure SQL Managed Instance – here’s where the hurt is gonna come in: with Azure SQL DB, you have a fancypants portal that just lets you open a firewall to specific IPs and let you connect directly in. That is not the case with MIs – read this post to understand how you’re going to be doing networking.

Even when you read that post, what it doesn’t explain is how to open access to your on-premises SQL Server Management Studio to be able to connect. That’s an even bigger process with more steps.

You might think, “I’ll just deploy a VM to that same subnet that my MI is in,” but as of this posting, that doesn’t work – all VM deployments into that subnet fail with a cryptic error:

No sex in the champagne room, no VMs in the Managed Instance subnet

Instead, here’s what worked for me:

  1. Follow the MI setup instructions carefully – if you’re in a hurry, follow the instructions even though you may end up with too large a subnet. If you’re a networking pro with time, use a small subnet mask (like a /32). I’m not going to explain that here given what I wanna cover, but TCP/IP people will understand what I’m getting at. (Welcome to the cloud, here’s your networking manual.)
  2. Create another virtual network and another subnet in the same region (for example, if your MI is in South Central US, create another virtual network in South Central US, with a subnet in it, with different IPs from your MI subnet).
  3. In the MI subnet, create a peering that points to your VM subnet.
  4. In the VM subnet, create a peering that points to your MI subnet.
  5. Deploy a VM in the VM subnet – this is where you’re going to remote desktop in and run SSMS or SOS. From that VM, you should be able to connect to your MI once it’s available.

You can prepare now for when your preview goes live.

If you’re not savvy with networking, start by playing with the same concept using two Azure VMs. Create two virtual networks, two subnets, and peering routes between them. Put a VM in each subnet, and make sure that they can see each other. Yes, this is going to cost you some money to spin up, but I’d much rather have you learn the hard lesson with a couple of $3/day VMs as opposed to a $30/day MI that takes a day to spin up, and then you borked the setup and you have to start over, spending $30/day and waiting a day each time you goof it up.

If you’re part of a company or you’re networking-savvy, start reading how to configure a VNet for MIs. There are some really big gotchas here: no network security group, no other cloud services in it, and you only have 1 option for a route table. To be clear, I don’t have any problem with these requirements – Microsoft is doing some ambitious stuff in here, spinning up and tearing down VMs in order to handle the service, and they need IP address space to do it:

Nodelayheehoo

But that’s a story for another blog post.

Next, you probably wanna restore a SQL Server database into your Managed Instance.


Why sp_prepare Isn’t as “Good” as sp_executesql for Performance

sp_prepare For Mediocre

You may remember me from movies like Optimize for… Mediocre? and Why You’re Tuning Stored Procedures Wrong (the Problem with Local Variables)!

Great posts, Kendra!

Following the same theme, we found this issue while looking at queries issued from JDBC. Specifically, the prepared statement class seems to cause queries to hit the density vector rather than the histogram for cardinality estimation.

Puddin’

Let’s create an index to make our query’s work easy!
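Assuming the Stack Overflow database’s Users table, something along these lines:

```sql
-- One skinny index on the column we'll filter on.
CREATE INDEX ix_Users_Reputation
    ON dbo.Users (Reputation);
```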

Now, to mimic the behavior of a JDBC query:
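The JDBC prepared statement boils down to sp_prepare/sp_execute calls, roughly like this:

```sql
-- sp_prepare compiles the statement before any parameter value is known,
-- so the optimizer uses the density vector instead of the histogram.
DECLARE @handle int;

EXEC sp_prepare @handle OUTPUT,
     N'@Reputation int',
     N'SELECT * FROM dbo.Users WHERE Reputation = @Reputation;';

EXEC sp_execute @handle, 1;
EXEC sp_execute @handle, 2;
EXEC sp_execute @handle, 3;

EXEC sp_unprepare @handle;
```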

The query plans for all of these have something in common. They have the exact same estimate!

Ze Bad Guess

You might be saying to yourself that the first parameter is sniffed, and you’d be wrong.

That estimate exactly matches the density vector estimate that I’d get with a local variable or optimize for unknown: SELECT (7250739 * 5.280389E-05) ≈ 382.8

Cruddy

You can validate things a bit by adding a recompile hint to the demo code.

The plans for all of the recompiled queries get different estimates, and no estimate matches the 382 estimate we saw from the first round.

BIG MONEY

Am I saying you should recompile all of your queries to get around this?

No, of course not. Query compilation isn’t what you should be spending your SQL Server licensing money on.

You may want to not use JDBC anymore, but…

How Is sp_executesql Different?

Well, sp_executesql “sniffs” parameters.
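Same query, rewritten as sp_executesql – this time the first value it compiles with gets sniffed:

```sql
-- The @Reputation = 1 plan gets compiled from the histogram and cached.
EXEC sp_executesql
     N'SELECT * FROM dbo.Users WHERE Reputation = @Reputation;',
     N'@Reputation int',
     @Reputation = 1;
```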

If I run my demo queries in this order, the plan for Reputation = 1 gets cached and reused by all the other calls.

Sniff sniff pass

If I change the order so Reputation = 2 runs first, the plans change (after clearing the plan out of the cache, of course).

Now they all reuse that plan:

Look at you then

Why Is One Better?

I put together this handy chart!

IT’S ONLY A PICTURE

I’m not smart enough to get a formatted table like this into a web page.

I’m a bad DBA.

Thanks for reading!

UPDATE:

The admirable and honorable Joseph Gooch notes in the comments that you can configure this with the jTDS JDBC driver:

prepareSQL (default: 3 for SQL Server, 1 for Sybase) – this parameter specifies the mechanism used for Prepared Statements.

  • 0 – SQL is sent to the server each time without any preparation; literals are inserted in the SQL (slower)
  • 1 – Temporary stored procedures are created for each unique SQL statement and parameter combination (faster)
  • 2 – sp_executesql is used (fast)
  • 3 – sp_prepare and sp_cursorprepare are used in conjunction with sp_execute and sp_cursorexecute (faster, SQL Server only)

Though I’m not too thrilled that sp_prepare is called “faster”.

And! Similar options are available in the 6.1.6 preview of the Microsoft JDBC drivers.


“Full Stack” Means “Part Time” – or Why DBAs Might Wanna Learn DevOps

Long before the “full stack developer” phrase became fashionable, we early developers did exactly that. We didn’t do any of these full time, mind you – full stack doesn’t mean full time.

Here’s a grid I use to explain the work involved in building and troubleshooting database apps:

Full Stack, Part Time

Developers start from the top and work down. Systems administrators start from the bottom and work up.

If you’re reading this blog, you’re most likely a database professional – but think back to how you got your start, and it was likely either as a developer, OR as a systems administrator. You loved learning, so you tackled more and more roles until you were kind of a full stack developer. You ended up specializing in databases, though.

As companies grow, they separate roles.

When companies have enough work in a department that they have dozens of developers, some specialized roles emerge. This is especially true when they have dozens of servers, some of which require specialized knowledge to administer, scale, and troubleshoot.

Developers who know a database well end up becoming the Accidental DBA – or eventually, just The DBA – and their role looks more like this:

Gotta keep ’em separated

Full time database administration roles tend to focus on the middle parts of that grid: queries, performance, and database server outages. The more time you spend administering databases, the less time you spend writing code or installing hardware.

But you see some gaps, right? Even with separated roles, the “Deploy changes” and “Tune queries” steps are still only done sometimes in shops like this. Historically, this has meant that we’re just not very good at either of those two tasks.

In our industry, we’ve historically had a lot of training classes and tools targeted at monitoring performance and tuning queries & indexes.

But we haven’t had much around deploying changes.

DevOps is about trying to change that.

I need to step back here for a second and explain something before the Well Actually crew eats me alive. Technically, DevOps isn’t a job role: it’s a set of job duties that many people on the team may be required to perform, just like writing queries. However, it’s a specialized skill, just like databases: lots of people may kinda-sorta know it, but as your team grows, you might need at least one person on the team who knows it really well to help mentor the others and define good practices – just like you need a database administrator.

If you’re a database professional, and you’ve gotten good at your specialized part of the stack, but now your company is trying to deploy database changes faster, you need help. You need to learn what tools to use, how to deploy them, and how to integrate with the pure developers on your team.

That’s why Alex Yates is teaching a 2-day online course, Database DevOps. It focuses just on the parts you’re missing, and helps you get started on a similar – but slightly different – career, one that’s hot as all get out right now.


Announcing 2 New Online Classes: Database DevOps and Practical Real-World Performance Tuning

Company News, Development

We’re proud to announce that in addition to our existing summer lineup, we’ve got two new online training classes coming up in June & July. These new ones are available at times that work well around the world – click on a class to see the dates/times:

Database DevOps: Why, What, and How
Alex Yates, 2 days, $1,995*

Database DevOps Training Class

Managing database changes is hard. Come and learn how to do it properly.

We’ll explain in straightforward terms what DevOps is (and what it isn’t) as well as dissecting the concepts of ‘continuous integration’, ‘continuous delivery’ and ‘continuous deployment’. We’ll also discuss the various ways of applying the ideas in practice.

Over the course of two days you’ll build your own source control, testing and automated deployment solution for SQL Server, equipping you with the knowledge and the skills you need to significantly improve your team’s ability to deliver database updates regularly and reliably.

In the spirit of DevOps, we welcome attendees from both the dev and DBA sides of the fence.

About the instructor: Alex Yates has been helping organisations to apply DevOps principles to relational database development and deployment since 2010. He’s most proud of helping Skyscanner develop the ability to deploy 95 times a day. Originally for Redgate, later for DLM Consultants, Alex has worked with clients on every continent except Antarctica – so he’s keen to meet anyone who researches penguins. A keen community member, he co-organises SQL Relay and is the founder of www.SpeakingMentors.com.

Learn more and register now, and if you want the background on why it’s important, watch Alex talk about DevOps 101 for data professionals:

Practical Real World Performance Tuning
Pinal Dave, 4 hours, $495*

In the real world when we face any performance problems, we do not have the luxury to open PowerPoint and study theory. What helps us during the time of crisis are handy scripts which help us fix our performance problems.

In this quick-paced, demonstration-oriented learning workshop, we will see how we can immediately help our application’s performance if we just know exactly what to do.

Following the practices in this 4-hour workshop not only helps mitigate your current performance problems, but also helps you put preventive action plans in place.

About the instructor: Pinal Dave is a SQL Server Performance Tuning Expert and an independent consultant at SQL Authority. He has been a part of the industry for more than 16 years. During his career, he has worked both in the USA and in India (Technology Evangelist at Microsoft). He received his Master of Science from the University of Southern California and a Bachelors of Engineering from Gujarat University. He has been a regular speaker at many international events like TechEd, SQLPASS, SQL Saturday, SQL Live, Techorama and countless user groups.

Learn more and register now, and if you wanna get started early, watch Pinal talk about 3 common performance mistakes:

These new classes all include our Instant Replay – the ability to watch a video of your class for a year after it runs. See you in class!

* – What’s with the asterisks? Well, for reading the fine print, try coupon code HalfOffDevOps and HalfOffRealWorld. Move quick – they expire Friday!


Building SQL ConstantCare®: The Vision

SQL ConstantCare

We’re starting to roll out a new product, SQL ConstantCare®, and now that we’ve got over 50 customers, it’s time to start sharing some behind-the-scenes details about what we’ve built. Over the next several Mondays, I’ll be writing about the decisions we made along the way – architecture, packaging, pricing, support, and more.

Lemme start by explaining something that I’ve noticed a lot in the field. A lot of companies buy SQL Server monitoring software, and then…don’t have the time to learn how to use it. They install it, and then promptly set up Outlook rules to filter all of the email alerts into a folder that they never read.

I’ve even point-blank asked clients, “So you’ve got a monitoring tool, right? Open it up and show me what metrics you’ve looked at in order to troubleshoot this problem.” They open up the app, click around hesitantly, and then eventually confess that they have no idea what they’re doing or what numbers they’re supposed to look at. Even when they have a pretty good handle on SQL Server metrics, they get overwhelmed when they see all the dials and charts.

Monitoring tools are fantastic for highly trained people with plenty of time on their hands.

But most people out there don’t have the luxury of in-depth training and decades of experience. They’ve got too many servers and not enough time. They just want to cut to the chase and be told what tasks they need to do, in prioritized order.

You’re safe in my jazz hands

Admins want mentoring,
not monitoring.

So I wanted to build something that simply:

  • Checked in with you once a week
  • Told you what specific tasks to do, how to do them, and why
  • Reviewed the homework you were supposed to do last week, and what kind of difference it made

I didn’t wanna show you dials, charts, or any metrics whatsoever other than supporting evidence for your homework tasks, like proving that your change was effective and noticeable by end users.

In the cloud, admins want context and cost.

One of the most common questions I get from clients – especially when I’m in-person and people feel more comfortable asking it – is, “How are we doing compared to other shops? Are we managing our servers like everybody else does? Are we over-powered or under-powered?”

It’s easy for us to give clients a rough guesstimate and grade because we see a lot of servers. However, we wanted to take it to the next level and say things like:

For SharePoint environments with a similar ~1TB data size and query workload to yours, your server is seriously underpowered, and as a result, you’re seeing slower queries and higher wait times. The sweet spot for ~1TB of SharePoint data seems to be around 8 cores and 64GB RAM. If you switch from an m5.xlarge to an m5.2xlarge, we estimate that the average query duration will drop by 40% without any code or index changes. The VM’s costs will go up by about $616/month.

Or…

This data warehouse is doing very well: your nightly loads are finishing in 90 minutes, you’re doing CHECKDB and full backups daily, and during the day, reports are finishing within 10 seconds. It’s a little over-provisioned. If you wanted to cut your monthly bill, it’d actually be fairly easy since it’s in an Availability Group in an Azure VM. Change the secondary to be an L16 instead of an L32, and during your next maintenance window, fail over to the L16. See how the user experience goes, and if it’s awful, you can always fail back to the L32 secondary replica. If it’s good enough, though, then change the remaining L32 replica down to an L16 too. Between the two replicas, you’ll save about $12K per month.

We also wanted to answer management questions at scale, like:

  • When a new CU comes out, does it backfire? For example, when 2014 SP1 CU6 broke NOLOCK, if we had wait stats data across thousands of servers, it’d be much easier for that to pop out right away – like the very next day after people applied the CU. This is becoming so much more important in these days of fast-paced updates.
  • What are adoption rates of features like In-Memory OLTP or Operational Analytics? Are you really safe investing your limited training time in those features?
  • Are other people using a particular trace flag, like 8048 to prevent CMEMTHREAD, and what before/after performance effects did it have?

These are data problems.
We are data professionals.

But at the same time, sound the alarms: the cloud is involved.

When I started designing this in 2015, the term “the cloud” provoked rabid anger from many data professionals, as in, “I ain’t never gonna let none of mah data into the cloud!” On the other extreme, some folks are perfectly willing to paste their execution plans out for the public to see. There’s a wide cloud comfort range out there.

I totally get it, and I knew from the start that this product wouldn’t be for everyone when it was launched. But I wanted to design something for the next 10 years, not the past 10, and over time, the cloud is going to be the new default starting spot for most data. I was totally okay with launching something that only 1-10% of data professionals would use. After all, to use our mailing list as an example, there’s roughly 100K data professionals out there. If only 1% of them bought into it, that’d still be one hell of a helpful tool.

I’ll talk much more about the collection, data, security, storage, and analysis in subsequent posts, and I’m excited to share it because as a data professional, I wanna set a good standard as to how data gets handled. (Now I bet my GDPR post suddenly makes more sense to you, dear reader, but I wasn’t quite ready to announce SQL ConstantCare® yet back then.) I’m aiming for GDPR compliance even though we’re not selling to the EU – but I’m just not ready to deal with the complexities and legal fees of being one of the first defendants if something goes wrong. These aren’t easy problems – but this is what it means to work with data in the year 2018.

So that was the vision. Over the next several Mondays, I’ll blog about the PaaS database back end, development timeline, minimum viable product, packaging, pricing, security, analysis, and more. Up first next week: why we picked serverless architecture running on AWS Lambda. If you’ve got questions, feel free to ask away – in these behind-the-scenes posts, I try to share the kinds of business decisions I’d find interesting as a reader. I’ll try to answer the questions in comments, and it’ll also help me shape the rest of the posts in this series.

Read on about SQL ConstantCare®’s Serverless Architecture, try it out, or check out the rest of the SQL ConstantCare® series.


[Video] Office Hours 2018/03/07 (With Transcriptions)

SQL Server, Videos
0

This week, Brent, Erik, Tara, and Richie discuss database restores, opening a case with Microsoft, buying a pre-built server vs building your own, auto-shrinking all databases with a single command, SQL Server dev environment options, the best way to learn Git, excessive memory grants, and more.

Here’s the video on YouTube:

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 3-7-18

 

How did my users get messed up during a restore?

Brent Ozar: Deborah says, “Last week, I restored a database…” Yay, “From a test server to a production server…” What, what?

Erik Darling: That’s more than I did last week.

Brent Ozar: “From a test server over to a production server. At some point in the process, the admin user for that database was changed to one that resides on the production server and then removed the proper database, which caused the apps admin user not to function. I found a couple of articles but no reason about why that happened. I would like to prevent that from happening in the future.”

Erik Darling: An orphaned user?

Brent Ozar: That’s my guess. So explain what orphaned users are.

Erik Darling: Oh, I can’t. I just remember having to do that when I restored databases a long time ago.

Tara Kizer: Every single time you restore, you run the script.

Brent Ozar: So when you create a login in SQL Server, it creates a SID; a security identifier. And you don’t usually get to pick the SID; it just automatically creates a SID. You could create with a specific SID if you wanted to, but it’s unique to that login on that machine. You can create the same login name on a different SQL Server, but when that database comes across, it’s got a different SID. So the username may not be able to log in if it’s the wrong default database or a default database that person can’t access. So there’s a stored procedure you can run, sp_change_users_login. And I never remember where the underscore is…

Tara Kizer: It’s all of them, I think. I think it’s all, yeah…

Brent Ozar: It might be.

Richie Rump: In between each letter there’s an underscore.

Brent Ozar: Worst casing ever. Yeah, so sp_change_users_login – or you can just drop that login on both servers and then recreate it with the same SID. Anytime you Google for sp_change_users_login, you’ll find more information about how to drop and recreate that guy the right way permanently.
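For reference, a sketch of fixing an orphaned user after a restore (the user name AppAdmin is hypothetical; substitute your app’s user):

```sql
-- List users in the restored database whose SIDs don't match a login:
EXEC sp_change_users_login @Action = 'Report';

-- Remap the orphaned user to the login of the same name.
-- sp_change_users_login is deprecated; ALTER USER is the modern route:
ALTER USER [AppAdmin] WITH LOGIN = [AppAdmin];
```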

 

How do we open a case with Microsoft?

Brent Ozar: Sree asks, “How do we open up a case with Microsoft? I’m just wondering about the process if I ever have to open up one.”

Tara Kizer: I always save the 1-800 number in a file so that I have it, because it was hard to find the phone number. But that was with a Premier account, and I don’t know if that’s available to everybody.

Erik Darling: Yeah, I think how you open a case depends on what level of support you bought into, you know. I get nothing from Developer Edition, honestly. Joe Sack just responds to my emails graciously once in a while. That’s fine; that’s all I get.

Brent Ozar: What do you email him? Do you just say, “Hey honey, how are you doing? Can you help me with this?”

Erik Darling: No, I text him directly…

Richie Rump: “Hey sweetie, it’s me again…”

Erik Darling: I’m going to guess that if you’re asking that question, you don’t have like a dedicated person that you email.

 

What are work files and work tables?

Brent Ozar: Let’s see, Pablo says, “Hola, amigos. How do I interpret work files’ and work tables’ logical reads? Is it always on tempdb?”

Erik Darling: Yep. So work files and work tables get spun up, usually for hashing operations. Sometimes, as you’ll find out in an upcoming blog post, they happen for merge joins when they are many to many, when there are duplicates in the join columns. But yes, they are all directed to tempdb; that’s where they do their dirty work. It’s the same number of 8K page reads that you would see for a regular table or LOB logical reads or any of the other stuff.
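To see those worktable reads yourself, a sketch (table and column names are made up; any hash join without supporting indexes will do):

```sql
SET STATISTICS IO ON;

-- A join with no supporting indexes typically gets a hash match,
-- which does its dirty work in tempdb:
SELECT a.Id, b.Id
FROM dbo.TableA AS a
JOIN dbo.TableB AS b
  ON a.SomeValue = b.SomeValue;

-- The Messages tab can then report lines like:
-- Table 'Worktable'. Scan count 0, logical reads ...
```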

Brent Ozar: Kevin has a follow-up on the support case, and this horrifies me. Kevin says, “Something to know – last time I opened a case with Microsoft, I found out they’re outsourcing it to a company called Convergys, or something like that.” I know it’s been different by product too, and for out-of-support products, they’ve farmed it out before, too.

Tara Kizer: Nobody wants to work on it.

Brent Ozar: No – you call in with a merge replication problem and… Hold on…

Erik Darling: I’m going to guess that Kevin either has SharePoint or Dynamics if he’s getting sent off to someone else. He’s probably getting sent off to some like, you know, Microsoft partner consultant who does that kind of work.

 

What kind of camera is everyone using?

Brent Ozar: Brandon asks, “What kind of camera is everyone using to cast themselves?” I think we all do have different webcams. What webcams…

Erik Darling: No, we all have the same… [crosstalk]

Richie Rump: You bought us all the same one.

Brent Ozar: You all bought the BRIO. We all use the Logitech BRIO then. Logitech BRIO – it’s 4K. It has really nice light sensitivity. I like it a lot. The only thing is that it says it’s got a tripod mount on the bottom, so if you’re like me, you try to unscrew it so you can put it on a tripod. It’s not a real tripod mount, so all I’ve done is broken the mount on my webcam, so now it kind of flops around on the top of my screen and I’ve got to go get a new one. I can’t turn it in under warranty either because it’s clear that I broke it.

 

Should I build my own home lab, or buy something off the shelf?

Brent Ozar: Udai asks a question that we will have passionate responses about. He says, “I am trying to build a home lab and I wanted to know if there would be an advantage to buying a pre-built server, like say a Dell PowerEdge, versus building a machine using consumer parts?” Tara, you have one – we’re all in different camps here. Tara, your hardware budget is up. What do you think about doing with your hardware budget this year?

Tara Kizer: I still haven’t submitted it. Dell XPS 8930 – I don’t want to have to mess with it. I want to just get whatever hardware I need and have them test it and have a warranty on it. Whereas Erik bought a piece here, bought a piece there and then dealt with all sorts of incompatibility issues. I just don’t have the patience for that and, frankly, I don’t have the knowledge for that type of stuff. I make Mike do that kind of work and he keeps saying, “Let’s build it from scratch.” And I’m like, “No, I really don’t want to.”

Erik Darling: But it’s so gratifying when it gets working. I actually didn’t have too many problems with compatibility. I used a website called PC Part Picker and I was able to get all that stuff weeded out upfront and kind of figure out which CPU…

Tara Kizer: You had issues, though. You had to return some items?

Erik Darling: Yeah, I had to – well my issues weren’t because of, like, technical problems. They were because of dirty customer problems where someone had returned their old motherboard in a box and then whatever – I can’t remember the name of it – had sold me the motherboard open-box…

Brent Ozar: Micro Center…

Erik Darling: Micro Center sold me the open-box motherboard for like 80 bucks less and when I got home and I started looking at pins on the CPU connector I was like, “Wait a minute, I don’t think this is legit.” [crosstalk] I returned it and, of course, I walk into the store and I’m like, “You sold me the wrong motherboard. This one’s broken.” And all the salespeople are like, “Really? Tell me more…” Like no seriously, I just brought this home; it’s broken.

I don’t know, I built mine just because I got all excited about – Nick Craver had published the Stack Overflow Developer Desktop Build on his website and I was looking at that and I was like, “That’s really cool. I want to try my hand at building that.” And then I ended up buying some other stuff. And I got a new laptop this year. Next year, I’m looking at pieces of it that I might upgrade with next year’s budget.

Richie Rump: [crosstalk] Than buy a laptop. It’s like a technical coaster is kind of what you got.

Brent Ozar: You got a hand-me-down.

Erik Darling: Yeah, well I got Jeremiah’s old laptop when I came on here, so…

Richie Rump: Oh snap.

Brent Ozar: Although it is still faster than Andy Leonard’s – and I can say this because Andy’s not here. Andy’s got the generation before, and during his class, his power went out. He and I had been talking in the morning and I’m like, “Well, how’s the battery life?” And he’s like, “Oh, it doesn’t work worth a dang. When I unplug it, it immediately goes dark.” So all of a sudden, we got to see a live test of Andy’s battery countdown [crosstalk] outage.

Tara Kizer: I was stuck with an old laptop when I worked at Qualcomm for at least two to three years longer than I should have. And eventually, I just couldn’t do my job anymore. I was like, “Listen, I’m a production DBA; I need to be able to do my job.” And finally, they replaced it.

 

How much RAM and SSD space do you have?

Brent Ozar: Sree follows up with, “How much RAM and SSD space do we have?” So I’ll go first – I didn’t answer either. I have a home lab of an Apple Mac Pro, which is the trashcan-looking thing. I have 64GB of RAM and 4TB of SSD space. Erik, how about you?

Erik Darling: I have – let’s see – a 3.6GHz, 6-core processor, 128GB of RAM, and I have my SSDs in a RAID. I bought a dock, and I have 6TB of SSDs in a RAID that gives me about 2.5-ish terabytes, plus a 1TB PCIe NVMe M.2 card and a big 8TB spinning piece of crap that just backs stuff up.

Brent Ozar: Richie, how about you?

Richie Rump: Okay, so…

Erik Darling: Here we go…

Brent Ozar: Why did I ask?

Richie Rump: Save your money for fixing your plumbing. Go to the cloud. Just do all the cloud stuff. Don’t buy hardware. All your databases are going to be in the cloud anyway; just go to the cloud. Use your Cosmos DB, use your Redshift, use RDS Aurora. Get to know all of them, save your dough, go to the cloud.

Erik Darling: And then what happens if you need to go someplace where there’s no wifi and show stuff that you’re doing? You can’t really.

Richie Rump: Don’t go to Doug Lane’s house.

Erik Darling: Okay, good point.

Brent Ozar: Richie goes and opens up his backpack and takes out one of half a dozen laptops which collectively have maybe 1TB of space.

Richie Rump: Actually no, each one has 1TB except for the Surface.

Tara Kizer: Mine is a lot smaller than you guys’, which is why I need a new one. So, 20GB of RAM – it came with a lot less, but we did upgrade that – and just 500GB of disk space. I’m looking at 1TB and 64GB of RAM for my next machine. I’ll still use the laptop, but I need something with more power so I can do large memory grant demos, because my machine just could not keep up. And there isn’t enough disk space either.

 

Should developers have SA?

Brent Ozar: Kevin says, “How do you let your dev folks give themselves database owner permissions on a database without giving themselves SA permissions?”

Tara Kizer: If it’s dev – is it a question for dev or prod?

Brent Ozar: Oh god, I hope it’s dev.

Tara Kizer: If it’s dev, I let everybody have DB owner; I don’t care.

Richie Rump: Don’t [crosstalk]

Tara Kizer: Access, maybe…

Richie Rump: Don’t even give them read access in prod, don’t do it.

Tara Kizer: In dev?

Richie Rump: Give them all access, give them SA, who cares?

Tara Kizer: I don’t give them SA, because then I have to fix the server; but with the database, I can at least easily restore it.

Richie Rump: Let them go off, let them screw with it. Let them play with the new features. Let them do all that stuff. This is how they learn and try out all the bad ideas that they have, like using Service Broker and all this other crazy stuff. Let them do whatever they want in dev. Now production, don’t even give them read access. Forget it. I mean, it’s the same thing I did to Brent; I didn’t give him…

Brent Ozar: I was just going to say, I should point out the irony here that Richie is our developer and Richie has complete access to every environment, development, production, whatever. I own the company and I don’t have write access to our production databases and I don’t want it.

Richie Rump: And there’s a reason for it, Brent. When you stop breaking the builds… So a few weeks ago, I put in database tests; he goes off and changes a value in the database creation scripts and it breaks the build.

Brent Ozar: Yeah, I’ve got a 100% build failure rate.

Richie Rump: But hey, they work, right? The database tests – they work great.

Tara Kizer: Wait, one of my managers, a few jobs ago, he was my mentor 20 years ago, he was the reason I became a SQL Server DBA and got good at things. He later moved away from SQL Server and into management and then other technologies and he then became my manager again, and here I’ve got this extremely mission-critical SQL 2012 Availability Group system; a very complex system. He’s like, “Oh I probably should get access again.” It had been years since he touched it. I was like, no, nope, can’t have it now.

Brent Ozar: Nope, no good is going to come of me touching our production SQL instances.

 

Can I shrink all of my databases with one command?

Brent Ozar: Andy asks, “Is there a way to auto-shrink all of my databases with a single command?”

Tara Kizer: Oh Andy – I was going to say, this is a troll.

Brent Ozar: And there is, because I had a client doing this. You run sp_MSforeachdb and then DBCC SHRINK-whatever and it’s unbelievable. It actually works.

Richie Rump: I thought that was sp_shrink_Buck_Woody…

Brent Ozar: Ah yes, that’s why you don’t give him access to the database.
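For the record, the one-liner in question looks roughly like this (a sketch – please don’t actually do this):

```sql
-- Shrink every database on the instance (undocumented proc, terrible idea):
EXEC sp_MSforeachdb N'USE [?]; DBCC SHRINKDATABASE (N''?'');';
```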

 

How do I log everything anyone does?

Brent Ozar: Let’s see, next one, Anon Ymous says, “An IT group wants us to start logging individual logins – including those in Active Directory groups – and what changes and what actions they perform. I believe this would be difficult for us and will impact performance on our low-RAM server.”

Erik Darling: How low? Not that I think more RAM is going to help this, but Brent had the best answer to this. I forget if it was in Office Hours or via email or something a while back, where it was just like, imagine if you came into work and instead of just doing things, you had to write down everything you did before you did it, then go do something, then go write down what you did after you did it and what changed, then that was your day. And just imagine how much slower your life would be if you had to take note of everything you did before and after you did it. That’s kind of how SQL Server’s going to treat the whole experience. It’s not good for you.

Brent Ozar: It’s never going to get faster, that’s for sure. Yeah, if you really want to audit who does what, you buy a third party appliance like Guardium or Imperva and they sit in between SQL Server and the network, they log everything. They’re supposed to work at wire speeds, but these are six-figure appliances; they’re not cheap.

Tara Kizer: [crosstalk] you get this with low-RAM server.

Brent Ozar: Low-RAM server.

Erik Darling: You know, that’s a better name for it than, like, SQL01, like, low-RAM.

Brent Ozar: Prod42…

 

Should I use Docker or Kubernetes for SQL Server development?

Brent Ozar: Joseph says, “What’s a good resource for me…” And I’m going to step back and ask a bigger question, Joseph. Should I use Docker or Kubernetes for SQL Server development environments?

Tara Kizer: I’ve never heard of Kubernetes. Is that how you pronounce it? I feel like there’s extra…

Brent Ozar: He couldn’t spell – he couldn’t write it…

Erik Darling: Well if you can’t spell it, I can’t answer it.

Brent Ozar: Richie, is there anything that would make you choose Docker or Kubernetes for a SQL Server dev environment?

Richie Rump: I don’t like either for SQL Server. I haven’t jumped onto the Docker bandwagon yet. It’s easy to install and everything, but there’s value in going through the install experience yourself and picking what things you want and whatnot, especially for a developer, you know, just getting into SQL Server. And I’m assuming you’re going to want a developer instance on each developer laptop. I guess I would go Docker, if you put a gun to my head, but I haven’t jumped onto the whole Docker thing. I’m a bare metal type guy. I work for Brent Ozar Unlimited. We do database stuff. What makes you think I wouldn’t be a bare metal guy? I mean, I think that was one of the questions, “What do you think about virtual machines?” Bare metal, it runs faster I guess, I don’t know…

Brent Ozar: This is coming from the guy who runs serverless. Not only is the OS abstracted away but the platform is. We could be running on Commodore 64s on Pluto; we wouldn’t know.

Richie Rump: Yeah, but I don’t have to worry about that, right. I mean, I don’t have to worry about spinning them up or anything, it’s just it’s there, and it’s there in less than a second, most of the time. I’ve been keeping track of it.

Brent Ozar: Joseph says he owes his “Six figure salary largely to you guys. Really appreciate what you do.”

Erik Darling: We take cash, checks, booze, steak – however you want to owe it to us, you can choose and we’ll graciously accept…

Richie Rump: Japanese whiskey.

Brent Ozar: No Docker containers. Don’t pay us in Docker containers.

Erik Darling: None of those fancy PowerShell dollar signs either. I know those tricks.

Brent Ozar: Query bucks, we print those ourselves.

 

What clients do I use for Git?

Brent Ozar: Anon Ymous follows up with, “Hey Richie, I am a noob researching Git for version control and I don’t understand what we would need. Our company has a Git repository, but what clients do our developers need? What is, like, TortoiseGit? Do they need Visual Studio Team Services too? What should I do to start this from scratch?”

Richie Rump: All you really need is Git for Windows. And I believe Phil Haack and the team at GitHub have a nice client for that. There’s also a GitHub client for Windows, built in JavaScript, that you could download that works pretty well…

Erik Darling: [crosstalk]

Richie Rump: I know, I use Sourcetree occasionally. So if you wanted a client, you could use that. But really, the best way to learn Git is through your command line. I kind of recommend that for everyone getting used to it, because then you understand the bare metal, how everything kind of works, and then you can graduate to whatever client you want to run. You don’t need Visual Studio, you don’t need anything like that. It’s all kind of right there. A tutorial that may help out is by Joel Spolsky – I believe it was for Mercurial, but Git and Mercurial are so close to one another that you could just walk through the Mercurial tutorial. He has a really great way of explaining the concepts. I forget what it’s called. Google “Joel Spolsky Mercurial tutorial” or whatever, and you’ll get it. Walk through that – it’s only a few pages – and you’ll get a better idea of how Git works and the Git workflow and whatnot, from a Mercurial standpoint.
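If you want a command-line warm-up along those lines, here’s a minimal sketch – it builds a throwaway repo, so nothing you do in it matters:

```shell
# A throwaway repo to practice the basic workflow (run in any scratch folder):
mkdir git-sandbox && cd git-sandbox
git init                                # create a new repository
git config user.email "you@example.com" # commit identity (if not set globally)
git config user.name "Your Name"
echo "hello" > readme.txt
git add readme.txt                      # stage the file
git commit -m "first commit"            # record a snapshot
git checkout -b experiment              # create and switch to a new branch
echo "a change" >> readme.txt
git commit -am "tweak readme"           # stage tracked changes and commit
git checkout -                          # jump back to the previous branch
git merge experiment                    # bring the branch's commits in
git log --oneline                       # two commits in the history
```

Delete the git-sandbox folder when you’re done; everything lives inside it.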

Brent Ozar: It’s so weird too – distributed version control is so different from what I used to deal with, with Visual SourceSafe and Team Foundation and all that. It’s very different. And good, I like it.

Richie Rump: Yeah, it’s so much better. I mean, I’ll take this for an example. Yesterday, Brent made a check-in and there was a conflict on the merge – no, it was a fine change. There was a conflict because I had also changed the same file. I went to GitHub and said here, let me go ahead and merge it. I did the manual merge; just deleted what was wrong and just kept what was right and hit submit and boom, everything was there. But he ran it on his own machine, I ran it on mine, all the tests passed, everything looked good; just when we kind of merged it together, that’s when kind of this special sauce of the distributed – everything kind of works where you’re at, and when we kind of put it all together, that’s when some of the problems happen. But for us, 90% of the time, 95% of the time, no merge. Everything works great.

Erik Darling: Everything works great because there’s no merge.

Brent Ozar: There’s no conflict, yeah.

Richie Rump: No, it does the merge but there’s no conflicts, right. There’s no conflicts.

Brent Ozar: Well also, 95% of the time you’re the only one doing the development.

Richie Rump: Hey, I’ve messed myself up. I’ve left multiple branches in, you know, because I’m working on different bugs at the same time and, you know, you collide with yourself. It happens.

Erik Darling: I mean, my only experience with it is GitHub Desktop, and I was surprised at how easy that made it – like the interface is really intuitive. Like I had to go into someone else’s pull request and do stuff, and I was able to do that really easily, just like a couple of dropdowns and there I was and I could fix things. Because I’m dumb, like, if you want to think about a target audience of the lowest common denominator you need to explain things to with pretty charts and drawings and stuff, I’m it. So if I can figure out a user interface, I’m amazed. I’m like, wow, that’s…

Richie Rump: GitHub for Windows is – GitHub Desktop I guess they call it now – but it’s really improved since it originally came out. It’s a lot, lot, lot better.

 

Should I shrink the database regularly?

Brent Ozar: Sree asks, “If we have tables where millions of rows are inserted and then deleted, like loading happening, do we need to shrink the database, or should we just update stats and not play around with the used space?”

Richie Rump: Shrink everything.

Tara Kizer: I mean, how often are you doing this? Inserting certainly won’t be shrinking it, but deleting – a one-time shrink, where you’re purposefully deleting a lot of data once to recover space, I’m okay with, but not regularly. It sounds like this is more of a scheduled process. I would not be doing shrinks, for sure. I never, ever schedule shrinks, ever.

Brent Ozar: Why don’t you? What’s the bad thing about scheduling shrinks? What does it do in terms of performance?

Tara Kizer: Well, I mean, the shrink can cause blocking, but that’s not really the reason why I wouldn’t do it. It’s just because I’m going to need that space again, you know. It also causes fragmentation and, you know, there’s just no reason for it. When you have to auto-grow back out, that’s an expensive operation – especially if it’s the log file, because you don’t get instant file initialization even if you have that set up; it has to be zero-initialized. So if your auto-growth is 1GB, there’s going to be a pause. It’s probably going to do it fast, unless you’re on some kind of slow I/O backend.

Richie Rump: Slow SAN…

Tara Kizer: Slow SAN.

Brent Ozar: We know about this week.

 

What causes excessive memory grants?

Brent Ozar: Steve says, “Any general comments about what causes excessive memory grants?”

Erik Darling: Excessive memory.

Brent Ozar: Having too much of it lying around, you just end up giving it away.

Tara Kizer: Varchar MAX.

Erik Darling: Yeah, so the two main things in SQL Server that will cause a query to get a memory grant are sorts and hashes. So if you need to sort data because you don’t have an index that supports the sort order, you get that. Or if SQL Server chooses a hash join or a hash match, it will request memory to do that. Tara is absolutely spot on; one of the things that impacts the size of the memory grant is the size of the data that needs to be processed via the sort or the hashing operation. So if you’re thinking about LOBs or string data, that will certainly impact the memory grant. The number of rows that have to be sorted will impact the memory grant, and all sorts of other good stuff. Poor cardinality estimates can certainly contribute to that, but most of the time it’s just something that you need to fix by tuning your query or your indexes.
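A quick way to see grants in action, sketched against the Stack Overflow database (object names assumed):

```sql
-- Sorting a column with no supporting index forces a memory grant;
-- the declared column size, not the actual data, drives how big it is:
SELECT TOP (1000) DisplayName
FROM dbo.Users
ORDER BY DisplayName;

-- For currently running queries, compare requested vs. granted memory:
SELECT session_id, requested_memory_kb, granted_memory_kb
FROM sys.dm_exec_query_memory_grants;
```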

Brent Ozar: Alright, and that’s it for this week’s Office Hours. Thanks everybody for hanging out with us. And we will see y’all next week; adios.


I Most Certainly Do Have A Join Predicate

Execution Plans
15 Comments

FREAK OUT

You wrote a query. You joined tables.

You have the right ON clause.

You have the right WHERE clause.

But the query plan has a problem!

BugBurg

How Could This Happen To Me?

Oh, relax. You’re not crazy. You just assumed the worst.

Like me whenever I feel pain near my liver.

Call the mortician.

For a query like this, the optimizer can play some tricks.

One of our join columns is in the where clause, too.
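The query shape being described looks roughly like this (a sketch against the Stack Overflow database; table and column names assumed):

```sql
SELECT u.DisplayName, p.Score
FROM dbo.Users AS u
JOIN dbo.Posts AS p
  ON p.OwnerUserId = u.Id   -- there's clearly a join predicate here...
WHERE u.Id = 1144035;       -- ...but the join column is also filtered
```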

That means our plan looks like this!

Lemons!

You see, when the optimizer looks at the join and the where, it knows that if it pushes the predicate to the two index seeks, whatever values come out will match.

I don’t think it even needs the join at that point, but hey.

It certainly doesn’t need the warning.

Thanks for reading!

Brent says: Erik’s like that dying replicant at the end of Blade Runner. He’s seen things you people wouldn’t believe. Also, he has superhuman strength.


Troubleshooting Parameter Sniffing Issues the Right Way: Part 3

In part 1 and part 2 of this series, I showed you the various ways to get the application’s SET options so that you can get the same execution plan as the application. If you skip that step and run the query that is timing out in the application, it could be fast in SSMS. Your SET options must match the application’s in order to reproduce the slowness users are seeing.

Once you’ve reproduced the slowness, it’s time to start testing for a workaround or a solution. You need to be aware of the density vector issue when testing.

What is the Density Vector?

When you use a local variable in a query, SQL Server cannot use the statistics histogram. The histogram is where SQL Server has estimates for how many rows are equal to a value, how many rows are between two values, etc.

What does SQL Server use to estimate how many rows there are if it can’t use the histogram? It uses the number of rows in the index multiplied by the distribution of the values, aka the density vector.
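The two shapes being compared look roughly like this (a sketch; the Stack Overflow database’s Posts table is used in the examples below):

```sql
-- Literal value: the optimizer can check the histogram for 1144035.
SELECT COUNT(*)
FROM dbo.Posts
WHERE OwnerUserId = 1144035;

-- Local variable: the value is unknown at optimization time,
-- so the optimizer falls back to the density vector.
DECLARE @UserId int = 1144035;
SELECT COUNT(*)
FROM dbo.Posts
WHERE OwnerUserId = @UserId;
```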

Let’s look at an example that uses the histogram.

For UserId=1144035, there are 40,326 estimated rows to be returned from the ix_OwnerUserId index on Posts.

Let’s look at the histogram for the index that was used, ix_OwnerUserId.

RANGE_HI_KEY=1144035 has EQ_ROWS=40326. This user has answered so many questions on Stack that it has its own row (step) in the histogram.

Now let’s look at the estimate when we use a local variable.

We now have an estimate of 10.7218 rows for pretty much the same query. It was 40,326 before. That’s quite a discrepancy.

Where did it get 10.7218 from? This used the density vector instead of the histogram!

Calculate “number of rows in the index multiplied by the distribution of the values”.

10.7218!
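If you want to reproduce the math yourself, a sketch (object names follow the examples above):

```sql
-- The second result set's "All density" column holds the density vector:
DBCC SHOW_STATISTICS ('dbo.Posts', ix_OwnerUserId) WITH DENSITY_VECTOR;

-- Multiply the All density value for OwnerUserId by the table's row count:
SELECT SUM(p.rows) AS TableRows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.Posts')
  AND p.index_id IN (0, 1);
```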

Why does this matter?

SQL Server chose a different execution plan for the local variable than it did for the stored procedure even though both used @UserId = 1144035.

You aren’t comparing apples to apples when troubleshooting parameter sniffing issues in production if the source query is parameterized, like a stored procedure, and you are testing with a local variable.

For more information on this topic, check out Benjamin Nevarez’s post about OPTIMIZE FOR UNKNOWN. You might be confused why I’m directing you to a post about OPTIMIZE FOR UNKNOWN. Using a local variable is equivalent to using OPTIMIZE FOR UNKNOWN. Both use the density vector.

Avoiding the Density Vector

How do we avoid the density vector when we are actively troubleshooting a parameter sniffing issue in production?

The easiest way is to use a temporary stored procedure.

Keep in mind that this object only exists in the session/query window you created it in, so you’ll need to execute it there too.

To make your life easier, use WITH RECOMPILE when you execute it to avoid having to remove the plan from cache each time you make a change or want to test which plan you get for various parameter values.
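Put together, a temporary stored procedure for this example might look like the following (procedure name and body are mine):

```sql
-- A temp procedure (# prefix) sniffs parameters just like the real one would
CREATE PROCEDURE #GetPostsByUser
    @UserId int
AS
SELECT *
FROM dbo.Posts
WHERE OwnerUserId = @UserId;
GO

-- WITH RECOMPILE: no plan is cached, so each test gets a fresh compile
EXEC #GetPostsByUser @UserId = 1144035 WITH RECOMPILE;
```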

Mission Accomplished

My goal with this three-part series was to show you how to troubleshoot parameter sniffing issues the right way. There are plenty of blog posts on this already, but since becoming a consultant it has become very apparent to me that many people, maybe even most, don’t know that they are doing it wrong. I’m not saying that you are doing it wrong, but many are.


Troubleshooting Parameter Sniffing Issues the Right Way: Part 2

In part 1 of this series, I showed you how to get the compiled parameter values for an execution plan that you are investigating. You’ve identified it as a bad execution plan. You’ve already fixed the production emergency by removing the bad plan from the plan cache. You’re almost ready to start testing for a workaround or a solution to avoid this issue in the future.

Before you start testing, you have to get your SSMS window set to the same options as the application.

Getting the App’s SET Options

Most applications use a database driver that has ARITHABORT set to off. SSMS has ARITHABORT set to on. Because of this difference, you get a different execution plan than what the application is using. This is why you could be troubleshooting a performance issue in production where the query is fast for you in SSMS, yet it is timing out when the application runs the same query. Erland has a really good article on this: Slow in the Application, Fast in SSMS?

It’s easy to fix: just get your SSMS SET options the same as the application. For most applications, all you need to do is set ARITHABORT to off.

But I can never remember which setting is different, so I always check which options are being used by an application session and then do the same in my SSMS query window.

What are the different ways we can check which settings are being used by a session?

There are easy ways to see your own current settings, such as via DBCC USEROPTIONS, but this post is about getting the settings the application is using.

Various DMVs

The SET options can be queried via various DMVs: sys.dm_exec_sessions, sys.dm_exec_requests and sys.dm_exec_plan_attributes.

sys.dm_exec_sessions and sys.dm_exec_requests have the data in bit columns, one column per setting.

SET options in sys.dm_exec_requests and sys.dm_exec_sessions
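For example, to check another session’s settings (session id 53 is a placeholder for the application’s session):

```sql
SELECT s.session_id,
       s.ansi_nulls,
       s.ansi_padding,
       s.ansi_warnings,
       s.arithabort,
       s.quoted_identifier,
       s.concat_null_yields_null
FROM sys.dm_exec_sessions AS s
WHERE s.session_id = 53;  -- the app session you're inspecting
```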

sys.dm_exec_plan_attributes has the data in a single column called set_options. You have to use bitwise operators to figure out which ones are set.

SET options in sys.dm_exec_plan_attributes
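A sketch of decoding it, using the bit values from the sys.dm_exec_plan_attributes documentation (ARITHABORT = 4096 is the one that usually differs between SSMS and applications):

```sql
SELECT cp.plan_handle,
       pa.value AS set_options,
       CASE WHEN CONVERT(int, pa.value) & 4096 = 4096
            THEN 'ON' ELSE 'OFF' END AS arithabort
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'set_options';
```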

The way it is formatted does not give me a quick copy/paste script. A quick Google search didn’t lead me to an already-built script (drop a comment if you have/find one). Even if I could find a script, I may not always have access to it or be able to find it. I need something quick when I’m working on a production issue.

sp_WhoIsActive

sp_WhoIsActive provides the info too in the additional_info column, but it’s in XML format.

SET options in sp_WhoIsActive

sp_BlitzCache

If you scroll all the way to the right in the sp_BlitzCache output, you’ll see a “SET Options” column.

There are a few problems with this output:

  1. It’s not clickable like it is in sp_WhoIsActive. You can copy/paste the output, but it’s not in a format that you can just run.
  2. It’ll only show you the options that are on. You’d have to know which ones to turn off.
  3. You’re not sure if this is the plan that the application is using, though the “# Executions” would give you a clue.

It’s in the execution plans too!

If you’ve got the execution plan that the application is using, you can see the SET options in the plan’s properties. Click on the root operator and either hit F4 or go to Properties from the right-click menu. Expand the “Set Options” section.

SET options in an execution plan

 

Extended Event Session

There are two events in an XE session that contain the data: login and existing_connection. The trick is that you have to enable collect_options_text to see the data, otherwise the options_text column is blank.

SET options in an XE session
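A minimal session definition (the session name and ring_buffer target are my choices):

```sql
CREATE EVENT SESSION capture_set_options ON SERVER
ADD EVENT sqlserver.existing_connection
    (SET collect_options_text = (1)),
ADD EVENT sqlserver.login
    (SET collect_options_text = (1))
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION capture_set_options ON SERVER STATE = START;
```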

SQL Profiler/Trace

In my opinion, using SQL Profiler is the easiest and possibly the fastest way to get the SET options. The data is available in the “Audit Login” and “ExistingConnection” events. I always just use the ExistingConnection event so that I don’t have to wait for a new session to connect. I start the trace and then immediately stop it.

You can also do a server-side trace, SQL Trace, but that just complicates things since I’d need to output to a file and then read it.

SET options in Profiler

Why not just change SSMS?

You could modify your SSMS settings to match the application’s, but I don’t ever do that. An application could be using different options than the default for the database driver it is using. Plus, changing SSMS wouldn’t give me the right options for execution plans from an Agent job, SQLCMD or bcp. Those have QUOTED_IDENTIFIER set to OFF.

So many different ways to check SET options

I’m sure I’m missing some ways to check the SET options of a session. Leave a comment with what I’m missing. Maybe you can even convince me not to use Profiler to get this info but probably not! PROFILER 4 LYFE!

For more information on troubleshooting Parameter Sniffing issues, check out Brent’s post.

What’s Next?

You still aren’t quite ready to start testing yet. Next up: Avoiding the Density Vector.


Troubleshooting Parameter Sniffing Issues the Right Way: Part 1

Many of you are troubleshooting Parameter Sniffing issues the wrong way in production. In this three-part series, I’ll show you how to troubleshoot it the right way:

  1. Getting the Compiled Parameter Values
  2. Getting the App’s SET Options
  3. Avoiding the Density Vector

This is part 1 in the series.

Parameter Sniffing 101

When a query suddenly degrades in performance, my first thought is always a bad execution plan. If it’s a query that gets executed frequently, there’s usually higher CPU utilization when there’s a bad plan in the plan cache.

You’ll know if that’s the case if the production emergency goes away after you’ve removed the bad plan from the plan cache. But wait! Before you take corrective action on the problematic query, save the plan. You are going to need that plan to continue troubleshooting after you have removed it from the plan cache.

There are various ways to get the plan for a query. I’m going to use sp_BlitzCache.

We’ve got two rows here as sp_BlitzCache has info about the stored procedure and the one statement inside it. Click on the “Query Plan” value for either row. Save it as a sqlplan file.

Now that you’ve saved the plan, you can fix the production emergency by removing the plan from the plan cache. If it’s a stored procedure that I’m dealing with, then I’ll use sp_recompile as I’m old school like that. Otherwise, I’ll use DBCC FREEPROCCACHE(&lt;sqlhandle&gt;) or DBCC FREEPROCCACHE(&lt;planhandle&gt;).
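For example (the procedure name is hypothetical; paste your own handle into FREEPROCCACHE):

```sql
-- Old school: flag the procedure so its plan is recompiled on next run
EXEC sp_recompile 'dbo.GetPostsByUser';

-- Or surgically evict one plan using the handle you got from sp_BlitzCache:
-- DBCC FREEPROCCACHE (0x06000500...);  -- placeholder, use your real handle
```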

If the production emergency stops, you know you’ve found a bad execution plan and now need to see what can be done to avoid this in the future.

You can quickly get the stored procedure code by right-clicking on the plan anywhere that’s white (not an operator) and selecting “Edit Query Text…”

You may see that the query text is truncated if your query is obnoxiously long. Mine is short, so we get the whole thing.

Getting the Compiled Parameter Values

The next step is getting the parameter values that were used when the query was compiled.

Right-click in the white area of the plan and select “Show Execution Plan XML…”

Scroll to the bottom and locate the value for ParameterCompiledValue.

When this query was compiled, the stored procedure was executed with @UserId = 26615. The execution plan was optimized for @UserId = 26615.

When the execution plan is optimized for @UserId = 26615, does it perform well for other @UserId values? If the answer is no, then this one is considered a bad execution plan. It was good for @UserId = 26615 though.

You can also use sp_BlitzCache to get the compiled values.

Click on the “Cached Execution Parameters” value for either row.

What’s Next?

You aren’t quite ready to start testing yet. Next up: Getting the App’s SET Options.


How to Get a Random Row from a Large Table

T-SQL
23 Comments

Method 1, Bad: ORDER BY NEWID()

Easy to write, but it performs like hot, hot garbage because it scans the entire clustered index, calculating NEWID() on every row:
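The NEWID() version, for reference:

```sql
SELECT TOP (1) *
FROM dbo.Users
ORDER BY NEWID();  -- computed for every row, then the whole table is sorted
```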

The plan with the scan

That took 6 seconds on my machine, going parallel across multiple threads, using tens of seconds of CPU for all that computing and sorting. (And the Users table isn’t even 1GB.)

Method 2, Better but Strange: TABLESAMPLE

This came out in 2005, and has a ton of gotchas. It’s kinda picking a random page, and then returning a bunch of rows from that page. The first row is kinda random, but the rest aren’t.
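The basic form (1 percent is an arbitrary sample size):

```sql
SELECT *
FROM dbo.Users
TABLESAMPLE (1 PERCENT);
```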

The plan looks like it’s doing a table scan, but it’s only doing 7 logical reads:

The plan with the fake scan

But here’s the results – you can see that it jumps to a random 8K page and then starts reading out rows in order. They’re not really random rows.

Random like mafia lottery numbers

You can use the ROWS sample size instead, but it has some rather odd results. For example, in the Stack Overflow Users table, when I said TABLESAMPLE (50 ROWS), I actually got 75 rows back. That’s because SQL Server converts your row size over to a percentage instead.
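The ROWS form that produced the odd counts:

```sql
-- Asking for 50 rows is converted to a percentage internally,
-- so the actual number of rows returned will vary
SELECT *
FROM dbo.Users
TABLESAMPLE (50 ROWS);
```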

Method 3, Best but Requires Code: Random Primary Key

Get the top ID field in the table, generate a random number, and look for that ID. Here, we’re sorting by the ID because we wanna find the top record that actually exists (whereas a random number might have been deleted.) Pretty fast, but is only good for a single random row. If you wanted 10 rows, you’d have to call code like this 10 times (or generate 10 random numbers and use an IN clause.)
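A sketch of the technique, assuming the Stack Overflow Users table with a clustered primary key on Id:

```sql
SELECT TOP (1) u.*
FROM dbo.Users AS u
WHERE u.Id >= (SELECT CEILING(RAND() * MAX(Id)) FROM dbo.Users)
ORDER BY u.Id;  -- top existing Id at or above the random number
```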

The execution plan shows a clustered index scan, but it’s only grabbing one row – we’re only talking 6 logical reads for everything you see here, and it finishes near instantaneously:

The plan that can

There’s one gotcha: if the Id has negative numbers, it won’t work as expected. (For example, say you start your identity field at -1 and step -1, heading ever downwards, like my morals.)

Method 4, OFFSET-FETCH (2012+)

Daniel Hutmacher added this one in the comments:

And said, “But it only performs properly with a clustered index. I’m guessing that’s because it’ll scan for (@rows) rows in a heap instead of doing an index seek.”
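His approach looked roughly like this (variable names are mine):

```sql
DECLARE @rows bigint = (SELECT COUNT(*) FROM dbo.Users);
DECLARE @skip bigint = ABS(CHECKSUM(NEWID())) % @rows;

SELECT *
FROM dbo.Users
ORDER BY Id
OFFSET @skip ROWS FETCH NEXT 1 ROWS ONLY;
```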

Bonus Track #1: Watch Us Discussing This

Ever wonder what it’s like to be in our company’s chat room? This 10-minute Slack discussion will give you a pretty good idea:

Spoiler alert: there was not. I just took screenshots.

Bonus Track #2: Mitch Wheat Digs Deeper

Want an in-depth analysis of the randomness of several different techniques? Mitch Wheat dives really deep, complete with graphs!


[Video] Office Hours 2018/02/28 (With Transcriptions)

This week, Erik, Tara, and Richie discuss cloning databases without 3rd party software, tips for upgrading from 2005 to 2017, archiving databases, partitioning databases, version control, SQL Server settings, SSRS, moving database from stand-alone server to an AG cluster, limiting memory use at the query level, and mirroring.

Here’s the video on YouTube:

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 2-28-18

 

How do I clone production with less data?

Tara Kizer: Alright, we’ll start off with [Narav’s] question. He has a production database that’s 500GB. He wants to create a clone database with limited data size for developer and testers in local environment. He wants to know how he can achieve this. “No third-party software allowed in my company.”

Erik Darling: Oh boy, no third-party software. It’s almost like they want you to recreate the wheel every day; what an awful place to work. You should get a new job. But on your way to get a new job, you should go look at this blog post by Tom LaRock.

He wrote this – it shows you how to go and script out a stats-only copy of your database so that you don’t have to use a third-party tool. If you’re on a newer version of SQL Server like 2014 – or I think even 2012 with one of the service packs – you can use DBCC CLONEDATABASE to get a stats-only clone of your database for developers to use. But, at least when I checked it, there were still some security issues around that – like, if you’re okay with your developers seeing the contents of the histogram, there can still be personally identifiable information in there.

Richie Rump: Alright, so I think the key word there is developer and tester – and as a developer and tester, I’m probably going to be running some stored procedures and looking at data and doing that kind of stuff; so this doesn’t really help you that much. So my big thing here is that, one, you should have already had your database scripted out, and at least all your prime data scripted out as well. And that should be checked into some sort of source control. So if you’re not doing that, you’re not even at, kind of, base level yet.

You’re trying to get way up high on the database pyramid of needs and you need to get to the bottom, so get that first. And then when you get that, then you can start eyeing out the data that you want to move over, and that probably should be scripted out as well because, at some point, you’ll want to refresh all that data. Moving data over from production to development is usually a bad idea because production doesn’t really represent, a lot of times, that testing that you’re going to want to do. Testing, you want to do a lot of oddball things; a lot of strange things. And production data is usually pretty uniform. It’s usually pretty good because everything’s going through the application and everything kind of works the way it should. In test…

Erik Darling: You don’t see a lot of applications, do you?

Richie Rump: Listen, I see a lot of stuff, but you don’t really see a lot of that – you see this stuff in outliers in production, but the great majority of it is pretty good data because it’s gone through the process of cleaning and whatnot. So you’re on a long trek, and if you haven’t started your application doing it this manner, it’s pretty painful. But once you get there, then you can actually do the automated deployments and automated builds and automated database testing and things of that nature, and you’ll start getting really high up on that database pyramid; hierarchy of needs or whatever that you need to do. But it’s hard work, it’s all scripting out stuff. And that’s really the best thing we have right now, especially if you’re not using third-party tools.

Erik Darling: If you just need to, like, move chunks of data over, if you’re too lazy to script it out, you could use SSIS on another server and just move some new stuff over like every night so it was fresh for dev. You’re still going to run into the stuff he talks about, where data might be a little too sanitary and a little too easy, but you can always add stuff on top of that as well if you need to test some edge cases.

Richie Rump: Yeah, and if you need to automate any of those tests, those are going to continue to be changing as well and that’s not going to help at all. So yeah, there’s no easy answer to this. We’ve been trying to do this now for well over 15-20 years, and we still don’t have a good answer for it. There’s some third-party tools that kind of help, but there’s still no panacea and great answer for all this.

I feel your pain. I just went through this whole thing this last month of automating database testing and getting everything into scripts and doing all that stuff. So yeah, it’s just hard.

Erik Darling: Alright, let’s get that next question up.

 

Why don’t I see SOS_SCHEDULER_YIELD in sp_WhoIsActive?

Tara Kizer: Alright, Pablo asks, “Monitoring tool says that the principal wait on a server is SOS_SCHEDULER_YIELD with greater than 70% waits, but when I run WhoIsActive during the day I see very little activity or none. Does it mean that there’s a problem, or just low activity?”

Erik Darling: Right, so just because that makes up 70% of what you’re waiting on, doesn’t mean you’re waiting on it all that much. I don’t know what your monitoring tool is; you didn’t mention the name of the company that does the monitoring in your question. But a lot of monitoring tools are really bad at showing you what your server is actually waiting on and if it’s waiting on it a lot. So really, monitoring tool dependent, try to dig into when the server is busy and see if you are actually spending a significant amount of time waiting on anything. Otherwise, you’re just kind of sitting there staring at a bored server.

SOS_SCHEDULER_YIELD is one of those waits where it’s like, that’s cooperative scheduling. That’s SQL in the OS saying queries are going to run for a little bit, they’re going to step off, they’re going to go back on the scheduler. There’s really not a whole heck of a lot you can do to fix that, aside from faster processors or having queries run less often or other goofy things.

Tara Kizer: Yeah, and also run sp_BlitzFirst during the day when you think that you have the highest load. It might be at night that you have the greater than 70%; it’s hard to say. But run sp_BlitzFirst @ExpertMode equals one, and then @Seconds, say five minutes, so 300. Then you might see the SOS_SCHEDULER_YIELD waits there. Running WhoIsActive might not capture it because that’s just a moment in time, whereas you need to be able to have something run and then run again and then do a diff between the two result sets; like sp_BlitzFirst can do.
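That invocation looks like:

```sql
-- Samples waits and metrics over a 300-second window and diffs the results
EXEC sp_BlitzFirst @ExpertMode = 1, @Seconds = 300;
```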

Erik Darling: Yeah, it’s just when I see that as being 70% of the waits on the server, I don’t think that server’s doing a whole heck of a lot else.

 

What should I encrypt, and how?

Tara Kizer: Alright, so Rick – oh, he said he wanted to be anonymous, but just because he’s currently on SQL Server 2005. Alright, I didn’t say the whole name, so that’s alright. They’re finally upgrading to SQL 2017 from 2005. Some people are suggesting applying encryption and compression and who knows what else, “Any advice on what to stay away from and what to slowly apply?” Are you testing this stuff? I mean, why are you implementing – I like compression, but I’m not going to be implementing encryption unless I need it, because that brings up all sorts of problems.

Erik Darling: Yeah, especially around TDE – what kind of encryption? There’s like five different kinds of encryption now. Alright, we’ll come back to that one when Rick gives us some more – I mean not Rick – gives us some more information.

 

 

How can I archive data fast?

Tara Kizer: Alright, [Narav] has another question, “What is the best method to archiving databases because archiving database scripts take long time when we remove the data from the database?”

Erik Darling: Are you talking about archiving an entire database or just part of the data? I don’t know…

Tara Kizer: I imagine historical data.

Erik Darling: Okay, I like partitioning for that; anyone else?

Tara Kizer: Yeah, definitely…

Richie Rump: Yep.

Tara Kizer: Table partitioning. It’s been around a long time and you now get it with SQL 2016 Service Pack 1, right, even in Standard Edition. It’s been an Enterprise Edition feature since before that.

Erik Darling: Yeah, so like, definitions of archiving are always funny because I’m never quite sure if it’s like, we’re going to get rid of this whole database, or we’re just going to take this chunk of data and move it to a different database, or if we’re just going to delete this data. So partitioning, and even partitioned views, work out fairly well for a lot of that stuff. But if you’re just wholesale deleting data or putting it into a different database, then obviously some sort of data-moving mechanism is the best way to do it. Richie probably knows a few of those…

Richie Rump: Yeah, man. Partitioning, as much as I love it, you’ve got issues; especially when applying queries or indexes and things like that. It can definitely be an issue when you go ahead – are they coming after you? Is that me or is that you? I don’t know. But it can definitely be an issue when you’re doing some query tuning and what not and you’re trying to apply an index across the entire table, and then you do your partition swap and you can’t do it because the index is going across and they’re not aligned with your partitions. So that’s a problem with the partition swapping.

As far as why your archiving is slow, I would take a look at your indexes and try to work out why those queries are particularly slow. Is there something that you’re missing? Do some perf tuning on your archiving queries and see what’s going on there.

 

Can you combine a failover clustered instance and an AG?

Tara Kizer: Alright, New York City sounds like London, by the way. There were always sirens. It was never police; it was always ambulances. Alright, Eric asks, “Can you combine a failover cluster instance and an Availability Group so you have a failover instance with an Availability Group for reporting?” That answer is yes, but I’ve never done it so I don’t know how complicated it is. Do you have any info, Erik, on it?

Erik Darling: Yeah, me and all my Availability Group experience…

Tara Kizer: I’m the only one at the company that has done Availability Groups in production, and I’ve never combined the two but I’m fairly confident that this is possible.

Erik Darling: No, you can totally do it. We’ve had clients in the past who have combined them, but as far as, you know, HADR typography goes, it’s pretty complicated; especially if you’re going to stick a reporting layer on top of it.

Tara Kizer: As a matter of fact, Brent has done it for past clients that I know of – they’re public names, but I’m not going to mention them here, just in case there’s any kind of issue. But I know, when we do the HADR stuff with clients and we talk about Availability Groups, the material shows some pretty complicated setups that do combine the two technologies.

 

How can I integrate version control with SSIS?

Tara Kizer: Alright, Anon asks – and I think that’s anonymous – “Have any of you used version control with SQL? If so, how easy do you think it is to integrate when there is no current version control? We have a lot of SSIS packages and changes.”

Erik Darling: Richie, take it away…

Richie Rump: What kind of version control are we talking about? Yeah, I’ve used version control with pretty much every version of SQL Server that I’ve dealt with. Even just starting from just scripts, we go and check into VSS – visual source safe for those old guys – and then all the way up through TFS and now pretty much use GitHub.

Tara Kizer: Have you used Subversion though?

Richie Rump: I’ve used Subversion. I didn’t like it at all.

Tara Kizer: Yeah, we’ve used Subversion a few jobs ago and I wasn’t a fan.

Richie Rump: Yeah, there was a really old version when I first joined up; DVCS or something like that. That was a nightmare. But yeah, so I’ve used a lot of different source control kind of stuff. Yes, you can get it into SSIS. I think with SSIS, as I recall with TFS, there was an issue where only one person can really work on it at once, because the merging stuff wasn’t as easy – it’s all one big XML file [crosstalk]. And merging it just wasn’t easy at all. It’s not like working with C# text files, where merging is fairly simple. Merging everything together when you’ve got one version here from one person, another version there, and you’ve got to bring it together – that was pretty difficult, from what I recall, with SSIS.

So we had a rule that only one person can edit SSIS at one time. I’m not quite sure with the newer Git type stuff because I’ve never used Git with SSIS, if that’s still the case. But I’m assuming so because the file format still hasn’t changed and whatnot. So, I say go ahead and throw it out there. And it’s real easy to get things in, it’s just a matter of getting your team and your workflow optimized as far as getting everybody up to speed and things like that. There’s no downside to version control.

Erik Darling: Hopefully it’s not still a single-threaded process then.

Richie Rump: Oh god, TFS was, when it was first released, was the worst. It was just so bad.

 

How should I set Max Degree of Parallelism?

Tara Kizer: Alright, Tom asks – he’s currently on SQL 2016 Enterprise; 40 cores with 1TB of memory. Yay…

Erik Darling: jealous, right…

Richie Rump: I mean, who isn’t really, right.

Tara Kizer: Jealous…

Erik Darling: I’ll gladly take that off your hands [crosstalk].

Tara Kizer: “What are the best settings for max degree of parallelism?” He’s currently set to eight. And then cost threshold for parallelism, currently set to [five].

Erik Darling: Please, you spent all that money on a server and no money on training. What happened?

Tara Kizer: I like his settings, personally, but…

Erik Darling: Yeah, I’d stick with those. I wouldn’t change those one bit, unless I observed some sort of issue.

Tara Kizer: Yeah, and if I was going to change anything where I wanted to affect MAXDOP or cost threshold for parallelism, I would do it at the query level. I would keep your server settings as they are. At the SQLBits conference last week, in somebody’s session – Erin Stellato’s Query Store session – the topic of cost threshold for parallelism came up, and she asked the audience who sets theirs to 50, or some number around there; not five. You shouldn’t have it set to five. And half the people raised their hands. Then she asked who sets it to a different number based upon what you see in the workload, you know, examining the plan cache, and half the people raised their hands; the other half.

Which I’m, in my head, I’m calling BS on, because there’s no way that many people are doing this – I don’t know that you need to do that kind of work. I know that there’s a blog article that was in the Slack, in our company chat room, yesterday about setting cost threshold for parallelism to a value based upon what you see in the workload. But I’m a fan of setting it to 50 and then adjusting specific queries with the query hint or query option – whatever it is – you can change max degree of parallelism at the query level. So I don’t think you should change the server settings, because they are generally good.

Richie Rump: So you’re okay with MAXDOP at eight?

Tara Kizer: For a 40 core server.

Richie Rump: For a 40 core?

Tara Kizer: Yeah.

Erik Darling: Yeah, I’m fine with that.

Tara Kizer: I haven’t gone higher than that.

 

How can I make SSRS go faster?

Tara Kizer: Alright, Steve asks – he’s moving a batch of reports from 2014 to 2016. They were originally developed for 2008. He has serious performance problems with one report. What’s the best resource for SSRS performance issues? I wouldn’t even think that the issue is in Reporting Services. I would be suspicious of the cardinality estimator since they were developed for 2008 and you’re going from 2014 to 2016. Oh, but 2014 to 2016 is going to have the same cardinality estimator, if you upgraded your compatibility levels. So I would be looking into the compatibility level here if it’s suddenly slow. But from 2014 to 2016 – what is the database engine version? That’s what I need to know. Is it 2014 SSRS with a lower database engine version? I’d be suspicious of the cardinality estimator.

Richie Rump: Yeah, or possibly you’ve just got a bad plan in there.

Tara Kizer: Yeah. I would not be suspicious of reporting services being the culprit. I would look into performance tuning the actual query.

Erik Darling: SSRS is always a bad plan. [Git tag, low].

 

What’s the best way to move a 1TB database…

Tara Kizer: Alright, Dorian asks, “What is the best way to move a 1TB database from a standalone server to an Availability Group cluster to minimize downtime?”

Erik Darling: You ‘neek up on it… Oh wait, that’s not it.

Tara Kizer: What?

Richie Rump: Postgres.

Erik Darling: Ooh, SAN snapshots.

Tara Kizer: [crosstalk] restores and then apply transaction logs. So do a full backup and restore, a differential as you get closer, and then apply transaction logs. And you can make that switcheroo within seconds if you’re fully scripted. So make sure you’re applying transaction logs. You can use database mirroring or log shipping to do this. You don’t have to do it manually. But yes, you want to use some kind of method which allows you to do restoring of transaction logs. That way, you’re mostly up-to-date once the maintenance window hits, and then you’ve just got to apply your final transaction log.

Erik Darling: Yeah, for stuff like that, I’ve done either log shipping or mirroring. Either one is fine. I liked mirroring, just because it was a little bit easier to manage and I could keep it in asynchronous mode until I needed to switch over, then flip it to sync and boom. I’m done. I moved about, I don’t know, 15 or so terabytes in one night with that once, and that worked out well for me. So I would do that, but what do I know?

Tara Kizer: No, it’s a good solution. It’s really easy to switch over: synchronous mode, failover, and boom, done; two commands. Getting from async to sync can take a little bit of time, depending on how busy your system is.

Erik Darling: Yeah, but if you wait until you’re in a lull, you’ll generally be okay. Or if you’re on a SAN and you can take SAN snapshots or you can do something else that is, you know, a pretty quick snapshot and remount of the data – that’s sometimes pretty good too. It really depends on how busy your system is and what your maintenance windows look like. I would assume that you’d get some kind of maintenance window to flip over to a brand new set of hardware like that, but again, what do I know?

It’s a crazy world out there. I love it when people ask questions on Stack Exchange and they’re like, “I have this problem with a thing that doesn’t work” And you’re like, “Here’s a pretty easy solution.” And they’re like, “I’m not allowed to use temp tables.” Like, where do you live? What’s going on in your life? Are you on Oracle? I don’t know.

 

How can I tell what tables someone is accessing?

Tara Kizer: Alright, Ronnie asks, “Is there a way, in SQL 2012 without running profiler in the background, to identify tables that a specific login has accessed?”

Erik Darling: Audit.

Tara Kizer: Tell us more. One word isn’t sufficient.

Erik Darling: I don’t know; that’s it. SQL Server audit, you can set it to, I think, like a single login or a single set of tables or all logins or all tables. You can just figure out who’s run a select or insert or update or delete on them. I’m not promising that it’s pain-free and everything’s going to work quickly and nicely and, you know, you’re going to make a lot of friends by turning that on, but you could do it. I think that would be kind of your best shot.
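A sketch of that setup (all names, the path, and the audited table and user are hypothetical):

```sql
USE master;
CREATE SERVER AUDIT TableAccessAudit
    TO FILE (FILEPATH = 'C:\Audits\');          -- hypothetical path
ALTER SERVER AUDIT TableAccessAudit WITH (STATE = ON);
GO

USE YourDatabase;                               -- hypothetical database
CREATE DATABASE AUDIT SPECIFICATION TrackTableAccess
    FOR SERVER AUDIT TableAccessAudit
    ADD (SELECT, INSERT, UPDATE, DELETE
         ON OBJECT::dbo.SomeTable BY SomeUser)  -- hypothetical table/user
    WITH (STATE = ON);
```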

 

How are jobs handled in an Availability Group?

Tara Kizer: Alright, Tom asks, “With Always On, do agent jobs exist on all nodes in case the primary fails? Do they all run simultaneously?”

Erik Darling: 40 cores, 1TB, an AG…

Tara Kizer: Alright, so the first part is, Always On – you’re talking about Availability Groups, but that also covers failover cluster instances. So a failover cluster instance is part of Always On, and yes, the agent jobs exist in the instance and that fails over. But you’re really asking about Availability Groups here. Do the agent jobs exist on all nodes? That answer is no; you are required to maintain those other replicas. Now, what we did at my last job was pretty nifty, so we didn’t have to pay attention to all these other replicas. We had another server, which had to be licensed, of course, that was the job scheduler.

And so we used the Availability Group listener to connect to wherever the primary was. It didn’t matter. It could be in San Diego, it could be at a disaster recovery site. We used the Availability Group listener, and that follows wherever the primary goes. So that was a nifty way of getting away from having to keep all the jobs up-to-date on all the other replicas: having another server that’s just responsible for running jobs for Availability Group databases.

Erik Darling: I want to say that Brent has a post, that I’ll throw in the chat in a minute, that’s called Using SQL Agent as an App Server. Basically, it talks about a similar concept where you just have that one SQL Server that sits out there. All it does is run SQL agent and you can just point those jobs anywhere you want.

Tara Kizer: And then the thing to note, if you aren’t going to use this other server like we mentioned – because you don’t have the licenses for it – and you’re going to keep the jobs in sync across all replicas: in order to ensure that they run on failover, each job needs to have an if statement. Am I the primary replica? If so, run the job step. If not, just bail out. So they all have the exact same job schedule, they’re all enabled, they all have the same job step, but you just need that quick check – am I the primary replica? And if not, bail out.

So yes, they all will run simultaneously, but all the other replicas, all the secondaries won’t actually do any work.
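The guard step Tara describes might look like this as the first statement of each job step. The database and procedure names are hypothetical, and note that `sys.fn_hadr_is_primary_replica` shipped in SQL Server 2014, so on 2012 you would query `sys.dm_hadr_availability_replica_states` instead:

```sql
-- Run the real work only when this instance hosts the primary replica.
IF sys.fn_hadr_is_primary_replica(N'YourAGDatabase') <> 1
BEGIN
    PRINT 'Not the primary replica - exiting quietly.';
    RETURN;
END;

-- Hypothetical procedure holding the job's actual logic:
EXEC dbo.DoTheActualWork;
```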

Erik Darling: Yeah, and if you want to see a pretty good example of jobs that take those kinds of precautionary steps before they run, go look at the source code on Ola Hallengren’s scripts, because all those will do certain checks to make sure that the databases either primary and not read-only and some other stuff that makes running maintenance tasks difficult.

 

Can I limit memory use by a query?

Tara Kizer: Alright, Mahesh asks, “Is there a way to limit the memory use by query, other than resource governor? When I do sp_WhoIsActive, I see a couple of queries use a huge chunk of memory, like 23GB, and it causes resource semaphore waits for all other queries.”

Erik Darling: Ooh wee, yes. Well, if you’re on SQL Server 2012 or up – well, 2012 SP3 or up – you can use the max_grant_percent hint and you can cap, at the query level, what percentage of memory a query is allowed to use. By default, it can ask for up to 25% of max server memory, so you can use max_grant_percent to bring it down lower. If you are on a version of SQL Server prior to 2012 SP3, or you’re uncomfortable with that hint for some reason, you would just have to get in and do the necessary query and index tuning to make that query ask for a smaller memory grant.
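A hedged example of the hint Erik mentions (the table and columns are made up); it caps this one query’s grant without touching any server-level settings:

```sql
-- Cap this query's memory grant at 10% of the available grant budget:
SELECT CustomerID, SUM(Amount) AS Total
FROM dbo.Sales
GROUP BY CustomerID
ORDER BY Total DESC
OPTION (MAX_GRANT_PERCENT = 10);
```

The hint was introduced in SQL Server 2016 and back-ported to 2012 SP3 and 2014 SP2.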

Tara Kizer: And if you can’t modify the queries, you can add a plan guide on top of it to apply what he just said. That gets a little tricky.
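A hypothetical sketch of the plan guide approach Tara mentions, for a query you can’t edit. This is where it gets tricky: the `@stmt` text has to match what the application actually sends, character for character:

```sql
-- Attach the memory-grant hint to a statement we can't modify:
EXEC sp_create_plan_guide
    @name            = N'CapMemoryGrant',
    @stmt            = N'SELECT CustomerID, SUM(Amount) AS Total
FROM dbo.Sales
GROUP BY CustomerID
ORDER BY Total DESC',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = NULL,
    @hints           = N'OPTION (MAX_GRANT_PERCENT = 10)';
```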

Erik Darling: Or you could just stab him?

Tara Kizer: Or just use resource governor. I know the question says other than resource governor, but that is one of the reasons to use resource governor…

Richie Rump: That’s what it’s for.

Erik Darling: He might not be on enterprise edition.

Tara Kizer: Oh, that’s right; got you.
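For reference, the Resource Governor route the panel lands on (Enterprise Edition only) caps grants for a whole workload group rather than per query. A minimal sketch with hypothetical login and group names – the classifier function has to live in master and be schema-bound:

```sql
-- Workload group capped at a 10% memory grant per request:
CREATE WORKLOAD GROUP LimitedGrants
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 10);
GO
-- Classifier routes the memory-hungry login into that group:
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'ReportingLogin'
        RETURN N'LimitedGrants';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```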

 

When I upgrade from a standalone to an AG…

Tara Kizer: Nestor asks, “Do you switch roles for the final move or do you just remove log shipping or mirroring?” So he’s talking about upgrading from a standalone server to an Availability Group. The thing is, once you fail over or restore to the other server, you can’t go back. So yeah, you just drop them. But they are gone – there’s no log shipping in place anymore to fall back on.

Erik Darling: But I’ll usually leave that other server online for a bit, just in case I do need to say, like, abandon this upgrade – we’ll try to manage the loss of whatever data is there and we’re going to go back to the old one…

Tara Kizer: Yeah, and in the upgrades I’ve done, abandoning our upgraded server only ever happened within the maintenance window. So maybe we’re four hours into our maintenance window – maybe it was supposed to be shorter, but we kept troubleshooting, troubleshooting, troubleshooting, and then we abandoned it and went back to the old server. So there’s no data loss, because we kept the system down while we did this. Abandoning it the next day was just never part of our plan. We would always keep troubleshooting and fix the issue; whatever it took.

Erik Darling: Smoke tests are a beautiful thing.

 

Can I mirror from a cluster to something else?

Tara Kizer: Alright, Thomas asks, “I have some databases in a cluster. I’m moving them to an Availability Group. Can I set up mirroring from the cluster to one of the instances, failover when I’m ready to migrate and then add them to the availability group?” Sure.

Erik Darling: Sounds right to me.

Tara Kizer: Yes, definitely. He wasn’t sure about taking databases on the cluster and then mirroring them. I’ve done tons of mirroring with failover cluster instances and Availability Groups. They’re all separate technologies. If you’re wondering whether you could do mirroring and an Availability Group together – probably, but that wasn’t the question anyway.
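A hedged outline of that migration path, with hypothetical server, database, and AG names. The mirroring session has to be synchronous (SAFETY FULL, the default) for a manual failover:

```sql
-- 1. Seed the future AG node from the cluster (run on the AG node):
RESTORE DATABASE AppDb FROM DISK = N'\\backups\AppDb.bak' WITH NORECOVERY;

-- 2. Start mirroring: mirror side first, then the principal (cluster) side.
ALTER DATABASE AppDb SET PARTNER = N'TCP://cluster-node:5022';  -- on the mirror
ALTER DATABASE AppDb SET PARTNER = N'TCP://ag-node:5022';       -- on the principal

-- 3. At cutover, fail over (run on the principal once synchronized):
ALTER DATABASE AppDb SET PARTNER FAILOVER;

-- 4. Tear down mirroring and join the AG (run on the new primary):
ALTER DATABASE AppDb SET PARTNER OFF;
ALTER AVAILABILITY GROUP MyAG ADD DATABASE AppDb;
```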

 

In my 2-node cluster, how do transactions fail over?

Tara Kizer: Alright, some of these I’m skipping because they’re too long. Alright, so J.H. asks, “In a two-node cluster, server one is active, server two is passive. If server one gets rebooted and server two then becomes active, do all connections and transactions get persisted via clustered MSDTC or some other clustered process, or do connections and transactions get hard dropped?” You get hard dropped, because that instance is coming down. It is severed. So you need to have a mechanism in your application to retry queries once the instance comes up on the other node.

Erik Darling: Yeah, that will definitely sever whatever connections you have. If the shutdown is kind of graceful then some of them might do okay. But in general, if you have a hard stop on that server, whatever was in-flight is not going to survive.

 

Why doesn’t Richie like Subversion?

Tara Kizer: Alright, and I did see a question. I don’t remember who asked it. But someone wanted to know from Richie why he didn’t like Subversion.

Richie Rump: Personal preference; just didn’t like it. So I’ve…

Erik Darling: So what do you like?

Richie Rump: I was just about to get there. Thank you very much [crosstalk]. I’m a big Git guy these days; pretty much Git…

Tara Kizer: How can you like Git? Oh my goodness, I can’t stand it. It’s too complicated.

Erik Darling: What do you like, Tara?

Tara Kizer: Well, if I had to pick, I’m going to pick nothing. I’d rather not have source control. I’m kidding. I was okay with TFS at my last job. I thought that that was pretty fancy. Learning curve, I thought, was much lower than GitHub. I cannot – if I have to do anything in the Blitz stuff, it is going to be a very painful day. I’m going to need some Motrin; I may need a cocktail. And I have to go through the article again on the steps I have to do because it is just too many steps.

Richie Rump: I’m in it all day long, so it’s…

Tara Kizer: You guys use it more often, yeah.

Richie Rump: It’s easy for us. The other thing is, if Git is not an option or is something that you guys don’t want to go to, I like Mercurial as well. They’re very close to one another, but definitely, I would go Mercurial or Git these days. TFS is okay, but it doesn’t have that Git-ness where you can work everything locally. Everything’s kind of shared up top on the server. There’s just a lot of power when I can work on something locally, pull it down, merge it myself and then push it back up and rebase it – whatever I need to do – and everything just kind of works. The branching is so much easier than TFS. For the branching alone, Git wins for me.

Tara Kizer: Got you. Alright, that’s it. We’re all done for the day. We will see you guys next week. Bye.

Erik Darling: Time to go keep drinking.