Minimal Logging when you can’t change the code

What a gem!

Minimal logging is super cool. If you follow all the rules, there’s a pretty good chance it will work! Unfortunately, it’s not always up to you. All sorts of ISVs put out all sorts of bad ideas in the form of code. Sort of like in Ghostbusters, when Gozer tells the crew to choose the form of the Destructor, and Ray thinks of the Stay-Puft Marshmallow Man. Except in your case, the form of your Destructor is a flat file bulk loading process that you can’t change any part of.

So here’s the test setup. Usual caveats about doing this in development, alright? Deal? Deal!

Create a table and dump a million rows of crap into it. A million rows is a good starting place for table sizes. I’m leaving it as a HEAP here, because this is one of the few occasions I’m cool with people having HEAPs. Nothing is faster for dumping data into.
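The original setup script isn’t reproduced here, but it looks something like this. The table and column names are my own invention; the important part is that there’s no clustered index, so it stays a heap.

```sql
/* Hypothetical test table -- note the lack of a clustered index: it's a heap. */
CREATE TABLE dbo.MillionRows
(
    Id       INT          NOT NULL,
    SomeDate DATETIME     NOT NULL,
    SomeText VARCHAR(100) NOT NULL
);

/* Dump a million rows of crap into it with a cross join for row generation. */
INSERT dbo.MillionRows WITH (TABLOCK)
    (Id, SomeDate, SomeText)
SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 1000, '20170101'),
    REPLICATE('X', 50)
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;
```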

Good ol’ HEAPs. Everything is faster when you don’t care.

Since we need to test a bulk process, we’re using BCP. If you’re cool with xp_cmdshell, you can run this. If not, you’re gonna have to crack open a command prompt and run the bcp commands on their own. You’ll probably have to change the file destinations and server names, unless you hacked into my laptop.

This gives us a comma delimited file of all our table data, and a format file to feed to the bulk insert process. You don’t need the format file for this to work, I just prefer to use them when possible. You can do all sorts of cool stuff with format files!
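The bcp calls go roughly like this, via xp_cmdshell. The server name, database name, and file paths are placeholders; swap in your own (or run the quoted commands from a command prompt instead).

```sql
/* Export the table data as a comma delimited file.
   Server, database, and paths here are all placeholders. */
EXEC xp_cmdshell 'bcp TestDB.dbo.MillionRows out "C:\temp\MillionRows.csv" -S localhost -T -c -t,';

/* Generate a format file describing the same layout, for the bulk insert later. */
EXEC xp_cmdshell 'bcp TestDB.dbo.MillionRows format nul -S localhost -T -c -t, -f "C:\temp\MillionRows.fmt"';
```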

If we had full control, we could very easily load data in through BULK INSERT like this:
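Something along these lines, with made-up paths matching the export above:

```sql
/* Paths and table name are placeholders -- the shape of the statement
   is what matters. TABLOCK is the key ingredient for minimal logging. */
BULK INSERT dbo.MillionRows
FROM 'C:\temp\MillionRows.csv'
WITH
(
    FORMATFILE = 'C:\temp\MillionRows.fmt',
    BATCHSIZE  = 1000000,
    TABLOCK
);
```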

That takes about 16 seconds on my laptop, dumping in a million row batch from the file we output. If you’re working with more rows, you may want to break the batch size down into smaller chunks.

Do anything for TABLOCK

The magic happens with the TABLOCK option. Without TABLOCK specified, this runs for around 40 seconds. That’s a bit more than double the time it takes when we use TABLOCK and minimal logging kicks in.

But of course, we’re working with a file loading GUI, or something that just doesn’t let us make those kind of changes. So what do we do?

There’s a silly old system stored procedure out there that lets us change certain default options for tables. It is called, shockingly, sys.sp_tableoption. How does it help us here? One of the options is “table lock on bulk load”. That would have been more dramatic if I didn’t give you the link first, huh?
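Setting it looks like this (the table name is my placeholder again):

```sql
/* Make TABLOCK-style behavior the default for bulk loads into this table. */
EXEC sys.sp_tableoption
    @TableNamePattern = N'dbo.MillionRows',
    @OptionName       = 'table lock on bulk load',
    @OptionValue      = 'ON';
```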

This buys us just about the same real estate as if we used TABLOCK in the BULK INSERT statement. Here are the run times for each!

No TABLOCK:

TABLOCK:

No TABLOCK, table option set:

You don’t have to be an internals wizard

When minimal logging works, the difference in timing and I/O is quite apparent. There are undocumented functions you can use to further prove your point, just don’t run them unless you’re on SQL Server 2014, or 2012 SP2 or higher. On all the other versions, there’s a bug where the function creates threads that won’t go away until you restart. So, yeah. Don’t do that. Unless you’re into thread starvation.
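As a sketch, and with the version caveat above firmly in mind: counting transaction log records before and after each load makes the difference obvious. This uses the undocumented sys.fn_dblog, so treat it as a diagnostic toy, not production code.

```sql
/* Undocumented function -- don't run this on versions with the thread bug.
   A minimally logged load produces far fewer log records than a fully logged one. */
SELECT COUNT(*) AS log_record_count
FROM sys.fn_dblog(NULL, NULL);
```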

If you’re a customer, this is a good way to prove to your vendor that minimal logging is a cool thing to get working. Lots of applications rely on flat file loading. I’m a big proponent of using staging databases that sit in simple recovery model for stuff like this, so you’re not whomping your client-facing database, and you don’t have to switch it from full to bulk logged to use minimal logging.
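A staging database like that is a one-liner to set up. The database name is, of course, whatever you like:

```sql
/* Hypothetical staging database, kept in SIMPLE recovery so bulk loads
   can be minimally logged without touching the client-facing database. */
CREATE DATABASE StagingDB;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;
```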

Thanks for reading!


4 Comments

  • Beware: A failure to use CHECK_CONSTRAINTS in conjunction with pre-SQL Server 2012 Mirroring will make your secondary assert that its tlog is missing an entry for a (parent system) lock on the checked table (and the Secondary will have to be rebuilt from scratch). See https://support.microsoft.com/en-us/kb/2700641.

  • Interesting. In my lab tests, it seems that while each bulk insert statement took significantly less time, total execution time of the query was actually longer (approximately 20% longer). So I agree TABLOCK works as you state, but there must be some overhead somewhere that I cannot account for.

  • Lock escalation fires automatically when the batch size is 5000 and above, so there’s no need.
