What a gem!
Minimal logging is super cool. If you follow all the rules, there’s a pretty good chance it will work! Unfortunately, it’s not always up to you. All sorts of ISVs put out all sorts of bad ideas in the form of code. Sort of like in Ghostbusters, when Gozer tells the crew to choose the form of the Destructor, and Ray thinks of the Stay-Puft Marshmallow Man. Except in your case, the form of your Destructor is a flat file bulk loading process that you can’t change any part of.
So here’s the test setup. Usual caveats about doing this in development, alright? Deal? Deal!
Create a table and dump a million rows of crap into it. A million rows is a good starting place for table sizes. I’m leaving it as a HEAP here, because this is one of the few occasions I’m cool with people having HEAPs. Nothing is faster for dumping data into.
Good ol’ HEAPs. Everything is faster when you don’t care.
USE [Sample]
SET NOCOUNT ON

IF OBJECT_ID('dbo.BulkLoadTest') IS NOT NULL
    DROP TABLE [dbo].[BulkLoadTest];
GO

CREATE TABLE [dbo].[BulkLoadTest]
(
    [ID] [bigint] IDENTITY(1, 1) NOT NULL,
    [DumbGUID] [uniqueidentifier] NOT NULL,
    [PO] [varchar](9) NOT NULL,
    [OrderDate] [date] NOT NULL,
    [ProcessDate] [date] NOT NULL,
    [ShipDate] [date] NOT NULL,
    [SalesPersonID] [int] NULL,
    [CustomerID] [int] NOT NULL,
    [SalesOrder] [varchar](10) NOT NULL,
    [PurchaseOrder] [varchar](10) NOT NULL,
    [OrderSubTotal] [numeric](18, 2) NOT NULL,
    [OrderTax] [numeric](18, 2) NOT NULL,
    [OrderShipCost] [numeric](18, 2) NOT NULL,
    [SalesNotes] [nvarchar](MAX) NULL,
    [isBit] [int] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

;WITH E1(N) AS
(
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL
    UNION ALL
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL
),
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
INSERT [dbo].[BulkLoadTest] WITH (TABLOCK)
(
    [DumbGUID], [PO], [OrderDate], [ProcessDate], [ShipDate],
    [SalesPersonID], [CustomerID], [SalesOrder], [PurchaseOrder],
    [OrderSubTotal], [OrderTax], [OrderShipCost], [SalesNotes], [isBit]
)
SELECT
    NEWID() AS [DumbGUID],
    SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) AS [PO],
    CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE())) AS [OrderDate],
    CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE() + 1)) AS [ProcessDate],
    CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE() + 3)) AS [ShipDate],
    ABS(CHECKSUM(NEWID()) % 100) + 1 AS [SalesPersonID],
    ABS(CHECKSUM(NEWID()) % 100000000) + 1 AS [CustomerID],
    'S' + SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) AS [SalesOrder],
    'P' + SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) AS [PurchaseOrder],
    ABS(CONVERT(NUMERIC(18, 2), (CHECKSUM(NEWID()) % 50000.50))) AS [OrderSubTotal],
    ABS(CONVERT(NUMERIC(18, 2), (CHECKSUM(NEWID()) % 100.99))) AS [OrderTax],
    ABS(CONVERT(NUMERIC(18, 2), (CHECKSUM(NEWID()) % 500.88))) AS [OrderShipCost],
    CASE WHEN [N].[N] % 19 = 0 THEN REPLICATE(CONVERT(NVARCHAR(MAX), 'BOU'), 8000)
         WHEN [N].[N] % 17 = 0 THEN NULL
         ELSE REPLICATE(CAST(NEWID() AS NVARCHAR(MAX)), CEILING(RAND() / 10 + 1))
    END AS [SalesNotes],
    CASE WHEN [N].[N] % 17 = 0 THEN 1 ELSE 0 END AS [isBit]
FROM [Numbers] [N]
ORDER BY [N] DESC;
Since we need to test a bulk process, we’re using BCP. If you’re cool with xp_cmdshell, you can run this. If not, you’re gonna have to crack open a command prompt and run the bcp commands on their own. You’ll probably have to change the file destinations and server names, unless you hacked into my laptop.
EXEC xp_cmdshell 'bcp Sample.dbo.BulkLoadTest out C:\temp\blt.csv -w -t, -S localhost\NADA2014 -T'

EXEC xp_cmdshell 'bcp Sample.dbo.BulkLoadTest format nul -w -t, -f C:\temp\blt.fmt -S localhost\NADA2014 -T'
This gives us a comma delimited file of all our table data, and a format file to feed to the bulk insert process. You don’t need the format file for this to work, I just prefer to use them when possible. You can do all sorts of cool stuff with format files!
If we had full control, we could very easily load data in through BULK INSERT like this:
BULK INSERT dbo.BulkLoadTest
FROM 'C:\temp\blt.csv'
WITH
(
    BATCHSIZE = 1000000,
    FORMATFILE = 'C:\temp\blt.fmt',
    MAXERRORS = 2147483647,
    ERRORFILE = 'C:\temp\blt.errlog',
    TABLOCK
)
That takes about 16 seconds on my laptop, dumping in a million-row batch from the file we output. If you’re working with more rows, you may want to break the batch size down into smaller chunks.
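If you do go the smaller-chunks route, the only change is BATCHSIZE; the 100,000 here is an arbitrary example for illustration, not a recommendation:

```sql
BULK INSERT dbo.BulkLoadTest
FROM 'C:\temp\blt.csv'
WITH
(
    BATCHSIZE = 100000,  -- commit every 100k rows instead of one giant batch
    FORMATFILE = 'C:\temp\blt.fmt',
    MAXERRORS = 2147483647,
    ERRORFILE = 'C:\temp\blt.errlog',
    TABLOCK
);
```

Each batch is its own transaction, so a failure partway through only rolls back the current batch rather than the whole load.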
Do anything for TABLOCK
The magic happens with the TABLOCK option. With TABLOCK specified, the load is minimally logged; without it, the same load runs for around 40 seconds, a bit more than double the time.
But of course, we’re working with a file loading GUI, or something that just doesn’t let us make those kind of changes. So what do we do?
There’s a silly old system stored procedure out there that lets us change certain default options for tables. It is called, shockingly, sys.sp_tableoption. How does it help us here? One of the options is “table lock on bulk load”. That would have been more dramatic if I didn’t give you the link first, huh?
EXEC [sys].[sp_tableoption]
    @TableNamePattern = N'dbo.BulkLoadTest',
    @OptionName = 'table lock on bulk load',
    @OptionValue = 'ON'
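If you want to confirm the option took, sys.tables exposes it as the lock_on_bulk_load column:

```sql
SELECT [name], [lock_on_bulk_load]
FROM [sys].[tables]
WHERE [name] = N'BulkLoadTest';
```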
This buys us just about the same real estate as if we used TABLOCK in the BULK INSERT statement. Here are the run times for each!
No TABLOCK:
/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 5 ms.

Table 'BulkLoadTest'. Scan count 0, logical reads 1029650, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 19671 ms, elapsed time = 38798 ms.

Table 'BulkLoadTest'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 16 ms, elapsed time = 3 ms.
*/
TABLOCK:
/*
SQL Server Execution Times:
   CPU time = 15688 ms, elapsed time = 16595 ms.
*/
No TABLOCK, table option set:
/*
SQL Server parse and compile time:
   CPU time = 15 ms, elapsed time = 192 ms.

SQL Server Execution Times:
   CPU time = 15297 ms, elapsed time = 16490 ms.
*/
You don’t have to be an internals wizard
When minimal logging works, the difference in timing and I/O is quite apparent. There are undocumented functions you can use to further prove your point, just don’t run them unless you’re on SQL Server 2014, or 2012 SP2 or higher. On all the other versions, there’s a bug where the function creates threads that won’t go away until you restart. So, yeah. Don’t do that. Unless you’re into thread starvation.
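The post doesn’t name the functions, but the usual undocumented suspects are sys.fn_dblog, which reads the active log of the current database, and fn_dump_dblog, which reads log backups and is the one tied to the thread-leak bug above. As a sketch, assuming the Sample database and the demo table from earlier, you can group log records by operation and eyeball the difference between a fully logged and a minimally logged load:

```sql
USE [Sample];

-- Undocumented and unsupported: keep this in development too
SELECT [Operation],
       COUNT(*) AS [LogRecordCount],
       SUM([Log Record Length]) AS [TotalLogBytes]
FROM sys.fn_dblog(NULL, NULL)
WHERE [AllocUnitName] LIKE N'%BulkLoadTest%'
GROUP BY [Operation]
ORDER BY [LogRecordCount] DESC;
```

A fully logged load leaves a mountain of LOP_INSERT_ROWS records; a minimally logged one mostly leaves allocation changes.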
If you’re a customer, this is a good way to prove to your vendor that minimal logging is cool thing to get working. Lots of applications rely on flat file loading. I’m a big proponent of using staging databases that sit in simple recovery model for stuff like this, so you’re not whomping your client-facing database, and you don’t have to switch it from full to bulk logged to use minimal logging.
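For the staging-database approach, the moving parts are just the recovery model and a sanity check. The StagingDB name here is made up for illustration:

```sql
-- Keep the staging database in SIMPLE recovery so bulk loads
-- can be minimally logged without touching the client-facing database
ALTER DATABASE [StagingDB] SET RECOVERY SIMPLE;

-- Verify the current recovery model
SELECT [name], [recovery_model_desc]
FROM [sys].[databases]
WHERE [name] = N'StagingDB';
```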
Thanks for reading!
12 Comments
Beware: A failure to use CHECK_CONSTRAINTS in conjunction with pre-SQL Server 2012 Mirroring will make your secondary assert that its tlog is missing an entry for a (parent system) lock on the checked table (and the Secondary will have to be rebuilt from scratch). See https://support.microsoft.com/en-us/kb/2700641.
Interesting. In my lab tests it seems that while each bulk insert statement took significantly less time, the total execution time of the query was actually longer (approx. ~20% longer). So I agree TABLOCK works as you state, but there must be some overhead somewhere that I cannot account for.
Lock escalation fires automatically when the batch size is 5,000 rows or more, so there’s no need.
Hi Brent!
In my test, the bulk insert operation is not minimally logged when the clustered table has data.
The query uses the TABLOCK hint!
Mehdi – so let me try to rephrase your comment to make sure I understand.
You did a different demo than what’s in the blog post, and you got different behavior?
Yes.
But I did a lot of tests around trace flag 610 and tablock hint.
An INSERT INTO ... SELECT operation is not minimally logged when the clustered table has data.
Contrary to what is stated in the link below:
https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd425070(v=sql.100)
Also, the below link shows many tests:
https://gavindraper.com/2018/05/29/SQL-Server-Minimal-Logging-Inserts/
Correct. Going forward, you’ll probably find the best experience if you repeat the same demo in the blog post that you’re commenting on. If you want to learn other lessons by trying other things, and you want to ask questions about those, head to a Q&A site like https://dba.stackexchange.com. Cheers!
Many tests.
Mehdi, you’re not hearing me, so no point in me continuing to talk. Instead, I’ll dance. You can’t see me dancing, but trust me, it’s terrible. Later!
Minimal Logging is nice indeed.
However – I am interested in the opposite case.
I wonder if/how one can disable any kind of minimal logging whatsoever (temporarily or permanently)?
We have an application that performs a periodic pass over the TLOG contents, analyzing certain row data occurrences and values.
Minimal logging would “hide” pieces of those important values.
This is also required for the case of UPDATE operations, which are normally minimally logged (unless MS-REPLICATION or MS-CDC is set).
Is there any chance to achieve that?
Thanks.
Hillel.
For unrelated questions, your best bet is a Q&A site like https://dba.stackexchange.com.