The Simplest Sample Database Ever
First Update: Jan 25, 2017
Last Update: Jan 25, 2017
Welcome to the documentation page for the Orders database. It was created to have an easy to manage workload run to test the throughput of a SQL instance. If you want a more standardized TPC workload, HammerDB is a good choice.
We used our own method because we wanted to be able to release everything as a FOSS project to the public. The database and all scripts are governed by the MIT license. If there’s enough demand, we’ll put them on GitHub.
With that out of the way, let’s talk about what’s inside.
The Order database is free to download as a .torrent. Much like with the StackOverflow database, we have a dedicated seedbox for it. If you have trouble downloading it, it’s most likely a firewall issue, and we can’t troubleshoot that. Try it from home instead of work! Slacker! Now you have to go say you’re sorry to the network admin.
There is one table, which is around 24 GB, and 111,527,788 rows. Or records, if you’re weird. The table looks like this:
CREATE TABLE [dbo].[Orders]( [ID] [bigint] IDENTITY(1,1) NOT NULL, [OrderDate] [datetime] NOT NULL, [ShipDate] [datetime] NULL, [CustomerID] [int] NOT NULL, [SalesPersonID] [int] NOT NULL, [OrderSubTotal] [numeric](18, 2) NOT NULL, [SalesNotes] [varchar](8000) NULL, CONSTRAINT [PK_Orders] PRIMARY KEY CLUSTERED ([ID] ASC)
It’s intentionally narrow so you, the users, can add columns (or fields, if you’re weird) to mimic situations you have.
There are also some thoughtful, artisanal indexes made to support the primary workload (which we’ll get to in a minute). You can remove and alter these as you wish to test different indexing scenarios. Whatever blows your hair back, kiddo!
Keep in mind, you don’t need to create any of this stuff; it’s already in there for you. This is just so you can see what you’re getting, and have a drift reference in case you make a bunch of changes.
CREATE NONCLUSTERED INDEX [ix_Orders_CID_OST_OD] ON [dbo].[Orders] ( [CustomerID] ASC, [OrderSubTotal] ASC ) INCLUDE ([OrderDate]) GO CREATE NONCLUSTERED INDEX [ix_Orders_CID_SPI_OD] ON [dbo].[Orders] ( [CustomerID] ASC, [SalesPersonID] ASC, [OrderDate] ASC ) GO CREATE UNIQUE NONCLUSTERED INDEX [ix_Orders_ID_SN] ON [dbo].[Orders] ( [ID] ASC ) INCLUDE ([SalesNotes]) WHERE ([SalesNotes] IS NULL) GO CREATE UNIQUE NONCLUSTERED INDEX [ix_Orders_OD_SD] ON [dbo].[Orders] ( [OrderDate] ASC, [ID] ASC ) INCLUDE ([ShipDate]) WHERE ([ShipDate] IS NULL) GO CREATE NONCLUSTERED INDEX [ix_Orders_SID_OD_OST] ON [dbo].[Orders] ( [SalesPersonID] ASC, [OrderDate] ASC ) INCLUDE ([OrderSubTotal]) GO
Stored Procedures, Agent Jobs, and Scripts
These stored procedures are what our Agent Jobs call to generate different types of work. We have one for reads, and three for writes. There is non-stored proc code to generate TempDB and reporting-type workload on the table as well.
The stored procedures are:
- dbo.GenerateInserts: Inserts one day of data spread out over 5000 rows, one row at a time, and randomly insert nonsense into the Notes column. This stored proc is called by an Agent Job.
- dbo.GenerateInserts_Big: Same as above, except it will insert one year of data. You can edit the WHILE loop at the beginning of both of these to insert rows for different spans of ime. This stored proc is not called by an Agent Job, because it causes really bad deadlocks. You can do that if you want to cause deadlocks, sicko.
- dbo.GenerateReads: Runs some simple aggregation and range queries all in a row. Called by an Agent Job.
- dbo.UpdateShipped: Updates the ShipDate column where it’s NULL and OrderDate falls into a one day range. Called by the Generate Write Activity Agent Job. For some reason, it doesn’t have Generate in the name. I don’t know why.
The Agent Jobs are:
- Generate Read Activity: calls reads proc
- Generate TempDB Activity: calls simple SELECT … INTO a temp table of 5000 rows from Orders table
- Generate Write Activity: calls writes proc
- Hourly Report Refresh: calls two large aggregation queries that span the entire Orders table
The Measuring Scripts are:
- Backup sizes
- Backup speeds
- DBCC CHECKDB speeds
- Index maintenance speeds
The full text of everything is available here, and will be kept up to date. The method we use to call these stored procs is via ostress in the RML Utilities package. The measuring scripts, you just run on your own.
All of the Agent Jobs are set to run every 10 seconds, though I don’t expect them to run that frequently, it’s just to reduce downtime between calls. The only exception is, of course, the Hourly Report Refresh. Which runs hourly. If that was unclear, head on over here.
The Agent Jobs should be disabled by default, so you’ll have to enable them after verifying that they work correctly. Agent Jobs are the one thing you will have to create on your TEST server, because they’re not part of the Orders database that you attach. Obviously. If that wasn’t obvious, head on over here.