Enabling Query Store in Azure SQL Database

Enter Query Store

Query Store, in short, is a way to track query performance over time. In Microsoft’s words, “The feature automatically captures a history of queries, plans, and runtime statistics, and retains these for your review”. It’s like sys.dm_exec_query_stats but it persists across reboots! And it has execution plans!

The Query Store is a great feature for on-premises SQL Server-backed applications, but it adds significant value in Azure SQL Database. Instead of potentially losing execution history and execution plans, developers can see the history of query performance and execution plans across reboots.

Enabling Query Store in Azure SQL Database

Point SSMS at your Azure SQL Database instance and run:

ALTER DATABASE my_database SET QUERY_STORE = ON;

No, really, it’s that easy.
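
If you want more control than the defaults give you, the same ALTER DATABASE statement accepts a set of Query Store options. Here’s a minimal sketch – the option names come from the Query Store documentation, and my_database and the values are just placeholder examples:

ALTER DATABASE my_database
SET QUERY_STORE (
    OPERATION_MODE = READ_WRITE,            -- collect and persist new data
    MAX_STORAGE_SIZE_MB = 100,              -- cap Query Store's footprint
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
    INTERVAL_LENGTH_MINUTES = 60            -- runtime stats aggregation window
);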

How Do I Really Know Query Store is Enabled?

Figuring out if Query Store is enabled is pretty easy, too. All you need to do is look at sys.database_query_store_options. The following query should give you exactly what you need to know about the Query Store:

SELECT  desired_state_desc ,
        actual_state_desc ,
        readonly_reason, 
        current_storage_size_mb , 
        max_storage_size_mb ,
        max_plans_per_query 
FROM    sys.database_query_store_options ;

As long as things are running correctly, the desired and actual states should read either ALL or AUTO. For more information about using Query Store, check out Monitoring Performance by Using the Query Store.
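
Once Query Store has been collecting data for a while, you can also query its catalog views directly. Here’s a quick sketch (view and column names as documented for Query Store; avg_duration is reported in microseconds) that pulls back the slowest queries by average duration:

SELECT  TOP 10
        qt.query_sql_text ,
        rs.avg_duration ,
        rs.count_executions
FROM    sys.query_store_query_text qt
        JOIN sys.query_store_query q ON qt.query_text_id = q.query_text_id
        JOIN sys.query_store_plan p ON q.query_id = p.query_id
        JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC ;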

The Takeaway

Azure SQL Database instances can be rebooted at any time. It’s important to keep this in mind and plan accordingly, not just in the application but also in how you approach tuning your T-SQL. You should enable the Query Store in Azure SQL Database instances where you care about performance.

Getting JSON out of SQL Server

SQL Server 2016 brings support for JSON. For those of you who don’t know, JSON is JavaScript Object Notation – it’s a way of representing application level objects as a string. Imagine XML without anywhere near as much line noise, and you’re pretty much there.

Our Sample Document

In this example, we want to retrieve orders and their line items from the AdventureWorks database. We need to get the data out in this format for fast processing. In the end, our data should look something like:

{ "orders" : 
    [ /* an array of orders */
        { "order" : 
            { "number" : "123456" ,
                /* more stuff goes here */
                "items" : 
                    [
                        { "item": { /* item info goes here */ } }
                    ]
            }
        }
    ]
}

Holy cow, that’s a lot of stuff just to represent an order and its line items.

Here’s our starter query:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        d.ProductID AS [item.product_id],
        d.OrderQty AS [item.quantity],
        d.UnitPrice AS [item.unit_price],
        d.UnitPriceDiscount AS [item.discount]
FROM    Sales.SalesOrderHeader h
        JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
WHERE   h.SalesOrderNumber = 'SO43659'

First Attempt

The documentation hints that we might just be able to use `FOR JSON AUTO` option to automatically format our JSON. What could go wrong?
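
For reference, the first attempt is nothing more than the starter query from above with FOR JSON AUTO tacked onto the end – a sketch, abbreviated to the interesting parts:

SELECT  h.SalesOrderNumber AS [order.number],
        /* ...the rest of the starter query's columns... */
        d.UnitPriceDiscount AS [item.discount]
FROM    Sales.SalesOrderHeader h
        JOIN Sales.SalesOrderDetail d ON h.SalesOrderID = d.SalesOrderID
WHERE   h.SalesOrderNumber = 'SO43659'
FOR JSON AUTO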

Syntax is hard

Our results aren’t what we want at all! The structure is close, but not quite what we’re looking for.

Getting the Structure Right

We are getting all the data we want, but we’re not getting the structure that we want. Our requirements say that we need to have an `orders` array and then each `order` should be a separate item in that array. An order’s line items should also be stored in a separate array.

The square brackets make an array in JSON. The curly brackets denote an object.

Using the root() function will nest our results in a root object. In this case, we can use root('orders') to create a basic orders object to hold our list of orders.

Making each item part of an array inside the order is going to be trickier. We can still use FOR JSON AUTO to get the job done:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS [item.discount]
         FROM   Sales.SalesOrderDetail d 
         WHERE  h.SalesOrderID = d.SalesOrderID 
         FOR    JSON AUTO) AS items
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'
FOR JSON AUTO, ROOT('orders')

We’re almost right!

So wrong I can taste it.

Well, that’s … something. To get the output we want, we’re going to have to use a correlated subquery – fans of generating XML in SQL Server may remember jumping through hoops to get the right structure out of SQL Server.

We’re on the right track, but what we really need is better formatting.

The Best Solution, For Now

Since FOR JSON AUTO isn’t generating the JSON output that we want, we can go down a manual route and use FOR JSON PATH. If this is starting to sound like XML, you’re absolutely right.

After attaching a debugger to SQL Server, Paul White showed that the JSON writer uses the old XML code under the hood.

Anyway, the SQL to generate the right JSON document:

SELECT  h.SalesOrderNumber AS [order.number],
        h.DueDate AS [order.due_date],
        h.ShipDate AS [order.ship_date],
        h.PurchaseOrderNumber AS [order.po_number],
        h.TotalDue AS [order.total_due],
        h.Freight AS [order.freight],
        (SELECT d.ProductID AS [item.product_id],
                d.OrderQty AS [item.quantity],
                d.UnitPrice AS [item.unit_price],
                d.UnitPriceDiscount AS [item.discount]
         FROM Sales.SalesOrderDetail d 
         WHERE h.SalesOrderID = d.SalesOrderID
         FOR JSON PATH
        ) AS [order.items]
FROM    Sales.SalesOrderHeader h
WHERE   h.SalesOrderNumber = 'SO43659'
FOR JSON PATH, ROOT('orders')

Check out the correct (and formatted) results!
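
With the dotted aliases and ROOT('orders') in place, the output finally takes the shape we were after. Here’s roughly what it looks like, with the values elided:

{ "orders" :
    [
        { "order" :
            { "number" : "SO43659",
              "due_date" : /* ... */, "ship_date" : /* ... */,
              "po_number" : /* ... */, "total_due" : /* ... */, "freight" : /* ... */,
              "items" :
                  [
                      { "item" : { "product_id" : /* ... */, "quantity" : /* ... */,
                                   "unit_price" : /* ... */, "discount" : /* ... */ } }
                  ]
            }
        }
    ]
}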

Getting JSON out of SQL Server

It’s easy to get incorrectly formatted JSON out of SQL Server. If you do need to get well-formatted JSON out of SQL Server, be prepared to put in a lot of work coercing your query into the right shape. You may find that you need to write a large number of nested subqueries, too.

The Fastest Way to Reconfigure a Bunch of Servers

… is to use a SQL Server where a robot does it for you!

I stole that from Kendra – someone else already wrote the sentence for me and why not keep re-using things, right?

Configuring SQL Servers the Usual Way

What’s the usual way to do this?

There are a few commercial tools on the market that let you deploy scripts to multiple servers; they all vary in how they apply the changes and deal with errors.

You could also roll your own tools. This gives you full control of the process, but you have to spend time implementing and maintaining code over time.

Finally, you could just hire junior DBAs and make them implement the changes individually. This will give all of your servers a delightful hand-crafted flair. Sure, the changes may or may not all be applied with equal rigor, but hey, you’re a job creator now.

SQL Server RDS Parameter Groups

SQL Server RDS is part of Amazon’s database as a service offering. We’ve covered RDS a couple times before and we’re still in love with RDS all these years later.

One of my favorite features of RDS is the idea of parameter groups. A parameter group is a unified set of configuration values – the kind of settings we’d normally manage through trace flags and sp_configure changes.
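
For comparison, on a self-managed SQL Server you’d be making those same changes one server at a time with something like this (just a sketch – the values are example settings, not recommendations):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max degree of parallelism', 4;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;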

Parameters everywhere

How Can I Change a Parameter Group?

Changing parameters in a parameter group is incredibly easy. The first step is to create a new parameter group.

Creating parameter groups is hard!

Once that parameter group is created, we can highlight the parameter group and click “Edit Parameters”.

Edit parameters with ease!

The UI makes it relatively easy to sort and filter. Here, I’m looking to adjust how parallelism is handled on these particular RDS instances. These parameters are applied to all instances assigned to the parameter group; make sure that you put similar SQL Server instances in the same parameter group.

Change doesn’t have to be hard.

If you’re unsure about what you’ve changed, you can even select “Preview Changes” and get a quick overview of current and new values.

It’s almost like we can see the future.

Applying Changes with a Parameter Group

Once you’ve changed a parameter group, you might be tempted to think the changes are applied to the servers immediately. They aren’t – you’ll still need to schedule a restart of each SQL Server RDS instance for the new parameter group settings to take effect.

This approach works well for many applications – you can restart a small portion of the servers in the system, make sure that the application is performing correctly, and then restart more servers. By leveraging the power of the cloud – scale out, not up – we can make operations easier and simpler.

What’s it mean for you?

Once you’ve gotten the hang of parameter groups, new SQL Server RDS instances can be created with the parameter group already applied. Existing SQL Server RDS instances can be moved into a parameter group. Managing settings across tens or hundreds of instances just got significantly easier – change a parameter group and you’re off to the races.

Brent says: But I put all this work into learning PowerShell! Oh, wait, no, I didn’t. Carry on.

Jeremiah retorts: Good news! There’s a PowerShell suppository toolkit.

Faster queries using narrow indexes and CROSS APPLY

It’s common to try to build the perfect index for a query. Only the exact right columns go into the key, and other, presentational data is pushed to the INCLUDE section of the index. While this keeps the key narrow, the entire index row can become wide.

A Sample Report

For this blog post, we’re going to work with the StackOverflow dataset. For more information on getting started with the StackOverflow database, check out How to Query the StackExchange database.

We want to find interesting, unanswered questions that have been asked in the last 90 days.

We’ll start by creating some indexes:

CREATE INDEX IX_UnansweredQuestions
    ON dbo.Posts (CreationDate) 
    INCLUDE (Id, Score, OwnerUserId, Title) 
    WHERE (AcceptedAnswerId = 0) ;

CREATE INDEX IX_ScoredCommentsByParent
    ON dbo.Posts (ParentId, Id)
    WHERE (Score > 0);

CREATE INDEX IX_VotesByPostId
    ON dbo.Votes (PostId) ;

CREATE NONCLUSTERED INDEX IX_Votes_VoteTypeId_CreationDate
    ON dbo.Votes (VoteTypeId, CreationDate)
    INCLUDE (PostId, UserId) ;

And here’s our sweet query:

SELECT  TOP 2000
        p.Id ,
        p.Score ,
        p.Title, 
        p.CreationDate,
        p.OwnerUserId ,
        p.Score
        + (SUM(CASE WHEN v.VoteTypeId = 2 THEN 1
                 WHEN v.VoteTypeId = 3 THEN -1
                 ELSE 0 END )
            / COUNT(*))
        + SUM(COALESCE(uv.Reputation, 0)) AS SortWeight
FROM    dbo.Posts AS p
        JOIN dbo.Votes AS v ON p.Id = v.PostId
                               AND v.VoteTypeId IN (2,3)
        LEFT JOIN dbo.Users AS uv ON v.UserId = uv.Id
WHERE   p.CreationDate > '2013-06-08'
        AND v.CreationDate > '2013-06-08'
        AND p.AcceptedAnswerId = 0
        AND p.PostTypeId = 1
        AND EXISTS (SELECT  *
                    FROM    dbo.Posts a
                    WHERE   a.ParentId = p.Id
                            AND a.Score > 0)
GROUP BY p.Id, p.Score, p.Title, 
        p.CreationDate,
        p.OwnerUserId
ORDER BY SortWeight DESC
;
GO

This query is CPU hungry – it burns nearly 5 seconds of CPU time on my machine between parsing, compilation, and execution. Overall it does around 1 million logical reads. That’s a few too many reads for my liking. I think we can make it better.

Simple, Small Indexes and APPLY

We need a way to make this query faster and lighter. First, we get rid of the wide index we created before.

DROP INDEX dbo.Posts.IX_UnansweredQuestions ;

We add two more indexes:

CREATE INDEX IX_UnansweredQuestions
    ON dbo.Posts (PostTypeId, AcceptedAnswerId, CreationDate) 
    INCLUDE (Id, Score) ;

CREATE NONCLUSTERED INDEX IX_Posts_AcceptedAnswerId_PostTypeId_CreationDate
    ON dbo.Posts (PostTypeId, CreationDate)
    INCLUDE (Id, Score);

And then we re-write the query with magic:

WITH recent_votes AS (
    SELECT  v.PostId,
            v.UserId,
            (SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 
                     WHEN v.VoteTypeId = 3 THEN -1
                     ELSE 0 END ) 
                / COUNT(*)) AS VoteWeight
    FROM    dbo.Votes AS v
    WHERE   v.CreationDate > '2013-06-08'
            AND v.VoteTypeId IN (2,3)
    GROUP BY v.PostId, v.UserId
),
posts AS (
    SELECT  p.Id ,
            p.Score 
    FROM    dbo.Posts AS p 
            LEFT JOIN dbo.Posts AS answers ON answers.ParentId = p.Id
                                              AND answers.Score > 0 
    WHERE   p.CreationDate > '2013-06-08'
            AND p.AcceptedAnswerId = 0
            AND p.PostTypeId = 1
            AND answers.Id IS NOT NULL
)
SELECT  TOP 2000
        p.Id ,
        p.Score ,
        _.Title, _.CreationDate, _.OwnerUserId ,
        p.Score 
        + SUM(v.VoteWeight)
        + SUM(COALESCE(uv.Reputation, 0)) AS SortWeight
FROM    posts AS p
        JOIN recent_votes AS v ON p.Id = v.PostId
        LEFT JOIN dbo.Users AS uv ON v.UserId = uv.Id
        CROSS APPLY (   SELECT  Title, CreationDate, OwnerUserId
                        FROM    dbo.Posts p2
                        WHERE   p2.Id = p.Id ) _
GROUP BY p.Id, 
        p.Score,
        _.Title, _.CreationDate, _.OwnerUserId
ORDER BY SortWeight DESC
;
GO

The new query performs 800,000 reads but it only takes 1.2 seconds to execute and it goes parallel (1.9 total CPU seconds are used). This is a big improvement over the first query, and I’ll explain how we got there.

What makes the CROSS APPLY faster?

This query has a lot of differences from the first query. Let’s break down what’s going on in here.

We’ve split out the core of the query into two CTEs. In this case, we’re using the CTEs as optimization fences. Rather than let SQL Server do its own thing with optimization, we’re tricking it into providing the optimization path that we want – recent_votes and posts will be optimized separately. Because of this optimization fence, we’ve been able to construct very narrow indexes that bring back the data that we need for each.

After tricking SQL Server into optimizing our CTEs separately, we use CROSS APPLY to pull back the details we want based on a primary key lookup. This helps our query run faster – we need fewer pages allocated for our indexes which means that SQL Server estimates that less memory will be used. The results speak for themselves, too – the second query performs 1/4 the I/O of the first query.

What’s the Result?

The end result is that we have a query that’s fast. Our indexes are free of bloat caused by wide INCLUDE columns.

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.

Do Foreign Keys Matter for Insert Speed

Do you have the key?

Do foreign keys hurt the performance of inserts? Okay, we all know that foreign keys do require some work, but the crazy people of the internet have wildly varying guesses as to just how much work is involved. Estimates varied from “barely any” to “as much as 50% slower”.

I figured nobody else was going to do the work, so I might as well do it.

How did the test work?

I created two tables – parent and child. The parent table has an int primary key and a fixed width filler column. I varied the size of the filler column across multiple test runs, just to make sure I wasn’t biasing the test by using an artificially large parent table. The testing happened with a parent table that had 2 rows per page, 4 rows per page, and 8 rows per page.

You can find the test code in fk.sql.
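
For reference, here’s a minimal sketch of the kind of schema described above – a hypothetical reconstruction, not the actual test code (that lives in fk.sql):

CREATE TABLE dbo.parent (
    id INT IDENTITY(1,1) PRIMARY KEY,
    filler CHAR(1000) NOT NULL DEFAULT ('x')  /* width varied per test run */
);

CREATE TABLE dbo.child (
    id INT IDENTITY(1,1) PRIMARY KEY,
    parent_id INT NOT NULL,
    CONSTRAINT fk_child_parent
        FOREIGN KEY (parent_id) REFERENCES dbo.parent (id)
);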

Testing SQL Server insert speed with foreign keys

The test code inserts 1,000,000 rows in batches of 5,000 rows. Although this isn’t strictly realistic, it’s better than timing a single batch of 1,000,000 inserts.

What happens during the test?

It turns out that no matter what the page size is, it takes approximately 40ms to insert 5,000 rows with the foreign key in place. When we remove the foreign key, inserting 5,000 rows takes ~26ms.

Although the difference looks big at 5,000 row batch sizes, each insert is taking, on average, 0.0072ms with the foreign key and 0.0052ms without the foreign key. These differences are hardly worth noting. And, let’s face it, 2 microseconds is a pretty minor price to pay for data integrity.

How much do foreign keys affect single row inserts?

I ran the test one last time with a batch size of 1 row, just to see how much difference there was in insert performance.

FK present?   Duration
yes           253,896 ms
no            241,195 ms

When it comes down to single row performance, the difference is negligible. We’re spending all of our time waiting for other parts of the system.

How real is this test?

It’s not terribly real, but it’s real enough to say “Yes, foreign keys have overhead” and follow that up with “No, you shouldn’t care, just add the keys.” The tests were done in VMware and the hardware in question is a mid-2012 Retina MacBook Pro.

In the real world, we’ve got a lot more than inserts happening, but it’s worth quantifying the cost of a foreign key lookup and realizing that it’s worth having foreign keys.

Kendra says: Wanna know if your foreign keys are ‘trusted’? Check out our script here.

Finches and Job Roles

Developers – how much operations work are you doing?

DBAs – how much development do you do?

The Separation of Duties

Most of us stick to our assigned job roles. Developers write code and then throw it over the wall for the ops team to put in place. If there’s a problem, there is inevitably finger-pointing. Eventually a developer gets over to an operations person’s cube and the problem gets solved.

It’s rare that we see any cross disciplinary skill sharing.

In The Voyage of the Beagle, Darwin notes that the finches in the Galapagos have become very specialized, depending on the island where they were found. The finches were so specialized that Darwin originally miscategorized a subspecies of finch as a wren. He goes so far as to say:

Seeing this gradation and diversity of structure in one small, intimately related group of birds, one might really fancy that from an original paucity of birds in this archipelago, one species had been taken and modified for different ends.

What Do Birds Have To Do With Work?

Darwin’s finches all had very different beak sizes and shapes – each finch’s beak had adapted to a different food source. Even though they’re all finches, they worked in very different environments.

What about you? How specialized are you?

I consider myself a developer – I’ve spent most of my career writing applications. Some of those applications focus largely on SQL Server. But I can also configure HA/DR solutions, set up hardware, and plan storage deployments.

One of the problems with overspecialization is that it becomes difficult to survive if your environment changes.

Avoid Overspecialization

I’m a big fan of mixing job roles. Developers should provide operational support for their features. Operations staff should take part in developing tools or even features for the application. Having a well-rounded set of skills makes it easier to survive when your environment changes.

SQL Server Version Detection

Every now and then, you need to figure out which version of SQL Server you’re using without knowing in advance. This might happen in a script (like sp_BlitzCache) or you might be using it in a migration.

Getting the SQL Server Version with @@VERSION

The first thing that comes to mind is @@VERSION. On my SQL Server 2014 RTM installation, this returns an ugly string. Unless you like parsing multiline strings in T-SQL, you’re going to be left using LIKE on a hideous string like this one:

Microsoft SQL Server 2014 - 12.0.2000.8 (X64) 
    Feb 20 2014 20:04:26 
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)

Imagine if you needed to get the build number of SQL Server out of there to validate against a list of builds to see if you had the current version of SQL Server installed. Pulling the build out of the major.minor.build.revision string isn’t easy when you first have to pull that string out of a bigger string.

There’s a better way than mucking around in @@VERSION.

Getting the SQL Server Version with SERVERPROPERTY

The SERVERPROPERTY built-in function has a lot of great functionality. We’re only going to concern ourselves with the ProductVersion portion of SERVERPROPERTY. How do we use it?

SELECT CAST(SERVERPROPERTY('ProductVersion') as nvarchar(128))

That should return nothing but the major.minor.build.revision for the current version of SQL Server. In my case, this returns: 12.0.2000.8.

Rather than parse the values myself, I created a temporary table with computed columns:

CREATE TABLE #checkversion (
    version nvarchar(128),
    common_version AS SUBSTRING(version, 1, CHARINDEX('.', version) + 1 ),
    major AS PARSENAME(CONVERT(VARCHAR(32), version), 4),
    minor AS PARSENAME(CONVERT(VARCHAR(32), version), 3),
    build AS PARSENAME(CONVERT(varchar(32), version), 2),
    revision AS PARSENAME(CONVERT(VARCHAR(32), version), 1)
);

INSERT INTO #checkversion (version)
SELECT CAST(SERVERPROPERTY('ProductVersion') AS NVARCHAR(128)) ;

The common_version column displays the two-part version of SQL Server – 12.0 for SQL Server 2014 or, in the case of SQL Server 2008R2, 10.5. When you’re targeting scripts for specific families of SQL Server, knowing the high-level version can make scripting a lot easier.

What’s the final output look like for me?

version      common_version  major  minor  build  revision
12.0.2000.8  12.0            12     0      2000   8
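
From there, branching a script on the version is just a query against the computed columns. A quick sketch of how you might use it:

IF EXISTS (SELECT 1 FROM #checkversion WHERE CAST(major AS INT) >= 12)
BEGIN
    PRINT 'SQL Server 2014 or newer detected.';
END
ELSE
BEGIN
    PRINT 'Older build detected - use the legacy code path.';
END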

There you have it

That’s really all there is to finding and parsing the SQL Server version. Just a simple insert into a temporary table and a few computed columns and you’re ready to go.

Moving Databases Made Easy – SQL Server on a File Share

We’ve all had to move a database. For databases above a small size, this is a painful process. While there are a number of ways that you can move a database without downtime, they all involve a lot of extra work. What if there were something easier?

Every database is unique and precious.

What’s the Real Problem?

For most people, moving a database is a one-time thing – you’re moving to a new server. That’s pretty much it. I spent 5 minutes staring at my ceiling trying to figure out other reasons that most people might move a database from one SQL Server to another.

For some people, though, databases could move on a regular basis. Think about it like this – you host a software as a service (SaaS) application. You’ve got one database per relative unit of work (customer, forum, cat picture collection). These databases can sit idle for months before, suddenly, there’s tremendous activity.

Normally, you wouldn’t make each SQL Server capable of handling peak load for any section of your SaaS. Instead, you would probably have a few servers ready to handle the worst load and move your busiest database over to those servers.

Typically, this is going to involve a small outage and setting up something like database mirroring or log shipping in order to move the database with minimal down time.

SMB to the Rescue!

Microsoft has supported housing SQL Server databases on a file share for a while now. If you’re not familiar with the idea, I recommend checking out the MSDN article Install SQL Server with SMB Fileshare as a Storage Option.

Moving a database with SMB is straightforward – detach the database from the old server, attach the database to the new server, bring the database online. For most of us, this quick change is much preferred to other methods of moving the database.
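
In T-SQL terms, the move looks roughly like this – a sketch, with the database name and share path as placeholders:

/* On the old server */
EXEC sp_detach_db @dbname = N'SalesDB';

/* On the new server, pointing at the same SMB share */
CREATE DATABASE SalesDB
ON (FILENAME = N'\\fileserver\sqldata\SalesDB.mdf'),
   (FILENAME = N'\\fileserver\sqldata\SalesDB_log.ldf')
FOR ATTACH;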

SMB File Shares and You

Although SMB file shares are a relatively new storage option for SQL Server, they’re worth considering for new SQL Servers. SMB file shares make it easy to move databases between SQL Servers. If you want to learn more about using SMB, check out the MSDN article on using SMB and SQL Server.

Brent says: If your storage area network (SAN) is using older connectivity methods like 2Gb or 4Gb FC, and your Ethernet is 10Gb, you might even see a performance boost.

How Do I Know My Query Will Be Fast In Production?

We’ve all had this problem before – the production server is significantly more powerful than the development server. You tune a query and you’re pretty sure you’ve made it faster, but there’s that doubt in the back of your mind. There are so many things that can change performance – memory, CPU count, disk performance – how can you tell that you’ve made a query better?

Measuring Slowness

When you know you have a slow query that you need to tune, run it in production one last time. But, before you do that, gather CPU and IO metrics using STATISTICS IO and STATISTICS TIME:

SET STATISTICS IO, TIME ON; 

EXEC dbo.MyAwfulQuery @slow = 'yes';

You’ll get a bunch of jibber jabber in the “Messages” tab that documents how much CPU time was spent compiling and running your query as well as information about logical IO. Save this off to a text file for now.

Measuring Fastness

Go to your dev server. Tune your query. I’ll wait.

As you’re tuning your query, pay attention to the CPU time and logical IO numbers that you’re getting back from STATISTICS IO and STATISTICS TIME. These are a relative indication that things are getting better. If there are CPU differences, you’re likely to see different CPU numbers on that dev machine, but that’s OK. We’re looking to compare total logical IO and total CPU time in milliseconds.

If you want an easy route to compare your results, you can paste the output from STATISTICS IO and STATISTICS TIME into statisticsparser.com. This will go through the results and push the data into a nice table for your perusal.

How Much Faster Will my Query Be?

Using STATISTICS IO and STATISTICS TIME to tune helps me figure out how well I’m doing compared to where I started from. Some queries move around a lot of data. In these cases, it can help to know how much faster production is than dev.

If you really want a reasonable comparison between dev and production, you can find some guidance in the Fast Track Data Warehouse Reference Guide. There’s a section of this document about measuring the MCR (Maximum Consumption Rate) and BCR (Base Consumption Rate) of your SQL Server. These numbers reflect 100% cache reads and 100% disk reads, respectively.

Once you have MCR and BCR, you can make effective predictions about how query performance could differ between the two environments. This gives you the power to predict, with some margin of error, how fast newly developed queries can run.
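
As a back-of-the-envelope sketch (the throughput numbers here are made up for illustration): convert the logical reads from STATISTICS IO into megabytes and divide by each server’s consumption rate.

DECLARE @pages_read BIGINT = 1000000,              /* logical reads from STATISTICS IO */
        @dev_bcr_mb_per_sec DECIMAL(10,2) = 400,   /* hypothetical dev throughput      */
        @prod_bcr_mb_per_sec DECIMAL(10,2) = 2000; /* hypothetical prod throughput     */

SELECT  @pages_read * 8 / 1024.0 AS total_mb,
        (@pages_read * 8 / 1024.0) / @dev_bcr_mb_per_sec  AS est_seconds_dev,
        (@pages_read * 8 / 1024.0) / @prod_bcr_mb_per_sec AS est_seconds_prod;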

But is it Faster?

Between STATISTICS IO and STATISTICS TIME you can determine if a query is going to use fewer resources. The MCR and BCR give you relative numbers for how fast a server can process data. Between these measurements, you can predict how a query will perform between environments and what kind of impact your changes will have.

Brent says: as you get used to doing these measurements, you’ll also be able to tell if a new dev-only query is going to be fast enough in production, too. You’ll know how many logical reads is slow (or fast) in your environment.

Erik says: The hardest thing for me to ever get was commensurate data to test on in dev. If you don’t have it, get it. If you have it, don’t waste it.

Identity Columns in Oracle

Many databases have the ability to create automatically incremented numbers. In SQL Server, we can use an `IDENTITY` property on a column to get autoincrementing numbers. But what can we do in Oracle?

Sequences

Sequences work in all recent versions and editions of Oracle. The default way to do this is pretty simple:

CREATE TABLE my_table (
    id NUMBER PRIMARY KEY,
    whatever VARCHAR2(30)
);

CREATE SEQUENCE my_table_seq ;

CREATE OR REPLACE TRIGGER my_table_insert 
BEFORE INSERT ON my_table
FOR EACH ROW
BEGIN
    SELECT my_table_seq.nextval 
    INTO   :new.id
    FROM   dual ;
END ;

/* Or, if you hate triggers, you can make sure all inserts look like: */
INSERT INTO my_table (id, whatever)
VALUES (my_table_seq.nextval, 'Erik');

Now that SQL Server supports sequences, you could use sequences to help with portability between both systems, but there’s a gotcha – SQL Server’s next-value syntax is different from Oracle’s. That means you’d need to use a trigger-based solution if you didn’t want code to change.
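
For the curious, the SQL Server side of that gap looks roughly like this (a sketch mirroring the Oracle example above):

CREATE TABLE my_table (
    id INT PRIMARY KEY,
    whatever NVARCHAR(30)
);

CREATE SEQUENCE my_table_seq AS INT START WITH 1 INCREMENT BY 1;

/* SQL Server uses NEXT VALUE FOR instead of Oracle's my_table_seq.nextval */
INSERT INTO my_table (id, whatever)
VALUES (NEXT VALUE FOR my_table_seq, 'Erik');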

Identities

What about identities? With SQL Server, we just mark a column as having an identity value. Magic! It’s done for us. Oracle is a commercial database, surely it has a magic option, right?

Oracle introduced identity columns with Oracle 12c. This lets developers keep using the code they’re used to. Check this out:

CREATE TABLE my_table (
    id NUMBER GENERATED AS IDENTITY PRIMARY KEY,
    whatever VARCHAR2(30)
);

This lets you keep using the same kind of inserts that you would normally use with SQL Server, with one exception. Oracle gives you some flexibility for generating identities – you can generate them `ALWAYS` (the default) or `BY DEFAULT` (when a value isn’t provided).

If you really want to mimic SQL Server’s identity behavior, including `IDENTITY_INSERT`, then you can create your table like this:

CREATE TABLE my_table (
    id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    whatever VARCHAR2(30)
);

You can even go so far as to protect yourself from trying to insert `NULL` values into the table using the `ON NULL` clause:

CREATE TABLE my_table (
    id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY PRIMARY KEY,
    whatever VARCHAR2(30)
);

Sequences and Identities in Oracle and SQL Server

If you want to make sure you can reuse as much code as possible between Oracle and SQL Server, identity columns are a safe bet. The vast majority of code won’t need to change and your application will be none the wiser.

Brent says: when people talk about porting their application from one database back end to another, this is a good example of how it’s so tricky. Sure, both databases have exactly the same functionality available, but even something as simple as an identity column is syntactically different between them.

Erik says: In case anyone’s wondering, it didn’t hurt getting inserted into an Oracle table. The bathrooms in Tablespace were very clean.
