Tag Archive: sql server

SQL Server RDS

Amazon Web Services (AWS) is a platform that offers a ton of services ranging from block storage, operating systems, to mail queues, DNS, and caching. When you say “cloud”, a lot of people think of AWS. AWS hosts a variety of businesses, large and small. Unfortunately, you still have to administer and configure your servers. That’s not a bad thing, but it’s something that many teams have to keep in mind. Configuring SQL Server in AWS environment isn’t the easiest thing on earth: the technology behind the AWS platform changes on a regular basis and it sometimes changes in subtle ways that can change how SQL Server performs.

Relational Database Service

Amazon Relational Database Service (RDS) is Amazon’s attempt at creating a hosted database platform on top of the other AWS services. The point is to take a lot of the headache out of managing relational databases. Instead of setting up and configuring servers for MySQL or Oracle, customers can have a database server up and running in a few clicks. Backups happen automatically, restores are easy to accomplish, and instances can be easily added or removed.

The problem is that there hasn’t been a way to do this with SQL Server. Companies using the Microsoft stack but invested in AWS have had to set up and configure their own SQL Servers. While this is normal for many businesses, for start ups this is an extra expense – it means that the servers are either configured by developers or operations staff, or that the company has to bring a DBA on board sooner than they had planned.

Enter SQL Server RDS

Amazon have launched SQL Server RDS: a hosted SQL Server service. Users can easily create instances of SQL Server through a wizard, web service, or command line script. It’s easier than ever to scale your infrastructure in the AWS environment.

Previously it was possible to create a new instance of SQL Server by spinning up a new SQL Server equipped AMI, but it still took time for both Windows and SQL Server to finish the sysprep process. On top of the instance set up time, a DBA would still have to configure the instance with additional security and configuration settings. In short, you could spin up instances of SQL Server in Amazon, but someone on your own team was still responsible for patching and maintenance.

Like every other AWS service, there’s a free usage tier. Developers, or really lean applications, can start out using a free version of the software and migrate up to larger instances as needed. The majority of the instances types are supported, apart from a notable absence of cluster compute instances.

SQL Server RDS gives you most of the features of SQL Server – it compares pretty closely with SQL Server and many features are fully supported. Amazon are pretty explicit about which features aren’t supported, so if you’re expecting a SQL Server Agent, maintenance plans, or the DTA, then you’re out of luck.

The Limitations

There’s a lean set of features that can be relied on to work even if your SQL server instance needs to be restarted or if the OS disk becomes corrupted. Features that rely on MSDB or any other system databases simply can’t be used because there’s no guarantee that you’ll be attached to the disk when your new OS volume is spun back up. By making sure that the OS volume never changes, it’s very easy to upgrade an instance – you just attach a new OS volume with the requisite changes at the next reboot. Because of limited access to the system database, many SSMS standard reports don’t work.

SQL Server RDS imposes a hard limit of 30 databases per instance of SQL Server. While that seems purely arbitrary, remember that there’s an unknown pool of storage underneath the instance that is invisible to you as a consumer of the product. On that note, you also can’t scale the storage underneath a SQL Server RDS instance – once you set up your storage at a certain level, that’s the amount of storage that you are stuck with.

Many DBCC commands don’t work at all – DBCC DBINFO and DBCC LOGINFO were explicitly denied. I did, however, discover that I could run DBCC FREEPROCCACHE as often as I wanted.

Keep in mind that these limitations may change over time – new features are added on a regular basis.

Configuring SQL Server

Some system tools, like sp_configure are not available as a way to set server level settings – you can run sp_configure and see the configuration values, but you can’t change them directly in SQL Server. Instead, you create a SQL Server parameter group.

DBAs may find it frustrating to use command line tools to set up SQL Server parameters, however once a parameter group has been created, it’s incredibly easy to apply the parameter group to any number of SQL Servers at the same time. The upside is that both trace flags and sp_configure settings can be modified at the same time. The downside is that the commands to set up the parameter group are not at all intuitive. Changing a the max server memory looks something like this rds-modify-db-parameter-group sqlservergroup --parameters "name='max server memory (mb)', value=65536, method=immediate". Intuitive? No. Powerful? Yes. Once you’ve set up the parameter group, you just tell RDS to create new SQL Servers inside that parameter group and they will automatically be started with those trace flags and settings in place.

Scheduling maintenance and backups

Backups

How are backups configured and maintained? By you… in a way.

During instance creation, you specify a backup retention period between 0 and 30 days. You can also specify your backup window and maintenance window – just in case you want to specify quiet hours when maintenance should occur. Point in time restore is available via a GUI, command line tools, and that’s it. Keep in mind that these aren’t database level restores,these are instance level restores – during the backup, writes will be quiesced, the storage will be snapshotted, and then writes will resume again. Point in time recovery is also available and transaction logs are backed up every 5 minutes. There’s more information and details in the AWS RDS documentation.

And A Gotcha

When you create a user, they’re automatically given access to the rdsadmin database. This is a database that the RDS team have created to provide access to some of the low level functionality, like reading trace files and other actions that you normally must be a sys admin to perform. The big gotcha here is that any user who can access the SQL server can, by default, access the rdsadmin database and execute the stored procedures in the database. Admittedly, the stored procedures that could potentially cause any harm aren’t accessible, but the unprivileged user can still see them.

Running sp_helprotect shows that guest has privileges to execute these stored procedures. Although guest can run the stored procedures, the procedures that can potentially cause changes will fail with errors like Login needs CREATE ANY DATABASE permission. So while a malicious user could determine that you’re running your system on RDS, there’s only a limited amount they could do from that point.

When Would You Use This?

Let’s be fair, it isn’t apparent to everyone when they might want to use a hosted SQL Server. There are a lot of reasons why you would want to stand up a brand new SQL Server quickly.You may want to create multiple backend servers as you scale out your application – maybe you create one instance per customer to make billing easier. Developers can spin up a full copy of the application stack to test how changes will work – they no longer need to maintain a copy of the database locally, instead a clone of the production system can be restored and made available in a matter of minutes.

How Does It Perform?

SQL Server RDS performs admirably. I ran a variety of different benchmarks against my test instances and I maxed out the CPU of the RDS instance at 42% while running around 710 transactions per section and sustaining a decent rate of throughput. The biggest bottleneck was the CPU and network connection on the clients running the benchmarking tools. Obviously, benchmarks aren’t a real measure of anything apart from the benchmark’s performance, so test your application appropriately.

Alerting and Monitoring

Amazon haven’t forgotten about setting up alerts and monitoring. Every piece of AWS comes with instrumentation in the form of Amazon CloudWatch. CloudWatch is a set of metrics that are collected from underlying OS or application server. Even though you have no access to the underlying OS, SQL Server RDS exposes many metrics that can help you determine whether you need to move to a larger instance, monitor CPU, tune for memory, or purchase more storage. It’s even possible to stack graphs to see how two instances compare to each other.

Monitoring in the AWS Management Console

Beyond that, it’s easy to make alarms on your instances that will alert you when your instance goes beyond specific performance metrics.

Creating an Alert

Once you’ve configured your alerts, you can even see how current instance performance compares to the alerts you’ve set up. All of your alerts for all AWS services are located in the same place, so you’ll be able to see just how your entire virtual infrastructure is performing.

The AWS Alerts Dashboard

How’s RDS Different From SQL Azure?

Microsoft’s database-in-the-cloud solution has a subset of SQL Server’s capabilities. It supports less SQL Server features, less datatypes, and smaller databases. Don’t get me wrong – it’s still a solid product – but it’s half the man that SQL Server is.

Amazon SQL Server RDS is full-blown SQL Server. You don’t have to change anything whatsoever about your apps – if it runs on-premise, it likely runs in Amazon RDS. Your tools like SSMS won’t know they’re talking to anything but a regular SQL Server instance.

However, neither SQL Azure nor RDS will act as a log shipping target or a database mirror: you won’t be restoring databases directly. Both Microsoft and Amazon try to automate routine maintenance for you, but as a result, routine maintenance tasks aren’t available to you. This limitation on database restores means Amazon RDS won’t replace every SQL Server running in Amazon EC2 today.

What’s an RDS SQL Server Cost?

The Amazon RDS pricing page has tabs for MySQL, Oracle, and SQL Server, and unlike SQL Azure, it gets complicated fast. Pricing depends on:

  • The SQL Server Edition – Express, Web, Standard, or bring-your-own
  • The hardware size – from micro (630mb RAM) to quadruple-extra-large (68GB memory)
  • The storage size – from 20GB to 1TB
  • The datacenter you choose – Oregon is cheaper than Tokyo, for example
  • How much bandwidth you use

And more. To get started, a micro instance with 20GB of storage running SQL Server Express Edition is just $.035 per hour, or about $306.60 per year. A Standard Edition, quad core, 15GB memory instance is $1.22 per hour, or $10,687.20 per year, but it drops to $6,219.60 if you bring your own licensing or $6,482.40 if you use Web Edition. Reserved instances become even cheaper by the hour in exchange for a one time upfront payment. The break even point typically comes at the three month mark – after three months you were better off buying a reserved instance. That quad core 15GB instance drops down to $3206.16 per year, plus a one time payment of $5730 for a 3-year reserved instance.

One More Thing

Just one more thing: the sp_Blitz that we know and love? When you use sp_BlitzUpdate to grab the latest version, you’re fetching it from a SQL Server RDS instance right now.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

How to Tell if TempDB Is a Performance Problem

Years ago, I came across an article about a gentleman who made a cat camera. Without boring you too much, this camera gave him a detailed view (including GPS info) about what his cats got up to all day. I got excited about the idea of finding out what my pets did all day. Not having the means to build a cat camera of my own and being far too cheap to buy one, I rigged up a web cam at home to watch the cats all day. It turns out that my pets do nothing all day. They slept and ate and slept and ate until I got home. At which point, they kept doing the exact same thing.

In a wonderful bit of quantum boringness, it turns out that I had no idea what kind of chaos my pets were causing until I directly observed them. I had no idea if they were actually moving things around in the house or if I was simply forgetting where I was putting things (turns out I’m very forgetful). TempDB is a lot like a basket of cats – you don’t know for sure that it’s causing your problems, you have some sneaking suspicions, but you’re not sure how to prove anything.

Watching Over TempDB

If TempDB is like a basket of cats, we need to watch what it’s doing; there’s no telling when it’s going to go from adorable to shredding the drapes. Knowing what to watch in TempDB is just as important as knowing that you should even be watching TempDB at all.

How Many Cats Do I Have? (Watching TempDB Free Space)

Excessive TempDB usage isn’t necessarily a sign that TempDB is a problem, but it is an indicator that you have problems worth looking into. When TempDB starts getting full, it’s an indicator that there’s a lot of temporary object creation as well as out of memory sorting and joining going on in the database. None of these things are bad, but they’re indicators that we should be taking a closer look at TempDB.

There’s no hard and fast metric for what you should do when your TempDB data file is large, but it’s a good indicator that you can stand to do one of a few things:

  1. Enable Instant File Initialization
  2. Add Multiple TempDB Files

These changes won’t always cure the problem, but they are starting points. Waiting for TempDB to grow can be a cause of performance problems and enabling Instant File Initialization makes it possible to quickly grow TempDB data files. Using multiple TempDB files uses more storage bandwidth, reduces file contention, and adds magical pixie dust to your queries.

What Are My Cats Doing? (Monitoring TempDB Usage)

The next step, after you know how much TempDB you’re using, is to find out how TempDB is being used. TempDB is used for a few distinct things: joins, aggregations, sorting, the version store, temporary tables (and table variables), and table/index spooling. While these are different operations, they all consume TempDB space. Understanding how your applications use TempDB is critical to understanding if TempDB is causing performance problems.

This is where things get more complicated; there’s never a right or wrong answer, but TempDB usage varies heavily by application and workload. Sometimes even the same application, with different customer workloads, can have wildly different TempDB usage characteristics. By monitoring TempDB through a variety of DMO calls, server side traces, and performance counters it’s possible to get an accurate picture of the health and utilization of TempDB over time. Through some careful DMO/DMV scripting it’s even possible trace who the biggest consumers of TempDB are back to the stored procedure or query that’s using TempDB.

Just like trying to watch a basket of cats through a webcam, you can only catch quick glimpses of what’s going on. This process makes it possible to capture a sample of what’s going on inside TempDB at any moment, but it’s only for a quick moment. The DMOs to monitor TempDB only look at the currently running queries, there is no historical record. The best way to get an accurate picture of what’s happening is to sample these DMOs on a regular basis and sample aggressively during peak performance periods. You won’t catch every query this way but you should be able to catch most.

Benchmark, Rinse, Repeat

Whenever I talk about performance tuning or general SQL Server problems, I always advise people to benchmark everything that they can. Having a steady baseline is the only way to verify that changes are having a positive effect on performance. Without a performance baseline in place, all you have to go on is a feeling that things are faster. Unfortunately, feelings don’t translate into quantifiable numbers (unless you’re trying to quantify how you feel about a basket of cats).

Establishing a performance baseline is one of my favorite parts of working with clients. As we go through the health check, I work with our clients to figure out where it hurts and help them build a solution. I cover how they can use the baseline to keep monitoring their system. With these tools in place, it’s easy to monitor a system’s health over time.

Focusing on performance metrics makes it easy to see which parts of an application are causing performance problems. It TempDB usage spikes after a change to a few stored procedures, it’s easy to identify the problem when you have a baseline established.

Determining whether or not TempDB is a performance problem boils down to establishing a baseline, monitoring performance before and after changes, and carefully making changes until acceptable performance levels are reached. This may involve adding more TempDB data files, forcing memory grant allocations, or using solid state drives for TempDB.

Resources

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

A Different World: Three Use Cases For Riak

Writing and deploying an application is pretty easy, there’s a process to follow that will get you most of the way there. The tricky part is getting the business logic down. But this isn’t about writing code, this is about what happens once you have an application out in the wild. The tricky part comes when you have to grow with your application. There are some tricky parts of keeping an application up and running. Things get complicated when you start trying to add complex functionality for your data. For those of us who use SQL Server there are a lot of options within SQL Server, but some of them require a seasoned DBA, expensive features, or both.

Durability

Making sure your data sticks around is important. Losing data can be catastrophic for a business. Even if you have a backup, it can take hours or even days, to restore your database and verify that everything is back up and running. That doesn’t even include the possibility that a backup is corrupted and you won’t be able to bring your database online. Or, worse yet, you could lose your SAN completely. Durability is important.

With SQL Server it’s possible to set up a cluster, mirroring, or replication to provide a second copy of your data for readability. These three solutions all require knowledge of some specialized features of SQL Server. Keeping each of these features up and running requires some knowledge, planning, set up, and monitoring. None of it is easy, some of it’s hard, and some features even need a lot of care and feeding to stay up and running. DBAs tell horror stories about hand feeding replication for days to get it up and running again after a catastrophic failure; it’s almost a badge of honor. Here’s the thing: none of this should be difficult. One way data durability (mirroring, transactional replication) is limited in its functionality – reads can happen everywhere, but writes can only happen in one place. If the master server goes down, it can be difficult to switch the master server around.

Durability problems plague DBAs every day, whether they know it or not. I spent a lot of time trying to solve my durability problems by configuring replication or log shipping and building complex application logic to support the possibility of multiple write locations. I later solved the problem by using Riak. One of the main benefits of Riak is the availability and fault tolerance it provides. That is: it’s durable as all get out. Data isn’t written to one place, it’s written to several places at once; writes won’t succeed unless more than one server responds that a write is successful. In a way, it’s like a really fancy version of database mirroring or SQL Server Denali’s Availability Groups feature that you can have right now.

When a server in a Riak cluster eventually fails, as hardware always does, you could recover it using a filesystem backup. Riak will handle making sure everyone has the right data by using a very cunning technique called read repair. If the server completely fails the other option is to simply replace it. You tell the cluster of servers that one server is gone and another server is taking its place. The cluster then figures out how to perform recovery or redistribute the data. It’s a lot less painful than shipping full database backups or replication snapshots across the data center or even across the country.

Latency

Relational database, and SQL Server in particular, use B+trees indexes to store data. B+trees provide reasonably fast look ups of data (any lookup of a single row will need to traverse the same number of pages on disk as any other lookup). By contrast, inserting data into a B+tree is not always particularly fast – page splits may occur, forcing data to be shuffled around on disk and moving things out of order could require intermediate pages to be updated. Because of the B+tree structure, many SQL Server DBAs advocate using an ordered, constantly increasing key for clustered indexes.

Riak, by contrast, defaults to using the Bitcask storage back end (with the option to use several others). Bitcask is unique in that it doesn’t use a B+tree to store data on disk. Instead Bitcask uses a pair of data structures – a series of data files (written as log-structured hash table) and an in memory directory of keys (a keydir) to make it easy to find records in the log-structured hash table. Reading data out of Bitcask will take two hops – one for the keydir and one for the data file – versus many potential hops in larger SQL Server tables (probably around 4 physical hops with a cold cache on a large-ish table).

There are two things that immediately stand out to me about Riak’s data latency. One is that Riak’s latency is predictable. Any write is going to take as much time as it takes to stream the data to disk. Since Bitcask is a log-structured hash, there’s no possibility of fragmentation; data is written in the order it arrives. Like some other databases, Riak does not perform in place updates of data. Instead, a new record is written in the log-structured hash table and the keydir is updated with the location of the new record. Because of this, it’s easy to predict how long an insert or update will take – as long as it takes to write that many bytes of data to disk.

Just as Riak provides a predictable low latency for writes, it also provides low latency for reading data from disk. Riak doesn’t use locking, so there can be no blocking. Instead read latency boils down to the amount of time that it takes to pull data off of disk. The more data there is in a given record, the slower the read is going to be. Obviously, there’s some seek time involved, but it’s negligible when you consider that a seek involves a single memory read and a single disk read.

Full Text Search

SQL Server’s full text search has caused a lot of problems for a lot of smart people. The more complex the schema and the more data load involved, the more likely that SQL Server’s full text search is going to have some problems. It’s a great tool, but there are some limitations and tuning full text queries is very different from tuning regular T-SQL queries.

A problem around full text search is the inability to scale the full text search service independently of the SQL Server. They’re tied together on the same physical instance. If you need to increase the performance of full text search, you must resort to increasing the performance of SQL Server, including the same licensing costs for SQL Server. Anyone how has gone from a 2 socket to a 4 socket machine can tell you that a doubling in license costs isn’t trivial. Full text search can, like many things, be moved to a separate SQL Server using replication, but SQL Server’s transactional replication has a reputation for being a bit manpower intensive. Many teams eventually move full text search to something other than SQL Server. Pushing full text search outside of the database engine frees up the database engine to serve other queries. Some people, like the fine folks at StackOverflow, have used Lucene.net. One of the advantages of using a full text search engine is that it offers phenomenal flexibility. Unfortunately, both SQL Server’s full text search Lucene and limited to a single node. Riak Search is Lucene compatible, flexible, and durable.

Riak Search automatically indexes documents whenever they’re saved in a specific bucket, just like SQL Server. Server side triggers are created to asynchronously index data as it is saved. Unlike SQL Server’s full text search, it’s possible to create custom functionality for searching. Different word breakers and tokenizers can be created to support different methods of indexing. Riak Search also allows complex documents to be indexed using custom schemas, much like Lucene.

The icing on the cake, for me, is that Riak Search can be used to provide linear scaling – as servers begin to run out of capacity, additional nodes can be brought online to quickly scale out. The upside of Riak Search is that there’s also durability built-in: the search indexes are spread across the cluster. Spreading indexes across the cluster, term partitioning, means that load from complex queries can be served by many nodes at once making it possible to provide overall higher query throughput on large data sets.

Making It Work

It’s fair to say that many people using SQL Server are also using the .NET Framework. While Riak is implemented in Erlang and runs on Unix based operating systems, it’s easy to work with SQL Server and Riak together in the same environment. CorrugatedIron is a community developed, open source, .NET library for Riak. While it’s still under heavy development pending Riak’s 1.0 release later this month, CorrugatedIron makes it possible to develop cross database functionality to save data in SQL Server and Riak without requiring developers to learn a new programming language or operating system. Functionality can be moved out of SQL Server and onto a distributed platform where functionality can be scaled horizontally and linearly.

This isn’t to say that I’m advocating for teams to abandon SQL Server in favor of Riak. In fact, that’s the opposite of what I want to say. Teams should pick a best of breed solution for their data storage needs. SQL Server is a great relational database, but there are some things it doesn’t do as well as we’d like. Riak is a great distributed database that is impervious to single node failure. They both provide different features and functionality that complement each other very well. Riak removes single points of failure, provides rich full text search functionality, and provides consistent latency guarantees. SQL Server provides rich querying semantics on high structure data and support for atomic transactions.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

The Difficulty with Deadlocks

Deadlocks can kill an application’s performance. Users will complain about the app being slow or broken. Developers will ask the DBA to fix the problem, DBAs will push the problem back on developers. The next thing you know, the office looks like Lord of the Flies.

What is a Deadlock?

A deadlock occurs when two queries need exclusive access to different tables and each query is waiting for the other to finish. Assume that there are two tables, tA and tB. There are also two queries, Q1 and Q2. The first query, Q1, takes an exclusive lock on tA at the same time that the second query, Q2, takes an exclusive lock on tB. So far, there’s nothing out of the ordinary happening. Q1 then requests exclusive access to tB. At this point we have a block. Q1 must wait for Q2 to release its lock before Q1 can finish. Q2 now requests an exclusive lock on tA. And here we have a deadlock.

Q1 won’t release its lock on tA until it can get a lock on tB. Q2 won’t release its lock on tB until it can get a lock on tA. In order for either query to finish, they need access to the other query’s resources. That’s just not going to happen. This is a deadlock.

In order for the database to keep responding, one of these queries has to go. The query that’s eliminated is called the deadlock victim.

Finding Deadlocks

What is the first sign of a deadlock? Queries that should be fast start taking a long time to respond. That’s the first sign of a deadlock, but that’s also the first sign of a lot of other problems. Another sign of a deadlock is an error (error 1205 to be precise) and a very helpful error message: Transaction (Process ID %d) was deadlocked on {%Z} resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

SQL Server is telling you exactly how to solve the problem – re-run your transaction. Unfortunately, if the cause of the deadlock is still running, odds are that your transaction will fail. You can enable several trace flags to detect deadlocks (trace flag2 1204 and 1222), but they output the deadlock to the SQL Server error log and produce output that is difficult to read and analyze.

Once deadlocks show up, your database administrator might reach for a script to pull deadlocks out of Extended Events. Extended Events are a great source of data for analysis. Although they’re a relatively new feature to SQL Server, they first appeared in SQL Server 2008, Extended Events already provide an incredibly rich set of tools for monitoring SQL Server. Event data can be held in memory (which is the default) or written out to a file. It’s possible that to build a set of monitoring tools that log all deadlocks to a file and then analyze that file after the events happen.

Yesterday's deadlocks are tomorrow's news!

Just like newspapers help us find out what happened yesterday, Extended Events provide a great way to investigate deadlocks that have already occurred. If you don’t have any other monitoring tools in place, Extended Events are a great place to start. Once you start seeing deadlocks, you’ll want to start gathering more information about them. It takes some skill to read the XML from a deadlock graph, but it contains a great deal of information about what happened. You can find out which tables and queries where involved the deadlock process, which process was killed off, and which locks caused the deadlock to occur.

The flip side of the coin is that Extended Events will give you very fine grained information about every deadlock that has already happened. There’s nothing in Extended Events to help you stop deadlocks from happening or even to detect them right when they are happening. Much like a microscope lets you look at a prepared slide in excruciating detail, Extended Events let you look at a single point in time in excruciating detail. You can only find out about things after they happen, not as they happen.

Deadlock Notifications

Wouldn’t be nice if you received notifications of deadlocks as they were happening? Good news! Deadlocks only happen when data is changed; it’s possible to wrap your modification statements inside a template to record any deadlocks that happen. This template takes advantage of some features in SQL Server to allow the deadlock notifications to get sent out asynchronously – the notifications won’t slow down any applications while they interact with SQL Server.

There is a big problem here: every stored procedure that modifies data needs this wrapper. If the wrapper is missed in one place, there won’t be any deadlock information collected from that query. If the wrapper needs to be changed, it has to be changed everywhere. This can be a good thing, of course, because you can target problem queries for reporting or different queries can respond in different ways. Like using Extended Events, this is a very fine grained mechanism for dealing with deadlocks. Action is taken at the level of a single execution of a query and not at the level of our entire application. If we’re going to take care of deadlocks, we want to do it once and fix things across the entire application.

Deadlocks by Design

Both the Extended Events and notification solution are very cunning ways to get information about deadlocks that have already happened. Neither solution helps applications respond to deadlocks as they happen.

Much like a director deciding to fix it in post, monitoring for deadlocks and trying to solve the problem is a reaction to something that should have been done right in the first place. Maybe budget constraints got in the way, maybe the software had to ship by a deadline, maybe there wasn’t expertise on the team to look into these problems. For whatever reason, something made it into production that causes deadlocks. It doesn’t matter what happened, the problem is there; deadlocks are happening.

Error 1205: Catching Deadlocks with Code

Application developers have tool they can use to cope with deadlocks. When SQL Server detects a deadlock and kills of a query, an error is thrown. That error makes its way back up to the software that made the database call. .NET developers can catch the exception and check the Number. (Deadlocks throw an error number of 1205.)

When a deadlock happens, SQL Server will kill off the cheapest transaction. The “cheapest” transaction is the transaction with the lowest cost. It’s getting rid of something that will be easy to run a second time around. Instead of having deadlocks cause problems, developers can easily check the errors that come back from the database server and try again. You can set the deadlock priority; if you don’t have time to fix to the code, you can specify which queries should run at a lower priority.

This is moving the problem up the chain. The users may not see that there is a deadlock, but the application code still needs to deal with it. Things can still be tricky, though. If there’s a long running transaction holding locks and causing deadlocks, no reasonable amount of re-tries will solve the deadlocking problem.

Reacting to Deadlocks with Architecture

The easiest way to eliminate deadlocks is to design the database to avoid deadlocks. It sounds facetious, doesn’t it? Of course the easiest way to avoid deadlocks is to design so they don’t happen!

There are a few architectural patterns to use in an application to avoid deadlocks.

Pattern 1: Using NOLOCK to Stop Deadlocks

NOLOCK for YESOUCH

A common way to stop deadlocks is to use the NOLOCK query hint. NOLOCK users advocate this approach because they believe it does what it says – it eliminates locking.

NOLOCK doesn’t get rid of all locks, just the ones that make your queries return the right results. You see, NOLOCK stops locking during read operations. In effect, it throws the hinted table or index into READ UNCOMMITTED and allows dirty reads to occur. Locks are still necessary for data modification; only one process can update a row at a time.

By using NOLOCK, you’re telling the database that it’s okay to avoid locking for read safety in exchange for still letting deadlocks happen.

Pattern 2: Indexing for Concurrency

In some cases, deadlocks are caused by bookmark lookups on the underlying table. A new index can avoid deadlocks by giving SQL Server an alternate path to the data. There’s no need for the select to read from the clustered index so, in theory, it’s possible to avoid a deadlock in this scenario.

Think about the cost of an index:
* Every time we write to the table, we probably end up writing to every index on the table.
* Every time we update an indexed value, there’s a chance that the index will become fragmented.
* More indexes mean more I/O per write.
* More indexes mean more index maintenance.

To top it off, there’s a good chance that the index that prevents a deadlock may only be used for one query. A good index makes a single query faster. A great index makes many queries faster. It’s always important to weight the performance improvement of a single index against the cost to maintain and index and the storage cost to keep that index around.

Pattern 3: Data Update Order

A simple change to the order of data modifications can fix many deadlocks. This is an easy pattern to say that you’re going to implement. The problem with this pattern is that it’s a very manual process. Making sure that all updates occur in the same order requires that developers or DBAs review all code that access the database both when it’s first written and when any changes are made. It’s not an impossible task, but it will certainly slow down development.

There’s another downside to this approach: in many scenarios, managing update order is simply too complex. Sometimes the correct order isn’t clear. Managing update order is made more difficult because SQL Server’s locking granularity can change from query to query.

In short, carefully controlling update order can work for some queries, but it’s not a wholesale way to fix the problem.

Common Patterns: Common Failures

One of the problems of all three patterns is that they’re all reactionary. Just like the two methods for detecting deadlocks, they get implemented after there is a problem. Users are already upset at this point. There has already been some kind of outage or performance problem that caused the users to complain in the first place. Of course, sometimes you inherit a problem and you don’t have the opportunity to get good design in place. Is there hope?

Whether you’re starting off new design, or combating existing problems, there is a way that you can almost entirely prevent deadlocks from occurring.

Using MVCC to Avoid Deadlocks

MVCC is a shorthand way of saying Multi-Version Concurrency Control. This is a fancy way of hinting at a much broader concept that can be summarized simply: by maintaining copies of the data as it is read, you can avoid locking on reads and move to a world where readers never block writers and writers never block readers.

This probably sounds like a big architectural change, right? Well, not really.

SQL Server 2005 introduced READ COMMITTED SNAPSHOT ISOLATION (RSCI). RCSI uses snapshots for reads, but still maintains much of the same behavior as the READ COMMITTED isolation level. With a relatively quick change (and about a 10 second outage), any database can be modified to make use of RCSI.

When Should You Use RCSI?

If you actually want my opinion on the subject: always. If you’re designing a new application, turn on RCSI from the get go and plan your hardware around living in a world of awesome. TempDB usage will be higher because that’s where SQL Server keeps all of the extra versions. Many DBAs will be worried about additional TempDB utilization, but there are ways to keep TempDB performing well.

The bigger question, of course, is why should I use RCSI?

Use RCSI to Eliminate Locking, Blocking, Deadlocks, Poor Application Performance, and General Shortness of Breath

RCSI may not cure pleurisy, but it’s going to future proof your application. Somewhere down the road, if you’re successful, you’ll have to deal with deadlocks. Turning on RCSI is going to eliminate that concern, or make it so minimal that you’ll be surprised when it finally happens.

A Snapshot of The Future: Looking Past RCSI

RCSI is probably all that most people think they going to need at the start of their architectural thinking. There will be circles and arrows and lines on a whiteboard and someone will say “We need to make sure that the DBAs don’t screw this up.” What they really mean is “Let’s talk to the data guys in a year about how we can make this greased pig go faster.”

Both of these versions can poop and bark.

During the early stages of an application’s life, a lot of activity consists of getting data into the database. Reporting isn’t a big concern because there isn’t a lot of data to report on and a few tricks can be used to make the database keep up with demands. Sooner or later, though, demand will outstrip supply and there will be problems. Someone might notice that long running reports aren’t as accurate as they should be. Numbers are close enough, but they aren’t adding up completely.

Even when you’re using RCSI, versions aren’t held for the duration of a transaction. The different isolation levels correspond to different phenomenon and those phenomenon, under a strict two-phase locking model, correspond to how long locks are held. When using one of the two MVCC implementations (RSCI or snapshots), the isolation levels and their phenomenon correspond to how long versions are kept around.

Using RCSI, or even READ COMMITTED, locks/versions are only held for a single statement. If a query has to read a table multiple times for a report, there’s a chance that there can be minor (or even major) changes to the underlying data during a single transaction. That’s right, even transactions can’t save you and your precious versions.

SNAPSHOT isolation makes it possible to create versions for the duration of a transaction – every time a query reads a row, it’s going to get the same copy of that row, no matter if it reads it after 5 seconds, 5 minutes, or 5 hours. There could be multiple updates going on in the background but the report will still see the same version of the row.

Getting Rid of Deadlocks in Practice

There are manual ways to accomplish eliminate deadlocks, but they require significant effort to design and implement. In many cases deadlocks can be eliminated by implementing either READ COMMITTED SNAPSHOT ISOLATION or SNAPSHOT isolation. Making the choice early in an application’s development, preferably during architectural decisions, can make this change easy, painless, and can be designed into the application from the start, making deadlocks a thing of the past.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

Leaving the Windows Open

Aggregations and grouping can be a pain in the rear for the novice SQL developer. Way back in 2003, the ANSI/ISO standards people figured this out and added windows and ranking functions to the standard. In 2005, Microsoft added a few of these functions (ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE()) as well as the OVER() clause so we could whet our appetites. The combination of window and ranking functionality has been used and abused to add all kinds of functionality to applications. SQL Server Denali takes it to the next step and adds a whole slew of additional functionality to window and rank and analyze data in without having to find a BI expert.

OVER() Defining Your Windows

I’m sitting in my office right now, when I look out the window I can see a shrub, part of the neighbor’s car, and the grass in my lawn. I can also see my cat lazily napping. If I move my chair over a little bit, the view out of the window changes and I can see the driveway and the road. Changing where my chair is changes the view I can see of the outside. Changing the definition of a window in T-SQL changes the view of the data for a function in the query.

AVG(view) OVER (PARTITION BY chair)

PARTITION BY and ORDER BY

Much like GROUP BY in a query, PARTITION BY lets us divide up the results in partitions. The difference is that PARTITION BY acts locally – on a single function – instead of on the entire query.

SELECT  LastName ,
        FirstName ,
        ROW_NUMBER() OVER (PARTITION BY LastName ORDER BY FirstName) AS position
FROM    Person.Person ;

We're all counting on you

Looking at the results from this query, we can see that the ROW_NUMBER() function gives us the number of a row – in FirstName order – in the results when it’s grouped up by LastName. Or, to put it differently, it gives us the relative position of the row defined by the window of LastName.

A window is a way of defining a local scope for a function. Instead of using convoluted logic, subqueries, and temporary tables, it’s easy to definite local scope for queries to do something useful.

Finding Other Rows with LAG and LEAD

It’s easy to produce queries based on relatively fixed windows, but what if you need to know what’s going on just before and just after the current row. Surely there must be a way to do something like that.

SQL Server Denali brings support for the LAG and LEAD functions. The LAG function can be used to show a previous row. LAG accepts two additional optional parameters, the number of rows to go back a default value to use, just in case a NULL is found.

Let’s look at employee performance in the AdventureWorks2008R2 database by comparing their current performance with previous performance. This query, without the LAG function requires creating some sort of temporary table and joining to it once over each of last_month, three_months_ago, six_months_ago, nine_months_ago and twelve_months_ago. It might end up looking something like this (if we used a CTE instead of a temporary table):

WITH sales AS (
    SELECT  h.SalesPersonID ,
            ROW_NUMBER() OVER (PARTITION BY h.SalesPersonID
                               ORDER BY     DATEPART(yyyy, h.OrderDate),
                                            DATEPART(mm, h.OrderDate)) AS rn ,
            DATEPART(yyyy, h.OrderDate) AS [Year],
            DATEPART(mm, h.OrderDate) AS [Month],
            SUM(h.SubTotal) AS TerritoryTotal
    FROM    Sales.SalesOrderHeader AS h
    WHERE   SalesPersonID IS NOT NULL
    GROUP BY SalesPersonID ,
            DATEPART(yyyy, OrderDate) ,
            DATEPART(mm, OrderDate)
)
SELECT  s.SalesPersonID,
        s.[Year],
        s.[Month],
        s.TerritoryTotal,
        s1.TerritoryTotal AS last_month ,
        s3.TerritoryTotal AS three_months_ago ,
        s6.TerritoryTotal AS six_months_ago ,
        s9.TerritoryTotal AS nine_months_ago ,
        s12.TerritoryTotal AS twelve_months_ago
FROM    sales AS s
        LEFT OUTER JOIN sales AS s1 ON s.SalesPersonID = s1.SalesPersonID
                                       AND s.rn = s1.rn + 1
        LEFT OUTER JOIN sales AS s3 ON s.SalesPersonID = s3.SalesPersonID
                                       AND s.rn = s3.rn + 3
        LEFT OUTER JOIN sales AS s6 ON s.SalesPersonID = s6.SalesPersonID
                                       AND s.rn = s6.rn + 6
        LEFT OUTER JOIN sales AS s9 ON s.SalesPersonID = s9.SalesPersonID
                                       AND s.rn = s9.rn + 9
        LEFT OUTER JOIN sales AS s12 ON s.SalesPersonID = s12.SalesPersonID
                                       AND s.rn = s12.rn + 12

Although this query is clear enough, it requires a considerable amount of disk access and performs 6 logical scans and 4116 logical reads.

Re-writing this query to use the LAG function makes the query easier to read, easier to understand, and much faster for SQL Server to run. It only requires 1 logical scan and 686 logical reads.

SELECT  SalesPersonID ,
        [Year] ,
        [Month] ,
        TerritoryTotal ,
        LAG(TerritoryTotal) OVER (PARTITION BY SalesPersonID
                                  ORDER BY [Year], [Month]) AS last_month ,
        LAG(TerritoryTotal, 3, 0.00) OVER (PARTITION BY SalesPersonID
                                           ORDER BY [Year], [Month]) AS three_months_ago ,
        LAG(TerritoryTotal, 6, 0.00) OVER (PARTITION BY SalesPersonID
                                           ORDER BY [Year], [Month]) AS six_months_ago ,
        LAG(TerritoryTotal, 9, 0.00) OVER (PARTITION BY SalesPersonID
                                           ORDER BY [Year], [Month]) AS nine_months_ago ,
        LAG(TerritoryTotal, 12, 0.00) OVER (PARTITION BY SalesPersonID
                                            ORDER BY [Year], [Month]) AS twelve_months_ago
FROM (
        SELECT    SalesPersonID,
                DATEPART(yyyy, OrderDate) AS [Year] ,
                DATEPART(MONTH, OrderDate) AS [Month] ,
                SUM(SubTotal) AS TerritoryTotal
        FROM    Sales.SalesOrderHeader
        GROUP BY SalesPersonID,
                DATEPART(yyyy, OrderDate),
                DATEPART(month, OrderDate)
) AS x
WHERE    SalesPersonID IS NOT NULL
ORDER BY SalesPersonID,
        [Year],
        [Month];

While this isn’t the most interesting report, it’s very common in many fields to compare current performance to previous performance. Stocks and 401(k) portfolios do this. My utility bills all show the last year of history as well as what I’m being charged this month.

Your sales are down, Ted.

Relative Aggregates Using ROWS BETWEEN

Moving a window relative to the current row is pretty interesting, but LAG and LEAD can only do so much. What if you need to show a three month average? What if that three month average needs to be centered on the current month?

SELECT  SalesPersonID,
        the_year,
        the_month,
        TerritoryTotal,
        AVG(TerritoryTotal) OVER (PARTITION BY SalesPersonID
                                  ORDER BY the_year, the_month
                                  ROWS BETWEEN 1 PRECEDING
                                           AND 1 FOLLOWING) AS three_month_average ,
        AVG(TerritoryTotal) OVER (PARTITION BY SalesPersonID
                                  ORDER BY the_year, the_month
                                  ROWS BETWEEN 6 PRECEDING
                                           AND 6 FOLLOWING) AS year_average
FROM (
        SELECT    SalesPersonID,
                DATEPART(yyyy, OrderDate) AS the_year,
                DATEPART(MONTH, OrderDate) AS the_month,
                DATEPART(QUARTER, OrderDate) AS the_quarter,
                SUM(SubTotal) AS TerritoryTotal
        FROM    Sales.SalesOrderHeader
        GROUP BY SalesPersonID,
                DATEPART(yyyy, OrderDate),
                DATEPART(month, OrderDate),
                DATEPART(QUARTER, OrderDate)
) AS x ;

This query adds in something called the frame clause (ROWS BETWEEN start AND end). The frame clause makes it easy to define a sliding window for the function being used.

Historically speaking, you never were very good.

There are a few ways to use the frame clause to change how the window is computed. As you saw in the previous query it’s possible to state the number of rows before and after the current row. That’s simple enough. It’s also possible to define a range of rows to use. One of the most first uses that developers will find for frame clauses is to produce a running total:

SELECT  soh.SalesPersonID, soh.OrderDate,
        soh.SubTotal ,
        SUM(soh.SubTotal) OVER (PARTITION BY SalesPersonID
                                ORDER BY soh.OrderDate
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND CURRENT ROW) AS total_sales_to_date
FROM    Sales.SalesOrderHeader AS soh
WHERE   soh.SalesPersonID IS NOT NULL

Run faster. Work harder. Sell. Sell. Sell.

In the past, developers would have had to resort to trickery with messy self-joins to get running totals. Thankfully, it’s now possible to define a frame for the window and use it compute aggregates over a range. I did resort to a bit of trickery to demonstrate the running total. The default value for a frame clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so it’s not that impressive and can be achieved with T-SQL that looks like any other window function.

The ranges in a frame clause are all relative to the current partition of the window function. UNBOUNDED PRECEDING will get all rows from the start of the frame, and the start of the frame is relative to whatever has been defined in the OVER clause. UNBOUNDED FOLLOWING will get all rows from the current row to the end of the frame.

Brand New Analytic Functions

There are a few other functions that are being added to T-SQL in SQL Server Denali. I haven’t mentioned them yet because they’re statistical functions for computing distribution and median information. This is about the extent that I’ll mention them, but here they are:

  • CUME_DIST() – calculates the position of a value relative to a group of values
  • PERCENT_RANK() – calculates the percent rank of a value in a group of values
  • PERCENTILE_CONT(X) – looks at the percent rank of values in a group until it finds one greater than or equal to X.
  • PERCENTILE_DISC(X) – looks at the cumulative distribution of values in a group until it finds one greater than or equal to X

I don’t do a lot of analytical work, and certainly nothing where I’m going to be computing cumulative distributions, but it’s good to know that these functions are available. A lot of people do need to perform this functionality and they’ve resorted to using T-SQL trickery or SQL-CLR to get the same results when they could have just had these functions in the database.

These analytical functions are a bit different than the other functions that I’ve mentioned so far. First of all, they all don’t require an OVER() clause. Because these are analytical functions, you might want to perform the analysis over the entire query, so there’s no explicit need to apply a window and frame to the function.

The other thing that makes these functions different is that they allow a WITHIN GROUP specification. WITHIN GROUP is new in SQL Server Denali and it applies an ordering to the analytical function’s actions but not to the results.

Here’s an example using all four to demonstrate how they work on the data in AdventureWorks, and to show how poorly those poor sales clerks get paid:

SELECT  p.LastName ,
        p.FirstName ,
        e.OrganizationLevel,
        eph.Rate * 2080 AS Salary ,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY eph.Rate * 2080 DESC)
                                     OVER (PARTITION BY e.OrganizationLevel) AS MedianContinuous ,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY eph.Rate * 2080 DESC)
                                     OVER (PARTITION BY e.OrganizationLevel) AS MedianDiscrete ,
        CUME_DIST() OVER (PARTITION BY e.OrganizationLevel
                          ORDER BY eph.Rate * 2080 DESC) AS CUME_DIST,
        PERCENT_RANK() OVER (PARTITION BY e.OrganizationLevel
                             ORDER BY eph.Rate * 2080 DESC) AS PERCENT_RANK
FROM    HumanResources.Employee AS e
        JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
        LEFT JOIN HumanResources.EmployeePayHistory AS eph ON eph.BusinessEntityID = e.BusinessEntityID
GROUP BY p.LastName,
        p.FirstName,
        e.OrganizationLevel,
        eph.Rate
ORDER BY OrganizationLevel ASC,
        Salary DESC ;

Mike, I have good news: you're getting a corner office.

SQL Server Windowing Functions Training Video

For more information, watch this free 20-minute training video on Denali’s new windowing functions:

Summing it up

SQL Server Denali adds many new features, not the least of which are the new window functions, frame clauses, and analytic functions. When you take each of these features on their own, they add powerful functionality to SQL Server. When you combine these three T-SQL enhancements, they give database developer tremendous power and flexibility to quickly develop reports that analyze data and deliver the results without needing extra features, tools, or products.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

Resolving Conflicts in the Database

Anyone who has worked with replication has run into problems caused by conflicting data. I’ve lost enough sleep over replication to know that having an effective plan for conflict resolution is important to keeping replication running and healthy.

What Causes Conflicts?

There are a few ways that conflicts can show up in a database. In strict transactional replication, it’s possible for there to be a conflict when a row exists in one place and not another.

Let’s say we have a row in Sales.SalesOrderHeader that is present on both our publisher and subscriber. During some maintenance to delete incorrect orders, the row is incorrectly deleted from the subscriber. An update to the order is correctly written on the publisher. When the command to update the record is run on on the subscriber, there’s a conflict. The command contains instructions to modify a row that doesn’t exist. Replication will be brought to a screeching halt while operations staff figure out how to get the data back onto the subscriber.

In a peer-to-peer replication scenario, conflicts can happen in even more ways than with transactional replication. Many situations are caused by similar scenarios – data is inserted on both servers, data is deleted from both servers, or updates happen to a row that isn’t present. Peer-to-peer replication can also suffer from a situation where rows are different on all servers. SQL Server’s peer-to-peer replication can be configured to use optimistic concurrency; row versions are added to make sure that SQL Server is updating the version of a row it thinks it should be updating. If the versions don’t match, then there’s a conflict that needs to be taken care of.

To look at it a different way, let’s say we have two SQL Servers, one in Los Angeles and one in Charlotte. There’s a user in each city; Alfred is in LA and Barney is in Charlotte. Both Alfred and Barney start updating information about the same customer (customer ALFKI, version 1). Alfred finishes quickly, clicks “Save”, and goes out for coffee. In the background, Alfred’s write is sent to the SQL Server in LA where the row version for ALFKI is compared to Alfred’s row version. Since everything is correct, Alfred’s data is saved. Meanwhile, Barney is a bit slower, but he gets his data finished and he clicks “Save”. Alfred’s data hasn’t replicated to Charlotte yet, so Barney’s save goes through the same way.

At this time, the databases on opposite sides of the country both contain a row for customer ALFKI that has a version number of 2. The problem is that we don’t know which version is correct. When replication kicks off from one server to another (let’s say from LA to Charlotte), the originating server is effectively going to say “Here’s version 2 of row ALFKI.” The receiving server (Charlotte) is going to respond “Uh oh, we got a problem: I also have a version 2 of row ALFKI.” At this point, replication will come to a screeching halt until someone can come in and fix the problem.

Let’s Hug It Out

Unfortunately, conflict resolution isn’t as easy as talking about the problem – there’s no way for SQL Server to figure out the best way to solve the problem. There are a few strategies that can be employed to make the process easier.

  • Manual intervention
  • Logging conflicts
  • Master write server
  • Last write wins
  • Write partitioning

Manual Intervention

This is the default way that SQL Server handles replication conflicts. It’s up to operations staff to figure out how the conflict happened and how it needs to be resolved. If there are a number of conflicts, they all need to be resolved before replication can continue. This is a DBA’s worst nightmare and why many DBAs try to stay away from replication.

The more critical the business, the less likely manual intervention is a possibility you want to consider; 3:00AM is unfriendly at the best of times, much less when a critical application is down.

Log Your Conflicts

Rather than require immediate manual intervention, it’s possible to modify the replication stored procedures to log that a conflict has occurred and keep on merrily replicating data.

This approach has some advantages over manual intervention. As conflicts are logged, they can be written to a queue where conflict resolution software and kick into action and attempt to resolve the conflicts automatically before alerting operations staff. Depending on the type of data modifications being made, it’s possible that there could be few or no true conflicts in your application. Conflict detection and resolution logs can be periodically audited to ensure that there are no major problems or to adjust the algorithms for additional scenarios and to increase the number of scenarios where data can be automatically corrected.

Master Write Server

This approach makes your network topology look like an airline’s spoke and hub system: one lone master server is used to serve all writes. There’s no possibility of write conflicts because all writes happen in the same location. The downside of this approach (as mentioned previously) is that there’s only so much optimization that can be done before a server is overloaded by writes, locks, and replication latency.

Last Write Wins

The last write wins approach assumes that writes are always issued in the correct order and the last write being sent to the database is assumed to be correct. Instead of relying on any kind of conflict detection, conflicts are ignored. This approach works well when all data is written in an append only manner – there are few updates, only inserts of new data. This can also work well when O/R-M code is use to only update columns that have changed. The likelihood of conflicting writes frequently can be reduced in these cases.

Write Partitioning

To be quick about it, write partitioning is a mechanism to ensure that all writes for a single row will always happen on the same server. There are multiple ways write partitioning can be performed. In short, a partitioning key is created by applying a hashing function to a row or application entity. This partitioning key is used to find the server responsible for that chunk of data.

Fully Partitioning Data

The last way to manage conflict resolution is to fully partition data. Peer-to-peer replication scenarios assume that all data belongs on all servers. What if that isn’t a necessity? It’s always possible to split up the data across multiple servers. Rather than write to multiple master servers and wait for merges to occur, it’s possible to spread data out over multiple servers with very little shared data. This removes most places for conflicts to occur and allows the load to be spread out across several servers.

Which Way is Right for Me?

There is no easy answer for this: it depends on both the application and network architecture. Requirements are very different when writes can occur in multiple data centers as opposed to a single data center. Some applications (recording sensor data) will never run into potential write conflicts, while others (content management systems) have the potential for write conflicts.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

Scaling SQL Server: Growing Out

Scaling up is hard, scaling out is even harder. We all know that we can scale reads by adding some kind of replication or read-only copies of databases or using a massive caching layer. What happens when we have to scale writes?

Borrowing Brent’s grocery store metaphor, a database that is effectively scaled for reads will have ample parking, the aisles will be wide and well lit, and there will be plenty of high capacity shopping carts available for everyone to use. There are even 20 staffed checkout lines. The problem is that goods coming into the store only enter the store through a single loading dock that has space for one truck. Shoppers can get in or out quickly, but there’s no way for new groceries to get to the shelves.

If we were to redesign the grocery store to be able to handle more shipments, we need to ask ourselves:

  • How often do we need new shipments? Do we need more shipments or more shipments at the same time?
  • Where do those shipments need to go in the store?
  • Do we need to put similar items together on the truck?

Scaling Out For More Produce

To get more data into SQL Server, we need more places to write. There comes a time when the hardware in a server can’t handle any more writes, no matter how much hardware is thrown at the problem. Writes also have the distinct problem of issuing locks and hanging on to them until the lock passes out of scope (be that an individual statement or an entire transaction).

We can approach the problem a few ways:

  • Writing to one server and reading from others
  • Writing to many servers and reading from many servers
  • Finding a new job

Since this is an article about solving the problem, we can rule out finding a new job right away. That leaves us with two possible solutions.

Write Master

This is a tried and true way to solve the problem of scaling writes. Since a mix of reading and writing to a single server can cause a lot of locking and blocking, much of the locking and blocking can be eliminated by moving reads to a separate server.

Moving reads to a new server solves the problem of writing quick enough to our master server, but it just spreads the problem out to all of the read slaves. Instead of having to troubleshoot locking and blocking problems on a single server, someone will have to troubleshoot locking and blocking problems combined with whatever mechanism we’re using to move data to the read servers. Instead of having one problem, we now have a lot of problems. And, to put a big red cherry on top of this sundae of problems, eventually there will be problems scaling writes on a single write server.

Sharing the Load

Instead of writing to a single server, it’s possible to write to many servers. SQL Server supports peer-to-peer replication, MySQL can be sharded through a driver, PostgreSQL has the Postgres-XC project, and other databases have their own solutions. Clearly this is a problem that people have approached before.

A major difficulty with something like peer-to-peer replication is determining where to write data. Sometimes this is done via a discriminator column. Some attribute of the data is used to determine which server will receive the data. This could be first name, user name, email address, or an arbitrary value assigned at object creation. Schemes like this work well with a finite number of servers and an even distribution of data. Natural keys can have data skew based on regional and linguistic preferences. Splitting the alphabet in even chunks won’t lend an even distribution of data.

The other option is to randomly partition the data. We could assign a random number when data is initially written, but random numbers aren’t entirely random.

If you can’t trust randomness what can you trust? Math.

Hashing algorithms make it easy to take an input of any length and produce an identifiable output of a fixed length. The best part is that some hashing algorithms have the interesting side effect of evenly distributing data (roughly) across that fixed length output value. This is usually called consistent hashing. However the consistent hashing is generated, it’s easy to divide the total range of hashed values into multiple buckets and spread writes out across many databases. This has been discussed at length throughout computer science and in multiple papers, including Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web and Dynamo: Amazon’s Highly Available Key-value Store.

Where Does the Load Split?

Implementing a combination of consistent hashing and peer-to-peer replication isn’t easy. The decision to route a write has to be made somewhere.

  • In the application A layer of code is written to determine how a write will be routed.
  • On the wire Instead of routing writes in our application, writes are sent to the appropriate server by router that performs packet inspection looking for a hash value.

Both approaches have their merits. Routing in the application can make this process as easy as deploying a new DLL or configuration file. A determined developer could write code to look for new databases and partition writes accordingly. On the flip side, routing writes on the wire makes this routing transparent to any calling applications – no code will need to be changed to add additional databases. Router configuration changes will be required, but the changes can be performed at any time rather than waiting for a regular application deployment.

Specialty Stores

Just as there are specialty stores in the real world, it may become necessary to create special purpose data stores and separate the data. This is frequently called sharding and is a different approach to spreading out load. Rather than assume that all databases need the same data, sharding assumes that the data can be separated across servers by some sharding key. In applications where there is no reporting that performs aggregations across all of the data, sharding can be an easy way to scale out read and write performance. Additional servers are added and new data can be written to them. Problems arise when one shard becomes heavily loaded and data needs to be moved to a new shard. This is a tricky operation at the best of times and requires careful planning.

Where Do I Split the Load?

Designing a scale out strategy isn’t easy. There are many factors to consider, and knowing how to scale out writes is just one thing to think about before designing your overall architecture.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

SQL Server Locking and You!

Did you know that SQL Server’s locking has a name? It’s called two-phase locking. If we’re really getting specific about it, SQL Server uses what’s called strong strict two-phase locking or SS2PL. We’ll get there in a few minutes, right now we’re going to take a look at what makes up two-phase locking.

But First, Some History

The earliest references to two-phase locking (2PL) that I can find is in Bernstein and Goodman’s 1981 paper Concurrency Control in Distributed Database Systems. The authors examine multiple 2PL techniques for synchronizing transactions, alternatives using timestamp ordering, integrated concurrency control methods combining 2PL and timestamp ordering, before mentioning some other also-rans in an appendix. (This may also be the first use of the now tired example involving bank balances and transactions.)

Even though his paper focuses on distributed databases, it’s still valuable because it sets up a common vocabulary for things to come. There’s a lot of theory in here. It’s interesting, but it’s still mathematical theory and most people glaze over when they see that sort of thing.

The Basics of Two-Phase Locking

2PL works by being explicit about who is doing what to whom. Or, in clearer terms:

(1) Different transactions cannot simultaneously own conflicting locks; and (2) once a transaction surrenders ownership of a lock, it may never obtain additional locks

Locks conflict if they’re not compatible with each other. That is to say that if both locks are on the same thing and at least one lock is a write lock, that’s a conflict. (Astute readers will notice that readers won’t block readers.) Anyone who has used SQL Server for a while will be familiar with what happens when we try to acquire a conflicting lock: we wait. Sometimes, we’ll wait for a good long while and a lock will eventually be released. Sometimes, we would end up waiting forever (a deadlock) if the lock manager didn’t step in and kill off one of the processes.

Why Is It Called Two-Phase Locking?

This process is called two-phase locking because there are two distinct phases. The two rules above hint at them, but in effect a transaction can either be issuing locks or releasing locks, it cannot be in stasis.

In reality, this works more like the following:

  1. A statement is issued by an application.
  2. SQL Server compiles the statement and determines the types of locks that are needed to most efficiently satisfy the query.
  3. Once all locks are acquired, the transaction is in a ready state.
  4. SQL Server will begin operations and release locks as appropriate.

If you’re really playing along at home, at some point you’ll figure out that lock acquisition and release varies by isolation level and results in the various phenomena that you see in each of the isolation levels. If you noticed that on your own, give yourself a gold star. If you didn’t, you’re normal.

Deadlocks

Deadlocks are, sadly, a byproduct of a 2PL mechanism. Most of the literature talks about things like transaction graphs (or waits-for graphs) and edges. The Great Triumvirate illustrates this perfectly.

Original source via http://www.flickr.com/photos/charmainezoe/5307975564/

Deadlocks, circa 1836

Daniel Webster is grooming his eyebrows in a fashion that almost became known as ‘the Webster’. He’s impatiently waiting for Henry Clay to finish with the funny pages, but he won’t release his comb until he has the funny pages. Henry Clay is reading what passes for the funny pages in 1836 (hint: it’s the New Yorker). Henry is well pleased with himself, but he’s waiting on John C Calhoun to relinquish the volumizer before he will give up the funny pages. John C Calhoun is volumizing his hair to lofty heights but he really wants Daniel Webster’s eyebrow comb. Everyone is waiting on something from everyone else, but nobody will give up first. This is a deadlock. Okay, maybe that’s not really a deadlock, but it’s better than a waits-for graph. A combination of techniques can be used to determine which transactions will be killed off. SQL Server will typically use the least expensive transaction (in terms of optimizer cost). SQL Server avoids some problems, too, by not attempting to retry transactions.

A Waits-For (or Deadlock) Graph

Strict Two-Phase Locking

So far, we’ve only talked about 2PL, but I said SQL Server is SS2PL. Between the two is Strict Two-Phase Locking or S2PL. S2PL is like 2PL but there are some more rules.

To be considered S2PL, a transaction has to follow the rules of 2PL (sound like normalization rules?). In addition, a transaction also has to release write locks after the transaction has ended and either been rolled back or committed. Interestingly, nothing is directly said about read locks in S2PL. However, read locks can be released as they are no longer needed during the transaction.

Strong Strict Two-Phase Locking

SS2PL (called S2PL in Concurrency Control and Recovery in Database Systems) requires that the locks are only released after the transaction is finished and has been committed or rolled back. SS2PL provides serializability – database transactions appear as if they are atomic and occurring in complete isolation from one another. Serializable transactions are interesting because for a database to truly be serializable, it should be possible to process transactions in any order, s long as the effective is the same as that of some serial order (not any, just some).

Why the Devil Should I Care?

Locks are the primary way that SQL Server manages concurrency. This is a limitation of the database. New CPUs, a SAN, more disks, solid state hardware, and more RAM will not remove locking, blocking, and deadlocks from the database. New CPUs, a SAN, more disks, solid state hardware, and more RAM will make locking, blocking, and deadlocks happen faster. They may happen so fast, that you don’t really notice the problem until it’s growing out of control.

Combine a healthy knowledge of how locking operates with a working knowledge of isolation levels and some allegedly insurmountable application problems can be resolved through simple changes in the data layer.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook

Which Database Is Right for Me?

When you start developing a new application how do you pick the database back end? Most people pick what they know/what’s already installed on their system: the tried and true relational database. Let’s face it: nobody is getting fired for using a relational database. They’re safe, well understood, and there’s probably one running in the datacenter right now.

Here’s the catch: if you’re using a relational database without looking at anything else, you’re potentially missing out. Not all databases work well for all workloads. Microsoft SQL Server users are already aware of this; there are two distinct database engines – one for transactional and one for analytical workloads (SQL Server Analysis Services).

When you start a new project, or even add a substantial feature, you should ask yourself [questions about your data][1]. Here are a few sample questions:

  • Why am I storing this data?
  • How will my users query the data?
  • How will my users use this data?

The Relational Database

There are a lot of reasons to use a relational database. Relational databases became the de facto choice for databases for a number of reasons. In addition to being based on sound mathematical theory and principles, relational databases make it easy to search for specific information, read a select number of columns, and understand structure by querying the metadata stored in the relational database itself.

The self-describing nature of a relational database provides additional benefits. If you’ve created a relational database with referential integrity and schema constraints you are assured that every record in the database is valid. By enforcing data integrity and validity at the data level, you are assured that any data in the database is always correct.

The adoption of SQL as a standard in the mid-1980s finalized the victory of the relational database for the next 25 years. Using SQL it became easy to create ad hoc queries that the original database developers had never dreamed of. The self-describing nature of relational databases combined with SQL’s relatively simple syntax was hoped to make it easy for savvy business users to write their own reports.

In short, relational databases make it easy to know that all data is always correct, query data in many ways, and use it in even more ways. Will all of these benefits, you’d think that people would have no need for a database other than a relational database.

Document Databases

Document databases different from relational databases primarily because of how they store data. Relational databases are based on [relational theory][2]. While databases differ from relational theory, the important thing to remember is that relational database structure data as rows in tables. Document databases store documents in collections. A document closely resembles a set of nested attributes or, if you’re more like me, you might think of it as a relatively complete object graph. Rather than break an application entity out in to many parts (order header and line items) you store application entities as logical units.

The upside to this is that document databases all developers to create software that reads and writes data in a way that is natural for that particular application. When an order is placed, the order information is saved as a logical and physical unit in the database. When that order is read out during order fulfillment, one order record is read by the order fulfillment application. That record contains all of the information needed.

Unlike relational databases, document databases do not have a restriction that all rows contain the same number of columns. We should store similar objects in the same collection, but there’s no mandate that says the objects have to be exactly the same. The upside of this is that we only need to verify that data is correct at the time it is written. Our database will always contain correct data, but the meaning of “correct” has changed slightly. We can go back and look at historical records and know that any record was valid when it was written. One of the more daunting tasks with a relational database is migrating data to conform to a new schema.

While data flexibility is important, document databases may make it difficult to perform complex queries. Document databases typically do not support what many database developers have come to think of as standard operations. There are no joins or projections. Instead it’s a requirement to move querying logic into the application tier.

Database developers will find the following query to be a familiar way to locate users who have never placed an order.

SELECT  u.*
FROM    users u
        LEFT JOIN orders o WHERE u.user_id = o.user_id
WHERE   o.order_id IS NULL;

With a document database a naive approach might be to write queries that retrieve all users and orders and subsequently merge the list of results. A more practical approach is to cache a list of order IDs within the user object to improve look up performance. This seems like a horrible idea to many proponents of relational thinking, but it allows for rapid lookups of data and is considered to be an acceptable substitute for joins in a document database. Finding the users who have never placed an order becomes as simple as looking for users without an orders property. Some document databases support secondary indexes, making it possible to improve lookups.

Document databases are a great fit for situations where an entire object graph will always be retrieved as a single unit. Additionally, document databases make it very easy to model data where most records have a similar core of functionality but some differences may exist between records.

Key/Value Store

Key/value stores are simple data stores. Data is identified by a key when it is stored and that key is used to retrieve data at some point in the future. While key/value stores have existing for a long time, they have gained popularity in recent years.

Many data operations can be reduced to simple operations based on primary key and do not require additional complex querying and manipulation. In addition, key/value stores lend themselves well to being distributed across many commodity hardware nodes. A great deal has been written about using key/value stores. [Amazon's Dynamo][2] is an example of a well documented and much discussed key/value store. Other examples include Apache Cassandra, Riak, and Voldemort.

Key/value stores typically only offer three data access methods: get, put, and delete. This means that joins and sorting must be moved out to client applications. The data store’s only responsibility is to serve data as quickly as possible.

Of course, if key/value stores did nothing apart from serve data by primary key, they wouldn’t be terribly popular. What other features do they offer to make the desirable for production use?

Distributed

It is very easy to scale a key/value store beyond a single server. By increasing the number of available servers, each server in the cluster is responsible for a smaller amount of data. By distributing data, it’s possible to get faster throughput and better data durability than is possible with a monolithic server.

Partitioning

Many key/value stores use a technique known as consistent hashing to divvy up the key space. Using consistent hashing means we can divide our key space into many chunks and distribute responsibility for those many chunks across many servers. Think of it like this: when you go to register in person at an event the alphabet has frequently been divided up into sections at separate tables. Splitting up responsibility for check ins across the alphabet means that, in theory every attendee can be served faster by having multiple volunteers sign them in. Likewise, we can spread responsibility for different keys across different servers and spread the load evenly.

Replication

Data is replicated across many servers. Replicating data has several advantages over having a single monolithic, robust, data store. When data is stored on multiple servers the failure of any single server is not catastrophic; data can still be read and written while the outage is solved.

Hinted Handoff

Hinted handoff mechanisms make it easy to handle writing during a server outage. If a server is not available to write data, other servers will pick up the load until the original server (or a replacement) is available again. Writes will be streamed to the server responsible for the data once it comes back online. Much like replication, hinted handoff is a mechanism that helps a distributed key/value store cope with the failure of individual server.

Masterless

Many distributed databases use a master server to coordinate activity and route traffic. Master/coordinator servers create single points of failure as well as singular bottlenecks in a system. Many distributed key/value databases bypass this problem by using a heterogeneous design that makes all nodes equal. Any server can perform the duties of any other server and communication is accomplished via gossip protocols.

Resiliency

The previous features add resiliency and fault tolerance to key/value data stores. Combining these features makes it possible for any node to serve data from any other node, survive data center problems, and survive hardware failures.

Column-Oriented Databases

Column-oriented databases store and process data by column rather than row. Although commonly seen in business intelligence, analytics, and decision support systems, column-oriented databases are also seeing use in wide table databases that many have sparse columns, multi-dimensional maps, or be distributed across many nodes. The advantage of a column-oriented approach is that data does not need to be consumed as an entire row – only the necessary columns need to be read from disk.

Column-oriented databases have been around for a long time; both Sybase IQ and Vertica are incumbents, and SQL Server Apollo is Microsoft’s upcoming column store, slated for release in SQL Server Denali. Google’s Bigtable, Apache HBase, and Apache Cassandra are newer entrants into this field and are the subject of this discussion. Bigtable, HBase, and Cassandra are different from existing products in this field: these three systems allow for an unlimited number of columns to be defined and categorized into column families. They also provide additional data model and scalability features.

I have to speak in generalities and concepts here since there are implementation differences between the various column-oriented databases.

Data Model – Row Keys

A row in a column-oriented database is identified by a row key of an arbitrary length. Instead of using system generated keys (GUIDs or sequential integers), column-oriented databases use strings of arbitrary length. It’s up to application developers to create logical key naming schemes. By forcing developers to choose logical key naming schemes, data locality can be guaranteed (assuming keys are ordered).

The original Bigtable white paper mentions using row keys based on the full URL of a page with domain names reversed. For example www.stackoverflow.com becomes com.stackoverflow.www and blog.stackoverflow.com becomes com.stackoverflow.blog. Because data is sorted by row key, this scheme makes sure that all data from Stack Overflow is stored in the same location on disk.

Data Model – Columns & Column Families

Column families are an arbitrary grouping of columns. Data in a column family is stored together on disk. It’s a best practice to make sure that all of the column in a column family will be read using similar access patterns.

Column families must be defined during schema definition. By contrast, columns can be defined on the fly while the database is running. This is possible because column families in a column-oriented database are sparse by default; if there are no columns within a column family for a given row key, no data is stored. It’s important to note that different rows don’t need to contain the same number of columns.

Indexing

Column-oriented databases don’t natively support secondary indexes. Data is written in row key order. However there is no rule that data can’t be written in multiple locations. Disk space is cheap, CPU and I/O to maintain indexes is not.

The lack of secondary indexes may seem like a huge limitation, however it frees application developers from having to worry about how indexes might be maintained across multiple distributed servers in a cluster. Instead, developers can worry about writing and storing data the same way that it needs to be queried.

N.B. Cassandra has secondary indexes as of Cassandra 0.7

Picking the Right Database

Ultimately, picking the right database depends on workload, expertise, and future plans. It’s worth considering one of many options before settling on a relational database or one of many other databases. They all serve different purposes and fill different niches. The decision to store your data one way will have far reaching implications about how data is written, retrieved, and analyzed.


Resources

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

More Posts - Website

Follow Me:
TwitterFacebook