In theory, you can add indexes online with SQL Server Enterprise Edition.
In theory, with AlwaysOn Availability Groups, you can add and drop indexes on the primary replica whenever you want.
In theory, you can perform read-only queries on the secondaries whenever you want, and nobody gets blocked.
In practice, these things don’t always add up to the same answer.
I’ve got an AlwaysOn Availability Group demo lab with SQL Server 2014 hosting the AdventureWorks2012 database. I’ll start on my secondary server, and I’ll get all of the suffixes from the Person.Person table:
The query results aren’t important – instead, I’m showing the actual execution plan, which is very important. By default, there’s no index on the Suffix field, so I get a clustered index scan.
Say that for some bizarre reason, this is the type of query my end users constantly run, so I decide to switch over to the primary to add an index on Suffix. (As a reminder, you have to add indexes on the primary, not the secondaries.) After adding the index, I run the query again:
Woohoo! We’ve now got an index scan on my new index. (Yeah, it’s not a seek, but this is a crappy query, and a scan is the best I’m going to get.) So far, so good.
But now let’s mix things up a little – let’s run that same query, but start a transaction with it:
Now I have an open transaction, which really shouldn’t mean anything because you can’t do updates on the secondary. However, switch over to the primary, and drop that newly created index. Back over on the secondary, where we’ve got that open transaction, run the exact same query again:
Our transaction has been knocked out and failed. It’s funny to think of a read-only transaction as “failing”, but if you have a stored procedure that starts things off with BEGIN TRAN and then does a lot of reads, thinking it’s going to get a complete point-in-time picture of the data, that’s not going to work.
I know what you’re thinking: “But Brent, I don’t even use snapshot isolation, because I read Kendra’s post about it, and I know we’re not ready to test our app with it yet.” Thing is, SQL Server uses RCSI behind the scenes on all AlwaysOn Availability Group replicas – and it’s lying to you about whether or not the feature is enabled. Here’s the sys.databases view for my secondary replica:
SQL Server claims that for the AdventureWorks2012 database, both snapshot isolation and read committed snapshot are off – but it’s lying to you. Read Books Online about querying the replicas and note:
Read-only workloads use row versioning to remove blocking contention on the secondary databases. All queries that run against the secondary databases are automatically mapped to snapshot isolation transaction level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored. This eliminates reader/writer contention.
They’re not kidding – even if I specify dirty reads, I can run into the same problem. Here, I start a transaction, but I’ve asked for the dirtiest, no-lockin-est reads I can get:
But when I add or drop a related index on the primary, even WITH (NOLOCK) transactions are affected:
This starts to point to the challenge of running complex reporting jobs off the secondary replicas. SQL Server is constantly changing the data underneath your queries, and most of the time, it cooperates beautifully. However, if you have a long-running stored procedure (like minutes or hours long), and you want a point-in-time picture of the underlying data while you generate reports on the replica, you don’t want to play around with transactions. Instead, a database snapshot might be a better fit.
If you’d like to run these demos on your own lab, download the scripts here.
Our live webcast topic this week was Q&A – you could bring any HA/DR questions, and we could avoid your answers. Just kidding. Here’s the webcast – we apologize for the audio, WebEx is still getting their act together. It cleans up after the first couple of minutes.
And here’s the questions we couldn’t get to during the webcast:
SQL Server Licensing Questions
Q: If using virtual machines and clustering / failing over at that level (not sql server) is there any reason that SQL Server Standard Edition won’t work? Someone once told us in a sql class that Enterprise Edition was necessary for this.
Answer from Brent: don’t you just love those “someone once told us” things? You’ll want to get them to tell you why. Standard Edition works fine in virtual machines. It may not be cost-effective once you start stacking multiple virtual machines on the same host, though, because you have to pay for Standard Edition for every guest.
Q: Hi, with mirroring being deprecated and AlwaysOn AG only available with Enterprise Edition – what are our HA options going to be with Standard Edition in the future? Any ideas if AlwaysOn synchronous will make it into Standard?
Answer from Jeremiah: You have a few HA choices with SQL Server 2012 Standard Edition and beyond. Even though mirroring is deprecated, you could feasibly use mirroring in the hope that something new will come out. Obviously, this isn’t a viable option. The other HA option is to use clustering. SQL Server Standard Edition supports 2 node clusters, so you can always use it for HA.
Training for Clustering, AlwaysOn
Q: Is there a good resource on setting up a lab environment for a clustering setup?
Answer from Kendra: I’m so glad you asked!
- Matt Velic has put together a guide on how to build a virtual lab with a Windows Server 2012 Cluster with Virtual Box and SQL Server 2012 Evaluation Edition– complete with eBook.
- If folks want to test AlwaysOn, I did a video a while back on how to plan an AvailabilityGroup lab, and our AlwaysOn page has a setup checklist PDF.
How to Manage AlwaysOn Availability Groups
Q: Did you experience or know “split brain scenario” in AlwaysOn Availability Groups that when secondary node is up to take over primary role, the transaction becomes inconsistent? And how to avoid it?
Answer from Brent: Ooo, there’s several questions in here. First, there’s the concept of split brained clusters – when two different database servers both believe they’re the master. Windows Server Failover Clustering (WSFC) has a lot of plumbing built in to avoid that scenario. When you design a cluster, you set up quorum voting so that the nodes work together to elect a leader. In theory, you can’t run into a split brain scenario automatically – but, you can most definitely run into it manually if you go behind the scenes and change cluster settings. The simple answer here: education. Learn about how the quorum process works, learn the right quorum settings for the number of servers you have, and prepare for disaster ahead of time. Know how you’ll need to react when a server (or an entire data center) goes down. Plan and script those tasks, and then you can better avoid split brain scenarios.
Q: Can you recommend any custom policies for monitoring AlwaysOn? Or do the system policies provide thorough coverage? Thank you!
Answer from Brent: I was a pretty hard-core early adopter of AlwaysOn Availability Groups because I had some clients who needed it right away. In that situation, you have to go to production with the monitoring you have, not the monitoring you want. The built-in stuff just wasn’t anywhere near enough, so most of my early adopters ended up rolling their own. StackOverflow’s about to share some really fun stuff there, so I’d keep an eye on Blog.ServerFault.com. You should also evaluate SQL Sentry 7.5′s new AlwaysOn monitoring - it’s the only production monitoring I’m aware of, although I know all the other developers are coming out with updates to their tools for monitoring too.
Q: Is it wise to have primary availability groups in one server of the nodes and have primary groups on another of the servers that form the cluster. Or is it better to have all primary groups on server 1 and secondary on server 2?
Answer from Brent: If you split the primaries onto two different nodes, then you can do some load balancing.
Q: Would you consider Always-ON AG Read-Only replicas as a method to offload or load balance reporting? Looks like the Read Intent option acts like a load balancer for reading off of those DBs, right?
Answer from Brent: Offload yes, load balance no. The read intent options give you the ability to push read-only queries to a different replica, but there’s no load balancing. Your clients just hit the first server in the list. If you need true load balancing, you’ll want to put all of the read-only replicas behind a real load balancing appliance.
Windows Clustering Setup and Management
Q: Where can I find a good list of cluster hotfixes SQL 2008 R2 and perhaps the OS as well?
Jes here. You can go to the Update Center for Microsoft SQL Server to find the latest CU and hotfixes. Check the Support pages for Windows Server 2008R2. Updates aren’t released as cluster-specific. This is why it’s really important to have a test or QA environment that is also set up as a cluster, so you know if the cluster services are affected at all.
Q: What is the recommended order/procedure when you have to do Windows updates to servers in a cluster?
Answer from Kendra: Microsoft knew you were gonna ask this! Check out their “SQL Server failover cluster rolling patch and service pack process” KB here. But do yourself a favor and always deploy patches to a non-production test cluster first and let them burn in a bit.
Q: From your previous answers, it sounded like you don’t recommend use Windows 2008 R2 for AlwaysOn. Can you elaborate bit more on why Windows 2012 is better suited for this? I need more persuasive power to talk the rest of folks of my company to use it.
Answer from Brent: Sure, check out the AlwaysOn Availability Groups Lessons Learned video at the bottom of that page.
Q: Would you have a single DTC group or multiple groups configured for a 4 instance cluster?
Answer from Kendra: There’s no shortcut here: you have to decide on an instance by instance basis. For each instance you gotta determine how much it uses distributed transactions, and how impacted it might be if DTC were to temporarily be offline. Review Cindy Gross’ information on DTC to find out pros and cons of different approaches to configuring DTC.
SQL Server Clustering with VMware and Hyper-V
Q: Is VMWare HA a good alternative to use instead of a Microsoft Cluster?
Answer from Jeremiah: The HA choice comes down to where you want your HA to be managed. VMware HA pushes the high availability question out of the SQL Server realm and into the VMware infrastructure. More than anything else, this is a business decision – just be sure you’re happy with the decision of which team is managing your uptime.
Q: When using a virtualized active/passive 2008R2 cluster with underlying iSCSI storage can the nodes by on different hosts or is FoE needed to have nodes on different hosts?
Answer from Brent: Check out VMware’s knowledge base article on Microsoft cluster support. It lays out your options for iSCSI, FC, FCoE, and more, and separates them by shared-disk clustering versus non-shared-disk (AlwaysOn Availability Groups).
Q: Any thoughts on implementing AlwaysOn in conjunction with a virtual SQL environment using VMWare HA/ Site Recovery Manager (SRM)?
Answer from Kendra: With this level of complexity, when things get tricky it’s incredi-hard to sort out. You gotta have a rockstar team with great processes and communication skills to handle problems as they arise– and you are going to hit problems.
Even if you have the rockstar team, you want to first ask if there’s a simpler way to meet your requirements with a less risky cocktail of technologies. If you rush into what you describe, you’ll find that your high availability solution becomes your primary cause of downtime.
Shared Storage for Clusters
Q: Was reading a great article from Brent on SQLIO. How does this work on a SQL Cluster?
Answer from Kendra: You run SQLIO against the storage (not the SQL Server instance) so it works the exact same way.
Q: After Setting up The Cluster and adding the various CLUSTER DATA Drives how can I add additional Drives after gaining new internal storage?
Answer from Kendra: Before you touch production, make sure you’ve got a lab environment. If you don’t, check out the link above on how to build one. The exact steps to do this are going to vary depending on your version of Windows, your version of SQL Server, and exactly what storage you’re talking about.
For new shared storage on Sever 2008 or later, the basic process is presenting the storage to all of the nodes, bringing the drive online on one node, creating a volume, adding the disk in the failover cluster, and then adjusting dependencies in the cluster as needed. (Dependencies can be adjusted online in SQL Server 2008 and later).
If you have new non-shared storage that you want to use under tempdb (such as SSDs), you’ve got to make sure that every node in the cluster has the drives for tempdb online / volumed/ formatted/ and configured identically, and then you can move tempdb files over to it. You will need to restart SQL Sever to make modified tempdb files recognize the new paths.
Sharding and Mirroring Questions
Q: I have a peer to peer replication with 3 nodes (all bidirectional). Very beneficial but a big pain to maintain. Is that what the industry feels?
Answer from Jeremiah: SQL Server peer-to-peer replication solves a very specific need – the ability to have multiple active SQL Servers where writes can occur and where you can have near real-time updates to the other servers. While peer-to-peer replication meets that need, it has a relatively heavy price tag in terms of DBA expertise, support, and licensing costs. Even experienced teams want to have multiple DBAs on staff to deal with on call rotations and, let’s face it, while peer-to-peer replication hasn’t been deprecated, it’s a difficult feature to work with.
Q: I’ve implemented db sharding on Oracle in several environments. Is there an applicable tech in SQL Server?
Answer from Jeremiah: Sharding is just a buzzword for horizontal partitioning. In a sharded database, either the application or a load balancing router/reverse proxy is aware of the sharding scheme and sends reads and writes to the appropriate server. This can be accomplished with SQL Server, Oracle, MySQL, or even Access. There are no technologies from Microsoft and I’d be wary of anyone attempting to sell something that Just Works™ – database sharding is time consuming, requires deep domain knowledge, and adds additional database overhead.
Q: Currently using SQL 2008 Mirroring. Planning a move to 2012. Your thoughts about jumping 2012 and going straight to 2014 Always On technologies?
Jes here. There were no major changes to Database Mirroring in SQL Server 2012, and I don’t foresee any coming in 2014. Eventually – we don’t have a specific version yet – Mirroring will be deprecated. Read our AlwaysOn Availability Groups Checklist to get an idea of the work involved in setting these up – it’s much more complicated than Mirroring – before you decide to jump in.
Whether you want help choosing between an active/passive and an active/active cluster, or if you’re the kind of DBA who knows that’s not even the right name for failover clustered instances anymore, check out our SQL Server clustering training page.
Business is booming; the sales team is bringing in more business and the software developers are successfully scaling out at the web server tier. There are signs of pressure on the database tier and you realize that it’s time to scale out. You’re using SQL Server 2012, you need to improve performance, and the first thought that comes to mind is the new hotness: SQL Server AlwaysOn Availability Groups.
Get Your Bearings
Before making any infrastructure decisions you need to stop and consider your goals. Ask yourself: What pain am I trying to solve?
If you’re trying to scale out reads without changing much code, AlwaysOn Availability Groups are probably the right idea. With an AlwaysOn Availability Group we can scale out to a total of 5 servers to scale out reads across the board. This is fantastic for scaling out – we change a connection string or change a DNS CNAME and point connection strings at the Availability Group.
The hope and dream is that we can wildly scale with SQL Server AlwaysOn Availability Groups. Reality is different.
The Average of its Parts
Think about this – in order for an AlwaysOn Availability Group to work for scale out reads, we have to get data from the write server to the replicas. The bottleneck isn’t how fast we can read; the bottleneck is how fast we can write.For synchronous replicas, every replica must be at least as capable as the the primary. If peak load on the primary is 16 gigabit per second, the synchronous secondaries need to be able to write at least 16 gigabits of data per second. To keep our application moving, and to handle potential spikes, we need to make sure that the secondaries are sized to be able to handle increases in load from the primary server. In short, we’re going to need to make sure that the secondary servers have the same storage as the primary.
Of course, you’re only going to use those secondaries for the same databases as the primaries, right? Nobody would ever want to add another database to one of those secondaries, right? Keeping in mind that we can only have two synchronous secondaries, how much faster can we make our SQL Server for synchronous data?
We can increase read performance, but write performance isn’t going to move.
The Average of its Parts… Over Time
The game changes when we bring asynchronous secondaries into the mix. With synchronous secondaries, all servers in the mix need to have the same write characteristics – if the primary has eight SSDs, the secondaries need to have eight SSDs (or at least the throughput of eight SSDs). When we’re using asynchronous secondaries, the secondaries have to be able to keep up with the primary’s workload on average.
To figure out what kind of hardware we need for the asynchronous secondaries, we need an SLA. Think about it like this: in order to make a hardware decision, we need to know how long our asynchronous secondaries can take to catch up. If we don’t know how far the secondaries can lag behind the primary, then we can’t accurately size the hardware. Conversely, once we have an SLA, we have to size all of the equipment to make sure that the async secondaries can never fall further behind than our SLA. Of course, that SLA is also going to have to include index rebuilds, bulk loads, and other activity that abuses the transaction log.
The Average of its Parts… with a Perfect Network
Assuming a perfect network, this all holds true. Once network congestion comes into play, performance guarantees go out the window.
During the design stage of building out an AlwaysOn Availability Group it’s critical that the network is taken into account as a critical component of the infrastructure. The network must be able to handle the load of multiple secondary nodes – network performance may cause more problems than poor disk performance.
Consider Your Environment in Aggregate
Before planning a SQL Server AlwaysOn Availability Group deployment, consider your environment as a whole. Make sure that you have the bandwidth available to stream regular operations between the primary and replicas as well as to stream under peak load. Remember – adding four additional servers may put considerable load on your network backplane. Also consider that these won’t be physically cheap servers; if you need performance from the primary server, the synchronous and asynchronous secondaries require enough power to meet your performance SLAs. This isn’t an activity for the faint of heart.
If you have an existing application and you want to scale with money, you may be able to get away with an AlwaysOn Availability Group deployment. You’ll be buying multiple servers, multiple SQL Server licenses, and a lot of high end switching equipment, but you’ll only have to change a few connection strings in your application.
If you’re considering scaling out and you’re willing to make code changes, consider Windows Azure SQL Database. In many cases, you’ll be able to scale out using database federations. Through the changes you’re likely to lose flexibility in how you tune T-SQL, perform database maintenance, and work within your schema. The end flexibility may be worth it, but there may be significant code changes to go along with the database changes.
If you’re considering scaling out and you’re willing to consider many options, take a step back and consider your application requirements. Think about the database features your application is using, how those features fit into different databases, and how those features can be spread across different services. There are a lot of distributed databases out there that may meet your requirements and provide better scale out functionality than Windows Azure SQL Database or AlwaysOn Availability Groups.
Learn more in our AlwaysOn Availability Groups resource page.
SQL Server 2012 introduced AlwaysOn Availability Groups, a feature intended to replace clustering, database mirroring, log shipping, replication, and other technologies by giving us one easy-to-manage feature for high availability, disaster recovery, and scale-out.
One year in, how’s it working out? I’ve done a lot of AG deployments, and in this 30-minute video, I explain some of the gotchas:
You can learn more at BrentOzar.com/go/alwayson.
Many companies are investigating a move into cloud hosting. With the recent release of SQL Server 2012, these very same companies are also looking at a migration to SQL Server 2012 to take advantage of readable secondaries. Combining Amazon Web Services and SQL Server 2012 seems like a perfect match. Unfortunately, there are some gotchas, like limited support for clustering, to keep in mind as you consider deploying SQL Server 2012 AlwaysOn Availability Groups in AWS.
What’s Wrong With Clustering?
AlwaysOn Availability Groups don’t require shared storage, but they do require Windows Server Failover Clustering. Let’s be clear: there’s nothing wrong with clustering.
There’s a white paper from Amazon Web Services that suggests clustering won’t work in AWS because clustering requires support for multicast networking. This changed with Windows Server 2008 – multicast networking is no longer part of Windows Server clustering. For those who don’t know, multicast networking is delivering one packet from the source computer to multiple destination computers. Multicast relies on the intermediate hardware making the decision when to make copies of the packets to send to the destinations and, in general, relies on the hardware making that decision at the latest possible moment. Because so much of the AWS infrastructure is virtual, a lot of these decisions become trickier for the intervening networking gear (which might also be completely virtual). Ultimately, this is why you’d want to avoid multicast networking in a purely virtual environment. Since Windows Server 2008 and newer don’t use multicast networking, it’s not an issue.
It’s Not You, It’s… You
If multicast networking isn’t what stops clustering from working in AWS, what does?
Every cluster has a cluster access point – a name and IP address used to administer the cluster. The cluster access point is a virtual IP address, when you set up the cluster access point, Windows creates a virtual IP address that is used to provide a way to access the active node in the cluster. This isn’t strictly necessary for SQL Server clustering, but it does make life easier when managing the underlying Windows Server Failover Cluster.
In addition to each server’s individual IP address, the cluster access point requires its own IP address. This is where things start to get sticky. When you assign static IP addresses in AWS, you don’t assign the IP address at the server level – you assign the static IP address to a virtual ethernet adapter and attach the ethernet adapter to your EC2 instance. Each of these Elastic Network Interfaces (ENI) can only be assigned one IP address, for now.
Creating an Availability Group, Is It Possible?
Since it isn’t possible to create virtual IP addresses it would be easy to conclude that it isn’t possible to create an availability group – during availability group set up SQL Server has to get in touch with the cluster control point. There’s a way around this limitation – along as you create the availability group on the current host server of the cluster, then you can create the availability group.
It is possible to create an availability group, however it isn’t possible to create a TCP listener for the same reason that it isn’t possible to create the cluster control point – it isn’t possible to have more than one IP address per ENI in AWS. This doesn’t mean that AlwaysOn Availability Groups are a non-starter in AWS, but it does mean that you won’t have access to the full AlwaysOn Availability Group functionality until later this year when Amazon introduces multiple IP addresses per ENI.
Faking Out the TCP Listener
Here’s what we know so far:
- It is possible to set up a Windows Server Failover Cluster in AWS.
- It is possible to set up AlwaysOn Availability Groups on our Failover Cluster in AWS.
- It is not possible to configure a Cluster Control Point or TCP Listener.
It is possible, though, to get much of the same benefit of the TCP Listener without using the SQL Server’s built-in functionality. A load balancer like HAProxy can be used to create groups of readable secondaries for different applications. True, your applications won’t be able to take advantage of the other TCP Listener features like specifying read only intent in the connection strings, but it will be possible to create multiple connections in HAProxy that will work the same as connections specifying read only intent. HAProxy has the ability to use sticky sessions, monitor connections, and detect when servers are offline and route traffic away from the dead server. People have been doing this with MySQL for years.
Where Do We Go From Here?
Teams using AWS who need SQL Server 2012 Always On Availability Groups can start making use of these features right now. All of the benefits of SQL Server 2012 Always On Availability Groups won’t be available, but it will be possible to use multiple readable secondaries for scale out reads. Many of the features of Always On Availability Groups can be configured using a load balancer like HAProxy for now and configured using native functionality once multiple IP addresses can be attached to a single ENI. With a little work and some patience, you’ll be able to use this feature right now and be ready for when AWS supports virtual IP addresses.
If you want to learn more about how to design, test, and manage this slick feature that offers high availability, disaster recovery, and scale-out reads, visit our AlwaysOn Availability Groups resources page.
What a fast week. I was in Dallas this past week doing a project, and the week simply flew by. I’ll be back up there again this coming week, which means I’ll miss the HoustonWireless monthly meeting for the third time in a row. Not thrilled about that, but I’ll live.
The server move to Dallas didn’t go as smoothly as I’d planned. I’m still basing my mail server out of Houston for the time being, while I figure out the finer points of Imail 7. I’m in the midst of changing over to Imail from Exchange, and I’m not exactly tickled pink with it. You get what you pay for – Exchange rocks. It’s a memory hog, but it’s got an awesome UI.