Sizing SQL Server for AWS

Let’s skip the shenanigans and get right down to it – you’re responsible for SQL Server and someone in your company wants to move things into Amazon Web Services (AWS). You’ve got SQL Server setup covered thanks to our helpful SQL Server setup checklist and you’re confident you’ve got hardware under control, but things are different up in the clouds.

The question on your mind is “What’s the best way to size a SQL Server in the cloud?”

The Basics: Sizing the Server


Amazon gives you a set list of server sizes to choose from. It’s up to you to pick the right server for the job. The available choices can be overwhelming at first glance, but it gets simpler when you consider different classes of SQL Server and map them to different families of AWS instances.

By the way: we call servers instances in the cloud. Yes, it’s confusing. No, we can’t change it.

Instances are grouped into instance types – you can think of an instance type as a general category of performance. This is going to be your starting point for deploying SQL Server. Don’t worry, though, you can usually change the instance type after you’ve configured SQL Server.
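If you go that route, here’s what a resize looks like in practice – a minimal sketch using Python and boto3, where the instance ID, region, and target size are all placeholders rather than anything prescribed by this article:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

    # The instance must be stopped before its type can be changed.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Move to a bigger memory-optimized size, then start back up.
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={"Value": "r3.2xlarge"})
    ec2.start_instances(InstanceIds=[instance_id])

The catch is that stopping the instance means downtime, so treat a resize like any other maintenance window.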

Possible SQL Server instance types:

  • M3 General Purpose – These are basic servers with up to 8 cores and 30GB of memory. M3 instances won’t win any awards for speed, but they are cheap.
  • R3 Memory Optimized – These instances ship with up to 32 cores (2 sockets) and 244GB of memory. Two local SSDs are thrown in – seems like a good place for TempDB to me!
  • I2 IO Optimized – Sounds great at first, right? All the memory of the R3 with 6.4TB of local SSD. At only $7 an hour, it’s a steal!

So, which one do you need? Remember that you don’t want to just choose the biggest and baddest instance right off the bat – there’s a noticeable cost associated with any of these instances.

We’ll get back to picking an instance type after covering a few other details.

Support Level

In many organizations, DBAs are lucky to have tiered support. Mission-critical applications, like POS systems or credit card processing, get put in the highest tier of support. Internal applications may end up in a second tier. And, finally, legacy applications or applications that aren’t critical to the business in any way end up in the third tier.

We can’t ask AWS to give us faster disks, SAN replication, or some other magical enterprise HA feature. But we can use this support information to figure out whether this SQL Server requires the attention of a full-time DBA or whether it can be offloaded to something like Amazon RDS. Making that decision is outside the scope of this article, but knowing the option exists is enough.

  • What’s the support level required for this application?
  • Does the application require regular manual tuning?
  • What’s the support SLA?
  • Is there an incident response plan?
    • Has anyone ever reviewed the incident response plan?
    • Have you ever had to follow it? That is: does it actually work?

Networking

I bet you don’t even think about networking.

The first thing you need to know is: by default, AWS instances use a single network card (NIC) for all traffic. Storage and communication use the same pipe. If you want to use database mirroring and rebuild your indexes, you may be in for a nasty surprise when mirroring drops in the middle of a rebuild.

Second: not all instance types are networked equally. Look in the Networking Performance column to get a vague idea of what’s going on. “Moderate” equates to 500 Mbps, “High” is 1000 Mbps, and “Low”/“Very Low” shouldn’t be entering the equation for a SQL Server.

Watch the storage and network throughput of your SQL Server to verify how much total throughput you need.
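If the workload is already running in EC2, CloudWatch can give you a rough read on the network side of that. Here’s a sketch with boto3 – the instance ID and region are placeholders, and this only covers NIC traffic, since EBS activity shows up in separate per-volume metrics:

    import datetime
    import boto3

    cw = boto3.client("cloudwatch", region_name="us-east-1")
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(days=7)

    # Peak outbound network bytes over the past week, 5-minute buckets.
    stats = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="NetworkOut",
        Dimensions=[{"Name": "InstanceId",
                     "Value": "i-0123456789abcdef0"}],
        StartTime=start, EndTime=end, Period=300,
        Statistics=["Maximum"])
    peak = max((p["Maximum"] for p in stats["Datapoints"]), default=0.0)
    print("Peak NetworkOut: %.1f Mbps" % (peak * 8 / 300 / 1e6))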

For SQL Server it’s better to guarantee a certain level of both network and disk performance. AWS offers what are called EBS-Optimized Instances. These EBS-Optimized Instances use a separate network to communicate with storage in AWS. Traffic on the storage network won’t cause problems for traffic on the regular network.

Use EBS-Optimized Instances for your most demanding applications.

Other applications may not require as much throughput, but you should adjust maintenance to prevent the shared network from becoming a bottleneck.

Network considerations are:

  • What does typical network throughput look like for the application?
  • Do you need separate storage and networking pathways?

Storage


I’m going to repeat myself: use EBS-Optimized Instances for your most demanding applications. You can change this setting after the instance is up and running, but it requires a restart. Plan carefully.
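A comment at the bottom of this post shows the old EC2 CLI way to flip that flag; for reference, here’s a boto3 sketch of the same thing (the instance ID is a placeholder, and the instance has to be stopped first):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    # Stop the instance before changing the attribute, then start it again.
    ec2.modify_instance_attribute(InstanceId="i-0123456789abcdef0",
                                  EbsOptimized={"Value": True})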

For I/O intensive workloads, AWS offers Provisioned IOPS (PIOPS). The PIOPS option lets you decide on a guaranteed number of IOPS (+/- 10%) available to the system using that disk. PIOPS disks have an increased cost associated with them – they’re 2.5 times more expensive per GB than standard storage, and you pay $0.10 per provisioned IOPS per month.
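To make that pricing concrete, here’s a back-of-the-envelope sketch using the numbers above – the $0.05/GB baseline is an assumed rate, and all of these prices change over time:

    # Rough monthly cost for one PIOPS volume, using this article's rates.
    standard_gb_month = 0.05                  # assumed baseline $/GB-month
    piops_gb_month = standard_gb_month * 2.5  # "2.5 times more expensive"
    iops_month = 0.10                         # $ per provisioned IOPS/month

    size_gb, iops = 500, 4000
    monthly = size_gb * piops_gb_month + iops * iops_month
    print("500 GB at 4,000 IOPS: $%.2f/month" % monthly)  # $462.50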

Stripe drives with caution. In a conventional data center, it’s a best practice to use RAID to configure large volumes and to stripe data across many disks for additional performance. Although RAID striping is possible in AWS, keep in mind that any RAID volume is only as fast as the slowest disk in it. Since all permanent storage in AWS is network attached (and subject to unpredictable latency), think very carefully before setting up RAID volumes.

Storage is somewhat limited in AWS, too. Each AWS disk (EBS volume) can be up to 1TB in size, and up to either 16 or 25 volumes can be attached to an instance (depending on driver limitations). At most, then, 25TB of data can be stored on an AWS instance using EBS volumes. I bet you were worried for a minute, right? Anyone wanting to store more than 25TB of data will need to scale out across multiple servers.

Update: storage is no longer so limited in AWS – EBS volumes can now be up to 16 TB each, and each volume can perform 20,000 IOPS. Of course, those IOPS still need to make it to the server, and you may be limited by raw throughput, but storage capacity and responsiveness should no longer be a bottleneck for anything but the most demanding applications.

Thanks to tawzic for reminding me of this in the comments.
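Provisioning one of those bigger volumes is a one-call affair with boto3. A sketch – the Availability Zone, size, and IOPS are placeholders, and AWS caps the IOPS-to-size ratio, so not every combination is valid:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Request a Provisioned IOPS (io1) volume.
    vol = ec2.create_volume(AvailabilityZone="us-east-1a",
                            Size=2000,       # GB
                            VolumeType="io1",
                            Iops=10000)
    print("Created volume %s" % vol["VolumeId"])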

The storage considerations are:

  • How much storage do you need for data, logs, and backups?
  • How fast does that storage connectivity need to be?
  • How much are you going to access that storage in any given second?
  • Do you need to guarantee the accesses per second?
  • What is the rate of data growth?
  • What is the data retention policy?
  • What is the total volume of data being stored?

Picking an AWS Instance Type for SQL Server

So how should you go about picking an AWS instance type for SQL Server?

Take a look at the memory requirements for SQL Server and compare them to the instance types on offer in AWS. This alone may exclude several instance types immediately.
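One quick way to get that baseline off the existing server is to ask SQL Server itself. A sketch with Python and pyodbc – the server name and driver are placeholders, and sys.dm_os_process_memory is available on SQL Server 2008 and later:

    import pyodbc

    conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                          "SERVER=yourserver;Trusted_Connection=yes;")
    row = conn.cursor().execute(
        "SELECT physical_memory_in_use_kb / 1024 AS used_mb "
        "FROM sys.dm_os_process_memory;").fetchone()
    print("SQL Server is currently using %d MB" % row.used_mb)

Compare that number (plus room to grow) against your max server memory setting and the instance sizes above.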

After answering the storage questions, determine if you need provisioned IOPS, dedicated storage connectivity, or even 10Gb networking. This further narrows down the choices you have for instances.
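If you want to script that narrowing-down step, the EC2 API gained a DescribeInstanceTypes call well after this article was written. A sketch with boto3, filtering on EBS optimization support and a hypothetical 100GB memory floor:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Walk all instance types that support EBS optimization and print
    # the ones with at least 100 GB of memory.
    pages = ec2.get_paginator("describe_instance_types").paginate(
        Filters=[{"Name": "ebs-info.ebs-optimized-support",
                  "Values": ["supported", "default"]}])
    for page in pages:
        for itype in page["InstanceTypes"]:
            if itype["MemoryInfo"]["SizeInMiB"] >= 100 * 1024:
                print(itype["InstanceType"],
                      itype["NetworkInfo"]["NetworkPerformance"])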

Once you’ve reached this point, you should have several choices of instance size available.

Put It All Together

  • Network
    • What does typical network throughput look like for the application?
    • Do you need separate storage and networking pathways?
    • Are features in place that require additional network throughput? (This might be AlwaysOn Availability Groups, database mirroring, or log shipping.)
  • Storage
    • How much storage do you need for data, logs, and backups?
    • How fast does that storage connectivity need to be?
    • How much are you going to access that storage in any given second?
    • Do you need to guarantee the accesses per second (IOPS)?
    • What is the total volume of data being stored?
  • Memory
    • What are the memory requirements of the existing SQL Server?
    • What is the use case for this SQL Server? OLTP? DSS?
    • What other services will run on this system?
    • Do you need Enterprise Edition?
  • CPU
    • What is the current CPU count?
    • What is the current CPU speed?
    • Do you require any Enterprise Edition features?
  • Support
    • What is the support agreement for this SQL Server?
    • What’s the support level required for this application?
    • Does the application require regular manual tuning?
    • What’s the support SLA?
    • Is there an incident response plan?
      • Has anyone ever reviewed the incident response plan?
      • Have you ever had to follow it? That is: does it actually work?
  • Instance Size: If you still have a choice to make between several instance sizes, ask a few more questions (a toy cost sketch follows this list):
    • What is our current rate of activity growth? (e.g. as data volume doubles, will user activity double?)
    • How much will this instance cost over 1 year? Over 3 years?
    • Is a reserved instance a possibility (taking your growth rate into account)?
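
Here’s the toy cost sketch promised above. The hourly rates are invented for illustration – plug in current AWS pricing for the instance type you’re actually considering:

    # Toy on-demand vs. reserved cost comparison for a single instance.
    on_demand_hr = 1.94    # hypothetical Windows + SQL hourly rate
    reserved_hr = 1.20     # hypothetical effective reserved hourly rate
    hours_per_year = 24 * 365

    for years in (1, 3):
        od = on_demand_hr * hours_per_year * years
        rsv = reserved_hr * hours_per_year * years
        print("{} year(s): on-demand ${:,.0f} vs. reserved ${:,.0f}"
              .format(years, od, rsv))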

When you put it all together, sizing a SQL Server in AWS is a lot like sizing an on-premises SQL Server, just with several extra constraints around capacity and hardware choices.

Bonus questions:

  • Performance Profile
    • Is there a performance SLA?
    • What are the top wait types of an existing SQL Server?
    • What aspects of the application can be changed?
  • Doing Less Work

Ignorable AWS Instance Types

Instance types we can completely ignore:

  • G2 GPU – This is for people who do math on graphics cards.
  • C3 Compute Optimized – Instances with a lot of CPU, some local SSD scratch space, and not a lot of memory. Great for HPC, not great for SQL Server.
  • HS1 High Storage Density – These monsters can have up to 24 2TB local drives delivering 2.6 gigabytes per second of disk throughput. At first glance, they look like a dream. However, the data stored on the local disks will be erased if you stop and then start your AWS instance. You could try to use AlwaysOn Availability Groups, but a total AWS Availability Zone or Regional outage could lead to massive data loss. These are primarily intended for clustered computing like Hadoop or distributed data warehouses (GreenPlum).

Comments

  • > If you need an EBS-Optimized instance, you must check the box when you build the instance.

    So long as the instance type supports EBS-Optimized, you can turn the option on or off after building the instance using the EC2 CLI.

    ec2-modify-instance-attribute instance_id --ebs-optimized Boolean

    http://docs.aws.amazon.com/AWSEC2/latest/CommandLineReference/ApiReference-cmd-ModifyInstanceAttribute.html

  • Would one Windows LUN made up of many EBS PIOPS volumes cause an IO bottleneck for SQL? I do have separate Windows drives for data, logs and tempdb, but I don’t know if splitting the drives to match EBS volumes is a better idea. Any thoughts?

    • Maybe. If you’re on Windows Server 2012R2, you shouldn’t see much of a performance hit. Personally, I think it’s easier to set up multiple EBS volumes and just use multiple volumes instead of creating a single virtual disk in Windows. This will make it easier to determine if you’re getting bad performance on any single EBS volume or if they’re all having performance issues. After all, a RAIDed volume is only as fast as the slowest disk in the set.

  • eric twilegar
    August 31, 2015 2:34 pm

    I wouldn’t completely ignore the c3 types (c4 now). We had some issues with CPU bottlenecking on r3.2xlarge types. This could have been CPU steal or other issues – we are still trying to verify. Under heavy load our CPUs would spike to near 100%. We did a lot of work optimizing our app, but eventually we just had to try switching our instance to a c4.8xlarge with a ton of CPU, but only 60 gigs of memory 🙁

    The results were pretty amazing. The CPU issue basically went away and we could scale our app quite a bit more. Our app has legacy stuff in it that we are slowly eating away at, but even so.

    We found I/O to be very little of an issue for us. We used optimized EBS and the latest and greatest SSD EBS is quite good.

    In general I think AWS does a pretty poor job of spec’ing their bigger instance types. I don’t really need 64 CPUs, but I do need lots of memory and highly dedicated CPUs.

    We are also going to look at the dedicated hardware type to see if being isolated makes a difference.

    I’ll add another comment if we find our r3.4x and 8x instance types make the same difference as the c4. Keep in mind the c4 is also Haswell vs. Ivy Bridge, so you might get some extra kick from it.

    • Good to know that you got things working for you. It’s always possible that you saw a dramatic increase when you moved from a v2 Xeon to a v3 Xeon, too.

      I would still avoid the compute intensive instance types by default unless you know the application has high CPU requirements – the licensing cost is high enough to make most people run away.

      • It’s been a year and we just bit the bullet and got the extra licensing. Hoping R4 solves these issues so we can get back down to a more reasonable size. Crossing fingers that R4 comes out at re:Invent this year.

        After more research I’m starting to think AWS has a fundamental flaw in the way it allocates resources.

        When you buy a dedicated instance you have to say you want it to be a particular instance type. That is, if you say it’s an r3.2xlarge, then all instances on that “box” have to be r3.2xlarges. The obvious problem here is that inevitably an r3.2xlarge is going to be a heavy app. You can only have 4 on the dedicated instance, so you’re basically putting 4 heavy apps on the same CPUs.

        If this holds true for all EC2 instances in the AWS world, then you see the issue. Basically AWS is setting up all the 2xlarge and 4xlarge folks for failure and pushing them towards 8xlarge instances, where you’re basically getting bare metal. This is why Netflix just uses the big boys, but they have the $$$s.

        Does AWS actually do this for normal EC2? Is there a t2.micro running alongside an r3.4xlarge? I have no idea, but I somehow doubt it. From what I’ve seen they’ve been hush-hush about this sort of thing. If you know, please do tell.

        If MS would relax their licensing model a bit so companies like mine could afford secondary readers, we could probably scale this problem horizontally. It’s just too much money to get around flaws inherent in SQL Server and traditional databases to begin with. If the app were just a few years younger I’d be on MySQL and I’d be scaling horizontally. 2016 SP1 didn’t solve this problem because they didn’t extend the Enterprise vs. Standard changes to AlwaysOn-type functionality. We need Standard-priced readers! Why pay Enterprise for a box that can’t do writes? Give us writable secondaries and I’d pay Enterprise $$$ for that.

        Alas, I wish I could have a high-speed pipe between AWS and a RackSpace-style bare metal box for this one piece of equipment. I’d save money and have a better system. I’d probably even be able to just have a failover cluster with log shipping to a different region and really save the cash. One can dream.

  • Michael J Swart
    November 13, 2019 11:49 am

    Late to the game, but I like how this is still relevant five years later:

    “If you want to use database mirroring and rebuild your indexes, you may be in for a nasty surprise when mirroring drops in the middle of a rebuild”.

    In 2019, put another way, it’s still true that when using Availability Groups in AWS, there are still problems when rebuilding large indexes.

