The schedule is out! Here’s where we’re cruising next year:
SQL Cruise Miami 2012 departs from Miami, FL on Saturday, January 28th on the Norwegian Epic. We spend two days at sea training, then spend all day Tuesday in beautiful St. Maarten. On Wednesday we hang out in St. Thomas just a ferry ride away from my favorite island on the planet, St. John. Thursday we get our learn on again, and then Friday we spend an afternoon in the Bahamas before arriving back home on Saturday.
SQL Cruise Alaska 2012 leaves Seattle on Saturday, May 26th on the Norwegian Jewel. When we’re not learning at sea, we’re spending entire days in Ketchikan, Juneau, and Skagway, plus an evening in Victoria before returning to Seattle a week later. You’ll come back home rested and ready to tackle old problems with new techniques.
My Sessions on SQL Cruise 2012
Scaling SQL with Solid State – SSD prices are coming down and adoption is going up. My clients have used them to solve all kinds of performance problems, and we’ve found some surprising results. In this session, I’ll explain the internals of these fast drives and show the pros and cons of the various connection methods (SATA, PCI Express, SAN). Then we’ll switch over to the SQL Server world to see how to tell when and where you need SSDs. You’ll even get real-world before-and-after metrics to help make the case to management.
Procedure Cache BI – SQL Server stores tons of data about SQL statements and execution plans in memory, but unfortunately a lot of it is hard-to-query XML. I’ve got the fix: we’ll treat it like a BI project. We’ll build an ETL process to get the data out, normalize it into a reporting structure, and then make smart tuning decisions based on the aggregated findings. You’ll step off the boat armed with ready-to-go scripts to do consultant-style analysis right away.
SAN Features for Senior DBAs – SAN snapshots and SAN replication can be the DBA’s best friend. They bring one-click disaster recovery options to antiquated SQL Server 2000 instances, and one-minute backup options to even the largest data warehouses. In this session, we’ll cover how my clients have saved time, money, and manpower with these features. We’ll briefly dig into the “how” using NetApp as an example SAN, and then cover the SQL Server management techniques that work regardless of the SAN vendor.
There are more sessions from other presenters, including:
- Making You Look Smarter One DMO at a Time – Tim Ford (Miami & Alaska)
- The Lazy DBA – Tim Ford (Miami & Alaska)
- SQL Server in Amazon’s Cloud – Jeremiah Peschka (Alaska)
- SQL Server Internals – Jeremiah Peschka (Alaska)
- Faking Big Data – Kendra Little (Alaska)
- Everyone Wants a Piece of This – Kendra Little (Alaska)
On SQL Cruise Alaska, I might never get off the boat, though, because Jeremiah, Kendra, and John & Yanni Robel teamed up with me to get the Garden Villa room. Here’s a video tour:
That’s the perfect place for SQLCruisers to gather when we’re not in class. SQL Cruise isn’t just about the classroom training: it’s about access to some of the brightest minds in the SQL community. You get the ability to network with MCMs, MVPs, and SQL Server professionals in a relaxed, no-holds-barred environment where no question is off limits. Bring your technical challenges and your career challenges.
Come join us – check out SQL Cruise and register today.
When you need to reduce the number of database servers, the worst thing to do is just pile ‘em all onto one and hope it goes well. A successful virtualization or consolidation project involves a simple set of steps.
Inventory the existing SQL Servers and databases. In a perfect world, we’d already have an up-to-date inventory of every database on every server including its OS version, SQL version, security needs, HA/DR plans, performance requirements, and more. However, even good inventories are often missing a few key pieces required for a consolidation project – for example, compatibility lists for any third party vendor software installed on the server. If we’re hosting a database for an ISV application, we need to know what versions of SQL Server the vendor supports. I frequently see ISV apps that are no longer under maintenance or have been abandoned by the original manufacturer, and in those cases, we may not be able to move them onto a newer version of SQL. In the video at the end of this post, I cover some of the attributes you’ll need to inventory.
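If you want a quick head start on that inventory before watching, here’s a minimal sketch of the kind of query you can run against each instance and paste into a spreadsheet – it only grabs a handful of attributes, and the column list is an example, not the full inventory from the webcast:

```sql
/* Version, edition, and service pack for this instance. */
SELECT
    SERVERPROPERTY('MachineName')    AS machine_name,
    SERVERPROPERTY('InstanceName')   AS instance_name,   -- NULL for the default instance
    SERVERPROPERTY('ProductVersion') AS sql_version,
    SERVERPROPERTY('ProductLevel')   AS service_pack,
    SERVERPROPERTY('Edition')        AS edition;

/* One row per database with compatibility level, recovery model, and size. */
SELECT
    d.name                                  AS database_name,
    d.compatibility_level,
    d.recovery_model_desc,
    SUM(CAST(mf.size AS BIGINT)) * 8 / 1024 AS size_mb
FROM sys.databases d
JOIN sys.master_files mf ON mf.database_id = d.database_id
GROUP BY d.name, d.compatibility_level, d.recovery_model_desc
ORDER BY size_mb DESC;
```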
Categorize them into groups by attributes. After building our inventory, we can take a few steps back and look at similarities. If we’ve got a lot of ISV apps that still require SQL Server 2005, don’t need high availability, and aren’t performance-sensitive, we can group them together. If we find a lot of homegrown apps that are mission-critical and can’t wait to get onto the newest version of SQL Server, there’s another category. These groups of databases often pop right out of the list once we see the big picture.
Ask the right questions to the right people. Talk to the application owners (business users) of each database to find out more about their long term plans. When we need to merge multiple databases onto fewer servers, and we make our plans known to the business, we often discover hidden business requirements or budgets. You’d be surprised how often people make money available to keep their own hardware, or decide they’d like to abandon a database altogether rather than deal with upgrade requirements.
Choose a consolidation method to meet business needs. Some of our options include active/active clustering (also known as multi-instance clustering), virtualization, and cloud computing. Each one has its own pros and cons, and I discuss these in the video below.
Build the new environment and load test it. Before we run a single live production database on our newly created consolidation/virtualization hosts, we need to make sure they’ll be able to sustain as much throughput as possible. CPU, memory, storage, and network should all be tested using both off-the-shelf synthetic testing tools like CrystalDiskMark and database-level tests like restoring your own databases and running known processes against them.
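For the database side of that load test, the simplest harness is often your own backups: restore a copy of a real database onto the new host and time a workload you already know. A rough sketch – the file paths, database name, and stored procedure below are placeholders, not real names:

```sql
-- Restore a copy of a production database onto the new consolidation host.
-- All names and paths below are placeholders - adjust for your environment.
RESTORE DATABASE SalesCopy
FROM DISK = N'\\backupshare\Sales_full.bak'
WITH MOVE N'Sales_Data' TO N'D:\Data\SalesCopy.mdf',
     MOVE N'Sales_Log'  TO N'L:\Logs\SalesCopy_log.ldf',
     STATS = 10;

-- Time a known, representative process and compare it against the old server.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
EXEC SalesCopy.dbo.usp_NightlyRollup;   -- hypothetical known workload
```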
Plan the right migration method for each database. We’ve got lots of options including copying the MDF/LDF files, doing a backup/restore, or performing log shipping or database mirroring to the new server. Log shipping and mirroring give us the ability to migrate databases with just a few seconds of downtime, but require more work and planning up front. Copying the MDF/LDF files or doing a backup/restore involves more downtime, but takes less planning. It all comes down to business needs.
Test the plan, work the plan, and monitor metrics to ensure success. No matter what consolidation or virtualization method you pick, be very comfortable with your migration plan ahead of time. You’ve got a new set of servers for the transition – why not test your plan several times first to make sure it’ll work on go-live day? After the end users start logging in, know which performance metrics you’ll monitor to know if the app is working well or choking. Ideally, you’ll even have a plan to mitigate performance issues as they come up.
In this video from our Tech Tuesday Triage webcast series, I explain two of these steps – building an inventory and choosing a consolidation method:
Download the zipped spreadsheet and the PDF version of the PowerPoint slides mentioned in the webcast.
If you want help virtualizing or consolidating your SQL Servers, contact us. I’ve managed hundreds of SQL Server migration projects, and I’m completely comfortable helping businesses get out of end-of-lease hardware or out-of-support SQL Server versions.
Writing and deploying an application is pretty easy – there’s a process to follow that will get you most of the way there. The tricky part is getting the business logic down. But this isn’t about writing code; this is about what happens once you have an application out in the wild, when you have to grow with it and keep it up and running. Things get complicated when you start trying to add complex functionality for your data. For those of us who use SQL Server there are a lot of options within SQL Server, but some of them require a seasoned DBA, expensive features, or both.
Making sure your data sticks around is important. Losing data can be catastrophic for a business. Even if you have a backup, it can take hours, or even days, to restore your database and verify that everything is back up and running. That doesn’t even include the possibility that a backup is corrupted and you won’t be able to bring your database online. Or, worse yet, you could lose your SAN completely. Durability is important.
With SQL Server it’s possible to set up a cluster, mirroring, or replication to provide a second copy of your data for readability. These three solutions all require knowledge of some specialized features of SQL Server. Keeping each of these features up and running requires some knowledge, planning, set up, and monitoring. None of it is easy, some of it’s hard, and some features even need a lot of care and feeding to stay up and running. DBAs tell horror stories about hand feeding replication for days to get it up and running again after a catastrophic failure; it’s almost a badge of honor. Here’s the thing: none of this should be difficult. One-way data durability (mirroring, transactional replication) is also limited in its functionality – reads can happen everywhere, but writes can only happen in one place. If the master server goes down, it can be difficult to switch the master server around.
Durability problems plague DBAs every day, whether they know it or not. I spent a lot of time trying to solve my durability problems by configuring replication or log shipping and building complex application logic to support the possibility of multiple write locations. I later solved the problem by using Riak. One of the main benefits of Riak is the availability and fault tolerance it provides. That is: it’s durable as all get out. Data isn’t written to one place, it’s written to several places at once; writes won’t succeed unless more than one server responds that a write is successful. In a way, it’s like a really fancy version of database mirroring or SQL Server Denali’s Availability Groups feature that you can have right now.
When a server in a Riak cluster eventually fails, as hardware always does, you could recover it using a filesystem backup. Riak will handle making sure everyone has the right data by using a very cunning technique called read repair. If the server completely fails the other option is to simply replace it. You tell the cluster of servers that one server is gone and another server is taking its place. The cluster then figures out how to perform recovery or redistribute the data. It’s a lot less painful than shipping full database backups or replication snapshots across the data center or even across the country.
Relational databases, and SQL Server in particular, use B+tree indexes to store data. B+trees provide reasonably fast lookups of data (any lookup of a single row will need to traverse the same number of pages on disk as any other lookup). By contrast, inserting data into a B+tree is not always particularly fast – page splits may occur, forcing data to be shuffled around on disk, and moving things out of order can require intermediate pages to be updated. Because of the B+tree structure, many SQL Server DBAs advocate using an ordered, constantly increasing key for clustered indexes.
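In practice, that advice usually shows up as a clustered primary key on an identity column, so new rows always land at the logical end of the index. A minimal sketch – the table and column names are made up for illustration:

```sql
-- Ever-increasing clustered key: new rows are appended to the end of the
-- B+tree, so ordinary inserts don't split pages in the middle of the index.
CREATE TABLE dbo.Orders
(
    OrderId    BIGINT IDENTITY(1,1) NOT NULL,
    CustomerId INT           NOT NULL,
    OrderDate  DATETIME      NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId)
);
```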
Riak, by contrast, defaults to using the Bitcask storage back end (with the option to use several others). Bitcask is unique in that it doesn’t use a B+tree to store data on disk. Instead Bitcask uses a pair of data structures – a series of data files (written as a log-structured hash table) and an in-memory directory of keys (a keydir) that makes it easy to find records in the data files. Reading data out of Bitcask takes two hops – one for the keydir and one for the data file – versus many potential hops in larger SQL Server tables (probably around 4 physical hops with a cold cache on a large-ish table).
There are two things that immediately stand out to me about Riak’s data latency. One is that Riak’s latency is predictable. Any write is going to take as much time as it takes to stream the data to disk. Since Bitcask is a log-structured hash, there’s no possibility of fragmentation; data is written in the order it arrives. Like some other databases, Riak does not perform in place updates of data. Instead, a new record is written in the log-structured hash table and the keydir is updated with the location of the new record. Because of this, it’s easy to predict how long an insert or update will take – as long as it takes to write that many bytes of data to disk.
Just as Riak provides a predictable low latency for writes, it also provides low latency for reading data from disk. Riak doesn’t use locking, so there can be no blocking. Instead read latency boils down to the amount of time that it takes to pull data off of disk. The more data there is in a given record, the slower the read is going to be. Obviously, there’s some seek time involved, but it’s negligible when you consider that a seek involves a single memory read and a single disk read.
Full Text Search
SQL Server’s full text search has caused a lot of problems for a lot of smart people. The more complex the schema and the more data load involved, the more likely that SQL Server’s full text search is going to have some problems. It’s a great tool, but there are some limitations and tuning full text queries is very different from tuning regular T-SQL queries.
One problem with full text search is the inability to scale the full text search service independently of the SQL Server. They’re tied together on the same physical instance. If you need to increase the performance of full text search, you must resort to increasing the performance of SQL Server, including the same licensing costs for SQL Server. Anyone who has gone from a 2 socket to a 4 socket machine can tell you that a doubling in license costs isn’t trivial. Full text search can, like many things, be moved to a separate SQL Server using replication, but SQL Server’s transactional replication has a reputation for being a bit manpower intensive. Many teams eventually move full text search to something other than SQL Server. Pushing full text search outside of the database engine frees up the database engine to serve other queries. Some people, like the fine folks at StackOverflow, have used Lucene.net. One of the advantages of using a full text search engine is that it offers phenomenal flexibility. Unfortunately, both SQL Server’s full text search and Lucene are limited to a single node. Riak Search is Lucene compatible, flexible, and durable.
Riak Search automatically indexes documents whenever they’re saved in a specific bucket, just like SQL Server. Server side triggers are created to asynchronously index data as it is saved. Unlike SQL Server’s full text search, it’s possible to create custom functionality for searching. Different word breakers and tokenizers can be created to support different methods of indexing. Riak Search also allows complex documents to be indexed using custom schemas, much like Lucene.
The icing on the cake, for me, is that Riak Search can be used to provide linear scaling – as servers begin to run out of capacity, additional nodes can be brought online to quickly scale out. There’s also durability built in: the search indexes are spread across the cluster. Spreading indexes across the cluster (term partitioning) means that load from complex queries can be served by many nodes at once, making it possible to provide higher overall query throughput on large data sets.
Making It Work
It’s fair to say that many people using SQL Server are also using the .NET Framework. While Riak is implemented in Erlang and runs on Unix based operating systems, it’s easy to work with SQL Server and Riak together in the same environment. CorrugatedIron is a community developed, open source, .NET library for Riak. While it’s still under heavy development pending Riak’s 1.0 release later this month, CorrugatedIron makes it possible to develop cross database functionality to save data in SQL Server and Riak without requiring developers to learn a new programming language or operating system. Functionality can be moved out of SQL Server and onto a distributed platform where functionality can be scaled horizontally and linearly.
This isn’t to say that I’m advocating for teams to abandon SQL Server in favor of Riak. In fact, that’s the opposite of what I want to say. Teams should pick a best of breed solution for their data storage needs. SQL Server is a great relational database, but there are some things it doesn’t do as well as we’d like. Riak is a great distributed database that is impervious to single node failure. They both provide different features and functionality that complement each other very well. Riak removes single points of failure, provides rich full text search functionality, and provides consistent latency guarantees. SQL Server provides rich querying semantics on highly structured data and support for atomic transactions.
What’s a post about the DotNetRocks podcast doing over here? Turns out that I recorded a podcast! You can check it out on September 27th (that’s in just a few days). So what the devil did I talk about?
It turns out that there’s some interest in the .NET library for Riak that I’ve been working on. Over the course of the interview, we hit on some of the difficulties with relational and non-relational databases, how Riak solves them, and how CorrugatedIron solves some of the problems that developers can run into working with Riak. Along the way we talk about Riak, NoSQL, distributed data storage, load balancing, and functional programming. It was a great conversation and I’m glad I had the chance to chat with Carl and Richard.
When it goes live, you’ll be able to find the recording over on DotNetRocks.com
How do you know which indexes to create to improve your database performance? Microsoft Certified Master Kendra Little shows you how to find a missing index, decide on the best index definition, and measure the usefulness of your new index. If you know the basics of what a database index does and want to learn the smart way to design indexes, this talk is for you.
Want to play along with the video with the sample scripts? Check out the links below the video.
Here’s what you need to find (and fix) these indexes at home:
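If you’d like to poke at your own server before (or after) watching, here’s a minimal sketch of the missing-index DMVs the engine populates – you’ll need VIEW SERVER STATE permission, and treat the suggestions as hints to evaluate, not indexes to create blindly:

```sql
-- Missing index suggestions SQL Server has collected since the last restart.
SELECT TOP 25
    mid.statement        AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_total_user_cost * migs.avg_user_impact * migs.user_seeks AS rough_benefit
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY rough_benefit DESC;
```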
You’re stuck in a rut. You’ve been writing the same thing in the same way for months.
You’ve deluded yourself into thinking you’ve built up a style that works for you, but that’s the thing about style: it rarely lasts forever. As Heidi Klum says, “One day you’re in, and the next day…you’re out.”
Project #1: Tell a personal story first and a tech story second.
There are millions of dry, boring technical posts around, but there’s only one you. I bet you’ve got some hilarious, awkward, touching, or endearing moments from your past that you’d love to share, but it just doesn’t feel right on your professional blog.
Rather than thinking about a technical feature that you want to illustrate with a personal story, turn it around. Pick your favorite story that you’d like to tell, pour your heart out, and then figure out how to tie something technical into it.
I used this approach with my SQL Server Data Compression: It’s a Party! post. I look back and laugh at the party that got me disinherited from the Ozar Family (Lack Of) Fortune. I loved the story, so to get it out there, I tied it into a SQL Server feature that’s been written about many times, and I brought a personal touch in the process.
Project #2: Forget the answer – describe the question.
It’s not always about helping the community by giving the answer. Sometimes you can help just by clearly explaining all of the challenges involved with a question.
My recent post How Do You Mask Data for Secure Testing was inspired by @ErinStellato‘s question on #SQLHelp. Twitter’s 140-character limit just doesn’t allow for the full explanation of a complex question like this, and I quickly got frustrated with people suggesting half-ass solutions that I’d seen fail. Rather than tweet one limitation at a time, I poured my heart into the question in a blog post and let readers contribute ideas.
In that particular situation, I knew there wasn’t a good answer, but that’s not always the case. Even when you know there are several good answers, try to write a post just describing the question. For example, how do you find out the most resource-intensive queries running on a system? Trying to write an entire post just describing the problem will improve your ability to step back and see the big picture without taking a knee-jerk reaction to explain a tool.
Leaving the answer open for debate in your blog’s comments encourages readers to get interactive.
Project #3: Schedule a blog post months in advance.
The next time you’re about to hit Publish on a blog post, ask yourself if you’re really proud of that post.
If not, don’t publish it. Click Schedule, and use a date three or four months from now. Walk away from the post because at this point, the pressure’s off. Sure, it’s kinda sorta good enough, and you were okay with going live with it right away, but instead it’s just going to get better with age.
Days or weeks from now, you’ll be inspired. You’ll think of several ways you want to improve the post, and you’ll jump in to write more when you’re in the zone. You’ll be tempted to revise the publication date earlier, but don’t give in. The improvements will just keep coming with time, and when the final publication date arrives, you’ll be giddy with anticipation for the world to see your polished, honed work.
Blog posts are the opposite of milk: the younger they are, the worse they smell. Blog posts are more like wine: you want to craft timeless words that will snowball and bring you more and more visits over time.
Project #4: Ask a real writer for their opinion.
Don’t ask a fellow technology blogger. Print out your blog and take it down to your company’s marketing department – the people who write the brochures. If you have the choice between someone who writes press releases or someone who writes brochures, pick the brochure person, but take whoever you can get.
Say, “I’d like your honest, brutal opinion about something. I’m trying to improve my personal blog. Absolutely nothing is off limits. If you could throw this thing in the Author-o-Matic and remix it completely, what would you do differently?” Tell them to ignore the grammatical mistakes for now and save their red ink for big-picture stuff. (They’ll probably mark up the grammar anyway, but I’m just trying to make you feel better about the inevitable stream of red ink. It happens to me too – Jeremiah constantly kicks my ass about my addiction to commas.)
You don’t have to obey their every whim, but getting this completely different view of your work will help you see things in a new light. Your blog doesn’t have to be a brochure, but I bet they’ve got tricks that will help bring life to your prose.
Project #5: Make a list of storytelling tools you’ve used, and skip them once.
I bet you’ve got at least a few favorite blogs that you could identify even if the author info was missing. They always use exactly the same tools to tell a story: they always use code snippets, always use screenshots, always use polls, always use SEO-friendly titles, etc. It’s great to have an identifiable brand, but that doesn’t mean you have to stagnate as an author.
Reread your blog posts from the last several months and make a list of every non-text tool you used. If that list is short, it’s time to teach your old dog some new tricks. Over the next week, as you read other people’s blogs, make a list of the non-text tools they use. Get inspired to experiment.
In my Building a Better BrentOzar.com post, I talked about experimenting with pull quotes. It’s free, it’s easy, and it brings a new dimension to your blog posts. Even better, it makes your posts look like something completely new to the tech community, and that brings us to our last project.
Project #6: Read posts from a completely different genre.
If we only draw inspiration from the database blog community, our content is going to look like the British Royal Family’s gene pool. I try to read at least five blog posts from completely new (to me) blogs per week.
Database servers suck, don’t they? Don’t you just hate dealing with these damn things? They’re totally different than any other server in the shop, making a mockery of any CPU, memory, and storage you throw at ‘em. What the hell’s going on inside? Today I’ll give you the basics – starting with the question every sysadmin always asks me:
Why Isn’t SQLServer.exe Using Much Memory?
When you remote desktop into a server and look at Task Manager, sqlservr.exe’s Mem Usage always seems wacky. That’s not SQL Server’s fault. Task Manager is a dirty, filthy liar. (I know, it sounds like the SQL guy is shifting the blame, but bear with me for a second.) On 64-bit boxes, this number is somewhat more accurate, but on 32-bit boxes, it’s just completely off-base.
To truly get an accurate picture of how much memory SQL Server is using, you need a tool like Process Explorer, and you need to identify all of SQL Server’s processes. In the server I’m showing at right, there are two SQL Server instances (shown by sqlservr.exe), plus SQL Agent, SQL Browser, and SQL Server backup tools. It’s not unusual to see SQL Server Analysis Services, Integration Services, and Reporting Services also running on the same server – all of which consume memory.
So how much memory is SQL using? I’ll make this easy for you.
SQL Server is using all of the memory. Period.
No matter how much memory you put in a system, SQL Server will use all it can get until it’s caching entire databases in memory and then some. This isn’t an accident, and there’s a good reason for it. SQL Server is a database: programmers store data in SQL Server, and then SQL Server manages writing that data to files on the hard drive. Programmers issue SELECT statements (yes, usually SELECT *) and SQL Server fetches the data back from the drives. The organization of files and drives is abstracted away from the programmers.
To improve performance, SQL Server caches data in memory. SQL Server doesn’t have a shared-disk model: only one server’s sqlservr.exe can touch the data files at any given time. SQL Server knows that once it reads a piece of data from the drives, that data isn’t changing unless SQL Server itself needs to update it. Data can be read into memory once and safely kept around forever. And I do mean forever – as long as SQL Server’s up, it can keep that same data in memory. If you have a server with enough memory to cache the entire database, SQL Server will do just that.
Memory makes up for a lot of sins like:
- Slow, cheap storage (like SATA hard drives and 1Gb iSCSI)
- Programs that needlessly retrieve too much data
- Databases that don’t have good indexes
- CPUs that can’t build query plans fast enough
Throw enough memory at these problems and they go away. By default, SQL Server assumes that its server exists for the sole purpose of hosting databases, so the default setting for memory is an unlimited maximum. (There are some version/edition restrictions, but let’s keep things simple for now.) This is a good thing; it means the default setting is covering up for sins. To find out if the server’s memory is effectively covering up sins, we have to do some investigation.
Is SQL Server Caching Data to Lessen IO Demands?
In my SQL Server Perfmon tutorial, one of the counters I recommend checking is SQL Server: Buffer Manager – Page Life Expectancy. This counter records how long SQL Server is able to cache data in memory, measured in seconds. Higher numbers are better. In the Perfmon poster several of us SQL Server experts produced, we suggest that this number shouldn’t dip below 300 seconds (5 minutes) for too long. Take that number with a grain of salt – we had to pick *some* number to use as a general guideline, but we can’t boil tons of troubleshooting down to a single number. For example, there are situations like multi-terabyte data warehouses where we simply can’t cache more than a few minutes of data in memory no matter what.
Generally, though, if this number is below 300, the server might benefit from more memory. Added memory would let SQL Server cache data, thereby sending fewer read requests out to the storage. As you add more memory, the Page Life Expectancy counter should go up, and the Physical Disk: Average Reads/sec counter should go down. A nice side effect is that the Physical Disk: Average Sec/Read counter (aka latency) should also go down, because the less work we make our storage do, the faster it’s able to respond.
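If you’d rather check that counter with a query than fire up Perfmon, it’s exposed through a DMV. A quick sketch – the LIKE on the object name is just there because named instances prefix the counter object differently:

```sql
-- Page Life Expectancy in seconds, straight from SQL Server's own counters.
SELECT [object_name], counter_name, cntr_value AS seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';
```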
If all of these things are true, consider buying memory:
- Users are complaining about performance
- The total size of the MDF files on the SQL Server’s hard drives is more than 2x memory (see the query sketch after this list)
- Page Life Expectancy is averaging under 300 during end user load times (typically weekday business hours)
- The server’s running a 64-bit version of Windows
- The server has 32GB of memory or less
- Additional memory will cost under $1,000
- You’ve got no SQL Server DBA on staff
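For the data-size check in that list, here’s a quick sketch that totals up the data files across every database on the instance – compare the result to the amount of memory in the box:

```sql
-- Total size of all data files (MDF/NDF) on this instance, in GB.
-- If this is more than about 2x the server's memory and Page Life
-- Expectancy is low, added memory is likely to pay off.
SELECT SUM(CAST(size AS BIGINT)) * 8 / 1024 / 1024 AS total_data_file_gb
FROM sys.master_files
WHERE type_desc = 'ROWS';
```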
I know, that’s a lot of qualifications, but I’m trying to give you a no-brainer window where the limited investment in memory is very likely to pay off in increased performance. 16GB of memory for most modern servers comes in at $500 or less, and can make an unbelievable performance difference on a SQL Server. I see a lot of SQL Servers running on boxes with just 4-16GB of memory, trying to support 100GB of databases, and the sysadmin just needs a quick, easy, and risk-free fix. Memory is usually that fix.
If Page Life Expectancy is already over 300 – say, in the tens of thousands – then SQL Server probably has enough memory to cache data. (DBA readers – yes, I know, I’m generalizing here. Give your poor sysadmins a break.) That doesn’t mean you can reduce the amount of memory in the server, either.
Is SQL Caching Queries to Ease CPU Pressure?
When end users request data, SQL Server has to compile an execution plan: a task list of which tables it needs to hit, what order it needs to hit them, and when to do operations like sorts and calculations. Your end users write some pretty dang bad queries, and execution plans can end up looking like the picture at right. Compiling an execution plan like this is hard work, and hard work means CPU time. When SQL Server is done building that plan, it says, “Whew! I’ll save that execution plan in memory, and if somebody sends in a query like that again later, I’ll be able to reuse this plan instead of building a new one.” To determine how much that’s helping SQL Server performance, check out the Perfmon counters for SQL Server: SQL Statistics – Batch Requests/sec and Compilations/sec. Batch Requests is the number of incoming queries, and Compilations is the number of new plans we had to build.
Microsoft’s SQL Customer Advisory Team’s Top SQL 2005 OLTP Performance Issues says that if Compilations/sec is more than 10% of Batch Requests/sec, you may be experiencing CPU pressure because SQL Server has to build execution plans. This one gets tricky, and frankly, it’s trickier than I want a sysadmin to hassle with. This rule just doesn’t work in too many cases because it ignores the quantity of work being done. If you’ve got a small number of queries coming in, and you’ve armed the server with big multi-core processors, then building execution plans is hardly any work at all even if you’re compiling every single statement from scratch. However, if Compilations/sec is 25% or higher relative to Batch Requests/sec, and if you’ve got in-house developers, it’s time to start asking questions. They’re probably using development tools like LINQ or dynamic SQL that can force SQL Server to build execution plans unnecessarily. We have to work around that by educating the developers, because no amount of memory is going to fix that problem.
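Both of those counters are also visible from a query. Keep in mind the values in this DMV are cumulative since the instance last started, so this gives you a long-run average rather than the live per-second rates Perfmon shows – a rough sketch:

```sql
-- Compilations as a percentage of batch requests, cumulative since startup.
SELECT
    MAX(CASE WHEN counter_name = 'Batch Requests/sec'   THEN cntr_value END) AS batch_requests,
    MAX(CASE WHEN counter_name = 'SQL Compilations/sec' THEN cntr_value END) AS compilations,
    100.0 * MAX(CASE WHEN counter_name = 'SQL Compilations/sec' THEN cntr_value END)
          / NULLIF(MAX(CASE WHEN counter_name = 'Batch Requests/sec' THEN cntr_value END), 0)
        AS compile_pct
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec');
```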
Here’s where things really start to suck: if your developers are using those techniques, SQL Server is caching their execution plans – yet never actually reusing them. Your valuable memory is getting used to cache plans that will never be seen again – instead of caching data. Ouch. Thankfully, SQL Server has an “Optimize for Ad Hoc” setting we can enable so that SQL Server only caches a small plan stub the first time a query shows up, and only caches the full execution plan once that query is seen a second time. I don’t recommend sysadmins set this on their own, either, but I wanted to touch on it just so you’re aware that there’s an easy fix. (I’m not saying that educating your developers to improve their code isn’t an easy fix. Okay, yeah, that’s exactly what I’m saying.)
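For reference, the setting lives in sp_configure – it’s an advanced option, so “show advanced options” has to be on first. Again, hand this to your DBA rather than flipping it on a production box yourself:

```sql
-- Enable "optimize for ad hoc workloads": single-use queries only cache a
-- small plan stub the first time; the full plan is cached on the second use.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```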
How to Reduce SQL Server’s Memory Use (or Increase It)
If you’re going to run other software on the server, you can set SQL Server’s maximum amount of memory to leave memory free for other applications.
Before we start, remember, memory is probably covering up for other sins. There’s a reason I put these instructions at the bottom of the post rather than the top. In most cases, reducing SQL Server’s memory footprint will increase your complaints from end users. This might be completely okay when dealing with infrastructure databases, though, so here we go.
Open SQL Server Management Studio. If you don’t have this installed on your local machine, you can remote desktop into the SQL Server. Upon opening SSMS, it will ask you what server to connect to, and that’s the name of the server you’ve RDP’d into. (In some cases like clusters and named instances, this gets more complicated.)
In the Authentication dropdown, choose Windows Authentication and it’ll use your domain credentials. I’m assuming you were the one who installed this SQL Server, or you’re getting lucky that someone added one of your groups into the admin group for this server – just because you’re a local admin or a domain admin doesn’t mean you’re an admin inside SQL Server. If you get a security error, you’ll need to do some legwork to find out who manages this SQL Server.
Click View, Object Explorer, and you should get something like what you see at right – a server name, then a bunch of stuff underneath. Right-click on the server name and click Properties. Click on the Memory section on the left hand side, and you’ll see:
- Use AWE to Allocate Memory – generally speaking, this should be checked on 32-bit servers with >4GB of memory, and unchecked the rest of the time.
- Minimum Server Memory – default is 0. If reading this article is your first exposure to SQL Server memory, don’t change this number. If this number has already been set at a higher number, ask around to find out who did it. I’ve seen a lot of cases where people set both min and max memory to the same number in an effort to get greedy. If that’s the case here, and you’re reducing the Maximum Server Memory number, then reduce the Minimum as well. If you’re increasing the Max, leave Min where it is.
- Maximum Server Memory – default is 2147483647. Many people just assume that’s two petabytes. Not true. It’s actually 214-748-3647, the phone number to Max’s Pizza in Dallas. I highly recommend the pepperoni. Anyway, here’s where things get a little tricky: if it’s still set to Max’s Pizza, and you’re trying to reduce the amount of memory SQL Server uses, then you can set it to anything you want. The lower you set it, the lower performance will generally go. If it’s already set to a different number, then someone’s been in before you. Often I’ve seen people start out with a certain amount of memory in the server – say, 8GB – and they set Max Server memory to a lower number – say, 4GB – to leave memory free for the OS to breathe. Later on, they add more memory to the server, but they forget to increase SQL Server’s Max Server Memory – so that memory just sits around unused.
Some of these changes (like AWE) will only take effect upon restart of the SQL Server service, while others (like decreasing Max Server Memory) will take effect instantly. SQL Server will not restart itself in order for the changes to take effect. This has pros and cons: it means you won’t get an accidental service outage, but it also means you might get a surprise the next time the service is restarted – your changes will suddenly take effect.
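If you’d rather script the memory change than click through SSMS, the same setting is exposed through sp_configure. The 4096 below is just an example value in megabytes, not a recommendation – size it for your own server and leave headroom for the OS:

```sql
-- Set Max Server Memory to 4GB (value is in MB). 4096 is only an example.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;
```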
To learn more about how SQL Server memory works, check out these books:
- SQL Server 2008 Administration in Action by Rod Colledge – for starter DBAs
- Professional SQL Server 2008 Internals and Troubleshooting – for more advanced DBAs who want to know what’s going on under the hood
Want to learn more? We’ve got video training. Our VMware, SANs, and Hardware for SQL Server DBAs Training Video is a 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview:
Sure, the cloud is a cheap and easy way to spin up more servers and spin up more servers faster, but what do you do when performance problems hit? There’s no SAN team to blame and no infrastructure team to move you to a new server. This talk will expose some common performance complaints about moving SQL Server to the cloud and provide explanations and mitigations to keep you floating along on cloud 9.
This talk is for DBAs, developers, and managers who are considering moving their infrastructure into EC2. If you’re already running in EC2 and you haven’t solved some of your performance problems, this might help you out too, but I wouldn’t count on it.
To learn more, check out Jeremiah’s post on Configuring SQL Server in EC2. It covers who’s deploying EC2 VMs with SQL, what problems they’re running into, and how to improve performance.
The cloud is robust and reliable. The cloud solves all of our scaling needs. The cloud makes my poop smell like roses. While all of these statements are theoretically true, it takes some effort to make them true in reality, especially when a database is involved.
Who Is Deploying SQL Server in EC2?
A question I hear a lot is, “Who is putting SQL Server into EC2?” Sometimes there’s a hint of incredulity in that question, hinting that people can’t seriously be deploying SQL Server into EC2. That’s far from the truth: many companies, large and small, are deploying SQL Server into Amazon’s cloud, with varying degrees of success.
Back to the question of who’s deploying SQL Server in EC2…
I work with a start up building a platform on a Microsoft stack. The front end is hosted on mobile devices, the middle tier is written in .NET, and the database is SQL Server. They’re just starting out and they don’t have the time to spend speccing out servers, ordering them, and waiting for delivery. It’s easy to spin up servers to test new features and functionality, and if a feature doesn’t catch on, the servers are turned back off. Traditional hosting facilities didn’t offer the flexibility or response times that they were looking for, so they chose Amazon EC2.
Another company I work with is an ISV transitioning to a Software as a Service (SaaS) model. The ISV works in an industry where there is a lot of specialized knowledge and technical ability, but that knowledge and ability doesn’t have anything to do with keeping a server up and running. In the past they’ve added additional maintenance routines to the software that they sell to customers, but sometimes customers don’t have anywhere to host this mission critical software. The ISV has hosted a few customers on their own servers, but they don’t want to keep buying new hardware in response to customer demands – what happens when a customer leaves? The ISV hasn’t made the move to cloud computing yet – they’re still feeling out the different options available – but EC2 provides a level of flexibility that they won’t get through traditional IT solutions.
What are some of the problems?
There are, of course, problems with every way that you can possibly deploy SQL Server. When you deploy on premises, you have to purchase hardware, wait for it to arrive, set it up, and then you have to maintain it. When you host your database with another hosting provider, you’re limited by the amount of server space they have available. When you host in EC2 there are a different set of problems.
One of the perceived problems with deploying into the cloud is the cost. You’re renting something by the hour (like that “friend” your uncle brought to Thanksgiving last year); and renting something by the hour can get really expensive when you want that something to be up and running all the time.
How much would it cost to keep a reasonable SQL Server running 365 days a year? $10,862.40, assuming that you’re using a Double Extra Large Hi-Memory instance (that’s 4x 2.61GHz Xeons with 34.2 GB of memory) – that works out to about $1.24 an hour, around the clock. You can really kick out the jams and get yourself a Quadruple Extra Large Hi-Memory instance (8x 2.66 GHz Xeons and 68.4GB of memory) for twice the price. Yeah, that’s expensive. Of course, you can also reserve the instance and just buy it outright for a fraction of that cost, but who wants to do that?
What would a similar server cost you from HP? You can get an HP ProLiant DL380 G7 with a pair of fancy pants Intel E5640 CPUs (that’s right around the same speed as the EC2 instance), 32GB of RAM, and 1.2TB of in chassis storage for about $8,600. That price doesn’t include an OS or any other licensing costs. It also doesn’t include power, cooling, or even a spare server sitting around ready to pick up the load if your primary server fails.
Storage isn’t tremendously expensive in Amazon’s cloud – 10 cents per GB per month of provisioned storage. A full terabyte of provisioned storage is only $102.40 a month, and that storage is fairly redundant within a single data center.
Despite the cost, Amazon’s cloud is still incredibly popular with many businesses. Why? Simple: it’s easy to start with a small amount of resources and rapidly scale in response to customer demands. If something doesn’t pan out, you just stop the servers and delete the storage and you don’t have to worry about it anymore. The costs go away as soon as you stop using the compute cycles.
I used to live in an apartment. It wasn’t a great apartment. In fact, it wasn’t even a nice apartment. I could hear everything that my neighbors did. After a while, I knew about their girlfriends, their love of Tecmo Super Bowl, and I learned that they liked to listen to loud music on weeknights when they didn’t have to work the next day.
My noisy neighbors made it difficult for me to get things done. When you’re sharing a host, noisy neighbors can make it difficult to get things done, too. This sort of thing can happen on a server right now – when one program is aggressively writing to disk, read performance will suffer. If you’re sharing a physical server with another guest operating system, you have no idea what that other OS is doing. It could be sitting there doing nothing, it could be chugging along doing 3d rendering, or it could be a BitTorrent server. You have no idea what’s going on in the next room and how it could be affecting your performance.
Unfortunately, there’s not a lot that you can do about noisy neighbors apart from moving. In the real world, you move to a new apartment or buy a house. It takes time and money to get a new place, but it’s feasible. In EC2 it’s a lot easier to get a new place: you just upgrade your instance. You pay more by the hour, but it’s easy to get a bigger place.
Crashes in general
To be fair, you can’t do much about crashes if you’re using someone else’s hosting service; when the data center loses power your server is going to go down. So, how do you protect yourself from crashes? Redundancy, of course.
Unfortunately, redundancy gets expensive fast. If you want a redundant mirror for that 10k a year server, you need a second 10k a year server. If you want multiple servers spread across the country, you’re paying for each one. Thankfully the cost of the servers includes the Windows license, but it’s still a large cost for many people to stomach.
SQL Server has a number of options to help you keep your servers up and running in case something horrible happens. Traditional database mirroring will work in EC2. You are free to implement whatever Rube Goldberg machine you want. It’s important to keep in mind that you have little control over your storage (there are no SAN snapshots or SAN mirroring that you can control) and there is no dedicated network that you can rely on. All of your connectivity goes over the open infrastructure within Amazon’s data centers. You’re fighting for bandwidth with Amazon, Netflix, Heroku, and everyone else in the cloud.
Limits: Number of Drives
How many drives can you connect to an installation of Windows locally? 26 or so, right? That’s how many letters there are in the alphabet. You can go beyond that using mount points to boost the amount of storage you have. For most purposes, though, you effectively can attach an unlimited number of drives to a Windows machine. Certainly more than you’ll need. SANs make it easy to create huge volumes, after all.
In EC2, that’s not quite the case. You can only attach 16 drives (Amazon call them volumes) to a server. It’s important to keep this limitation in mind when designing your strategy to move into Amazon’s cloud. If you need to store more than 16 terabytes of data, you’ll need to find a way to spread that data out across multiple EC2 instances. Considering some of what we’ll cover later on, this is going to be a really good idea.
Limits: Network Throughput
Did you know that there’s a pretty solid limit on network throughput in EC2? I didn’t either until I started performing some storage benchmarks (more on this soon). It turns out that there’s a single active gigabit connection into each host. That’s right: a gigabit connection per host. You might be sharing that host with a lot of other guests. If there are 4 other SQL Servers sharing the same gigabit connection, you could end up with some very poor performance.
Amazon recently added instances with 10 gigabit network cards, but they’re limited to Linux machines in the US-East region right now. In the future there might be instances with 10 gig connections that run Windows, but there’s nothing in EC2 right now to make the path between a Windows server and the storage any faster. To keep up to date on which instances have 10 gigabit ethernet, visit the list of EC2 Instance Types and search for “I/O Performance: Very High”.
In short, if your applications are chatty, you’re going to need to learn to break them apart into many smaller pieces to keep throughput high.
Limits: Instance Sizes
We just briefly touched on this in the last section – not every size of instance is available in every region. Typically, the US-East region gets the newest fanciest servers, at least as far as I can tell. Other regions slowly get new hardware over time. The name of the instance type (m1.small, m2.4xlarge) will stay the same over time, but it may not be possible to bring up instances of the same size in every region.
Just like any other computer, there are a finite number of resources available on any host. Likewise, the VMs that you spin up have a finite amount of resources available to them. Unlike your local data center, you have no control over which VMs share the same host.
Normally you could put different machines together and stagger their functionality around the clock so that guests that did bulk processing at night would be on the same host as machines that record orders during business hours. When everything is under your control, it’s very easy to make sure that any host isn’t overcommitted during a certain time window.
The only way that you can guarantee performance is to allocate larger instances that take up all of the physical resources of the host but cost more. By doing this you’re able to eliminate the effect of noisy neighbors. It’s important to understand how to get the best performance out of your resources.
If you want to keep data around for any length of time, it’s important to persist it somewhere. Persisting your data means that you have to write it and, if you aren’t writing it to your users’ computers you have to write it to the database. Hiding beneath all of that fancy database software is something as simple as a pile of disks for storing data in rows and tables. Making sure the disks are working as fast as they can is critical to keeping a database running at peak performance.
A Note About How Gigabit Ethernet Sucks and Infiniband is Your Friend
Gigabit ethernet sucks. If disk I/O is your absolute bottleneck, you only have two options. The first is to write to as many drives as possible – we’ll find out why that isn’t entirely possible in EC2. The other is to make the connection between your database server and the storage as fast as possible. Since you can’t get super fast storage in EC2, you’ll have to find another way to make your database fast.
Just How Fast is Gigabit Ethernet?
The question isn’t rhetorical. It’s important to understand how fast gigabit ethernet is. Gigabit ethernet can really only transfer, at most, 120 megabytes per second – a gigabit per second is 125 megabytes per second on paper, and protocol overhead eats the rest. That’s the theoretical maximum assuming that everything along the line is operating perfectly and there is no latency or additional hops to make.
We’re not likely to see this in the real world. You’re not likely to see this in your own server room and you’re sure as hell not going to see it in EC2.
Why is your storage going to be slow? You have to keep in mind that not everything involved may be able to push data at 120 MB/s; the drives could be slow or fragmented, or you could be sharing space with someone else who is doing a lot of reading and writing. Other people could be reading from the same host as you or they could be streaming movies from the host to a customer in the suburbs.
In an attempt to find an answer, I conducted several tests on an EC2 instance over the course of several days to answer the question, “Just how fast is gigabit ethernet?”
For all of the tests, I used the same m2.4xlarge instance. I installed it from a standard Windows Server 2008R2 SP1 AMI that Amazon supply. Nothing was changed with the operating system. This Windows was about as vanilla as it gets.
For my tests with the drives, I used EBS volumes that were allocated in complete 1TB chunks. I made two rounds of tests. In both cases, the drives were formatted with a 64k sector size. The first round of tests was done with drives that were quick formatted. The second round of tests were performed with drives that were not quick formatted. FYI, formatting 10TB of drives in EC2 takes about 2 days. I used a Large Windows instance to format the drives at a cost of about $26 in CPU time plus another $57.20 in I/O cost just to format the drives.
The first set of tests I ran involved using CrystalDiskMark. I used the default configuration and ran it across the quick formatted drives. Drives were configured with 1 EBS volume, 2 EBS volumes, 3 EBS volumes, and 4 EBS volumes. I conducted a second set of CrystalDiskMark tests using fully formatted drives with 1, 2, 3, 4, and 10 EBS volumes in a single drive.
The second set of tests was more complex and involved using SQLIO to simulate sequential and random reads from SQL Server over a period of time. All tests were performed on a single EBS volume, two striped EBS volumes, four striped EBS volumes, and a pair of striped four EBS volumes.
EC2 Storage Test Results
Right away, we can see that performance quickly spikes for sequential reads and writes. While the read results decrease, this isn’t a reflection of the storage itself so much as it is a reflection of trying to cram a lot of data through a tiny ethernet pipe. Overall, the quick formatted drives don’t show a lot of performance improvement no matter how many spindles we throw at the problem.
There are some things that we can conclude about the performance of disks in Amazon’s platform. First and foremost: format your drives. Don’t just use a quick format; use the slow format. The downside is that it takes time to format disks, and time is money, especially in the cloud. By using a full format, disk performance improved by 30% for 4K writes with a queue depth of 32 and improved, on the whole, by around 15%. There were a few anomalies – specifically the 4k write performance spike with 4 EBS volumes and the sequential read performance spike for 2 and 4 EBS volumes. The only conclusion I can come to is that reads were cached and did not hit disk.
The SQLIO tests show a different story. Right away we can see that sequential I/O performance peaks with a maximum of two volumes, but random reads and writes keep scaling right up until the tests stop – the more spindles that were used, the more random I/O that we could perform. And here’s where things get interesting again. Looking at the graphs you can see that sequential I/O falls apart a little bit north of 1000 IOPS. That’s not a limit of the underlying disks, we’re just maxing out the ethernet connection. When there’s a finite pipe limiting how fast we can read, of course the number of IOPS is going to stop pretty quickly.
Looking at the throughput in terms of the amount of data (rather than the number of reads and writes) that we can move through EC2, it’s very easy to see where we hit the ceiling. With a theoretical maximum of 120 MB/s, it’s easy to see that sequential reads and writes max out when two volumes are in use. It’s clear, though, that random reads and writes continue to scale as we add more drives. I suspect that if I had been able to add more spindles, the random write performance would have continued to scale up to 120 MB/s.
What does this mean for SQL Server in EC2?
You can get the random I/O performance that you want, but sequential read/write performance is not going to be good. Taking that a step further, here’s what you don’t want to do in EC2:
- Data warehousing
- Large ETL
- Bulk data processing that requires sequential reads
- Anything else that requires a lot of sequential read and write
On the flip side, what should you use SQL Server in EC2 for?
- Off site processing that isn’t time sensitive
- Distributed write loads (sharding)
SQL Server hosted in EC2 is going to be very effective for highly random loads and especially for loads where data can be kept in memory (currently less than 68.4 GB).
A Sidebar about Ephemeral Storage
Don’t use it. It goes away and may never come back. Well, it goes away whenever your instance is stopped or the host it lives on fails – don’t count on getting it back.
Why? Because ephemeral storage is the set of drives that live in the physical server’s chassis. You can’t predict where your server is going to be when you turn it back on again, so you can’t predict if that data is going to stay with you or not. This is a best practice for virtualization, if you think about it. When you’re setting things up using VMware or Hyper-V you don’t want to attach long-term storage that’s in the server chassis – it makes your virtualization more brittle. Using local storage, if anything happens (a motherboard fails, a drive fails, a CPU fan goes out, or memory fails) you’ve lost that host. You can’t bring that VM up on another server because the OS is still on hard drives in that broken server.
Local storage is great for temporary (ephemeral) things. Make sure that you take that into account when you’re designing your infrastructure. Fast, cheap, temporary storage is great for fast, cheap, temporary operations like TempDB or local scratch disks.
First Touch Penalty
Amazon mention in a lot of their documentation that there’s a penalty, and potentially a large one, for the first write to a block of storage. When you look at the numbers below you’ll realize that they weren’t kidding. Amazon further go on to claim that if you benchmark the storage (as I did) you’ll be horrified by the results (as I was) but that you shouldn’t worry, because the first write penalty is only really obvious during benchmarking and will be amortized away over time in the real world.
Here’s where I have to call bullshit. If your workload consists almost entirely of writing new data, you’re going to see a first touch penalty until the drive is full. It won’t end. Writes will suck until your drive is full. Then you’ll add more drives and the suck will continue. This is compounded for relational databases where the currently accepted best practice is to ensure sequential writes by clustering on an arbitrarily increasing integer.
The only way to guarantee acceptable performance is to write to your drives block by agonizing block until every block has been touched.
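One way to take part of that hit before go-live, without leaving SQL Server, is to create your database files at their full eventual size up front. This is a sketch under assumptions: the database name, drive letters, and sizes are placeholders, and it only touches the blocks those files occupy, not the whole volume. The log file is always zero-initialized; the data file is only zero-written page by page if instant file initialization hasn’t been granted:

    -- Create the files at full size so every page they occupy gets written
    -- once now, instead of during production workload.
    CREATE DATABASE SalesDB
    ON PRIMARY
        (NAME = SalesDB_data, FILENAME = 'E:\Data\SalesDB.mdf', SIZE = 200GB)
    LOG ON
        (NAME = SalesDB_log,  FILENAME = 'F:\Logs\SalesDB.ldf', SIZE = 50GB);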
Staying Up in the Clouds – HA/DR Options
Clustering: Sad Trombone of the Cloud
You can’t cluster in EC2. Don’t try. There’s no shared storage. EBS volumes can only be attached to a single server at a time. You could share the drives, but you really shouldn’t. You could use some Windows-based software that mounts drives and pretends to be a SAN, but without any guarantees of performance or longevity, why would you want to?
We’ll just leave this one as: You can’t cluster in EC2… with SQL Server 2008 R2
Changes in SQL Server 2012 make it possible to build a cluster using databases stored on SMB file shares instead of shared disks. Whether or not that’s a good idea is completely up to you, but at least it’s a possibility.
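The building block behind that is SQL Server 2012’s support for database files on UNC paths. Here’s a hedged sketch; the share and file names are placeholders, and you still need a file server that every node can reach:

    -- SQL Server 2012 can create and use database files on an SMB share.
    CREATE DATABASE SalesDB
    ON PRIMARY
        (NAME = SalesDB_data, FILENAME = '\\fileserver\sqldata\SalesDB.mdf')
    LOG ON
        (NAME = SalesDB_log,  FILENAME = '\\fileserver\sqldata\SalesDB.ldf');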
Availability zones and regions
In case you didn’t know, Amazon’s magical cloud is divided up into regions. Each region is made up of multiple availability zones. The zones in a region are right next door to each other, but they’re in separate buildings with separate internet access and separate power.
Availability zones exist to protect users from localized failure. Hopefully, if there’s a problem in a data center, it’s only going to be a problem in one data center (one availability zone). If that zone goes down, the others should be up and running. If you want to keep your server up and running through local failures, you need to make sure you keep a copy of your data in multiple availability zones. If you want to keep your business up and running in the event of a regional failure (the smoking crater scenario), you need to keep copies of your data in multiple regions.
Region – national level.
Availability zone – local level.
Got it? Good.
Mirroring SQL Server in EC2
One of the easier ways to keep things up and running in the event of a failure is SQL Server’s database mirroring. Using synchronous mirroring with a witness is going to put you in a great place in terms of disaster recovery. It’s both possible and advisable to put the principal in one zone, the mirror in another, and the witness in a third. We can feasibly survive some pretty heavy-duty hardship by configuring SQL Server mirroring across multiple availability zones. The applications talking to SQL Server need to be able to fail over between the partners, but that should be pretty easy for today’s modern developers to handle, right guys?
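Here’s a minimal sketch of that setup, assuming the database has already been restored WITH NORECOVERY on the mirror, the mirroring endpoints are listening on port 5022, and the host names below are placeholders for servers in three different availability zones:

    -- Run on the mirror instance first:
    ALTER DATABASE SalesDB
        SET PARTNER = 'TCP://sql-principal.zone-a.internal:5022';

    -- Then run on the principal instance:
    ALTER DATABASE SalesDB
        SET PARTNER = 'TCP://sql-mirror.zone-b.internal:5022';
    ALTER DATABASE SalesDB
        SET PARTNER SAFETY FULL;   -- synchronous mirroring
    ALTER DATABASE SalesDB
        SET WITNESS = 'TCP://sql-witness.zone-c.internal:5022';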
SQL Server Replication in EC2
Replication is always an option and is probably a better option when you want some kind of geographic fail over. Instead of diving into contentious issues around designing a replication topology, I’m going to hit on the pros and cons of replication in the cloud.
We’ve already seen that there are significant limits to network throughput in EC2. We’re limited to, at most, a gigabit of storage throughput. We’re also limited by our ability to write to disk in 64k chunks – only a little more than 1,000 IOPS are possible at a 64k write size. These limitations become a problem for transactional replication. In an on-premises setup, transaction log performance is frequently a problem when replication is involved. If you think the performance of the transaction log won’t be an issue in the cloud, think again. The I/O characteristics of Elastic Block Storage guarantee that the transaction log will become a bottleneck under heavy load.
Replication latency is another concern for DBAs in EC2. Not only do reads and writes happen more slowly, but that slowness means transactions will be committed to the subscriber further and further behind the publisher. Of course, most applications aren’t writing constantly, but it’s important to consider what happens if the subscriber can’t keep up with the load being written throughout the business day – will it be able to catch up after hours? If it can catch up today, will it be able to catch up tomorrow?
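Tracer tokens are a handy way to put a number on that latency for transactional replication. A minimal sketch, run at the publisher in the publication database, with the publication name as a placeholder:

    -- Drop a tracer token into the log stream...
    DECLARE @token_id int;
    EXEC sys.sp_posttracertoken
        @publication     = N'MyPublication',
        @tracer_token_id = @token_id OUTPUT;

    -- ...then, once it has had time to flow through, see how long it took
    -- to reach the distributor and each subscriber.
    EXEC sys.sp_helptracertokenhistory
        @publication = N'MyPublication',
        @tracer_id   = @token_id;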
Crashes happen in EC2; entire regions have gone offline. Unfortunately, at any kind of scale, something is bound to be broken right now. What do you do when that happens? What do you do when it happens to the distribution database? (You are running the distributor on a separate server, right?) What do you do when the distributor goes offline and never comes back up, its drives corrupted beyond repair?
I don’t have the answers, but that’s something your recovery plan must take into account when you’re deploying into the cloud.
SQL Server Backups in the Cloud
You gotta keep that data safe, right? Taking backups and hoping for the best isn’t going to cut it once you’ve moved all of your data up into the cloud. You need a retention and recovery strategy.
EBS Data Protection
Amazon’s Elastic Block Storage (EBS) looks like some kind of SAN: drives that you can attach to your server. When you look more closely at the disks themselves, they look like drives attached to some kind of storage device powered by Red Hat Enterprise Linux. I’m overusing the phrase “some kind” because you just don’t know what’s back there.
There are, however, some certainties around EBS. An EBS volume lives within a single availability zone. There are multiple drives involved and multiple block-level copies of your data, so a back-end drive could fail and you’d never know about it.
The downside of EBS is that there’s no global availability of your data. There’s no SAN replication and no single global backup of your data. If you want that, you’ll need to look into using Amazon S3 to provide greater safety.
Be Safe with S3
Amazon Simple Storage Service is a very simple storage service indeed. It’s an object store that refers to big binary chunks of data by name. It’s like a filesystem that exists everywhere. Sort of. The important part of S3 is that your data can be persisted at a very paranoid level over a very long period of time. In Amazon’s own words, S3 is “designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.”
Why mention S3 at all? It’s not very fast and it’s not very easy to mount S3 storage to Windows. There are two things you can do with S3:
- Store SQL Server backups.
- Snapshot EBS volumes.
Storing SQL Server backups in S3 should be a no-brainer for anyone. S3 storage is cheaper than EBS storage and it’s far more durable. Since S3 is global, it’s also possible to push data up to S3 from one region and pull it down in another. Your servers will live on.
S3 can also be used to store snapshots of EBS volumes. Don’t be fooled by the word “snapshot”. The first time you take a snapshot you will likely make a very bad joke about how this isn’t a snapshot, it’s an oil painting. Don’t do that.
The first EBS snapshot to S3 is going to be slow because a block-level copy of the entire EBS volume is happening in the background. All of your data is zipping off into the internet to be safely stored in a bucket somewhere. The upside is that subsequent snapshots only copy the blocks that have changed. If you snapshot frequently, very little data needs to be sent to S3, snapshots will be faster, and recovery should be a breeze. Keep in mind, of course, that these are not crash-safe SAN snapshots, so they shouldn’t be used against volumes hosting live databases. I’m sure you can think of cunning ways to work with this.
A Plan for Safe & Secure SQL Backups in EC2
What’s the safe way to manage backups in EC2? Carefully.
Were it up to me, here’s what you’d do:
- Back up your data to an EBS volume (a sketch of this step follows the list).
- Mount an S3 drive on your server.
- Copy the backups to the S3 drive using xcopy with the /J flag.
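For the first step, here’s a minimal sketch of a compressed, checksummed backup to an EBS-backed drive; the database name, drive letter, and folder are placeholders, and backup compression requires an edition that supports it:

    -- Full backup to an EBS volume, ready to be copied up to S3 afterwards.
    BACKUP DATABASE SalesDB
    TO DISK = 'E:\Backups\SalesDB_Full.bak'
    WITH COMPRESSION, CHECKSUM, INIT;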
Of course, if it really were up to me, you’d have mirroring set up with a primary in one availability zone, a secondary in another, and the witness in a third. Oh, and you’d be using synchronous mirroring.
Scaling SQL Server in EC2: Scaling Up
This is, after all, why we’re here, right? You’re not just reading this article because you want to learn about database backup best practices and how many IOPS some magical storage device can perform. You want to know how to give your SQL Server some more zip in EC2. There’s a finite limit to how much you can scale up SQL Server in EC2: that limit is currently 8 cores and 68.4 GB of RAM. Those 8 cores are currently Intel Xeon X5550s clocking in at 2.66 GHz, but that might change by the time you’re reading this.
In EC2, there’s a limitation on how much we can scale up. Commodity hardware really does mean commodity hardware in this case. If you can’t scale up, the only thing left is to scale out.
Scaling SQL Server in EC2: Scaling Out
It’s incredibly easy to create more instances of SQL Server in EC2. Within a few clicks, you’re most of the way there. You can automate the process further using Amazon’s developer tools to script out additional configuration, attach local storage, and attach EBS storage. The developer tools make it a breeze to create and deploy your own custom images so you have pre-configured instances up and running. You can even use tools like Puppet or Chef to completely automate adding more servers to your environment.
Summing It Up
So, there you have it – it’s possible to get great performance for certain types of SQL Server operations in EC2. If you’re looking for a cheap way to set up a data warehouse, you should look elsewhere. You might be able to slap something together in EC2 that performs well enough, but there will be a lot of manpower involved in making that solution work. If you’re looking for a quick way to scale your application up and out without worrying about buying hardware and making sure it’s being used appropriately, then you’ve come to the right place.
Face it: your database doesn’t get much exercise. It spends its day sitting on the same old server in the same old rack, eating garbage all day long. Sure, once in a while it may go for a spin, but SSDs are chipping away at that. It may hop around in a VM farm, but for the most part it sits there lazily wasting away the day. What goes into the log? What settings affect the size of the log, and what can you do to keep it lean? Join Tim in this 30-minute video to find out: