I’ll never think “I’ve seen it all” when it comes to SQL Server – there’s just always someone around the corner who’s found a weird new thing to do with it. But there are some things I really wish I could stop finding.
One of the things I hate finding most is bad fillfactor settings. Fillfactor can be a useful tool to help performance, but it’s often a performance killer when used incorrectly.
A quick introduction: what is fillfactor, and how is it set?
“Fillfactor” is a setting for indexes in SQL Server. When you create or rebuild an index, you can tell SQL Server what percentage of each 8K data page used in the “leaf” level of the index it should fill up.
In other words, if you set a 90% fillfactor when you rebuild a clustered index, SQL Server will try to leave 10% of each leaf page empty. That empty space is present on the page as it’s written to disk, and it takes up the same space in memory when the page is read into cache.
By default, SQL Server uses a 100% fillfactor and tries to fill up all the pages in indexes as close to full as it can. Depending on how many rows actually fit on the page, your mileage may vary.
There are two ways to set fillfactor in SQL Server:
- At the SQL Server instance level using a sys.configurations setting for fill factor. (I don’t recommend using this option, for reasons I’ll describe below.)
- At an individual index level by specifying fillfactor when you create or rebuild the index.
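As a quick sketch of both approaches (the index and table names here are hypothetical):

```sql
-- Option 1: instance-wide default (not recommended, for reasons below)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'fill factor (%)', 90;
RECONFIGURE;  -- the new default takes effect after an instance restart

-- Option 2: per-index, when creating or rebuilding
CREATE INDEX ix_Sales_OrderDate
    ON dbo.Sales (OrderDate)
    WITH (FILLFACTOR = 90);

ALTER INDEX ix_Sales_OrderDate
    ON dbo.Sales
    REBUILD WITH (FILLFACTOR = 90);
```
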
Index maintenance jobs sometimes automate the second of these options. If you’re using an index maintenance stored procedure you picked up from around the web, you may be setting fillfactor without realizing it.
Why do people love to set fillfactor?
DBAs and developers often read that lowering the fillfactor improves performance by reducing page splits. Perhaps they’re trying to fix a performance problem, or perhaps they’re feeling paranoid. They either lower fillfactor too much on some indexes, or apply a fillfactor change to all indexes.
Here’s the scoop: it’s true that the default fillfactor of 100% isn’t always good. If I fill my pages to the brim, and then go back and need to insert a row onto that page, it won’t fit. To make the data fit and preserve the logical structure of the index, SQL Server will have to do a bunch of complicated things (a “bad” type of page split), including:
- Add a new page
- Move about half the data to the new page
- Mark the data that was moved on the old page so it’s not valid anymore
- Update page link pointers on existing pages to point to the new page
And yep, that’s a lot of work. It generates log records and causes extra IO. And yes, if you have this happen a lot, you might want to lower the fillfactor in that index a bit to help make it happen less often.
Where do people mess up fillfactor?
Here’s the thing: having a bunch of empty space on your data pages is ALSO bad for performance. Your data is more spread out so you probably have to read more pages into memory. You waste space in cache that’s just sitting there empty. That’s not only not awesome, it can be TERRIBLE in many cases.
This is particularly wasteful because not all indexes are prone to “bad” page splits. Let’s say I have a clustered index on an incrementing INT or BIGINT identity value. I insert loads of new rows, and values are rarely updated or deleted. In this case I can fill my pages very full, because I’m always adding new rows at the “end” of the index. Adding these new pages isn’t a bad page split – although unfortunately these allocations are counted in the “page splits/sec” performance counter, which makes it very tricky to find just the “bad” splits.
I frequently find that people have put a fillfactor setting of 80 or below on all the indexes in a database. This can waste many GB of space on disk and in memory. This wasted space causes extra trips to storage, and the whole thing drags down the performance of your queries.
Best practices for setting fillfactor
Here’s some simple advice on how to set fillfactor safely:
- Don’t set the system wide value for fillfactor. It’s very unlikely that this will help your performance more than it hurts.
- Get a good index maintenance solution that checks index fragmentation and only acts on indexes that are fairly heavily fragmented. Have the solution log to a table. Look for indexes that are frequently fragmented. Consider lowering the fillfactor gradually on those individual indexes using a planned change to rebuild the index. When you first lower fillfactor, consider just going to 95 and reassessing the index after a week or two of maintenance running again. (Depending on your version and edition of SQL Server, the rebuild may need to be done offline. Reorganize can’t be used to set a new fillfactor.)
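Lowering fillfactor gradually on a single problem index might look something like this (the index and table names are made up for illustration):

```sql
-- Rebuild one frequently fragmented index with a slightly lower fillfactor,
-- then reassess fragmentation after a week or two of normal maintenance.
ALTER INDEX ix_Orders_CustomerID
    ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 95, ONLINE = ON);
-- ONLINE = ON requires Enterprise Edition; omit it (and take an outage)
-- on Standard Edition.
```
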
This second option may sound nitpicky, but in most environments it only takes a few minutes to figure out where you need to make a change. You can do it once a month. And it’s worth it– because nobody wants their database performance to slow down and realize that they’ve been causing extra IO by leaving many gigabytes of space in memory needlessly empty.
Now that we’ve covered the basics, how about those five things? Here’s the fine print on how fillfactor is implemented, and what it does and doesn’t impact.
1) Books Online warned you
The most painful thing about finding bad fillfactor settings is that the Microsoft team has tried to warn people about how bad fillfactor settings can hurt performance. Unfortunately, most people don’t seem to find the warning.
Check out this quote in Books Online: “For example, a fill factor value of 50 can cause database read performance to decrease by two times.”
That’s pretty clear, right? By the same logic, if I set a fillfactor of 70 when I don’t need it, my data is spread across roughly 100/70 ≈ 1.4 times as many pages, so reads can get about 40% more expensive. That doesn’t sound great, either. Because of this delicate balance, follow the best practices above.
2) Fillfactor does not apply to heaps
The fillfactor setting only applies to indexes, not to all tables. If you have a table which does not have a clustered index, it’s called a “heap” in SQL Server. Heaps are weird in several ways. One of those ways is that fillfactor doesn’t apply– not even the fillfactor you set at the instance level. (Page splits don’t work the same way either, but this is not a good reason to have heaps.)
3) Fillfactor doesn’t impact new pages inserted at the end of an index
SQL Server only uses fillfactor when you’re creating, rebuilding, or reorganizing an index. It does not use fillfactor if it’s allocating a fresh new page at the end of the index.
Let’s look again at the example of a clustered index whose key is an increasing INT identity value. We’re just inserting rows, and SQL Server is adding new pages at the end of the index. The index was created with a 70% fillfactor (which maybe wasn’t a good idea). As inserts add new pages, those pages are filled as much as possible – likely well over 70%. (It depends on the row size and how many rows fit on the page.)
4) Fillfactor does not apply to LOB pages
Fillfactor applies to in-row data pages. When you create a table, depending on the data types, you have options as to when some large types get stored on-row, or off-row. When data is not stored in-row, fillfactor settings don’t apply to those special pages.
A general rule of thumb is that if you infrequently read large object columns, it’s better to keep those columns off-row. You incur extra IO every time you fetch off-row data, but it keeps the frequently accessed in-row columns of your index more efficient.
5) Someone may have changed fillfactor without you realizing it
Once fillfactor is set on an index, it stays there. Further rebuilds or reorganizations of the index maintain that fillfactor unless you specify a different value. It’s easy for a change in fillfactor to sneak in. Unless you check for indexes with a fillfactor set, you might not realize what’s going on in your database.
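One way to catch a sneaky fillfactor change is to check the metadata yourself. A minimal sketch against the current database:

```sql
-- Find indexes with a non-default fillfactor.
-- fill_factor = 0 means the default (fill pages 100%).
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.fill_factor
FROM sys.indexes AS i
WHERE i.fill_factor NOT IN (0, 100)
ORDER BY i.fill_factor;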
Like anything else, there are weird exceptions: in very rare cases, setting a super-low fillfactor on a very heavily updated (and probably small) table can help reduce contention. These cases are very rare, and often there’s a better long-term change in the application layer that would handle the issue.
Bottom line: If you find you’re using a low fill factor and you don’t know exactly why it was needed, always look carefully to see if you’re hurting performance more than helping it. And if you’re using the same fillfactor on all of your indexes, you’re probably wasting lots of valuable memory.
How to check your fillfactor today
Our free sp_BlitzIndex tool looks for all sorts of insanity in your indexes at the database level. It will snoop through your metadata and warn you about low fillfactor values, plus let you know exactly how big those indexes have grown. Check it out today.
We write a lot about best practices, we teach customers best practices, and we publish popular references for best practices about all sorts of things like performance monitoring, SQL Server Setup, and Availability Groups.
But the internet is a sea of information and many SQL Server instances have unique needs. How does a SQL developer or DBA decide what’s a best practice?
How Most People Define Best Practices
In the real world, the #1 definition of a Best Practice is, in effect, this: “Something we’ve been doing for years without noticing any problems.”
That’s just how human nature works. We all have a set of things we do that are “normal”, and we usually put them into our mental Best Practice Bucket. If we start to have problems, we assume it’s not due to anything “normal”.
Unfortunately, human nature makes people persist all sorts of bad practices. I find everything in the wild from weekly reboots to crazy settings in Windows and SQL Server that damage performance and can cause outages. When I ask why the settings are in place, I usually hear a story that goes like this:
- Once upon a time, in a land far far away there was a problem
- The people of the land were very unhappy
- A bunch of changes were made
- Some of the changes were recommended by someone on the internet. We think.
- The problem went away
- The people of the land were happier
- We hunkered down and just hoped the problem would never come back
- The people of the land have been growing more and more unhappy over time again
Most of the time “best practices” are implemented to try and avoid pain rather than to configure things well. And most of the time they aren’t thought out in terms of long term performance. Most people haven’t really implemented any best practices, they’ve just reacted to situations.
Why Best Practices Can Get You Fired
There’s one thing you can count on: many of yesterday’s best practices aren’t true today. Maybe the change you made long ago was a best practice at the time (or at least not an unusual or bad practice), but it may be hurting you now.
It doesn’t matter if something used to be a best practice or not. If times and technology have changed and you haven’t kept up, that outdated best practice can still get you fired if you don’t learn to let go.
What You Should Do Today
We try to make it easy for people to find out where their configuration veers away from current best practices.
Here’s what you should do today: run our free tool that checks your SQL Servers for signs you’re not following a best practice.
Look carefully and critically at everything that it points out. Don’t dismiss any of the items by saying, “oh, but we need that because we’ve always done it that way.” If it’s something you’ve been doing a long time, question it even more.
Are you getting the most out of your TSQL queries? Most people still write queries like they’re using SQL Server 2000, but there’s lots of new functionality that can make your TSQL simpler to read, write, and support.
Kendra Little introduces you to five cool features that you should think about using to improve your SQL Server queries (along with the time the demo on that feature starts):
1) The glory of the OUTPUT clause – 4:53
2) Over and Partition By – 12:23
3) The APPLY operator for reusable computations – 16:56
4) EXCEPT and INTERSECT – 22:03
5) The Magic of indexed computed columns – 24:24
Looking for the scripts from the webcast? Scroll on down to below the video for the download link.
Scripts and Links
Need to check your database compatibility level? Our sp_Blitz script does that and much more.
Download demo scripts from the webcast here: 5-TSQL-Features-Youre-Missing-Out-On-Demo-Scripts.zip
Questions from the Webcast
Here are some great questions we got from folks during the webcast which I didn’t have time to answer live, but caught up with after the show.
From John on subqueries
Question: Is the difference with using a subquery vs OVER/PARTITION BY that the subquery is processed for every row and the OVER isn’t?
Answer: SQL Server is too tricksy to make things so simple to remember. Subqueries won’t necessarily get processed repeatedly for every row as a general rule. The optimizer will do its darndest to rewrite things as efficiently as it can in the time it is allowed for optimization. The behavior depends on the query and the version of SQL Server.
Special things are definitely done when you bring in OVER, though. If you’d like to nerd out on how this works, good news! Paul White wrote a detailed post on this very topic. (Warning: brain melting may ensue.)
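To make the comparison concrete, here’s a sketch of the same per-customer total written both ways (the Orders table and its columns are hypothetical):

```sql
-- Correlated subquery: one total per customer
SELECT o.CustomerID,
       o.OrderTotal,
       (SELECT SUM(o2.OrderTotal)
        FROM dbo.Orders AS o2
        WHERE o2.CustomerID = o.CustomerID) AS CustomerTotal
FROM dbo.Orders AS o;

-- Window function: same result, often a very different plan
SELECT o.CustomerID,
       o.OrderTotal,
       SUM(o.OrderTotal) OVER (PARTITION BY o.CustomerID) AS CustomerTotal
FROM dbo.Orders AS o;
```
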
From Riley on CROSS APPLY vs CTEs
Question: How does the CROSS APPLY performance compare to using a CTE, where the Subtotal Plus Tax is computed in the CTE and then added to in the final SELECT? Do they generally result in the same plan?
Answer: As a general rule, I don’t expect CROSS APPLY and CTEs to get the same plans – I’ve seen many cases where they do not.
In this specific case for making computations reusable, the plans are very different. The APPLY operator lets you put in the calculation without a FROM clause or a correlation– it uses the same Compute Scalar operator that the original version of the query used. If you put the calculation in a CTE you have to essentially make it a subquery, and that’s not necessarily something SQL Server’s going to be able to normalize out. (I did a quick test of this particular scenario just to see if it was more clever than I thought in 2012, and it gets a much more expensive plan.)
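The reusable-computation pattern looks roughly like this (hypothetical table and a made-up 8% tax rate):

```sql
-- APPLY lets you name an expression once and reuse it in the SELECT list,
-- without wrapping the whole query in a subquery or CTE.
SELECT s.Subtotal,
       calc.SubtotalPlusTax,
       calc.SubtotalPlusTax + s.ShippingCost AS GrandTotal
FROM dbo.Sales AS s
CROSS APPLY (SELECT s.Subtotal * 1.08 AS SubtotalPlusTax) AS calc;
```
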
From Donald on computed columns
Question: I thought Computed columns were a bad idea ?
Answer: Not at all! Like anything else, you can do something terrible with them, but they can also be absolutely brilliant by simplifying code or improving performance. Persisting and sometimes indexing the computed column can be very powerful when it’s a frequently read calculation.
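A minimal sketch of that pattern, assuming a hypothetical Sales table and a flat 8% tax calculation:

```sql
-- Persist a frequently read calculation as a computed column,
-- then index it so queries can seek on it instead of recomputing.
ALTER TABLE dbo.Sales
    ADD SubtotalPlusTax AS (Subtotal * 1.08) PERSISTED;

CREATE INDEX ix_Sales_SubtotalPlusTax
    ON dbo.Sales (SubtotalPlusTax);
```
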
From Grace on portability/ANSI standard
Question: Are these statements compatible with ANSI SQL or are they specific to MS T-SQL?
Answer: All my demos and comments here are specific to TSQL, specifically the Microsoft SQL Server implementation. Some of these are likely in the ANSI spec, but SQL Server very frequently doesn’t adhere to ANSI SQL specs – it likes to do its own thang. So I wouldn’t count on it!
From Mike on mikes
Question: Hey guys, what microphone setup do you use.. the sound is good. I moderate some PASS VC’s and have to get a new microphone…
Answer: Brent, Jeremiah and I all use Yeti Microphones. We love ‘em. Glad the sound is good live! It suffers a little in the recordings we publish of the webcasts because the audio is recorded through our online webcast service. For trainings we record at home, the Yeti really gets to shine.
Hungry for more on making queries better? Check out our TSQL Training.
Taking care of your health is incredibly important (says the marathon runner). Eating right, exercising, and getting plenty of rest help you stay healthy and perform optimally.
Your database server’s health is important to your company’s health. You need to take regular backups, check for consistency, and keep indexes tuned for optimal performance. Here are a few tips to make sure your database stays in top shape.
Back up System Databases
The master, model, and msdb databases on every SQL Server are integral parts of the system.
Master contains information such as logins, linked servers, and information about all other databases on the server. Without this database, SQL Server can’t start up. If you don’t have a current backup of this database and a disaster occurs, you lose this information.
Model is used as a template for every new database that is created on your system. There are certain default settings you may adjust, such as file sizes, autogrowth, and the recovery model. If you don’t take a backup of this and a disaster occurs, when you set up the server again you have to remember to reconfigure all of these settings, rather than restoring them.
Msdb contains all of the information about SQL Server Agent operators, jobs, and alerts. It also holds backup and restore history tables. If this isn’t backed up and a disaster occurs, you will need to recreate all of your Agent items. I had to do this once – it was one of the worst weeks in my first DBA job.
Ensure that your backup job covers these databases, because in the event of a disaster, you want to restore them rather than rebuilding all of that information.
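At its simplest (the backup path here is a placeholder – use your own, and prefer timestamped file names in a real job):

```sql
-- Make sure master, model, and msdb are in the backup rotation.
BACKUP DATABASE master TO DISK = N'X:\Backups\master.bak' WITH INIT, CHECKSUM;
BACKUP DATABASE model  TO DISK = N'X:\Backups\model.bak'  WITH INIT, CHECKSUM;
BACKUP DATABASE msdb   TO DISK = N'X:\Backups\msdb.bak'   WITH INIT, CHECKSUM;
```
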
Run DBCC CHECKDB On All of Your Databases
You need to run DBCC CHECKDB to check for consistency errors in your databases – yes, corruption. If you’re not running this on system databases, or are only running it on “important” databases, you run the risk of not finding corruption in a database.
As already explained, the system databases are incredibly important. Combine not running DBCC CHECKDB with not taking a backup, and you can have a disaster without a hardware failure of any sort.
Only checking those databases you feel are “important” is dangerous. Who defines “important”? Is archived data as important as live data? To the person that uses that information for forecasting and trending, it is. Is a system used for reporting once a quarter as important as production data? To the person that has to use those reports to file taxes, it is. If a database is truly “not important”, and isn’t being used, it should be backed up and removed from the server – not sitting out there, taking up space and resources.
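One quick (if blunt) way to hit every database, including the system ones – note that sp_MSforeachdb is undocumented, so a dedicated maintenance solution is the better long-term answer:

```sql
-- Run a consistency check against every database on the instance.
EXEC sp_MSforeachdb N'DBCC CHECKDB ([?]) WITH NO_INFOMSGS;';
```
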
Maintain Your Indexes
Indexes are the closest thing to a turbo button that SQL Server has. Proper indexes on tables help your SELECT queries run faster without unduly hindering the performance of inserts, updates, and deletes. But putting an index or four on a table when it’s released to production, then doing nothing further, isn’t productive. Table structure, the data in the tables, and the queries from users and applications all change over time.
As data is added, updated, and removed, the pages in the index will become fragmented. Fragmentation leads to poor performance. The more scattered across the disk the data is, the harder storage has to work to retrieve it. Regular, consistent reorganization and rebuilding of the indexes can combat this.
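To see where fragmentation stands before deciding anything, you can sample the fragmentation DMV – the 1,000-page cutoff below is a common rule of thumb, not a hard rule:

```sql
-- Leaf-level fragmentation for indexes in the current database.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name                    AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.page_count > 1000   -- small indexes usually aren't worth the churn
ORDER BY ps.avg_fragmentation_in_percent DESC;
```
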
Are you wondering where to start? Ola Hallengren has a brilliant set of maintenance scripts, including index maintenance, which can help you out!
Keep Your Server Healthy
By performing regular maintenance on your SQL Servers, you’ll ensure that your data is secure and it is performing optimally. Much like you don’t want to wait until you suffer a heart attack to see a doctor, you don’t want to wait until a disaster to find the weak points in your server.
In our recent webcast on HA & DR basics for DBAs, we got a bunch of questions that couldn’t be answered fast enough. Here you go:
Q: Do you have book recommendations for replication, mirroring, and the other topics?
Absolutely. Check out the resources on our HA & DR Basics video page.
Q: Do any of these techniques work for companies with just 1-2 IT staff?
To keep things simple, I’d check out virtualization replication like VMware SRM. SRM isn’t necessarily easy, but if you use that one technique, you can protect all of your virtual machines no matter what they’re running. That’s easier than learning different HA/DR techniques for a lot of different applications.
Q: For 1,000 databases on an instance, what uses more resources – mirroring or log shipping?
Technically, the answer is going to be log shipping because you’re probably already doing transaction log backups on those databases anyway. However, when you’re running thousands of databases per instance, several other interesting issues arise. How do you react to newly added databases? Can you back up all 1,000 in a timely manner, or do you need to roll your own transaction log backup jobs to run simultaneously? What’s your RPO/RTO? There’s a lot of neat questions that determine the real answer at this scale. Feel free to contact us for consulting help – we’ve got clients who run as many as 10,000 databases per instance, and we can help you learn lessons the easy way instead of the hard way.
Q: What happens if my DR is in a different datacenter, on another domain?
Your life is harder. Many built-in solutions become more challenging, and even just getting your users connected to the database becomes more challenging. This is a scenario where I like to step back and ask, “What problem are we trying to solve by using different domains?”
Q: Do my HA and DR servers have to be on the same subnet or VLAN?
No, all of SQL Server’s HA/DR features work on different subnets and VLANs.
Q: Do you have to kill all SQL Server connections before restoring a database?
Yes. Restoring a database – even a similar copy – means starting over with all-new data. Restoring transaction logs means SQL Server will be jumping around all over inside the database, and it doesn’t have time to deal with your queries.
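The usual pattern looks like this (database name and backup path are placeholders):

```sql
-- Kick everyone out, restore, then return to multi-user.
ALTER DATABASE SalesDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

RESTORE DATABASE SalesDB
    FROM DISK = N'X:\Backups\SalesDB.bak'
    WITH REPLACE;

ALTER DATABASE SalesDB SET MULTI_USER;
```
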
Q: Does clustering work for SSAS, SSIS, and SSRS?
Microsoft sells these things inside the same box as SQL Server, but for all practical purposes, you should consider them different products. They’re like free toys that just come in the same box, but they’re wildly different. The techniques you use to protect the SQL Server engine won’t be the same as the techniques you use for the rest.
Q: Can you read a database mirror?
Not directly, but you can take a snapshot of it and read the snapshot. I rarely see this used, though, because in order to refresh the snapshot, you have to drop it – and that means kicking everybody out of the snapshot. Users don’t seem to be big fans of getting their queries killed.
Q: What Windows Server edition is needed for clustering and AlwaysOn Availability Groups?
You can use any edition of Windows Server 2012, but if you’re still using Windows Server 2008 R2, you’ll need Enterprise Edition.
Q: How do you get hosts in different data centers into the same cluster?
Clusters don’t need shared storage anymore. This is why I wrote the post Everything You Know About Clustering Is Wrong. Things really have changed a lot in the last several years around clustering.
Q: How well do these features work over a slow connection?
If you’ve got enough bandwidth to keep up with copying transaction log backups as they’re happening, you can stay up to date. If you can’t, they don’t work well.
Q: Can you install SQL Server Enterprise on Windows Standard?
Q: I just joined the webcast and it’s almost over. Can you send me the slides?
You’d be surprised how often I get this question. This is why we make our material available for free on our YouTube channel. I don’t usually share the slide decks, though, because I’ve had some nasty incidents of people taking my slide decks, plagiarizing them, and presenting them as their own material.
Q: I’m using an AlwaysOn Availability Group. If I fail over to an asynchronous replica, will there be data loss?
Probably. If your primary has been doing a lot of work, and hasn’t been able to send all of that work to the asynchronous secondary, you’re going to lose that data. It’s up to you to monitor the DMVs to see how far behind the replicas are.
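A rough sketch of that monitoring query – the send queue is the part you’d lose in a forced failover:

```sql
-- How far behind is each availability replica?
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)  AS database_name,
       drs.log_send_queue_size,  -- KB not yet sent to the secondary (potential data loss)
       drs.redo_queue_size       -- KB received but not yet redone on the secondary
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;
```
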
Q: Is (feature A) better than (feature B)?
No. They both suck.
Q: Seriously, I need an answer.
Getting to the right answer means knowing your RPO, RTO, business needs, staff capabilities, hardware, network throughput, and more. If you ask a stranger to guess the right answer for you with just five minutes of background information, odds are it’s going to be the wrong answer. You’re going to have to buckle down and do some research into the features and your business needs, or you can bring in an experienced outsider who does this for a living. This is what we do, and we can help you get to the right answer as quickly as possible, and probably for less money than you’d expect.
Q: Is there a way of automating failovers for SQL Agent jobs for HA purposes?
If you’re using clustering, then this is built in. If you’re using a solution that does failovers at the user database level, then things get trickier because you have to know which jobs you want to fail over. For example, you probably don’t want backup jobs failing over (because you can just run those jobs on all nodes), but you might want a few database-based jobs failing over. That’s left as an exercise for the reader, though, because every shop’s needs are different – often even different between database servers in the same shop.
Q: What’s the right quorum configuration for my cluster?
Q: Can I integrate clustering with (mirroring/log shipping/replication/AlwaysOn AGs)?
Clustering is probably the strongest feature in terms of interoperability with other features. It’s been around a long time, so it’s got a good interoperability story for pretty much everything.
Q: How can one second RTO be achieved in a DR scenario?
By having both the production and disaster recovery systems live at all times, accepting queries. At that point, it’s up to the application logic to try both servers to see which one is available fastest. Some applications do this by timing out their database connections after just one second. Other applications do it by trying both database servers simultaneously and taking the first available connection.
Q: Can you back up a SQL Server to SQL Azure?
Not in the sense that you can restore directly into SQL Azure (or as it’s known now, Windows Azure SQL Database.) However, you can use Windows Azure as a backup solution if you’re so inclined.
Q: Azure is a French word meaning blue. Americans are not pronouncing French correctly.
French is the most beautiful language in the world – when it’s spoken by the French. When the rest of us try it, we sound like we’re coughing up an unfiltered Marlboro. Thank you for your cheese, wine, and wonderful words like lingerie and ménage à trois. You are a beautiful country, and the rest of us apologize for the sounds we make when confronted with French.
Q: Can you cluster inside VMware?
Q: What are the best practice recommendations for achieving 99.999% uptime?
Obviously this is way beyond what I can answer quickly, but the best advice I can give is to remember that there are three parts to every solution: people, process, and technology. You need written processes that cover exactly how to react to an outage, and you need to rehearse and improve those processes constantly. Car racing teams practice over and over to get the fastest pit stop possible so that their race car spends more time driving and less time sitting. DBAs need to do the same.
Q: What HA option is Brent’s personal favorite, and why?
I like failover clustering because it protects everything in the instance – jobs, logins, and the server name itself – with as little manual intervention as possible. It still has weaknesses in a few single points of failure, and it’s not completely easy to implement and manage. I think the benefits still outweigh the costs.
Writing queries with date and time data types can be tricky! Join Kendra to learn tips and best practices for selecting the right data types, choosing the best performing functions to round (or truncate) values, and how to avoid common pitfalls with dates and times in SQL Server. If you have one year of experience writing T-SQL queries, this free 30 minute webcast is for you.
Notes and Links
A quick note about datetimeoffset. This is one of the more confusing topics of the webcast, because figuring out how to make applications timezone aware is tricky. A rule of thumb: if you truly need to store the time zone “offset” of the value from where the data originated for legal or research purposes (“What time did the data appear to be from the perspective of the user?”), then this is the datatype for you.
For display purposes in most applications, it’s more efficient to normalize off the time zone preference of the individual user in a separate table, then store datetime in a single standardized timezone like UTC. Essentially, since people travel around and governments change timezones, persisting the information of what timezone someone was in when they did something is usually not worth it, because it’s usually not valuable information.
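As a sketch of that store-UTC, convert-for-display approach (the user’s offset preference would come from a separate table in a real application):

```sql
-- Store a standardized UTC timestamp...
DECLARE @EventTimeUtc datetime2(0) = SYSUTCDATETIME();

-- ...and apply the user's stored offset preference only for display.
DECLARE @UserOffset varchar(6) = '-05:00';  -- hypothetical per-user preference
SELECT SWITCHOFFSET(CAST(@EventTimeUtc AS datetimeoffset), @UserOffset)
       AS LocalDisplayTime;
```
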
A great detailed discussion (with cartoons!) of circumstances where you might choose datetimeoffset vs datetime/datetime2 is in this StackOverflow question.
Poster downloads are over this way.
Without looking in your wallet, do you know how much cash you have? Most of us know within a few dollars. Now, without looking in your SQL Server, do you know how much data is cached in memory? You probably don’t, and that’s okay; you shouldn’t have to know how much data SQL Server is caching. We can’t control how much data SQL Server caches, but we can control how we cache data.
Different Types of Cache
There are a lot of different ways to approach caching. One of the most prevalent ways involves thinking about cache in two different levels (much like CPU cache): first level cache and second level cache.
First level cache is an immediate, short-lived cache that works within a single session to attempt to minimize database calls. Unfortunately, first level cache is only used for the duration of a current session or transaction (depending on your terminology). This is very short lived and it’s only useful to the current process. While helpful, first level cache has a limited scope.
There’s another type of cache: second level cache. Second level cache exists outside of the current process and can be shared between multiple transactions, processes, servers, or even applications. When we talk about adding cache to an application, we really mean second level cache.
A Bit of Cache
Even the most basic of ORMs have a little bit of cache available. The first level cache is used as a short-lived buffer to reduce the amount of work that the ORM has to do. First level cache is used for caching objects in the current transaction and query text. Although this cache can be helpful for the current process, it isn’t shared across multiple processes or even multiple database batches. If we want a more robust cache, we have to look elsewhere.
ORMs like Entity Framework or the LLBLGen Framework don’t have a second level cache. It’s up to developers to add a cache when and where they need it. This exposes developers to additional concerns like cache invalidation, cache updates, and query caching. All of these features and functionality may not be necessary, but that’s an acceptable trade off – it’s up to developers to implement cache features in ways that support application requirements.
Although it takes up developer time, building the second level cache yourself has the benefit of creating a cache that’s suited to the application’s requirements. For many application level features, this is good enough. It’s important, though, that developers pick a caching layer capable of meeting their operational requirements. Operational requirements include horizontal scalability, redundancy and fail over, recovery of cached data, or customizable cache expiration on an object-by-object basis.
These basic ORMs aren’t really all that basic – they have full features in other parts of the ORM, but they only offer basic support for automatic caching through the ORM.
A Lot of Cache
You’ve got memory. You want to use it to cache data. What’s the easiest way to do that?
One of the easiest approaches to adding caching to your application is to use a framework that supports it out of the box. A number of ORMs, including both Hibernate and NHibernate, provide this support. Enabling cache is easy – just change a few lines in a configuration file and the cache will be available to your application. Things start getting tricky, though, when you examine the richness of the caching that’s provided by these tools.
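For example, in Hibernate the "few lines in a configuration file" boil down to a handful of properties – this fragment is illustrative (the Ehcache region factory shown is one common provider choice, not the only one):

```xml
<!-- hibernate.cfg.xml fragment: turn on the second level cache
     and pick a cache provider -->
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.use_query_cache">true</property>
<property name="hibernate.cache.region.factory_class">
    org.hibernate.cache.ehcache.EhCacheRegionFactory
</property>
```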
Power comes with a price. When you're getting started with tools like Hibernate or NHibernate, there's a lot to take in, and many developers overlook these features. Developers can choose, on an object-by-object basis, which caching strategy should be applied. Based on business requirements, we can choose to treat certain cacheable objects as read only while others can be used as a read/write cache. Some objects can be cached while others bypass the secondary cache entirely – there's a lot of complexity for developers to manage.
While this can be overwhelming, this flexibility serves a purpose – not all features of an application have the same requirements. Some features can serve old data to users; other features need to be up to the minute or up to the second. Giving developers the ability to make these choices means there's a real decision to be made for each feature. Even if it's a difficult one, developers can choose how the application behaves and tailor performance and functionality to business requirements.
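To make the per-object strategies concrete, here's a sketch of a cache that consults a strategy map before reading or writing – the strategy names are loosely modeled on what ORMs like Hibernate expose, but the class and API here are invented for illustration:

```python
# Illustrative per-entity cache strategies (not Hibernate's actual API).
READ_ONLY = "read-only"    # reference data: effectively immutable
READ_WRITE = "read-write"  # business data: cache, but keep it honest
NO_CACHE = "no-cache"      # must be up to the second; always hit the database

STRATEGIES = {
    "Country": READ_ONLY,
    "Order": READ_WRITE,
    "StockPrice": NO_CACHE,
}

class StrategyCache:
    def __init__(self):
        self._store = {}

    def get(self, entity, key, loader):
        """Return a cached value, or call loader() to fetch from the database."""
        if STRATEGIES.get(entity, NO_CACHE) == NO_CACHE:
            return loader()                      # bypass the cache entirely
        k = (entity, key)
        if k not in self._store:
            self._store[k] = loader()
        return self._store[k]

    def update(self, entity, key, value):
        strategy = STRATEGIES.get(entity, NO_CACHE)
        if strategy == READ_ONLY:
            raise ValueError(f"{entity} is cached read-only")
        if strategy == READ_WRITE:
            self._store[(entity, key)] = value   # keep the cache current
```

The business requirement drives the strategy: a country list can be cached forever, an order can be cached but must track updates, and a stock price should never be served stale.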
Making the Choice
If you’ve already got an existing project and you’re planning on adding a caching layer, don’t think that you have to re-implement your data access layer just to get better support for caching. Both approaches have their benefits and it’s far more important to be aware of which data needs to be cached and the best way to cache it.
When people say “cloud”, they’re simplifying a lot of different solutions into a single catchphrase. Let’s break out the different options and compare them.
1. SQL Server in Amazon EC2 and Azure VMs
Amazon EC2 is a virtualization platform. Amazon buys servers, installs their secret sauce software, and rents you Windows virtual machines by the hour. Microsoft offers a similar product, Windows Azure Virtual Machines, that just went officially live.
You can rent a blank Windows VM without SQL Server installed, and then install SQL Server yourself just like you would on-premise. That’s a licensing mess, though – you have to use your existing SQL Server licenses or buy new ones for your VMs. That doesn’t make much financial sense. Instead, Amazon and Microsoft will rent you a Windows VM with SQL Server already configured, and your hourly fee includes the SQL Server licensing.
SQL Server runs just as it would in your own datacenter, which means you can use this as a disaster recovery option for your on-premise SQL Servers. You can do log shipping or database mirroring up to SQL in the cloud, running in Amazon EC2 or Microsoft Azure. When disaster strikes, fail over to your EC2/MS infrastructure, and you’re off and running.
The term “cloud” conjures up images of easy scalability and redundancy, but that’s not really the case here. We’re talking about a single virtual machine. This isn’t much different from running SQL Server in a VMware or Hyper-V guest in your own datacenter or in a colo datacenter. You can use all your traditional tools and techniques to manage SQL Server, which is both a pro and a con. If you need to patch it or scale out to multiple servers, there are no tools included here. I still consider this the cloud, though, because the infrastructure and licensing are managed by somebody else. It’s easy to get started with one – or a hundred – virtual SQL Servers with no initial investment.
This method is the most conventional, and as I explain other options, I’m going to move from conventional to really-far-out-there. SQL in EC2 or MS VMs just works – it’s easy to understand and leverage without changing your code or your techniques – but it doesn’t bring a lot of the cloud’s benefits.
2. Amazon RDS for SQL Server
Instead of running SQL Server in an EC2 virtual machine, let’s start giving up a little bit of control in order to get more of the cloud’s benefits. The next layer of clouds is Amazon Relational Database Service (RDS). Here, Amazon builds a Windows VM, installs SQL Server, configures it, and manages both Windows and the SQL Server service for you.
This is still the real, true blue SQL Server you know and love – all of your commands still work exactly the same as you’re used to, as long as you don’t try to access the server’s local drives directly. (Example: you can’t upload a flat file to the server’s C drive and then try to BCP data from that file into SQL Server.)
Amazon RDS is kinda like they’re the DBA, and you’re a very, very powerful developer. You can create and drop databases and users, but you can’t Remote Desktop into the SQL Server, nor can you access the drives.
Amazon RDS takes a few job duties away from you:
- Storage management – want faster storage? Just pick (and pay for) more IOPS. There’s no arguing with the SAN guy.
- Monitoring – Amazon CloudWatch tracks performance metrics and sends you emails when they’re outside of your thresholds.
- Patching – You pick the major/minor versions you want and when you want patches applied.
- Backups – You pick a time window for the full backups, and Amazon manages it using storage snapshots. You can restore from snapshots just by pointing and clicking in the management console.
But herein lies the first real compromise: you can’t restore from anything except snapshots. You can’t upload your own database backup file and restore it. To get data into Amazon RDS, you’ll want to export it to a file, upload that file to Amazon S3 (cloud-based file storage), and then import it. This also means you can’t use Amazon RDS as a participant in log shipping or database mirroring.
Microsoft doesn’t have a competitor to Amazon RDS for SQL Server today. Well, I say that, but some shops already manage their SQL Servers this way – they have an internal admin team that manages Windows and SQL. Departments get access to create & drop databases, change code, etc, but they don’t get access to the server’s desktop or backups. This doesn’t really compete with Amazon RDS, though – RDS is for companies who are too small to have this kind of internal engineering infrastructure. (Or for companies that want to get rid of this large engineering burden, I suppose.)
3. VMware vFabric Data Director
vFabric Data Director is a lot like running Amazon RDS in your own datacenter, but you can control the base Windows virtual machines. You build a Windows template to be used by default whenever a new SQL Server is created. VMware vFabric manages the implementation details for backups and high availability.
vFabric also supports Oracle, PostgreSQL, and Hadoop, giving you a single pane of glass to create and manage your database servers. When someone in your company wants a new database instance, the sysadmins open up vFabric, configure it, and within a few minutes, it’s up and running.
vFabric makes sense for 100% virtualization shops who aren’t interested in moving their databases up to the cloud, but they want easier database management integrated into their virtualization tools.
Microsoft doesn’t have a competitor to VMware vFabric Data Director today. In theory, you could build your own alternative using System Center and a whole lotta scripting. That is left as an exercise for the reader.
4. Microsoft Windows Azure SQL Database (WASD)
The artist formerly known as SQL Azure takes the least conventional approach of all. While it’s technically built on Microsoft SQL Server, that’s like saying your pants are built on your underpants. They both cover your back end, but you can’t use them interchangeably.
Microsoft essentially built a new product designed for common database storage requirements. Like your underpants, you only get minimal feature and command coverage here. The new features and commands Microsoft has been adding to the boxed product for the last couple of versions just aren’t available in WASD, including partitioning, Resource Governor, Service Broker, and CLR. But hey – are you really using those anyway? Most folks aren’t.
Rather than building large databases, WASD encourages developers to shard out their data across multiple smaller databases. While on-premise SQL Servers have had techniques to do this in the past, Microsoft started over and developed a new technique that makes more sense for cloud implementations. Again, though – this is a difference from the boxed product, something else that developers have to learn. As we’ve gone up this ladder into the clouds, we’ve been handling our problems differently. WASD’s partitioning technique is a good reminder that once you’ve gone this far up into the cloud, you’re dealing with something very different from SQL Server. You’re not going to take large volumes of code written for SQL Server and simply point them at WASD – you’re going to be doing a lot of testing and code changes.
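The sharding idea itself is simple to sketch: route each logical key to one of many small databases. The connection strings below are made up, and this hash-based routing is just one generic approach, not WASD's actual federation mechanism:

```python
import hashlib

# Hypothetical connection strings for a set of small, sharded databases.
SHARDS = [
    "Server=shard0.example.net;Database=app0;",
    "Server=shard1.example.net;Database=app1;",
    "Server=shard2.example.net;Database=app2;",
]

def shard_for(customer_id: str) -> str:
    """Route a customer to a shard by hashing its key.

    A stable hash (not Python's randomized built-in hash()) keeps
    routing consistent across processes and restarts."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Every customer consistently lands on the same shard, which keeps single-customer queries cheap – but cross-shard queries become the application's problem, and that's exactly the kind of testing and code-change burden described above.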
Like Amazon RDS, there’s no backup/restore functionality here to get your existing data into the cloud. You’ll be exporting your data to a file, loading it into the cloud, and then…staying there. There’s no database mirroring or log shipping to/from Azure SQL Databases to on-premise SQL Servers.
Which One is Right for You?
If you’ve got an existing app, and you just want to cut costs without changing code, you can get started today with any of the first three options. They support the same T-SQL commands and datatypes you already know and love.
If you’ve got an existing app, and you’re willing to make code changes & do testing, you might be able to save even more plus gain new flexibility by going to Windows Azure SQL Database. In most cases, though, the cost savings won’t come anywhere near the costs required for the code changes and code review. We haven’t seen a case yet where the tradeoff made sense for our clients. I’m sure it’s out there – the perfect example would be a small amount of code that the developers know very well, can easily modify, and produces large server loads in short bursts.
If you’re building a new app from scratch, then let’s take a step back and survey the database industry as a whole. This is an incredible time to be in databases and there’s a bunch of really good options we didn’t even cover here.
I’m doing a 1-hour open Q&A session on May 8th for the PASS Virtualization Virtual Chapter. Bring your VMware and Hyper-V questions about setup, performance, management, monitoring, or whatever, and I’ll answer ‘em.
You can even get a head start here – post your questions in the comments below, and I’ll build slides to answer ‘em ahead of time. That way you can make sure you get the best answer possible. (Well, from me anyway, ha ha ho ho.)
Then come join us on the webcast and hear the answers. See you there!
I joined RunAsRadio’s Richard Campbell to talk about how SQL Server 2012 adoption is coming along, how I do performance tuning, why the cloud hasn’t been catching on with most businesses, and why DBAs need to care about business metrics. Listen up here.
I also wrote about Being Invincible with SQL Server 2012 AlwaysOn Availability Groups for SQL Server Pro. You can preview the article here if you’re not a subscriber. I had a ton of fun with this article – lots of superhero and Superman references.