It seems like a stretch. After all, plumbers get called when sinks are running backwards and when toilets and drains are clogged. But, really, users and developers treat DBAs the same way you treat a plumber – when something is broken, we get the call.
Unclog Your Database Fast
The main thing that plumbers do is fix immediate problems. When this is your job, you have to focus on diagnosing the problem, understanding the ramifications of different solutions, and coming up with the right solution. In databases, as in plumbing, we’re almost never called on to fix problems when we can do this at our leisure – there’s a mess that needs to be cleaned up quickly.
The first thing that we need is a set of tools. Don’t go scrolling down looking for scripts, you won’t find any. The first tool that you need is the ability to listen.
You might recall when Brent had to call a plumber to fix his garbage disposal. Based on Brent’s description, the plumber had a good idea of the problem and knew that there was a simple solution: just give the disposal a good crank. Afterwards, the plumber gave Brent advice on how to keep the problem from happening again. Brent happily wrote a check and the plumber made $1000 an hour for those 10 minutes. The plumber was able to use his listening skills to quickly diagnose a problem, pick the right tool, solve the problem, and leave the customer with tools to prevent the problem in the future. As a plus, by listening, he knew that he didn’t have to bring heavy tools upstairs, just a tiny wrenchette.
As a DBA, you need a similar set of tools – when there’s a problem you need to be able to understand what the problem is, come up with a solution, implement the solution, and watch to make sure things are working correctly. But you can’t just focus on the immediate problem; take some time after you’ve solved the immediate problem to focus on the system as a whole.
The Bigger Picture
While you’re busy unclogging an index, have you ever found yourself saying “You know, if this database design was different, this wouldn’t be an issue”? You’re not alone – your plumber wants your kids to stop flushing toys down the toilet. There’s always something else that needs to be fixed, whether it’s another quick fix or a larger architectural change.
Plumbers also have a different job: sometimes a plumber gets handed a set of plans for a building and is told to make it work. It doesn’t matter that the right parts may not be on hand, there’s a deadline. Walls need to be put in place and if the plumber isn’t done on time, the entire schedule is going to slip. In these situations, the plumber has to deal with the constraints of reality. There are some things that you can do without all of the parts in place – you can put plumbing in place for a washer and dryer without the appliances being on site. But you can’t put the washer and dryer in place and expect them to work without any plumbing.
When you’re designing a new system from the ground up, you don’t have the luxury of taking all the time in the world to build things The Right Way™. You focus on filling the customer’s specs with the equipment that you have on hand. Don’t have a particular type of pipe fitting? You’ll either have to wait or devise a solution. This is how things end up different in the real world from the way they were devised on paper – implementation details are frequently more than just minor details. Those database architects who really screwed things up? They weren’t incompetent, they just had a deadline.
You call your plumber to fix clogs that are happening right now. Likewise, management is usually bringing you in because there’s a clogged database. You’re there to fix a problem (unclog the database) and get back to your regularly scheduled day. It’s tempting to bring up all of the design flaws but, unless you’re asked, you probably shouldn’t. Write them down and keep a list of database design issues in your wish list. You’ll feel better about it.
While you’re writing down all the big problems that you noticed in the database, make sure you write down how each problem could have been prevented. If it’s conducting stress tests to find missing indexes, talk to the development team about the tools they can use. If the developers need some query tuning help, get in there and help them with their query writing skills.
Fixing the Problem with Great Service
Users, management, and developers don’t want to know what they’re doing wrong. They want to know how it can be made right. Sure, there are problems; nobody is happy about it, but the users know that everything can’t be made perfect.
When you help the users make things better, show them the problem. Take the time to explain why it’s a problem – “the dimensions of this toy don’t mesh with the dimensions of your plumbing, it’s only natural that it get stuck”. Then explain how the problem can be prevented going forward.
Great plumbers don’t just explain how to solve a problem, they deliver great customer service while they’re doing it. A great plumber doesn’t make you feel like an idiot because your kids decide to have a toy soldier flushing contest. A great plumber fixes the problem, tells you it happens to everyone, and makes a joke with your kids about how flushing toys is how he became a plumber.
What About the Long Term Problems?
Every time you look at a system, write down all of the long term fixes that should be put in place. Keep a list of them around and build out a plan of attack for the best way to fix them with minimal disruption. If you do a good job of solving the immediate problems, there’s a good chance that someone is going to bring you back to fix problems in the future.
Don’t think you need to be silent, though. Tell the users about things you can easily fix right now. If it isn’t much effort to add an extra index during an outage window, suggest the index and go on your merry way. The last time I had a plumber over, he suggested that I throw ice in the disposal once a week in order to keep the blades sharp and to free up any gunk that was floating around in there. Did he try to sell me another disposal? No. Is that plumber going to get my business the next time there’s a problem? You bet.
You’re a plumber. Be happy that you’re unclogging a drain today. Solve the problem you’re brought in to solve and there’s every chance you’ll get to implement the plumbing the next time around.
The best way to improve your skills as a database professional is to practice. If you’re like me, you might even take that practice outside of the workplace and do a bit of homework on the side. You can’t take a copy of your production database home with you, so what can you do? Thankfully, the fine people at Microsoft have put together a set of databases that you can download and play with.
Introducing the AdventureWorks OLTP Sample Database
AdventureWorks is a transactional database for a fictional bicycle and outdoor supply company. It’s not very exciting data and it’s not terribly big data, but it gives developers a highly normalized schema to work with for testing different querying techniques. How Microsoft managed to cram all of the relevant SQL Server features into a 215MB database is beyond me. All that aside, AdventureWorks is the first place that database professionals should go to get started practicing their skills.
There are a number of downloads available for AdventureWorks, but only a few that matter to the OLTP crowd. To get started, database professionals should download one of the following:
- AdventureWorks 2012 Data File – this is just an MDF file. You’ll need to drop it into a folder and tell SSMS to create a log file.
- AdventureWorks 2012 data and log file – there’s an MDF and an LDF zipped up in a file.
- AdventureWorks 2012 OLTP Script – these scripts will create AdventureWorks from scratch.
For a local database, go with one of the first two options – just drop the files in their appropriate folders, attach the database, and then you’re off to the races. If you want something for Azure, use the script file to create the database and deploy data.
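If you grab one of the file downloads, the attach itself is a single statement. Here’s a minimal sketch for the data-file-only option; the file name and path are assumptions based on a default SQL Server 2012 install, so point it at wherever you actually dropped the MDF:

CREATE DATABASE AdventureWorks2012
ON (FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorks2012_Data.mdf')
FOR ATTACH_REBUILD_LOG;  -- rebuilds the missing log file for the data-file-only download

If you downloaded the data and log file pair instead, swap FOR ATTACH_REBUILD_LOG for FOR ATTACH and add a second FILENAME entry pointing at the LDF.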
Protip: There’s usually a CS option for each of these. The CS lets you know it’s a case-sensitive option. If you want to deal with international databases, pick the CS version.
A full list of AdventureWorks OLTP options is available at http://msftdbprodsamples.codeplex.com/. There are some lightweight data warehouse options available, but they are still relatively small. If you’re just getting started, those are a good option (especially since there’s an SSAS tutorial available).
Introducing the Contoso Data Warehouse
Let’s say you want to do something a bit more adventurous and you want to branch out into the bigger BI world that Microsoft has to offer. The AdventureWorks data sets are a great starting place, but the data set size doesn’t pose many challenges.
The Contoso Retail DW data set is several times the size of AdventureWorks and comes as a pre-built star schema. While it’s not the biggest database (my copy is around 1.6GB), ContosoRetailDW provides a data set large enough that aspiring database professionals can really start to push the limits of a local SQL Server.
ContosoRetailDW is also a good place to try out Enterprise Edition features like table partitioning and ColumnStore indexes. The data set is large enough that it’s possible to make use of these features but it’s still small enough that you won’t need enterprise grade hardware to accomplish these things in a reasonable amount of time.
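If you want to kick the tires on those features, a sketch like this is one place to start. It assumes the FactOnlineSales fact table and a handful of its key and measure columns, so double-check the column list against your copy of ContosoRetailDW:

USE ContosoRetailDW;
GO
-- SQL Server 2012 Enterprise: nonclustered columnstore index over a few fact columns
-- (note that in 2012 this makes the table read-only until the index is dropped or disabled)
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactOnlineSales
    ON dbo.FactOnlineSales (DateKey, StoreKey, ProductKey, SalesQuantity, SalesAmount);
GO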
What Are You Waiting For?
Database professionals, this is your chance to start exploring the features and functionality that SQL Server has to offer. The sample databases from Microsoft give you plenty of opportunities to work with different techniques for both OLTP and data warehouse workloads, and even to familiarize yourself with Azure, without having to come up with your own data.
I’ve been told that an attendee at the PASS Summit pre-conference event asked about using dynamic SQL to get around some of the problems with filtered indexes. I’m not entirely sure what the question was, but it did give me the opportunity to play around with filtered indexes and write some simple demo code to illustrate just the kind of shenanigans that you can get up to.
Creating a Filtered Index
The first step for our demo is to create a filtered index. Grab yourself a copy of AdventureWorks and get ready for some fun. Here’s our filtered index:
CREATE INDEX jp_filters_not_offline
    ON Sales.SalesOrderHeader (OrderDate)
    INCLUDE (SalesOrderID)
    WHERE OnlineOrderFlag = 1;
GO
That creates our initial filtered index. We want to search by OrderDate and find only those orders that were placed online. This is a realistic index to create in the AdventureWorks database, too – there are 27,659 online sales in my copy of AdventureWorks, but only 3,806 offline sales.
Want to prove that the index works? Run this SQL and take a look at the execution plan:
SELECT SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE OnlineOrderFlag = 1;
There’s No Problem Here!
Let’s create a pair of stored procedures:
CREATE PROCEDURE jp_online_only_date ( @OrderDate DATETIME )
AS
BEGIN
    SELECT SalesOrderID
    FROM Sales.SalesOrderHeader AS soh
    WHERE OnlineOrderFlag = 1
      AND OrderDate > @OrderDate;
END
GO

CREATE PROCEDURE jp_online_only_date_2 ( @OrderDate DATETIME )
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'';

    SET @sql += 'SELECT SalesOrderID FROM Sales.SalesOrderHeader AS soh WHERE OnlineOrderFlag = 1 AND OrderDate > ''' + CAST(@OrderDate AS NVARCHAR(MAX)) + ''';';

    EXEC sp_executesql @sql;
END
GO
If you were to run both of these, you’d notice that they both make use of the filtered index. What gives?
Both of these stored procedures are able to take advantage of the filtered index because the filter condition (OnlineOrderFlag = 1) is matched by a predicate in the query. In other words – SQL Server can easily make this match.
The Problem with Parameters
There’s a problem, though. When we switch over to using a parameter, SQL Server isn’t able to make effective use of the filtered index. Check it out:
CREATE PROCEDURE jp_online_only_params ( @OrderDate DATETIME, @OnlineFlag BIT )
AS
BEGIN
    SELECT SalesOrderID
    FROM Sales.SalesOrderHeader AS soh
    WHERE OnlineOrderFlag = @OnlineFlag
      AND OrderDate > @OrderDate;
END
GO

EXEC jp_online_only_params '2007-12-12', 0;
Oh no! This procedure doesn’t use our filtered index any more. There’s got to be a way to trick SQL Server into using our filtered index. We could resort to an index hint, but that would mean we have to keep using the same index or keep re-using the name of that index. I’m not a big fan of that, so let’s think of something else.
What if we were to use string concatenation? That sounds like magic, it has a lot of syllables.
Hooray for Strings!
We can solve the problem by forcing SQL Server to see the literal value that we’re passing in:
CREATE PROCEDURE jp_online_only_strings ( @OrderDate DATETIME, @OnlineFlag BIT )
AS
BEGIN
    DECLARE @sql AS NVARCHAR(MAX) = N'';

    SET @sql += 'SELECT SalesOrderID FROM Sales.SalesOrderHeader AS soh WHERE OnlineOrderFlag = ' + CAST(@OnlineFlag AS NVARCHAR(1));
    SET @sql += ' AND OrderDate > ''' + CAST(@OrderDate AS NVARCHAR(MAX)) + ''';';

    PRINT @sql;

    EXEC sp_executesql @sql;
END
If you were to run that stored procedure, you’d get effective use of the filtered index. SQL Server sees the OnlineOrderFlag predicate as a literal value, matches that up to the index definition, and we’re off to the races.
There is one problem, though – because we’re also using a literal value for OrderDate, there could be significant bloat in the plan cache. SQL Server will create one execution plan for every possible combination of parameters that we pass in. In AdventureWorks there are 1,124 distinct values in the OrderDate column. That’s a lot of execution plans.
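If you want to watch the bloat happen, call jp_online_only_strings with a handful of different dates and then peek at the plan cache. This is a quick sketch; the LIKE filter is just one convenient way to pick out the generated statements:

SELECT st.text, cp.usecounts, cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE 'SELECT SalesOrderID FROM Sales.SalesOrderHeader%'
ORDER BY cp.usecounts DESC;

You’ll see one single-use plan per distinct OrderDate literal you passed in.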
Avoiding Plan Cache Bloat
We want to avoid bloat in the execution plan cache. In order to do that, we’re going to need to parameterize part of the query but still leave the filter intact. That’s really easy to do:
CREATE PROCEDURE jp_online_mixed ( @OrderDate DATETIME, @OnlineFlag BIT )
AS
BEGIN
    DECLARE @sql AS NVARCHAR(MAX) = N'';

    SET @sql += 'SELECT SalesOrderID FROM Sales.SalesOrderHeader AS soh WHERE OnlineOrderFlag = ' + CAST(@OnlineFlag AS NVARCHAR(1));
    SET @sql += ' AND OrderDate > @order_date;';

    PRINT @sql;

    EXEC sp_executesql @sql, N'@order_date DATETIME', @OrderDate;
END
By adding the @OnlineFlag as just another string, but still parameterizing the @order_date, we end up with only two execution plans. One plan will be created for each value of @OnlineFlag, but we’ll still get use of the filtered index. This is all but impossible to accomplish with forced parameterization and difficult to accomplish with regular parameterization (usually you have to resort to index hints).
Summing It Up
Dynamic SQL and filtered indexes are a great match. By carefully making use of dynamic SQL, you can coerce SQL Server into behaving well and providing you with an execution plan that uses the filtered index. Through fully parameterized dynamic SQL you can get good plan re-use and avoid SQL injection attacks. By making careful use of dynamic SQL and filtered indexes, you should be able to get performance out of SQL Server without having to resort to tricks like index hints or plan guides.
What Are We Doing?
The goal of this presentation, and the accompanying code sample, is to provide users with a starting point for learning Solr.
This GitHub repository contains the source code of a program that will load the data into a searchable format. As of this initial release, this will only work on the smaller Stack Exchange sites (dba, etc.). Multi-file sites will fail, but this can be easily changed.
This repository also provides a set of starter queries to demonstrate some of the search functionality of Solr.
What Are We Not Doing?
This is not a detailed discussion of Solr, Solr internals, installation processes, or search in general. I’m certain that somebody has covered that far better than I.
Getting the Data Dump
I used the September 2013 Stack Exchange public data dump.
To get yourself a copy:
- Find a BitTorrent client.
- Hit up the download link.
- Wait for a while.
- Unzip the files.
- Load them using the SolrLoader.
You’ll need to configure a separate Solr core (or just set up a completely new Solr). I did this by copying the Solr installation to a new folder and then renaming my Solr core to dba.stackexchange.com. To save you some hassle, here’s what you need to do:
- Create a copy of Solr. You can do this by taking the Solr example (in the example folder) and copying it into a new folder.
- Grab the core.properties from this repository.
- Replace solrconfig.xml and the other core configuration files with their respective files from this repository.
- Find the data/index folder. Delete everything in there.
- Decompress the Stack Overflow data dump of your choosing.
- Modify the path value to point to the location of the Stack Overflow data dump you extracted in the previous step. (I have no idea if this is going to work if you have spaces in the path… so… don’t have spaces in your path.)
Using the Query UI
In the Solr administrative web page, select the core we just created and then click on Query in the sidebar. This will take you to the Solr query GUI where you can set up sample queries until you’re blue in the face.
Your First Query
There are a lot of boxes in the query interface. We’re going to focus on the q box to start with. Delete *:* and drop body:*brentozar.com* in there.
This will search for any document indexed in Solr where the body field contains the text string brentozar.com in any position. The * characters are, as you’d expect, wildcards. You can put them anywhere in the text you’re searching for and Solr will do its best to answer your question. Be careful in your use of wildcards since they will make searches more expensive.
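If you’d rather skip the GUI, the same search can be issued straight over HTTP. This is a sketch that assumes the core name we set up earlier and the default Solr port:

http://localhost:8983/solr/dba.stackexchange.com/select?q=body:*brentozar.com*&wt=json&indent=on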
A More Complex Search
Let’s look for something a little more realistic – we’ll search for answers that have something to do with SQL Server replication or SQL Server Notification Services. We’ll pretend that we have a problem with both (which wouldn’t be surprising if we were using both).
In the q field, enter body:*cation and run the search.
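With wt=json selected, the response comes back as plain JSON. Here’s a rough, trimmed-down sketch of the shape to expect; the counts and document values are illustrative, not copied from a real run, and the document fields (post_type, stack_score, body) are the ones this project’s schema uses later in the post:

{
  "responseHeader": {
    "status": 0,
    "QTime": 12,
    "params": {
      "q": "body:*cation",
      "_": "1385060112345",
      "wt": "json",
      "indent": "true"
    }
  },
  "response": {
    "numFound": 2987,
    "start": 0,
    "docs": [
      {
        "id": "54321",
        "post_type": "Answer",
        "stack_score": 7,
        "body": "...transactional replication is probably what you want here..."
      }
    ]
  }
}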
Scroll through the results and satisfy yourself that these are, indeed, the correct results. Let’s review the results section.
The responseHeader contains metadata about the search results. You can see the params that you passed in to the search. For the curious, the _ field is the MVCC document version that was current when you started the search. The QTime field is the number of milliseconds that the query took to execute.
The response contains, you guessed it, the full response from the server. This will include the number of documents that were found, the start offset (if you’re paging), the documents that were found (if you asked for them), and any facets that you included in the query.
The Solr query UI will return all fields in the documents that it finds. Solr query tools (such as the HTTP interface or using solrnet) require that you specify a list of fields to be returned.
Narrowing Search Results
Let’s narrow down our search results – our initial search returned around 3,000 results, but that includes user descriptions, comments, and any number of other things tracked in the Stack Overflow schema. We’re really only concerned with posts where replication or SQL Server Notification Services was the answer. Also, we’re really only interested in highly rated answers.
In the query interface, let’s click in the fq field and type post_type:Answer. Now click the + button to add another field and then add stack_score:[5 TO *] (make sure the TO is capitalized or else Solr will throw an error).
The fq field is used as a filtering query. The default Solr behavior is to score documents that are returned. Anything that is included in the q field will influence that score. The more frequently a document’s body contains *cation, the higher the score. If you simply want to filter a document, then use the fq search terms instead. These are just filters, much like a WHERE clause in your favorite RDBMS.
Here’s what that search looks like:
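Expressed as a raw request, the search and its filters might look something like this sketch (same core name and port assumptions as before):

http://localhost:8983/solr/dba.stackexchange.com/select?q=body:*cation&fq=post_type:Answer&fq=stack_score:[5%20TO%20*]&wt=json&indent=on

Breaking down the parameters:

q=body:*cation            # score documents on how well the body matches *cation
fq=post_type:Answer       # only keep answers
fq=stack_score:[5 TO *]   # only keep answers with a score of 5 or more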
What About Facets?
I mentioned facets earlier. What are facets?
Facets are a way to break up search results into multiple categories, making it easier for users to drill down into search results. There’s usually a count of the number of results found in each of the facets. Many websites use faceted navigation to make it easy for users to narrow down their browsing to a particular category or set of features.
Let’s just facet out the scores for starters. Follow this URL http://localhost:8983/solr/dba.stackexchange.com/select?q=*:*&echoParams=none&rows=0&wt=json&indent=on&facet=true&facet.field=stack_score (or paste it into your browser). This will facet out the scores into a list that contains the score and a count of documents with that score. Breaking down the URL we have:
q=*:*                    # match any document
echoParams=none          # don't bother returning parameters
rows=0                   # don't bother returning data, either
indent=on                # make results pretty!
facet=true               # let's facet stuff, yeah!
facet.field=stack_score  # the faceting field
Faceting by one field is only moderately interesting. We can add additional parameters to make the facets more useful. This query lets us see a breakdown of scores by user: http://localhost:8983/solr/dba.stackexchange.com/select?q=body:*&rows=0&wt=json&indent=on&facet=true&facet.pivot=author_name,stack_score. The difference in this faceted query is that we’ve exchanged facet.field for facet.pivot. We’re still seeing facets, but now the faceting is first done on author_name and then the results are further faceted by score. A count is provided for each top level facet (the author_name field) and then a count is provided at each level of the pivot.
You can further enhance faceting by supplying a range, instead of faceting on individual field values. To do this, we add a facet.range set of parameters to the end of the search. In this case, they might look like:
facet.range=stack_score   # create a range based on score
facet.range.start=0       # start the range at 0
facet.range.end=60        # don't create ranges after 60
facet.range.gap=5         # the size of each range is 5
This produces the kind of faceting that makes it possible to show ranges of prices, volumes, or other range based buckets.
If you want to learn more about Solr, head over to http://lucene.apache.org/solr and browse the documentation. There’s a lot of guidance about operations, performance, and querying Solr that will help you get started using Solr quickly.
What was the interview like for the fine folks who went through it?
We’re big believers that one of the most important things for a consultant is their ability to interact with customers. The type of work we do relies on communicating clearly with our customers, and we wanted to make sure that our interview process reflected that.
We also wanted to make sure we asked candidates what they thought about the job without outright saying “Hey, what would you think if we asked you to do this?”
Tell Me About You
When you’re a small company, there’s no escaping your co-workers… or your management. We asked the candidates a few questions to make sure that we were all on the same page.
The questions that we ask are no different than the questions that you’ll get asked on any other kind of job interview – What excites you? What do you dread? Knowing what motivates people is really important. If you dread looking at a new SQL Server every week, you’re not going to enjoy working with us.
There aren’t any right or wrong answers, just answers that tell us more about the candidates. That’s why we also ask fun questions like “When was the last time you caused a production outage?” We’ve all broken things, and sharing funny stories about the ways that we’ve broken production environments is always a good way to break the ice, both with candidates and when we’re working with clients.
What would you say… we do here?
We asked the candidates to tell us about us. Well, that’s not entirely fair. We first asked people why they wanted to work with us. There’s no right or wrong answer, but as an employer it’s helpful to know what motivates people. There are different ways to motivate different types of people; knowing why someone wants to join the team helps us understand how we can motivate an employee and keep them interested in the job. Hey, even the most exciting job can seem like a slog some days.
How would you feel if I told you that your job would include creating an 80-slide presentation every week? What if I added in that the recommendations would be custom for an individual client? How would that make you feel? Not everything we work on is rocket science. Sure, we do our fair share of high-end performance tuning, but a lot of what we do is design ways for our clients to solve their everyday problems. We help clients understand and prioritize their solutions by creating presentations that summarize the key pain points and the solutions to those problems.
If the thought of writing an 80 page presentation every week was something a candidate dreaded, we knew that we weren’t the right fit. Some people are passionate about digging into problems, building a plan, and then implementing the plan themselves. Other people like to teach.
What Didn’t We Ask?
What we didn’t do is ask candidates a lot of minutiae. While it’s important to know technical details about SQL Server, we know that you can look this information up online. With co-workers like Brent, Kendra, and Jes a smart answer is only a quick email away.
There’s nothing secret about our interview process. These are the kinds of questions that potential employees can expect to hear wherever they interview.
Was There A Technical Screen?
Oh yeah, we had a technical screen. After the first round of interviews, we discussed who should move on to the second round. Since we didn’t ask any of the candidates difficult technical questions in the first round of the process, we wanted a way to see how they worked under the usual types of pressure that we deal with.
We set up a pair of demo environments and invited Doug to take a look at them and tell us what was wrong. Once again, we weren’t looking for right or wrong answers; we wanted to see how Doug looked at each instance and get a feel for how he approached problems. In an amusing turn of events, one of our demo environments was accidentally corrupted and couldn’t even boot. We didn’t ask Doug to look at that because, let’s face it, when you find corruption you should just call Microsoft.
Throughout our process, there weren’t any wrong answers. We looked for people who were as excited about technology as we are, who share a passion for learning and growing, and who didn’t laugh when Brent broke his VM during the technical screen.
In the 1950s, the global economy saw a tremendous change – container ships revolutionized global commerce. Shipping costs got 36 times cheaper with the introduction of containerization. What if you could reduce operational costs and revolutionize application and database deployment in the same way?
Containers to the Future
In the last few months, the developer world has been excited about Docker. Docker is a container system. Basically, it’s an easy way to do application deployments. Rather than deploy an entire VM, Docker lets developers deploy a consistent stack – the developers can set up individual services with independent configurations and deploy them.
Sounds like a contained database, right?
The best part about these containers is that, if you do it right, you might not need an OS under the covers. Or, if you do, it doesn’t matter which one! The container holds the configuration it needs to run. It doesn’t matter if you deploy on Linux, Solaris, or (maybe) even Windows – as long as that container knows what to do, you’re good.
With the database, should you really care which edition of SQL Server you’re running on as long as it supports the features you need?
Surely My Storage Is Safe From This Madness!
The storage game is rapidly changing. Not that long ago, traditional storage vendors required that you buy big, expensive, proprietary systems. That part of the industry was disrupted by modular technology like the Dell EqualLogic. As storage has become cheaper, it has moved back into the chassis of specialized servers; you can easily cram 24TB of SSD storage into a server with gear you can buy off the shelf.
Large-scale storage is getting even more accessible: Seagate has announced its Kinetic Cloud Storage. It’s storage that uses an API, rather than an operating system. The drives feature new technology and Ethernet connectivity. This makes it possible for application developers to interact with the drives using an API rather than through a storage controller, cache tier, and storage tiering.
The idea might sound crazy, but third-party libraries already exist that take advantage of this technology. Come launch time, developers will have drive simulators and more libraries at their disposal, and you’ll have a better idea of what this costs. You’ll be able to put 2.4 petabytes into a full rack of hardware. All without buying a complex storage appliance.
How Much Farther Can It Go?
Think about it:
- Applications can be deployed as containers.
- Storage will soon be deployed as containers.
- Databases are on their way to being deployed as containers.
Just as the cargo container had far reaching implications for the shipping industry, the software container has far reaching implications for our industry. The operating system and even the database platform become raw materials that are used to support the applications deployed on them.
What Does It Mean?
What’s it all mean for IT professionals? It means that the times, they are a changin’. If the application can be deployed as a container, there’s likely to be a decrease in managing the complexity of production applications. Once you can treat storage as an API, you change how storage administrators interact with storage. Over the next few years, containerization is going to change the way you manage the IT infrastructure.
Session state – we can all agree that it needs to be as durable as our application database, but does it need to be in the database? It’s common for teams to move from the ASP.NET in process session state provider to a SQL Server session state provider as soon as they move from a single web server to a web farm. Unfortunately, this means that the database server needs to scale with session state – even if session activity is overwhelming application activity.
Session state is inherently ephemeral – your application is storing what a user was doing at a particular moment in time. This is only necessary because the internet, specifically HTTP, is not stateful. If we were to manage state data in the browser, we would have to keep sending that information back and forth between the user and the server.
Rather than increase traffic, developers can choose to keep session data in a local data store. To start with, this is invariably an in process data store that is little more than a glorified cache living inside the web server. Once you’re successful, you need two web servers. At that point, you need somewhere else to store session state. Frequently, the SQL Server ends up as the session state server. This is a Bad Idea™.
Putting session state in SQL Server leads to a lot of problems. Every read to SQL Server refreshes a last updated column (this keeps the session alive while users are only reading data). A large number of small writes can lead to WRITELOG waits in SQL Server. But, beyond that, storing a lot of data in session state can lead to other problems since the session data can grow large enough to move off row (which can also lead to WRITELOG waits). And, to top it all off, on very busy servers, session state activity can lead to waits on reading data out of the table.
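If you suspect session state is already beating up your transaction log, the wait stats DMV will show how much time the server has spent waiting on WRITELOG. A minimal sketch:

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'WRITELOG';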
In short – putting session state in SQL Server is a very bad idea.
What About Hekaton?
What about it? Even though Hekaton stores data in memory, there are a number of features of Hekaton that make it unsuitable for session state when you compare it to other options. (For those who don’t know, Hekaton is the SQL Server in-memory OLTP engine that’s being released as part of SQL Server 2014.)
Hekaton isn’t a good fit because it ties up valuable SQL Server memory space for storing temporary state. Yes, Hekaton is fast. Yes, it avoids the locking problems. But if you want session state to persist between restarts, the log file is still involved and that can become a significant bottleneck. Hekaton tables also don’t support LOB storage mechanisms, so teams wanting to use this feature will have to write a custom session state provider.
When you’re already going down that route, why not use something else?
ASP.NET Session State Providers
Anyone could write their own session state provider – Microsoft documents the API and it’s pretty easy to do. A number of developers have created session state providers for different databases. How would you choose between the many options available?
In the video, I cover different criteria like:
- High Availability
- Fault Tolerance
- Resource Utilization
There are many different reasons why you should choose one session state provider over another, including using the SQL Server session state provider. The most important consideration is matching a set of criteria up with your application requirements.
Changing the Session State Provider
Let’s say you’ve decided to use Redis as a session state provider. How would you go about configuring your ASP.NET application to use it? As it turns out, changing the session state provider is as simple as changing a few lines in the web.config file. By default, your web.config file will contain something like this line:
<sessionState mode="SQLServer" sqlConnectionString="data source=127.0.0.1; user id=<username>;password=<strongpassword>" cookieless="false" timeout="20" />
This just tells ASP.NET which SQL Server to use and provides a set of credentials for login. So far, this is pretty easy, right? How do we change the session state provider?
After you’ve downloaded or installed the right DLL, you can just change the relevant lines in the web.config:

<sessionState mode="Custom" customProvider="RedisSessionStateProvider">
  <providers>
    <clear />
    <add name="RedisSessionStateProvider"
         type="Harbour.RedisSessionStateStore.RedisSessionStateStoreProvider"
         host="my-redis-host.contosofroyo:6379"
         clientType="pooled" />
  </providers>
</sessionState>
That change will configure ASP.NET to immediately start using the RedisSessionStateStore library for managing ASP.NET session state. Make sure you time the change with an outage, because the cutover will be swift and may lose user data.
…and they all lived happily ever after
It is very easy to get session state out of SQL Server – you just need to pick a new provider and set everything up. Different session state providers will have different requirements, so it’s important to think about what is involved in setting up one provider as opposed to another one. Not all session state backends are created equal – evaluate what you need from your session state provider and go from there. Developers, you can make a few simple changes today and earn the love, or at least grudging respect, of your database administrators.
You’ve been nursing your SQL Server through day after day of increasing load. Stored procedures are taking longer and longer to run. There are timeouts on the client side. Both customers and management are getting upset. As you’re bemoaning the terrible code that the developers wrote, it hits you: you don’t have a code problem, you have a scalability problem.
The Types of SQL Server Scalability Problems
It seems obvious to an outsider, but hardware has to be replaced on a regular basis. Between replacements, hardware is like code: it requires attention and maintenance to keep it running smoothly. As a DBA, it’s your job to pay as much attention to the database hardware as you do to the wait stats, slow queries, and missing indexes.
This doesn’t mean you can throw a monitoring package in place and walk away. Understanding how SQL Server and the underlying hardware respond under your application’s workload is important. Once you have a baseline on SQL Server and the hardware, you can easily tell if a problem is bad code or you need to call up your friendly hardware salesperson.
Database design is probably not your bailiwick; you might even say you’re not a database architect, you’re a database administrator. That may be, but you’ve also got the keys to the kingdom of database performance.
Through the DMVs you can tell which indexes have lock escalation problems, which files have slow reads, and even narrow these problems down to specific queries and times of day. Even if you can’t tell Boyce-Codd Normal Form from Backus-Naur Form, you have tools to help you identify problems at a physical level.
You probably want an example – if you add several new queries to the application and suddenly there are a lot more lock escalation attempts on a table, you can safely conclude that at least one of those queries would benefit from a new index.
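One way to spot that is to check the index operational stats DMV for lock escalation attempts. This sketch covers the current database; what counts as "a lot" depends on your workload:

SELECT OBJECT_NAME(ios.object_id) AS table_name,
       i.name AS index_name,
       ios.index_lock_promotion_attempt_count,
       ios.index_lock_promotion_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
    ON i.object_id = ios.object_id
   AND i.index_id = ios.index_id
ORDER BY ios.index_lock_promotion_attempt_count DESC;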
I lied, sometimes the scalability problem is a code problem.
DBAs love to blame those pesky developers for causing problems. There’s some truth in that statement – developers introduce changes into an otherwise stable system. But when developers are adding new features and functionality, they can’t always know which indexes will be the best; after all, you didn’t have the resources to provide them with a full copy of the database, right?
Thankfully, you can track down top resource consuming queries very easily. Once you’ve identified those queries, you can either tune them or work with the developers to educate them about what they can do better the next time. All too often, time is the only issue that prevents things from being done correctly – developers are pressed for time to get a feature out, so they don’t spend as much time tuning code as they’d like. Help them out – find the bad queries and share the knowledge or just tune them yourself.
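As a starting point, here’s one rough sketch of that kind of triage query using sys.dm_exec_query_stats; order by whichever resource hurts the most:

SELECT TOP (20)
       qs.execution_count,
       qs.total_worker_time / qs.execution_count AS avg_cpu_time,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
           ((CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(st.text)
                 ELSE qs.statement_end_offset END
             - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;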
Outside of bad T-SQL, there are a number of patterns that can cause problems for database performance. So bad, in fact, that they’re going in a new section.
If you’ve thought that something didn’t smell quite right in the database, you were probably right. There are a number of ways to use SQL Server that work… for a while. I like to call these scalability anti-patterns. These anti-patterns work well as long as you can keep throwing more physical resources at the problem.
An anti-pattern can be difficult to identify as such – is it just bad code or are you seeing something worse? Over at ACM Queue there’s a guide, 20 Obstacles to Scalability, that lists patterns that will prevent you from moving forward. The most common patterns you’ll find are a lack of caching, serial processing, using the database as a queue, and full-text search.
Fixing anti-patterns will take a long time. These are architectural decisions that are baked into the entire application, not just the database. In order to fix these, you’re going to need to work with the developers, research new tools, and figure out how to implement the features.
Solving the Problems
Easy Mode: Hardware
Let’s be honest: using money to solve your scalability problems isn’t a long-term solution, but it can often be a quick fix. If storage is slow, you can embark on a query tuning project or buy additional memory (or even SSDs). Although you’ll eventually need to tune those queries, you can buy yourself some breathing room by adding new hardware.
Don’t let new hardware lull you into a false sense of security. If you buy hardware to temporarily solve a problem, make sure you also record the performance problems and get time scheduled in the future to put a fix in place.
Medium Difficulty: Tuning
When you finally decide to tune your application, identify your biggest bottlenecks (if you don’t know, we can teach you how) to make sure that you’re tuning in the right place. You can’t tune T-SQL to make the network faster. Once you know what you’re trying to fix, identify the queries that cause the most pain. Is memory a problem? Look for queries with large memory grants and tune them. Think about adding indexes.
Database performance tuning is as much about improving code as it is about making the database do less. Better indexes mean you have to sort less, but so does sorting in the application layer. Maybe you can pre-compute complex queries – even if the aggregations are slightly out of date, there may not be a real-time requirement.
The Hard Stuff: Redesign
When hardware and tuning aren’t enough, it’s time to roll up your sleeves and redesign parts of the application. You’ll still need to identify bottlenecks, but you aren’t going to be tuning code.
Got a queue in the database? Figure out how you can move that queue based processing outside of SQL Server or process the queue in large batches rather than line by line.
Reporting against live data? Find ways to report against readable secondary servers or even report against a time delayed copy of the data in a data warehouse.
There are ways to redesign any feature for scalability. Tools and techniques have been documented in books, articles, blog posts, and conference talks. The question is no longer “How could you possibly do that?” The question is “Which technique are we going to use?”
What Should You Do?
For most of you, the answer is simple: find your bottleneck and decide if you can make a simple fix or if you need to devote more time to the problem. If you need more time, buy a bit of hardware to help you through. Don’t jump on the re-write bandwagon too early, but always keep ideas in the back of your head. And, above all else, make sure you’re solving the right problem with the right solution. Adding more processors won’t help if every query is wrapped in a
We work with SQL Server every day, happily sending and receiving millions of packets of data across the network. Have you stopped to think about what happens when something goes wrong?
It’s a regular Tuesday afternoon. Users are happily submitting orders into the shopping cart. After browsing around the website for a while, the users decide that it’s time to buy something. After clicking “Check out now”, the user goes through the check out process, enters their payment information, and then clicks “Buy some stuff!” A few milliseconds later, they get a message that their order was successful and everything is zipping over to their house.
It’s something that we do every day, right? Heck, it’s simple enough that I drew a little diagram:
What happens when something goes wrong?
There are a lot of places where things could go wrong – a browser bug could prevent form submission, the shopping cart application could go down, the ordering system could be down for maintenance, or SQL Server might even crash (but you’ve configured a solid HA system, so that ain’t gonna happen).
What’s left to fail? The network.
Everything works as it should – the user clicks “Buy some stuff!” and a message is sent to the application servers. The application servers do their magic, approve the credit card, build an order, and save the order in SQL Server. Immediately after SQL Server acknowledges the write and commits the transaction, but before any confirmation is sent to the user, the top-of-rack switch power cycles. Randy, the operations intern, accidentally unplugged the power cord before plugging it back in.
This is the part that’s up to you. Sound off in the comments:
- Did the user’s write succeed or fail?
- What does SQL Server think?
- What does the application think?
- Did SQL Server do the right thing?
- What can you do about this sort of thing?
- How can your application handle failures like this?
Storage vendors brag about the IOPS that their hardware can provide. Cloud providers have offered guaranteed IOPS for a while now. It seems that no matter where we turn, we can’t get away from IOPS.
What Are You Measuring?
When someone says IOPS, what are they referring to? IOPS is an acronym for Input/Output Operations Per Second. It’s a measure of how many physical read/write operations a device can perform in one second.
Is the number a vendor quotes relevant to your workload? The answer, as it turns out, is a resounding “maybe.”
Most storage vendors perform their IOPS measurements using a 4k block size, which is irrelevant for SQL Server workloads; remember that SQL Server reads data 64k at a time (mostly). Are you slowly getting the feeling that the shiny thing you bought is a piece of wood covered in aluminum foil?
Those 50,000 IOPS SSDs are really only going to give you 3,125 64KiB IOPS. And that 7,000 IOPS number that Amazon promised you? That’s in 16KiB IOPS. When you scale those numbers to 64KiB IOPS it works out to 1,750 64KiB IOPS for SQL Server RDS.
Latency vs IOPS
What about latency? Where does that fit in?
Latency is a measure of the duration between issuing a request and receiving a response. If you’ve ever played Counter-Strike, or just run ping, you know about latency. Latency is what we blame when we have unpredictable response times, can’t get to Google, or when I can’t manage to get a headshot because I’m terrible at shooters.
Why does latency matter for disks?
It takes time to spin a disk platter and it takes time to move the read/write head of a disk into position. This introduces latency into rotational hard drives. Rotational HDDs have great sequential read/write numbers, but terrible random read/write numbers for the simple reason that the laws of physics get in the way.
Even SSDs have latency, though. Within an SSD, a controller is responsible for a finite number of chips. Some SSDs have multiple controllers, some have only one. Either way, a controller can only pull data off of the device so fast. As requests queue up, latency can be introduced.
On busy systems, the PCI-Express bus can even become a bottleneck. The PCI-E bus is shared among I/O controllers, network controllers, and other expansion cards. If several of those devices are in use at the same time, it’s possible to see latency just from access to the PCI-E bus.
What could trigger PCI-E bottlenecks? A pair of high-end PCI-E SSDs (think TempDB) can theoretically produce more data than the PCI-E bus can transfer. When you use both PCI-E SSDs and Fibre Channel HBAs, it’s easy to run into situations that can introduce random latency into PCI-E performance.
What About Throughput?
Throughput is often measured as IOPS * operation size in bytes. So when you see that a disk is able to perform X IOPS or Y MB/s, you know what that number means – it’s a measure of capability, but not necessarily timeliness. You could get 4,000 MB/s delivered after a 500 millisecond delay.
Although throughput is a good indication of what you can actually expect from a disk under perfect lab test conditions, it’s still no good for measuring performance.
Amazon’s SQL Server RDS promise of 7,000 IOPS sounds great until you put it into perspective. 7,000 IOPS * 16KiB = 112,000 KiB per second – that’s roughly 100MBps. Or, as you or I might call it, 1 gigabit ethernet.
What Does Good Storage Look Like?
Measuring storage performance is tricky. IOPS and throughput are a measurement of activity, but there’s no measure of timeliness involved. Latency is a measure of timeliness, but it’s devoid of speed.
Combining IOPS, throughput, and latency numbers is a step in the right direction. It lets us combine activity (IOPS), throughput (MB/s), and performance (latency) to examine system performance.
Predictable latency is incredibly important for disk drives. If we have no idea how the disks will perform, we can’t predict application performance or set acceptable SLAs.
In their Systematic Look at EC2 I/O, Scalyr demonstrate that drive latency varies widely in EC2. While these numbers will vary across storage providers, keep in mind that latency is a very real thing and it can cause problems for shared storage and dedicated disks alike.
What Can We Do About IOPS and Latency?
The first step is to make sure we know what the numbers mean. Don’t hesitate to convert the vendor’s numbers into something relevant for your scenario. It’s easy enough to turn 4k IOPS into 64k IOPS or to convert IOPS into MB/s measurements. Once we’ve converted to an understandable metric, we can verify performance using SQLIO and compare the advertised numbers with real world numbers.
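As a quick sanity check, those conversions are just multiplication and division. Here’s a throwaway sketch you can run in any query window; the numbers are the vendor figures quoted earlier in the post:

SELECT 50000 * 4 / 64     AS iops_50k_4KiB_as_64KiB,           -- 3,125
       7000 * 16 / 64     AS iops_7k_16KiB_as_64KiB,           -- 1,750
       7000 * 16 / 1024.0 AS throughput_7k_16KiB_MiB_per_sec;  -- ~109, roughly what gigabit Ethernet delivers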
But to get the most out of our hardware, we need to make sure that we’re following best practices for SQL Server setup. Once we know that SQL Server is set up well, it’s also important to consider adding memory, carefully tuning indexes, and avoiding query anti-patterns.
Even though we can’t make storage faster, we can make storage do less work. In the end, making the storage do less gets the same results as making the storage faster.