It’s coming…the next version of SQL Server. What surprises are in store for you? “In-memory” is a hot buzzword. Windows Azure capabilities are expanding. Data Explorer is being talked about. Jes will give you an idea of some of the new features and enhancements coming next!
Get an overview of new backup capabilities, integration with Windows Azure, new capabilities in Resource Governor, incremental statistics, and In-Memory OLTP (the artist formerly known as Hekaton)!
One question came up several times: which features will be Enterprise Edition only, and which will be available in all editions? As of today, Microsoft hasn’t released that information – but keep an eye out for “Features Supported by the Editions of SQL Server 2014” on msdn.microsoft.com.
It was a dark and stormy… Oh, wrong story. It was actually a warm, sunny afternoon in Charlotte, NC. I was presenting “Index Methods You’re Not Using” at PASS Summit. In this talk, I discussed how indexed views, filtered indexes, and compressed indexes can improve your query performance by reducing I/O.
From stage right, an intrepid audience member raised his hand and asked, “Can you think of an example of when you would use filtered indexes instead of partitioning?”
This question left me speechless (temporarily).
My first thought was one of requirements and practicality: “What if you have Standard Edition? You can’t use table partitioning, but you could create a filtered index,” I said.
The better question would have been, “What are you trying to accomplish?”
On the surface, filtered indexes and partitioning may appear similar: both are designed to hold only portions of a table’s data in separate structures. However, how they go about doing this, the use cases for each, and the requirements for each are very different.
Filtered indexes store only a portion of the data in a table. For example, in an orders table, instead of storing all orders in a nonclustered index on the “quantity” field, we could create a filtered index to only store orders with a quantity greater than or equal to 20, or between 10 and 19. In this case, the only column stored in the index leaf pages is the quantity (and the clustered index key).
Sample of data from the table
What a nonclustered index on Quantity would store
What a nonclustered index on quantity, filtered to include quantities between 10 and 19, would store
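Here’s a minimal sketch of that last index, assuming a hypothetical dbo.Orders table with an OrderID clustered primary key and a Quantity column:

    -- Stores every row's quantity (plus the clustered key) at the leaf level
    CREATE NONCLUSTERED INDEX IX_Orders_Quantity
        ON dbo.Orders (Quantity);

    -- Stores only the rows where the quantity falls between 10 and 19
    CREATE NONCLUSTERED INDEX IX_Orders_Quantity_10_19
        ON dbo.Orders (Quantity)
        WHERE Quantity >= 10 AND Quantity <= 19;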
Let’s say the data in the underlying table changes – someone wants to increase their order quantity from 750 to 1,250. The row is updated at the data level – the clustered index. At the same time, the nonclustered index must also be updated.
Table partitioning breaks the whole of the data into separate physical structures based on a partitioning key. It’s generally used to improve query performance or make administration of large tables easier. Partitioning is based on a scheme and a function. The table contains a partitioning key – the field in the table that contains the value you’ll split your partitions on. Most commonly, this is a date field of one sort or another – tables are partitioned by day, month, or year.
To directly compare this to a filtered index: could you use an order quantity as a partitioning key? Yes, as long as it’s a valid data type. The difference is that the partition is going to store all columns from the table – not just the key. When you insert or update a row, SQL Server would have to look at that value to determine which partition to place it on. It would then perform the update on the appropriate partition.
A table, partitioned on quantity – 1-10, 11-20, 21-30, etc
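Here’s a minimal sketch of the function-and-scheme plumbing behind that caption, assuming a hypothetical dbo.Orders table:

    -- RANGE RIGHT: each boundary value starts a new partition (1-10, 11-20, 21-30, ...)
    CREATE PARTITION FUNCTION pf_Quantity (INT)
        AS RANGE RIGHT FOR VALUES (11, 21, 31);

    -- Map every partition to a filegroup (all on PRIMARY here, for simplicity)
    CREATE PARTITION SCHEME ps_Quantity
        AS PARTITION pf_Quantity ALL TO ([PRIMARY]);

    -- The table is created on the scheme, partitioned by the Quantity column
    CREATE TABLE dbo.Orders
    (
        OrderID  INT NOT NULL,
        Quantity INT NOT NULL
    ) ON ps_Quantity (Quantity);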
Want to know more about partitioning? Start here.
Comparing How They’re Used
When it comes to reading the data, if you have a filtered index for a specific value, and the query optimizer can use that index, you can often reduce I/O by orders of magnitude because you are storing less of the data in the index itself. With partitioning, SQL Server has to determine which partition the data is stored on, then access it. This can be helpful on very, very large tables – but the care and time taken to implement it and the upkeep required mean it must be very carefully considered and maintained.
Which Should I Choose?
There are many features in SQL Server that may appear similar, but once you look at how they work and what they do, they are very different. Database mirroring is not the same as replication. Change Data Capture is not the same as Change Tracking. Filtered indexes and partitioning solve two very different problems. When considering implementing any new feature, stop to ask, “What is the problem I am trying to solve?”
You’ve heard the term ‘parameter sniffing’, but you’re always a little bit fuzzy on the details. Is this a bug in SQL Server? Could it be happening to you? Join Kendra in this free 30 minute video to learn the truth about this complex and often confusing feature in SQL Server.
Download demo scripts from the webcast here – the script contains commands that are only safe for test environments (not production): Parameter Sniffing Demo.sql.
Views are logical objects in SQL Server databases that present you with a “virtual table”. Views are typically created for one of three reasons: security, simplification, or aggregation.
- Security: we create views so that a user can read specific columns out of certain tables, but not all the data.
- Simplification: sometimes, it’s easier to write a SELECT against one view than a SELECT against multiple tables, with joins – repeatedly.
- Aggregation: again, it can be easier to SELECT already-aggregated data than to write the query – repeatedly.
The down side to views is that when you query them, you’re still reading data from all of the underlying tables. This can result in large amounts of I/O, depending on the number and size of tables. However, you can create a unique clustered index on the view – referred to as an indexed view – to persist the data on disk. This index can then be used for reads, reducing the amount of I/O.
There are some limitations to indexed views, but when you can create one and it improves performance, it really improves performance. But, as with all features in SQL Server, indexed views aren’t the answer to everything. Here’s a quick look at some things you can and can’t do with them.
You can:

- Reference one or more tables in the same database in the view definition.
- Create additional nonclustered indexes against the view once the unique clustered index is created.
- Update the data in the underlying tables – including inserts, updates, deletes, and even truncates.

You can’t:

- Reference other views, or tables in other databases, in the view definition.
- Use COUNT, MIN, MAX, TOP, outer joins, or a few other keywords or elements. (COUNT_BIG is allowed – in fact, a grouped indexed view requires it.)
- Modify the underlying tables and columns. The view is created with the WITH SCHEMABINDING option.
- Always predict what the query optimizer will do. If you’re using Enterprise Edition, it will automatically consider the unique clustered index as an option for a query – but if it finds a “better” index, that will be used. You can force the optimizer to use the index through the WITH (NOEXPAND) hint – but be cautious when using any hint.
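Here’s a minimal sketch of what creating one looks like, assuming a hypothetical dbo.OrderLines table with a non-nullable Quantity column (indexed views can’t SUM nullable columns):

    -- The view must be schema-bound (and use two-part names) before it can be indexed
    CREATE VIEW dbo.v_ProductTotals
    WITH SCHEMABINDING
    AS
    SELECT  ProductID,
            SUM(Quantity) AS TotalQuantity,
            COUNT_BIG(*)  AS RowCnt  -- required in any grouped indexed view
    FROM    dbo.OrderLines
    GROUP BY ProductID;
    GO

    -- The unique clustered index persists the aggregated rows to disk
    CREATE UNIQUE CLUSTERED INDEX IX_v_ProductTotals
        ON dbo.v_ProductTotals (ProductID);
    GO

On Standard Edition (where the optimizer won’t consider the view automatically), you can point queries at the persisted rows with SELECT ProductID, TotalQuantity FROM dbo.v_ProductTotals WITH (NOEXPAND).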
Choose the Right Tool
If you’re looking to have a complicated, aggregated query persisted to disk to reduce I/O, an indexed view may be the right tool for your job. Test it as an option, and if it works, put it to use!
Learn more about indexed views in my video The Okapis of SQL Server Indexes!
Back in 2009 (wow, seems like only yesterday!), I wrote about designing a recovery strategy for Stack Overflow. Back then, I wrote:
With these answers in mind, Stack Overflow’s decisions not to do transaction log backups, offsite log shipping, database mirroring, and so on make good business sense. Us geeks in the crowd may not like it, and we might demand the latest and greatest in backup & recovery technology, but at the same time we want Stack Overflow to remain free. As their volunteer DBA, I’d love to do 24×7 log shipping or database mirroring to a secondary server at another colo facility – but I wouldn’t be willing to pay out of my own pocket for expenses like that.
Today, the situation is totally different. They’re in the top 50 web networks, with 4.4 million users, a job posting network, and dozens of crazy smart employees – and I’m not even close to a volunteer DBA for them anymore. (Heck, we’ve even paid to advertise on Stack Exchange.) I hop into the company chat rooms now and then, and I swing by the offices whenever I’m in New York, but these guys don’t need me. I jump in whenever I can for fun, because it really is fun working with engineers this sharp.
That means this time, I’m blogging about designing Stack’s recovery strategy more as an outsider’s perspective. I know you folks like reading real-life case studies, and Stack’s pretty open about their infrastructure, so this will be fun for all.
What Changed? Downtime became more expensive.
If you’ve looked at the Stack Exchange team page, you’ll notice dozens of people with the word “sales” in their job title. Stack now sells ads on the Q&A sites, plus sells job postings to companies on Careers.StackOverflow.com.
There’s real money going through the network now, and downtime starts to cost more money. If the sites are down, people may go back to Google, get their answers from another site <shudder>, and there goes some ad revenue.
This meant that for at least a few databases – ads and careers – we needed to switch to the full recovery model in SQL Server and start doing transaction log backups. This didn’t start across the board – it started only with the most high-value databases.
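The change itself is small – a minimal sketch, using a hypothetical Ads database and backup path:

    -- Switch to the full recovery model so the log can be backed up
    ALTER DATABASE Ads SET RECOVERY FULL;

    -- A full backup starts the log chain...
    BACKUP DATABASE Ads TO DISK = N'D:\Backup\Ads.bak';

    -- ...and then log backups can run on a tight schedule
    BACKUP LOG Ads TO DISK = N'D:\Backup\Ads_log.trn';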
As the company grew, the relative cost of standby SQL Servers in a different location started to drop. Downtime seemed more expensive, and interestingly, the actual price of the standby SQL Servers started to drop. As Stack Exchange added more systems administrators, it wasn’t really much extra work for these guys to manage a few extra database servers in other locations. And as long as we’ve got extra database servers somewhere else, kept up to date with the most recent data, we might as well put ‘em to use.
What Else Changed? We Scaled Out.
Stack Exchange’s API lets the public run queries against the databases in real time (and starting with API v2.1, they can even write). For a demo of how it works (but not using the live database), try out Data.StackExchange.com. For example, here’s the most recent 10 questions from Stack Overflow:
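Something along these lines – a sketch against the public Data Explorer schema, where PostTypeId = 1 marks a question:

    -- Ten most recently asked questions
    SELECT TOP 10
           Id,
           Title,
           CreationDate
    FROM   Posts
    WHERE  PostTypeId = 1
    ORDER BY CreationDate DESC;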
Yes, Virginia, that’s really an Execution Plan tab at the bottom with the actual plan from your query:
Like many of Stack Exchange’s tools, Data Explorer is completely open source, so you can install this same tool in your own environment if you’d like to let internal power users query your databases without having to install SQL Server Management Studio or a reporting tool.
Enter SQL Server 2012’s AlwaysOn Availability Groups
SQL Server 2012’s AlwaysOn Availability Groups allow multiple replicas to serve two purposes at Stack Exchange: easier failover to remote data centers with minimal data loss, and read-only capabilities out of those remote data centers.
I’m a huge fan of AlwaysOn AGs, but they’re like the opposite of “the easy button.” Sure, you can offload read-only queries, backups, and DBCCs to secondary replicas, but you also have to introduce a lot of new dependencies like clustering, Windows Active Directory, heartbeats, and quorums. After Stack and a few of my other clients went live with the early 2012 versions, I started jokingly calling it OftenOn – the high availability functionality ended up being a source of a lot of downtime.
Stack worked with Microsoft for months to troubleshoot sporadic outages, resulting in this Windows 2008R2 hotfix and some other fun discoveries. After a lot of challenges, Stack (and most of my other AG clients) ended up moving to Windows Server 2012 for their SQL clusters because so many of the clustering problems were fixed with rewritten clustering code.
The big gotcha, though, was that if any replica loses its network connectivity to any of the other replicas, all of the involved databases will drop offline. This is not a bug – this is the desired result.
Well, this was Microsoft’s desired result. It sure wasn’t anybody else’s, ha ha ho ho, and thankfully those nice folks at Microsoft decided the behavior would be different in SQL Server 2014. Now, if a node loses connectivity, its AG-involved databases continue to stay online.
Stack Exchange’s Upgrade to SQL Server 2014
As Nick Craver writes in his post about running SQL Server 2014 in production at Stack, this advantage was absolutely killer for Stack’s uptime goals. The Stack developers have done a killer job at coding the sites – if they detect that the SQL Server databases are in read-only mode, all of the involved sites fail into a read-only mode too, along with a nice polite banner informing the reader that they can’t make any changes right now.
The Stack engineers, God bless ‘em, are so dedicated to making IT better that they’re not only willing to document their infrastructure and build open source tools to help others, but even willing to put their production web site in the hands of preview code. So earlier in November, Nick Craver and Steven Murawski did a rolling upgrade of their production clusters to SQL Server 2014.
Each cluster involves 3 nodes. Take the StackOverflow cluster:
- NY-SQL01 – the primary replica handling all incoming write connections. Sits in Manhattan.
- NY-SQL02 – the asynchronous secondary replica sitting next to it in the same cage. NY-SQL01 copies transaction log data off to this server gradually in the background. In the event of a localized failure on NY-SQL01, the admins can manually fail over to NY-SQL02 with some data loss.
- OR-SQL01 – the asynchronous secondary replica sitting in Oregon. NY-SQL01 also copies data here in the background, but due to the vagaries of wide area networks, it can be farther behind than NY-SQL02. To leverage extra hardware in Oregon, Data Explorer can be hosted on Oregon’s web servers using this read-only capacity.
All three were running SQL Server 2012 on Windows Server 2012. Unfortunately, with Windows clustering, we can’t upgrade any of the nodes in-place to a new version of Windows (2012 R2), so if we wanted to deploy that, we’d have to tear down some of the existing nodes temporarily. That was a lot of work for relatively little gain. There wasn’t a need for new hardware, either – the database servers typically run under 10% CPU, and they can cache everything they need in memory.
Since we could keep the same OS, the SQL Server 2014 upgrade process looked like this:
- Upgrade one of the readable replicas (in our case, NY-SQL02) to SQL 2014. From this point forward, replication to the other nodes stops. They can’t apply data from a newer SQL Server version, so they’ll just kinda freeze in limbo. We could still fail back to those, but we would lose all data changes made from this point forward.
- Test the apps in read-only against the newly upgraded replica.
- Test the apps in writeable mode against the newly upgraded replica.
- Make a go/no-go decision. If no-go, fail back to the original replica (NY-SQL01) and send the upgraded replica to detention. If go, start upgrading the other readable replicas. As they come online with SQL Server 2014, the AlwaysOn AG replication will catch them back up.
- Optionally, fail back to NY-SQL01 as a primary.
- Monitor the servers and the plan cache looking for queries getting unexpectedly poor execution plans. (Remnant of our SQL 2008 upgrade, and subsequent full text search problems.)
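For that last step, the kind of check involved is a standard DMV query – a sketch, not Stack’s actual tooling:

    -- Top 20 statements by total CPU since their plans were cached
    SELECT TOP 20
           qs.execution_count,
           qs.total_worker_time / qs.execution_count AS avg_cpu_microseconds,
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                     ((CASE qs.statement_end_offset
                           WHEN -1 THEN DATALENGTH(st.text)
                           ELSE qs.statement_end_offset
                       END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM   sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;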
The upgrade went really smoothly, barring one technical issue involving a combination of the installer and Windows Core. The installer assumes that a particular feature will be installed (whether you need it or not), and that feature requires the full Windows GUI, so it fails on Windows Core. The “fix” was to tell the installer to skip feature compatibility checks.
I was online purely for comic relief. I think I made the sum total of one technical contribution during the entire call, and I left shortly after the go/no-go decision. Nick and Steven are top notch admins. Having said that, I wouldn’t recommend this DO-IT-LIVE! upgrade approach for most companies.
Should You Follow Suit with SQL 2014?
The Stack Exchange team is very in tune with what’s happening in their SQL Server environment. They know exactly which queries are the top 20 resource users, what those execution plans look like, and when a new query pops up to the top of the list. If SQL Server suddenly produces different query plans for a frequent query, these guys know how to work around it. They have great relationships with the dev teams, instant access to source code, and easy, automatic deployments to production to fix query problems.
I’m thankful that there’s cutting-edge shops out there like Stack Exchange that dedicate so much talent to managing their database stack and give so much back to the community (and to Microsoft). They’re helping make sure SQL 2014 is a solid product when it eventually hits RTM, and hopefully they’ll iron out any surprise bugs before folks like you deploy it in production.
If you believe your shop has what it takes to run SQL 2014 in production, you can join Microsoft’s early-access programs to get access to more frequent builds, extra support contacts, and community experts to help with the deployments. Contact a Microsoft MCM or MVP to get started.
I’ve been told that an attendee at the PASS Summit pre-conference event asked about using dynamic SQL to get around some of the problems with filtered indexes. I’m not entirely sure what the question was, but it did give me the opportunity to play around with filtered indexes and write some simple demo code to illustrate just the kind of shenanigans that you can get up to.
Creating a Filtered Index
The first step for our demo is to create a filtered index. Grab yourself a copy of Adventure Works and get ready for some fun. Here’s our filtered index:
CREATE INDEX jp_filters_not_offline
    ON Sales.SalesOrderHeader (OrderDate)
    INCLUDE (SalesOrderID)
    WHERE OnlineOrderFlag = 1;
GO
That creates our initial filtered index. We want to search by OrderDate and find only those orders that were placed online. This is a realistic index to create in the AdventureWorks database, too – there are 27,659 online sales in my copy of AdventureWorks, but only 3,806 offline sales.
Want to prove that the index works? Run this SQL and take a look at the execution plan:
SELECT  SalesOrderID
FROM    Sales.SalesOrderHeader AS soh
WHERE   OnlineOrderFlag = 1;
There’s No Problem Here!
Let’s create a pair of stored procedures:
CREATE PROCEDURE jp_online_only_date
    ( @OrderDate DATETIME )
AS
BEGIN
    SELECT  SalesOrderID
    FROM    Sales.SalesOrderHeader AS soh
    WHERE   OnlineOrderFlag = 1
            AND OrderDate > @OrderDate;
END
GO

CREATE PROCEDURE jp_online_only_date_2
    ( @OrderDate DATETIME )
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'';

    SET @sql += 'SELECT SalesOrderID
                 FROM Sales.SalesOrderHeader AS soh
                 WHERE OnlineOrderFlag = 1
                 AND OrderDate > ''' + CAST(@OrderDate AS NVARCHAR(MAX)) + ''';';

    EXEC sp_executesql @sql;
END
If you were to run both of these, you’d notice that they both make use of the filtered index. What gives?
Both of these stored procedures are able to take advantage of the filtered index because the filter condition (OnlineOrderFlag = 1) is matched by a predicate in the query. In other words – SQL Server can easily make this match.
The Problem with Parameters
There’s a problem, though. When we switch over to using a parameter, SQL Server isn’t able to make effective use of the filtered index. Check it out:
CREATE PROCEDURE jp_online_only_params
    ( @OrderDate DATETIME, @OnlineFlag BIT )
AS
BEGIN
    SELECT  SalesOrderID
    FROM    Sales.SalesOrderHeader AS soh
    WHERE   OnlineOrderFlag = @OnlineFlag
            AND OrderDate > @OrderDate;
END
GO

EXEC jp_online_only_params '2007-12-12', 0;
Oh no! This procedure doesn’t use our filtered index any more. There’s got to be a way to trick SQL Server into using our filtered index. We could resort to an index hint, but that would mean we have to keep using the same index or keep re-using the name of that index. I’m not a big fan of that, so let’s think of something else.
What if we were to use string concatenation? That sounds like magic – it has a lot of syllables.
Hooray for Strings!
We can solve the problem by forcing SQL Server to see the literal value that we’re passing in:
CREATE PROCEDURE jp_online_only_strings
    ( @OrderDate DATETIME, @OnlineFlag BIT )
AS
BEGIN
    DECLARE @sql AS NVARCHAR(MAX) = N'';

    SET @sql += 'SELECT SalesOrderID
                 FROM Sales.SalesOrderHeader AS soh
                 WHERE OnlineOrderFlag = ' + CAST(@OnlineFlag AS NVARCHAR(1));
    SET @sql += ' AND OrderDate > ''' + CAST(@OrderDate AS NVARCHAR(MAX)) + ''';';

    PRINT @sql;

    EXEC sp_executesql @sql;
END
If you were to run that stored procedure, you’d get effective use of the filtered index. SQL Server sees the OnlineOrderFlag predicate as a literal value, matches that up to the index definition, and we’re off to the races.
There is one problem, though – because we’re also using a literal value for OrderDate, there could be significant bloat in the plan cache. SQL Server will create one execution plan for every possible combination of parameters that we pass in. In AdventureWorks there are 1,124 distinct values in the OrderDate column. That’s a lot of execution plans.
Avoiding Plan Cache Bloat
We want to avoid bloat in the execution plan cache. In order to do that, we’re going to need to parameterize part of the query but still leave the filter intact. That’s really easy to do:
CREATE PROCEDURE jp_online_mixed
    ( @OrderDate DATETIME, @OnlineFlag BIT )
AS
BEGIN
    DECLARE @sql AS NVARCHAR(MAX) = N'';

    SET @sql += 'SELECT SalesOrderID
                 FROM Sales.SalesOrderHeader AS soh
                 WHERE OnlineOrderFlag = ' + CAST(@OnlineFlag AS NVARCHAR(1));
    SET @sql += ' AND OrderDate > @order_date;';

    PRINT @sql;

    EXEC sp_executesql @sql, N'@order_date DATETIME', @OrderDate;
END
By adding the @OnlineFlag as just another string, but still parameterizing the @order_date, we end up with only two execution plans. One plan will be created for each value of @OnlineFlag, but we’ll still get use of the filtered index. This is all but impossible to accomplish with forced parameterization and difficult to accomplish with regular parameterization (usually you have to resort to index hints).
Summing It Up
Dynamic SQL and filtered indexes are a great match. By carefully making use of dynamic SQL, you can coerce SQL Server into behaving well and providing you with an execution plan that uses the filtered index. Through fully parameterized dynamic SQL you can get good plan re-use and avoid SQL injection attacks. By making careful use of dynamic SQL and filtered indexes, you should be able to get performance out of SQL Server without having to resort to tricks like index hints or plan guides.
When you write a query, you (usually) don’t want to return all the rows from the table(s) involved – so you add a WHERE clause to the statement. This ensures that fewer rows are returned to the client – but doesn’t reduce the amount of I/O done to get the results.
When you create a nonclustered index, you can add a WHERE clause to reduce the number of rows that are stored at the leaf level – a filtered index. By having fewer rows in an index, less I/O is done when that index is used.
Filtered indexes are great performance boosters if you have a value that is used in a predicate very frequently, but that value appears in only a small portion of the table’s rows. (Say that ten times, fast.) For example: I have an orders table that contains a Status column. Valid statuses are Open, Processing, Packing, Shipping, Invoicing, Disputed, and Closed. When the business first starts using this table, there’s a good chance there is a fairly even distribution of orders across these statuses. However, over time, a majority of orders will be in Closed status – but the business wants to query for Open or Disputed, which are only a small percentage.
This is where a filtered index can come in.
When you add a WHERE clause to the index creation statement, you’re limiting which rows are stored in the index. The index is smaller in size and more targeted. It can be trial and error to get the query optimizer to use the filtered index, but when you do, gains can be significant. In one case study I have, logical reads dropped from 88 to 4 – a 95% improvement! What are some of the things you can – and can’t – do with them?
You can:

- Use equality or inequality operators, such as =, >=, <, and more in the WHERE clause.
- Use IN to create an index for a set of values.
- Create multiple filtered indexes on one column. In my order status example, I could have an index WHERE Status = ‘Open’, and I could have another index WHERE Status = ‘Shipping’.
- Have multiple criteria in the same WHERE clause. In my order status example, I could cover two statuses with one index WHERE Status IN (‘Open’, ‘Shipping’).
- Create a filtered index for all NOT NULL values – or all NULL values.

You can’t:

- Create filtered indexes in SQL Server 2005 – the feature was introduced in SQL Server 2008.
- Use certain expressions, such as BETWEEN, NOT IN, or a CASE statement.
- Use date functions such as DATEADD for a rolling date range – the value in the WHERE clause must be exact.
- Count on the query optimizer to use filtered indexes if you’re using local variables or parameterized SQL. This is one of the hardest limitations to work around – so I usually reference Tim Chapman’s The Pains of Filtered Indexes for help with this!
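Back to the order status example – a minimal sketch, assuming a hypothetical dbo.Orders table:

    -- Only the small population of open orders lives at the leaf level
    CREATE NONCLUSTERED INDEX IX_Orders_Open
        ON dbo.Orders (OrderDate)
        INCLUDE (OrderID)
        WHERE Status = 'Open';

    -- A second filtered index on the same column, for disputed orders
    CREATE NONCLUSTERED INDEX IX_Orders_Disputed
        ON dbo.Orders (OrderDate)
        INCLUDE (OrderID)
        WHERE Status = 'Disputed';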
Filter All The Things!
Using either a WHERE clause in a query to limit the rows returned, or a WHERE clause in a filtered index to reduce the rows stored, is beneficial to performance. You want to reduce the amount of data read and the amount of data returned to clients to improve performance. Filtered indexes can help!
What Are We Doing?
The goal of this presentation, and code sample, is to provide users a starting point for learning Solr.
This github repository contains the source code of a program that will load the data into a searchable format. As of this initial release, this will only work on the smaller Stack Exchange sites (dba, etc). Multi-file sites will fail, but this can be easily changed.
This repository also provides a set of starter queries to demonstrate some of the search functionality of Solr.
What Are We Not Doing?
This is not a detailed discussion of Solr, Solr internals, installation processes, or search in general. I’m certain that somebody has covered that far better than I.
Getting the Data Dump
I used the September 2013 Stack Exchange public data dump.
To get yourself a copy:
- Find a BitTorrent client.
- Hit up the download link.
- Wait for a while.
- Unzip the files.
- Load them using the SolrLoader.
You’ll need to configure a separate Solr core (or just set up a completely new Solr). I did this by copying the Solr installation to a new folder and then renaming my Solr core to dba.stackexchange.com. To save you some hassle, here’s what you need to do:

- Create a copy of Solr. You can do this by taking the Solr example (in the example folder) and copying it into a new folder named dba.stackexchange.com.
- Replace core.properties and solrconfig.xml with their respective files from this repository.
- Find the data/index folder. Delete everything in there.
- Decompress the Stack Overflow data dump of your choosing.
- Modify the path value to point to the location of the Stack Overflow data dump you extracted in the previous step. (I have no idea if this is going to work if you have spaces in the path… so… don’t have spaces in your path.)
Using the Query UI
In the Solr administrative web page, select the core we just created and then click on Query in the sidebar. This will take you to the Solr query GUI where you can set up sample queries until you’re blue in the face.
Your First Query
There are a lot of boxes in the query interface. We’re going to focus on the q box to start with. Delete *:* and drop body:*brentozar.com* in there.
This will search for any document indexed in Solr where the body field contains the text string brentozar.com in any position. The * characters are, as you’d expect, wildcards. You can put them anywhere in the text you’re searching for and Solr will do its best to answer your question. Be careful in your use of wildcards since they will make searches more expensive.
A More Complex Search
Let’s look for something a little more realistic – we’ll search for answers that have something to do with SQL Server replication or SQL Server Notification Services. We’ll pretend that we have a problem with both (which wouldn’t be surprising if we were using both).
In the q field, enter body:*cation. The results should look something like this:
Scroll through the results and satisfy yourself that these are, indeed, the correct results. Let’s review the results section.
The responseHeader contains metadata about the search results. You can see the params that you passed in to the search. For the curious, the _ field is the MVCC document version that was current when you started the search. The QTime field is the number of milliseconds that the query took to execute.
The response contains, you guessed it, the full response from the server. This will include the number of documents that were found, the start offset (if you’re paging), the documents that were found (if you asked for them), and any facets that you included in the query.
The Solr query UI will return all fields in the documents that it finds. Other Solr query tools (such as the HTTP interface or solrnet) require that you specify a list of fields to be returned.
Narrowing Search Results
Let’s narrow down our search results – our initial search returned around 3,000 results, but that includes user descriptions, comments, and any number of other things tracked in the Stack Overflow schema. We’re really only concerned with the answers where replication or SQL Server Notification Services was the answer. Also, we are really only interested in highly rated answers.
In the query interface, let’s click in the fq field and type post_type:Answer. Now click the + button to add another field and then add stack_score:[5 TO *] (make sure the TO is capitalized or else Solr will throw an error).
The fq field is used as a filtering query. The default Solr behavior is to score documents that are returned. Anything that is included in the q field will influence that score. The more frequently a document’s body contains *cation, the higher the score. If you simply want to filter a document, then use the fq search terms instead. These are just filters, much like a WHERE clause in your favorite RDBMS.
Here’s what that search looks like:
What About Facets?
I mentioned facets earlier. What are facets?
Facets are a way to break up search results into multiple categories, making it easier for users to drill down into search results. There’s usually a count of the number of results found in each of the facets. Many websites use faceted navigation to make it easy for users to narrow down their browsing to a particular category or set of features.
Let’s just facet out the scores for starters. Follow this URL (or paste it into your browser): http://localhost:8983/solr/dba.stackexchange.com/select?q=*:*&echoParams=none&rows=0&wt=json&indent=on&facet=true&facet.field=stack_score. This will facet out the scores into a list that contains the score and a count of documents with that score. Breaking down the URL we have:
q=*:*                    # match any document
echoParams=none          # don't bother returning parameters
rows=0                   # don't bother returning data, either
indent=on                # make results pretty!
facet=true               # let's facet stuff, yeah!
facet.field=stack_score  # the faceting field
Faceting by one field is only moderately interesting. We can add additional parameters to make the facets more useful. This query lets us see a breakdown of scores by user: http://localhost:8983/solr/dba.stackexchange.com/select?q=body:*&rows=0&wt=json&indent=on&facet=true&facet.pivot=author_name,stack_score. The difference in this faceted query is that we’ve exchanged facet.field for facet.pivot. We’re still seeing facets, but now the faceting is first done on author_name and then the results are further faceted by score. A count is provided for each top level facet (the author_name field) and then a count is provided at each level of the pivot.
You can further enhance faceting by supplying a range, instead of faceting on individual field values. To do this, we add a facet.range set of parameters to the end of the search. In this case, they might look like:
facet.range=stack_score  # Create a range based on score
facet.range.start=0      # Start the range at 0
facet.range.end=60       # Don't create ranges after 60
facet.range.gap=5        # the size of each range is 5
This produces the kind of faceting that makes it possible to show ranges of prices, volumes, or other range-based buckets.
If you want to learn more about Solr, head over to http://lucene.apache.org/solr and browse the documentation. There’s a lot of guidance about operations, performance, and querying Solr that will help you get started using Solr quickly.
Let’s get this out of the way: I’m every bit as surprised as you are that I’m going to work for Brent Ozar Unlimited. Two months ago, I would not have believed it.
I saw the blog post, just as you did. And I dismissed it. I believed that because it wasn’t a BI job, I wasn’t the right person for it. I told my wife, “That’s a great job, but not for me.” Then a funny thing happened. She said, “If it’s so great, go ahead and apply for it.”
At that moment, my thinking changed from “Why bother?” to…
I went back and re-read the blog post. I read over exactly what Kendra was asking for. I noticed there was nothing – nothing – about being a DBA. I read last year’s blog post for Employee #1 and saw it was markedly different. Employee #1 had to be a DBA; Employee #2 could be anyone.
I contemplated what working for them would be like. I’d travel infrequently and work from home. I’d get to do training work. I’d get to make videos (right in my wheelhouse; I went to college to be a screenwriter). I’d learn a metric ton about SQL Server from incredibly sharp and experienced people. (In a sense, I also was picking all my co-workers — four people I enjoy spending time with.) In short, they’d pay me to do what I already love doing and I’m doing for free. As if I needed further proof, I found a note I’d written to myself over a year ago that listed my dream job attributes.
One by one, I went down the list. One by one, they matched.
Time to get to work.
LIGHTS! CAMERA! ACTION!
It was the perfect scenario. I love making videos and here I was applying to a company that (among other things) makes videos. Naturally, the only way I could possibly express interest in this job was a video. I got my hands on a trial version of Adobe Premiere, searched my mental movie library for funny clips to use, and 18 hours later I sent in the finished video. In a bit of subliminal sleight-of-hand, I titled it, “‘So You’re Hiring’ Doug Lane”.
Once I’d sent that in, I got to work positioning myself for the best possible shot at my dream job. I wanted to know what their customer experience was like, so I bought and completed Kendra’s $29 DBA Job Interview Question and Answer Kit (I know! Too perfect!). I watched sample videos from Brent, Jeremiah, and Jes. I read the supplemental PDFs that go with the videos. I downloaded and ran sp_Blitz on a couple of my own VMs. I wanted to be able to offer feedback on as many products as I could. I wanted them to know that I cared about their business.
WE’LL TALK ABOUT COFFEE, DOGS, DAUGHTERS…NO BIG WHOOP.
I’d never interviewed with people I knew ahead of time, let alone people I considered friends. I wasn’t quite sure how to behave. I was trying to toe the line between casual banter and serious discussion. I figured I’d let them set the tone for what was appropriate. Fortunately, Brent, Kendra, and Jeremiah all put me at ease, and I simply stopped worrying about it after a while.
I thought I only really botched one question: “When was the last time you caused a production outage?” Now, I’ve brought down a production box through pure idiocy more times than I can count, but I’ve never brought my entire company to its knees. That’s how I interpreted the question and thus answered, “I don’t think I’ve ever done that.” YEAH, RIGHT. I did go back and clarify that later on, but still felt like a doofus on that one.
After the interview, I sent them a quick thank-you e-mail restating what I thought they were looking for and how I was a good fit, as well as a couple of changes I’d suggested for sp_Blitz.
TALK AMONGST YOURSELVES…I’LL GIVE YOU A TOPIC.
The tech interview was more challenging. In the first two minutes of poking around Brent’s VM, I had somehow destroyed it. I couldn’t right-click in Kendra’s VM so I had to ask her to do it. Every time. It didn’t help that the lighting in my house was terrible; I looked like one of those “Meth: NOT EVEN ONCE” poster guys on my webcam and it bugged the hell out of me.
I was slow to recognize certain symptoms, but ultimately found the problem. I told myself the whole interview was just okay, and didn’t feel too good about my chances. When Kendra asked, “How do you think you did?” my stomach filled with dread.
An hour later, I got another meeting request from the group. I couldn’t tell if it was to immediately disqualify me or to make an offer. I couldn’t resist screaming with joy when I found out.
And here we are.
WHAT ABOUT BI?
It seems odd to make such a sharp career turn from BI to the database engine. In truth, this has been a long time coming. My favorite SQL Server BI product, Reporting Services, hasn’t had a major update since 2008. More and more BI functionality is ending up in Excel or SharePoint: two places I’m not all that interested in following. Plus, the DAX/Tabular/Power Everything revolution meant I was going to have to start over learning a new BI language, new model, and new tools. My heart just isn’t in that — not as much as I’d need it to be to continue down that path.
At the same time, I’m looking forward to contributing what BI knowledge I have that’s still valid and useful (one benefit of SSRS’s slow pace) to Brent Ozar Unlimited. I expect Jes and I will put our heads together for some SSRS fun in the days to come.
Last year, I hit a wall…hard. I was overworked at work. I wasn’t doing anything fun or interesting with SQL Server. I plateaued as a speaker, doing my same stock SSRS talk for every presentation. I wasn’t blogging much. I got very sick right before the PASS Summit and had a miserable time there. 2012 was awful. I couldn’t wait for 2013 to arrive.
This year has been a different story. I got three new presentations off the ground, including a murder mystery session I’ve had percolating for two years. I presented to over 300 people on Reporting Services at the PASS Summit last month. And of course, I’ve somehow talked the very nice people sketched all over this site into hiring me. Even just two months ago, I never would have thought that was possible.
I was listening to Billy Joel (I love Billy Joel) at my desk a couple of weeks ago and in one of those transcendent moments where life compels you to pay attention, I stopped working and just listened to what I was hearing:
I survived all those long lonely days
When it seemed I did not have a friend
‘Cause all I needed was a little faith
So I could catch my breath and face the world again
Don’t forget your second wind
Sooner or later you’ll feel that momentum kick in
I leaned back in my chair and fought back tears, unsuccessfully. My second wind had come.
I can’t wait to see where it carries me.
What was the interview like for the fine folks who went through it?
We’re big believers that one of the most important things for a consultant is their ability to interact with customers. The type of work we do relies on communicating clearly with our customers, and we wanted to make sure that our interview process reflected that.
We also wanted to make sure we asked candidates what they thought about the job without outright saying “Hey, what would you think if we asked you to do this?”
Tell Me About You
When you’re a small company, there’s no escaping your co-workers… or your management. We asked the candidates a few questions to make sure that we were all on the same page.
The questions that we ask are no different than the questions that you’ll get asked on any other kind of job interview – What excites you? What do you dread? Knowing what motivates people is really important. If you dread looking at a new SQL Server every week, you’re not going to enjoy working with us.
There aren’t any right or wrong answers, there are just answers that tell us more about the candidates. That’s why we also ask fun questions like “When was the last time you caused a production outage?” We’ve all broken things, and sharing funny stories about the ways that we’ve broken production environments is always a good way to break the ice, both with candidates and when we’re working with clients.
What would you say… we do here?
We asked the candidates to tell us about us. Well, that’s not entirely fair. We first asked people why they wanted to work with us. There’s no right or wrong answer, but as an employer it’s helpful to know what motivates people. There are different ways to motivate different types of people; knowing why someone wants to join the team helps us understand how we can motivate an employee and keep them interested in the job. Hey, even the most exciting job can seem like a slog some days.
How would you feel if I told you that your job would include creating an 80-slide presentation every week? What if I added that the recommendations would be custom for an individual client? How would that make you feel? Not everything we work on is rocket science. Sure, we do our fair share of high-end performance tuning, but a lot of what we do is design ways for our clients to solve their everyday problems. We help clients understand and prioritize their solutions by creating presentations that summarize the key pain points and solutions to those problems.
If the thought of writing an 80-page presentation every week was something a candidate dreaded, we knew that we weren’t the right fit. Some people are passionate about digging into problems, building a plan, and then implementing the plan themselves. Other people like to teach.
What Didn’t We Ask?
What we didn’t do is ask candidates a lot of minutiae. While it’s important to know technical details about SQL Server, we know that you can look this information up online. With co-workers like Brent, Kendra, and Jes a smart answer is only a quick email away.
There’s nothing secret about our interview process. These are the kinds of questions that potential employees can expect to hear wherever they interview.
Was There A Technical Screen?
Oh yeah, we had a technical screen. After the first round of interviews, we discussed who should move on to the second round. Since we didn’t ask any of the candidates difficult technical questions in the first round of the process, we wanted a way to see how they worked under the usual types of pressure that we deal with.
We set up a pair of demo environments and invited Doug to take a look at them and tell us what was wrong. Once again, we weren’t looking for right or wrong answers; we wanted to see how Doug looked at each instance and get a feel for how he approached problems. In an amusing turn of events, one of our demo environments was accidentally corrupted and couldn’t even boot. We didn’t ask Doug to look at that because, let’s face it, when you find corruption you should just call Microsoft.
Throughout our process, there weren’t any wrong answers. We looked for people who were as excited about technology as we are, who share a passion for learning and growing, and who didn’t laugh when Brent broke his VM during the technical screen.