Sometimes our identical twin turns out to be evil – and sometimes our SQL Server does, too. When we encounter data corruption, hardware failure, and OS errors, we need to fail over from our principal to our better mirrored twin. I’ll show you how to handle database mirroring failovers in both high safety and high performance setups in this week’s webcast:
Hiring a DBA? Need to get a job description for the human resources folks? Here’s how to get started.
First, decide whether it’s a production or development DBA. Think of the database in terms of a fridge. When you run a restaurant, you need at least one (and probably several) refrigerators to keep your raw ingredients and your prepared dishes cold.
Your chefs rely on the fridges to get their jobs done. They have tons of training to pick the right ingredients to put in the fridge, prepare the food correctly, and know when to take things in & out of the fridge.
If your restaurant absolutely, positively cannot go down, you’ll end up hiring a handyman or facilities guy. He has to know how fridges work, and if a fridge can’t keep the food cold enough, he steps in to diagnose and fix it.
The chefs are your developers.
When you have a LOT of chefs, you hire a development DBA to organize the fridge and clean it out. They don’t usually write code, but if they do, the code is inside the database – they’re not writing presentation-layer code in C# or Java.
The handyman or facilities guy is your production DBA. He’s more concerned about the back side of the fridge than the front side. He doesn’t do any cooking – he keeps the fridges running.
They all work with the fridges, but the similarities end there. Small shops might indeed hire one guy to buy the food, put it in the fridge, cook it, and fix the fridge when it breaks. However, those shops aren’t going to win any awards for food quality, and when the fridge breaks, the cooking stops while he fixes the fridge.
Sample Production DBA Job Description
This position’s job duties include:
- Ensure all database servers are backed up in a way that meets the business’s Recovery Point Objectives (RPO)
- Test backups to ensure we can meet the business’s Recovery Time Objectives (RTO)
- Troubleshoot SQL Server service outages as they occur, including after-hours and weekends
- Configure SQL Server monitoring utilities to minimize false alarms
- As new systems are brought in-house, choose whether to use clustering, log shipping, mirroring, Windows Azure, or other technologies
- Install and configure new SQL Servers
- Deploy database change scripts provided by third party vendors
- When performance issues arise, determine the most effective way to increase performance including hardware purchases, server configuration changes, or index/query changes
- Document the company’s database environment
To do a great job in this position, experience should include:
- On-call troubleshooting experience with at least one production SQL Server for a year. You don’t have to be the only DBA or have DBA in your job description, but you should have been the one person that the company would call if the SQL Server service stopped working.
- Finding DMV queries to answer questions about server-level performance (think top waits – see the sketch after this list)
- Using free tools like sp_Blitz™ and sp_WhoIsActive to diagnose server reliability and performance issues
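For example, a server-level question like “what is this server waiting on?” comes straight out of the wait stats DMV. Here’s a minimal sketch – filtering out harmless background waits is left as an exercise:

--Top wait types since the last restart
SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;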
The following skills aren’t strictly necessary, but they earn bonus points and make you a more well-rounded candidate:
- Tuning T-SQL queries to improve performance
- Troubleshooting hardware using tools like Dell OpenManage, HP System Insight Manager, and IBM Director
Sample Development DBA Job Description
This position’s job duties include:
- Ensure that new database code meets company standards for readability, reliability, and performance
- Each week, give developers a list of the top 10 most resource-intensive queries on the server and suggest ways to improve performance on each
- Design indexes for existing applications, choosing when to add or remove indexes
- When users complain about the performance of a particular query, help developers improve the performance of that query by tweaking it or modifying indexes
- Conduct SQL Server lunch-and-learn sessions for application developers
- Advise developers on the most efficient database designs (tables, datatypes, stored procedures, functions, etc)
To do a great job in this position, experience should include:
- Writing and improving SQL Server T-SQL queries for at least a year. You may have technically had “C# Developer” or “Java Developer” in your job title, but you were known around the office as the go-to person for T-SQL questions.
- Designing tables and picking datatypes
- Using Profiler traces and other tools to find the most frequently run queries
- Using free tools like sp_BlitzIndex™ and DMV queries to answer questions about index usage
The following skills aren’t strictly necessary, but they earn bonus points and make you a more well-rounded candidate:
- On-call troubleshooting for SQL Server service outages
- Deciding whether clustering, log shipping, mirroring, replication, etc are the right fit to solve a business problem
Things I Didn’t Include In These DBA Job Descriptions
If you’re using any of the following technologies, mention it in your job description so that the candidates know what to expect:
- Failover clustering, SAN replication, and other high availability technologies
- SQL Server merge, peer-to-peer, or transactional replication
- LINQ, Entity Framework, NHibernate, or other ORMs
- Service Broker
- Analysis Services, Integration Services, or Reporting Services
There’s nothing wrong with having your production or development DBA work with those technologies, by the way – but they’re special technologies that require prominent placement in job descriptions.
When it comes to indexes, SQL Server is really helpful. It shows you which indexes your queries are asking for, both in execution plans and in the missing index dynamic management views (“DMVs”). I like to look at the DMV missing index requests using sp_BlitzIndex™.
When you look at missing index requests, always remember one big caveat: these requests will never ask for or recommend a clustered index.
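If you want to peek behind the curtain, the same requests live in the raw missing index DMVs. Here’s a minimal sketch – note that the “benefit” math at the end is just my rough approximation of how the factors get multiplied together:

SELECT d.statement AS table_name,
    d.equality_columns, d.inequality_columns, d.included_columns,
    gs.user_seeks, gs.avg_total_user_cost, gs.avg_user_impact,
    --Rough benefit approximation: uses * cost * impact
    gs.user_seeks * gs.avg_total_user_cost * gs.avg_user_impact AS rough_benefit
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs ON g.index_group_handle = gs.group_handle
ORDER BY rough_benefit DESC;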
Let’s take a look at what this might mean against a slightly modified version of the AdventureWorks2012 sample database.
Hey, we have a high-value missing index!
Let’s say we run sp_BlitzIndex™ and it diagnoses that we have a high value missing index.
--To diagnose a database, we would run:
EXEC dbo.sp_BlitzIndex @database_name='AdventureWorks';
We scroll to the right on this line to get more info and see that the index request has an overall “benefit” of over one million. That benefit is a made-up number – it’s a combination of the number of times the index could have been used, the percentage by which SQL Server thought it would help the queries generating the request, and the estimated “cost” of the requests. These factors are all multiplied together to help bubble up the biggest potentially useful requests.
We also see that the index request is really quite narrow. It only wants an index with a single key column on the “City” column! That seems really reasonable.
Scrolling over farther, we can see that this index could have been used about 49 thousand times. The queries that could have used it (quite possibly more than one) were rather cheap – their costs were less than one on average – and it would have improved them a whole lot (around 93% overall). We currently don’t have ANY nonclustered indexes on the table in question, so we don’t have to worry about creating a duplicate nonclustered index.
This index seems really reasonable. It would be used a lot, and it would help those queries out.
If we keep scrolling, at the far right there’s some sample TSQL to create the suggested index:
CREATE INDEX [ix_Address_City] ON [AdventureWorks].[Person].[Address] ([City]);
Sure enough, that syntax will create a nonclustered index. That seems really good, but hold on a second!
Always look at the whole table
Just to the left of the “Create TSQL” column is a really important helper – the “More Info” column.
Copy the command from the appropriate row out. It will look something like this:
EXEC dbo.sp_BlitzIndex @database_name='AdventureWorks', @schema_name='Person', @table_name='Address'
This command will help you look at the whole table itself and assess if there might be something you’re missing.
When you run this command, you’ll see a big picture view of the indexes on the table, the missing index requests for the table, and the number and types of the columns in the table:
In this table, note that we don’t even have a clustered index. This table is a heap! Heaps have all sorts of wacky problems in SQL Server.
In this case, we have an OLTP database and we definitely want to avoid heaps to keep our sanity.
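Want to check whether heaps are lurking elsewhere in your database? They show up in sys.indexes with an index_id of 0. A quick sketch:

--Find every user table without a clustered index
SELECT SCHEMA_NAME(o.schema_id) AS schema_name, o.name AS table_name
FROM sys.objects AS o
JOIN sys.indexes AS i ON o.object_id = i.object_id
WHERE i.index_id = 0
    AND o.is_ms_shipped = 0;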
Our queries are requesting an index on the City column, but it looks an awful lot like our table was modeled with a different clustered index in mind (AddressID). Don’t just assume that the missing index request will always make the best clustered index. Look at your overall workload and the queries that use the table, then decide what the clustered index should be, whether the table has a primary key, and whether the primary key and the clustered index are the same. Once you have that design, make decisions for your nonclustered indexes.
The clustered index is special
The clustered index in any table has special uses and significance. This index is the data itself, and it will be used in every nonclustered index in the table. If you are defining a new clustered index or changing a clustered index, SQL Server will need to do IO on every nonclustered index on the table as well. Always make sure to test your indexing changes and choose your index definitions carefully.
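In our Person.Address example, that thought process might land on something like this – just a sketch, assuming AddressID turns out to be the right clustering key for the workload:

--Cluster on the narrow, ever-increasing key the table was modeled around...
ALTER TABLE Person.Address
    ADD CONSTRAINT PK_Address_AddressID PRIMARY KEY CLUSTERED (AddressID);
GO
--...then satisfy the missing index request with a nonclustered index.
CREATE NONCLUSTERED INDEX ix_Address_City ON Person.Address (City);
GO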
Want to try this out on your own?
Feel free! A copy of the scripts I used to set up AdventureWorks for this very simple demo is below. Note that this is only suitable for test environments, and the script will mess up your Person.Address table (so make sure you can restore AdventureWorks2012 again afterward). Get sp_BlitzIndex™ here
RAISERROR('Careful there! Run in sections.',20,10) WITH LOG;
GO
--******************
-- (C) 2013, Brent Ozar Unlimited.
-- See http://BrentOzar.com/go/eula for the End User Licensing Agreement.
--WARNING:
--This script is suitable only for test purposes.
--Do not run on production servers.
--This query may have a very long runtime on some systems.
--******************

--Modify this restore to fit your file system and preferences
IF DB_ID('AdventureWorks') IS NOT NULL
BEGIN
    USE master;
    ALTER DATABASE AdventureWorks SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
END
GO
RESTORE DATABASE AdventureWorks
    FROM DISK=N'C:\MSSQL11.SQL2012CS\MSSQL\Backup\AdventureWorks2012-Full Database Backup.bak'
    WITH MOVE N'AdventureWorks2012_Data' TO N'C:\MSSQL11.SQL2012CS\MSSQL\DATA\AdventureWorks2012_Data.mdf',
    MOVE N'AdventureWorks2012_Log' TO N'C:\MSSQL11.SQL2012CS\MSSQL\DATA\AdventureWorks2012_log.ldf',
    REPLACE, RECOVERY;
GO
USE [AdventureWorks];
GO

---Remove all the indexes from Person.Address
ALTER TABLE Person.BusinessEntityAddress DROP CONSTRAINT FK_BusinessEntityAddress_Address_AddressID;
GO
ALTER TABLE Sales.SalesOrderHeader DROP CONSTRAINT FK_SalesOrderHeader_Address_BillToAddressID;
GO
ALTER TABLE Sales.SalesOrderHeader DROP CONSTRAINT FK_SalesOrderHeader_Address_ShipToAddressID;
GO
DROP INDEX [AK_Address_rowguid] ON [Person].[Address];
GO
DROP INDEX [IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode] ON [Person].[Address];
GO
DROP INDEX [IX_Address_StateProvinceID] ON [Person].[Address];
GO
ALTER TABLE [Person].[Address] DROP CONSTRAINT [PK_Address_AddressID];
GO

--Run a query a bunch of times
SET NOCOUNT ON;
SET STATISTICS TIME, IO OFF;
GO
DECLARE @i INT = 0;
WHILE @i < 10000
BEGIN
    DECLARE @AddressLine1 NVARCHAR(60),
        @AddressLine2 NVARCHAR(60),
        @City NVARCHAR(30),
        @StateProvinceCode NCHAR(3),
        @PostalCode NVARCHAR(15);
    SELECT @AddressLine1 = AddressLine1,
        @AddressLine2 = AddressLine2,
        @City = City,
        @StateProvinceCode = [StateProvinceCode],
        @PostalCode = PostalCode
    FROM [Person].[Address] AS [pa]
    JOIN [Person].[StateProvince] AS ps ON pa.StateProvinceID = ps.StateProvinceID
    WHERE City = 'Kenosha';
    SET @i = @i + 1;
END
GO
DECLARE @i INT = 0;
WHILE @i < 39291
BEGIN
    DECLARE @countme BIGINT;
    SELECT @countme = COUNT(*)
    FROM [Person].[Address] AS [pa]
    JOIN [Person].[StateProvince] AS ps ON pa.StateProvinceID = ps.StateProvinceID
    WHERE City = 'San Diego' AND ps.StateProvinceCode = 'CA';
    SET @i = @i + 1;
END
GO

--Diagnose!
EXEC dbo.sp_BlitzIndex @database_name='AdventureWorks';

--Look at the Person.Address table specifically
EXEC dbo.sp_BlitzIndex @database_name='AdventureWorks', @schema_name='Person', @table_name='Address';
GO
Sometimes, the best stories are the horror stories. Join Brent as he talks about some of the worst situations he’s seen in his years of database administration. He’ll share the inspiration behind some of his favorite entries at http://DBAreactions.tumblr.com. We’ll either laugh or cry. Or both.
Liked that webcast? We’ve got many more coming up – just check the boxes and put in your email address.
Until recently, database benchmarks have been performed by vendors in carefully controlled labs or by engineers at companies reporting on application-specific workloads. TPC benchmarks, like TPC-H, provide metrics about the number of queries per hour and a cost per query per hour. While these results can give us a guess about the total cost to implement a system, they say nothing about the cost to operate it. How much will it cost to maintain a high-throughput system?
UC Berkeley’s AMPLab has provided a benchmark that makes it easier to compare both performance and cost of different database solutions. The AMPLab Big Data Benchmark provides a benchmark for several large scale analytic frameworks. Most importantly – it’s possible for anyone to reproduce these benchmarks using the tools provided by AMPLab.
Performance Analysis
First and foremost, the AMPLab benchmark provides a performance analysis of four products – Amazon Redshift, Hive, Shark, and Impala. Several query types are used to provide a general view of analytic framework performance. Not all frameworks are implemented in the same way, and providing a broad set of queries makes it possible for users to evaluate how a workload might perform in production.
Several exploratory queries, an aggregate, an aggregate with joins, and custom UDFs are tested at several sizes and with several variations. If these aren’t representative of a given workload, it’s possible to extend the benchmark framework to include representative queries on the sample data set. Ambitious teams could even go so far as to point the benchmark at their own data to discover which product provides the most benefit.
A great deal of flexibility is available to let teams benchmark potential solutions in Amazon Web Services – different servers and data sets can be repeatably tested and evaluated before settling on a platform.
Typically, benchmarks are hardware-based: different database engines are compared on the same hardware. Other benchmarks are performance-based: how much performance (by some arbitrary metric) can be eked out of any given set of hardware.
Neither approach addresses the real concern of many businesses: cost.
Interestingly, the AMPLab benchmark is not based on a fixed hardware configuration. Instead of fixing on specific hardware types, the AMPLab benchmark is based on a cost metric. All systems were created in Amazon Web Services, making it easy to compare cost based on published instance prices. In the case of the initial AMPLab benchmark, the systems cost $8.20 per hour (the Amazon Redshift system cost $8.50 per hour).
This matters for the simple reason that we now have a far more useful way to compare the performance of different databases. For $8.20 – $8.50 an hour, on the workloads tested, I can make an easy decision about how I should perform my data analysis.
Taking the AMPLab benchmark a step further, we can customize the benchmark and see how our workloads perform at different cost levels. If you’ve wondered whether you should use one Hadoop variant or another, SQL Server, or Amazon Redshift for cloud analytics, you can easily find out. For teams already using cloud based analytics frameworks, it’s easy to use these benchmarks to determine how workloads would fare on different systems or with different instance sizes.
The AMPLab benchmark produces results that most people in the RDBMS world would be happy about – Amazon Redshift comes out ahead of the competition. Equally unsurprising, the results are fastest when the entire result set can be coerced into memory. What’s surprising, though, is how well newcomers Shark and Impala perform when stacked up against an MPP database like Redshift. Sure, Redshift is about twice as fast as Shark, but Shark is a new product (the first source code commits occurred on April 23, 2011) and I’m sure we can expect big improvements in the future. This is important because it shows that tools like Shark and Impala compete in the same realm as MPP databases like Redshift, Teradata, and PDW.
What’s It All Mean?
Using the AMPLab benchmark we have an easy tool that lets us compare analytic database performance in a hosted environment. We can perform multiple tests to understand how our workload will perform within different database products and hardware environments. Continued improvements to both the underlying database platforms and the test framework itself should lead to interesting discussions, prototypes, and technology decisions.
Servers are like milk cartons: both need an expiration date printed clearly on the outside.
When a project needs a new database server, I ask the business users, “How long does this server need to last?”
Inevitably somebody gives the flip answer of, “Forever.” I laugh along with the group and then wait with a calm smile on my face, and eventually people understand that I’m serious. That’s when the uncomfortable part starts, because nobody wants to talk about old age. They want to assume they’ll live forever, their servers will always be fast, and vendors will always support today’s software.
Expiration dates can be calculated in:
- Time – how long the vendors support each component (app code, database server, operating system, hardware)
- Data size – we built this solution to support 100GB of data, and we could stretch it to 500GB, but beyond that it’s going to start busting apart at the seams
- User quantity – we’re predicting 1,000 users, but if we suddenly grew to 10,000, we’ll hit concurrency issues
And each component may have its own expiration date:
- Hardware – the server, storage, and even networking may be subject to increased support costs from the manufacturer when it’s out of warranty. Before virtualization, one of the servers I supported was a mission-critical boat anchor, and we bought a few identical ones off eBay to keep on the shelf as backups. The manufacturer just didn’t offer parts anymore.
- Operating system – Windows has its own end-of-life dates. If you’re in the cloud, Windows Azure has its own support policy. You won’t be able to roll out new VMs with old OS’s.
- Application language/framework/platform – Some of my clients have been excited to adopt SQL 2012 AlwaysOn Availability Groups, but then met with surprise when their third-party JDBC drivers weren’t being updated anymore.
- Database server – of course, for us DBAs, this is the only one we think about.
In any given project, some people assume the solution will be carved in stone to last forever, and others are building a cardboard house to get us past an agile finish line. DBAs are a great example – often when we build solutions for others, we try to build an amazing, scalable solution that can handle all kinds of demand. When we implement our own maintenance scripts, though, we duct tape things together, assuming that we’ll constantly be revisiting them to hone and perfect them. In reality, maybe we’re doing it backwards. Let’s build our backup and monitoring tools so that we never have to revisit them, and instead put our ongoing work into the things the business really cares about.
Have you ever wished your SQL Server could have an identical twin, holding the same data, in case you ever needed it? SQL Server mirroring provides just that, and you can choose if you want it for high availability or disaster recovery. If you’ve ever been curious about what mirroring really is, and what the pros and cons are, this is the session for you.
From the Dept of Corrections: During the webcast, a viewer asked in Q&A whether automatic page repair was a one-way or two-way street. Kendra answered that if the mirror gets an IO error, it will go into a suspended state. This is somewhat correct but incomplete – the mirror will also try to correct the issue with a page from the principal afterward and attempt to resume mirroring. More info here.
Just when you thought SQL Server couldn’t get better, Microsoft is announcing the features for SQL Server 2014. They haven’t announced the licensing/pricing, but I’ll tell you what I do know so far.
First, open this in another tab and hit play so you’ve got some background music while you read. Done with the commercial? Okay, let’s get to it:
Cache frequently used data on SSDs. You can specify an SSD (or an SSD array) to be used to extend memory. SQL Server 2014 will automatically cache data there with zero risk of data loss. (Only clean pages, not dirty pages, are stored there.) The best use case is for read-heavy OLTP workloads. This works with local SSDs in clusters, too – each node can have its own local SSDs (just like you would with TempDB) and preserve the SAN throughput for the data and log files. SSDs are cheap, and they’re only getting cheaper and faster. Here are the questions you’ll want to ask before you use this feature:
- Is your total actively queried data set bigger than you can fit in memory? Note that I didn’t say all data: you might have archive or history or audit tables in your databases that are never queried, and there’s no sense in caching those.
- Have you already maxed out the memory on the server? If not, start there first – memory can be used for more than just caching clean pages.
- Do business requirements force you to use shared storage or magnetic local storage? If not, consider moving the data to local SSD entirely.
- Does your server have room for locally attached PCI Express or SAS/SATA solid state drives?
If the answer to all of those questions is yes, an SSD buffer pool extension may be for you. Honestly, Microsoft could stop there and I’d probably still recommend the new version for most of my clients, because that’s a killer performance benefit.
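Here’s roughly what turning it on looks like – the file path and size below are placeholders, and the details could shift before release:

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = N'F:\SSD\SQLBufferPoolExtension.BPE', SIZE = 64 GB);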
More online maintenance operations. Got big data in a partitioned table? Is nobody giving you any time to do maintenance? Just no time to stop and get away cause you work so hard to make it every day? Well, with SQL 14, you can rebuild a single partition’s index online, and you can switch partitions in/out using DBA-specified lock priorities. For 24/7 workloads, this gives the DBA the ability to do maintenance with lower locking, CPU, and memory overhead. There’s also new Extended Events stuff you can use to monitor who’s getting blocked and killed. Here’s how the syntax works:
ALTER INDEX MyIndex ON MyTable
REBUILD PARTITION = 3
WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5, ABORT_AFTER_WAIT = BLOCKERS)))
The new parameters involved are:
- PARTITION = 3 – you can pick the specific partition you want to rebuild, and you can do it online.
- WAIT_AT_LOW_PRIORITY – just hang out when you need the schema mod lock.
- MAX_DURATION = 5 – wait for up to 5 minutes.
- ABORT_AFTER_WAIT = BLOCKERS – can take a few different values. If BLOCKERS, SQL Server will abort (kill) the queries blocking your index rebuild. If SELF, your index rebuild will give up and let user queries keep going. If NONE, everybody just keeps waiting, doin’ the neutron dance. NONE is the current behavior in SQL Server 2012, and it’ll be the default.
AlwaysOn Availability Groups get more secondaries. If you really need to scale out your reads, SQL 14 gives you up to 8 secondaries (up from 4). Of course, you’ll be paying for Enterprise Edition licensing on these, but if you were already going to replicate data out to various reporting or BI servers, now your life is easier.
AlwaysOn AG readable secondaries will be more reliable. In SQL 2012, if your primary drops offline or the cluster loses quorum, the readable replica databases drop offline. (This is hilarious because it’s right when you really want to be able to query the secondaries.) No way to control it – it’s totally automatic. In SQL 14, the secondaries remain online and readable when the primaries aren’t available. However, keep in mind that typical AlwaysOn AG connections go through the AG listener name, and then fetch the list of readable replicas from the primary. That means that in order to keep your report queries online, you can’t use the AG listener – you have to connect directly to the replica’s server name. I like to use a separate set of DNS records for readable replicas, like readonly.mydomainname.com, and have my report servers point at those.
Use Azure VMs as AlwaysOn AG replicas. Nobody wants to pay for expensive offsite datacenter space with machines that sit idle all the time. Now, in the AlwaysOn Add Replica wizard, there’s an “Add Azure Replica” button that integrates with your Azure subscription logins. The wizard lets you pick the VM image, VM size (cores & memory), cloud replica name, admin password, etc. Lots of gotchas here though:
- Initializing the replica means a full database backup/restore from on-premise up to the Azure VMs, too, so this isn’t a great solution for big databases with limited bandwidth.
- Connectivity from on-premise to the Azure VM requires a VPN appliance from your datacenter to the Azure datacenter, and today that means a hardware appliance, so there’s still some expenditures required. Still way cheaper than buying hardware for a colo, though, and much more flexible.
- If you’re really going to use it for disaster recovery, you need a Windows Domain Controller up in Azure as well. Without that, when your primary site dies, none of your Windows machines will be able to log on, which wouldn’t be very useful. SSMS doesn’t automate the deployment of a DC (nor does it alert you if you didn’t think this through).
Failover Cluster Support for Cluster Shared Volumes. With regular volumes, only one node can own the volume at any given time. He owns the entire volume, and no other node can see/read/write files on that volume. However, Windows Server clusters have a type of drive volume called Cluster Shared Volumes with much more flexibility. Multiple cluster nodes can be connected to the same volume at the same time, but each node can access different files on the drive independently. Windows and Hyper-V have supported this for a while (see the BOL section on the benefits), and now SQL Server supports it too. The big benefit here is that if one node of the SQL Server cluster loses connectivity to the storage, it can still read and write data over the network through a different node’s SAN connection.
Smart Backup to Azure. SQL Server 2012 CU2 already lets you back up databases to Azure storage. I hear a lot of people ask me, “Brent, how can I make my backups slower and less predictable?” Those folks loved backing up from on-premise databases over their Internet connections, but that wasn’t enough. They wanted even less predictability, so now they get Smart Backups. With this feature, SQL Server figures out whether it should do a full or differential backup, how often it should do a transaction log, and more. Humor aside, this makes sense for people who host their servers in VM providers with very fast Internet connections that don’t pay for bandwidth – specifically, people hosting SQL Server in Windows Azure VMs. Both of those guys are going to be thrilled.
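Based on the preview documentation, you manage this through stored procedures in msdb – something like the sketch below, where the database name, retention, and credential are made up and the parameter names could change before release:

--Enable managed backup to Azure for one database (sketch based on preview docs)
EXEC msdb.smart_admin.sp_set_db_backup
    @database_name = N'SalesDB',
    @enable_backup = 1,
    @retention_days = 30,
    @credential_name = N'AzureBackupCredential';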
On-premise SQL Server with data/log files in Azure storage. Great news for those of you who really like juggling chainsaws! Now you get the best of everything:
- Expensive on-premise licensing
- Expensive bandwidth costs to the cloud
- Paying for data storage at Microsoft
- Slow backups (because your data has to come down from Azure storage to local on-premise memory then back out to wherever you want it stored, and heaven forbid you be dumb enough to send it back up to Azure storage and pay TWICE for bandwidth in and out)
Here’s the syntax:
CREATE DATABASE foo
ON (NAME = foo_dat, FILENAME = 'https://internetstorage.windows.net/data/foo.mdf')
LOG ON (NAME = foo_log, FILENAME = 'https://internetstorage.windows.net/data/foolog.ldf');
I’ll just leave that there.
Hekaton: specialized in-memory OLTP tables. If your application is facing serious concurrency issues with thousands of simultaneous connections trying to lock data, Hekaton offers an intriguing solution. I’m not even going to try to type out an explanation here, but I’ll point out a few challenges with it:
- You’ll probably need to change your data model. For example, identity fields aren’t supported – you’ll need to use a GUID as a primary clustered key.
- You’ll also probably need to change your code. Hekaton works best with stored procedures, and specifically stored procs that it can compile into native code.
- It’s memory-only. If you experience sudden data growth in your Hekaton tables, that means you can cache less of your other tables. If you run out of memory – well, let’s just say you’d better pray you don’t run out of memory, because hello, downtime.
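For the curious, here’s a rough sketch of what declaring a memory-optimized table looks like in the preview bits – the table, columns, and bucket count are all made up, and your database needs a MEMORY_OPTIMIZED_DATA filegroup first:

CREATE TABLE dbo.ShoppingCart (
    --No identity: a GUID serves as the primary key, hash-indexed for point lookups
    CartID UNIQUEIDENTIFIER NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserID INT NOT NULL,
    CreatedDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);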
You’re so excited, I can feel you getting hotter. But wait, there’s more.
Other cool improvements:
- Updatable clustered column store indexes
- Query performance improvements due to better cardinality estimator
- Resource Governor for IO
- Sysprep enhancements
- Wizard to deploy database to Azure VM
- Separation of duties enhancements to support DBAs who aren’t allowed to read the data, or auditors who are allowed to read the data but not manage the server
- Windows Server 2012 R2 cooperation improvements – ReFS support, online resize VHDX, storage tiering, SMB improvements
The Pointer Sisters music probably gave away my true feelings here, but really, folks, I’m so excited. There’s very real improvements in here for everybody. If you’re a DBA on a multi-terabyte database, you’re going to love the SSD buffer pool extensions and the granular index rebuilds. If you’re BI-curious, you’re going to be experimenting with the clustered column store indexes. If you’re a software-as-a-service vendor with lots of clients, you’re going to love failover cluster support for CSVs and query performance improvements. And if you’re a developer who works with a SQL Server back end, you’ve got all kinds of new tricks to scale.
I know some DBAs were worried that Microsoft was “all in” with the cloud, and that they’d stop improving the box product. SQL2014 shows that Microsoft is still bringing the awesome.
Now, can somebody just get us a release date and some pricing? I was talking to a PR guy who almost spilled the beans, but he’s so shy.
Occasionally you check out job listings and wonder, “Could I have a better job?” If you’ve been working as a database administrator for a few years, it’s time to learn how to tell a dream job from a potential nightmare. Join Kendra Little for a 30 minute guide on how to read hidden messages in job listings and find the right next step for your career.
Liked this video? Check out our upcoming free webcasts.
Fast – When developers ask how quickly a piece of code needs to run, don’t say fast. Give them a finish line so that they can know when it’s time to move on. “This query currently does over 15mm logical reads. We don’t allow production OLTP queries that do over 100k logical reads – anything higher than that needs to hit the reporting server instead.” Developers don’t want to write slow queries, but if you don’t show them how to measure their queries, they don’t know what’s slow versus fast. Show them how to measure what makes a query successful, and they’ll start measuring it long before they bring it to you.
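The easiest way I know to show them: flip on SET STATISTICS IO and read the logical reads right out of the Messages tab. The table and numbers here are hypothetical:

SET STATISTICS IO ON;
SELECT * FROM dbo.Orders WHERE CustomerID = 42;
--Messages tab: Table 'Orders'. Scan count 1, logical reads 152408, physical reads 0, ...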
Sometimes – When code works unpredictably, don’t say it sometimes works. Look for things it has in common when it fails. Does it bomb every Sunday, or when it handles over ten customers, or when it’s only being called once at a time? Keep digging for environmental coincidences until you can give the developers or QA staff a lead. Sometimes (see what I did there?) I find that developers want access to the production server just because they can’t get enough specific troubleshooting help from the DBAs. “Your code fails sometimes” isn’t going to cut it.
Never – When the developer tries to deploy a trigger in production, don’t say, “We never allow that.” The business, not our emotional desire, dictates the technical solutions we use. We’re here to advise the business, and sometimes the business won’t go along with our advice. Our job is to lay out our concerns clearly and concisely, preferably with risk assessments and real-life stories, and then listen. I’d love to build every box and application perfectly, but we gotta ship if we’re going to keep paying salaries.
Fine – When you’re asked how the server is performing, don’t say it’s fine. Quantify it in terms of batch requests per second, average duration for a stored procedure, or some other metric that you can measure precisely. Bonus points if we correlate this number over time, like if we can say, “We normally average 1,400 to 1,500 batch requests per second during peak weekday hours, and we’re doing 1,475 right now.”
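Batch Requests/sec comes straight out of a DMV, by the way – it’s a cumulative counter, so sample it twice a known number of seconds apart and divide the difference by the elapsed seconds. A minimal sketch:

SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
    AND counter_name = 'Batch Requests/sec';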
Large – When developers ask how big a table or database or index is, don’t say large. What’s large to you is small to someone else, and vice versa. Give exact numbers: the number of terabytes in the largest database, the number of databases, the number of rows in the largest table, or the number of times you’ve updated your resume in terror because the backup failed.
It Depends – DBAs love giving this answer as a cop-out for tough questions, and if you’re not careful, you come off as a condescending know-it-all. For best results, immediately follow this phrase with the word “on”, as in, “It depends on the table’s access pattern – in read-focused systems like our data warehouse, we can have up to 10-15 indexes on our dimension tables, but in OLTP databases like our web site, we need to aim for 5 indexes or less per table.” The faster you can help someone to the answer they’re looking for, the more they’ll respect you as a partner, not an enemy combatant.