Category Archives: SQL Server

SQL Server

Webcast Recording – DBA Darwin Awards: Index Edition

Right after the DBA says “Uh oh,” they call in the consultants. You wouldn’t believe some of the stuff I’ve seen, but here’s the worst of it: some of these things are probably happening in your environment right now. If you’re a production DBA of any experience level, do yourself a favor and look for these common mistakes before they call in the consultants.

In this 30-minute video, we’ll discuss heaps, fill factor, maintenance plans, over-indexing, and the Database Tuning Advisor’s stupidity so you can learn what you’re doing wrong before your boss has to hear it from the outsider:

The links we discuss in the video:

Enjoyed this video?  Check out our upcoming webcasts.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

How to Test Your Backup Strategy: Five Simple Questions

Backup strategies are like rashes: people tend to ignore them for as long as they can, then they ask a specialist if it’s going to be OK.

There’s a lot of good ways to back up databases. The challenge is finding best strategy for your data. To find out if your SQL Server backup strategy measures up, ask yourself the following questions:

1. How Do You Know When Your Backups Aren’t Successful?

People deal with lots of databases, and it’s hard to keep track of every last one. Different strategies are used to manage backups for all these databases — sometimes a service external to the database runs the backup. Do you know when that service fails, or doesn’t start? At other times, individual SQL Server Agent jobs handle backups for each database. If a new backup is added and a backup job isn’t created, will you know it wasn’t run?

  • Make sure you have an alert system for failures.
  • Supplement your backups with regular checks of last backup date. Make sure you’re checking the last date of log backups if you’re not using the simple recovery model.

2. How Much Do You Lose if Even Just One Backup File Goes Bad?

Backups take up a lot of space. There are a lot of backups, and people rarely use them. It’s human nature to go a little cheap, and keep them only in one place. That place may have been provisioned using RAID 0. Wherever it is, a backup file might get deleted or corrupted.

You need to look at all the different types of backups you run and consider things like this:

  • If you’re using just full and transaction log backups, the chain ends at the last good log backup.
  • If you’re doing full backups once a week and differentials on weeknights, those differentials are no good without the full.

For critical databases, I prefer to keep backups in more than one place— onsite and offsite. Think you can’t afford an offsite location? Here are the rates for Amazon S3.

3. How Fast Do You Need to Restore Your Data?

If you’re a manager, this is your ‘Recovery Time Objective’. If you’re a project manager, you just call this ‘RTO’. If you’re a database administrator, this is the number of minutes until you’re pumped full of adrenaline and sweating heavily when something goes wrong. Whenever I’m considering a backup technology, I want to know how long it will take me to restore my data if everything is gone except the backup files.

This means you need to know:

  • How long will it take to get to the backup files? If they’re on tape or in a magical Data Domain device, can you access them yourself?
  • How long will it take to copy those files to a good place to restore them from?
  • If you don’t have a server to restore to, how long will it take to bring one up and configure it?
  • How long will the restore take?

If my backup technology can’t meet my Recovery Time Objective, I need to start thinking about high availability options that can mitigate my risk. Once you’ve got high availability in place, you still want to keep a plan to restore from backups and test it periodically, but you’re less likely to need to use it.

4. How Much Data Can You Afford to Lose if Your Transaction Log File is Vaporized?

When you stop and think about it, isn’t ‘Recovery Point Objective’ a strange way of saying how much data it’s OK to lose? When you think about backup strategies, you need to know:

  • Can you do incremental restores? (And have you tested it?)
  • Can you restore to a single point in time? (Example time: right before the data was truncated. And have you tested it?)
  • If your database and transaction log files disappeared and you were left with only the backup files, how much data would be lost?

5. How Much Can Your Backup Impact Performance?

Whenever your backup is running, you’re burning precious resources: CPU, disk IO, memory, and (depending on your configuration), network. In many cases, it’s just fine to use this up during non peak times. However, for VLDBs which are backing up multiple terabytes and for critical OLTP databases which serve customers around the globe, backups need to be run as fast as possible. After all, backups are only one part of maintenance.

When your performance is critical, these are the questions you need to ask:

  • Are you backing up at the right frequency? If we’re talking about a slow transaction log backup, check how often it runs.
  • Are you using all the available magic? If you’re using SAN storage, have you explored all the options for backing up your databases from the SAN?
  • Are you using backup compression? Compression can burn more CPU, but reduce the amount of writes and overall duration of your backup.
  • Does your backup have enough throughput to the storage device? Whether you’re using iSCSI or Fiber Channel, throughput is often a big part of the solution.
  • Have you read the SQL CAT Team’s Case Study on backing up VLDB’s? It’s chock full of configuration tips to make your backups blaze through faster.

That was More than Five Questions

It’s true, I cheated. Got your own backup test? Tell us in the comments!

Kendra Little

Kendra specializes in high availability and performance tuning. She is a Microsoft Certified Master in SQL Server-- the highest technical SQL Server Certification available. Kendra loves databases and software development more than long walks on the beach. Those cartoons in her blog posts? She draws 'em all. Read more and contact Kendra.

Website - Twitter - Facebook - More Posts

SQL Server Certifications: What to Take and How to Prepare

Whether you’re already a database professional or would like to become one, you’ve probably thought about getting certified in SQL Server. Do you need to be certified? What certifications are available and which should you take first? How should you prepare for the exams? Kendra Little is a Microsoft Certified Master in SQL Server. She’ll explain what certifications are available, what the advantages are of becoming certified, and give practical tips for making the most of your exams.

Want to see the links from the video? Scroll on down– the links are at the bottom of the post.

SQL Server Certifications: Helpful Links

Certification Exams for Microsoft SQL Server 2008

MCTS MCITP MCM MCA
Database Administrator Exam 70-432 Exam 70-450 Exam 88-970
then
Exam 88-971
MCA Board Exam
Database Developer Exam 70-433 Exam 70-451
Business Intelligence Developer Exam 70-448 Exam 70-452 N/A N/A

Studying for SQL Server Certifications

Tools to Build Your Own SQL Server Lab

Low Cost (or Free!) Virtualization Products

Using Amazon’s EC2 as Your SQL Server Lab

Kendra Little

Kendra specializes in high availability and performance tuning. She is a Microsoft Certified Master in SQL Server-- the highest technical SQL Server Certification available. Kendra loves databases and software development more than long walks on the beach. Those cartoons in her blog posts? She draws 'em all. Read more and contact Kendra.

Website - Twitter - Facebook - More Posts

Join Me for Lunch in Chicago on March 23rd

On March 23rd, I’ll be the guest of honor at the first SQLFriends Lunch in Chicago.  For just $30, you get lunch at Buca di Beppo on Rush Street plus two hours of talking about SQL, virtualization, blogging, consulting, the MCM program, you name it.

And you really do name it – on the SQLFriends registration form, there’s a spot to put in the questions you want answered.  I’ll get a copy of those, and I’ll come armed with things vaguely resembling answers.

This is a refreshing change of pace from user groups and presentations.  Here’s your chance to ask me anything in a friendly, relaxed environment and get insight from your peers.  Network with the kind of people who find me interesting – believe it or not, you’re not the only one.  (There’s also my mom.)

There’s only 30 spots available, so sign up now.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL Server 2012 BareMetal Workshop

Last week Brent, Jeremiah and Kendra attended the SQL Server 2012 BareMetal workshop at Microsoft Headquarters. This was a special train-the-trainer style event where Microsoft Program Manager Dandy Weyn (blog | twitter) brought MVPs, Microsoft employees, and MCMs together. We talked about implementing SQL Server, training the community to use new features, and helping people adopt new technologies to scale out their applications.

What did we think? Here’s our highlights.

Brent Says, “You’ve Come a Long Way, Baby”

I love the command line.  I love batch files.  I love queries.  But somehow, I could just never get into the habit of installing SQL Server via unattended scripting.  The GUI works well enough, and I don’t install that many SQL Servers, so why bother scripting things out when I’ve heard so many horror stories about the difficulty of unattended install scripts?

After Dandy Weyn’s #SQLBareMetal training in Seattle, I see the attraction.  Dandy’s put a hell of a lot of work into an elaborate set of scripts that:

  • Download evaluation versions of everything (Windows, SQL, Office, and more)

    Brent, Kendra and Jeremiah with Amit Bansal at #SQLBaremetal

    Brent, Kendra and Jeremiah with Amit Bansal at #SQLBaremetal

  • Create Hyper-V templates for Windows
  • Copy the templates into a domain controller VM and several SQL Server VMs
  • Creates a domain and joins the other VMs to it
  • And much more

It’s impressive.  Really impressive.  With a properly equipped host (24-32GB of memory and a single SSD or a RAID 10 of conventional hard drives), you can spin up your own SQL Server 2012 lab in an hour or two complete with a cluster and Availability Groups.

If everything works.

Unfortunately, we’ve still got a way to go.  Troubleshooting all these moving parts still isn’t easy, as evidenced by the number of stumped Microsofties, MVPs, and MCMs in the room that struggled for hours getting everything working correctly on brand-new clean virtual machines.  Troubleshooting machines in the wild is another matter altogether – scripting doesn’t make that much easier.  In fact, it’s probably worse: if you take a stranger’s scripts and run ‘em, and you run into problems, you’re fighting two battles simultaneously.  You have to figure out what the scripts were doing, and figure out how to fix problems.

Sitting in the workshop, watching people struggle, it hit me: Microsoft’s not bragging about easier management of SQL Server 2012.  If you want easy management, the solution (supposedly) is SQL Azure.  SQL Server, on the other hand, gives you much more powerful features and tools, but you pay for that power by way of more difficult setup and troubleshooting.  Scripting doesn’t help that – it just makes the errors pop up faster with less repetitive manual labor.

We’ve come a long way, baby, but in terms of manageability, we’ve still got a really long way to go.

Jeremiah says, “You press the button, we do the rest.”

I love automation; the more I can remove a meat bag from a decision making process, the happier I am. It’s not because I don’t trust people; it’s because people go on vacation or play games on their cell phone, or don’t check their email for hours on end. Of course we all like to build complex Rube Goldberg style solutions to problems that should have simple solutions. To put it plainly – the more critical something is, the simpler the solution must be.

It’s pretty well known that I’m a big fan of large scale distributed databases. Part of the appeal is the massive fault tolerance that comes along as part of the deal. SQL Server 2012′s Availability Groups bring that massively robust and massively simple fault tolerance to SQL Server. Before going in to the training, I understood what Availability Groups brought to the table, but I didn’t quite see the bigger picture. As Brent mentioned, getting things up and running isn’t the easiest thing on earth. Once we were up and running with the Availability Group the administration wasn’t as simple as firing up a Riak cluster and throwing a switch, but it was really close.

During the configuration process, there were a few places where the wizards weren’t as clear as I’d like, but all of the functionality is configurable via PowerShell and T-SQL. Like Brent I love me some scripting. The ability to script Availability Groups once and then modify a base set of scripts means that I can deploy this feature fairly easily, provided that the Windows side of things is set up correctly.

George Eastman originally said “You press the button, we do the rest.” SQL Server 2012 is getting very close. The UI isn’t always as simple as pressing the button, but SQL Server will do the rest… most of the time.

Kendra says, “Give us a week, we’ll take off the weight.”

Brent told you above about how we built out a lot of installations with a great set of scripts. One thing that thrilled me about our installs was how fast and simple it was to deploy Windows and SQL Server onto Windows Server Core.

What’s different about Server Core? There’s only a very lightweight GUI: most of your work is done through a command prompt. There’s still a little bit of the old familiar, though– you can fire up Task Manager in a pinch for an easy graphic view of  what’s running on the server itself. For the most part, however, you administer Server Core locally either by command line or PowerShell. If you want to fire up the graphic Windows Event Viewer, other MMC snap-ins, or SQL Server Management Studio, you need to enable those to be run from remote machines.

Server Core is important to SQL Server database administrators for three reasons:

  1. A smaller surface area is stronger and has less room for attack. (Bonus: that means less of a surface to patch, and fewer downtimes!)
  2. Not having the GUI protects you from yourself. All those tools we run on servers use up precious resources in terms of CPU and memory. If the patient is having a hard time breathing, they don’t want a doctor who needs to sit on their chest to do an exam.
  3. In the future, Microsoft is headed toward minimal GUI features. It’s time to start adapting now.

Do yourself a favor: make one of your goals this year to install Server Core with a SQL Server 2012 instance into your development or test environment. Spend some time with it and include it in your experiments with the new features of SQL Server 2012.

Yes, there’ll be tricky points you hit. You’ll find yourself reading a few new blogs and learning a few new PowerShell commands. But in a few years, you’ll be really glad you did.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Understanding Database Choice

Everyone needs a distributed database right? Wrong. It’s easy to get so excited about trying out a new technology that you close your eyes to the problem you’re trying to solve. I came across a question on Stack Overflow where the poster was asking what kind of distributed database they needed. Rather than jump right into the fray and say “You don’t need it,” I took a step back and asked myself “What’s the real question here?”

Oh boy, it's an untested database!

Understand Your Requirements

We all hope that some day our business will be as successful as Google or Facebook or whatever the next big thing is. Those business got that way by first concentrating on doing something incredibly well and then growing as needed. Nobody reasonable wakes up and says “We’re going to build this incredibly robust application and here’s how we’re going to do it.” The internet is full of articles about how different companies have dealt with their problems of scaling. Do your job well and worry about success when it happens.

In the StackOverflow question, Ted (we’ll call this person Ted) was asking which distributed database they should use to scale their system. They gave some vague requirements for data throughput, but left out why the system needed to be distributed across multiple servers.

This triggered my buzzword detector. I think distributed databases are incredibly cool, but they have a place; Ted’s requirements didn’t match at all. without much explanation for why a distributed database would be important here, it was hard to even refute the argument about using a distributed database.

Distributed databases have some operational advantages – they tend to be more robust and tolerant of equipment failures, but that’s based on certain configuration details like using multiple server replicas. RDBMSes, out of the box, aren’t distributed across multiple servers, but there are a lot of features that have been built to make it possible to replicate data across data centers or to shard the database across multiple servers. The closest thing to a business requirement was that the database needed to be free; open source would be nice, too.

Understand Your Hardware

Business requirements only matter so much. Eventually somebody has to write data to a disk. Once that drive head starts moving, the best designed software won’t matter if the underlying hardware can’t keep up with the load. That’s why it’s important to know what your hardware is capable of handling. It’s just as important to know what your application is capable of producing

Ted needed a system that would be handling less than 100 transactions per second and would probably end up writing data at a rate of around 400 kilobytes per second. Neither of these requirements are show stoppers. Assuming that the server was going to be writing at a constant rate, the amount of data generated and kept would be around 10 terabytes of data a year. While it’s nothing to scoff at, it’s not an unheard of data generate rate. The thing is, almost any off the shelf database software can handle these kinds of load. Almost any off the shelf server can handle this kind of data throughput.

The requirement to handle ~100 requests per second at around 4 kilobytes per record isn’t a matter of choosing a database product, it’s a matter of designing a storage solution that can handle the ongoing needs of the business. When SAN space can be purchased from around $15,000 per terabyte, 10TB per year becomes a minor budget line item for all but the most cash strapped startup.

Understand Your Data

There was one feature I left out until now. It’s important to understand how your data will be used. Graph databases excel at helping users explore the relationship between different data points. Relational databases make it easy to build incredibly flexible, good enough solutions to most problems. Key value databases make it possible to do single key lookups in near constant time. The way that data is read limits the playing field.

Ted mentioned that almost of the data lookups were going to be by primary key lookups. If this were the only requirement for reading data, this problem could be solved by any database. Then he threw in a little hook – there would be some joins in the data. In the world of databases once you use the j-word, your options get very limited very quickly. You have to start thinking about querying patterns, potential optimizations, and the trade offs of read vs write optimizations.

If you do need joins, you can take one of two approaches – let the database do it, or write it yourself. Neither approach is difficult (and one is certainly easier than the other), but they’re both feasible – heck the hard part has already been done for you: someone else came up with the algorithm.

Understand Obscenity

Justice Potter Stewart in an attempt to classify obscenity said “I shall not today attempt further to define the kinds of material I understand to be embraced… but I know it when I see it.” Right now, there’s no good definition of what makes data into Big Data. Some people say that you’ve hit Big Data when you can no longer predict query performance. Some people use hard and fast numbers of data volume in bytes, in data churn rate, or in the massively parallel nature of the database. There’s no right or wrong answer and Big Data is something that varies from organization to organization.

It’s important to understand what problem you’re trying to solve, understand the volume of data, and understand how the data is going to be used before making the final selection. There are many ways to store data.

What would I have done in this situation? Taking into account that I know SQL Server well, I would use SQL Server. SQL Server can perform admirably as a glorified key value store. B+trees are pretty quick in most use cases and they balance many of the problems of simultaneously reading and writing data to provide a good enough solution to the problem (with great management tools on top). When business users demand better querying capability, it’s easy enough to start adding non-clustered indexes on top of the solution.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

Website - Twitter - Facebook - More Posts

DBA Darwin Awards: Log File Edition

Right after the DBA says “Uh oh,” they call in the consultants.  We get to see some really amazing acts of career suicide.

I'll pull in close to the island so they can see me fail.

DBSEASEA SINK

You wouldn’t believe some of the stuff we’ve seen, and some of these things are probably happening in your environment right now.  In this session, we’ll talk about some of the most common ways to shoot yourself in the foot, and we’ll show you how to put the gun back in the holster.

If you’re a production DBA of any experience level, do yourself a favor and look for these common mistakes before they call in the consultants. We’ll discuss recovery modes, bulk inserts, VLFs, replication, copy_only backups, and of course, shrinking – all so you can learn what you’re doing wrong before your boss has to hear it from the outsider.

Links We Discussed

Automatically Shrink Your Transaction Logs – Jeremiah Peschka shares a diabolically dangerous idea to shrink your log files as soon as they grow.

A Busy/Accidental DBA’s Guide to Managing VLFs – Dave Levy explains the basics of Virtual Log Files, gives you a script to check how many you’ve got, and shows how to fix the problem.

Performance Impact of VLFs – Linchi Shea tested how delete/insert/update (DUI) performance is affected by 20,000 VLFs, and then follows up with an OLTP test too.

Minimally Logging Bulk Load Inserts – Greg Robidoux has a great chart explaining how to get minimally logged inserts depending on whether you’ve got clustered indexes, nonclustered indexes, existing data, etc.  Also, in the video, I don’t explicitly say that these also work in simple and full recovery modes, but they do.  (I wanted to keep things simple – no pun intended – for the 30-minute session.)

Microsoft Data Loading Performance Guide – Thomas Kejser, Peter Carlin, and Stuart Ozer discuss the improvements of trace flag 610 to get minimally logged inserts in 2008.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Availability, Data Locality, and Peer-to-Peer Replication

I want to make something clear: high availability is not load balancing. The two options don’t have to be mutually exclusive, but they aren’t the same thing. Several months ago, I wrote about resolving write conflicts. Some of the approaches I mentioned for resolving write conflicts (such as taking a “last write wins” approach) involved using peer-to-peer replication. It’s important to understand conflict resolution and peer-to-peer replication. Since I’ve already talked about conflict resolution, let’s dig into how peer-to-peer replication fits into the mix.

Peer-to-Peer Replication and You

Peer-to-peer replication is a special and magical kind of replication, it works in a ring or mesh to make sure that one row’s updates will magically spread to all servers. You’d think that this would mean every server is equal, right?

In some distributed databases actions can take place on any server. Using Riak as an example, when you want to add a new record to the database you can write the record to any server and the record will be routed to the server responsible for handling that data. This is part of the beauty of a specific type of distributed database: the database is a collection of nodes that serve reads and writes without regard for hardware failure. There’s a lot of software trickery that goes into making this work, but it works quite well.

SQL Server’s peer-to-peer replication is a distributed database, just not in the sense that I’ve used the term previously. In SQL Server peer-to-peer replication every node is an exact copy of every other node: the same tables, rows, indexes, and views exist on every node. This is where the difficulty begins – if every row exists on every node, how do we know where to update data? The problem is that we don’t know where to update a row. There is no out of the box mechanism for determining row ownership.

Distributed database systems like Riak, Cassandra, HBase, and Dynamo work by assigning an owner to every record, shouldn’t we do the same thing with SQL Server? When we’re spreading data across a number of servers, we have to ensure that writes go to the correct location, otherwise we need to build a large number of checks in to ensure that all nodes have the appropriate updates and that everyone is working on the correct version of data. Otherwise, we run into conflicts. This is the reason I hinted at using peer-to-peer replication combined with write partitioning and a last write wins method of conflict detection. If changes to a row can only occur on server A, we don’t need to worry about updates on other servers – those updates can be ignored since they did occur not in the correct location.

The difficulty lies in finding a way to do all of this. SQL Server’s replication offers no routing functionality, it just replicates data to the appropriate subscribing servers. In order to make sure that data gets to the right place, there must be another piece to the puzzle. There must be a way to correctly locate data.

Record Ownership

If you absolutely must use peer-to-peer replication as a form of load balancing, record ownership is an important concept to consider. Regardless of whether the distributed database is relational or not, software still needs to be aware of where the definitive version of a record can be found. If there’s no way to determine which version of a record is the definitive version of a record, two updates can occur in different locations. This will undoubtedly lead to painful conflict scenarios. Instead of worrying about handling conflicts, we worry about getting data to the right place. Once we know that the data is in the right place, we can trust our database to be as accurate as possible.

I use the term record instead of row for an important reason: a record represents a complete entity in a system. A row may be part of a record (e.g. a line item in an order) but the record is the complete order.

Record ownership is a tricky thing to think about; how do you determine who owns any single row? What’s a fair and efficient way to handle this? Let’s take a look at different techniques and see how they stack up. Here’s a quick list of possible ways we can distribute row ownership in a database:

  • Random
  • Range-based
  • Static

Random Record Ownership

Randomness is frequently used to ensure an even distribution. Randomly bucketizing data turns out to be a very effective way of ensuring that data will be split very close to evenly across any arbitrary number of locations. The difficulty is in ensuring randomness.

Some systems like Riak and Cassandra use a hash function to distribute data ownership around the database cluster. Different nodes are assigned a range of values – if there are four servers in the distributed database, each one is roughly responsible for 1/4 of the data in the database (I’m simplifying, of course, but you get the drift). Special routing code takes care of getting data to clients and sending writes to the appropriate place. The location of a record is typically determined by applying a hashing function to the record’s key. In this way, we can always find a row at a later date: by applying a function to the key we can quickly find the row even if the number of servers in the cluster has changed.

This mechanism provides a reliable way to uniquely identify data and distribute it among many servers. This technique is difficult to accomplish with SQL Server. There is no peer-to-peer replication functionality in the SQL Server space that makes it easy to say “This record belongs on server A and this record belongs on server B.” There’s a reason for this: peer-to-peer replication is a high availability technology. It exists to make life easier in the unfortunate event that your data center slides into the ocean. It’s possible to build some kind of load balancing layer in SQL Server using SQL Server Service Broker (or just about any other technology), but the point remains that SQL Server doesn’t provide out of the box functionality to automatically implement random record based ownership.

Range-Based Record Ownership

Range-based ownership is far simpler than random record ownership. In range-based ownership a range of records are claimed by a single server. This could be users 1 through 100,000 or it could be users whose names start with ‘A’ through users whose names start with ‘K’. At a quick glance range-based record seems like it doesn’t have many down sides: it’s easy to determine where an appropriate record goes. My data goes to server A, your data goes to server B, his data goes to server C.

Range-based record ownership has a major flaw: some servers will experience more load than others. For example, if we’re partitioning by name we will quickly discover that first names aren’t very unique, at least not in Western cultures. In a survey of first names conducted in the UK, one quarter of women were likely to have only one of ten first names in 1994. One in three women was likely to be named Emily. Needless to say, data distribution will cause skew in the activity distribution on different servers. If one server accumulates a clump of very active users (e.g. a group of active early adopters), that server may experience a higher load than the others.

Designing an effective range-based record ownership scheme for SQL Server peer-to-peer replication is possible but very difficult. The effectiveness of the scheme depends on intimate knowledge of write patterns. Most of us don’t have the time to develop a deep understanding of how data is written and then develop a scheme that takes into account those patterns.

Static Record Ownership

With static record ownership, we assign each record to a server when it is created. This could be as simple as assigning a user to the closest server or it could mean assigning records to a server by some other arbitrary means. However this is accomplished, it’s important to remember that some piece code still must able to determine where a record should go and that the mechanism for identifying that initial location should be general purpose enough to meet your user’s needs in the long term.

There are several common ways to split out data. If you have a system that’s multi-tenant, it becomes easy to assign ownership for all of a single client/customer’s data to a single server. If that customer grows, you can buy a separate server or move them onto a different server with fewer users. Every record ends up having a composite key made up of the record identifier and the client identifier, but this is a small price to pay for clearly being able to separate data responsibility by client.

Another way to split out data is geographically. If I sign up for a service, it’s nice if the primary place to write my data is as close to me as possible. In this case, the service might have three data centers: in LA, in New York, and one in London. Much like using a multi-client architecture, a geographic method to determine ownership would use the location as part of the key for each record – records stored in LA would use a composite key with the data center location (‘LA’) and some other arbitrary key value to identify a unique record.

No matter what scheme you decide to use, static record ownership is an easy way to determine which SQL Server should be responsible for writes to a single record. An advantage of static record ownership is that routing can be handled in the application or a sufficiently sophisticated router can handle routing writes without any additional application code being added to the application – just a few load balancer rules will need to be created or changed
.

In Summation

Here’s the trick: throughout all of this we’ve ignored that order of events is important. We’ve just assumed that when data is being written, we’re guaranteeing the order of events. If the data is being written to random servers, there’s no guarantee of event order. In a naive system, a record might be written to one server and an update applied to a second server before the original record even shows up! Distributing data is difficult. Randomly distributing data is even more difficult. No matter how you distribute your data or distribute writes, remember that distributing data in SQL Server through peer-to-peer replication is a high availability technology. It can be co-opted for scale out performance improvement, but there are some design decisions that must be made.

Jeremiah Peschka

Jeremiah Peschka has worked as a database and emerging technology expert at Quest Software where he researched new trends and technologies in the world of data storage. Over the course of his career he’s worked with companies across many industries as a system administrator, developer, and DBA. He’s been involved with all aspects of application development and deployment. He likes cheesecake, coffee, and ice cream.

Website - Twitter - Facebook - More Posts

SQL Server Interview Questions and Answers – Book Review

I read a lot of technical books – stuff on SQL Server, VMware, storage – and they all start to blend together.  Within the first few pages of reading SQL Server Interview Questions and Answers by Pinal Dave and Vinod Kumar, though, it jumped out and grabbed me because it’s really different.

Just kidding. When I'm talking to a recruiter, I read The Art of War

This is gonna be the year I get a better job, I swear. I'm outta this hole.

When buying this book, you need to appreciate it for what it is: a series of questions and answers.  No more, no less.

There’s almost no code, no syntax – and believe it or not, that’s refreshing.  It’s just plain English dialog.  That’s not to say the book is shallow, either, because it touches on topics that all of us could stand a refresher on.  This book works really well for its purpose in much the same way that Applied Architecture Patterns works; it doesn’t teach you as much as it just exposes you to subjects.  From there, it’s up to you to follow the trail if you’re interested in the topic, and that’s where Pinal and Vinod break new ground.

Check out this quote from page 50:

“What is a filtered index?  A filtered index is used to index a portion of the rows in a table.  This means it applies a filter on an INDEX which improves query performance, reduces index maintenance costs, and reduces index storage costs when compared with full-table indices.  When we see an index created with a WHERE clause, then that is actually a Filtered Index.  (Read more here http://bit.ly/sqlinterview27)”

The “read more here” part is actually in the book, and it links to one of Pinal’s intro posts about filtered indexes.  Just like all of Pinal’s posts, it includes clear, easy-to-follow script examples and screenshots.  Having said that, if you don’t like Pinal’s blog, you’re probably not going to like the book either.  The book has less technical details than the blog, and there are grammatical/editing issues.  That kind of thing doesn’t bother me as much with this book because of its intended market – it’s just a series of questions and answers.

Will The Answers Help or Hurt the Interview Process?

In my DBA interview questions and SQL developer interview questions blog posts, I went out of my way not to include the answers.  I wanted to give managers some starting points for questions that they could use to filter out candidates, but I didn’t want to give unqualified candidates a leg up in the interview process.  Those blog posts are consistently popular in Google searches, but based on the search terms being used, I know the candidates are looking for ways to cheat the interview process.  Therefore, my first concern going into this book was, “Are unqualified candidates going to sneak into jobs by reading this book?”

I don’t think that’s going to be a problem because the book is more of a refresher than a cheat sheet.  As I read the questions, I found myself nodding and saying, “Yep, that’s a great way to answer that question, but if somebody doesn’t understand the underlying concepts, they’re still going to fail the interview as soon as somebody drills down.”  An interviewer can see past the fakers just by asking, “And how have you used that concept in your work?”  The book also doesn’t stand alone as a complete interview process: managers still need to ask candidates to whiteboard concepts or reverse engineer a stored procedure to see what it’s doing.  Frankly, if somebody walked into an interview with this book memorized start to finish, and they didn’t have any other skills whatsoever, they’d still be more qualified than a lot of candidates I’ve interviewed.

My second concern about the book was that an incorrect or poorly-worded answer might cause a good candidate to miss out on a job.  If a non-SQL-savvy manager asked questions verbatim out of this book and expected verbatim answers, the book could lead to problems.  Some of the book’s answers are open to interpretation – for example, on page 41, the book asks, “What is OLTP?”  The book’s answer doesn’t match with what I would give, nor with Wikipedia’s definition.  Some of the answers are more clear-cut, though, like when page 54 says there’s a limit of 256 tables per query, and that’s wrong.  I’d recommend keeping the book’s online errata handy.

I get really nervous about the thought of a manager relying on this book’s answers as their only gauge of a candidate’s knowledge.  However, I’ve been through interviews like that myself – having a disagreement with a manager about the answer to a particular topic – and that’s a place that should be inside every DBA’s comfort zone.  We work with developers, project managers, and end users who have all kinds of incorrect assumptions about technology, and they’ve got books or blog posts to back up those assumptions.  We have to be able to back up our own answers, and sometimes that means saying to a manager, “We both believe we’re right – can you pop open a search engine and let’s check the latest documentation on SQL Server from Microsoft itself?”

Is This Book Just for Job Candidates?

I also bet that any of my readers will learn at least one thing reading this book, and it’ll drive you to go dive deeper into a particular topic.  I’d already learned something by page 34 when the book covers the order of the logical query processing phases: FROM, ON, OUTER, WHERE, GROUP BY, CUBE | ROLLUP, HAVING, SELECT, DISTINCT, ORDER BY, TOP.  I read that and a little light went on above my head – I’ve seen this kind of thing before, but I hadn’t thought about it in years, and it inspired me to go hit Google and learn more about the phases again.

The funny part about this book is that I like all of the questions, most of the answers for accuracy, and the rest for provoking thought.  I know that wasn’t the original intent, and it’s a hilarious thing for me to write in a book review, but there it is.

This book is a good starting point for MCITP test-takers, too.  The questions give you an idea of your qualifications because when an answer doesn’t feel natural to you, you can go drill down into the web to learn more about the topic.  If a topic feels like a no-brainer to you, then you probably shouldn’t waste time studying MCITP information about that topic either.

You can buy SQL Server Interview Questions and Answers for about $10 at Amazon, and Amazon Prime members with Kindles can read the Kindle version for free.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Anatomy of the SQL Server Database: Settings and Secrets Video

What are the key settings for a SQL Server database? What are some lesser-known settings that may have unexpected consequences? Kendra Little checks out a lot of databases in the wild– she finds loads of cases where database settings are dragging down performance or not protecting the database properly. If you’re a database administrator or developer who’s been working with SQL Server for more than six months, this 30-minute session is for you.

If you liked this video, here’s our upcoming webcast schedule:

  • January 17 – DBA Darwin Awards: Log File Edition – Brent Ozar – Right after the DBA says “Uh oh,” they call in the consultants. You wouldn’t believe some of the stuff I’ve seen, but here’s the worst of it: some of these things are probably happening in your environment right now. If you’re a production DBA of any experience level, do yourself a favor and look for these common mistakes before they call in the consultants. We’ll discuss recovery modes, bulk inserts, VLFs, and copy_only backups so you can learn what you’re doing wrong before your boss has to hear it from the outsider. Register now.
  • Jan 31 – Anatomy of the SQL Server Transaction Log: Limitations and Performance – Kendra Little – How does the SQL Server transaction log impact the performance of your code? What impacts the performance of the transaction log, and how do you know when you’re pushing the limits of your log? Kendra Little explains the role of the transaction log in SQL Server and what you can do to optimize your use of the log– we’ll discuss everything from the physical structure of the log to using minimal logging in your code. This talk is aimed toward database administrators and developers who have been working with SQL Server for one year or more. Register now.
  • February 7 – DBA Darwin Awards: Index Edition – Brent Ozar – Right after the DBA says “Uh oh,” they call in the consultants. You wouldn’t believe some of the stuff I’ve seen, but here’s the worst of it: some of these things are probably happening in your environment right now. If you’re a production DBA of any experience level, do yourself a favor and look for these common mistakes before they call in the consultants. We’ll discuss heaps, duplicate indexes, and the Database Tuning Advisor’s stupidity so you can learn what you’re doing wrong before your boss has to hear it from the outsider. Register now.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts