Book Review: Database Reliability Engineering by Campbell & Majors

Database Reliability Engineering – good buy

When you see the cover of Database Reliability Engineering, the first question you’re probably gonna ask is, “Wait – how is this different from database administration?”

And I’ve got good news: that’s the very first thing Laine Campbell (@LaineVCampbell) and Charity Majors (@MipsyTipsy) cover in the preface.

“…for a long long time, DBAs were in the business of crafting silos and snowflakes. Their tools were different, their hardware was different, and their languages were different. (…) The days in which this model can prove itself to be effective and sustainable are numbered. This book is a view of reliability engineering as seen through a pair of database engineering glasses.”

The book absolutely delivers: it’s a 250-page version of the concepts in Google’s Site Reliability Engineering book (which I love) targeted at people who might currently call themselves database administrators, but want to go to work in fast-paced, high-scale companies.

How Senior DBAs should read this book

Jump to page 189, the Data Replication section of Chapter 10. Campbell & Majors explain the differences between:

  • Single-leader replication – like Microsoft SQL Server’s Always On Availability Groups, where only one server can accept writes for a given database
  • No-leader replication – like SQL Server’s peer-to-peer replication, where any node can accept writes
  • Multiple-leader replication – like a complex replication topology where only 2-3 nodes can accept writes, but the rest can accept reads
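To make the three bullets concrete, here is a toy Python model, not taken from the book, of which nodes may accept a write under each topology. The node names, the `Cluster` class, and the helper functions are hypothetical. The "multi-leader" variant follows this review's description (a few writers, the rest read-only). Real systems such as Availability Groups add failover, commit modes, and conflict handling that this sketch deliberately ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy replication topology: `writers` is the set of nodes that accept writes."""
    topology: str
    nodes: list
    writers: set
    accepted: dict = field(default_factory=dict)

    def write(self, node, value):
        """Record the write on `node` if the topology allows it; otherwise reject it."""
        if node not in self.writers:
            return False  # read-only node in this topology: reject the write
        self.accepted.setdefault(node, []).append(value)
        return True


def single_leader(nodes):
    # One primary (here, arbitrarily the first node) takes every write.
    return Cluster("single-leader", nodes, {nodes[0]})


def no_leader(nodes):
    # Any node takes writes (conflict resolution is not modeled here).
    return Cluster("no-leader", nodes, set(nodes))


def multi_leader(nodes, leaders=2):
    # A few nodes take writes; the rest serve reads only.
    return Cluster("multi-leader", nodes, set(nodes[:leaders]))


nodes = ["db1", "db2", "db3", "db4"]
for make in (single_leader, no_leader, multi_leader):
    cluster = make(nodes)
    writable = [n for n in nodes if cluster.write(n, "x = 1")]
    print(f"{cluster.topology}: writes accepted on {writable}")
```

The only thing that changes between topologies is the size of the writer set, which is exactly where the trade-offs the book discusses come from: one writer is simple but is a single point of failure for writes, while many writers remove that bottleneck at the cost of reconciling conflicting writes.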

The single-leader replication discussion covers pages 190-202 and does a phenomenal job of explaining the pros & cons of a system like Availability Groups. Those 12 pages don’t teach you how to design, implement, or troubleshoot an Availability Group. However, when you’ve finished those 12 pages, you’ll have a much better understanding of when you should recommend a solution like that, and what kinds of gotchas you should watch out for.

That’s what a Database Reliability Engineer does. They don’t just know how to work with one database – they also know when certain features should be used, when they shouldn’t, and, from a big-picture perspective, how to build automation that avoids those features’ weaknesses.

I love those 12 pages as a good example of just how big in scope this 250-page book really is. The authors have very, very deep knowledge – not just database specifics, but how the database interacts with applications and business requirements. They abstract their experience just enough to make it relevant to all data professionals, yet keep the language clear enough that it’s still directly mappable to the technologies you use today.

For example, it doesn’t teach you how to use version control to treat your infrastructure as code. It just tells you that you should, and gives you a few key terms to look for as you start to build that skill.

You’re going to learn new terms and techniques. It’s going to take you years to turn them into a reality in your current organization. That’s okay – it’s about broadening your horizons.

How managers should read this book

Managers, you’re gonna read this and go, “Wow! I want a DBA team that thinks like this!”

Go back, read chapter 2 (Service-Level Management) carefully, and start working on it now with the staff that you have. Start crafting your service level objectives and defining how you’re going to measure them. In my experience, this is the single toughest part of the book, and it relies on the business stakeholders being able to come to a consensus. It’s a political problem, not a technical problem, and as a manager, it’s the part that you have to deliver.
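If you want a starting point for "defining how you’re going to measure them," here is a minimal Python sketch, assuming you already collect per-request latency and success samples. The `Request` type, the `evaluate_slo` function, and the targets (99.9% availability, 99% of requests under 100 ms) are illustrative assumptions, not numbers from the book. Counting a request as "fast" only if it also succeeded is one common convention, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed database request (a hypothetical SLI sample)."""
    latency_ms: float
    succeeded: bool


def evaluate_slo(requests, availability_target=0.999,
                 latency_threshold_ms=100.0, latency_target=0.99):
    """Compare measured SLIs against illustrative SLO targets."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests to evaluate")
    # Availability SLI: fraction of requests that succeeded.
    good = sum(1 for r in requests if r.succeeded)
    # Latency SLI: fraction of requests that succeeded AND finished under the threshold.
    fast = sum(1 for r in requests
               if r.succeeded and r.latency_ms <= latency_threshold_ms)
    availability = good / total
    fast_ratio = fast / total
    return {
        "availability": availability,
        "availability_met": availability >= availability_target,
        "fast_ratio": fast_ratio,
        "latency_met": fast_ratio >= latency_target,
    }


# Example: 1,000 requests, including 3 slow successes and 2 failures.
sample = ([Request(20.0, True)] * 995
          + [Request(250.0, True)] * 3
          + [Request(30.0, False)] * 2)
print(evaluate_slo(sample))
```

With these sample numbers the latency objective passes (99.5% fast) but the availability objective fails: 2 failures out of 1,000 is 99.8%, short of the assumed 99.9%. The code is the easy part; getting the business to agree on those targets is the hard part.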

That chapter’s recap includes two lines I adore, emphasis mine:

“The SLOs (Service Level Objectives) create the rules of the game that we are playing. We use the SLOs to decide what risks we can take, what architectural choices to make, and how to design the processes needed to support those architectures.”
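One common way to turn "decide what risks we can take" into a number is an error budget: the downtime an availability target still permits over a period. A minimal sketch, assuming a simple time-based availability SLO over a fixed 30-day window (the window length, the sample targets, and the function names are illustrative, not from the book):

```python
from datetime import timedelta


def error_budget(availability_target, window=timedelta(days=30)):
    """Downtime an availability SLO still allows over `window`."""
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    return window * (1 - availability_target)


def budget_remaining(availability_target, downtime, window=timedelta(days=30)):
    """Positive: room left to take risks. Negative: the SLO is already blown."""
    return error_budget(availability_target, window) - downtime


for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} over 30 days allows {error_budget(target)} of downtime")
```

Each extra nine cuts the budget by a factor of ten (roughly 7.2 hours, 43 minutes, and 4.3 minutes per 30 days), which is why agreeing on the target has to come before deciding how much risk a change can carry.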

Availability and latency are to database reliability engineers as revenue and profits are to salespeople. You wouldn’t dream of telling your sales team, “Ah, just get the best price you can, and we’ll be okay.” You can’t do that with your reliability engineers, either.

How developers & sysadmins should read this book

If you’re coming into database administration for the first time, some of the concepts are going to be familiar to you (release management, SLOs, monitoring, and not treating human error as the root cause).

Chapters 10-12 will seem terrifying.

In those chapters, you’ll learn a lot of very big concepts (ACID, the CAP theorem, caching, message systems). When you read those, your eyes may get large, and your ego may get small. Don’t freak out: just by reading these chapters, you’re already ahead of what most database administrators know about those topics.

See, most of us DBAs resemble the way Campbell & Majors describe the starts of their own careers at the beginning of the book: accidental DBAs. We didn’t go to school for this, and most of us don’t have computer science backgrounds. Reading chapters 10-12, you’ll think you’re getting a crash course on something that everybody else already knows well. Good news – we don’t know it well either. (That’s also part of why I told DBAs to start with pages 190-202.)

And yes, I do recommend this book.

It’s the kind of book that’s easy to read, and hard to implement. Seriously, just implementing the SLOs described in chapter 2 takes most traditional companies months to agree on and monitor.

Over time, the brand names and open source tools will change, but the concepts are going to be rock solid for at least a decade. This book is a great waypoint marker set about 5-10 years in the future for most of us, but it’ll be one you’ll be excited to work towards.

Get the book now on Amazon.


4 Comments

  • I’m sold, going to order it. Thanks!

  • The book definitely covers some great ground in the whole design, development, release, and operations lifecycle, and it’s a must-read for DBAs who want to raise their contribution to the next level, beyond just managing infrastructure.

    There is a tendency among some DBAs to be the “grumpy gatekeeper” at the end of the chain. In contrast, this book emphasizes the importance of the data professional being involved at the very start of the planning and development lifecycle.

    One of my complaints with this book, however, is the use of unnecessarily verbose vocabulary when simple words would do. For example, on page 212: “There are four prevalent permutations of data models in this section…”. Bleh!

    Also, the book is obviously biased towards open source platforms (Linux, MySQL), which might be off-putting to those versed in Microsoft platforms (i.e., SQL Server!). SQL Server DBAs will just need to translate the concepts to their platform.

    The focus of the book is clearly geared towards internet-scale, distributed applications, not necessarily analytics, big data, or data warehouses. They make that assumption clear (page 86): “The assumption of this book is that all data stores are distributed.”

    Anyway — apart from a few minor quirks, this book is an overall great resource for DBAs to review and even share with their teams.

  • […] TV references: MacGyver and The A-Team (13:40) Chaos Monkey (00:00) Brent’s book review of Database Reliability Engineering (15:00) Andy cannot count 9’s… (19:15) The shifting node for “big data.” […]

