Today is the second day of the 2018 PASS Summit, the largest annual gathering of Microsoft data professionals. Yesterday, we got the marketing keynote that caught attendees up with the current state of SQL Server 2019 and Azure services. The attendees I talked to yesterday were impressed with the number and breadth of the demos, too.
Today is the day 2 keynote. You can watch it live starting at 8:15AM Pacific on PASSTV at PASS.org.
Two Decades of Data Innovation: Celebrate the Evolution of the Data Platform and See into the Future with Extreme Cloud-Based Data Solutions
Twenty years of PASS Summit and twenty-five years of SQL Server; together we’ve come a very long way. Join SQL Server team past and present as they take a journey through the evolution of the Microsoft data platform into the broad ecosystem you see today. You will hear from many of your familiar friends: Conor Cunningham, Bob Ward, Lara Rubbelke, Mark Souza and a few other surprises.
Then buckle-up for a deep dive with Microsoft Data Platform CTO Raghu Ramakrishnan on the internals of our next evolution in engine architecture which will form the foundation for the next 25 years of the Microsoft data platform. However you interact with data – be the first to look under the hood and see the future of data straight from the Azure Data engineering team.
What to expect: a new approach to a day 2 keynote. In the last several years, the day 2 keynotes have been much better for us technical geeks than the marketing-focused day 1 keynotes because:
- Day 1 usually hopped around from topic to topic, only spending a few minutes on each, never really getting technical, just showing off lots of brochure-style demos
- Day 2 was for the geeks: it had one speaker (recently Dr. David DeWitt or Dr. Rimma Nehme) diving very, very deeply into one specific topic, taking people from near-zero to near-hero, albeit often losing people along the way because they went so deep
Looking at today’s abstract, this looks like a really cool hybrid of the two, and the scope is ambitious for 90 minutes. (Day 2 keynotes also involve PASS talking about the state of the union for the non-profit organization.)
What to expect in this blog post: I’m sitting at the Blogger’s Table in the middle of the keynote room, taking notes. Every few minutes, I’ll update this blog post. The page won’t refresh itself, so if you’re following along at home and you want to see my latest takes, hit refresh.
Today will conclude the Summit 2018 keynote coverage, by the way. There’s no keynote on day 3 of the conference.
The keynote is scheduled to start at 8:15AM Pacific. Here we go!
8:12AM – you’re likely to see a lot of guys in kilts around today – that’s a grassroots movement to support the Women in Technology luncheon on Thursdays at Summit. (I’ll see ya there!)
8:15AM – Video introducing the #SQLfamily movement. It’s so much fun recognizing all these faces up on the video. I love this community.
8:19AM – PASS VP of Finance (and excellent Chicagoan) Wendy Pastrick takes the stage. Traditionally on day 2, PASS talks about the state of the organization, how the finances are doing, event turnout, etc. First, though, she’s talking about networking – that’s the real value of being here, I think. (That’s why today, I’m going to be doing what we call “the hallway track” – walking around, catching up with a lot of folks from around the world that I only get to see at conferences.)
8:20AM – 185 countries, 300,000 members at PASS.
8:21AM – Wendy’s showing how the PASS organization uses Power BI to make data-driven decisions about where they should be spending the community’s money.
8:22AM – Wendy just covered the financial slides BY SINGING I WILL SURVIVE! Oh my God, Wendy is my new hero. I doubt any of us will remember the numbers (other than the finishing note that the budget is balanced), but we will ALWAYS remember that moment.
8:23AM – PASS VP of Marketing (and excellent former-Michigander, now Pacific Northwesterner) Tim Ford takes the stage. 40% of attendees this year are first-timers.
8:26AM – Taking a moment of silence to remember those no longer with us.
8:28AM – Tim explains how Summit’s content has adapted over the years to reflect changes in the Microsoft data platform and things the community is interested in. (Some of this is to remind Microsoft that hey, PASS wants sponsorship money because we’re hip, we’re cool. There are some tough discussions happening around the budget & Microsoft’s sponsorships going forward, especially with Ignite conflicting with the Summit dates next year. Microsoft will have some tough decisions as to which team members go to which events.)
8:31AM – Microsoft’s Mark Souza takes the stage, credits PASS for running 20 Summits, and he’s attended all of them. Now joining him are Ron Soukup, Paul Flessner, Ted Kummert, and Rohan Kumar, folks who have been running the SQL Server program since 1989! It’s an incredible amount of historical knowledge about the data platform up onstage.
8:33AM – Ron Soukup joined the team in 1989 when there were only 5 folks, ran it on 2MB of RAM, and shipped it on floppies. It focused on OS/2 initially, but OS/2 was a sales flop, so then they pivoted over to running it on Microsoft Windows instead.
8:39AM – A group of 17 folks brought SQL Server from the OS/2 release over to Windows NT, getting it from 4.2 to 6.5. Amazing to think about the knowledge encapsulated in those 17 folks. It’d be neat to read interviews from ‘em.
8:40AM – Video talking about the SQL 2000-2005 days, the Slammer worm in 2003, and how Conor Cunningham closed his office doors for a couple of weeks and created DMVs. The guy is a visionary – same thing with Query Store, something that really has the potential to change the way people do performance tuning. (People, you should be using it. Although maybe if we get to the point where it’s on by default in all databases, that might change things a lot.)
I'd like to take this opportunity to congratulate either #PASSSummit and/or the Washington Convention Center – the WiFi works GREAT this year!— MidnightDBA (@MidnightDBA) November 8, 2018
8:42AM – I totally concur. I’m on an iPad today, no wired Ethernet, and it’s been great.
8:43AM – Paul Flessner talking about how SQL Server would make a run at Oracle by upping their game and beating Oracle on total cost of ownership.
8:45AM – Paul explaining how SQL Server 7 was such a groundbreaking release in the way it automatically updated statistics. However, he got a series of voicemails from Pennzoil about how performance was initially terrible, then better, then exceeding all expectations, all within the span of 3 hours. I think we’re going to face some of those same kinds of horrible-then-delightful reactions with SQL Server 2019 with things like adaptive memory grants and batch mode on rowstore.
8:47AM – Buck Woody says via video, “I really miss the R2 designation, and I’m lobbying to get that back.” The crowd laughs, and he gives a knowing grin in the video. Now there’s a guy who knows his audience.
8:48AM – Ted Kummert managed the SQL Server business from 2005-2014. “If someone asks you if you want to manage the world’s most successful commercial database, you should say yes.” Like Ron & Paul, he seems happily laid back about reminiscing about his time here. “You gotta be great at the craft of engineering.”
8:51AM – Video of everyone recapping “The cloud: we’re all in.” They did own that phrase big time, really drumming it into every Microsoft person you talked to during the early 2010s. Got a little old.
8:53AM – Heh heh heh, Ted, I see that you read yesterday’s keynote recap, hahaha. I’m honored. It’s great seeing you onstage again.
8:55AM – Rohan talks just a little, but in fairness, he gets a lot of stage time (rightfully!) these days. It’ll be interesting to hear him look back in several years about his tenure here. Managing the data platform in 2018 has to be spectacularly challenging with so many competitors, distractions, and customer requirements. A lot of tough choices to make. (And I’m happy with where things are going, too – I was talking with Kendra and a couple of first-timers at breakfast this morning, and I just couldn’t be happier with the improvements in SQL Server 2019. Sure, I may not use all the features – I’m probably never going to touch big data clusters – but there is just so dang much good stuff in the 2019 box.)
8:57AM – Raghu Ramakrishnan @raghurwi, CTO for Data, takes the stage to talk about a future-looking vision for data management and how that becomes real in Azure SQL DB Hyperscale.
9:02AM – The cloud gives us elastic compute & storage, but these come with problems: as data grows, size-of-data operations are slow. (For example, try updating a column across all rows in a table.) The solution would be figuring out how to make them faster while masking network latencies (the slow links between nodes and their storage). He’s starting with a big picture vision similar to how Dr. DeWitt and Dr. Nehme introduced very challenging concepts in a way that everyone in the audience could understand.
9:05AM – Paraphrasing what he’s showing above: when you have to build a bunch of database servers in an Availability Group (or something newer), even if they have local SSDs, you’re going to have to deal with latency when the primary writes to its secondaries in the same data center, and even more when you write to other data centers. There’s (currently) no way around those kinds of latencies, even with fast SSDs and networks.
9:07AM – Today, each of your database servers is a silo: your web site OLTP front end, the accounting system, the data warehouse, then flat files in data lakes. “There isn’t hope to build one system to do it all well.” Good honesty there, especially looking back in the rear view mirror at features like filestream. “So can we break data free from silos?” This is where Hyperscale and Big Data Clusters start to come in – separating compute from storage.
9:09AM – Basically what he’s describing is taking what AWS Aurora MySQL and PostgreSQL does with separating compute & storage, but then “can we open access to other heads?” Meaning, like can you use other query engines, reporting tools, dashboards, etc to connect directly to the data files that are stored on Amazon S3? (AWS wasn’t the first to do this with relational databases either, just putting it in terms that I work with and understand.) If they can execute on this, then it’s beyond what AWS Aurora is doing today. (I don’t have any forward-looking info on what Aurora’s doing in this space either.)
9:11AM – Design targets are infinite max database size, 99.999% availability, <5 minutes for right-sizing replicas, recovery in under 10 seconds, under 200 microsecond commit latency (but currently at 2 milliseconds). These are obviously forward-looking goals, but here’s the thing: you simply can’t build systems like this yourself in your own data center. If they can pull this off, then you’d be an idiot to host high-performance databases yourself.
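To put that availability target in perspective, five nines allows only a few minutes of downtime per year. This quick back-of-the-envelope calculation is my own arithmetic, not something from the keynote:

```python
# Back-of-the-envelope: how much downtime per year each availability
# target permits. My own arithmetic, not Microsoft's figures.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

for label, avail in [("three nines", 0.999),
                     ("four nines", 0.9999),
                     ("five nines", 0.99999)]:
    print(f"{label} ({avail}): {allowed_downtime_minutes(avail):.1f} min/year")
```

Five nines works out to roughly five minutes of downtime per year, which is why the sub-10-second recovery target matters so much: a single slow failover could blow most of the annual budget.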
9:13AM – Announcing that Socrates was the project name for Hyperscale because “he was a database! He went around asking queries!” HAHAHA, nicely done.
9:14AM – While talking about separating compute from storage, he points out, “Our competitors do it.” OK, good, I like that honesty. AWS Aurora MySQL & PostgreSQL are indeed doing this, but AWS was able to jump into this commodity database market by taking an open source database and rolling their own improvements into the storage engine. However, that comes with a drawback: they’re limited to MySQL & PostgreSQL’s query optimizers (at least for now, although we’re seeing them improve with things like automatic parallelization of read-only queries across multiple nodes.) Microsoft has a big leg up here when they eventually deliver Hyperscale to general availability: they’re building atop SQL Server, which – and I’m not being a mindless Microsoft shill here – is a better engine for a lot of use cases. They’re going to charge for it, obviously – they can’t use free code like MySQL and PostgreSQL do – but their objective is to make sure it’s worth it.
9:18AM – When separating compute & storage, you need fast caching locally to compute nodes (since network latency to the storage is higher when compute and storage are separated.) To solve that, they’re building atop the Buffer Pool Extensions code. (I have passionate feelings about that particular feature, but that doesn’t mean the next version has to be that bad.)
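To make the caching idea concrete, here’s a toy read-through cache sketch of my own (the real thing is SQL Server’s buffer pool layered over a local SSD tier, not this): hot pages get served from a small local cache, and only a miss pays the network round trip to remote storage.

```python
from collections import OrderedDict

class ReadThroughCache:
    """Toy LRU read-through cache: serve hot pages locally, fall back
    to remote storage on a miss. A sketch of the concept only -- the
    real system layers the buffer pool over a local SSD cache."""

    def __init__(self, fetch_remote, capacity=4):
        self.fetch_remote = fetch_remote  # slow path: remote storage read
        self.capacity = capacity
        self.pages = OrderedDict()        # page_id -> bytes, in LRU order
        self.hits = self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark most recently used
            self.hits += 1
            return self.pages[page_id]
        self.misses += 1
        data = self.fetch_remote(page_id)     # pay the network latency here
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        return data

# Usage: repeated reads of a hot page only hit remote storage once.
cache = ReadThroughCache(fetch_remote=lambda pid: f"page-{pid}".encode())
for _ in range(3):
    cache.read(7)
print(cache.hits, cache.misses)  # 2 hits, 1 miss
```

The design point is that the cache sits on the compute node, so a hit never crosses the network at all; the whole game in a disaggregated architecture is making that hit rate as high as possible.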
9:22AM – He’s explaining MVCC and the new in-database Persistent Version Store (PVS). These aren’t easy concepts to explain to non-database-administrators. I like how he gives just enough information to make the DBAs happy, but not so much that it stays in the weeds. He was maybe on that slide for one minute. (Not that the next slide is easy to grok for non-sysadmins! Gotta be a jack of all trades in here to keep up.)
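As a rough illustration of the MVCC idea (my own toy sketch, nothing like the actual PVS implementation): each update appends a new row version stamped with a transaction sequence number, and a reader sees the newest version at or before its snapshot, so readers never block behind writers.

```python
# Toy multi-version row: each write appends (txn_seq, value); a reader
# with snapshot N sees the latest version <= N. This is only a sketch
# of the MVCC concept -- the real Persistent Version Store keeps row
# versions inside the database itself rather than in tempdb.
class VersionedRow:
    def __init__(self):
        self.versions = []  # list of (txn_seq, value), ascending by txn_seq

    def write(self, txn_seq, value):
        self.versions.append((txn_seq, value))

    def read(self, snapshot_seq):
        visible = None
        for seq, value in self.versions:
            if seq <= snapshot_seq:
                visible = value   # newest version at or before the snapshot
            else:
                break
        return visible

row = VersionedRow()
row.write(10, "v1")
row.write(20, "v2")
print(row.read(15))  # a snapshot taken at 15 still sees "v1"
print(row.read(25))  # a later snapshot sees "v2"
```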
9:24AM – Not taking a lot of notes here because it’s extremely similar to AWS Aurora MySQL and PostgreSQL. If you want to learn more about how this stuff works, AWS has published a lot of re:Invent sessions about Aurora internals on YouTube, and those are probably a good starting point. It’s not that this material here isn’t good – it is – but given time limits, he can’t really explain it in depth.
9:33AM – I’m not going to try to recap the technical details of this – you really need to watch the recorded version to see the slides at high resolution – but it’s neat to think about the fact that this could not have been built 5 years ago, let alone 10. The transaction logging at scale only works with today’s new hybrid memory and solid state storage. He mentions that you can get another copy of your transaction log in two Availability Zones at the price of another 1-2 milliseconds of latency, for example – that’s ambitious stuff. (Again, the same thing that Amazon’s doing, but that’s not a bad thing – it’s a good thing that Microsoft understands the competition.)
It’s wonderful for all of us that they have a product offering that keeps the competition strong.— Randolph West (@_randolph_west) November 8, 2018
9:37AM – Now he’s getting to the new fun stuff that AWS doesn’t do: letting other applications share the same database storage. Here’s where Microsoft has a really, really big edge: since they control other engines (like Azure SQL DW), they can make those engines read the exact same data & log structures that your OLTP servers are using. “This doesn’t mean you’ll never have copies – you already have copies in the sense that a nonclustered index is a copy of a clustered index. This is the same design trade-off.” Great example!
Seriously, people, this is a big hint. A very big hint.
If they can deliver this, then they will have a phenomenal sales story 3-5-7 years from now.
And that’s a wrap! Ooo, I’m really excited about that last hint. I can’t wait to see how this unfolds!