Goooood morning, party people! Today is the opening day of the annual Microsoft Build conference, an event focused on people like developers and power users who build things with Microsoft tools.
I’ve never attended Build in person before because the data part of the event tends to be fairly thin, and the releases for Azure SQL DB and SQL Server aren’t usually tied to Build’s dates. This year, it’s a hybrid event, both in-person in Seattle and online.
I’m at home in Vegas, attending virtually, and I’ll live-blog the keynote. Refresh this page starting at 9AM Pacific, noon Eastern, to see my thoughts on news as it comes out.
The first bits of news are already starting to trickle out: this morning, the SSMS release notes were updated to mention Microsoft Fabric SQL Endpoint and Fabric Data Warehouse. Yes, those would be new products, and yes, Microsoft already has something called Fabric, but this is different. If you’re bored before the keynote, you can go through this morning’s GitHub check-in for the Build event.
You can join me, but to do it, you’ll need a free registration for Build, so head over there now before the keynote starts at 9AM Pacific.
8:45AM: Based on this morning’s check-ins, looks like they’ll be announcing Hyperscale databases in elastic pools. Each Hyperscale elastic pool supports up to 25 databases on standard series hardware, max 100TB data in the pool. Each database is still stuck at 100MB/sec throughput on the log file though, and even worse, log throughput maxes out at 130MB/sec across the entire pool.
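To put that pool-wide cap in perspective, here's a quick back-of-the-napkin sketch in Python. It only uses the numbers quoted above. The even split between busy databases is my assumption for illustration, not something the documentation promises, and the function name is made up.

```python
# Limits quoted from this morning's documentation check-in.
POOL_LOG_CAP_MB_S = 130    # log throughput across the entire pool
PER_DB_LOG_CAP_MB_S = 100  # log throughput for a single database
MAX_DBS_PER_POOL = 25      # databases per pool on standard series hardware


def per_db_log_share(busy_dbs: int) -> float:
    """Best-case MB/sec per database when busy_dbs databases write at once.

    Assumes the pool cap is split evenly, which is a simplification.
    """
    if not 1 <= busy_dbs <= MAX_DBS_PER_POOL:
        raise ValueError(f"busy_dbs must be between 1 and {MAX_DBS_PER_POOL}")
    return min(PER_DB_LOG_CAP_MB_S, POOL_LOG_CAP_MB_S / busy_dbs)


if __name__ == "__main__":
    for n in (1, 2, 5, 25):
        print(f"{n:>2} busy databases: {per_db_log_share(n):.1f} MB/sec each")
```

With just two databases writing heavily at the same time, each one is already down to 65MB/sec, and a full pool of 25 busy databases would get about 5MB/sec apiece.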
8:49AM: From the new documentation: Fabric Data Warehouse “provides two distinct data warehousing experiences. Each Lakehouse automatically includes a SQL Endpoint to enable data engineers to access a relational layer on top of physical data in the Lakehouse, thanks to automatic schema discovery. A Synapse Data Warehouse or Fabric Warehouse provides a ‘traditional’ data warehouse and supports the full transactional T-SQL capabilities you would expect from an enterprise data warehouse. Either data warehousing experience exposes data to analysis and reporting tools using T-SQL/TDS end-point.”
8:54AM: From the documentation update list: “Optimized locking available in Hyperscale – Optimized locking is a new Database Engine capability that offers an improved locking mechanism that reduces lock memory consumption and blocking amongst concurrent transactions. This fundamentally improves concurrency and lowers lock memory. Optimized locking is now available in all DTU and vCore service tiers, including provisioned and serverless.”
8:58AM: Thoughts after reading the GitHub check-ins so far: this looks like yet another iteration of Microsoft’s data warehousing strategy that just can’t maintain focus for 3 years straight. From DATAllegro to Parallel Data Warehouse to Hadoop to Analytics Platform System to Azure SQL Data Warehouse to Azure Synapse Analytics to Big Data Clusters, there’s something broken about the leadership vision here. I feel sorry for folks who have to sell Microsoft data warehousing with a straight face: before the deployment finishes, the product’s already been “reinvented” again.
At the same time, I’m also so happy to be working in the relational database space. The language is stable, the product is stable, and I don’t have to tell clients to keep changing the way they access the database. Thank goodness for that.
9:05AM: Hmm, I thought the keynote started at 9, but they’re still running promo videos. Hmm.
9:09AM: Okay, I think this is actually supposed to be the keynote – they’re showing videos of people interacting with AI.
9:10AM: Satya Nadella took the stage and talked about his first Microsoft developer conference. He flashed back through big moments in computer history like The Mother of All Demos, the PC, client/server computing, etc. “All of this has been one continuous journey.” And a hell of a ride it’s been so far.
9:14AM: Satya called ChatGPT’s launch the Mosaic moment of this generation. I think that’s fair, but I had to chuckle – few people remember Mosaic. It was an early web browser that’s long since fallen by the wayside. If that happens to OpenAI, Microsoft is gonna be pissed about their multi-billion-dollar investment.
9:15AM: “We’re gonna have 50+ announcements, but I want to highlight 5 of them.”
- Bringing Bing to ChatGPT. (No claps.) Satya: “You can clap.” (Claps, awkward)
- Windows Copilot. I don’t think Cortana ever did that well on Windows desktops – at least, I never see anybody using it – so it makes sense to throw something else at it instead. For corporate PCs with security lockdown, this gives Microsoft another O365 revenue stream, because I’m sure they’ll offer a “secure” Copilot that doesn’t use your documents for training.
- Copilot stack. So other folks can build Copilot for their own infrastructure using Microsoft’s models and AI infrastructure. Totally makes sense given Microsoft’s developer focus – if they can make this easy in Visual Studio, then it stands a chance. I was just horrified by the demo, though: using Copilot in Office, taking legal advice from ChatGPT in Word. I can’t imagine how that might backfire. (Who the hell thought this was a good idea for a demo?!?)
- Azure AI Safety. Testing, provenance, and deployment.
- Microsoft Fabric. “The biggest data product announcement since SQL Server.” Unified storage and compute, unified experience, unified governance, and unified business model.
9:32AM: Microsoft Fabric looks like a data lake where you have a team who governs what goes in & out, regulates the schema and security, tracks the data lineage, and curates the data model. So, uh, a data warehouse?
9:35AM: Satya’s doing human storytelling, so I’ll focus on Fabric for a second here. Fabric is a story about what happens when your data is well-controlled. That was the story of the data warehouse 20 years ago: it solved exactly the same pain points. Data warehouses fell out of favor because there was too much data, changing too quickly, and the tools changed too quickly.
Data lakes became popular because people wanted to just dump the data somewhere and figure things out later. Over time, that ran into the same problems that we used to have before data warehouses: the data wasn’t reliable, we didn’t know where it came from, the changes kept breaking reports, etc. So now, Microsoft Fabric is fixing the same problem with data lakes that data warehouses fixed with scattered relational databases.
Will it catch on? Maybe – data warehouses did – but you can fast forward and see what’s going to happen when Fabric is popular. Users will say, “I have this extra data that I need to join to my reports right now, and I don’t have the time to wait for the Microsoft Fabric admins to bring it in, so I’m just going to put it in this one place for now…”
And we’re right back where we started. Okay. If your company couldn’t fix the data warehouse’s problem, and they added data lakes, and they couldn’t fix those problems, so now they’re implementing Microsoft Fabric… I’m just gonna say maybe the problem isn’t the product you’re using.
Does that mean Fabric is a bad product? Not at all – it might be great – but it’s definitely not something I’m going to pursue.
9:40AM: Kevin Scott, CTO & EVP of AI at Microsoft, took the stage to talk about the era of the AI copilot for the next half-hour. That’s a great topic, but it’s not really my jam, so I’m going to stop the live blog here. Right now, Build’s session catalog doesn’t have any Fabric sessions, but I wouldn’t be surprised if sessions got added over the next hour or two. I’m not going to dig more deeply into those either.
Update: Optimistic Afternoon Thoughts
When I walked away from the computer and emptied the dishwasher (true story), I realized I wasn’t being completely fair to Fabric. There are companies who:
- Successfully implemented a secure, well-documented, rigid data warehouse, and
- Also implemented Azure Data Lakes later, and
- Now want to control those lakes the same way they control their data warehouse
And for companies like that, Fabric makes a lot of sense. I don’t have a sense for how big or small that market is today, but I’m sure it’s out there – it’s the kind of thing Microsoft BI consultants would facilitate.
I think this also plays to Microsoft’s strengths: they control the cloud, the most common relational databases, the reporting tools, and the development tools. You could make an argument that Fabric stitches those pieces together in a way that Amazon and Google won’t be able to do for years, if ever. (Sure, AWS has Redshift, but that’s just a persistence layer – Microsoft is trying to argue that Fabric is a cohesive unit that brings it all together.)
Paul Turley’s a BI pro who specializes in the kind of market that Fabric serves, and he has a quick summary here, plus a list of learning resources. Note the number of different tools involved in his posts and the links – Fabric isn’t just “one thing”, it’s a brand name for a whole bunch of moving parts that have been flying under different brand names over the last few years. Fabric feels like the latest brand name and vision – and that’s where I get nervous, seeing how Microsoft keeps reassembling these parts into different things.