Big Data Clusters: Out of Aces

Last Updated March 1, 2022

When this feature was announced in 2018, I wrote:

<sarcasm> It’s like linked servers, but since they don’t perform well, we need to scale out across containers. </sarcasm>

I just didn’t get it, and I continued:

I like that Microsoft is making a risky bet, planting a flag where nobody else is, saying, “We’re going to be at the center of the new modern data warehouse.” What they’re proposing is hard work – we all know first-hand the terrible performance and security complexities of running linked server queries, and this is next-level-harder. It’s going to take a lot of development investments to make this work well.

The thing with bets, as the great strategist K. R. Rogers wrote, is you gotta know when to fold ’em, know when to walk away, and know when to run.

Today, Big Data Clusters died in its sleep.

Back then, I said I liked that Microsoft was making risky bets. They’ve proven time and again that they’re willing to continue to do that with SQL Server, throwing all kinds of crazy features against the wall. Big Data Clusters in Kubernetes. Machine Learning Services on Linux. Calling Java code from T-SQL. I look at a lot of these bets and think, “Uh, I don’t think you can win a poker game with two threes, a Subway discount card, and a happy birthday card from Grandma. That’s not how this works.”

The great part about these utterly wacko bets is that they don’t break the database engine. Microsoft SQL Server is still a robust, powerful persistence layer with a lot of good features. Microsoft hasn’t made dumb moves like saying, “We’re going to replace T-SQL with interpretive dance,” or “We’re not going to run queries if they don’t end in a semicolon.” They’ve figured out that they have to keep the core engine working better and better with each release, too.

If you ask me, five nines is a better hand.

There’s a lesson in here when you gamble on your own career. I beat this drum a lot, but I’m gonna say it again: you only have so many hours in the day to learn new skills. Make sure every hour counts.

17 Comments

Steve Jones
February 25, 2022 9:54 pm

I think this wasn’t a bad idea, and it really seems like the underpinning to Hyperscale and Synapse. Where I think this failed was a confluence of factors:
1. too complex, with many moving pieces
2. lots of infra needed. Already we see people struggling to run K8s on-premises unless they have a team to watch it.
3. poor tooling. Really, I think replication tooling shines brightly compared to this, and that tooling is poor, IMO.
4. not quite enough support to evolve this into an AG Direct seeding type of setup.

Ultimately, I think the PaaS solutions for MPP analytics just make this easier. Why hassle with BDC when Synapse, Databricks, et al. are just easier? Arguably cheaper, too, as you need some staffing here.

- Brent Ozar
  February 26, 2022 3:10 am
  
  Yeah, agreed all the way around.
  
  The tooling is such an intriguing angle because that is indeed one place where SQL Server has been suffering over the last decade. Microsoft keeps adding feature after feature, and the management tooling just hasn’t kept up. I don’t see this as a recent thing, either: rewind back to Distributed Availability Groups, and that’s a great example of how the feature was really complex, but the tooling just isn’t there.
  
- ms
  March 8, 2022 12:03 am
  
  Most DBAs are too risk-averse to try something as big as BDC 🙂
  
  I worked on a BDC project for about one year. The purpose of that particular project was to replace APS/PDW, which is also being retired by Microsoft. With the retirement of BDC and APS/PDW, it’s fair to say Microsoft does not have an on-prem Big Data or MPP solution offering. As an aside, hardware vendors invested a lot in this space and are now not selling BDC servers (tip: search for big data clusters hardware). The hardware market opportunity is gone and customers will likely return the servers that they purchased – not good for hardware vendors, but expected and in line with the larger DB cloud-adoption trends.
  
  From what I witnessed, a lot of good engineering and integration work went into BDC, and lessons learned will likely impact Azure and other on-prem SQL Server offerings in the future. During the engagement, the product team did a great job at listening to the customer and responding to requests. Those of us who are familiar with Hortonworks HDP know the complexity and depth of the Hadoop platform. When you add in VMware, K8s, SQL Server, the SQL Spark Data Connector, and integrations, the complexity increases exponentially and engineering delivery gets harder.
  
  That being said, with the current cloud investment, innovation and growth of cloud data warehouse and analytics offerings from AWS, Azure, Snowflake and GCP, it makes sense to migrate big data platforms to the cloud. DBAs and developers now need to understand products like Azure Data Factory, Spark, Databricks, how to handle batch and streaming data, and understand tradeoffs between MPP platforms (SQL Data Warehouse) vs. other DW products such as BigQuery. We’ll see what happens with Azure Container Services.
  
Phil
February 26, 2022 6:57 pm

Oh no, you done gone and sent me off to country music. And it’s Saturday night. Darlene’s been moaning about the payments on the new trailer all week and I stayed off the moonshine promise all week. Going down Harry’s bar to figure out what’s gone wrong tween me and Darlene and my lil girl on side down at Harry’s ms sql server cute lil thang…

Jimmy May
February 27, 2022 12:29 pm

Brent, having retired a few years ago, I’m out of the loop on the latest-&-greatest SQL Server tech, & I don’t even read your awesome emails every day (shame on me). However, I happened today to do so, & one of your big points resonated with me to my core.
“There’s a lesson in here when you gamble on your own career. I beat this drum a lot, but I’m gonna say it again: you only have so many hours in the day to learn new skills. Make sure every hour counts.”

I built my career from scratch, quickly, with passion & with *intention*. I made big bets on certification & on speaking, from my first Win95(!) certification, to my first SQL Server exam–which nailed me my first full-time DBA job in 2001, literally doubling my salary(!), then an MCDBA back in the mid-2000s including an MCSE to flesh out my skills, & finally to an MS Certified Master–MCM, which led directly to a gig with SQL CAT rebuilding & running the Customer Lab. My intentional work on speaking skills resulted in dozens of opportunities to evangelize best practices, from SQL Saturdays to SQL PASS to SQL Bits to Amit Bansal’s DPS. My successes were layered, built on top of one another, & my “luck” was by design.

So, yes, as you stated, “Make sure every hour counts”.

Keep up the great work!

- Brent Ozar
  February 27, 2022 12:41 pm
  
  Good to hear from you, sir! Thanks for the kind words. Now get out of here and go retire! 😉
  
Josh
February 28, 2022 10:12 am

Interesting. I know a large company which hedged its bets on this platform, presumably because it needed to be on-prem and cloud with large volumes of data. Now their CDO has gone into a tailspin over this announcement.

Additionally, I went to a Microsoft-sponsored tech event a few years ago and the message was to forget everything you knew about MSSQL because containers, big data and Linux were all the rage. Lucky for me I did not drink too much of their Kool-Aid and here I am, 20 years into my DBA career doing good, old-fashioned RDBMS management stuff on Windows!

Francesco Mantovani
February 28, 2022 12:51 pm

It wasn’t a bad idea, but it was too complicated. @SteveJones I think the next one in trouble will be Hyperscale.

How do I know that? If we go to https://customers.microsoft.com/ we can find use case scenarios for Azure SQL Database, Cosmos DB, Azure SQL Edge, Azure Arc, even Azure Table Storage…. but no use case scenario for Hyperscale.

…For whom the bell tolls?

- Brent Ozar
  February 28, 2022 12:55 pm
  
  The challenge with Hyperscale is the ingestion speed. The peak sustained log generation rate is 100 MB/s – that’s slower than a USB thumb drive.
  
  https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale-frequently-asked-questions-faq
  
  I’ve had a few clients try it and give up because of that. You can’t call something hyperscale when it’s slower than a $50 USB thumb drive.
  
Joe Lax
February 28, 2022 1:19 pm

Speaking of features that never seem to make it, Policy Based Management seemed like a good idea when it was first released, yet I haven’t seen any enhancements to it or seen it used to any extent. Would appreciate your thoughts on the matter.

- Brent Ozar
  February 28, 2022 1:29 pm
  
  It’s tough to respond to “your thoughts” – I have so many thoughts, and only so many hours in the day. If you want to ask me a question though, feel free!
  
  - Joe Lax
    February 28, 2022 6:16 pm
    
    Do you think Policy Based Management was a good feature or not? Do you have any idea why it never took off?
    
Francesco Mantovani
February 28, 2022 2:06 pm

@JoeLax, Policy Based Management is a feature. Big Data Clusters is a whole product.

Speaking of features, Static Data Masking is what I needed this week, but just last night I discovered it never made it past the “preview” phase. Doh!
And Microsoft avoided the walk of shame by simply removing all pages about it. But there are still screenshots around.
It was maybe too complex to keep alive. Not sure if there are 3rd party tools that can anonymize a DB statically while keeping the constraints. 🙁

BringTheCat
March 1, 2022 1:28 am

“Uh, I don’t think you can win a poker game with two threes, a Subway discount card, and a happy birthday card from Grandma. That’s not how this works.”

You apparently have not seen the latest Rakuten ads. Throw in a Roomba and that’s exactly how it works.

Chris Bailiss
March 5, 2022 5:39 pm

I remember being at SQL Bits the year Big Data Clusters was announced. My instinct at the time was that there was a lot of hype about it but it just seemed too complicated (or maybe just the tooling was) to gain wide adoption. I get that it was aimed at on-prem – and may have been relevant if you are there – but MPP analytics is so much easier in the cloud, so the feature seemed opposite to the general direction of travel for many companies.

Unfortunately, while SQL Server still remains a great database platform, a few too many of Microsoft’s forays into analytics aren’t successful. I agree with you, it is good to see innovation and experimentation, but Microsoft have pulled services a few too many times now, so that when I see new features/services announced, at the very least I pause to wonder just how long they will last. Anyone remember Azure Data Lake Analytics and U-SQL?

While experimentation is good, sometimes putting a bit more focus on core products would be welcome. In the last year or so, I’ve done some work evaluating Cloud DW services – and Azure Synapse didn’t fare brilliantly. At moderate data volumes (the sort of crossover territory where you could easily run in SQL Server Enterprise Edition) Azure Synapse came out as relatively expensive compared to competitors. Synapse has serverless pricing only for data lake queries and is a bit clunky (e.g. no auto-start/resume/scaling) – both areas where others, e.g. Snowflake, seem further ahead. Snowflake has per-minute serverless pricing, 60-second auto-pause, near-instant auto-resume and auto-scaling – i.e. basically what a consumption-based serverless cloud DW should be.

Snowflake, for example, can be run on Azure – so I wonder if Microsoft have taken the view that it is easier to partner to get the cloud compute revenue rather than build their own?

PS. I 100% agree with the “There’s a lesson in here when you gamble on your own career.” That was exactly the thought I had at that SQL Bits. Since Big Data Clusters was not really relevant to my job – and it probably wasn’t relevant to many organisations (so limited career value) I basically ignored it. I guess a lot of other folks did too. Hype can only get you so far.

- Chris Bailiss
  March 5, 2022 5:54 pm
  
  PPS. I somewhat disagree with Microsoft’s attempt to justify why they created the feature in the first place:
  >> When we first introduced cloud analytics in 2017, many were still investing in on-premises analytical workloads.
  I think they basically misunderstood why that was happening. Many people I spoke to at the time were crying out for better cloud DWaaS capabilities. It seems Microsoft misread customer thinking/behaviour and put their investment in the wrong place…
  