Leaked: SQL Server 2019 Big Data Clusters Introduction Video

Psst – you’re probably not supposed to see this yet, but look what @WalkingCat found:

What the video says

Growing volumes of data create deep pools of opportunity for those who can navigate it. SQL Server 2019 helps you stay ahead of the changing tide by making data integration, management, and intelligence easier and more intuitive than ever before.

Yep, that’s a Microsoft video alright.


With SQL Server 2019 you can create a single virtual data layer that’s accessible to nearly every application. PolyBase data virtualization handles the complexity of integrating all your data sources and formats without requiring you to replicate or move it. You can streamline data management using SQL Server 2019 Big Data Clusters deployed in Kubernetes. Every node of a Big Data Cluster includes SQL Server’s relational engine, HDFS storage, and Spark, which allow you to store and manage your data using the tools of your choice.

Big Data Cluster

SQL Server 2019 makes it easier to build intelligent apps with big data. Now you can run Spark jobs to analyze structured and unstructured data, train models over data from anywhere with SQL Server Machine Learning Services or Spark ML, and query data from anywhere using a rich notebook experience embedded in Azure Data Studio. The torrent of data isn’t slowing down, but it doesn’t have to sink your business. Sail through with SQL Server 2019, and shorten the distance between data and action.

My take on the Big Data Clusters thing

<sarcasm> It’s like linked servers, but since they don’t perform well, we need to scale out across containers. </sarcasm>

Today, PolyBase is a rare and interesting animal. You’ve probably never used it – here’s a quick introduction from James Serra – but it wasn’t really targeted at the mainstream database professional. It first shipped in PDW/APS to let data warehouses run queries against Hadoop, and it was later added to the boxed product in SQL Server 2016.

PolyBase is for data warehouse builders who want to run near-real-time reports against data without doing ETL projects. That’s really compelling to me – report on data where it’s at. That seems like a smart investment as data sizes grow and our willingness to move it decreases.

I like that Microsoft is making a risky bet, planting a flag where nobody else is, saying, “We’re going to be at the center of the new modern data warehouse.” What they’re proposing is hard work – we all know first-hand the terrible performance and security complexities of running linked server queries, and this is next-level-harder. It’s going to take a lot of development investments to make this work well, and this is where the licensing revenues of a closed-source database make sense.

If you want to hitch your career caboose to this train, there are all kinds of technologies you could specialize in: machine learning, Hadoop, Spark, Kubernetes, or…just plain SQL. See, here’s the thing: there’s a whole lot of SQL Server in this image:

Big Data Cluster

If you’re good at performance tuning the engine, and this feature takes off, you’re going to have a lot of work to do, and the licensing costs of this image make consulting look inexpensive. This feature’s primary use case isn’t folks with Standard Edition running on an 8-core VM. (I can almost hear the marketers wailing, “But you COULD do it with that,” hahaha.)
