You don’t have a Big Data problem.

Let’s bust the buzzword bubble. Big Data is a sexy problem to have, so everybody’s claiming it. I’m sick of people using this phrase to gloss over their real challenges.

You have a Small Server problem. If you haven’t bought a new server in the last two years, you’re not allowed to complain about the size of your data. You can buy a 4-socket server with 1TB of memory for less than what it’ll cost to bring in a fancypants Big Data consultant for a month or two.

You have a Slow Storage problem. That massively expensive SAN you bought five years ago is eclipsed by a single consumer-grade SSD today. Run CrystalDiskMark on your SSD-equipped laptop, and then run it on your database server. If the laptop’s faster, does that mean your MP3 collection is Big Data?
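If you can’t install CrystalDiskMark on one of the machines, a crude stand-in is to time a sequential read yourself. The sketch below is an illustration under stated assumptions, not a real benchmark: the function names `write_test_file` and `sequential_read_mb_per_sec` are made up for this example, and because the test file was just written, the operating system’s cache will probably serve much of the read, so the number flatters your storage. Treat it as a quick side-by-side sanity check between two machines, nothing more.

```python
import os
import tempfile
import time


def write_test_file(size_mb=64):
    """Create a temporary file of `size_mb` megabytes and return its path."""
    block = os.urandom(1024 * 1024)  # 1 MB of random bytes, reused per write
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and OS write buffers
    return f.name


def sequential_read_mb_per_sec(path, block_size=1024 * 1024):
    """Read the file front to back and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return (total_bytes / (1024 * 1024)) / max(elapsed, 1e-9)


if __name__ == "__main__":
    path = write_test_file()
    try:
        print(f"{sequential_read_mb_per_sec(path):.0f} MB/s sequential read")
    finally:
        os.remove(path)
```

Run the same script on the laptop and on the database server, with a test file large enough that it can’t all sit in memory, and compare the two numbers.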

I have a Big Calorie problem.

You have an Awkward ETL problem. Call it “unstructured data”, but it’s really sloppy data modeling. It’s completely okay to save money on design and pass the savings on to – wait, actually, you don’t save money. If you don’t structure your data, everything else you do costs more money. Anytime you want to interact with that data, it’s going to cost extra – but it’s not the size of the data that’s the problem.

You have a Small Documentation problem. Don’t think that you can hire one 18th Level Data Scientist Mage and suddenly get amazing insights from your data. The first thing he’ll need to do is talk to the rest of your staff about the sources of the data, the timing, and the quality. They won’t remember offhand why the mainframe’s Gross Net Subnet Tax Top Line is different from the accounting system’s Net Gross Top Tax Subnet field. You can’t Hadoop your way out of that one.

You have a Small Momentum problem. Every few years, there’s a new buzzword to describe how your data is going to become magically actionable. Business Intelligence! No, wait, Self-Service Business Intelligence! No, wait, Data Visualization! No, wait, Big Data! Before you embark on the next one, take a critical look at why past data initiatives in the company have failed, and I bet it doesn’t have anything to do with the data’s size.

When someone wants to talk Big Data to you, ask what they’re trying to sell you. Odds are, they’re really trying to solve your Big Wallet problem.

33 Comments

  • Being in that unique position of being the DBA, the developer, the Systems integrator and all around automation guy, I have a lot of data I can produce. I could give us a big data problem by just tracking everything. But I think we’re better off just tracking only the things we really care about.

    So some people might have a big data problem, and their problem is that they need to get rid of it.

    • As a fellow Jeff, I totally agree. 🙂 Simply put, many organizations have a Useless Data problem; data collected to say they’re collecting it, but no real use for it at all.

      And, thank you Brent! I had thought I was alone in this thinking, and you’ve shone a light in the darkness! I used to process more data than many of today’s “Big Data” companies when I was working in mainframe-based data processing 20 or so years ago. Big Data is not a Big Thing. 🙂

  • Brilliant and concise. Clients tell me all the time that they have big data problems. I chuckle on the inside then spend the next couple of months modeling their data and fixing their ETL problems so it’s usable.

  • Love it, nice one Mr Ozar 🙂

  • Thank you, sir! It’s so refreshing to see others agreeing with me. I have been in the data warehousing and business intelligence niche for close to 8 years now. The whole “Big Data” buzzword and talk has irritated me from day one. In my experience, the term “Big Data” actually describes data that a person (or lvl 15 data scientist) wants to mine outside of the walls of the company. They want to pair this data up with internal data and analyze. As you said, just like any other clean ETL process, we can identify what needs to be mined, structure it as we see fit, and spend the money to do so in a responsible way!

    So coming from a self-proclaimed BI professional, this “Big Data” problem isn’t a problem with the vastness or the “unstructured” nature. The problem is with the industry analysts promoting the buzzword and with the industry leaders latching on to it. I fully agree with your post, you are a gentleman and a scholar!

    Now I can’t wait to get to my summit next week where “Big Data” will be one of the topics. I’m sure I’ll be viewed as a lunatic but I’ll absolutely stand by my opinion, which you are validating. Perhaps I’m in the wrong business. Does Brent Ozar Unlimited wish to venture into the “Big Data” money pot? I’ll split it with you and gladly take people’s money so I can structure their “big data”. 🙂

    • Anonymous Coward
      April 24, 2014 11:50 am

      It all depends. Big data can be real. Think for example of a utility company legally required to store readings for years, which also wants to do some analysis on it (like what influence do air temperature, amount of rain, wind speed, geolocation, demographics and whatnot have on demand, for example). Or a stock trading company which needs to sample tens of thousands of stock prices, each one with several dozen additional parameters (like the amount of rain in South America over the last month for an international beef trader, or the number of collapsed coal mines in the last six months for a steel manufacturer) and feed them into some predictive software. In the first case, it’s sheer size that makes the data big. In the second, size might be less of an issue, but relational databases are an awful fit for storing the data, due to its variability in shape, and the fact that even the data shapes of different data sources can frequently change.

      So yes, many companies don’t have a big data problem. But there are some that do have a big data problem, even if they’re not Google or Facebook. There’s a reason the whole Hadoop family gets such good support from the open source community – it’s useful.

  • Keith Erskine
    March 7, 2013 9:01 am

    Too true, Brent. Big data sets (relative to the available technology) are not new. It’s always been a challenge managing them. What is new is that there is now an expectation that data is available 24/7 with no down time at all, and that the most recent data is immediately available. So it’s not so much a size problem (although of course size always matters) as an availability problem, but the phrase Big Data doesn’t distinguish between the two.

  • Dave Wentzel
    March 7, 2013 9:10 am

    Best blog post EVER.

  • Businesses love to hear that you can just buy a solution that will solve all of their problems.

    “Just give me one of those big data solutions and a side of data scientists. That will solve it!”

  • Kevin Boles
    March 7, 2013 9:31 am

    Good one about the SSDs! I get a combined 900MB/sec sequential read rate off of the two SSDs in my 1yo laptop – EASILY smoking all but 1 or 2 of the clients I have ever been at for SQL Server consulting! And I also definitely agree about new servers (with sufficient IO and RAM) being able to chew through VAST quantities of data VERY quickly and cost effectively – even without enterprise edition features such as column store indexes!

  • I would add that it’s not a big data quality problem so much as it is a clean data quality problem. The inability to do anything with big data (which as you so eloquently point out – is not new), is related to the fact that companies have not bothered to build a system to effectively clean, link and/or integrate the data they already have. Plus, all of that unstructured stuff – needs something to relate back to – like say, a person or a company or some relevant group so that you can decide what to do with it. Here again, not helpful if you can’t reliably say that your contact data is clean and then effectively match them up to all that big data flowing in. The real value that can be had from all the hype is that each time a new buzzword appears on the horizon, it facilitates a slight incremental increase in attention for the value of data in general.

  • Yes. Yes, Thank you. And yes again.

  • So true…. It’s related to the analytics DBs too

  • Finally! A real-world take on so-called Big Data, minus the usual crock of sh#t being parroted by so many — the gist of which is that the use of formal data structures is an indication you’re Doing Big Data Wrong, and that Big Data inherently eliminates the need for ETL by allowing you to develop your models on unstructured data and this magic cookie called Hadoop will take care of the rest!

  • All good points – generally pointing to the pattern “do the simplest thing that works.” Hadoop is never going to be the simplest path to getting information out of a heap of data. But it’s there when you need it. If you’re using a handsaw to cut down a 100-year-old oak tree, sharpening the blade will help, but not nearly as much as grabbing that chainsaw out of the toolshed.

    I have clients with truly big data who are considering using Hadoop in an ETL role – to make faster work of converting unstructured data into structured. Which is perfectly valid and a thoughtful use of technology.

    As with anything else in our industry, the right answer always depends on many variables, and is rarely apparent at first glance.

  • Awesome blog post….

    I feel like there may be some misperceptions about the term “big data” (which I also HATE) being confused with “large amounts of data”…

    I don’t see them as the same thing… Big Data (to me) means large data sets that have TWO separate characteristics…

    1) They are large and rapidly growing in nature (OK – that’s cool); but also
    2) The data is characterized by rapidly (as opposed to slowly) changing dimensions – using dimensions in the OLAP sense of the term;

    It’s the fact that you are dealing with large amounts of data, rapid growth in said data, and that the data has rapidly changing dimensions that makes Big Data both a different animal AND a difficult thing to deal with when it comes to data modeling and analytics…

    FWIW

  • Jeff Goldberg
    March 8, 2013 12:19 pm

    Yeah, this blog post is not only spot on, but it’s pretty freakin’ awesome to boot! Well done.

  • So who’s gonna tell the ‘manager’ that just because it doesn’t fit in Excel, it’s not Big Data?

  • Vassilis Ioannidis
    March 9, 2013 5:26 am

    Brent, you nailed it once again! Kudos!

    P.S. can you fedex-me a burger like this here in Greece? 😛

  • I think that’s what makes Brent’s organization so good: they all know how to write and communicate, as well as all the technical stuff.

    As far as the big data topic goes, he is 110% correct. I just got a call from a developer, asking how to limit access to sensitive customer data in the data warehouse. My answer should be, not to load it. Companies load everything into the warehouse when in reality, it serves no real purpose to keep that level of detail for every system in an organization.

  • Well I for one take issue with this article! Once the Web 2.0 arrives, anybody who doesn’t have the latest Big Data installed along with Rich Content in the Cloud is going to feel pretty silly, let me tell you!

    😉

  • I completely and wholeheartedly disagree with you! You, sir, are wrong, Wrong, WRONG!

    There’s no possible way that two waffles, two eggs, a sausage patty, and a slice of cheese is an argument for a “Big Calorie Problem” — you’re just blowing this way out of proportion. In fact, at most, you’re probably looking at a sandwich with a total calorie count of maybe 600 calories, depending upon whether that thin slice of cheese poking out is Velveeta or cheddar. Now to some, 600 calories may sound like a bit much for a breakfast sandwich, but my suspicion here from assessing the bright sunlight in the photo and the coffee-crazed eyes is that this meal is actually a mid-morning one, after you’ve had 62+ ounces of coffee, and is probably breakfast and lunch combined. The picture and caption have been combined to mislead your readers into thinking you have a “Big Calorie Problem” when in fact, you simply have highly demanding taste buds.

    Let’s not stretch the truth any more, shall we?

    Cheers.

  • So true! (as person who runs on SSD) – “Run CrystalDiskMark on your SSD-equipped laptop, and then run it on your database server. If the laptop’s faster, does that mean your MP3 collection is Big Data?”

  • My last boss almost spent *way* too much on a suite of big data apps that were supposedly going to solve all of our IT cost allocation, modeling, and performance monitoring problems.

    I asked, “How will we deal with the fact that we don’t have the capability to capture most of the data we’ll need to feed to those apps?”

    His answer? “That’s what their consulting services are for,” (!!!)

  • Stefan Hoglund
    October 30, 2013 1:00 pm

    Just found this blog post when searching for small vs. big data, thanks for writing it. This is key and I have never really seen it done: “Before you embark on the next one, take a critical look at why past data initiatives in the company have failed”. I guess wrapping up initiatives in new terms makes them easier to sell internally and enables people to wave their hands and explain why any previous initiative would have failed.

  • A real witty one. Taught me more than a dozen techy texts.

  • I’m participating in the teardown of a very poorly funded SQL Server implementation. They think they need Big Data! So they hired some PhDs from LA and are busy rewriting our app. They say it will be faster but never say why or how. We’re running cheap Standard Edition SQL Servers on standard drives on Amazon EC2. Every possible pitfall you could imagine. I keep saying the servers don’t meet the minimum configuration for SQL Server but nobody listens. Last guy who made a stink got fired. Thanks for giving words to the voices in my head. 🙂
