
StackOverflow VC, SQL Server, and Whuffie

Yesterday StackOverflow announced that they’d accepted $6 million in venture capital funding.  Joel Spolsky held a quick public chat to discuss it, and there were some interesting questions from the audience.  I’m going to paraphrase some of the questions and give my own answers.

Q: Now that StackOverflow is going to be big, will they need to dump SQL Server for NoSQL?

If you were going to write a list of things you should never do to a SQL Server that needs to perform fast, StackOverflow would check a lot of the boxes:

  • Using LINQ and Full Text Search heavily for queries
  • Storing data on SATA drives
  • Putting both data and logs on the same drive (not to mention the full text catalogs)
  • Using one server, no fancy replication for load balancing

And SQL Server handles the load just fine.

Q: What? I thought M$$QL was the suxx0rz?

Two things make all the difference.  First, it’s not really all that much load.  StackOverflow is the smallest SQL Server database I work with by an order of magnitude.  Most of my SQL DBA readers manage much bigger databases on a daily basis and yet consider themselves to be junior DBAs.  There’s a disconnect between what programmers see as big data and what enterprises see as big data.  For example, in the last two weekends, I’ve done performance tuning gigs for two separate companies that had more than 20x as much data as StackOverflow, yet didn’t have a full-time database administrator.

Second, the staff really knows what they’re doing.  They know you’re supposed to cache frequently reused data in the app tier, for example – sounds obvious, but it’s trickier than it sounds.  If you’re really good – and I don’t mean “I’ve got a blog” good – you can build amazing stuff with just about any tool.  You could build something of StackOverflow’s size on any database platform out there.
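To make the app-tier caching idea concrete, here’s a minimal sketch of a time-based cache.  This is not StackOverflow’s actual code; the `TTLCache` class, the `load_front_page` loader, and the 60-second lifetime are all made up for illustration.  The tricky part in real life is deciding how stale each piece of data is allowed to get.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: keep a value for ttl_seconds, then reload it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]      # still fresh: no database round trip
        value = load()         # missing or expired: run the real query
        self._items[key] = (now + self._ttl, value)
        return value


# Hypothetical loader standing in for an expensive database query.
calls = {"count": 0}


def load_front_page():
    calls["count"] += 1
    return ["question 1", "question 2"]


cache = TTLCache(ttl_seconds=60)
cache.get("front_page", load_front_page)
cache.get("front_page", load_front_page)
print(calls["count"])  # prints 1: the second call was served from the cache
```

The clock is passed in as a parameter only so the expiry logic can be exercised without actually waiting.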

If you think your code can’t scale because of the language or database, you’re probably doin’ it wrong.

Q: How much of that $6 million did you get?

None.  I’ve never been paid by StackOverflow.  If I were Joel and Jeff, I’d give money/stock/cocaine to the community moderators long before I gave it to Brent Ozar.  My work is tiny compared to the moderators’, and I’m glad (although a little sad) that they recently revamped the StackOverflow About page to reflect that.

That’s right – I get paid in pixels.

Q: Awesome, here’s a picture of bacon. Now I need your help with…

No.  I help with DBA work at StackOverflow for the same reason you answer questions there.  When you post an answer, add tags, or help clarify questions, your reputation score goes up.  You don’t make money on it directly – it’s just fun doing it.

But as your score gets higher, you can use it in ways that don’t seem immediately obvious yet.  I touched on this in my recent Rock Stars, Normal People, and You post.  Jon Skeet is an extreme example – he can probably walk into any geek gathering, show his ID, and people will start buying him drinks.  If he posted a tweet saying he was looking for work, you’d better hope you don’t have anything pending at the printer, because an army of programmers will be printing up Jon’s resume immediately to run into their boss’s office and say, “WE GOTTA HIRE THIS GUY!”

The Whuffie Factor Explains Everything

Your StackOverflow score is your living resume.  It’s like whuffie in Down & Out in the Magic Kingdom – it’s a currency that you can use to get things.  When you get to a high enough score, you can trade it for things – things like consulting gigs.  Companies will look at something like StackOverflow, look up the C# tag, and find the highest rated people.  They’ll review your answers, see the high votes from your peers, and then check your availability for short-term consulting – perhaps even just a single hour.

Not every question can be asked in public, and not every answer can be given without spending time in the client’s systems.  Most importantly, those questions and answers are where the most money changes hands.

Q: But your StackOverflow score sucks.

Yep, I’m part of the old guard.  I’m 36.  My generation had/has a different way of measuring reputation, and frankly, it sucks.  We gauge reputations based on personal relationships with people we’ve met, usually in person but sometimes through social networks.  We’re limited to a smaller group of experts on any given topic.  When I need help with something, I have a fairly limited number of trusted people I can call on.

I work (a little) on StackOverflow for free because it’s the old-school equivalent of a reputation score.  People have come to me and said, “I hear you’re the database guy for StackOverflow – what would it take to get you to help me with ___?”  That’s why I’m quite happy to take my pay in pixels, and why I know that your high StackOverflow or ServerFault reputation score will be worth money down the road.

How much would you pay for one hour of Jon Skeet’s time?

What if you had a tough C# question and you couldn’t show your code in public?  What if you wanted to listen to him do training presentations about what he knows?  Would you pay real money for that?  I know that you would, because people are paying to attend SQLCruise with me and Tim Ford, and people lined up to pay $99 to get into StackOverflow DevDays.  What if you had a really high StackOverflow/ServerFault reputation for a given tag, and you organized an event like SQLCruise or DevDays for your own tech interest?

Reputation is everything.  This is why I get so excited about StackOverflow’s reputation scores – conventional forums failed not just because they’re painful to navigate, but because they didn’t measure things.  When you measure reputation, you enable all kinds of ways to make money.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Querying the StackOverflow Data Dump

StackOverflow, ServerFault, and SuperUser are Q&A sites for IT professionals.  I’ve blogged about why I like ServerFault before, but perhaps one of the coolest reasons for database people is that they make their data public.  Every month, StackOverflow dumps out their data to XML.  You can import the data dump into SQL Server, and the whole thing is less than 10 GB as of this writing.
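If you’d rather poke at the XML before loading it anywhere, here’s a rough sketch of reading it a row at a time.  It assumes each file is a flat list of `<row ... />` elements with the column values stored as attributes; the sample data and the `Id`, `Score`, and `Title` attribute names below are invented for illustration, so check them against the actual dump files.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def iter_rows(source) -> Iterator[Dict[str, str]]:
    """Yield each <row> element's attributes as a dict, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # drop the element's contents once we have a copy


# Made-up sample in the assumed layout; pass a real file path instead.
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" Score="12" Title="Example question" />'
    b'<row Id="2" Score="3" Title="Another question" />'
    b'</posts>'
)
rows = list(iter_rows(sample))
print(len(rows), rows[0]["Title"])  # prints: 2 Example question
```

Streaming with `iterparse` rather than loading the whole document matters here, because the full dump is far too large to parse comfortably in one gulp.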

But you haven’t done that because you’re lazy.

You just want to open up SQL Server Management Studio 2008 or Toad for SQL Server and connect to the database.  Alright, you got it:

  • SQL Server: brentozar.dyndns.org (as of this writing, it’s 71.57.120.247 – if the name doesn’t work for you, try the IP)
  • Username: StackOverflow_Reader
  • Password: c0mm0ns
  • Databases you can access: ServerFault, StackOverflow, StackOverflowMeta, SuperUser

The data is not a “live” copy – it’s just the monthly Creative Commons data dump, and the schema is the raw output from Sam Saffron’s data dump tool linked above.  The server is a desktop-class machine, nothing fancy, and it’s using my home internet connection.  (Yes, that’s why I’m posting this halfway through the day on Friday – easing into the load.)  You can get a snapshot of my current desk gear and my servers at Flickr.

If you can’t connect to the SQL Server, there’s probably a firewall blocking port 1433 between your workstation and my lab.  Please don’t leave a note to complain – try accessing it from another location, like from your house instead of your work.
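A quick way to tell a firewall problem from a login problem is to test whether the port opens at all.  Here’s a small sketch using plain TCP; the `can_reach` helper is my own invention, and it only proves the port is reachable, not that SQL Server will accept your login.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example call (not run here, since it needs network access):
#   can_reach("brentozar.dyndns.org")
```

If this returns False from work but True from home, the firewall at work is the likely culprit.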

To learn more about the schema and how to query it, check out the related articles at SQLServerPedia.

ServerFault.com – Like StackOverflow, but for IT

I hate forums and newsgroups.  I can’t do a better job of explaining why than Tom LaRock did in his post Why I Dislike Newsgroups, including this bullet point list:

  1. It takes too long to get an answer, especially if you need an answer quickly.
  2. Sometimes, people are quite rude.
  3. Most times, the answers are flat out wrong.
  4. Many questions are not being asked in the right forums.
  5. Moderators spend far too much time moving questions between forums.
  6. End users get frustrated when their questions are moved.
  7. You do not know who you can trust.
  8. You can review threads later, but have no idea which answer was correct.

Amen.

The Solutions: StackOverflow.com and ServerFault.com

StackOverflow is for programmers, and ServerFault is for IT workers (sysadmins, Exchange guys, SharePoint folks, network people, rack monkeys, etc).  They both work the same way:

  • A user submits a question.
    • Other users can comment on the question, thereby encouraging the asker to clarify their question or improve it.
    • Users can tag the question with multiple tags.  This replaces the old group-based forums where you had to move a question around between multiple groups trying to find the right user base to answer it.  A question might involve C#, SQLServer and SQLServer2008, for example, and tagging with all three gets the right audience involved.
    • Users can vote the question up or down.  Highly rated questions get more attention by floating to the top of the question list, and poor questions (those with too little information, or inflammatory ones) sink down toward the bottom of the list.
  • A user submits an answer.
    • Other users can comment on the answer, and vote it up or down.  High-ranking answers move to the top of the answer list.
    • Users with high reputation counts can edit and improve the answer.
  • The questioner accepts an answer. This moves it to the very top of the answer list.
  • Other people search the web for similar questions, and end up at StackOverflow or ServerFault.  They see an elegantly arranged list of answers with the best answers at the top.
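The ordering rule in that workflow (accepted answer first, then everything else by votes) is simple enough to sketch.  This is only an illustration of the rule as described above, not the sites’ real implementation; the `Answer` class and `display_order` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    author: str
    score: int = 0
    accepted: bool = False

    def vote(self, delta: int) -> None:
        """Apply an up-vote (+1) or a down-vote (-1)."""
        if delta not in (1, -1):
            raise ValueError("a vote is +1 or -1")
        self.score += delta


def display_order(answers: List[Answer]) -> List[Answer]:
    """Accepted answer first, then the rest by score, highest first."""
    return sorted(answers, key=lambda a: (not a.accepted, -a.score))


answers = [
    Answer("alice", score=2),
    Answer("bob", score=7),
    Answer("carol", score=4, accepted=True),
]
answers[0].vote(+1)  # alice goes from 2 to 3
print([a.author for a in display_order(answers)])
# prints ['carol', 'bob', 'alice']
```

Carol’s answer has fewer votes than Bob’s, but the questioner accepted it, so it goes to the very top.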

StackOverflow and ServerFault make it easy for users to get their questions answered and make it easy for answerers to find relevant questions to work on.  Even better, the reputation system rewards everybody’s work: every time a question, comment or answer is voted up or down, it helps record who’s doing good work and gives their work more credibility in the site.

Long-term, I think both of these sites will function like resumes for programmers and IT workers.  If someone wants to work for me, and they can point to a long history of answering questions on sites like this, with steadily increasing reputations and good answers, that’s better than a letter of reference.  It shows intelligence plus dedication to the community.

Getting Started with StackOverflow and ServerFault

StackOverflow has been open for a while, so to get started you just surf over to StackOverflow.com and register for an account with your OpenID.  OpenID is what the old Microsoft Passport system always aimed to be, a single login good for anywhere, except that you actually control your own OpenID.  You can run an OpenID on your own web site, like I do at BrentOzar.com, or you can use one from Google, AOL or Yahoo, among other providers.

ServerFault is brand spankin’ new, and it’s in a semi-private beta for the next week or so.  Go to ServerFault.com and give the secret password “alt.sysadmin.recovery” to register for an account with your OpenID.

And by the way, the race is on – the current high-scoring ServerFault user is Stefan Plattner, aka @SPlattne on Twitter, and he’s positively smoking both me and K. Brian Kelley, and even Jon Skeet for that matter.  If you’re going to compete, use TweetDeck and set up a search column for serverfault OR stackalert.  That term combination will catch alerts from the Twitter bots @ServerFault and @StackAlert, which tweet whenever there’s a new question.  Using a separate column, rather than actually following these accounts, will keep your friend stream clean.
