About Brent Ozar

The Chicago-Mac Sailboat Race

The Race Course

Since 1898, sailors have gathered at the Chicago Yacht Club each summer to race their sailboats up Lake Michigan to Mackinac (pronounced “mackinaw”) Island.  The 333-mile course from Navy Pier to the lighthouse makes it the longest annual freshwater sailing race in the world.

The sailboats race around the clock for days as the crew work in shifts, sailing for a few hours and then sleeping for a few. Night sailing, storms, and quiet windless calms make this a memorable experience.

I’m nowhere near qualified enough to get a crew position on one of the real race boats, but I tried it in high school aboard a friend’s cruiser.  At the time, cruising sailboats weren’t technically allowed in the race, but bystanders shuffle down to Chicago and start at the same time as the serious racers.  We lived about halfway up Lake Michigan, so we were proud that we even made it down to Chicago for the start.  We made it about halfway back up before calm winds and a problematic engine made us give up.  I still fondly remember steering the boat in the middle of the night, watching the compass and the stars, talking to friends about what we planned to do with the rest of our lives.

The Hannah Frances

This year, I’m honored to be able to give it another shot.  The Chicago Yacht Club started a separate class for cruising boats recently, and I’ll be aboard the Hannah Frances.  Mike Cook’s a good friend of mine, and he tolerates my complete ignorance of how to tie a knot.  (I was a Boy Scout – how come I know absolutely nothing about how to tie lines together?!?)

We have no delusions of winning, but we do have delusions of finishing.  The Hannah Frances is a wonderful boat rigged for easy shorthanded sailing and relaxed self-tacking, but fast, she is not.

We’re hoping to finish the race in under 4 days, but that means a lot more than 4 days of sailing.  We’re leaving in two weeks – Wednesday, July 16th – for a few days of sailing down to the Chicago starting line on Saturday.  Then it’s four days of sailing up Lake Michigan, a day of partying with the other sailors on Mac Island, and then another few days of sailing back to White Lake.  By the end of it all, the crew will be intimately familiar with the boat and with each other’s quirks.  (Mike’s already warned me that if I want to listen to Death Cab for Cutie, I’d better bring headphones.)

Over the next couple of weeks, I’ll blog a little about race preparations.  Sailboat racing really is a sport, and it’s harder work than it looks.  For starters, I have to go pick up a Tyvek suit to fend off the black vampire flies.

Google Reader tutorial videos at SQLServerPedia

I can’t imagine trying to keep up with SQL Server news without using Google Reader.  It’s a web-based console for all the web sites you want to follow, all in one place.  Every morning, I open it up and within a matter of minutes I know if there’s any important news in the community.

There are other RSS reader tools too, like Bloglines or Outlook, but I prefer Reader because it works so well from anywhere.  I can catch up on my blogs from my iPhone or from any web browser, and I can share articles with friends with a single mouse click.

I put together a set of quick Google Reader tutorial videos over at SQLServerPedia to show what it is and how I use it.  If you decide to give it a shot, let me know what your Google account email address is so I can subscribe to your shared items too.

More Thoughts on Blog Plagiarism

In the aftermath of the InformationFlash plagiarism incident, several questions have come up from the site’s webmaster and from other bloggers.

Is it okay if the plagiarizer isn’t making money?

No.  Authors work really, really hard to create their original content.  Seeing someone else pass it off as their own, whether there’s a charge or not, reduces the value of our hard work.

If I took the whole content of The Manga Guide to Databases and reproduced it here on my blog, I wouldn’t be making a dime off it.  However, I’d be robbing the author of income.  Even if that author was giving away the work for free, the author might be benefitting in a way that I don’t understand yet, so I need to contact the author before republishing their copyrighted work.

Is it okay if I don’t understand my blog aggregation software?

No.  If you pick up a gun, it’s your responsibility to understand how it works. The first time it accidentally goes off and shoots somebody, you might be able to get away with claiming you didn’t know it was loaded.  After several people complain about gunshot injuries, though, you need to put the gun down.

Just as you can go to a local gun club to learn about firearm safety, you can get help with RSS aggregators too.  Post a message in the product’s support forum, contact other users of the product, or post a message on StackOverflow.  But whatever you do, don’t wave that thing around until you understand what you’re doing.

Shouldn’t the bloggers change their feeds to prevent theft?

Bloggers can choose whether to include the full article or just a few words in the RSS feed.  In my series on how to start a technical blog, I recommend using the full article because readers like it a lot more.  They don’t want to click through to read your full article on your site.  (Personally, I hate the holy hell out of blogs who just include the abstract, and their content has to be insanely good for me to subscribe to one of those kinds of blogs.)

Even if the blogger changes their feed to just include an abstract, it still doesn’t prevent syndication sites from stealing content with screen-scraping techniques.  Then the naysayers would say, “It’s the blogger’s fault for not requiring a username and password in order to read the blog.”

If we have another site pop up like InformationFlash, I’ll probably end up including a copyright note at the bottom of every blog entry.  It’ll say something like, “If you’re not reading this article at BrentOzar.com or SQLServerPedia.com, it was stolen.”  I hate doing that, though, because it looks crappy.  It’s like bolting the TV remote to the nightstand.

Is it okay if end users submit the copyrighted blogs?

No.  When the owner of copyrighted content notifies you that your site has their stuff on it, and they want it taken down, you have to take it down pronto.  YouTube is a good example because people try to upload copyrighted data all the time.  If the original content owner files a DMCA complaint at YouTube, then YouTube acts quickly to take the content down.

Just as a side note – if you try to claim some other user uploaded the copyrighted content, you need to be *very* prepared to show database records and web server access logs to prove the site administrator wasn’t the one uploading content.

How come it’s okay when Digg or DotNetKicks does it?

Because those sites don’t publish the full content of the article.  They show the first few words of the article, and if the reader is interested, they click through to the full content of the article on the blogger’s site.

InformationFlash was showing the entire article, start to finish, without even showing the author’s name.  That isn’t promoting the authors at all.  To make matters worse, InformationFlash had a Google PageRank of a whopping zero – meaning it wasn’t promoting anyone other than itself by stealing content.

Then is it okay if the site promotes the bloggers?

No. When you’re taking copyrighted content from bloggers, you have to get their permission first, period.

Some authors are completely okay with you republishing their work as long as you attribute them appropriately and link back to them.  For example, I’ve told SQL Server Magazine they’re free to use any material from my blog as long as they quote me.  (Part of this is a selfish reason: despite what Compete thinks, I’m pretty sure SQL Server Magazine has more readers than I do.)

Is it okay if it’s not illegal?

Even if you register your domain name anonymously and ignore all incoming emails, sooner or later people are going to figure out your real name.  They’re going to post your name in public along with an explanation of what happened.  That kind of information will turn up in Google searches, and it’ll make for very ugly job interviews and client negotiations down the road.

Besides, don’t you want to be successful?  Your site simply can’t become a success by alienating the very people upon whom your site depends for content.  You can be successful by working with the community and making sure everything is a win-win.  It’s not easy, and it’s not cheap, but it works in the long run.

Stealing is easy and cheap – but the long-term outlook is not so good.

How to Take Action When Your Content is Plagiarized

If your copyrighted blog content shows up in whole on another site without proper attribution, as is happening at InformationFlash.com, here are a few steps you can take. IANAL (I Am Not A Lawyer), so YMMV (Your Mileage May Vary).

Send the Webmaster a Cease & Desist Letter

Get a sample cease & desist letter and tailor it to include your own content information. Identify the exact copyrighted blog post that’s showing up on their site.

The webmaster may not be aware of the plagiarism. Sometimes end users post copyrighted material on their own without the webmaster being aware. In other cases, the admin themselves may be doing the copying. Sending a Cease & Desist to the webmaster helps them understand that you didn’t give them permission to post it on their site.

The User Causing All The Problems

Some sites like InformationFlash don’t make it easy – they don’t publish any personal information on their site, and they try to hide behind private domain registrations. They only accept emails through a contact form, thereby making it impossible to guarantee message delivery. No problem – keep reading.

Send Their ISP’s Abuse Department a DMCA Takedown Notice

The Digital Millennium Copyright Act protects the intellectual property rights of people who create content, like bloggers. Title II of the DMCA is an agreement between you (the copyright holder) and internet service providers (the web hosting company). As long as the copyright holders notify the ISP and the ISP reacts appropriately, then the ISP is not liable for the copyright infringement. Only the plagiarist is liable. That means web hosting companies and internet providers react swiftly and fairly to complaints of copyright infringement.

Get a sample DMCA notice to hosting companies and send it to the web host. In the case of InformationFlash, you can send it to abuse@dreamhost.com. I took the extra measure of sending one DMCA takedown notice per copyrighted article to show the extent of the problem.

Send Search Engines a DMCA Notice Too

If the site’s webmaster and their web host still don’t react, we have another weapon: the search engines. Before doing that, find out if the site even turns up in search results – the search engines may have already received DMCA takedowns for the site in question. Go to your favorite search engine and type the name of your blog post in quotes, like this:

“Top 10 Developer Interview Questions About SQL Server”

Look at the search results and find out if the offending site shows up. In the case of InformationFlash, it doesn’t show up – even if I add the word InformationFlash to the search. That’s awesome – Google’s already figured out that the site’s up to no good. In order to send a DMCA notice to a search engine, you have to show that their site will show up in a search for your work.

Each search engine has a different procedure for getting sites delisted:

There’s also a sample DMCA notice to search engines that you can use, but make sure to adapt it to each search engine.

Ask for Help From Fellow Bloggers

If you syndicate your blog with SQLServerPedia, email me about the offending site. If you blog at any other site, email the head honcho. All of us are writers, and all of us take plagiarism very, very seriously.

A cynic might ask, “But wait – how is this different than blog syndication at SQLServerPedia?” I’m glad you asked.

  • You ask us to syndicate your content. We don’t go poaching content.
  • We work with you to set up specialized feeds so that you choose what to syndicate.
  • We slather your name all over the place, making it abundantly clear that it’s yours.

If someone takes your syndicated content without your permission, and if you complain to me about it, I will make every effort to go after the offending party with all of the resources available to me. If you want them to syndicate your content straight off your site, that’s completely okay – but they need to take it from your site with your permission, not from SQLServerPedia. You, as a blogger, are completely welcome to syndicate with as many sites as you’d like.

In the case of InformationFlash.com, we’ve already sent them C&D letters, yet they’re still using (y)our content inappropriately. I hate to have to take it to the next step, and I hate to name names in public on my blog. I try to give everyone involved the benefit of the doubt and give them time to do the right thing. If they don’t do the right thing, then I want to make sure the public knows the names of the individuals involved and what they’re doing.

My next post will explain why companies should think twice before hiring individuals who plagiarize intellectual property, whether as full time employees or consultants.

Update 6/27: as I expected, InformationFlash syndicated my content despite the post actually being about InformationFlash stealing content.  Rather awkward.  Here’s a screenshot of their plagiarized content, as well as a screenshot of a blog post they plagiarized from Gail Shaw.  Also note the name of the user who submitted the content – either their admin account has been hacked, or the site’s administrator is responsible for plagiarizing the content.  The top of the page notes that they aggregate information via RSS, but remember that we’ve already sent them a cease & desist once, and they agreed to comply – they’re just not doing it.

Update 6/28: Dreamhost contacted me and said they’re taking the site down due to our DMCA complaints.  It’s not clear whether the takedown is permanent.  I want to thank Dreamhost for acting quickly to protect the intellectual property rights of bloggers.

Update 6/29: I got emails with questions from the site’s webmaster and from a few bloggers, so I added the answers in a followup post with More Thoughts on Blog Plagiarism.

My Michael Jackson Story

I’ve been joking a lot on Twitter about the passing of the King of Pop.  Somewhere between the Jesus Juice and the Elephant Man, he’d lost a lot of credibility in his fading years.  Earlier in both of our lives, though, things were different.

This Had Me Written All Over It

In middle school when he was at the peak of his popularity, I desperately wanted a red and black leather jacket like his.  I mean, desperately.  I had enough money saved up, and Sears carried one that I could afford.  That right there should tell you everything about my level of style – I aspired to own a piece of clothing carried by the most unhip of 1980s retailers.

My parents, having slightly more taste than me, would not allow me to purchase the jacket.  I was upset, mortified, angry, you name it.  Today, my father sports a diamond earring inspired by Jimmy Buffett – but I digress.

Instead, I ended up buying a large boom box, with which I played songs like Thriller, Bad, and Billie Jean.  Over time, my tastes changed to Huey Lewis and the News, but the King of Pop will always make me wanna get out on the dance floor and perform ill-advised moves that show off my complete lack of physical grace.

So today I’ll be listening to the Essential Michael Jackson collection I just picked up off Amazon MP3 for $17, dancing around the desk, and I won’t stop til I get enough.

How to Get More Twitter Followers

Yesterday, Kevin Kline ran across the WeFollow list of top twitterers for the SQL tag and remarked:

How to Climb a Mountain

I hear that same question privately every now and then, and it’s not that hard.  I’ve got the simple answers to get yourself to the top of the popularity list!

Set Up Searches for Key Phrases

If you’re interested in SQL Server, there are tools you can use like RSS feeds from Search.Twitter.com that will alert you whenever someone mentions SQL Server.  That way you can jump right into their conversation and interrupt, er, help them.  They will surely be impressed by your knowledge and your willingness to help, and they’ll follow you for your insight.

The drawback, though, is that there are a lot of conversations happening on Twitter at any given time.  It’s seriously hard work to keep up with all of them.  You could devote your time to Twitter searches, or maybe hire a savvy assistant to proactively run your Twitter profile, but sometimes even a human being isn’t enough.  At that point, you’ll want to bring in the machines.

Set Up Robots to AutoRespond For You

Clippy

Twitter has a cool set of APIs that you can use to build a robot.  Whenever someone mentions a topic, like say SQL Server, you can build an automatic response that says something like:

“I see you’re trying to build a database.  Would you like some help?  I’ll be your best friend.”

If you’re really good with your autoresponses, people will never guess that your witty responses are coming from an automated, heartless piece of software.  Bonus points if they try to carry on a conversation with you, and you have another autoresponse for that.  They’ll line up to follow your Twitter account in no time.  To see an example of a bot in action, check out @joe_kl.

Follow Everybody You Can Find

Go crazy with the Follow button.  Follow anybody and everybody regardless of what they’re talking about.  They might follow you back just out of sheer politeness.

There’s a catch, though: Twitter will yank your account if you follow too many people too fast.  Every few days, go into your Friends page in Twitter, which lists the people you’re following.  You can identify the ones who are following you back because there’s a “Direct Message” link – you can only send DM’s to people who are following you.  Unfollow anybody who doesn’t have a “Direct Message” link next to their name, and presto, it’ll keep your list shorter and let you follow more people.

When you unfollow people, they may get alerted about this if they’re using a service like NutshellMail.  At that point, they’re going to know you’re a bit of a spammer, because they’re going to guess that you followed them just to try to bait them into following you back.  This isn’t a problem at first, but if you try that same trick repeatedly, it pisses off users because they know you’re just an absolute slimeball.  (Doing it even once makes you a slimeball, though.)

Give Stuff Away to People Who Follow You

Announce that once a month, you’re going to pick a random follower and give them something juicy like a gift certificate or a free iPhone.  People will do almost anything for a Klondike bar, I hear.

Once you start, though, it’s like a drug addiction.  If you don’t keep giving things away, people will stop following you, and worse, they’ll start UNfollowing you.  Of course, if you’re in the business of professional marketing, you should have no problem justifying giving away portable hard drives or Macbooks in order to get your spam message out to a larger audience.  Heck, even just the Twitter population as a whole may not be enough, and you may want to…

Send Spam Emails Asking People to Follow You

The majority of humanity isn’t on Twitter yet, so when these measures aren’t enough, it’s time to kick it up a notch.  Send out a broadcast spam email to everyone you can find asking them to join Twitter and follow you.

I’ve been watching the Twitter follower counts of one particular publication who chose to spam me with an invite like this.  I was curious to see if it worked – I had this vision of people saying, “Wow, this is awesome!  I don’t get enough spam through my email client, and it takes so darned long to get it.  I’ll go sign up right now and follow them for up-to-the-minute spam in 140 character chunks!”  Not surprisingly, it doesn’t appear to be working.

Or, Uh, Maybe Just Be Yourself

Maybe I’m old school, but I like to get my followers the old-fashioned way: I earn them.

Don’t follow people just to game the metrics. Unless you’re Ashton Kutcher, nobody really gives a rip how many followers you have.  Twitter is about relationships.  It’s about caring, not calculations.  If you’re out to prove you’ve got the biggest numbers, cut straight to the chase and start giving away free pr0n.

Be yourself, not your company. I follow some company accounts because they have truly kick-ass products.  I want to hear every single bit of news about the cool new stuff they produce.  I work for a company too, but I don’t use my Twitter account as a pimp platform.  If you ask me questions about our products, I’ll be glad to talk with you about it, but not in public on Twitter.  Nobody wants to listen to somebody else buying a used car on Twitter, for example.

Join the conversation. Don’t just spew garbage out automatically – listen, help, and engage.  When you jump into a stranger’s conversation and start blathering about yourself, your opinions or your product, people see through your act.  In meatspace, you can identify the failure of your technique by watching the panicked horror in their facial expression, but on Twitter it’s not so clear.  If the technique doesn’t work in meatspace, it won’t work here either.

Remember that kid in middle school whose mom always sent him in with a bag of cookies trying to make friends?  The one who kept running into you and your buddies and just standing around until he could inject himself into the conversation?  The one that everybody said was trying too hard?  Don’t be That Guy.

Top 10 Developer Interview Questions About SQL Server

Knowing good questions to ask during an interview with a developer can help you filter out the best candidates from the ones who aren’t the most qualified.  There’s a huge difference between “It worked on my machine” and “It scales well in production.”  These interview questions will help you filter out the bad apples before you hire them.

10. Explain why DBAs don’t like cursors.

I like to phrase this interview question this way because I’m not saying the DBA is right – I’m just asking the developer to explain the DBA’s point of view.  I don’t have a problem with the developer rolling their eyes as they explain the answer, but I have a problem with the developer being surprised by the question.

The candidate gets bonus points if they seem even vaguely aware of the terms “set-based processing” and “row-based processing”, but that’s purely a bonus.  (I wish I could say that these concepts are requirements, but in today’s economic market, companies don’t always want to pay top dollar to get the best candidates.)
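If the candidate needs a nudge on those two terms, a tiny demo can help. Here’s a minimal sketch in Python using the built-in sqlite3 module as a stand-in for SQL Server; the Products table and both function names are made up for illustration. Both approaches end with the same data, but the row-based one issues a statement per row (which is roughly what a cursor loop does), while the set-based one describes the whole change in a single statement.

```python
import sqlite3


def make_db():
    # Hypothetical table with five products, all priced at 10.0.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (Id INTEGER PRIMARY KEY, Price REAL)")
    conn.executemany(
        "INSERT INTO Products (Id, Price) VALUES (?, ?)",
        [(i, 10.0) for i in range(1, 6)],
    )
    return conn


def double_prices_row_by_row(conn):
    # Row-based processing: read every row, then send one UPDATE per row.
    rows = conn.execute("SELECT Id, Price FROM Products").fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE Products SET Price = ? WHERE Id = ?",
            (price * 2, product_id),
        )


def double_prices_set_based(conn):
    # Set-based processing: one statement covers every row.
    conn.execute("UPDATE Products SET Price = Price * 2")


def prices(conn):
    return [row[0] for row in conn.execute("SELECT Price FROM Products ORDER BY Id")]


row_db = make_db()
double_prices_row_by_row(row_db)

set_db = make_db()
double_prices_set_based(set_db)

# Same end result, very different number of statements sent to the database.
print(prices(row_db) == prices(set_db))
```

With five rows nobody will notice the difference. The point of the interview question is whether the candidate knows what happens when it’s five million.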

9. Where do you like business logic – in the app or in the database? Why?

Personally, I like stored procedures because they’re easier for us DBAs to test, tune and tweak. On the other hand, the developer community isn’t always as fond of stored procs.  For their side, see these posts by Jeff Atwood:

I don’t mind what arguments the coder candidate uses, but I want to see ‘em put some thought into it.  No matter which angle they take, I’ll play the devil’s advocate and prod them with arguments just to see how they react.

8. Explain when and how transactions should be used.

Not In The Oprah Book Club, Oddly

Start with just that open-ended interview question, and if they have trouble getting started, give them a scenario.

“Say we’ve got a table for Orders, and a table for OrderDetails.  Someone places an order for two books – Bacon: A Love Story and the hit bestseller Eat What You Want and Die Like A Man.  Tell me what happens.”

After they’ve answered, ask them when transactions should not be used.  I don’t want my developers wrapping anything inside a transaction unless it absolutely needs to be.  (Unlike bacon, which should be used as often as possible for wrapping purposes.)
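For reference, here’s roughly the shape of answer I’d hope to hear for the two-book scenario, sketched in Python with the built-in sqlite3 module standing in for SQL Server. The column names, the CHECK constraint, and the place_order helper are assumptions for the sake of the example. The order header and its detail rows either all go in together or none of them do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Customer TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE OrderDetails (
           OrderId  INTEGER NOT NULL,
           Title    TEXT NOT NULL,
           Quantity INTEGER NOT NULL CHECK (Quantity > 0)
       )"""
)


def place_order(conn, order_id, customer, items):
    """Insert the order and all of its details as one unit of work."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO Orders (OrderId, Customer) VALUES (?, ?)",
                (order_id, customer),
            )
            conn.executemany(
                "INSERT INTO OrderDetails (OrderId, Title, Quantity) VALUES (?, ?, ?)",
                [(order_id, title, qty) for title, qty in items],
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


good = place_order(conn, 1, "Brent", [
    ("Bacon: A Love Story", 1),
    ("Eat What You Want and Die Like A Man", 1),
])

# The second line item has a quantity of zero, which violates the CHECK
# constraint, so the whole order (header included) is rolled back.
bad = place_order(conn, 2, "Brent", [
    ("Bacon: A Love Story", 1),
    ("Eat What You Want and Die Like A Man", 0),
])

print(good, bad, count(conn, "Orders"), count(conn, "OrderDetails"))
```

Without the transaction, the failed second order would leave an Orders row and one OrderDetails row behind, which is exactly the half-finished state the candidate should be worried about.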

7. Explain referential integrity and where it can be enforced.

If they stumble on the question, circle back to the Orders and OrderDetails tables we used as examples earlier.  What’s an orphan?  How do we make sure that we don’t end up with OrderDetails records with no matching Order record?  Where are all the places we could enforce referential integrity?  (Think foreign keys, triggers, the application, or not at all.)  Have you worked in places where there was no referential integrity, and what problems did you run into?
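To make the orphan idea concrete, here’s a small sketch using Python’s built-in sqlite3 module as a stand-in for SQL Server. It shows only one of the enforcement options above, the foreign key. The table layout and the try_insert_detail helper are hypothetical, and note that SQLite only enforces foreign keys once the PRAGMA below is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE OrderDetails (
           DetailId INTEGER PRIMARY KEY,
           OrderId  INTEGER NOT NULL REFERENCES Orders (OrderId),
           Title    TEXT NOT NULL
       )"""
)
conn.execute("INSERT INTO Orders (OrderId) VALUES (1)")
conn.execute(
    "INSERT INTO OrderDetails (OrderId, Title) VALUES (1, 'Bacon: A Love Story')"
)


def try_insert_detail(conn, order_id, title):
    """Return True if the database accepted the detail row, False if it refused."""
    try:
        conn.execute(
            "INSERT INTO OrderDetails (OrderId, Title) VALUES (?, ?)",
            (order_id, title),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Order 999 does not exist, so this row would be an orphan.
orphan_accepted = try_insert_detail(conn, 999, "Orphaned book")

# The kind of query you end up running when nothing enforces the relationship:
# detail rows whose order is missing.
orphans = conn.execute(
    """SELECT d.DetailId
       FROM OrderDetails AS d
       LEFT JOIN Orders AS o ON o.OrderId = d.OrderId
       WHERE o.OrderId IS NULL"""
).fetchall()

print(orphan_accepted, orphans)
```

A candidate who has lived without foreign keys will recognize that LEFT JOIN immediately, usually with a wince.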

6. What’s the fastest way to get a thousand records into the database?

I’m not looking for the best answers – I’m just looking to hear that they’ve done some work to performance tune their queries.  If they’re doing fully logged individual record inserts, one at a time, into a data warehouse-size system, we’re going to have problems down the road.  (Yes, I’ve actually worked with a BI developer who did millions of individual inserts per night in full recovery mode and thought the performance was the database’s fault.)

Bonus points if they link back to the previous interview question and talk about whether or not they should disable constraints or referential integrity during data loads.  (I don’t care what their final answer is, but I just want them to know the pros and cons.)
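Here’s a rough sketch of the two loading styles, again with Python’s built-in sqlite3 module standing in for SQL Server. The Staging table and the function names are invented for the example, and an in-memory SQLite database says nothing about SQL Server logging or recovery models, so treat it as an illustration of the pattern rather than a benchmark. SQL Server has its own bulk-load paths, such as BULK INSERT and bcp, that a strong candidate might bring up.

```python
import sqlite3

ROWS = [(i, f"row {i}") for i in range(1000)]


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Staging (Id INTEGER PRIMARY KEY, Payload TEXT)")
    return conn


def load_one_at_a_time(conn, rows):
    # One statement and one commit per record: a thousand tiny transactions.
    for row in rows:
        conn.execute("INSERT INTO Staging (Id, Payload) VALUES (?, ?)", row)
        conn.commit()


def load_as_one_batch(conn, rows):
    # One batched call inside a single transaction, committed once at the end.
    with conn:
        conn.executemany("INSERT INTO Staging (Id, Payload) VALUES (?, ?)", rows)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM Staging").fetchone()[0]


slow_db = new_db()
load_one_at_a_time(slow_db, ROWS)

fast_db = new_db()
load_as_one_batch(fast_db, ROWS)

print(row_count(slow_db), row_count(fast_db))
```

Both tables end up with the same thousand rows. What I want the candidate to talk about is how many round trips and commits it took to get there.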

5. What’s the difference between a primary key and a clustered index?

This is almost a bonus question.  Most of the time, the candidate doesn’t know because it’s a function of the data modeler or architect, not the developer.  However, I want to see how the candidate reacts to tough questions.  Ideally, they say in a relaxed tone of voice, “I’m not sure, but I know who I’d ask.”  If they don’t mention where they’d go, ask them where they go for SQL Server answers.  Speaking of which…

Bonus Points for This Candidate

4. What’s your StackOverflow name?

I don’t need to see a high reputation, but I do want to see an awareness of the site.  This interview question serves two purposes: it finds out if they’re serious enough to be active in the community, and it shows them that you’re okay with their community activity.  Start a conversation with them about the level of internet time that you find acceptable in the office, and encourage them to share their knowledge with their peers.  This sells the candidate on your shop.

3. Tell me about a time when a DBA got mad at you.

This is a spin on the classic interview question, “Tell me about a time when you failed.”  Implemented a user-defined function, trigger, CLR in the database, or something else that made the DBA freak out?  I want to hear that the candidate listened to what the DBA had to say, good or bad.

If they say it’s never happened, rest assured it’s going to happen soon.

2. How can you tell if a query will scale for production?

I want to hear that they do things like load tests or maybe look at execution plans.

I’m sometimes comfortable when a senior developer says things like, “I can pretty well tell when something isn’t going to scale, because I know the production boxes really well.”  The key is asking a followup question about times when things didn’t scale.

1. When is the DBA right?

Always, kid.  Always.

If you liked this, here’s a few more of my posts about interview questions for job candidates:

Questions About Automation & Patch Management

Opinion poll time!  I got asked a few questions, and I’m curious about what the rest of you think:

Question 1: How do you feel about Run Book Automation for databases?

Have you used it?  Would you want to?  Why or why not?  If you’ve never done it and never would, please respond too – don’t just be scared off by terms you haven’t heard before.  (I hadn’t heard of it before I came to work for Quest.)

Question 2: Do you have multiple copies of the same database?

Do you have several copies of the same database that you need to keep in sync, whether it’s schema or data?

Question 3: If yes, how do you feel about automated patch and configuration management of databases?

Do you have enough databases that you would consider building a “gold” standard, deploying it, watching for changes, catching out-of-band scenarios, syncing them back to the gold standard, etc.?  Why or why not?

Let me know what you think in the comments.  I’ll hold back my own answers for a couple of days.

SPWho2.com: StackOverflow user and tag statistics

I took the StackOverflow database dump, brought it into SQL Server, and did some slicing and dicing by tags and users.  I wanted to find the answers to questions like:

I’m having a lot of fun with the data, and I thought other people might enjoy it too.

To make it easier, I built SPWho2.com, a site with StackOverflow user and tag statistics.  It’s not terribly attractive yet, unless like me, you find numbers attractive.  It’s a side hobby for me right now, and over time I’ll add in more data visualization with graphs and trends.  If there’s anything you’d like to see added, let me know.

Finding old, unanswered StackOverflow questions

So you wanna start answering questions on StackOverflow, but you’re frustrated because it seems like people are fillin’ out answers even faster than you can type?  Find yourself up at 3am hitting Refresh just hoping a new question comes in?

Necromancer Badge Shirt

Here’s a new trick:

  1. Import the StackOverflow database into SQL Server
  2. Run this query to find old, unanswered questions
  3. Profit!  Err, no, but watch your reputation go up as you solve old questions.

It turns out there’s a little (and I do mean little) niche of questions ripe for the pickin’.  I spent half an hour around midnight last night answering a handful of old SQL Server questions and I’ve already gained 75 points – pretty surprising for a Friday night.  Even more surprising to me, I’d posted an answer to a question that had been dormant since December (6 months ago), and the questioner already checked my answer and thanked me for it.  See, answering old questions really does help people, not just game the system for reputation points!  (That’s how I rationalize it anyway.)

Another boost: you’ve got a better chance of earning the Necromancer badge, which is awarded when someone answers a question more than 60 days old and then gets 5 upvotes on their answer.
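If you’re wondering what a query for old, unanswered questions might look like, here’s a minimal sketch. To be clear, this is not the query mentioned in the steps above. It runs against a tiny made-up Questions table in SQLite (via Python’s built-in sqlite3 module) rather than the real StackOverflow dump in SQL Server, and the column names, the tag format, and the sample rows are all assumptions. The 60-day cutoff matches the Necromancer badge threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the imported questions table.
conn.execute(
    """CREATE TABLE Questions (
           Id           INTEGER PRIMARY KEY,
           Title        TEXT,
           Tags         TEXT,
           CreationDate TEXT,
           AnswerCount  INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO Questions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Old and unanswered", "<sql-server>", "2008-12-01", 0),
        (2, "Old but answered", "<sql-server>", "2008-12-02", 3),
        (3, "New and unanswered", "<sql-server>", "2009-06-10", 0),
        (4, "Old, unanswered, other tag", "<python>", "2008-11-15", 0),
    ],
)


def old_unanswered(conn, tag, as_of, min_age_days=60):
    """Questions with no answers, carrying the tag, older than min_age_days."""
    sql = """
        SELECT Id, Title
        FROM Questions
        WHERE AnswerCount = 0
          AND Tags LIKE ?
          AND julianday(?) - julianday(CreationDate) > ?
        ORDER BY CreationDate
    """
    return conn.execute(sql, (f"%<{tag}>%", as_of, min_age_days)).fetchall()


result = old_unanswered(conn, "sql-server", "2009-06-13")
print(result)
```

Only the first sample row qualifies: it has the right tag, no answers, and it’s well past the 60-day mark.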

Oh, and the shirt?  Yeah, I’ve set up a few StackOverflow badge shirts, including badges that don’t exist (yet) like Fastest Gun and Bounty Hunter.