When I complain about plagiarism, I hear the same thing over and over from other bloggers: “Nobody ever plagiarizes me. I guess I’m not that important.”
Really? So you’ve been checking to see if your stuff has been copied?
Exactly. It’s time to find out how.
Watch Your Trackbacks and Incoming Links
Odds are your blog posts will include links back to your own site at some point, like when you refer to your other posts. The quickest way to stay on top of this is to glance at the “Incoming Links” module in your WordPress dashboard:
In that screenshot, I can see that Steve Jones, Tom LaRock, Stacia Misner, Ted Krueger, and “unknown” have all linked to my site recently. By glancing at that list, I can see that most of those are completely okay, but the “unknown” one gives me pause, so I’d click on that to make sure it’s a legit blog. On a side note, you should always monitor these anyway, click on all of the links, and read what people are saying about you.
Another built-in WordPress tool is the list of pingbacks. When people copy your work verbatim and publish it, their blog may try to send a pingback link alerting you. Go into your Comments list and filter it by pings only:
In that screenshot, I can see that Sean Gallardy has linked to my SQL Server checklist. I would want to click on that link to make sure it’s not an exact word-for-word copy of my own checklist, or another one of my blog posts that happened to link to my own checklist.
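If clicking through and eyeballing every pingback gets old, the word-for-word check can be roughed out in code. This is only a minimal sketch, assuming you have already saved your post and the linking page as plain text. The function names and the 40-word threshold are made up for illustration, so tune them to taste.

```python
from difflib import SequenceMatcher


def longest_shared_run(post_text: str, page_text: str) -> str:
    """Return the longest run of consecutive words found in both texts."""
    a = post_text.split()
    b = page_text.split()
    # autojunk=False stops difflib from ignoring very common words in long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[match.a:match.a + match.size])


def is_probable_copy(post_text: str, page_text: str, min_words: int = 40) -> bool:
    """Flag the page when the shared run is at least min_words long.

    min_words is an arbitrary assumption: a short quote is fine,
    while dozens of identical consecutive words is not.
    """
    return len(longest_shared_run(post_text, page_text).split()) >= min_words
```

A short quote with a link back will produce a short shared run, while a wholesale copy will produce a run nearly as long as the post itself.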
Set Up Free Google Alerts
Even if the plagiarist is smart enough to disable pingbacks, they probably won’t strip the links out of your blog posts. To catch those, I set up Google Alerts for real-time notifications; whenever Google runs across the text “BrentOzar.com” anywhere on the web, it sends me an email. I can tell at a glance if it’s a plagiarized post, a forum question pointing to one of my articles, or a blog comment. I’ve set up similar alerts for sites I manage, my name, companies I work for, and so on.
When I’ve built a blog post I’m particularly proud of, I even set up Google Alerts for key phrases in the post. For example, in my SQL Server 2008 DAC Pack blog post, I used the phrase “Bringing Sexy DAC.” I can be fairly certain that phrase will not come up often, and if it shows up on the intertubes, somebody’s stealin’ my work. That phrase is a little down the page, beyond the first paragraph, so it shouldn’t show up if someone’s only showing the first few sentences of my post (which would be okay). I set up a Google Alert for that, and if anybody is automatically reposting my work, I get notified.
(Yes, I’ve deleted that Google Alert now because I know by saying this, I’m going to get a bunch of tweets saying “I’m Bringing Sexy DAC!” Heh. I love you people.)
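The two conditions behind a good marker phrase (it sits below the first paragraph, and it turns up on somebody else's page) are easy to check in a few lines. This is a rough sketch under some assumptions: paragraphs are separated by blank lines, the texts are already plain text, and the function names are hypothetical. It does not talk to Google Alerts at all; it only tests a phrase you are considering, or a page an alert has pointed you to.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so reflowed text still matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_excerpt_safe(post_text: str, phrase: str) -> bool:
    """True if the phrase is in the post but not in its first paragraph.

    Assumes paragraphs are separated by blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", post_text) if p.strip()]
    if not paragraphs:
        return False
    needle = normalize(phrase)
    in_first = needle in normalize(paragraphs[0])
    in_rest = any(needle in normalize(p) for p in paragraphs[1:])
    return in_rest and not in_first


def page_contains_marker(page_text: str, phrase: str) -> bool:
    """True if a suspect page's text contains the marker phrase."""
    return normalize(phrase) in normalize(page_text)
```

A phrase that passes `is_excerpt_safe` should not fire for sites that only show the opening sentences, so a hit from `page_contains_marker` is worth a closer look.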
Monitor Your Referrers
If you’re using web analytics tracking to see how (un)popular your site is, it probably has a screen to show which sites are linking to you. In my favorite free web analytics tool, Google Analytics, it’s under Traffic Sources, Referring Sites:
Because the plagiarist may not be popular yet, you need to go through ALL of the referring sites, not just the top ten. The more popular you get, the more painful this gets, but on the plus side, you get a warm, fuzzy feeling seeing everybody linking to you in a good way.
I go through this list looking for sites I don’t recognize, then I drill into the analytics to find out exactly where in the site they’re coming from, and I click on it. Hopefully it’s not an exact copy of one of my posts that links to another one of my posts.
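Picking out the sites you don't recognize is the part that lends itself to a script. Here is a minimal sketch, assuming you have exported the full referrer list from your analytics tool as plain strings and keep your own list of sites you already trust. The function names and the `.example` hosts are invented for illustration, and nothing here calls the Google Analytics API.

```python
from urllib.parse import urlsplit


def host_of(referrer: str) -> str:
    """Return the lowercase hostname of a referrer, minus a leading 'www.'."""
    # urlsplit only fills in the host when the string has a scheme or '//' prefix.
    if "://" not in referrer:
        referrer = "//" + referrer
    host = (urlsplit(referrer).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def unfamiliar_referrers(referrers, known_hosts):
    """Return the first referrer seen for each host that is not in known_hosts."""
    known = {host_of(h) for h in known_hosts}
    seen = set()
    result = []
    for ref in referrers:
        host = host_of(ref)
        if host and host not in known and host not in seen:
            seen.add(host)
            result.append(ref)
    return result


# Hypothetical data: two sites I already know, one I have never heard of.
known = ["friendly-blog.example", "forum.example"]
referrers = [
    "http://www.friendly-blog.example/roundup",
    "http://scraper.example/copied-post",
    "scraper.example/another-post",
    "https://forum.example/thread/42",
]
to_review = unfamiliar_referrers(referrers, known)
```

Everything left in `to_review` still needs a human click; the script only shortens the list.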
Use Tynt.com to Tweak Copy/Paste
This has to be one of the coolest tools I’ve ever seen. The easiest way to understand how it works is to see it in action. Go to any page on BrentOzar.com, select some text, copy it, and then paste it into a text editor:
SHAZAM. It doesn’t get much more obvious than that. I used to use more polite wording, but after being repeatedly plagiarized, I’m going with the big guns now.
Tynt even gives you a slick dashboard to show where your content is being pasted:
Search Manually with Copyscape.com
Finally, every now and then I go searching for copies of my recent posts with Copyscape.com. I put in a URL to a recent post (30-60 days old), and Copyscape goes hunting for similar copies. Their logic is pretty fuzzy, and it gives me a lot more misses than hits, but when it hits, it hits big time. It catches plagiarists who are smart enough to disable trackbacks, strip out your links, and even futz with your wording to try to make it look different.
This is how I caught CrazySQL initially, and how I found that BugoSQL was trying to hide some of my posts in disguised PDF files.
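To get a feel for how a fuzzy match can survive someone futzing with the wording, here is one common near-duplicate technique: break each text into overlapping three-word "shingles" and measure how many the two texts share (Jaccard similarity). To be clear, this is not Copyscape's method, which I have no inside knowledge of. It is a minimal sketch with made-up function names, meant only to show why swapping a few words does not hide a copy.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short to shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all shingles, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Changing one word only disturbs the shingles that include it, so a lightly reworded copy still scores well above an unrelated page. Where to draw the line between "suspicious" and "coincidence" is a judgment call you would have to settle by trying it on your own posts.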
It’s a lot of work catching these diabolical bastards, and it’s like a never-ending game of Whack-a-Mole. I have to keep playing because I make a living off my content – it’s my marketing tool to bring in new consulting customers. This is especially important to me now that I’ve become a full-time consultant; I don’t get paid unless I’m working for a client. I’m not getting paid to write this, either, but I do it because I’m passionate about helping the community and helping bloggers protect their content.