While some things may change, others always remain the same. The recurring debate of whether you should use partial or full RSS feeds has been revisited time and time again, because if you have a blog (and it’s successful, usually), you’re also probably a victim of scraping.

Because it’s so easy to grab data from an RSS feed, scraper blogs will keep springing up, and people will always try to make a quick buck off of your hard work and effort. This is why many bloggers provide partial feeds, though people who subscribe to hundreds of feeds (like myself) find this very inconvenient.

A lot of bloggers don’t realize that there are ways to discourage scrapers from stealing the content off their sites even with full feeds enabled.

Here are a few things you can do that are relatively easy:

Report it to the search engines. Republishing your content without authorization is copyright infringement, and the Digital Millennium Copyright Act (DMCA) gives you a way to request its removal. You can file a DMCA complaint with the major search engines as long as you provide sufficient information to support the claim of theft. Furthermore, anyone using AdSense on their sites can have their accounts terminated if Google finds them responsible for content theft. You can report such violations through Google’s DMCA AdSense page.

Prevent hotlinking of your images. There are plenty of sites that discuss the methods for doing this, but a simple .htaccess file in your images directory can be sufficient, especially if you serve the scrapers different image content.

Here’s some code to add to your .htaccess file (via Jackol’s htaccess cheatsheet):

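RewriteEngine on
# Allow requests that send no referer (direct visits, some proxies and feed readers)
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from your own domain, over http or https
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mydomain\.com/.*$ [NC]
# Refuse every other request for a .gif or .jpg file
RewriteRule \.(gif|jpg)$ - [F]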

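Be sure to replace mydomain.com with your own. Hotlinked requests for the specified file types will then fail with a 403 Forbidden response, so the scraper’s page shows a broken image. Alternatively, you can serve your own image instead (get as creative as you want) by replacing the RewriteRule line with the two lines below. The extra condition exempts the replacement image itself, which would otherwise be redirected in a loop:

RewriteCond %{REQUEST_URI} !/dontsteal\.gif$
RewriteRule \.(gif|jpg)$ http://www.mydomain.com/dontsteal.gif [R,L]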

Another good in-depth tutorial on setting up this .htaccess file, or alternatively, a PHP file, can be found on A List Apart.
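
If you want to check which requests the referer rules would refuse before deploying them, the logic can be sketched in Python. This is only an approximation of the intended behavior, not how Apache evaluates the rules internally. The function name is_blocked is made up for illustration, mydomain.com is the same placeholder as above, and the sketch assumes you want to accept your own domain over both http and https.

```python
import re

# Hypothetical helper that mirrors the intent of the .htaccess rules.
# "mydomain.com" is a placeholder; accepting https is an assumption.
ALLOWED_REFERER = re.compile(
    r"^https?://(www\.)?mydomain\.com/.*$", re.IGNORECASE
)
# Case-sensitive on purpose, like a RewriteRule without the [NC] flag.
IMAGE_PATH = re.compile(r"\.(gif|jpg)$")


def is_blocked(path: str, referer: str) -> bool:
    """Return True when the hotlink rules would refuse this request."""
    if not IMAGE_PATH.search(path):
        return False  # the rule only applies to .gif and .jpg files
    if referer == "":
        return False  # an empty referer is let through
    # Anything not referred from your own domain is refused.
    return ALLOWED_REFERER.match(referer) is None


if __name__ == "__main__":
    samples = [
        ("/images/photo.jpg", "http://scraper.example/post"),
        ("/images/photo.jpg", "http://www.mydomain.com/post"),
        ("/images/photo.jpg", ""),
    ]
    for path, referer in samples:
        print(path, repr(referer), is_blocked(path, referer))
```

Note that the empty-referer exemption is deliberate: some browsers, proxies, and feed readers strip the referer, and blocking them would break images for legitimate readers.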

Contact the scraper directly. In some cases, they will tell you what you already know: they stole the content from your RSS feed. Some even have the audacity to ask you to link to them (in a mutual relationship that will benefit all parties). Your goal is to get them to stop using your content altogether. Nobody wants to be slammed with a duplicate content penalty. Surprisingly, people do comply with this request, so it doesn’t hurt to try.

Does anyone have any other suggestions for how to prevent your content from being plagiarized?

[Thanks, Steve!]


Posted by Tamar Weinberg at 2:49 pm