Here's how it happens. A blogger sets up a feed so that subscribers can read new posts conveniently in a feed reader. Some readers prefer email subscriptions, and most blogs offer both options. Feeds are a convenient way to deliver your material, but they are also where feed scrapers practice their abuse. Scrapers subscribe to the feed and use a bot or spider (a program that automatically grabs the text and pictures of a blog) to harvest the unsuspecting blogger's posts, then republish them on another website. It is extremely hard to stop them through a feed service, since the subscription may not reveal the identity of the website taking the content. Scrapers then use the stolen material as ready-made, text-rich content that is perfect for search engines. Once the search engines index the site, the scrapers can make money through the ads posted on it. Ads are not the only motive, though: at least two of the last four scrapers I have seen appear to be building up the site's PageRank, presumably to resell the site later to a high bidder.
To an unsuspecting blogger, feed scrapers are all but invisible. They accomplish their thievery behind the scenes without asking permission, and they can easily get away with it unless you add some protections to your blog. Nothing is foolproof, but there are a few ways you can find a feed scraper.
- The first way is to regularly check your links by searching Google for link:www.yourblog.com. This shows you who is linking to your URL, so it will only reveal a feed scraper that accidentally left a link to your blog somewhere in the copied post. Usually they just remove the links.
- You can also copy a random section of a post and paste it into a search engine with quotes around it. The quotes tell the engine to look for that exact phrase, so it will match the text against any indexed site that contains it.
- The next way works well but takes a few more steps to implement. First, go into your blog and add a footer to your post feed that contains some text and a link to your blog. Something like "This post was written by ________ for the blog www.yourblog.com Copyright 2009" can work. The more unique you make it, the better. Then sign up for Google Alerts and use your whole post footer as the search term. Any time Google finds that text, you should see an alert appear in your inbox. If it points to your own blog, ignore it. If it points anywhere else, it's time to investigate.
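The quoted-phrase search and the footer alert both come down to searching for an exact string. If you are comfortable with a little code, here is a rough Python sketch of that idea. The function names and the phrase-picking rule are made up for illustration; the only outside piece is the ordinary Google search address with the phrase wrapped in quotes.

```python
import re
from urllib.parse import quote_plus


def pick_phrase(post_text, words=8):
    """Return the first `words` words of the longest sentence in a post."""
    # Split on sentence-ending punctuation. A long sentence is less likely
    # to match another site by coincidence than a short one.
    sentences = re.split(r"(?<=[.!?])\s+", post_text.strip())
    longest = max(sentences, key=len)
    return " ".join(longest.split()[:words]).rstrip(".,;:!?")


def exact_search_url(phrase):
    """Build a Google search URL that looks for the exact phrase."""
    # Wrapping the phrase in double quotes asks for an exact match.
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')
```

You could pass a post's text to pick_phrase, hand the result to exact_search_url, and open the link in a browser. The same exact_search_url call works for a feed footer if you want to check by hand instead of waiting for an alert.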
No method is foolproof, and many of these feed scrapers will do everything they can to make your content anonymous by removing links and attribution. Recently I began watermarking my photos with the URL of my blog so that whoever looks at my pictures knows where they originated. One feed scraper removed all the links from my posts and listed Mr. Gardener as the author. (It was very disturbing to find pictures of a family vacation with my children in them on someone else's site.) If by some chance the scrapers leave the links intact, you may benefit down the road from the extra links coming to your site, but it is still theft. Copyright has no meaning where they are concerned.
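When you do find a suspicious page, it helps to know how much of your post it contains and whether any credit to your blog survived. The Python sketch below is one possible way to check, assuming you have already copied the visible text of the suspect page into a string. The names check_copy and normalize are hypothetical, and the matching is a plain substring test, so a scraper who rewords sentences would slip past it.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so small formatting changes do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_copy(page_text, my_sentences, my_domain):
    """Count how many of my sentences appear on a page and whether it names my blog."""
    page = normalize(page_text)
    found = [s for s in my_sentences if normalize(s) in page]
    return {
        "matched": len(found),          # sentences found word for word
        "total": len(my_sentences),     # sentences checked
        "credits_me": my_domain.lower() in page,  # is my domain mentioned at all?
    }
```

A result with most sentences matched and credits_me set to False is the pattern described above: your words, with the links and attribution stripped out.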
Once you find them, what then? Prepare for battle. It's not an easy thing to get your content removed, and it's even harder if the scrapers are in another country. Some countries recognize copyright law while others don't, and in those cases it may be extremely hard to do anything. The first step most people take is to contact the scraper and ask for removal. That has never worked for me. The first scraper I removed myself from ignored my repeated attempts to reach them through their website's contact form. Then I moved to commenting on my stolen posts, but of course the comments were all moderated and never appeared on the site. Finally I looked up the Whois information and was fortunate to find a name and email listed as a contact. (A Whois record lists who registered a domain, and you can look it up through any of the many Whois lookup tools. I bought my domain through GoDaddy, which provides a Whois search, as many registrars do.) I contacted the email address and soon my content was no longer being used. Today the whole site has been parked. It's very likely that you will run into a roadblock called a private domain registration, and then you will have to find out who hosts the site. Contact the web host, explain the situation, and ask them how to proceed. Most should contact the site owner for you or at least forward your email. The second feed scraper I removed myself from was in Australia and had a privacy service on their Whois listing, so I had to contact the host.
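A Whois lookup does not require a website. The Whois protocol is a plain text query sent over TCP port 43, so a short script can do it. The Python sketch below is only an illustration: the function names are hypothetical, the default server whois.iana.org will usually just refer you to the registry's own Whois server (which you would then query the same way), and the field names in a reply vary from registry to registry, so the parsing is a loose guess rather than a reliable parser.

```python
import socket


def whois_query(domain, server="whois.iana.org", timeout=10):
    """Send a plain Whois query (TCP port 43) and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def summarize_whois(raw):
    """Collect email-looking contact lines and guess whether a privacy service is used."""
    emails = []
    for line in raw.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        # Keep values from any field whose name mentions "email".
        if "email" in key.lower() and "@" in value and value not in emails:
            emails.append(value)
    lowered = raw.lower()
    # Assumption: privacy services tend to use one of these words in the record.
    private = any(word in lowered for word in ("privacy", "redacted", "proxy"))
    return {"emails": emails, "looks_private": private}
```

If summarize_whois reports no emails and looks_private is True, you have hit the private registration roadblock and the next stop is the web host.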
Lastly, if they don't respond to any of these methods, you should write a Cease and Desist letter and send it to the host. I've never had to take this step and hopefully won't have to. I suspect that most feed scrapers would rather concede to a lone blogger than risk blowing their whole feed scraping enterprise. I've also used the Google spam report in Webmaster Tools to report feed scraping sites for stealing content. I can't verify its effectiveness, but I've reported two of the four scrapers and both have been removed, so maybe it worked.
At some point you may be forced to prove that you actually own the copyright, so it's a good idea to take screenshots of both your blog and the scraper's site where your stolen articles appear. Match up the articles and save the screenshots so that you have them if you need them.
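Alongside screenshots, you might also keep a dated copy of each page's source. The Python sketch below is one way to do that, assuming you have already saved or copied the page's HTML into a string. The function name, file naming, and manifest.jsonl file are all made up for illustration, and a saved file with a hash is only a record for your own files, not legal proof of anything.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_evidence(folder, label, url, content):
    """Save a copy of a page plus a manifest entry with its URL, time, and hash."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    data = content.encode("utf-8")
    # UTC timestamp so entries sort the same way regardless of time zone.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    page_file = folder / f"{label}-{stamp}.html"
    page_file.write_bytes(data)
    entry = {
        "label": label,
        "url": url,
        "saved_utc": stamp,
        "file": page_file.name,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # Append one JSON object per line so earlier entries are never rewritten.
    with open(folder / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return entry
```

Calling it once with a label like "mine" for your own post and once with a label like "scraper" for the copy keeps the matched pair side by side in one folder.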
I was fortunate to find a passionate and extremely helpful fighter of plagiarism in Jonathan Bailey, who contacted me through Twitter. He runs the site www.PlagiarismToday.com and gave me some great advice for dealing with these people. His site is filled with good information about combating plagiarism and is well worth a visit if you are concerned about your content being stolen.
I have one last piece of advice that may help you find scrapers: get involved in a community of bloggers who watch out for each other. I can't stress this enough. When your friends see your posts somewhere they don't belong, they'll be happy to let you know. There is always another scraper around the corner, so be watchful, be wary, and don't give up the fight!
Dave Townsend is an avid gardener, stay-at-home dad, and garden blogger (www.GrowingTheHomeGarden.com). He's appeared on Better Homes and Gardens (BHG.com), talks occasionally on local radio, and is active in the local garden club. On his blog he discusses vegetables, plant propagation, and pretty much anything garden related!