Dec 12, 2009

Feed Scrapers II

I just wanted to follow up on Dave's Feed Scrapers guest post with a couple of thoughts. In his post he mentions getting his garden blog removed from a feed scraper in Australia. I recently had an encounter with them too. As Dave mentioned, they have registered their domain privately, so the WHOIS record doesn't show who is behind it. When I sent an email to the account listed on their contact page, it bounced back as undeliverable. I didn't want to use their contact form and provide personal information, so I was at a dead end... or so I thought.

Then I realized that people who like to take the content of others usually don't like it when someone takes something they own. So I looked at the source code of the website and found a copyright notice, along with the name of the company, among all the code. Once I Googled the name of the company that claimed to own the source code, I found their legitimate site(s) and sent an email to every contact address they displayed, informing them that I would file a takedown notice if they didn't remove my content from their website. Within a half-hour I got a reply apologizing and informing me that they had taken down my posts. They had been republishing dozens of my garden blog posts in full, but what really bothered me was that the website made it seem like I had an account with them and was a member.

If you find yourself trying to contact a scraper and can't get anywhere with the email address they provide, look through the site's source code for a copyright notice and contact them through their main company. It used to be that the majority of feed scrapers were just individuals doing it to build up the rank of a website they had and then sell it off. Lately, it seems like more and more companies are getting in on the act. Maybe someday I'll tell you about the time a major advertising company was scraping my blog. Anyway, to find the source code of the website in question, in Internet Explorer click on Tools>Developer Tools>View>Source>Original. In Firefox click View>Page Source. In Google Chrome click on the paper icon next to the wrench, then Developer>View Source. Looking for a copyright notice may not always work if the person running the scraper site is using a generic script, but a larger company will usually use something more customized and feel the need to advertise that it is "theirs."
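If you'd rather not read through a long page of source by eye, the same search can be roughed out in a few lines of Python. This is only a sketch of the idea, not a tool I used at the time. It assumes you have already saved the page source as text, and the function name find_copyright_notices and the sample company name are made up for illustration.

```python
import re
from html import unescape
from typing import List

# Matches "Copyright", "(c)" or the © symbol, plus the text that follows it,
# stopping at the next tag bracket or line break (at most 120 characters).
COPYRIGHT_RE = re.compile(r"(?:copyright|\(c\)|©)[^<>\n\r]{0,120}", re.IGNORECASE)


def find_copyright_notices(html_source: str) -> List[str]:
    """Return the distinct copyright-looking snippets found in a page's source."""
    # Turn entities such as &copy; into the real character first.
    text = unescape(html_source)
    notices = []
    for match in COPYRIGHT_RE.finditer(text):
        # Collapse runs of whitespace and trim the dashes left by HTML comments.
        notice = " ".join(match.group(0).split()).strip(" -")
        if notice and notice not in notices:
            notices.append(notice)
    return notices


if __name__ == "__main__":
    # Hypothetical page source; the company name is invented.
    sample = (
        "<html><!-- Copyright 2009 Example Media Pty Ltd -->\n"
        "<body><p>Hello</p>"
        '<div id="footer">&copy; 2009 Example Media Pty Ltd. All rights reserved.</div>'
        "</body></html>"
    )
    for line in find_copyright_notices(sample):
        print(line)
```

Whatever company name turns up is only a lead. You still have to search for it and confirm you've found the real owner before you contact anyone.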

How to Tell if You're Being Scraped

1. Look over the site stats for your garden blog. I know most people are happy using one of the simpler trackers, but I really have to recommend Google Analytics. It may seem a little intimidating and there may be a bit of a learning curve, what with all the options and links to click, but it is well worth the hassle. If you notice a dramatic decrease in the number of your visitors, that can be a sign you're being scraped.

2. FeedBurner isn't just for your feed. It also gives you some statistics about your subscribers and what kind of feed reader they're using to read your blog. Familiarize yourself with the names of the bots and clients that legitimate readers are using, and do a Google search for any funny names that you don't recognize. In the same section look at the "uncommon uses." You'll find many blogs you recognize because you are on their blogroll. The ones you don't recognize you should visit and investigate.

3. Along with appending a message to the end of your post, like Dave recommended, add links to your own older posts within the post. Most scrapers will tear out the links and just provide plain text, but some may not bother, especially if they are trying to make their website look like a social networking site. That's how I discovered the scraper in Australia: they showed up in my Google Analytics statistics after some people clicked on the links to older posts.

4. Google Alerts. Set one up for the name of your blog and for the name you blog under. You'll be sent an email whenever Google picks up those words on websites.

5. Name the pictures you upload to your blog, then do a Google search for the name of the image. See also ALT attributes and Your Garden Photos.
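
The check in tip 3 can be roughed out in code if your stats tool lets you export a list of referring URLs. What follows is a minimal sketch under that assumption. The function name unfamiliar_referrers, the sample URLs and the list of known domains are all made up, and it doesn't talk to Google Analytics or any other service.

```python
from collections import Counter
from urllib.parse import urlparse


def unfamiliar_referrers(referrer_urls, known_domains):
    """Count visits from referring domains that are not on your known list."""
    known = {domain.lower() for domain in known_domains}
    counts = Counter()
    for url in referrer_urls:
        # hostname is None when the URL has no scheme; those entries are skipped.
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in known:
            counts[host] += 1
    # Most frequent unfamiliar domains first.
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical export from a stats tool.
    referrers = [
        "http://www.google.com/search?q=garden+blog",
        "http://scraper-example.test/post/1",
        "http://scraper-example.test/post/2",
        "http://friend-blog.example/links",
    ]
    known = ["google.com", "friend-blog.example"]
    for domain, hits in unfamiliar_referrers(referrers, known):
        print(domain, hits)
```

A domain showing up here isn't proof of scraping. It's just a list of places worth visiting to see how your posts are being used.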

Have any other tips or suggestions besides the ones Dave provided in the previous post or the ones in this post? Feel free to share them in the comments section.


  1. Personally, either I'm not that popular, or my web administrator is really good at looking out for me. I probably need to add the embedded messages you are talking about, and I might be surprised what I find. I have noticed scraped content, and most of it I see on the gardenbots on Twitter. Most of them I have blocked, and I think that may be helping as well, or at least keeping me from seeing it.

    However, I am seeing an even worse kind of scraping almost weekly by some major names and magazines. It seems like about once a week I will find a great blog or article on Twitter, only to find it almost verbatim 3 days later in a MAJOR garden magazine. My solution is to tweet the 2 links side by side with the original dates. I don't accuse; I let readers decide. If you see these, please RT and take the time to check it out for yourself. It may be you they are stealing from. If nothing else, you might find the little guy that's not getting noticed.

  2. Botany Buddy,

    That's interesting about the republishing in garden mags. Maybe the bloggers have agreements with the mags?

    You just reminded me that my 4th suggestion should be setting up a Google Alert for the name of the blog or your blogging name.

  3. The site stats are an important one. I've found several places that have sent people to my blog from links on scraper sites. More good ideas!

  4. Thanks Dave, if you have any more ideas feel free to share them. This goes for everyone else with a garden blog.


I hope you find this blog a useful garden blogging resource. Sometimes I may reply to comments with my MrBrownThumb account or I may reply with my Garden Bloggers account. I hope this isn't confusing. If you're looking for gardening information, check out "Google For Gardeners."
