May 11, 2010

Feed Scrapers III

In the Feed Scrapers guest post by Dave at the Home Garden and in Feed Scrapers II, you can find tips for discovering whether your blog is being scraped and republished without your permission. I'll assume you've read those two posts and won't repeat the details here.

Here is a screen shot of a website I recently discovered scraping my garden blog's feed. They've also scraped the message (3) appended to my blog's feed, which one of the posts linked above recommended you add to your own feed.

How did I find this?
In one of the previous posts it was suggested that you create Google Alerts for keywords from your blog. Since then, I've been experimenting with creating Google Alerts for the titles of my posts (1) to alert me when my blog is being scraped.
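
If you have a lot of posts, typing each title into Google Alerts by hand gets tedious. Here is a minimal sketch of how you might build the alert text for each title. The function name, the sample titles, and the example.com domain are all made up for illustration, and leaving your own site out of the results with -site: is my own assumption about what is handy, not something the earlier posts recommended.

```python
def alert_query(post_title, own_domain):
    """Build an exact-phrase query for a post title that skips your own site."""
    # Drop any double quotes inside the title so the quoted phrase stays intact.
    phrase = post_title.replace('"', "").strip()
    # Quotes ask for the exact phrase; -site: leaves out results from own_domain.
    return f'"{phrase}" -site:{own_domain}'


# Hypothetical titles and domain, for illustration only.
titles = ["Feed Scrapers III", "Growing Tomatoes From Seed"]
for title in titles:
    print(alert_query(title, "example.com"))
```

Each printed line can be pasted into the Google Alerts search box as a new alert.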

What they scraped.
Generally speaking, when your garden blog is scraped, the scraper will take all the words and images from your post and republish them. Some scrapers are a lot smarter and will only take a paragraph or two, which may fall under fair use. Usually, the first paragraph contains important keywords that these scrapers covet for advertising and to help them rank high in search engine results.

So, to combat this, or make scraping my blog pretty useless, I've started inserting the paragraph you see marked as 2. It is nothing but text and links that promote my blogs, feed, Twitter and Facebook page. I add this paragraph of "junk" text into the beginning of every post, then I remove it after a day once I know the scrapers have picked it up.
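
I add and remove this paragraph by hand, but if you wanted to script it, the idea could look something like the sketch below. Everything in it is an assumption for illustration: the marker comments, the function names, and the wording of the promo text are invented here, and how you would actually push the changed post to your blog depends on your blogging platform.

```python
# Hypothetical markers that make the "junk" paragraph easy to find again later.
MARKER_START = "<!-- promo-start -->"
MARKER_END = "<!-- promo-end -->"


def build_promo(links):
    """Return a marked paragraph that spells out each URL as visible text."""
    parts = [f"{name}: {url}" for name, url in links]
    text = "This post comes from my garden blog. Find me at " + ", ".join(parts) + "."
    return f"{MARKER_START}<p>{text}</p>{MARKER_END}"


def add_promo(post_html, links):
    """Prepend the promo paragraph unless the post already has one."""
    if MARKER_START in post_html:
        return post_html
    return build_promo(links) + "\n" + post_html


def remove_promo(post_html):
    """Cut out everything between the markers, leaving the original post."""
    start = post_html.find(MARKER_START)
    end = post_html.find(MARKER_END)
    if start == -1 or end == -1:
        return post_html
    return (post_html[:start] + post_html[end + len(MARKER_END):]).lstrip("\n")
```

You would call add_promo when you publish and remove_promo a day or so later, once the scrapers have had time to pick up the feed.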



Here is a screen shot of my blog. 1 shows the title they scraped and 2 shows the "junk" text I inserted to deal with scrapers who only take the first paragraph. Here is what my post looks like now, after removing the "junk" text put there for them.

Using these two new tips, I've found it even easier to find websites that scrape my blog, and I think I've discovered a way to make scraping me less profitable. I don't lose anything when they take the first paragraph because it is just links to pages I maintain.

Note
Since scrapers will rip the HTML out of the posts, I like to write out the full URL to my blogs. If someone comes across a website that has scraped my post, they'll see the "junk" text, which will not be of much use to them, but with the URL text still there they can always copy and paste it to land on my blog.
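
To see why the written-out URL matters, here is a small sketch that strips the tags from two versions of the same sentence. It is only a rough stand-in for what a scraper does (I don't know how any particular scraper actually works), and the class name, helper name, and example.com address are made up for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text between tags and throw the tags away."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Return the visible text of an HTML snippet with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical snippets: one hides the URL in the link, one writes it out.
linked_only = '<p>Visit <a href="http://example.com/">my garden blog</a>.</p>'
written_out = '<p>Visit my garden blog at <a href="http://example.com/">http://example.com/</a>.</p>'

print(strip_html(linked_only))
print(strip_html(written_out))
```

In the first version the address lives only inside the link tag, so it disappears along with the HTML. In the second it is part of the visible text, so it survives the stripping.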


6 comments:

  1. Interesting. But how does the insertion of the paragraph affect the search engines (if it does?) I thought the first and last paragraph content is crucial in cataloguing your post as well as the keywords you have in the title?

    Won't your taking the paragraph out happen after the cataloguing process is most likely to have taken place?

  2. VP,

    You're right, the first paragraph & title are crucial. As I explained in the post, I take the "junk" paragraph out after a day or so, leaving the scrapers enough time to pick it up. So, what should happen is that several scrapers will all have the same "junk" paragraph that doesn't have keywords that have anything to do with the title. Hopefully, since they all have the exact same text, they'll be seen as spam because they're all publishing duplicate content.

    The cataloging process doesn't happen right away, so I don't think keeping the "junk" paragraph for a day will do any harm to my blog. But if and when my blog is crawled, my posts will have the title and keywords in the first paragraph, and the scrapers' copies won't.

    Makes sense?

  3. It makes sense, but I'm worried that the cataloguers and scrapers are following the same or similar schedule...

    Is there any way to test it out I wonder...?

  4. VP,

    I don't think the spiders are indexing blogs at the speed that the feed scrapers are doing it. So far, I haven't noticed the yearly dip in traffic that I usually experience this time of year when a new scraper picks up my spring posts.

    So, I'm crossing my fingers and hoping this experiment works.

  5. MrBrownThumb,

    I'm glad to have been introduced to your website by Rosie of Leaves n Blooms when I was facing problems with internet thieves. I find your tips in the Feed Scrapers I, II and III posts very helpful and I have inserted your links in my latest post, with the intention to inform and help my blog visitors. I hope you don't mind. My link is here:

    http://www.mynicegarden.com/2010/06/my-first-year-at-blotanical.html

    Thank you very much and do keep up the good work.

  6. Hi Autumn Belle,

    Glad you found the information in these posts useful and that you decided to share the links.


I hope you find this blog a useful garden blogging resource. Sometimes I may reply to comments with my MrBrownThumb account or I may reply with my Garden Bloggers account. Hope this isn't confusing. If you're looking for gardening information check out "Google For Gardeners"
