Monday, April 12, 2010

Beware, Comment Spammers!

I had this great idea about how to fight comment spam. If you're not familiar with comment spam, you probably don't have your own blog and you think that "Kathryn" and "Patrick" who try to comment on this blog are just brain dead people. You might be right about the brain dead part, but I'm not sure they're really people.

Do you ever wonder why commenting on blogs can be such a hassle, or why so many blogs require moderation, or why many blogs don't accept comments on older posts, or forbid links in comments? It's because of comment spam. Spammers will submit comments such as "Your post is helpful and informative" or "We need to pay attention to the eco friend environment" that don't address the topic of the post in question. I'm not talking about targeted self-promotion here. It's not comment spam to link to an article you wrote on a similar topic, but it's definitely comment spam if you use a robot to do so. Or if you hire people in Asian boiler rooms to get around the CAPTCHAs that stop your robots.

It used to be that comment spam was done to improve the search engine ranking of websites. That motivation has largely gone away with the development of the "nofollow" attribute. Blogs such as "Go To Hellman" add rel="nofollow" to any links in the comment threads. This tells spidering robots not to follow the specified links and tells search engines to ignore the links for purposes of site ranking.
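For the curious, here is roughly what that looks like in code. This is only a minimal sketch in Python using the standard library's html.parser module, not what Blogger actually runs; the names NofollowRewriter and add_nofollow are made up for illustration, and a real blog platform would also sanitize the comment HTML, which this sketch does not attempt.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Hypothetical helper: re-emit HTML, forcing rel="nofollow" on <a> tags."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Drop whatever rel value the commenter supplied, then add ours.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted as a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html):
    """Return comment_html with rel="nofollow" set on every link."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://example.com/">my site</a>'))
```

Note that the sketch replaces any rel value the commenter typed rather than trusting it, and it silently drops HTML comments, which seems acceptable for text submitted by strangers.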

I guess the people who have been leaving spam comments on my blog didn't get that memo. It's annoying to have to delete the comments, especially the ones in Chinese where links get hidden around the periods in "...". I went to the Blogger help pages to see if there's any way to report the abusive commenters (this blog restricts anonymous comments, so there's at least a user profile for every comment). There isn't. What's worse, Google tells you that if you don't remove those spam comments, your site's ranking will be hurt. Then I had my bright idea. I clicked on one of the links left in the spam comment. Then I picked some keywords from the page and plugged them into Google to find the site. There, at the bottom of the search result, was an option: Dissatisfied? Help us improve. Google is asking for feedback. I pasted in the URL for my comment spammer's site, and checked the radio button labeled "The results included spam." I clicked send, and my spammer's site was bound for Google oblivion!

Beware, comment spammers, I'm going to report you!

Though I felt good about it, I started to have doubts. A lot of these comment spammers seemed to be Asian; could it be that Asian search engines didn't get the nofollow memo either? Some quick googling confirmed my suspicion: China's leading search engine, Baidu, doesn't pay attention to the nofollow attribute! These comment spammers must be using my blog to juice their Baidu ranking!

Well, maybe not. I did a few searches in Baidu. Baidu is probably the worst internet search engine I've ever tried! Baidu gives really stupid results for my vanity search. Baidu doesn't index my blog, my website, or anything I've ever posted. Perhaps China has blacked out the entire Google network, including Blogger, and Baidu doesn't see it any more. Or perhaps "Go To Hellman" has been banned for its post on Qin Shi Huangdi. Baidu has spidered a page from WorldCat that mentions some other Eric Hellman, and has picked up blog mentions of me by John Blyberg and in Dear Author, but not much else. It's safe to assume that Baidu's strength is not English-language indexing.

So if Baidu doesn't index my blog, then spammers shouldn't be able to improve their Baidu rankings with comment spam in my blog. There must be some other motivation for the comments.

Another thing I noticed is that Baidu seems to be big on searching for MP3s and PDFs. It ranks sites like Rapidshare rather highly. Maybe Baidu and similar search engines spider websites like my blog to discover the MP3 files, the PDFs, and the video files that Baidu users are really looking for, and the intended audience of the spam comments is these content spiders. My blog has discussed ebooks, piracy and related topics, so maybe the spammers think it's a good source for links to content. Who knows?

Another possibility is that the spammers are trying to get bloggers themselves to visit their sites. "Patrick" from Madras is trying to sell "web templates". It turns out that his site has copied content from another site marketing web templates, which appear to me to be copies of other websites with much of the content stripped out. It's ironic: Patrick seems to be using a template for a web-template-selling website to sell web templates.

After a few days, I checked back to see if the website I had complained about had been removed from Google or not. As it turns out, the site actually improved its Google ranking from #5 to #1 in my test search. So much for my career in comment spam scourgedom!

6 comments:

  1. I started to make a list of amusing spam comments like you mentioned (see it here: http://docs.google.com/View?id=dfr2jdcs_262gxmwmrfd)

    I had hundreds of spam comments on my blog every day. I noticed the vast majority were submitted for only one post, about ebooks (http://commonplace.net/2009/11/is-an-e-book-a-book/). Ever since I disabled comments for just that one post, I get very few spam comments.

  2. I get a couple comment spams a day on my blog. They're automatically detected and blocked though. There are a few tricks you can use to detect bot behaviour. I wrote about the one I use on my website here:

    https://secure.grepular.com/Blocking_Comment_Spam_Using_ModSecurity_and_Hidden_Fields

    This method hasn't let a single bot comment through in months.

  3. I've noticed that comment spammers find their targets using search terms such as inurl:"node" intext:"post a comment" -"comments are closed" writing service. Thus, the invisibility incantation to the right -->

  4. Stopping comment spammers is really easy: add a mod to your site that asks a random text question a human user has to answer, as well as a CAPTCHA.
    Secondly, analyze your log files and traffic from IPs, and use Project Honey Pot to determine whether they are spammers. Then block those IPs or ranges at the server level. Linux/Apache users can easily use mod_rewrite and .htaccess. Windows/IIS users can simply deny access.

  5. CAPTCHA is no longer so effective. Spammers are using de-CAPTCHA software to get around it. I think the best option is a question and answer, meaning a question whose answer can only be retrieved with a search engine.
    Investing in Property

  6. CAPTCHAs should be more advanced by now. There is software that can decode those images. There should be some scrolling task or game that can only be played with a mouse; that would help to reduce spammers.
    Property Investment

