Small-scale Measures to Combat Comment Spam

May 16 2007 Wed

12:35 am PHT

You might or might not know that I wrote the blogging software that runs vaes9 myself. I have my own reasons for wanting complete control over the application layer of my blog instead of opting for a popular blogging system like WordPress or Movable Type. One particular advantage is the ability to customize and tweak my blog in any way I want. A disadvantage is that it takes a lot more effort to add features that other blogging platforms easily get through plug-ins.

Nevertheless, maintaining my blog has given me some insights into fighting comment spam. Granted, my blog is neither high-profile nor heavily trafficked enough to invite a deluge of spam, but I still get my fair share of these nasties. I’d like to share a few techniques I’ve been using.

Moderation for old posts. I don’t like the idea of closing down comments, especially for older posts. I believe that if you have something to say about what I’ve written, you’re entitled to say it, no matter how ancient the post may be. But in order to combat comment spam, I automatically turn on moderation for posts that are more than a month old. This way, I only have to actively monitor the more recent articles for spam, since any unsolicited comments on older posts won’t see the light of day unless I approve them. Many blogs use this technique, and I’ve used it ever since I first introduced commenting on this blog.
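As a minimal sketch of this rule (not the actual code behind this blog), the check comes down to one date comparison. The function name and the 30-day approximation of "a month" are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: "more than a month old" is approximated as more than 30 days.
MODERATION_AGE = timedelta(days=30)


def needs_moderation(post_published: datetime, now: datetime) -> bool:
    """Return True when comments on this post should be held for approval."""
    return now - post_published > MODERATION_AGE


if __name__ == "__main__":
    now = datetime(2007, 5, 16)
    print(needs_moderation(datetime(2007, 3, 1), now))   # old post
    print(needs_moderation(datetime(2007, 5, 10), now))  # recent post
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.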

Flagging URLs in comment bodies. An almost universal characteristic of comment spam, especially the kind created by bots, is that it links to some website in the comment body, ostensibly to boost that site’s search engine ranking. So if I detect a URL in the comment body, I automatically hold the comment for later approval. This has the unfortunate side effect of preventing legitimate commenters from instantly linking to good resources, but I usually approve such comments quickly, so the minor lag in publishing is not too onerous.
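The detection itself can be quite crude. Here is a rough sketch, assuming a deliberately loose pattern that treats anything starting with `http://`, `https://`, or `www.` as a URL. The pattern and the function name are illustrative, not what this blog actually runs.

```python
import re

# Assumption: a loose pattern is good enough here, because a false positive
# only sends the comment to moderation instead of rejecting it outright.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def contains_url(body: str) -> bool:
    """Return True if the comment body appears to contain a URL."""
    return URL_PATTERN.search(body) is not None


if __name__ == "__main__":
    print(contains_url("Nice post!"))
    print(contains_url("Cheap pills at http://spam.example/buy"))
```

Erring on the side of flagging is cheap in this setup, since a flagged comment is only delayed, never lost.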

Unobtrusive CAPTCHAs. You have most probably encountered CAPTCHAs when commenting on some blogs (notably Blogspot). These are usually abstract images containing distorted letters and numbers that you’re supposed to enter in a text box to prove that you’re a human and not a web robot indiscriminately spewing unsolicited offers. Sometimes, these CAPTCHAs appear as simple questions like “What is 1 + 4?” I’ve encountered a particular question that stumped me; it goes something like “What is 5 + 3 x 8?” (Er, so do we multiply first per algebra rules or do we add first per arithmetic?) Needless to say, I didn’t bother commenting anymore.

The problem with CAPTCHAs is that they are an annoying hurdle when you want to leave a comment. While each one usually takes only a few seconds, those seconds add up if you like to comment on a lot of blogs.

An ingenious method that I’ve devised and implemented on my blog is a CAPTCHA that requires no effort from humans at all! (I haven’t seen it used anywhere, but this method has probably been invented many times elsewhere.) The idea is to have extra input elements in your comment form that should be left in their original state when the comment is submitted. These elements are prominently labeled “DO NOT MODIFY!” or “LEAVE BLANK!” and the whole thing (element and label) is hidden using inline style sheets. A hyperactive spambot will gleefully stuff these elements with text and URLs while humans won’t see a thing. (The labels are there to accommodate browsers that don’t support style sheets.) So when the comment is submitted and the untouchable form elements come back stuffed with manhood enhancers or gambling scams, the comment is exiled to the junk heap. I’m proud to say that this eliminated approximately 95% of my spam and significantly reduced my comment moderation tasks.
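To make the idea concrete, here is a small sketch of both halves: the hidden markup and the server-side check. The field names, the default value, and the choice to treat a missing trap field as untouched are all assumptions for illustration; this is not the blog's actual implementation.

```python
# Hypothetical trap fields, mapped to the value each should still hold
# when the form comes back.
TRAP_FIELDS = {
    "website_confirm": "",       # labeled "LEAVE BLANK!"
    "token": "do-not-modify",    # labeled "DO NOT MODIFY!"
}

# Markup to embed in the comment form. The inline style hides both the
# inputs and their labels from browsers that support style sheets.
TRAP_HTML = """
<div style="display: none">
  <label for="website_confirm">LEAVE BLANK!</label>
  <input type="text" name="website_confirm" id="website_confirm" value="">
  <label for="token">DO NOT MODIFY!</label>
  <input type="text" name="token" id="token" value="do-not-modify">
</div>
"""


def is_spam(form: dict) -> bool:
    """Return True if any trap field was changed from its original value.

    Assumption: a trap field missing from the submission counts as untouched.
    """
    return any(
        form.get(name, expected) != expected
        for name, expected in TRAP_FIELDS.items()
    )
```

A comment that trips the check can then be junked or sent to moderation, depending on how much you trust the trap.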

This method can be extended in many ways. One technique I can see is to randomly generate the comment form with ever-changing trap elements. And if the browser supports JavaScript, more interesting possibilities crop up. This is left as an exercise for the reader.
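For the curious, one possible reading of "ever-changing trap elements" is to derive the trap field's name from a random per-form nonce, so that no two forms share a field name a bot could learn to skip. The sketch below is only one way to do it; the secret key, the `nonce` form field, and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; keep the real one out of source control.
SECRET_KEY = b"replace-me"


def make_trap_name(nonce: str) -> str:
    """Derive the trap field name from a per-form nonce."""
    digest = hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return "f_" + digest[:12]


def new_form_trap() -> tuple[str, str]:
    """Return (nonce, trap_name) to embed in a freshly rendered form."""
    nonce = secrets.token_hex(8)
    return nonce, make_trap_name(nonce)


def trap_was_filled(form: dict) -> bool:
    """Recompute the trap name from the submitted nonce and check it is blank."""
    trap_name = make_trap_name(form.get("nonce", ""))
    return form.get(trap_name, "") != ""
```

Because the name is recomputed from the nonce and the secret, the server does not need to remember which trap name it handed out with each form.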

I have a few unimplemented tricks up my sleeve in case the comment spam increases or becomes smarter, but I’ll leave them for a (hopefully far) future article.

What do you think? Do you have any other nifty suggestions short of using the Akismet anti-spam service?

Filed under Programming and Web Development
