I’ve been bombarded lately (about 50 a day) by a new kind of spam comment. It’s been slipping through my MT-Blacklist filters because it builds intelligible sentences by varying verbs (like “check” and “visit”) and nouns (like “site” and “pages”). I sometimes see the same spam comments when I’m browsing other sites, so I figured I would post the regular expression I wrote to block it, as I did with another one a few months ago, in case anyone happens to be searching for one.
(check|visit)[\w\-_.]*(pages|sites|information|info)[\w\-_. ]*
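To see what the expression does and does not catch, here is a quick sketch in Python. This is not MT-Blacklist itself: the sample strings, the `looks_like_spam` helper, and the case-insensitive flag are assumptions made for illustration. One thing worth noticing is that the first character class has no space in it, so, at least under Python's `re` module, the verb and noun only match when they are joined by word characters, hyphens, underscores, or dots.

```python
import re

# The pattern from the post. IGNORECASE is an assumption; the post does not
# say how MT-Blacklist treats letter case.
SPAM_PATTERN = re.compile(
    r"(check|visit)[\w\-_.]*(pages|sites|information|info)[\w\-_. ]*",
    re.IGNORECASE,
)


def looks_like_spam(text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return SPAM_PATTERN.search(text) is not None


# Made-up sample strings, not real spam from the blog.
samples = [
    "check-the-pages",
    "visit_my.sites today",
    "Please check the pages",       # space between verb and noun
    "I will visit my grandmother",  # verb present, no noun
]

for sample in samples:
    print(f"{sample!r}: {looks_like_spam(sample)}")
```

If the spam you are seeing separates the words with spaces, adding a space to the first character class (as in the second one) would be the obvious tweak, though it also makes false positives more likely.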
This has been the most difficult spam variation I’ve had to deal with. The one weakness of most comment spam is that it’s bound to a static website address. Since spam is usually generated by robots, there are patterns that can be matched in order to block it. The key is figuring out what the pattern is, whether it is a recurring IP address (very unlikely and unreliable) or a recurring website address (most likely). This one is different though, because the advertised websites keep changing. Not only that, but the sentences used to present the site are also inconsistent. The pattern, as a result, is more complex.
My problem with MT-Blacklist was that the plugin itself had to recognize the spam in order to delete it, instead of the other way around. I found out about MTCloseComments, which does exactly what the name implies: it closes comments on old entries, and you get to decide how old is too old. I have it set to close comments on an entry after 7 days, and I’ve had no spam since. I’m not going to uninstall MT-Blacklist because it’s good to have a backup.
The MTCloseComments plugin is located here: http://mt-plugins.org/archives/entry/closecomments.php
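
For anyone curious what a seven-day cutoff like the one described above amounts to, here is a minimal sketch in Python. It is not the plugin's code and does not use any Movable Type API. The function name, the parameters, and the choice to treat an entry that is exactly seven days old as still open are all assumptions.

```python
from datetime import datetime, timedelta

# Cutoff from the post: comments close once an entry is older than 7 days.
MAX_AGE = timedelta(days=7)


def comments_open(posted_at: datetime, now: datetime,
                  max_age: timedelta = MAX_AGE) -> bool:
    """Hypothetical check: True while the entry is at most max_age old."""
    return now - posted_at <= max_age


# Made-up dates for illustration.
posted = datetime(2005, 1, 1)
print(comments_open(posted, datetime(2005, 1, 5)))   # entry is 4 days old
print(comments_open(posted, datetime(2005, 1, 10)))  # entry is 9 days old
```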