Mar 18, 2005

Banning "Bad Bots" in Apache Cuts My Web Traffic In Half

Well, it's a good thing I'm not advertiser supported, or I'd be severely conflicted over this. I just cut my web traffic numbers in half.

Two days ago I banned a whole bunch of bots from accessing glitchnyc.com to stop "referrer spam." Referrer spam is a way for morally flexible sites and site-affiliate programs to boost their traffic and Google ranking by getting their sites into your web statistics pages. Many ISPs generate these statistics pages for their users, and I personally use awstats to generate my own.

To get their links into your statistics page, slimy site owners write an automated script, or bot, to visit your site hundreds of times pretending to come from a site like www.iFreakingLovePoker.com. (Note: not a real site. I don't want to link to any of these !*%^#! sites any more here.)

Finally fed up with having 2500 "fake" visitors to my site every month screwing with my actual statistics, I decided to block all visitors with a referer* value that had any questionable words like poker, loans, and hold-em. To be sure I caught all of the sites, including many I haven't even seen yet, I defined the block-list using regular expressions that match any domain with these words in it.

(*Note: "referrer" is misspelled as "referer" in the HTTP header, and therefore in the Apache config file, so I will use the grammatically incorrect but technically correct version in any technical references that follow.)

Now, these bots are all happily getting 403 Forbidden errors and regular users can still get to my site! I'll have to do some upkeep to add new offending words when they show up, but that's as simple as adding a few more lines to httpd.conf (or .htaccess if I were on a hosted site).

Here are the sections of httpd.conf that block referrer spam, for those looking to duplicate what I've done here.

First, I define a variable called bad_referer and set it for any request whose referer matches one of my regular expressions. Here's a sample:

SetEnvIfNoCase Referer "^http://.*poker.*" bad_referer
SetEnvIfNoCase Referer "^http://.*wsop.*" bad_referer
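
The list of questionable words earlier also named loans and hold-em. Here is a sketch of how those entries might look, assuming the same pattern style as the sample. The exact expressions are an assumption for illustration, not taken from the actual block-list. The last line shows one way several keywords could be folded into a single pattern using alternation.

```apache
# Assumed patterns for the other keywords mentioned above (illustrative only)
SetEnvIfNoCase Referer "^http://.*loans.*" bad_referer
SetEnvIfNoCase Referer "^http://.*hold-em.*" bad_referer

# Alternative: one line with alternation instead of one line per keyword
SetEnvIfNoCase Referer "^http://.*(poker|wsop|loans|hold-em).*" bad_referer
```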

Next, I block access to my site for those offending bots: (this is repeated for directory /cgi-bin/ and /var/www/html/)

<Directory />
Options FollowSymLinks
AllowOverride None
Deny from env=bad_referer
</Directory>
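
On a hosted site without access to httpd.conf, roughly the same rules could go in .htaccess instead. This is a minimal sketch, assuming the same Apache 2.0-style access directives used above and that the host permits them through AllowOverride (FileInfo for SetEnvIfNoCase, Limit for Order/Allow/Deny). It has not been tested against any particular host.

```apache
# .htaccess in the site's document root. There is no <Directory> wrapper,
# because <Directory> sections are not allowed inside .htaccess files.
SetEnvIfNoCase Referer "^http://.*poker.*" bad_referer
SetEnvIfNoCase Referer "^http://.*wsop.*" bad_referer

# Allow everyone, then deny any request flagged as bad_referer
Order Allow,Deny
Allow from all
Deny from env=bad_referer
```

With Order Allow,Deny, a request that matches both an Allow and a Deny line is denied, so flagged requests get the 403 while everyone else gets through.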

To ensure that it's working, I add my own site to the list of bad referers and test. Surfing straight to my site brings the page up as normal, but clicking a link from my site to itself (which carries a referer value of http://www.glitchnyc.com) gives me a 403 Forbidden. Perfect.
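
The temporary test entry might look something like the line below, assuming the site's own referer value is matched with the same directive as the others. The escaped dots are only there to keep the match literal.

```apache
# Temporary self-test entry: remove once the 403 has been confirmed
SetEnvIfNoCase Referer "^http://www\.glitchnyc\.com" bad_referer
```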

To finish up, I remove my own site from the block-list and add some more keywords to match the rest of the spammers. Watching my logs, I still see the referrer spam, but now they're all getting code 403.

tail -f access_log
bess01.nycps.k12.ny.us - - [18/Mar/2005:12:56:56 -0500] "GET / HTTP/1.0" 403 300 "http://free-texas-hold-em.-.com/" "Mozilla/4.0 (compatible; MSIE 4.01; Mac_PowerPC)"

If you're trying this yourself, remember that you'll have to restart Apache for changes to httpd.conf to take effect!