
Linux HowTo: Defeating referer spam

January 24, 2006 - Filed in Linux HowTos by Felix
Every time you visit a website, your browser also sends the URL of the previous page to the server you are accessing (the "Referer" header). In many default configurations, Apache saves that information to its logfiles. Logfile analysis tools such as Webalizer can then use it to create statistics showing which pages most of your visitors came from.
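
To get a feel for what such a tool sees, here is a minimal sketch that tallies the referers in a logfile by hand. It assumes Apache's "combined" log format, where splitting a line on double quotes puts the referer into the fourth field. The function name top_referers is made up for illustration.

```shell
# Sketch: show the most frequent referers in an access log.
# Assumes the combined log format (referer is the 4th field when a
# line is split on double quotes). top_referers is a hypothetical name.
top_referers() {
    # $1 = path to the access log, $2 = number of entries to show (default 10)
    awk -F'"' 'NF >= 6 && $4 != "-" && $4 != "" { print $4 }' "$1" \
        | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Called as "top_referers access_log 20", it prints one line per referer with the request count in front. Lines whose request string itself contains an escaped double quote would shift the fields, so treat the result as a rough overview.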

Yesterday I noticed that someone was massively accessing our server and submitting referer URLs like

http://www.site-full-of-adverts.tld/cheap_pills.php
http://www.site-full-of-adverts.tld/free_download.php

I did a


grep 'www.site-full-of-adverts.tld' access_log

and found out that all of the requests came first from 83.54.29.254, then from 217.107.222.75 and finally from 66.246.218.107. My first thought was to lock these IPs out using Apache's access.conf, but then I decided that any traffic exchanged with these IPs was wasted traffic. To keep the server load caused by these spam requests as low as possible, I decided to have Linux's iptables drop all packets from these IPs and not even bother sending anything back.

The commands needed for this are


iptables -I INPUT -s 217.107.222.75 -j DROP
iptables -I INPUT -s 83.54.29.254 -j DROP
iptables -I INPUT -s 66.246.218.107 -j DROP
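
With more than a handful of addresses, typing each rule by hand gets tedious. The following is a minimal sketch of how the list could be derived from the logfile instead. It assumes the client IP is the first whitespace-separated field of each log line. The function names spam_ips and print_drop_rules are made up for illustration. The rules are only printed, not executed, so they can be reviewed before being run as root.

```shell
# Sketch: list the client IPs whose log lines mention a given domain.
# Assumes the client IP is the first field of every log line.
# spam_ips and print_drop_rules are hypothetical helper names.
spam_ips() {
    # $1 = domain to look for (matched as a fixed string), $2 = access log
    grep -F -- "$1" "$2" | awk '{ print $1 }' | sort -u
}

# Print (do not execute) one DROP rule per offending IP.
print_drop_rules() {
    spam_ips "$1" "$2" | while read -r ip; do
        echo "iptables -I INPUT -s $ip -j DROP"
    done
}
```

For example, "print_drop_rules 'www.site-full-of-adverts.tld' access_log" prints one iptables line per IP. Check the list for legitimate visitors before applying any of it.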

Entering "iptables -L -n" on the shell then lists all blocked IPs (and anything else in iptables, of course):


iospirit:~ # iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  83.54.29.254         0.0.0.0/0
DROP       all  --  66.246.218.107       0.0.0.0/0
DROP       all  --  217.107.222.75       0.0.0.0/0
..

I wonder whether this "referer pollution" technique is a new trend among spammers. I sure hope it is not. Compared to normal email spam, it costs a lot more bandwidth, memory and processing time, at least if your site is interactive.

"But how could a spammer possibly benefit from their websites showing up in a log file?" you may ask. It's not immediately obvious. Their aim is to appear in publicly shared logfile analysis results, which typically list referers as links, and thus to gain a higher search engine ranking for the websites they transmit as faked referers.

A little research provides backing for this theory: Google returns around 17,700 results for the IP "66.246.218.107" alone.

If you want to check your own logfiles, grep for the "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; MRA 4.1 (build 00975))" user agent, for example with


grep 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; MRA 4.1 (build 00975))' access_log

At least that is what the spammer in question here sent along.
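
If the grep turns up hits, the next question is which addresses they came from. Here is a minimal sketch for that, again assuming the combined log format, where splitting a line on double quotes puts the user agent into the sixth field and the client IP at the start of the first. The function name ips_for_agent is made up for illustration.

```shell
# Sketch: count requests per client IP for one exact user agent string.
# Assumes the combined log format (user agent is the 6th field when a
# line is split on double quotes). ips_for_agent is a hypothetical name.
ips_for_agent() {
    # $1 = exact user agent string, $2 = path to the access log
    awk -F'"' -v ua="$1" '$6 == ua { split($1, f, " "); print f[1] }' "$2" \
        | sort | uniq -c | sort -rn
}
```

Called with the user agent string above and access_log as arguments, it prints each IP with its request count, busiest first. The match is on the whole user agent string, so a spammer who varies it slightly will not show up.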
