January 24, 2006 - Filed in Linux HowTos by Felix

Every time you visit a website, your browser also sends the URL of the previous page (in the HTTP "Referer" header) to the server you are accessing. In its default configuration, Apache saves that information to its logfiles. Logfile analysis tools such as Webalizer can then use it to compile statistics on which pages most of your visitors came from.
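As a quick illustration of where the referer ends up, here is a minimal sketch. It assumes the access log uses Apache's "combined" LogFormat, which writes the referer (%{Referer}i) and the user agent (%{User-Agent}i) as the last two quoted fields of each line. The sample log line and the helper names referer and user_agent are hypothetical.

```shell
#!/bin/sh
# A hypothetical request logged in Apache's "combined" format:
# host ident user [time] "request" status bytes "referer" "user agent"
line='192.0.2.10 - - [24/Jan/2006:10:00:00 +0100] "GET /index.html HTTP/1.1" 200 5120 "http://www.example.com/links.html" "Mozilla/5.0"'

# Splitting each line on double quotes puts the referer in field 4
# and the user agent in field 6.
referer()    { awk -F'"' '{ print $4 }'; }
user_agent() { awk -F'"' '{ print $6 }'; }

printf '%s\n' "$line" | referer
printf '%s\n' "$line" | user_agent
```

This naive split breaks if a referer itself contains an escaped quote, so treat it as a quick inspection aid rather than a robust log parser.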
Yesterday I noticed that someone was hitting our server with a flood of requests, all submitting fake referer URLs pointing to www.site-full-of-adverts.tld.
I ran
grep 'www.site-full-of-adverts.tld' access_log
and found out that all of the requests came first from 22.214.171.124, then from 126.96.36.199 and finally from 188.8.131.52. My first thought was to lock these IPs out using Apache's access.conf, but I then decided that any traffic from these IPs was wasted traffic anyway. To keep the load these spam requests put on the server as low as possible, I decided that the Linux iptables firewall should drop all packets from these IPs without even bothering to send anything back.
The commands needed for this are:
iptables -I INPUT -s 188.8.131.52 -j DROP
iptables -I INPUT -s 126.96.36.199 -j DROP
iptables -I INPUT -s 22.214.171.124 -j DROP
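If more addresses turn up, typing one command per IP gets tedious. A minimal sketch of generating the same rules from a list follows; the drop_rules function name is hypothetical, and the addresses are placeholders from the RFC 5737 documentation range, to be replaced with the IPs found in your own access_log. It only prints the commands, so they can be reviewed before anything is applied.

```shell
#!/bin/sh
# Placeholder addresses (RFC 5737 documentation range);
# replace them with the IPs found in your own access_log.
BAD_IPS="192.0.2.1 192.0.2.2 192.0.2.3"

# Print one "drop everything from this source" rule per address.
drop_rules() {
  for ip in "$@"; do
    echo "iptables -I INPUT -s $ip -j DROP"
  done
}

# Only prints the commands; review them, then apply them as root,
# e.g. with: drop_rules $BAD_IPS | sh
drop_rules $BAD_IPS
```

Because -I without a rule number inserts each rule at the top of the INPUT chain, the rule added last shows up first in the listing.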
Running "iptables -L -n" in the shell then lists all blocked IPs (along with any other iptables rules, of course):
iospirit:~ # iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  22.214.171.124       0.0.0.0/0
DROP       all  --  126.96.36.199        0.0.0.0/0
DROP       all  --  188.8.131.52         0.0.0.0/0
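To pull just the blocked addresses out of such a listing (for example, to keep a record of them), a small awk filter is enough. This is a sketch that assumes the plain "iptables -L -n" column layout shown above (target, prot, opt, source, destination, without -v); the blocked_ips function name and the sample addresses are hypothetical.

```shell
#!/bin/sh
# Print the source address of every DROP rule in an "iptables -L -n"
# listing read from stdin, e.g.: iptables -L INPUT -n | blocked_ips
blocked_ips() {
  awk '$1 == "DROP" { print $4 }'
}

# Sample listing in the same layout as above (placeholder addresses)
blocked_ips <<'EOF'
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  192.0.2.3            0.0.0.0/0
DROP       all  --  192.0.2.2            0.0.0.0/0
DROP       all  --  192.0.2.1            0.0.0.0/0
EOF
```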
I wonder whether this "referer pollution" technique is a new trend among spammers. I sure hope not. Compared to ordinary email spam, it costs the victim a lot more bandwidth, memory and processing time, at least if your site is interactive and every request runs server-side code.
"But how could a spammer possibly benefit from their websites appearing in a log file?" you may ask. It isn't immediately obvious. Their aim is to show up in publicly shared logfile analysis results: those statistics pages link to the top referers, and search engines count such links, so the websites sent as faked referers gain a higher search engine page rank.
A little research backs this theory up: Google returns around 17,700 results for the IP "184.108.40.206" alone.
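To see why flooding a server pays off for the spammer, here is a rough sketch of the kind of "top referers" ranking a statistics tool builds: count the log lines per referer and sort by frequency. It is not Webalizer's actual implementation; it assumes the "combined" log format, and the top_referers name and sample lines are hypothetical.

```shell
#!/bin/sh
# Count requests per referer in a "combined"-format log read from stdin,
# most frequent first; "-" (no referer sent) is skipped.
top_referers() {
  awk -F'"' '$4 != "-" { print $4 }' | sort | uniq -c | sort -rn
}

# Hypothetical sample: two spam hits outrank one genuine referer
top_referers <<'EOF'
192.0.2.1 - - [24/Jan/2006:10:00:00 +0100] "GET / HTTP/1.1" 200 512 "http://www.site-full-of-adverts.tld/" "Mozilla/4.0"
192.0.2.1 - - [24/Jan/2006:10:00:01 +0100] "GET / HTTP/1.1" 200 512 "http://www.site-full-of-adverts.tld/" "Mozilla/4.0"
198.51.100.7 - - [24/Jan/2006:10:05:00 +0100] "GET / HTTP/1.1" 200 512 "http://www.example.org/" "Mozilla/5.0"
EOF
```

On a real server, something like "top_referers < access_log | head" shows whether a single referer suddenly dominates the list.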
If you want to check your own logfiles, grep for the "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; MRA 4.1 (build 00975))" user agent string, for example with
grep -F 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; MRA 4.1 (build 00975))' access_log
At least, that is the user agent the spammer in question sent along.
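Once grep has found the matching lines, a short pipeline can show which client addresses sent them, i.e. the candidates for iptables DROP rules. This is a minimal sketch assuming the "combined" log format, where the client IP is the first field of each line; the spam_ips function name and the log path in the comment are hypothetical.

```shell
#!/bin/sh
# User agent string the spammer sent
UA='Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0; MRA 4.1 (build 00975))'

# Count matching requests per client IP (first field of each log line),
# busiest address first. -F makes grep match the string literally,
# so the dots and parentheses are not treated as regex syntax.
spam_ips() {
  grep -F "$UA" "$1" | awk '{ print $1 }' | sort | uniq -c | sort -rn
}

# Example (adjust the path to wherever your Apache logs live):
# spam_ips /var/log/apache2/access_log
```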