Stopping bots in Apache at the network layer

by Alan Williamson

If you are running a web farm, you will have set it up according to your own requirements. Generally speaking, all network traffic will come in through one machine. This machine may be virtual, with its IP address passed between physical machines for redundancy using something like a heartbeat.

Fundamentally there are two types of request: a human request and a computer request. We define a human request as someone sitting in front of a browser accessing the site. The response for this user should be as fast as possible. A computer request is an automated request from a search engine, an RSS reader, or one of the many new service aggregators that are popping up. A computer request need not have as high a priority as a human request and can therefore be dealt with differently.

Let's assume, then, that you have set up an area of the farm to deal specifically with computer requests. This area could run on much slower machines or, since it serves read-only data, run from a mirrored database backend. Either way, this frees up your machine resources so that human requests are served as fast as possible.

It is fair to assume that not all requests to your web farm are legitimate. Some requests are naughty and attempt to break your server. For example, spam bots will continually attempt to post to public forms, and evil scripts will attempt a memory overrun by hitting invalid URLs. Experience tells us that an attack of this type is usually concentrated over a short space of time and comes from a small set of IP addresses. We therefore don't want to clog up our filtering rules permanently with IP addresses that we may never see again. Instead, we want to spot these IP addresses, block them for the next few days, and then release them.
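The block-then-release policy can be sketched as a small time comparison. This is only an illustration: the helper name is_due_for_release and its arguments are assumptions made for this example, not part of the script further down.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a blocked address is due for release.
# $blocked_at and $now are epoch seconds; $release_days is the block length in days.
sub is_due_for_release {
    my ($blocked_at, $now, $release_days) = @_;
    my $release_seconds = $release_days * 86400;    # 86400 seconds per day
    my $due = ($blocked_at + $release_seconds) <= $now;
    return $due ? 1 : 0;
}

my $now = time();

# Blocked one day ago under a three-day policy: still blocked.
print is_due_for_release($now - 1 * 86400, $now, 3) ? "release\n" : "keep\n";

# Blocked four days ago under the same policy: due for release.
print is_due_for_release($now - 4 * 86400, $now, 3) ? "release\n" : "keep\n";
```

Keeping the policy as a pure function of timestamps makes it easy to check without touching the firewall.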

There are many ways to block traffic coming to your farm. One of the most efficient is to block it at the network level, before it reaches any application. The tool of choice for this is iptables, which is the backbone of many Internet firewalls.

The following script is based on a script from DAVBlack. The original script monitored the Apache log file and took action accordingly. It watched for particular URI references and kept a count of how many times a given IP address requested them; once the count reached a set value, it invoked an iptables command to stop any further requests.

The new script takes this concept and adds a number of improvements. The first is to watch for search engines by looking for requests for the robots.txt file. When we detect one, we do a further check to see whether the request comes from one of the top search engines, such as Google, Yahoo and MSN. If so, we assume the IP address is in actual fact part of a Class C block, and we redirect that whole block instead of just the individual IP. If we get it wrong, it's not a big issue, as we are not denying the requests, merely pushing them to another part of the farm.
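As a rough sketch of the Class C step, the address can be widened to its enclosing /24 by dropping the last octet. The helper name to_slash24 is an assumption for this example, and 192.0.2.17 is a documentation address rather than a real crawler.

```perl
use strict;
use warnings;

# Hypothetical helper: map a dotted-quad IPv4 address to its enclosing /24.
# Returns undef (in scalar context) if the input does not have four octets.
sub to_slash24 {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return unless @octets == 4;
    return "$octets[0].$octets[1].$octets[2].0/24";
}

print to_slash24('192.0.2.17'), "\n";    # prints 192.0.2.0/24
```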

One issue with iptables is that it lets you add duplicate rules. This doesn't cause any problems in itself, but in large rulesets it can slow down the packet-matching algorithm. To avoid this, we simply run a grep over the current rules to make sure we aren't already filtering this IP address or block.
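The duplicate check can also be done in Perl on the listing text itself. The sketch below compares whole whitespace-separated fields, so one address cannot match as a substring of another. The helper name rule_exists and the sample listing are made up for illustration; real iptables output may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $source appears as a whole field in the
# given rule listing (text such as `iptables -t nat --list -n` prints).
sub rule_exists {
    my ($listing, $source) = @_;
    for my $rule_line (split /\n/, $listing) {
        my @fields = split ' ', $rule_line;
        return 1 if grep { $_ eq $source } @fields;
    }
    return 0;
}

# Sample listing text, made up for illustration only.
my $listing = <<'END';
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DNAT       tcp  --  192.0.2.0/24         0.0.0.0/0            tcp dpt:80 to:192.168.0.14:80
END

print rule_exists($listing, '192.0.2.0/24') ? "already present\n" : "add rule\n";
print rule_exists($listing, '192.0.2.4')    ? "already present\n" : "add rule\n";
```

With the sample listing, the first check reports the rule as already present and the second reports that a rule should be added.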

The redirection is performed using destination NAT (DNAT) at the network layer.
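A minimal sketch of how such a redirect rule might be assembled is shown here. It builds the iptables arguments as a list so that no shell quoting is involved. The helper name dnat_rule_args, the eth0 interface and the addresses are assumptions for this example; it only prints the command rather than running it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for a DNAT redirect rule
# that sends port 80 traffic from $source to $target (host:port).
sub dnat_rule_args {
    my ($source, $interface, $target) = @_;
    return (
        '-t', 'nat', '-A', 'PREROUTING',
        '-s', $source,
        '-i', $interface,
        '-p', 'tcp', '--dport', '80',
        '-j', 'DNAT', '--to-destination', $target,
    );
}

my @args = dnat_rule_args('192.0.2.0/24', 'eth0', '192.168.0.14:80');
print join(' ', '/sbin/iptables', @args), "\n";

# To apply the rule for real (requires root):
#   system('/sbin/iptables', @args);
```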

To detect naughty scripts, we check for known suspicious requests they tend to make. For example, requesting the classic default.ida or a /cgi-bin/ URI is a common trick. There are many others, and the list in this script is by no means exhaustive. If a client is found hitting one of these, we store its IP address and count the number of times it tries. There are settings at the top of the script for tweaking all the timeouts and traps.
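The counting idea can be sketched with an in-memory hash. This is a simplified illustration, not the script's method: the script keeps its counts in a cache file instead. The names record_line, %hits and $MAX_HITS are assumptions for this example, and the log lines use documentation addresses.

```perl
use strict;
use warnings;

my $MAX_HITS   = 5;    # assumed threshold for this example
my $SUSPICIOUS = qr{(OPTIONS|/cgi-bin/|default\.ida)};
my $IPV4       = qr{(\d{1,3}(?:\.\d{1,3}){3})};

my %hits;    # IP address => number of suspicious requests seen

# Hypothetical helper: record one log line. Returns the IP address if this
# line brought it exactly to the threshold, otherwise undef.
sub record_line {
    my ($line) = @_;
    return unless $line =~ $SUSPICIOUS;
    return unless $line =~ $IPV4;
    my $ip = $1;
    $hits{$ip}++;
    return $hits{$ip} == $MAX_HITS ? $ip : undef;
}

for my $n (1 .. 5) {
    my $line = qq{192.0.2.9 - - [02/Aug/2004:10:00:0$n +0000] "GET /default.ida HTTP/1.0" 404 -};
    my $blocked = record_line($line);
    print "would block $blocked\n" if defined $blocked;
}
```

Returning the address only when the count first reaches the threshold means the caller adds a block rule once, not on every later hit.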

The script below is designed to work with the Spikesource Apache configuration. If you are not using the Spikesource stack, edit the log file location near the top of the script.

Complete Script

#!/usr/bin/perl -w
############################################
#
# apacheFilter.pl 
#
# Based on http://www.pettingers.org/code/davblack.html
#
# http://www.gnu.org/licenses/gpl.txt
# Modified on 02AUG04 as provided under GNU GPL licensing
# no rights reserved under this modification
# 
############################################

use strict;
use Socket;

# The log file you want to monitor
my($LOG) = '/opt/oss/var/apache2/log/access_log';

# The cache file to keep track of attackers
my($CACHE) = '/home/firewall/etc/trackback_deadlist.txt';
my($BOTFILE) = '/home/firewall/etc/historyBotRedirect.rc';

my($REASONS) = '(OPTIONS|/cgi-bin/|default\.ida)';
my($BOTS) = '(robots\.txt|Bloglines|PubSub|Technoratibot)';
my($BIGBOTS) = '(Google|Yahoo|msnbot)';
my($BLADE) = '192.168.0.14:80';

# regex for whitelisted IPs - never blacklist these addresses
my($LOCALNET) = '^(?:127\.0\.0\.1)';

# your kernel-firewall, see "man iptables" to redefine params used
my($IPTABLES) = '/sbin/iptables';
my($ADDRULE) = '-I'; # cmdline for insert rule
my($DELRULE) = '-D'; # cmdline for delete rule

# Maximum time (sec) before they are removed from the database
# unless they are already blacklisted
my($AGEOUT) = 86400;

# Time delay (day) before they are released from the blacklist in DAYS!
my($RELEASEDAYS) = 3;

# Time delay (sec) to check the database for cleanup
my($CHECK) = 600;

# Maximum number of booboos before they get listed
my($MAXHITS) = 5;

########### No user defined parameters below ################
my($OCT) = '(?:25[012345]|2[0-4]\d|1?\d\d?)';
my($IP) = $OCT . '\.' . $OCT . '\.' . $OCT . '\.' . $OCT;

$RELEASEDAYS *= 86400; # Lots of seconds!

print "\nInitializing...";

# Poor man's touch command
open (TOUCH, ">> $CACHE"); close (TOUCH);

# Start the monitoring

print "running\n";
taillog();

sub taillog {
  my($offset, $name, $line, $ip, $reason) = ('') x 5;
  my($stall, $ind) = (0, 0);
  my (@loser, @buildlist) = ();

  $offset = (-s $LOG); # Don't start at beginning, go to end

  while (1==1) {
    sleep(1);
    $| = 1;
    $stall += 1;
    if ((-s $LOG) < $offset) {
      print "Log shrunk, resetting..\n";
      $offset = 0;
    }
    unless (open(TAIL, $LOG)) {
      print STDERR "Error opening $LOG: $!\n";
      next; # try again on the next pass
    }
    if (seek(TAIL, $offset, 0)) {
      # found offset, log not rotated
    } else {
      # log reset, follow
      $offset=0;
      seek(TAIL, $offset, 0);
    }

    while ($line = <TAIL>) {
      chomp($line); # strip the trailing newline only

      if (($BOTS) && ($line =~ m/$BOTS/)) {
        # Lets look for the Bots/Spikers that should be using another blade
        $reason = $1;
        if ($line =~ m/($IP)/) {
          $ip = $1;
          appendBot( $ip, $reason, $line );
        }
      }elsif (($REASONS) && ($line =~ m/$REASONS/)) {
        $reason = $1;
        if ($line =~ m/($IP)/) {
          $ip = $1;
          print "apacheFilter.Trackback: $ip $reason\n";
          open(LIST, $CACHE) || print STDERR "Error opening $CACHE: $!\n";
          $ind = 0;
          @buildlist = <LIST>;
          foreach $line(@buildlist) {
            @loser = split(/,/, $line);
            # [0] is IP, [1] is time, [2] is hits
            if ($loser[0] eq $ip) {
              # Already listed, increase count
              $loser[2] += 1;
              if ($loser[2] == $MAXHITS) {
                blockIp($ip, $reason);
                $loser[2] += 1; # Avoid double listings (???)
              }
              $line = join(',', @loser); # put back together for saving
              $line .= "\n";
              $buildlist[$ind] = $line;                    
              $ip = 'logged';
            } # End if already listed
            $ind += 1;
          } # End foreach read
          close (LIST);

          if ($ip ne 'logged') {
            $line = $ip . ',' . time() . ',' . 1 . "\n";
            push (@buildlist, $line);
          }
          open (LIST, ">$CACHE") || print STDERR "Error opening $CACHE: $!\n";
          print LIST @buildlist;
          close (LIST);
        } # End if IP
        next;
      } # End if match reasons
    } # End while read line
    $offset=tell(TAIL);
    close(TAIL);

    if ($stall >= $CHECK) {
      # Time to do cleanup
      $stall = 0;
      @buildlist = ();
      open(LIST, $CACHE) || print STDERR "Error opening $CACHE: $!\n";
      while ($line = <LIST>) {
        @loser = split(/,/, $line);
        # [0] is IP, [1] is time, [2] is hits
        if ($loser[2] >= $MAXHITS) {
          # already blacklisted
          if (($loser[1] + $RELEASEDAYS) > time()) {
            push (@buildlist, $line);
          } else {
            iptables($DELRULE, $loser[0]);
            print "Freeing $loser[0]", " on ", scalar localtime, "\n";
          } #set free after $RELEASEDAYS
        }elsif (($loser[1] + $AGEOUT) > time()){
          # Not listed and not aged out
          push (@buildlist, $line);
        }
      } # End while reading
      close (LIST);
      # open for writing
      open (LIST, ">$CACHE") || print STDERR "Error opening $CACHE: $!\n";
      print LIST @buildlist;
      close (LIST);
      @buildlist = ();
    } # End cleanup check
  } # End while endless loop
} # End sub taillog


sub appendBot {
  my($ip,$reason,$line) = @_;
  my $theIP;
	
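  if ( $line =~ m/$BIGBOTS/ ){
    $reason = $1;
    # Widen the match to the whole /24 (Class C) block.
    # Avoid "my $a"/"my $b" here, as they mask the variables used by sort.
    my @octets = split(/\./, $ip);
    $theIP = "$octets[0].$octets[1].$octets[2].0/24";
  }else{
    $theIP = $ip;
  }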
	
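  # Check to see this rule hasn't already been defined.
  # -F treats the address as a fixed string and -w matches whole words only,
  # so that, e.g., 1.2.3.4 does not also match 11.2.3.4 or 1.2.3.40.
  my $grepOutput = `$IPTABLES -t nat --list -n | /bin/grep -Fw -- $theIP`;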
  if ( $grepOutput eq "" ){
    my $cmd = "$IPTABLES -t nat -A PREROUTING -s ";
    $cmd .= $theIP;
    $cmd .= " -i eth0 -p tcp --dport 80 -j DNAT --to $BLADE";
    system( $cmd );

    # Open up the bot file and take a copy of this
    $cmd .= "\n";
    print "apacheFilter.BotTrap: $ip $reason\n";
    open (BB, ">>$BOTFILE") || print STDERR "Error opening $BOTFILE: $!\n";
    print BB "## $reason $ip -- $line \n";
    print BB "$cmd\n";
    close (BB);
  }
}

sub blockIp {
  my($ip,$reason) = @_;
  if ($ip =~ m/$LOCALNET/) {
    print "apacheFilter.Trackback: WHITELISTED HOST $ip - NOT BLOCKING \n";
    return;
  }
  print "apacheFilter.Trackback: $ip being blocked because of $reason; " , scalar localtime, " \n";
  iptables($ADDRULE, $ip);
  return;
} # End sub blockIp

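# Insert or delete a REJECT rule for $ip in the WEB chain.
# The WEB chain is not created by this script; it must already exist.
sub iptables {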
  my($action, $ip) = @_;
  my(@args) = ($IPTABLES,
               $action,
               'WEB', '--source',
               $ip,
               '-j', 'REJECT', '--reject-with', 'icmp-host-prohibited');
  system(@args);
  return;
} # End sub iptables


This page was last modified 20:02, 28 November 2005.