2015-07-04

Detecting Rogue DHCP servers 2

So my script detecting rogue DHCP servers worked and worked well but having two or three DHCP servers covering the same scope is somewhat problematical.  So we eventually gave in and changed configuration to having a main DHCP server and failover servers.  I had to change the logic slightly to get the script to cover that.  The new version has a mode where it will only alert if it detects a rogue DHCP server it has not been told about, or gets no response from any of the servers in its valid server list.  One or more valid servers and all is right with the world.

As before, this script works in nagios.  I run nagios on ubuntu.  The scripts in the nagios package reside in /usr/lib/nagios/plugins.   Maybe there's a good place to put your own scripts but I put mine in there too (/usr/lib/nagios/plugins/check_rogue_dhcp.pl).

This script uses the nagios builtin DHCP checker: /usr/lib/nagios/plugins/check_dhcp.

Then you need a plugin command config file.  I edited the dhcp.cfg command file (/etc/nagios/plugins/) and added these lines:

# 'check_rogue_dhcp' command definition
define command{
   command_name check_rogue_dhcp
   command_line /usr/lib/nagios/plugins/check_rogue_dhcp.pl -f '$ARG1$' '$ARG2$' '$ARG3$'
}

Then to actually invoke the check you need to define a nagios object where you give the command the IP addresses of the servers (12.34.12.34 etc).  That may need a hostgroup or some other object depending on how you have nagios set up

#check that no rogue dhcp services are running
define service {
   service_description rogue-dhcp
   check_command check_rogue_dhcp!12.34.12.34!12.34.12.45
   use generic-service
   notification_interval 0 ; set > 0 if you want to be renotified
}

There's various ways to do this and I'm not great at strategic configuration of nagios, so I'll leave that to you.

The script can be downloaded from here.

The script:
------------check_rogue_dhcp.pl-------------------


#!/usr/bin/perl -w
# nagios: -epn
# the above line makes nagios run the script as a separately.
# rather than as part of nagios.
use POSIX;
use lib "/usr/lib/nagios/plugins";
use utils qw(%ERRORS);

sub fail_usage {
  if (scalar @_) {
    print "$0: error: \n";
    map { print "   $_\n"; } @_;
  }
  print "$0: Usage: \n";
  print "$0 [<options>] <server> [<server> [<server>]] \n";
  print "$0 [<options>] [-s <server> [-s <server> [-s <server>]]] \n";
  print "    options:  \n";
  print "      [-v [-v [-v]]] (verbose) \n";
  print "      [-t  <secs>] (wait this number of seconds)   \n";
  print "      [-f] (fuzzy - ok if one or more of the designated servers answer)   \n";
  print "      [-F] (force (default) - all the designated servers must answer)   \n";
  print " \n";
  exit 3 ;
}

my $verbose = 0;
my %servers=();
my $opt = "-t 5";
my $time = 5;
my $force=1;

## for some reason I can't test for empty ARGs in the while loop
@ARGV = grep {!/^\s*$/} @ARGV;

# examine commandline args
while ($ARGV=$ARGV[0]) {
  my $myarg = $ARGV;
  if ($ARGV eq '-s') {
    shift @ARGV;
    if (!($ARGV = $ARGV[0])) { fail_usage ("$myarg needs an argument"); }
    if ($ARGV =~ /^-/) { fail_usage ("$myarg must be followed by an argument"); }
    if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; }
  }
  elsif ($ARGV eq '-t') {
    shift @ARGV;
    if (!($ARGV = $ARGV[0])) { fail_usage ("$myarg needs an argument"); }
    if ($ARGV =~ /^-/) { fail_usage ("$myarg must be followed by an argument"); }
    if ($ARGV !~ /^(\d+)$/) { fail_usage ("$myarg must be followed by an number"); }
    $time = $1;
    $opt = "-t $time";
  }
  elsif ($ARGV eq '-f' ) { $force=0; }
  elsif ($ARGV eq '-F' ) { $force=1; }
  elsif ($ARGV eq '-h' or $ARGV eq '--help' ) { fail_usage ; }
  elsif ($ARGV =~ /^-/ ) { fail_usage " invalid option ($ARGV)"; }
  elsif ($ARGV =~ /^\d+\.\d+\.\d+\.\d+$/)
    # servers should be ip addresses.  I'm not doing detailed checks for this.
    { if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; } }
  else { last; }
  shift @ARGV;
}

if (scalar @ARGV) { fail_usage "didn't understand arguments: (".join (" ",@ARGV).")"; }  
my $serversn = scalar keys %servers;

if ($verbose > 2) {
  print "verbosity=($verbose)\n";
  print "servers = ($serversn)\n";
  if ($serversn) { for my $i (keys %servers) { print "server ($i)\n"; } }
}

if (!$serversn) { fail_usage "no servers"; }
my $responses=0;
my $responders="";
my @check_dhcp = qx{/usr/lib/nagios/plugins/check_dhcp -v $opt};
foreach my $value (@check_dhcp) {
  if ($value =~ /Added offer from server \@ /i){
    $value =~ m/(\d+\.\d+\.\d+\.\d+)/i;
    my $host = $1;
    # we find a server in our list
    if (defined($servers{$host})) { $responses++; $responders.="$host "; }
    else {
      # we find a rogue DHCP server.  Danger Will Robinson!
      print "SERVICE STATUS:CRITICAL: Rogue DHCP service running on $host";
      exit $ERRORS{'CRITICAL'}
    }
  }
}
if ($responses == $serversn) {
  # we saw all the servers in our list.  All is good.
  print "SERVICE STATUS:OK: $responses of $serversn Expected Responses to DHCP Broadcast";
  exit $ERRORS{'OK'};
}

if ($responses == 0) {
  # we found no DHCP responses.
  print "SERVICE STATUS:CRITICAL: no DHCP service responded";
  exit $ERRORS{'CRITICAL'}
}

# we found less DHCP servers than we should have. Oh Nos!
$responders =~ s/ $//;
if ($force == 1) {
  print "SERVICE STATUS:WARNING: $responses of $serversn Responses to DHCP Broadcast. Only ($responders) responded. ";
  exit $ERRORS{'WARNING'};
}
else {
  print "SERVICE STATUS:OK: $responses of $serversn Responses to DHCP Broadcast. Only ($responders) responded. ";
  exit $ERRORS{'OK'};
}

No comments: