2015-07-04

Allow only one program at a time to access a resource in linux

I had several programs I needed to run but I wanted to make sure only one program got access to a resource at the same time.  So I wrote a short and simple perl script that would do this.  This script needs a directory in /var/run that it can write to.  Since the script doesn't run as root, I have to create this directory after each reboot.

The script is invoked like this:

1nstance.pl -d /var/run/once -n com1 -- my-program my-args

It will run my-program if there is no other script accessing the named resource com1.  If there is another script currently running and using com1 then it will just quit.

The script can be downloaded from here.

--------------------------1nstance.pl-------------------



#!/usr/bin/perl -w
#
#(c) copyright 2015 Kim Holburn
# Licensed under GPLv3

use strict;
use Getopt::Long;

my $verbose = 0;
my $help = 0;
my $myname = "";
my $mydir = "/var/run/once";
my $PID = 0;
my $time = 0;
my $timeout = 0;

Getopt::Long::Configure('require_order');
GetOptions ('verbose+' => \$verbose,
            'directory=s' => \$mydir,
            'name=s' => \$myname,
            'timeout=i' => \$timeout,
            'help|?' => \$help);

if ($help) {
  print <<EOM;
one instance - make sure a program only runs one instance.
usage $0:
$0 [options] -- <command> [ARGS]
  options:
    -h|-?|--help - print this screen
    -d|--directory <directory> 
       directory to store run files. 
       Default is /var/run/once
    -t|--timeout <n> 
       time in seconds that a program is allowed to run 
       before being considered hung and will be killed
       -t=0 means no timeout. The running process will 
       be left untouched.
    -v|--verbose - print extra messages
    -n|--name
       name of process or resource to be run once
       The default is the name of the program.    

EOM

  exit;
}

if ($timeout < 0) { die "$0 ERROR: specified timeout less than zero ($timeout)"; }
if (10000 < $timeout) { die "$0 ERROR: ($timeout) too large"; }
if ($mydir !~ m{^/}) { die "$0 ERROR: directory ($mydir) is not an absolute path!"; }
if (! -d $mydir) { die "$0 ERROR: directory ($mydir) does not exist!"; }

# first argument is command to run
my $path = shift;
if (!$path) { die "$0 ERROR: No command "; } 
my $program = $path;
# command must be a full absolute path.  
my $dir = ".";
my $args="";
if ($program =~ m#/#) {
  $program =~ s#^.*/##;
  $dir =~ s#/[^/]*$##;
}
if (!$program) { die "$0 ERROR: program name invalid.  Must be a filename"; } 
if (! -e $path) { die "$0 ERROR: program ($path) does not exist"; }
if (-d $path) { die "$0 ERROR: program ($path) is a directory"; }
if (! -x $path) { die "$0 ERROR: program ($path) not executable"; }
if (!$myname) { $myname=$program; }

my $run1 = "$mydir/$myname";

sub deleterun {
  if ( -e $run1 ) {
    unlink $run1;
    if ( -e $run1 ) { die "$0 ERROR: cannot delete file ($run1)"; }
  }
}

if ($verbose) {
  print "debug timeout=($timeout) dir=($mydir) name=($myname) \n";
  print "debug v=($verbose) 1=($run1) prog=($program) args=(";
  print join (")(",@ARGV);
  print ") \n";
}

if (open(my $fh, '<', "$run1")) {
  while (<$fh>) {
    chomp ;
    if (/^\s*$/) { next; }
    if (/^PID (\d+)$/i) { $PID = $1; }
    elsif (/^TIME (\d+)$/i) { $time = $1; }
  }
  close $fh;
  # no PID or time
  # this shouldn't happen and probably means we have the wrong file
  if (!$PID or !$time) { die "$0 ERROR: run file ($run1) has no PID or time"; }
  chomp (my $proc = `ps hp $PID -o %c`);
  # orphaned run file, delete
  if ($proc eq "") { deleterun; }
  else  {
    # another instance running
    my $timediff = time - $time;
    if (0 == $timeout or $timediff <= $timeout) { die "$0 ERROR: another instance of $myname ($proc) running"; }  
    print STDERR "$0 ERROR: another instance of $myname ($proc) has probably hung\n";
    # another process has probably hung
    # grab its children
    my @PIDS = (`ps h --ppid $PID -o %p`, $PID);
    kill ('TERM', @PIDS);
    chomp ($proc = `ps hp $PID -o %c`);
    # couldn't kill it.  Give up.
    if ($proc) { die "$0 ERROR: can't kill instance ($myname) ($proc) PID ($PID)"; }
    # it died OK.  Delete run file if possible.
    deleterun;
  }
}

if (1 < $verbose) { print STDERR "creating run file ($run1) ... \n"; }
open(my $fh, '>', "$run1") or die "$0 ERROR: opening file ($run1) ($!)" ;
print $fh "PID $$\n";
print $fh "time ", time, "\n";
close $fh;

#If something bad happens delete the run file
sub cleanup {
  if (-e $run1) {
    if (1 < $verbose) { print STDERR "Deleting run file...\n"; }
    unlink $run1;
  }
  exit;
};
$SIG{'INT'}=\&cleanup;
$SIG{'TERM'}=\&cleanup;
$SIG{'QUIT'}=\&cleanup;

if ($verbose) { print STDERR "Starting ....\n"; }

system ($path, @ARGV);

cleanup;

Detecting Rogue DHCP servers 2

So my script detecting rogue DHCP servers worked and worked well but having two or three DHCP servers covering the same scope is somewhat problematical.  So we eventually gave in and changed configuration to having a main DHCP server and failover servers.  I had to change the logic slightly to get the script to cover that.  The new version has a mode where it will only alert if it detects a rogue DHCP server it has not been told about, or gets no response from any of the servers in its valid server list.  One or more valid servers and all is right with the world.

As before, this script works in nagios.  I run nagios on ubuntu.  The scripts in the nagios package reside in /usr/lib/nagios/plugins.   Maybe there's a good place to put your own scripts but I put mine in there too (/usr/lib/nagios/plugins/check_rogue_dhcp.pl).

This script uses the nagios builtin DHCP checker: /usr/lib/nagios/plugins/check_dhcp.

Then you need a plugin command config file.  I edited the dhcp.cfg command file (/etc/nagios/plugins/) and added these lines:

# 'check_rogue_dhcp' command definition
define command{
   command_name check_rogue_dhcp
   command_line /usr/lib/nagios/plugins/check_rogue_dhcp.pl -f '$ARG1$' '$ARG2$' '$ARG3$'
}

Then to actually invoke the check you need to define a nagios object where you give the command the IP addresses of the servers (12.34.12.34 etc).  That may need a hostgroup or some other object depending on how you have nagios set up

#check that no rogue dhcp services are running
define service {
   service_description rogue-dhcp
   check_command check_rogue_dhcp!12.34.12.34!12.34.12.45
   use generic-service
   notification_interval 0 ; set > 0 if you want to be renotified
}

There's various ways to do this and I'm not great at strategic configuration of nagios, so I'll leave that to you.

The script can be downloaded from here.

The script:
------------check_rogue_dhcp.pl-------------------


#!/usr/bin/perl -w
# nagios: -epn
# the above line makes nagios run the script as a separately.
# rather than as part of nagios.
use POSIX;
use lib "/usr/lib/nagios/plugins";
use utils qw(%ERRORS);

sub fail_usage {
  if (scalar @_) {
    print "$0: error: \n";
    map { print "   $_\n"; } @_;
  }
  print "$0: Usage: \n";
  print "$0 [<options>] <server> [<server> [<server>]] \n";
  print "$0 [<options>] [-s <server> [-s <server> [-s <server>]]] \n";
  print "    options:  \n";
  print "      [-v [-v [-v]]] (verbose) \n";
  print "      [-t  <secs>] (wait this number of seconds)   \n";
  print "      [-f] (fuzzy - ok if one or more of the designated servers answer)   \n";
  print "      [-F] (force (default) - all the designated servers must answer)   \n";
  print " \n";
  exit 3 ;
}

my $verbose = 0;
my %servers=();
my $opt = "-t 5";
my $time = 5;
my $force=1;

## for some reason I can't test for empty ARGs in the while loop
@ARGV = grep {!/^\s*$/} @ARGV;

# examine commandline args
while ($ARGV=$ARGV[0]) {
  my $myarg = $ARGV;
  if ($ARGV eq '-s') {
    shift @ARGV;
    if (!($ARGV = $ARGV[0])) { fail_usage ("$myarg needs an argument"); }
    if ($ARGV =~ /^-/) { fail_usage ("$myarg must be followed by an argument"); }
    if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; }
  }
  elsif ($ARGV eq '-t') {
    shift @ARGV;
    if (!($ARGV = $ARGV[0])) { fail_usage ("$myarg needs an argument"); }
    if ($ARGV =~ /^-/) { fail_usage ("$myarg must be followed by an argument"); }
    if ($ARGV !~ /^(\d+)$/) { fail_usage ("$myarg must be followed by an number"); }
    $time = $1;
    $opt = "-t $time";
  }
  elsif ($ARGV eq '-f' ) { $force=0; }
  elsif ($ARGV eq '-F' ) { $force=1; }
  elsif ($ARGV eq '-h' or $ARGV eq '--help' ) { fail_usage ; }
  elsif ($ARGV =~ /^-/ ) { fail_usage " invalid option ($ARGV)"; }
  elsif ($ARGV =~ /^\d+\.\d+\.\d+\.\d+$/)
    # servers should be ip addresses.  I'm not doing detailed checks for this.
    { if (!defined($servers{$ARGV})) { $servers{$ARGV}=1; } }
  else { last; }
  shift @ARGV;
}

if (scalar @ARGV) { fail_usage "didn't understand arguments: (".join (" ",@ARGV).")"; }  
my $serversn = scalar keys %servers;

if ($verbose > 2) {
  print "verbosity=($verbose)\n";
  print "servers = ($serversn)\n";
  if ($serversn) { for my $i (keys %servers) { print "server ($i)\n"; } }
}

if (!$serversn) { fail_usage "no servers"; }
my $responses=0;
my $responders="";
my @check_dhcp = qx{/usr/lib/nagios/plugins/check_dhcp -v $opt};
foreach my $value (@check_dhcp) {
  if ($value =~ /Added offer from server \@ /i){
    $value =~ m/(\d+\.\d+\.\d+\.\d+)/i;
    my $host = $1;
    # we find a server in our list
    if (defined($servers{$host})) { $responses++; $responders.="$host "; }
    else {
      # we find a rogue DHCP server.  Danger Will Robinson!
      print "SERVICE STATUS:CRITICAL: Rogue DHCP service running on $host";
      exit $ERRORS{'CRITICAL'}
    }
  }
}
if ($responses == $serversn) {
  # we saw all the servers in our list.  All is good.
  print "SERVICE STATUS:OK: $responses of $serversn Expected Responses to DHCP Broadcast";
  exit $ERRORS{'OK'};
}

if ($responses == 0) {
  # we found no DHCP responses.
  print "SERVICE STATUS:CRITICAL: no DHCP service responded";
  exit $ERRORS{'CRITICAL'}
}

# we found less DHCP servers than we should have. Oh Nos!
$responders =~ s/ $//;
if ($force == 1) {
  print "SERVICE STATUS:WARNING: $responses of $serversn Responses to DHCP Broadcast. Only ($responders) responded. ";
  exit $ERRORS{'WARNING'};
}
else {
  print "SERVICE STATUS:OK: $responses of $serversn Responses to DHCP Broadcast. Only ($responders) responded. ";
  exit $ERRORS{'OK'};
}