Performance Monitoring with Nagios and RRDTool - pnp4Nagios

Historically, many people have used MRTG in combination with RRDTool to collect and graph performance data. With the transition to Nagios for resource monitoring, it now makes sense to integrate performance monitoring with Nagios as well. In fact, many Nagios plugins already supply performance data, so it is only a matter of storing that data in an RRD database and graphing it with rrdtool. PNP4Nagios does just that!

The definite advantage of PNP4Nagios is its ease of installation and use. If a Nagios plugin supplies performance data according to the Nagios plugin development guidelines, PNP4Nagios will collect and display the data without any additional configuration!
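In the Nagios plugin guidelines, performance data is the part of the plugin output after a | character: space-separated items of the form 'label'=value[UOM];[warn];[crit];[min];[max], where UOM is one of s, ms, us, %, B, KB, MB, TB or c, and the fields after the value may be left empty. A minimal sketch of a hypothetical bash helper, perfdata (not part of Nagios or PNP4Nagios), that formats one such item:

```shell
# Hypothetical helper (not part of Nagios or PNP4Nagios): print one
# performance data item as label=value[UOM];[warn];[crit];[min];[max].
# Missing trailing arguments simply leave their fields empty.
perfdata() {
  local label="$1" value="$2" uom="$3" warn="$4" crit="$5" min="$6" max="$7"
  # The guidelines require quoting a label that contains spaces
  case $label in
    *" "*) label="'$label'" ;;
  esac
  printf '%s=%s%s;%s;%s;%s;%s' "$label" "$value" "$uom" "$warn" "$crit" "$min" "$max"
}

# Example: status text, then | and the performance data
echo "OK: disk usage is 42% | $(perfdata usage 42 % 80 90 0 100)"
```

The example line prints OK: disk usage is 42% | usage=42%;80;90;0;100.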

It is also not difficult to write custom Nagios plugins that supply performance data. Below is an example of vmstat, iostat and netstat data collected with the help of the check_multi plugin and sent passively to the Nagios master server via send_nsca.

To use the examples below with the check_multi plugin, Nagios and NSCA need to be recompiled with a larger plugin output buffer. For example, the default MAX_PLUGIN_OUTPUT_LENGTH definition in include/nagios.h is set to 4096, which is not enough to hold the combined performance data. NSCA defines its own MAX_PLUGINOUTPUT_LENGTH in include/common.h; because this value determines the NSCA packet size, both send_nsca and the nsca daemon must be rebuilt with the same setting.
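Before recompiling, it helps to know how long the combined output actually is. A minimal sketch of a hypothetical helper, fits_buffer, that reads plugin output on standard input and compares its byte count with a limit such as MAX_PLUGIN_OUTPUT_LENGTH. It requires the count to be strictly smaller than the limit, on the assumption that the buffer is a C string that also needs room for its terminating NUL byte:

```shell
# Hypothetical helper: report the byte count of stdin and return 0 if it
# fits in a buffer of the given size (one byte kept for the C string
# terminator), 1 otherwise.
fits_buffer() {
  local limit="$1" bytes
  # tr strips the padding some wc implementations (e.g. Solaris) add
  bytes=$(wc -c | tr -d ' ')
  echo "output is ${bytes} bytes, limit is ${limit}"
  (( bytes < limit ))
}
```

For example: /usr/local/nagios/libexec/check_multi -f /usr/local/nagios/etc/check_multi.cmd | fits_buffer 4096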

# pwd
/usr/local/nagios

# vi etc/check_multi.cmd
command [system::vmstat]= check_vmstat 10 5 60 80
command [d0::iostat]= check_iostat c0t0d0 80 90
command [d1::iostat]= check_iostat c0t0d1 80 90
command [d4::iostat]= check_iostat c1t1d0 80 90
command [d5::iostat]= check_iostat c0t0d3 80 90
command [d6::iostat]= check_iostat c1t1d2 80 90
command [ce1::netstat]= check_netstat ce1 1250 public 60 80

# vi libexec/check_vmstat 
#!/usr/bin/bash

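if (( $# != 4 )); then
  echo "Usage: check_vmstat <mem_warn%> <mem_crit%> <cpu_warn%> <cpu_crit%>"
  # Nagios plugin convention: invalid arguments return UNKNOWN (3)
  exit 3
fi

# Solaris vmstat: $4 = free swap, $5 = free memory (KB), $20-$22 = usr/sys/idle CPU %
# (these column positions assume the default of four disk columns)
output=`vmstat 1 2|tail -1|awk '{print "| free_swap="$4"KB; free_mem="$5"KB; usr_cpu="$20"%; sys_cpu="$21"%; idle_cpu="$22"%;"}'`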

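# swap -s: field 9 is the used and field 11 the available virtual swap (KB)
swap=`swap -s|awk '{print $9" "$11}'|sed s/"k"/""/g`
used=`echo $swap|cut -f1 -d" "`
avail=`echo $swap|cut -f2 -d" "`
total=$(($used + $avail))
# Percentage of virtual swap still available, compared with the memory thresholds
avail=$((100*$avail/$total))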

# Idle CPU (item 5 of the perfdata string) gives the CPU utilization
idle_cpu=`echo $output | cut -d";" -f5 | cut -d"=" -f2 | cut -d"%" -f1`
cpu_usage=$((100 - $idle_cpu))

if (( $avail <= $2 )); then
 echo "CRITICAL: Free memory is ${avail}% ${output}"
 exit 2
elif (( $cpu_usage >= $4 )); then
 echo "CRITICAL: CPU usage is ${cpu_usage}% ${output}"
 exit 2
elif (( $avail <= $1 )); then
 echo "WARNING: Free memory is ${avail}% ${output}"
 exit 1
elif (( $cpu_usage >= $3 )); then
 echo "WARNING: CPU usage is ${cpu_usage}% ${output}"
 exit 1
else
 echo "OK: Free memory is ${avail}%, CPU utilization is ${cpu_usage}% ${output}"
 exit 0
fi
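check_vmstat derives its free-memory percentage from swap -s, whose Solaris output has the form total: <n>k bytes allocated + <n>k reserved = <n>k used, <n>k available, so fields 9 and 11 hold the used and available amounts. A sketch of the same calculation as a standalone function (swap_free_pct is a hypothetical name, and the sample line is illustrative, not captured from a real system) makes the parsing easy to check:

```shell
# Sketch: percentage of virtual swap still available, computed from one
# line of Solaris `swap -s` output the same way check_vmstat does.
swap_free_pct() {
  local line="$1" used avail
  # Field 9 is "<n>k" before "used,", field 11 is "<n>k" before "available"
  used=$(echo "$line" | awk '{print $9}' | sed 's/k//')
  avail=$(echo "$line" | awk '{print $11}' | sed 's/k//')
  echo $(( 100 * avail / (used + avail) ))
}

# Illustrative input; on a live system call: swap_free_pct "$(swap -s)"
swap_free_pct "total: 341264k bytes allocated + 66944k reserved = 408208k used, 2318136k available"
```

With this sample line the function prints 85 (2318136 of 2726344 KB available, rounded down by integer arithmetic).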

# vi libexec/check_iostat
#!/usr/bin/bash

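if (( $# != 3 )); then
  echo "Usage: check_iostat <disk> <warning%> <critical%>"
  # Nagios plugin convention: invalid arguments return UNKNOWN (3)
  exit 3
fi

# Take two 60-second samples and keep the last line (the current interval).
# Solaris iostat -xn columns: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
output=`iostat -sxn $1 60 2|tail -1|awk '{print "| r/s="$1"; w/s="$2"; kr/s="$3"KB/s; kw/s="$4"KB/s; wait="$5"; actv="$6"; wsvc_t="$7"ms; asvc_t="$8"ms; w%="$9"%; b%="$10"%;"}'`
# The disk busy percentage (%b) is compared with the thresholds
usage=`echo $output | cut -d";" -f10 | cut -d"=" -f2 | cut -d"%" -f1`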

if (( $usage >= $3 )); then
 echo "CRITICAL: Disk utilization is ${usage}% ${output}"
 exit 2
elif (( $usage >= $2 )); then
 echo "WARNING: Disk utilization is ${usage}% ${output}"
 exit 1
else
 echo "OK: Disk utilization is ${usage}% ${output}"
 exit 0
fi
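All three plugins end with the same pattern: compare a value with the warning and critical thresholds and exit with 0 (OK), 1 (WARNING) or 2 (CRITICAL), the return codes Nagios expects. A sketch of a hypothetical shared helper, check_threshold, for the higher-is-worse case used by check_iostat and check_netstat; check_vmstat's free-memory test runs the other way and would need the comparisons reversed:

```shell
# Hypothetical helper: print a status line and return the Nagios exit code
# for a metric where higher values are worse.
# Arguments: <value> <warning> <critical> <message> <perfdata>
check_threshold() {
  local value="$1" warn="$2" crit="$3" msg="$4" perf="$5"
  if (( value >= crit )); then
    echo "CRITICAL: ${msg} ${perf}"; return 2
  elif (( value >= warn )); then
    echo "WARNING: ${msg} ${perf}"; return 1
  fi
  echo "OK: ${msg} ${perf}"
  return 0
}

# Inside a plugin one could finish with, for example:
#   check_threshold "$usage" "$2" "$3" "Disk utilization is ${usage}%" "$output"
#   exit $?
```

Keeping the threshold logic in one function means the three scripts would only differ in how they collect and format their data.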

# vi libexec/check_netstat
#!/usr/bin/bash

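if (( $# != 5 )); then
  echo "Usage: check_netstat <interface> <link_speed_KB/s> <community> <warning%> <critical%>"
  # Nagios plugin convention: invalid arguments return UNKNOWN (3)
  exit 3
fi

speed=$2               # link speed in KB/s, the same unit as the rates below
counter32=4294967296   # 32-bit SNMP octet counters wrap at 2^32

# Read the interface octet counters twice, 60 seconds apart
output1=`snmpnetstat -v2c -on -c $3 localhost|grep -w $1|head -1`
sleep 60
output2=`snmpnetstat -v2c -on -c $3 localhost|grep -w $1|head -1`
if [ -z "$output1" ] || [ -z "$output2" ]; then
  echo "UNKNOWN: interface $1 not found in snmpnetstat output"
  exit 3
fi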

ibytes1=`echo $output1|awk '{print $4}'`
obytes1=`echo $output1|awk '{print $5}'`
ibytes2=`echo $output2|awk '{print $4}'`
obytes2=`echo $output2|awk '{print $5}'`

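# Byte deltas over the 60-second interval, divided by 60 * 1000 to get KB/s;
# a smaller second reading means the 32-bit counter wrapped, so add 2^32
if (($ibytes2 >= $ibytes1)); then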
 ibytes=$((($ibytes2 - $ibytes1)/60000))
else
 ibytes=$((($counter32 - $ibytes1 + $ibytes2)/60000))
fi
if (($obytes2 >= $obytes1)); then
 obytes=$((($obytes2 - $obytes1)/60000))
else
 obytes=$((($counter32 - $obytes1 + $obytes2)/60000))
fi

usage=$((100*($ibytes+$obytes)/$speed))

output="| input=${ibytes}KB/s; output=${obytes}KB/s; usage=${usage}%;"

if (( $usage >= $5 )); then
 echo "CRITICAL: Network link utilization is ${usage}% ${output}"
 exit 2
elif (( $usage >= $4 )); then
 echo "WARNING: Network link utilization is ${usage}% ${output}"
 exit 1
else
 echo "OK: Network link utilization is ${usage}% ${output}"
 exit 0
fi
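check_netstat turns two readings of 32-bit SNMP octet counters (such as ifInOctets) into a rate. If a counter wraps past 2^32 between readings, the second value is smaller than the first and the real difference is 2^32 - first + second. A sketch of that logic as a hypothetical function, counter32_delta. Like the script, it assumes at most one wrap per interval, which at 60-second sampling holds only while one direction carries less than about 572 Mbit/s (2^32 bytes in 60 seconds):

```shell
# Sketch: difference between two readings of a 32-bit counter, allowing
# for a single wrap past 2^32 between them.
counter32_delta() {
  local first="$1" second="$2" wrap=4294967296
  if (( second >= first )); then
    echo $(( second - first ))
  else
    echo $(( wrap - first + second ))
  fi
}

# Rate in KB/s over 60 seconds, computed the way check_netstat does
delta=$(counter32_delta 4294000000 1000000)
echo "$(( delta / 60000 )) KB/s"
```

Here the counter wrapped, the delta is 1967296 bytes and the example prints 32 KB/s.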


# vi bin/multi_check
#!/usr/bin/bash

HOSTNAME=`hostname`
SERVERNAME=nagios_master_server
SERVICE_NAME=check_multi
CHECK_COMMAND=/usr/local/nagios/libexec/check_multi
COMMAND_ARGUMENTS="-f /usr/local/nagios/etc/check_multi.cmd -r 13 -t 80 -T 480"
SEND_NSCA_COMMAND=/usr/local/nagios/bin/send_nsca
SEND_NSCA_CONFIG=/usr/local/nagios/etc/send_nsca.cfg
ECHO_COMMAND=/usr/bin/echo

OUTPUT=`${CHECK_COMMAND} ${COMMAND_ARGUMENTS}`
RESULT=$?
${ECHO_COMMAND} ${HOSTNAME}\\t${SERVICE_NAME}\\t${RESULT}\\t${OUTPUT}|${SEND_NSCA_COMMAND} -H ${SERVERNAME} -c ${SEND_NSCA_CONFIG}
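send_nsca reads passive results on standard input, one per line, with tab-separated fields: host name, service description, return code and plugin output. The script above relies on Solaris /usr/bin/echo expanding \t and on the unquoted $OUTPUT collapsing check_multi's multi-line output into a single line. A sketch of a more portable alternative (nsca_line is a hypothetical name) that uses printf and joins the lines explicitly:

```shell
# Hypothetical helper: format one passive service result for send_nsca's
# default tab-delimited input. Newlines in the plugin output are replaced
# by spaces so the whole result stays on one line.
nsca_line() {
  local host="$1" service="$2" code="$3" output="$4"
  output=$(printf '%s' "$output" | tr '\n' ' ')
  printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "$output"
}

# Usage in multi_check (not executed here):
#   nsca_line "$HOSTNAME" "$SERVICE_NAME" "$RESULT" "$OUTPUT" |
#     "$SEND_NSCA_COMMAND" -H "$SERVERNAME" -c "$SEND_NSCA_CONFIG"
```

Quoting every field also keeps characters such as * in the plugin output from being expanded by the shell.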


# crontab -l |grep check_multi
5,15,25,35,45,55 * * * * /usr/local/nagios/bin/multi_check >/dev/null 2>&1

PNP4Nagios graphs the collected data automatically, but the graphs can be customized: copy the default templates from /usr/local/nagios/share/pnp/templates.dist, modify them, and store them in /usr/local/nagios/share/pnp/templates. Here are three example templates used to display the vmstat, iostat and netstat statistics above:
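Each template only assembles arguments for rrdtool graph in the $opt and $def arrays: a DEF that reads a data source from the RRD file, a LINE1 that draws it, and GPRINT entries for the legend. As an illustration, a hypothetical bash function, ds_graph_args, that emits the same per-data-source arguments as the templates below, one per line:

```shell
# Sketch: print, one per line, the rrdtool graph arguments the templates
# build for a single data source (DEF, LINE1 and LAST/MAX/AVERAGE GPRINTs).
ds_graph_args() {
  local rrdfile="$1" ds="$2" name="$3" unit="$4" color="$5"
  printf '%s\n' \
    "DEF:var1=${rrdfile}:${ds}:AVERAGE" \
    "LINE1:var1${color}:${name} " \
    "GPRINT:var1:LAST:%3.0lf ${unit} " \
    "GPRINT:var1:MAX:%3.0lf ${unit} " \
    "GPRINT:var1:AVERAGE:%3.0lf ${unit} \\n"
}
```

In bash 4 or later the lines can be collected with mapfile -t args < <(ds_graph_args netstat.rrd 1 input KB/s '#00FF00') and passed as rrdtool graph out.png "${args[@]}"; the file and data source names are placeholders. The GPRINT:vname:CF:format form mirrors the templates; the rrdtool documentation describes it as the older syntax, superseded by VDEF.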

# vi vmstat.php
<?php
$line[1]='#00FF00';
$line[2]='#0000FF';
$line[3]='#FF0000';
$fn[1]="LAST";
$fn[3]="AVERAGE";
$dsname[1]="Free swap and memory";
$dsname[2]="User, system and idle cpu";

for ($i = 1; $i <= 2; $i += 1) {
        if ($i == 1) {
                $m = 1; 
                $n = 2;
                $fn[2]="MIN";
                $def[$i] = 'COMMENT:' . '"\t\tLast\t\tMin\t\tAverage \j" ';
        } else {
                $m = 3; 
                $n = 5;
                $fn[2]="MAX";
                $def[$i] = 'COMMENT:' . '"\t\tLast\t\tMax\t\tAverage \j" ';
        }
        $ds_name[$i] = $dsname[$i]; 
        $opt[$i] = '--vertical-label "' . $UNIT[$m] . '" --title "' . $hostname . ' / ' . $servicedesc . '"';
        $c = 1;
        for ($j = $m; $j <= $n; $j += 1) {
                $def[$i] .= "DEF:var$c=$rrdfile:$DS[$j]:AVERAGE ";
                $def[$i] .= "LINE1:var$c" . $line[$c] . ":\"$NAME[$j] \" ";
                for ($k = 1; $k <= 3; $k += 1) {
                        // LAST and MIN/MAX stay on one line; AVERAGE ends the legend line
                        if ($k != 3)
                                $def[$i] .= "GPRINT:var$c:$fn[$k]:\"%3.0lf $UNIT[$j] \" ";
                        else
                                $def[$i] .= "GPRINT:var$c:$fn[$k]:\"%3.0lf $UNIT[$j] \\n\" ";
                }
                $c += 1;
        }
        $def[$i] .= 'COMMENT:"' . $TEMPLATE[$i] . ' template\r" ';
        $def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$i] . '\r" ';
}
?>

# vi iostat.php
<?php
define("_LINE1", '#00FF00');
define("_LINE2", '#0000FF');
$dsname[1]="Reads and writes per sec";
$dsname[2]="KB read and written per sec";
$dsname[3]="Average number of transactions waiting and being serviced";
$dsname[4]="Average service time in queue and for active transactions";
$dsname[5]="Percentage of time transactions are waiting and disk busy";
$j=-1;
for ($i = 1; $i <= 5; $i += 1) {
        $j+=2;
        $k=$j+1;
        $ds_name[$i] = $dsname[$i];
        $opt[$i] = '--vertical-label "' . $UNIT[$j] . '" --title "' . $hostname . ' / ' . $servicedesc . '"';
        $def[$i] = "DEF:var1=$rrdfile:$DS[$j]:AVERAGE ";
        $def[$i] .= "DEF:var2=$rrdfile:$DS[$k]:AVERAGE ";
        $def[$i] .= 'COMMENT:' . '"\t\tLast\t\tMax\t\tAverage \j" ';
        $def[$i] .= "LINE1:var1" . _LINE1 . ":\"$NAME[$j] \" ";
        $def[$i] .= "GPRINT:var1:LAST:\"%3.0lf $UNIT[$j] \" ";
        $def[$i] .= "GPRINT:var1:MAX:\"%3.0lf $UNIT[$j] \" ";
        $def[$i] .= "GPRINT:var1:AVERAGE:\"%3.0lf $UNIT[$j] \\n\" ";
        $def[$i] .= "LINE1:var2" . _LINE2 . ":\"$NAME[$k] \" ";
        $def[$i] .= "GPRINT:var2:LAST:\"%3.0lf $UNIT[$k] \" ";
        $def[$i] .= "GPRINT:var2:MAX:\"%3.0lf $UNIT[$k] \" ";
        $def[$i] .= "GPRINT:var2:AVERAGE:\"%3.0lf $UNIT[$k] \\n\" ";
        $def[$i] .= 'COMMENT:"' . $TEMPLATE[$j] . ' template\r" ';
        $def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$j] . '\r" ';
}
?>

# vi netstat.php
<?php
define("_LINE1", '#00FF00');
define("_LINE2", '#0000FF');
define("_LINE3", '#FF0000');
$ds_name[1] = "Input and output in KB/s";
$opt[1] = '--vertical-label "KB/s" --title "' . $hostname . ' / ' . $servicedesc . '"';
$def[1] = "DEF:var1=$rrdfile:$DS[1]:AVERAGE ";
$def[1] .= "DEF:var2=$rrdfile:$DS[2]:AVERAGE ";
$def[1] .= 'COMMENT:' . '"\t\tLast\t\tMax\t\t Average \j" ';
$def[1] .= "LINE1:var1" . _LINE1 . ":\"$NAME[1] \" ";
$def[1] .= "GPRINT:var1:LAST:\"%3.0lf $UNIT[1] \" ";
$def[1] .= "GPRINT:var1:MAX:\"%3.0lf $UNIT[1] \" ";
$def[1] .= "GPRINT:var1:AVERAGE:\"%3.0lf $UNIT[1] \\n\" ";
$def[1] .= "LINE1:var2" . _LINE2 . ":\"$NAME[2] \" ";
$def[1] .= "GPRINT:var2:LAST:\"%3.0lf $UNIT[2] \" ";
$def[1] .= "GPRINT:var2:MAX:\"%3.0lf $UNIT[2] \" ";
$def[1] .= "GPRINT:var2:AVERAGE:\"%3.0lf $UNIT[2] \\n\" ";
$def[1] .= 'COMMENT:"' . $TEMPLATE[1] . ' template\r" ';
$def[1] .= 'COMMENT:"Check Command ' . $TEMPLATE[1] . '\r" ';

$ds_name[2] = "Network link utilization in %";
$opt[2] = '--vertical-label "%" --title "' . $hostname . ' / ' . $servicedesc . '"';
$def[2] = "DEF:var3=$rrdfile:$DS[3]:AVERAGE ";
$def[2] .= 'COMMENT:' . '"\t\tLast\tMax\tAverage \j" ';
$def[2] .= "LINE1:var3" . _LINE3 . ":\"$NAME[3] \" ";
$def[2] .= "GPRINT:var3:LAST:\"%3.0lf $UNIT[3] \" ";
$def[2] .= "GPRINT:var3:MAX:\"%3.0lf $UNIT[3] \" ";
$def[2] .= "GPRINT:var3:AVERAGE:\"%3.0lf $UNIT[3] \\n\" ";
$def[2] .= 'COMMENT:"' . $TEMPLATE[3] . ' template\r" ';
$def[2] .= 'COMMENT:"Check Command ' . $TEMPLATE[3] . '\r" ';
?>