Historically, many people have used MRTG in combination with RRDtool to collect and graph performance data. After a move to Nagios for resource monitoring, it makes sense to integrate performance monitoring with Nagios as well. Many Nagios plugins already supply performance data, so it is only a matter of storing this data in an RRD database and graphing it with rrdtool. PNP4Nagios does just that!
The main advantage of PNP4Nagios is its ease of installation and use. If a Nagios plugin supplies performance data according to the Nagios plugin development guidelines, PNP4Nagios will collect and display the data without any additional configuration!
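As a rough illustration of what following the guidelines means in practice: a plugin prints a status line, a "|" separator, and then performance data items of the form label=value[UOM];warn;crit, and it reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL). The sketch below is not taken from any shipped plugin; the function names perf_item and check_percent are hypothetical, and the optional min and max fields are left out.

```shell
#!/bin/sh
# Hypothetical helper: format one performance-data item as
# label=value[UOM];warn;crit (the optional min and max fields are omitted).
perf_item() {
    # $1=label  $2=value  $3=unit of measure  $4=warning  $5=critical
    printf '%s=%s%s;%s;%s' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical check: compare an integer percentage against two thresholds,
# print the status line followed by the performance data, and return the
# matching Nagios state (0 OK, 1 WARNING, 2 CRITICAL).
check_percent() {
    # $1=label  $2=value  $3=warning threshold  $4=critical threshold
    perf=$(perf_item "$1" "$2" "%" "$3" "$4")
    if [ "$2" -ge "$4" ]; then
        echo "CRITICAL: $1 is $2% | $perf"
        return 2
    elif [ "$2" -ge "$3" ]; then
        echo "WARNING: $1 is $2% | $perf"
        return 1
    fi
    echo "OK: $1 is $2% | $perf"
    return 0
}
```

Calling check_percent cpu_usage 42 60 80 prints "OK: cpu_usage is 42% | cpu_usage=42%;60;80" and returns 0. Everything after the "|" is what ends up in the RRD file.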
It is also not difficult to write custom Nagios plugins that supply performance data. Below is an example of vmstat, iostat and netstat data collected with the help of the check_multi plugin and passively sent to the Nagios master server with send_nsca.
To use the examples below with the check_multi plugin, Nagios and NSCA need to be recompiled to allow a larger plugin output buffer. For example, the default MAX_PLUGIN_OUTPUT_LENGTH definition in include/nagios.h is set to 4096, which is not enough to hold the collected performance data. NSCA defines its own MAX_PLUGINOUTPUT_LENGTH in include/common.h.
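One way to change these definitions before rebuilding is sketched here. The helper name set_define is hypothetical and the value 16384 is only an illustration, not a recommendation from either project. The NSCA daemon and the send_nsca client presumably have to be rebuilt with the same value, since both are compiled against the same header.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the value of a "#define NAME value" line in a
# C header. It writes through a temporary file, so it does not depend on the
# non-portable "sed -i" option.
set_define() {
    # $1=header file  $2=macro name  $3=new value
    tmp="$1.tmp.$$"
    sed "s/^\(#define[[:space:]][[:space:]]*$2[[:space:]][[:space:]]*\)[^[:space:]]*/\1$3/" "$1" > "$tmp" &&
        mv "$tmp" "$1"
}

# Example, run inside the unpacked source trees and followed by a rebuild
# (16384 is an assumed value, chosen only for illustration):
#   set_define include/nagios.h MAX_PLUGIN_OUTPUT_LENGTH 16384
#   set_define include/common.h MAX_PLUGINOUTPUT_LENGTH 16384
```

Only the value token after the macro name is replaced, so a trailing comment on the same line is left alone.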
# pwd
/usr/local/nagios

# vi etc/check_multi.cmd
command [system::vmstat]= check_vmstat 10 5 60 80
command [d0::iostat]= check_iostat c0t0d0 80 90
command [d1::iostat]= check_iostat c0t0d1 80 90
command [d4::iostat]= check_iostat c1t1d0 80 90
command [d5::iostat]= check_iostat c0t0d3 80 90
command [d6::iostat]= check_iostat c1t1d2 80 90
command [ce1::netstat]= check_netstat ce1 1250 public 60 80

# vi libexec/check_vmstat
#!/usr/bin/bash
if (( $# != 4 )); then
    echo "Usage: check_vmstat <mem_warn%> <mem_crit%> <cpu_warn> <cpu_crit>"
    exit 2
fi
usage=0
# take the second vmstat sample; the first one reports averages since boot
output=`vmstat 1 2|tail -1|awk '{print "| free_swap="$4"KB; free_mem="$5"KB; usr_cpu="$20"%; sys_cpu="$21"%; idle_cpu="$22"%;"}'`
# used and available swap in KB, as reported by swap -s
swap=`swap -s|awk '{print $9" "$11}'|sed s/"k"/""/g`
used=`echo $swap|cut -f1 -d" "`
avail=`echo $swap|cut -f2 -d" "`
total=$(($used + $avail))
# available swap as a percentage of the total
avail=$((100*$avail/$total))
free_swap=`echo $output | cut -d";" -f1 | cut -d"=" -f2 | cut -d"K" -f1`
free_mem=`echo $output | cut -d";" -f2 | cut -d"=" -f2 | cut -d"K" -f1`
idle_cpu=`echo $output | cut -d";" -f5 | cut -d"=" -f2 | cut -d"%" -f1`
cpu_usage=$((100 - $idle_cpu))
if (( $avail <= $2 )); then
    echo "CRITICAL: Free memory is ${avail}% ${output}"
    exit 2
elif (( $cpu_usage >= $4 )); then
    echo "CRITICAL: CPU usage is ${cpu_usage}% ${output}"
    exit 2
elif (( $avail <= $1 )); then
    echo "WARNING: Free memory is ${avail}% ${output}"
    exit 1
elif (( $cpu_usage >= $3 )); then
    echo "WARNING: CPU usage is ${cpu_usage}% ${output}"
    exit 1
else
    echo "OK: Free memory is ${avail}%, cpu utilization is ${cpu_usage}% ${output}"
    exit 0
fi

# vi libexec/check_iostat
#!/usr/bin/bash
if (( $# != 3 )); then
    echo "Usage: check_iostat <disk> <warning> <critical>"
    exit 2
fi
usage=0
# take the second sample, which covers the last 60 seconds
output=`iostat -sxn $1 60 2|tail -1|awk '{print "| r/s="$1"; w/s="$2"; kr/s="$3"KB/s; kw/s="$4"KB/s; wait="$5"; actv="$6"; wsvc_t="$7"ms; asvc_t="$8"ms; w%="$9"%; b%="$10"%;"}' `
# the tenth field is the percentage of time the disk is busy
usage=`echo $output | cut -d";" -f10 | cut -d"=" -f2 | cut -d"%" -f1`
if (( $usage >= $3 )); then
    echo "CRITICAL: Disk utilization is ${usage}% ${output}"
    exit 2
elif (( $usage >= $2 )); then
    echo "WARNING: Disk utilization is ${usage}% ${output}"
    exit 1
else
    echo "OK: Disk utilization is ${usage}% ${output}"
    exit 0
fi

# vi libexec/check_netstat
#!/usr/bin/bash
if (( $# != 5 )); then
    echo "Usage: check_netstat <interface> <link_speed> <community> <warning> <critical>"
    exit 2
fi
usage=0
speed=$2
counter32=4294967296
# sample the interface byte counters twice, 60 seconds apart
output1=`snmpnetstat -v2c -on -c $3 localhost|grep $1|head -1`
sleep 60
output2=`snmpnetstat -v2c -on -c $3 localhost|grep $1|head -1`
ibytes1=`echo $output1|awk '{print $4}'`
obytes1=`echo $output1|awk '{print $5}'`
ibytes2=`echo $output2|awk '{print $4}'`
obytes2=`echo $output2|awk '{print $5}'`
# bytes per 60 seconds divided by 60000 gives KB/s;
# the else branches allow for one wrap of the 32-bit counter
if (($ibytes2 >= $ibytes1)); then
    ibytes=$((($ibytes2 - $ibytes1)/60000))
else
    ibytes=$((($counter32 - $ibytes1 + $ibytes2)/60000))
fi
if (($obytes2 >= $obytes1)); then
    obytes=$((($obytes2 - $obytes1)/60000))
else
    obytes=$((($counter32 - $obytes1 + $obytes2)/60000))
fi
usage=$((100*($ibytes+$obytes)/$speed))
output=`echo "| input="$ibytes"KB/s; output="$obytes"KB/s; usage="$usage"%;"`
if (( $usage >= $5 )); then
    echo "CRITICAL: Network link utilization is ${usage}% ${output}"
    exit 2
elif (( $usage >= $4 )); then
    echo "WARNING: Network link utilization is ${usage}% ${output}"
    exit 1
else
    echo "OK: Network link utilization is ${usage}% ${output}"
    exit 0
fi

# vi bin/multi_check
#!/usr/bin/bash
HOSTNAME=`hostname`
SERVERNAME=nagios_master_server
SERVICE_NAME=check_multi
CHECK_COMMAND=/usr/local/nagios/libexec/check_multi
COMMAND_ARGUMENTS="-f /usr/local/nagios/etc/check_multi.cmd -r 13 -t 80 -T 480"
SEND_NSCA_COMMAND=/usr/local/nagios/bin/send_nsca
SEND_NSCA_CONFIG=/usr/local/nagios/etc/send_nsca.cfg
ECHO_COMMAND=/usr/bin/echo
OUTPUT=`${CHECK_COMMAND} ${COMMAND_ARGUMENTS}`
RESULT=`echo $?`
# send_nsca reads host, service, return code and output separated by tabs
${ECHO_COMMAND} ${HOSTNAME}\\t${SERVICE_NAME}\\t${RESULT}\\t${OUTPUT}|${SEND_NSCA_COMMAND} -H ${SERVERNAME} -c ${SEND_NSCA_CONFIG}

# crontab -l |grep check_multi
5,15,25,35,45,55 * * * * /usr/local/nagios/bin/multi_check >/dev/null 2>&1
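The multi_check wrapper relies on /usr/bin/echo expanding \t into a tab, which not every echo does. A more portable way to build the line that send_nsca reads on standard input (host, service description, return code and plugin output separated by tabs, one result per line) might look like the sketch below. The function name nsca_line is hypothetical. Newlines in the plugin output are replaced with spaces, which is roughly what the unquoted ${OUTPUT} expansion does in the wrapper.

```shell
#!/bin/sh
# Hypothetical helper: build one passive service check result in the layout
# send_nsca reads on standard input:
#   host<TAB>service<TAB>return code<TAB>plugin output<NEWLINE>
nsca_line() {
    # $1=host name  $2=service description  $3=return code  $4=plugin output
    # Flatten the output to one line, because a newline ends the record.
    flat=$(printf '%s' "$4" | tr '\n' ' ')
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$flat"
}

# Possible usage with the paths from the wrapper script:
#   output=$(/usr/local/nagios/libexec/check_multi -f /usr/local/nagios/etc/check_multi.cmd)
#   result=$?
#   nsca_line "$(hostname)" check_multi "$result" "$output" |
#       /usr/local/nagios/bin/send_nsca -H nagios_master_server \
#           -c /usr/local/nagios/etc/send_nsca.cfg
```

Using printf with a fixed format string also keeps any backslashes in the plugin output from being interpreted.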
The collected data is graphed automatically by PNP4Nagios, but the graphs can be customized by modifying the default templates in /usr/local/nagios/share/pnp/templates.dist and storing the modified copies in /usr/local/nagios/share/pnp/templates. Here are three examples of templates used to display the above vmstat, iostat and netstat statistics:
# vi vmstat.php
<?php
$line[1]='#00FF00';
$line[2]='#0000FF';
$line[3]='#FF0000';
$fn[1]="LAST";
$fn[3]="AVERAGE";
$dsname[1]="Free swap and memory";
$dsname[2]="User, system and idle cpu";
// graph 1 plots data sources 1-2 (swap, memory), graph 2 plots 3-5 (cpu)
for ($i = 1; $i <= 2; $i += 1) {
    if ($i == 1) {
        $m = 1; $n = 2;
        $fn[2]="MIN";
        $def[$i] = 'COMMENT:' . '"\t\tLast\t\tMin\t\tAverage \j" ';
    } else {
        $m = 3; $n = 5;
        $fn[2]="MAX";
        $def[$i] = 'COMMENT:' . '"\t\tLast\t\tMax\t\tAverage \j" ';
    }
    $ds_name[$i] = $dsname[$i];
    // label the axis with the unit of the first data source in this graph
    $opt[$i] = '--vertical-label "' . $UNIT[$m] . '" --title "' . $hostname . ' / ' . $servicedesc . '"';
    $c = 1;
    for ($j = $m; $j <= $n; $j += 1) {
        $def[$i] .= "DEF:var$c=$rrdfile:$DS[$j]:AVERAGE ";
        $def[$i] .= "LINE1:var$c" . $line[$c] . ":\"$NAME[$j] \" ";
        for ($k = 1; $k <=3; $k += 1) {
            if ($k != 3)
                $def[$i] .= "GPRINT:var$c:$fn[$k]:\"%3.0lf $UNIT[$j] \" ";
            else
                $def[$i] .= "GPRINT:var$c:$fn[$k]:\"%3.0lf $UNIT[$j] \\n\" ";
        }
        $c += 1;
    }
    $def[$i] .= 'COMMENT:' . $TEMPLATE[$i] . '" template\r" ';
    $def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$i] . '\r" ';
}
?>

# vi iostat.php
<?php
define("_LINE1", '#00FF00');
define("_LINE2", '#0000FF');
$dsname[1]="Reads and writes per sec";
$dsname[2]="KB read and written per sec";
$dsname[3]="Average number of transaction being waited and serviced";
$dsname[4]="Average service time in queue and for active transactions";
$dsname[5]="Percentage of time transactions are waiting and disk busy";
$j=-1;
// each of the five graphs plots one pair of data sources: (1,2), (3,4), ...
for ($i = 1; $i <= 5; $i += 1) {
    $j+=2;
    $k=$j+1;
    $ds_name[$i] = $dsname[$i];
    $opt[$i] = '--vertical-label "' . $UNIT[$j] . '" --title "' . $hostname . ' / ' . $servicedesc . '"';
    $def[$i] = "DEF:var1=$rrdfile:$DS[$j]:AVERAGE ";
    $def[$i] .= "DEF:var2=$rrdfile:$DS[$k]:AVERAGE ";
    $def[$i] .= 'COMMENT:' . '"\t\tLast\t\tMax\t\tAverage \j" ';
    $def[$i] .= "LINE1:var1" . _LINE1 . ":\"$NAME[$j] \" ";
    $def[$i] .= "GPRINT:var1:LAST:\"%3.0lf $UNIT[$j] \" ";
    $def[$i] .= "GPRINT:var1:MAX:\"%3.0lf $UNIT[$j] \" ";
    $def[$i] .= "GPRINT:var1:AVERAGE:\"%3.0lf $UNIT[$j] \\n\" ";
    $def[$i] .= "LINE1:var2" . _LINE2 . ":\"$NAME[$k] \" ";
    $def[$i] .= "GPRINT:var2:LAST:\"%3.0lf $UNIT[$k] \" ";
    $def[$i] .= "GPRINT:var2:MAX:\"%3.0lf $UNIT[$k] \" ";
    $def[$i] .= "GPRINT:var2:AVERAGE:\"%3.0lf $UNIT[$k] \\n\" ";
    $def[$i] .= 'COMMENT:' . $TEMPLATE[$j] . '" template\r" ';
    $def[$i] .= 'COMMENT:"Check Command ' . $TEMPLATE[$j] . '\r" ';
}
?>

# vi netstat.php
<?php
define("_LINE1", '#00FF00');
define("_LINE2", '#0000FF');
define("_LINE3", '#FF0000');

// graph 1: input and output throughput
$ds_name[1] = "Input and output in KB/s";
$opt[1] = '--vertical-label "' . '"KB/s"' . '" --title "' . $hostname . ' / ' . $servicedesc . '"';
$def[1] = "DEF:var1=$rrdfile:$DS[1]:AVERAGE ";
$def[1] .= "DEF:var2=$rrdfile:$DS[2]:AVERAGE ";
$def[1] .= 'COMMENT:' . '"\t\tLast\t\tMax\t\t Average \j" ';
$def[1] .= "LINE1:var1" . _LINE1 . ":\"$NAME[1] \" ";
$def[1] .= "GPRINT:var1:LAST:\"%3.0lf $UNIT[1] \" ";
$def[1] .= "GPRINT:var1:MAX:\"%3.0lf $UNIT[1] \" ";
$def[1] .= "GPRINT:var1:AVERAGE:\"%3.0lf $UNIT[1] \\n\" ";
$def[1] .= "LINE1:var2" . _LINE2 . ":\"$NAME[2] \" ";
$def[1] .= "GPRINT:var2:LAST:\"%3.0lf $UNIT[2] \" ";
$def[1] .= "GPRINT:var2:MAX:\"%3.0lf $UNIT[2] \" ";
$def[1] .= "GPRINT:var2:AVERAGE:\"%3.0lf $UNIT[2] \\n\" ";
$def[1] .= 'COMMENT:' . $TEMPLATE[1] . '" template\r" ';
$def[1] .= 'COMMENT:"Check Command ' . $TEMPLATE[1] . '\r" ';

// graph 2: link utilization
$ds_name[2] = "Network link utilization in %";
$opt[2] = '--vertical-label "' . '"%"' . '" --title "' . $hostname . ' / ' . $servicedesc . '"';
$def[2] = "DEF:var3=$rrdfile:$DS[3]:AVERAGE ";
$def[2] .= 'COMMENT:' . '"\t\tLast\tMax\tAverage \j" ';
$def[2] .= "LINE1:var3" . _LINE3 . ":\"$NAME[3] \" ";
$def[2] .= "GPRINT:var3:LAST:\"%3.0lf $UNIT[3] \" ";
$def[2] .= "GPRINT:var3:MAX:\"%3.0lf $UNIT[3] \" ";
$def[2] .= "GPRINT:var3:AVERAGE:\"%3.0lf $UNIT[3] \\n\" ";
$def[2] .= 'COMMENT:' . $TEMPLATE[3] . '" template\r" ';
$def[2] .= 'COMMENT:"Check Command ' . $TEMPLATE[3] . '\r" ';
?>
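The customization workflow described above (take a template from templates.dist, store the modified copy in templates) can be started with a small helper such as the following sketch. The function name start_custom_template is hypothetical, and the sketch assumes that a default.php template exists in templates.dist to copy from. It never overwrites a template that has already been customized.

```shell
#!/bin/sh
# Hypothetical helper: seed a custom PNP template from a distributed one.
start_custom_template() {
    # $1=PNP share directory  $2=template to copy from  $3=new template name
    src="$1/templates.dist/$2.php"
    dst="$1/templates/$3.php"
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        # keep earlier edits instead of overwriting them
        echo "$dst already exists, leaving it alone" >&2
        return 0
    fi
    mkdir -p "$1/templates" && cp "$src" "$dst"
}

# Possible usage with the paths from the text (default.php is an assumption):
#   start_custom_template /usr/local/nagios/share/pnp default vmstat
#   vi /usr/local/nagios/share/pnp/templates/vmstat.php
```

Keeping the edited copy in templates leaves the distributed files in templates.dist untouched, so they can still serve as a reference.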