We had a vCenter Server Appliance that stopped accepting changes. A quick investigation showed that the log partition was full. After cleaning up, extending the partition and rebooting, vCenter worked fine again, but to prevent this from happening in the future we wanted Nagios to monitor the disk usage as well. We already monitored the availability of the web interface, SSH and ping.
Since VMware doesn't support third-party software on its appliances (ESXi, VCSA, vCenter Support Assistant, ...), I wasn't keen on having to reinstall the NRPE client after every update of a VMware appliance. So I came up with this solution:
First of all, if you haven't done so yet, switch your appliance to key-based authentication so that you don't have to store your password in a Nagios config file.
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2100508
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002866
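As a rough sketch, the key setup on the Nagios server could look like the block below. It assumes the appliance's root shell has already been switched to bash as described in the KB articles above, and that you run it as the user Nagios runs as. The key path, the RSA key type, the host name and the function name `install_nagios_key` are my own example choices, not something the KB articles prescribe.

```shell
#!/bin/bash
# Sketch only: run as the user Nagios runs as (often "nagios").
# KEY_FILE and APPLIANCE are example values, adjust them to your setup.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
APPLIANCE="${APPLIANCE:-root@vcenter.koendiels.be}"

install_nagios_key() {
    # Create a passphrase-less key pair only if none exists yet.
    [ -f "$KEY_FILE" ] || ssh-keygen -q -t rsa -b 4096 -N '' -f "$KEY_FILE"
    # Append the public key to ~/.ssh/authorized_keys on the appliance
    # (this asks for the password one last time).
    ssh-copy-id -i "$KEY_FILE.pub" "$APPLIANCE"
    # BatchMode makes ssh fail instead of prompting, like a Nagios check would.
    ssh -i "$KEY_FILE" -o BatchMode=yes "$APPLIANCE" true && echo "key login works"
}

# Call the function by hand once you are happy with the values above:
# install_nagios_key
```

Nothing is executed until you call `install_nagios_key` yourself, so you can review the values first.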
I used this guy's script, but with a wrapper so that the script doesn't have to be installed on the destination host:
https://supporthandbook.wordpress.com/2011/10/03/monitoring-hosts-on-nagios-without-nrpe/
And took some ideas from this thread
The script
/usr/lib64/nagios/plugins/bash_scripts/check_disk_without_nrpe.sh
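#!/bin/bash
## Checks the used disk space of a mount point for Nagios.
## Usage: check_disk_without_nrpe.sh mountpoint critical_used% warning_used%
# Column 5 of the POSIX df output is the used percentage, e.g. "42%".
size=$(df -Ph "$1" | tail -1 | awk '{print $5}')
size=${size%\%}
# Exit 3 (UNKNOWN) if df failed or returned something that is not a number.
case $size in
    ''|*[!0-9]*)
        echo "Unknown $1 could not read disk usage"
        exit 3
        ;;
esac
if [ "$size" -gt "$2" ]
then
    echo "Critical $1 size exceeded $2 % current size $size %"
    exit 2
fi
if [ "$size" -gt "$3" ]
then
    echo "Warning $1 size exceeded $3 % current size $size %"
    exit 1
fi
echo "OK $1 current size $size %"
exit 0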
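To see the threshold logic in isolation, here is a small sketch that takes the usage percentage as an argument instead of reading it from `df`. The function name `classify_usage` is made up for this example; the exit codes 0, 1 and 2 are the ones Nagios interprets as OK, WARNING and CRITICAL.

```shell
#!/bin/bash
# classify_usage USED CRITICAL WARNING
# Same comparisons as the script above, but with the percentage passed in.
classify_usage() {
    local used=$1 critical=$2 warning=$3
    if [ "$used" -gt "$critical" ]; then
        echo "Critical"; return 2
    fi
    if [ "$used" -gt "$warning" ]; then
        echo "Warning"; return 1
    fi
    echo "OK"; return 0
}

# Thresholds 90 (critical) and 75 (warning):
for used in 50 80 95; do
    rc=0
    state=$(classify_usage "$used" 90 75) || rc=$?
    echo "$used% -> $state (exit code $rc)"
done
```

Because the comparison is `-gt`, a value exactly equal to a threshold does not trigger that state yet: 90 % with a critical threshold of 90 is still only a warning.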
The wrapper to execute the script on the remote host
/usr/lib64/nagios/plugins/bash_scripts/check_disk_usage_over_ssh.sh
#!/bin/bash
ssh -oStrictHostKeyChecking=no "$2@$1" "bash -s" -- "$3" "$4" "$5" < /usr/lib64/nagios/plugins/bash_scripts/check_disk_without_nrpe.sh 2>/dev/null
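If the `bash -s` part looks like magic: it tells bash to read the script from standard input and to treat everything after `--` as positional parameters. The sketch below shows the same mechanism locally, without ssh. The here-document stands in for the redirected script file, and the echoed text is only there for illustration.

```shell
#!/bin/bash
# Local stand-in for the ssh call: the script arrives on stdin,
# the arguments after "--" become $1, $2 and $3 inside it.
demo_output=$(bash -s -- / 90 75 <<'EOF'
echo "mountpoint=$1 critical=$2 warning=$3"
EOF
)
echo "$demo_output"
```

Before wiring the wrapper into Nagios it is worth running it by hand as the nagios user, for example `check_disk_usage_over_ssh.sh vcenter.koendiels.be root / 90 75` followed by `echo $?`. Keep in mind that the wrapper sends stderr to /dev/null, so a failed SSH login shows up as no output and exit code 255.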
Add this to /etc/nagios/objects/commands.cfg:
# Check mount point disk usage over SSH (you will have to add the nagios user's key to the user you are connecting as)
define command{
    command_name check_disk_no_nrpe
    command_line /usr/lib64/nagios/plugins/bash_scripts/check_disk_usage_over_ssh.sh $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}
Then add the check to the config file of the host you want to monitor, here /etc/nagios/conf.d/hosts/vcenter.cfg. The arguments after the command name are the SSH user, the mount point, the critical threshold and the warning threshold:
define service{
    use                 generic-service ; Inherit default values from a template
    host_name           vcenter.koendiels.be
    service_description Check / usage
    check_command       check_disk_no_nrpe!root!/!90!75
    contact_groups      it-team
}
Repeat this for every mount point you want to check.
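If you have many mount points, you could generate the service definitions instead of copying them by hand. A minimal sketch, assuming the same thresholds for every mount point: the function name `print_service` and the `/storage/log` mount point are examples only, so check `df` on your own appliance for the real list.

```shell
#!/bin/bash
# print_service HOST MOUNTPOINT: emit one Nagios service definition.
print_service() {
    cat <<EOF
define service{
    use                 generic-service
    host_name           $1
    service_description Check $2 usage
    check_command       check_disk_no_nrpe!root!$2!90!75
    contact_groups      it-team
}
EOF
}

# Example mount points, replace them with the ones on your appliance.
for mountpoint in / /storage/log; do
    print_service vcenter.koendiels.be "$mountpoint"
done
```

Redirect the output to a file, review it, and then paste it into the host's config file.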