A blade is a specialized piece of server hardware used in the server infrastructures of medium-sized and large companies. HP server blades are delivered as thin, modular servers with memory and one to four processors. They are generally intended for a single, dedicated application (such as web, database, email, or line-of-business applications) and can easily be inserted into a space-saving rack that houses similar servers.
Just like the monitoring of services, an overview of the hardware health status is important. Dedicated checks make it possible to detect problems and alert the administrator.
The SNMP protocol can be used for this purpose: it allows the required information to be queried from the built-in management console of the HP BladeSystem. The check introduced here is implemented as a dedicated Perl script that queries the given OID(s) and returns one of the states OK, Warning, Critical, or Unknown.
Fan conditions
Continuous cooling of the hardware components is provided by a number of active cooling units. The BladeSystem management unit exposes the state of each cooling unit through its own OID. Since this would otherwise require one check per unit, the script loops through all available results and returns a single overall status.
In this specific case, the OID for the fan category is .1.3.6.1.4.1.232.22.2.3.1.3.1.11, where fan 1 answers under .1.3.6.1.4.1.232.22.2.3.1.3.1.11.1, and so on.
$oid = ".1.3.6.1.4.1.232.22.2.3.1.3.1.11.";
# query fan indexes 1 to 15; indexes without a numeric answer are skipped
for (my $i = 1; $i <= 15; $i++) {
    $data_fan = SNMP_getvalue($snmp_session, $oid.$i);
    if (int($data_fan)) {
        # collect the individual return codes for the output text
        $data_returnText_ .= $data_fan."; ";
        # keep the highest (worst) return code seen so far, e.g. 3
        if ($data_fan > $overall_status) {
            $overall_status = $data_fan;
        }
    }
}
The variable $data_fan receives the current status of a single fan unit. The integer 2 stands for OK, 3 for Warning, and 4 for Critical. Each value is then compared with the overall check result collected so far, to determine whether that result is still 2 or whether the current iteration has retrieved a higher (worse) value. In this way the overall result takes on the worst check result of all checked fans.
To make the result easier to understand, $data_returnText_ keeps the status codes of all the individual check results.
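The aggregation described above can also be sketched as a small standalone function. This is only an illustration under the assumptions stated in the text (2 = OK, 3 = Warning, 4 = Critical); the name worst_status is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the worst (highest) condition code from a
# list of SNMP results. Entries that are not plain integers are skipped.
# Assumed codes from the article: 2 = OK, 3 = Warning, 4 = Critical.
sub worst_status {
    my @codes = @_;
    my $worst;
    for my $code (@codes) {
        next unless defined $code && $code =~ /^\d+$/;
        $worst = $code if !defined $worst || $code > $worst;
    }
    return $worst;    # undef when no usable value was seen
}

my @fan_codes = (2, 2, 4, 3, 2);
print "Overall fan status: ", worst_status(@fan_codes), "\n";
```

Because only the maximum is kept, the order in which the fans report their state does not influence the overall result.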
An example implementation of the check in the file check_snmp_HP_BladeSystem.pl allows a single check to be defined for Nagios as follows:
Call of check: /usr/lib/nagios/plugins/check_snmp_HP_Bladesystem.pl -H <hostname> -C public -w 3 -c 4 -t .1.3.6.1.4.1.232.22.2.3.1.3.1.11
Result: OK Fan-Conditions (2) Fan return codes: 2; 2; 2; 2; 2; 2; 2; 2; 2; 2;
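How the script turns the overall status into a Nagios state is not shown in this article. The following is a minimal sketch, assuming that the -w and -c values act as lower bounds for Warning and Critical and that the plugin follows the usual Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The helpers nagios_state and status_line are hypothetical names.

```perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
my %EXIT_CODE = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical helper: map an overall condition code to a Nagios state,
# treating the -w / -c values as thresholds (an assumption).
sub nagios_state {
    my ($status, $warn, $crit) = @_;
    return 'UNKNOWN' unless defined $status && $status =~ /^\d+$/;
    return 'CRITICAL' if $status >= $crit;
    return 'WARNING'  if $status >= $warn;
    return 'OK'       if $status == 2;
    return 'UNKNOWN';    # any other code, e.g. 1
}

# Hypothetical helper: build a one-line plugin output in the style of the
# result shown above, and return it together with the exit code.
sub status_line {
    my ($label, $status, $warn, $crit, @codes) = @_;
    my $state = nagios_state($status, $warn, $crit);
    my $list  = join '', map { "$_; " } @codes;
    return ("$state $label ($status) Fan return codes: $list", $EXIT_CODE{$state});
}

my ($line, $exit) = status_line('Fan-Conditions', 2, 3, 4, (2) x 3);
print "$line\n";
# A real plugin would finish with: exit $exit;
```

Returning the exit code alongside the text keeps the decision in one place, so the output line and the state reported to Nagios cannot drift apart.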
Power supply
This check can be implemented following the same principle as the fan status check and determines the status of all individual power supplies within a single check.
Call of check: /usr/lib/nagios/plugins/check_snmp_HP_Bladesystem.pl -H <hostname> -C public -w 3 -c 4 -t .1.3.6.1.4.1.232.22.2.5.1.1.1.17.
Result: OK Power-Supply (2) Power Supply return codes: 2; 2; 2; 2; 2; 2;
In this case, six power supply modules were detected and registered.
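Since the fan and power supply checks share the same loop, the collection step can be written once and reused. The sketch below is an illustration only: collect_codes and the $fetch code reference are hypothetical, and the fake data stands in for a real enclosure. It assumes that missing indexes answer with a non-numeric value such as 'noSuchInstance'.

```perl
use strict;
use warnings;

# Hypothetical generic collector: $fetch is a code reference that takes an
# OID and returns its value (in the real script this role is played by
# SNMP_getvalue together with the session).
sub collect_codes {
    my ($fetch, $base_oid, $max_index) = @_;
    my @codes;
    for my $i (1 .. $max_index) {
        my $value = $fetch->("$base_oid.$i");
        # keep only positive integer answers, skip everything else
        push @codes, $value if defined $value && $value =~ /^[1-9]\d*$/;
    }
    return @codes;
}

# Offline stand-in for an enclosure with six power supplies, all OK (2).
my $base  = '.1.3.6.1.4.1.232.22.2.5.1.1.1.17';
my %fake  = map { ("$base.$_" => 2) } 1 .. 6;
my $fetch = sub {
    my ($oid) = @_;
    return exists $fake{$oid} ? $fake{$oid} : 'noSuchInstance';
};

my @codes = collect_codes($fetch, $base, 15);
print scalar(@codes), " power supplies found: @codes\n";
```

Passing the lookup as a code reference makes it possible to try the loop without any hardware and to swap in the real SNMP query later.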
System state
A more general check is the one for the overall system state reported by the management console of the HP BladeSystem. It can be implemented with a single OID and a single result evaluation:
$oid = ".1.3.6.1.4.1.232.22.2.3.1.1.1.16.1";
$data = SNMP_getvalue($snmp_session, $oid);
# translate the numeric condition code into a readable text
$data_text = 'Unknown';
$data_text = 'Normal system state.' if $data eq 2;
$data_text = 'System degraded' if $data eq 3;
$data_text = 'Undefined System Error' if $data eq 1;
$data_text = 'Critical System failure' if $data eq 4;
Possible OK result: OK System-State (2) Normal system state.
Of course, other status values are available for monitoring; the corresponding OIDs can be found in the documentation supplied with the hardware.
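The chain of conditional assignments used for the system state can alternatively be expressed as a lookup table, which is easier to extend when further status values are added. This is a sketch using the codes and texts from the snippet above; the names %STATE_TEXT and state_text are hypothetical.

```perl
use strict;
use warnings;

# Condition codes and texts as used in the system state check.
my %STATE_TEXT = (
    1 => 'Undefined System Error',
    2 => 'Normal system state.',
    3 => 'System degraded',
    4 => 'Critical System failure',
);

# Hypothetical helper: translate a condition code into its text,
# falling back to 'Unknown' for missing or unexpected values.
sub state_text {
    my ($code) = @_;
    return 'Unknown' unless defined $code && exists $STATE_TEXT{$code};
    return $STATE_TEXT{$code};
}

print state_text(3), "\n";
```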
Additional information on the script
The variable $snmp_session holds the SNMP session information. Its usage requires the following use statement:
use Net::SNMP;
Definition:
($snmp_session, $snmp_error) = Net::SNMP->session(
    -version   => 'snmpv2c',
    -hostname  => $opt_host,
    -community => $opt_community,
    -port      => $opt_port,
);
The function SNMP_getvalue($snmp_session,$oid) queries the given OID and, if successful, returns the hardware status value as an integer.
sub SNMP_getvalue {
    my ($snmp_session, $oid) = @_;
    my $res = $snmp_session->get_request(
        -varbindlist => [$oid]);
    if (!defined($res)) {
        # the request failed: report the error and exit with the Nagios UNKNOWN code (3)
        print "ERROR: ".$snmp_session->error()."\n";
        exit 3;
    }
    return($res->{$oid});
}
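To try out the query logic without access to a BladeSystem, the session can be replaced by a small stand-in object. The sketch below is only an illustration: FakeSession is a hypothetical class that mimics just the two methods used above (get_request and error), and snmp_getvalue_or_undef is a hypothetical variant of SNMP_getvalue that returns undef instead of exiting, so that the caller can decide how to react.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Net::SNMP session, for offline testing only.
package FakeSession;

sub new {
    my ($class, %values) = @_;
    return bless { values => {%values}, error => '' }, $class;
}

# Mimics the call shape used above: get_request(-varbindlist => [$oid]).
sub get_request {
    my ($self, %args) = @_;
    my ($oid) = @{ $args{-varbindlist} };
    unless (exists $self->{values}{$oid}) {
        $self->{error} = "No response for $oid";
        return;    # undef signals a failed request
    }
    return +{ $oid => $self->{values}{$oid} };
}

sub error { return $_[0]{error} }

package main;

# Variant of SNMP_getvalue that returns undef instead of exiting.
sub snmp_getvalue_or_undef {
    my ($session, $oid) = @_;
    my $res = $session->get_request(-varbindlist => [$oid]);
    return defined $res ? $res->{$oid} : undef;
}

my $oid     = '.1.3.6.1.4.1.232.22.2.3.1.1.1.16.1';
my $session = FakeSession->new($oid => 2);
print "System state code: ", snmp_getvalue_or_undef($session, $oid), "\n";
```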
This Nagios plug-in script is used within the Nagios based solution NetEye.
Download
Download check_snmp_hp_bladesystem
LINKS
Additional information and documentation can be retrieved from:
NagiosExchange:Compaq-HP Proliant Server and Blade Checks (SNMP)
Hi. Nice script, I am definitely willing to use this.
However, upon execution I get “check_snmp_HP_Bladesystem.pl: Permission denied” because I password-protected my OA with LDAP. Is it possible to submit switches with username/password in the check?
Sorry, my mistake. Of course, after downloading the file I need to adjust the security permissions!