Monitoring your infrastructure is a crucial task and, regardless of the software you've chosen, it will usually require you to make use of the Simple Network Management Protocol (SNMP). The protocol is built into many networked devices such as routers, switches, servers, firewalls, and wireless access points, all reachable via their IP address. It provides a common mechanism for network devices to relay management information within single and multi-vendor LAN or WAN environments.
So whether you would like to be informed of the state of your Loadbalancer.org appliance, a Windows Server, or a Cisco switch, it can all be done with the same protocol.
Net-SNMP is possibly the most popular and most advanced implementation of SNMP. It provides the tools to query SNMP-enabled devices, as well as the daemon (snmpd) itself.
snmpwalk -v 2c -c public localhost
Issuing the above command on a Loadbalancer.org appliance with SNMP enabled will output quite a few lines to your terminal (N.B. if it works, please change your community string, as the default is highly insecure!).
In fact, each of the 4551 lines of output corresponds to a different property that can be queried. However, despite the sheer number of properties, not every scenario you may need is covered.
Luckily, Net-SNMP lets us extend its functionality in various ways, for instance by writing your own SNMP agent like this open-source HAProxy agent. This requires some level of programming knowledge, though. There are also easier ways of extending the SNMP daemon.
Firstly, we can add the log file of our choice to be read by snmpd. It is very easy to do:
logmatch hb_warning /var/log/ha.log 10 WARN
Adding the line above to your snmpd.conf file will add a new object that can be queried (snmpd needs to be reloaded first so that it reads the new config file).
Let me explain the particular values found in this new line:
logmatch is the config directive in net-snmp.
hb_warning is the name of your choice for the created entry.
/var/log/ha.log is, in my case, the path to the heartbeat log file.
10 is the interval, in seconds, at which the file is checked.
WARN is the regular expression to be matched.
If the regular expression is matched, a Success message will be returned, which means you should set up a warning in your monitoring software for when such a message is received. Sadly, as you can see, this solution isn't very flexible, and in this scenario Success doesn't really sound appropriate, although it is simple to implement. It also requires one such line for each value we want to check for, in order to monitor different events.
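To see what snmpd actually exposes for the hb_warning entry, you can walk the log match table. This is only a minimal sketch: it assumes the UCD-SNMP-MIB files are installed on the machine you query from, and that snmpd still answers on localhost with the v2c community public, as in the snmpwalk example earlier. The function name logmatch_walk is purely illustrative.

```shell
# Walk the logmatch table of an snmpd instance.
# Assumptions: SNMP v2c; host defaults to "localhost", community to "public".
logmatch_walk() {
    host="${1:-localhost}"
    community="${2:-public}"
    snmpwalk -v 2c -c "$community" "$host" UCD-SNMP-MIB::logMatchTable
}

# Usage (needs a running snmpd with the logmatch line loaded):
#   logmatch_walk
#   logmatch_walk 192.0.2.10 mycommunity
```

The output should contain one row per logmatch line in the config, including the name you chose and match counters such as logMatchGlobalCount.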
Thankfully, by adding just a single line to the config file you can add new functionality to your snmpd.
Another option is to use another feature of Net-SNMP. The Net-SNMP agent provides an extension MIB (NET-SNMP-EXTEND-MIB) that can be used to run arbitrary commands and shell scripts and query their output.
This allows for quite a lot of flexibility if an appropriate script is written. A simple example is:
extend hb_log /usr/bin/tail -n 1 /var/log/ha.log
In this case:
extend is the config directive providing the functionality.
hb_log is the name of your choice for the new entry.
/usr/bin/tail -n 1 is the command that we are executing (it must be a full path).
/var/log/ha.log is the argument for the command, which in this case is the log file path.
This will simply send the last line of the heartbeat log to monitoring software, where again you could choose some particular values that trigger warnings etc.
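Querying the result by hand is a handy way to check that the extend line works before pointing your monitoring software at it. The sketch below assumes the NET-SNMP-EXTEND-MIB files are installed on the querying machine and uses the same v2c community and host as before. The helper names extend_oid and extend_get are made up for this example.

```shell
# Build the OID that holds the first output line of a named extend entry.
# The entry name is the table index, so it has to be wrapped in double quotes.
extend_oid() {
    printf 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."%s"\n' "$1"
}

# Fetch that value from snmpd.
# Assumptions: SNMP v2c; community defaults to "public", host to "localhost".
extend_get() {
    snmpget -v 2c -c "${2:-public}" "${3:-localhost}" "$(extend_oid "$1")"
}

# Usage (needs a running snmpd with the extend line loaded):
#   extend_get hb_log
```

The quoting is the part that tends to trip people up: the shell must pass the double quotes through to snmpget, which is why the OID is built inside single quotes here.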
More can be achieved with the use of a shell script, as some messages could be defined for particular keywords appearing in the log file.
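The keyword idea above could be sketched as a small script that maps the last log line to a short status word. The keywords (ERROR, WARN) and the status words it prints are assumptions chosen for illustration, not something defined by Heartbeat or Net-SNMP.

```shell
#!/bin/sh
# Hypothetical helper: turn the last line of a log file into a status word.
# The keywords and status words below are illustrative assumptions.
LOG_FILE="${1:-/var/log/ha.log}"

classify_line() {
    case "$1" in
        *ERROR*) echo "CRITICAL" ;;
        *WARN*)  echo "WARNING" ;;
        *)       echo "OK" ;;
    esac
}

if [ -r "$LOG_FILE" ]; then
    classify_line "$(tail -n 1 "$LOG_FILE")"
else
    # Log file missing or unreadable: report that instead of guessing.
    echo "UNKNOWN"
fi
```

It could then be wired in with a line such as extend hb_status /bin/sh /root/hb_status.sh, where both the entry name and the script path are hypothetical.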
As mentioned above, using the same extend directive we can also use shell scripts to provide new functionality. Below is a very easy example that will tell us if both of your Loadbalancer.org nodes in the pair are Active:
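In order to add this functionality to our snmpd, we need to append the line below to our config (both_active is the name chosen here for the new entry, as extend expects a name before the command):
extend both_active /bin/sh /root/both_active.sh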
Having focused a lot on Heartbeat in the previous examples, I would also like to show you some examples with other programs.
For instance, whenever you make any changes to the Layer 7 configuration on your Loadbalancer.org appliance, HAProxy needs to be reloaded. Whilst this usually isn't a problem, it may lead to HAProxy processes lingering for longer periods of time.
The script below will let you know how many of the HAProxy processes are currently running:
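A minimal sketch of such a script, assuming a Linux-style ps that accepts -e and -o comm= and that the processes are named exactly haproxy. The function name count_procs is just illustrative.

```shell
#!/bin/sh
# Print the number of running processes whose command name equals $1.
count_procs() {
    # ps -e -o comm= lists one command name per line, without a header;
    # awk counts exact matches and prints 0 when there are none.
    ps -e -o comm= | awk -v name="$1" '$0 == name { n++ } END { print n + 0 }'
}

count_procs haproxy
```

Saved as, say, /root/haproxy_procs.sh (a hypothetical path), it could be exposed with a line like extend haproxy_procs /bin/sh /root/haproxy_procs.sh, and your monitoring software can then alert when the number stays above what you expect after a reload.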
Another idea for such a script, not necessarily for your Loadbalancer.org appliance, may be finding zombie processes:
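One way this could look, assuming a ps that supports the stat output column (as on Linux), where zombie processes have a state beginning with Z. The function name count_zombies is illustrative.

```shell
#!/bin/sh
# Print the number of zombie (defunct) processes.
count_zombies() {
    # ps -e -o stat= prints one state string per process, without a header;
    # awk counts those whose state starts with Z and prints 0 when none do.
    ps -e -o stat= | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

count_zombies
```

As with the previous script, it only prints a number, so the decision about what counts as too many stays with your monitoring software.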
I hope these ideas fit some of your scenarios, or at least give you some inspiration for what else can be implemented.