For a few years now I have been making do with a simple notification system for MSCS failover clusters: first a script, and later an application, that ran as the last resource to come online in the cluster group and sent an email letting me know the group was online. The biggest problem with this approach is that it doesn't tell you when you have unrecoverable failures. It is also a pull-style polling mechanism, only firing at intervals or when it starts up.
I decided to fix this. There currently isn't much (if anything) in the open source arena that provides cluster notification on the Microsoft platforms, so I jumped in and started a simple yet useful project to do just that. The end result is MSCS Watcher, a .NET 2.0 open source project that uses the MSCluster WMI objects, specifically the EventResourceStateChange event. I also used the Cluster Automation API to grab the current state of the cluster for reporting in the SMTP notifications. This is my first cut at a push-style cluster monitor, a complete rewrite of the previous MSCSWatcher I had made for personal use. The previous version also included SQL tables for historical and current state data. I've removed those from this version to simplify things, but I'm thinking of taking another cut at it in the future and putting them back in.
Originally I was trying to write an event handler for the events raised by the cluster API for ClusSvc, but this involved writing a wrapper for the native C++ events, and I just didn't have the time to get into that right now. I ended up using a ManagementEventWatcher to catch the WMI events raised by the cluster service. My first thought for the actual notification was to have it launch an executable, to help tie it into other monitoring tools. I discarded that idea when I realized it really didn't do me much immediate good. Instead, I decided to have it simply fire off an SMTP message, which fit my goal of keeping it simple.
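The post doesn't show how the service builds or sends its message. To illustrate how little the SMTP step needs, here is a minimal Python sketch using only the standard library's `email` and `smtplib` modules; it is not the .NET code MSCS Watcher uses. The function names, addresses, and relay host are hypothetical, and the sketch assumes an SMTP relay that accepts unauthenticated mail on port 25 (no TLS or login).

```python
import smtplib
from email.message import EmailMessage
from typing import Iterable


def build_notification(summary: str, sender: str,
                       recipients: Iterable[str], subject: str) -> EmailMessage:
    """Wrap a plain-text cluster summary in an email message (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(summary)  # plain-text body
    return msg


def send_notification(msg: EmailMessage, host: str, port: int = 25) -> None:
    """Hand the message to an SMTP relay; assumes no auth or TLS is required."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to check the message content without a mail server, and to swap in authentication or TLS later without touching the formatting.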
The basic functionality is this: it runs as a service (I recommend clustering it across all nodes in your cluster) and waits for any resource in the cluster to change state (e.g., Online to Failed, Offline to OnlinePending). When a state change event is detected, the service records the change and scans the cluster group that holds that resource. It then applies a little logic to avoid sending redundant messages and, if it determines the circumstances are right, sends a summary of the cluster group that owns the resource that triggered the event.
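The post doesn't spell out the rule the service uses to avoid redundant messages, so the following is only a minimal Python sketch of the record, rescan, maybe-notify flow, not MSCS Watcher's actual logic. Everything in it is an assumption for illustration: the `ClusterGroupNotifier` class, the `PENDING_STATES` names, and the suppression rule (hold off while any resource in the group is still pending, and skip a settled state identical to the last one reported). The `snapshot` dictionary stands in for the scan of the owning group that the real service performs through the Cluster Automation API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed names for transitional resource states (illustration only).
PENDING_STATES = {"OnlinePending", "OfflinePending"}


@dataclass
class ClusterGroupNotifier:
    """Hypothetical sketch: record the change, rescan the group, maybe notify.

    Suppression rule (an assumption, not taken from MSCS Watcher):
      * stay quiet while any resource in the group is still pending;
      * stay quiet if the settled snapshot matches the last one reported.
    """
    history: List[Tuple[str, str, str]] = field(default_factory=list)
    last_reported: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def on_state_change(self, group: str, resource: str, new_state: str,
                        snapshot: Dict[str, str]) -> Optional[str]:
        # 1. Record the raw state change event.
        self.history.append((group, resource, new_state))
        # 2. `snapshot` is the rescan of the owning group:
        #    resource name -> current state.
        if any(state in PENDING_STATES for state in snapshot.values()):
            return None  # group still transitioning; wait for it to settle
        if self.last_reported.get(group) == snapshot:
            return None  # nothing new since the last notification
        # 3. Remember what we reported and return the summary text to send.
        self.last_reported[group] = dict(snapshot)
        return self.format_summary(group, snapshot)

    @staticmethod
    def format_summary(group: str, snapshot: Dict[str, str]) -> str:
        lines = [f"Cluster group '{group}' status:"]
        for name in sorted(snapshot):
            lines.append(f"  {name}: {snapshot[name]}")
        return "\n".join(lines)
```

With this rule, a failover that walks a resource through OnlinePending produces one message when the group settles rather than one per intermediate event, while a resource going to Failed still triggers a summary because the settled snapshot has changed.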
Since it's such a pain that there really isn't much out there, either open source utilities for clusters or sample code to work with them, I decided to just give the whole thing away. It's on CodePlex now. If you use MS Clustering and don't know when your cluster fails over, go grab a copy and try it out :)
http://www.codeplex.com/mscswatcher