Tuesday, November 29, 2011

Making sure you know when a server goes offline

Rumor has it that servers are somewhat important, and bad things may occasionally happen when they go down for an extended period of time.

All joking aside, servers are crucial. Maintaining and monitoring them is a big part of why you are in business. So nothing feels worse than a customer calling to tell you their server has been offline for 30 minutes when you supposedly have a system in place to alert you to downtime.

Let's make sure it doesn't happen to you.

Plugging the holes on this one is easy. In the dashboard, there is one thing to do and one thing not to do:

To Do:

First, double-click on your server in the dashboard to bring up the edit server dialog, and make two changes: get rid of the Offline (Maintenance Mode) option, and increase the monitoring frequency to 5 minutes.

Getting rid of the Offline (Maintenance Mode) option will give you an occasional alert when you have intentionally turned a server off... but isn't that worlds better than some unforeseen event powering down the server in a manner seen as intentional and therefore never being reported?

Note: If you are below the capped pricing for the server in question, increasing the frequency to 5-minute monitoring will incur an additional cost. You can leave it at 15-minute monitoring, follow the rest of the steps, and still have a reasonably airtight (albeit slower) notification system.
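
To put a rough number on "albeit slower," here is a minimal Python sketch of the worst-case delay before an alert. It assumes the server dies right after a successful check and that an alert fires after a set number of consecutive missed checks. The function name and the `missed_checks_before_alert` parameter are made up for illustration and are not taken from the dashboard.

```python
def worst_case_detection_minutes(check_interval_min: int,
                                 missed_checks_before_alert: int = 1) -> int:
    """Rough upper bound on minutes between a failure and its alert.

    Assumption (not from the dashboard): the server fails immediately
    after a successful check, and the alert fires once
    `missed_checks_before_alert` consecutive checks have been missed.
    """
    if check_interval_min <= 0 or missed_checks_before_alert <= 0:
        raise ValueError("interval and missed-check count must be positive")
    return check_interval_min * missed_checks_before_alert


if __name__ == "__main__":
    # Compare the two monitoring frequencies mentioned in the note.
    for interval in (5, 15):
        delay = worst_case_detection_minutes(interval)
        print(f"{interval}-minute monitoring: alert within about {delay} minutes")
```

Under those assumptions, the gap between the two settings is the difference between hearing about an outage in roughly 5 minutes and hearing about it in roughly 15.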

Not to Do:

You may have seen this screen when editing a site in the dashboard, and the completist in you may have said "well, here are some fields, so I guess I should go ahead and fill this out..."

In this case, you shouldn't. The rationale for this option is that if the network goes down, as confirmed by the inability to ping your site's router, the problem is an Internet connectivity issue and therefore nothing you can address. Unless you live in a part of the world where steady Internet is not available, this mindset probably does not reflect the type of service you hope to provide your customers. Just leave it blank.
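
If it helps to see why both settings matter, here is a small Python sketch of the two ways an offline server can end up silent. It is a simplified model of the behavior described in this post, not the dashboard's actual logic, and every name in it (`ServerSettings`, `should_alert`, the field names) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerSettings:
    # Hypothetical stand-in for the "Offline (Maintenance Mode)" option.
    maintenance_mode_allowed: bool
    # Hypothetical stand-in for the site's router field; None means left blank.
    router_address: Optional[str]


def should_alert(settings: ServerSettings,
                 looks_intentional_shutdown: bool,
                 router_reachable: bool) -> bool:
    """Illustrative model: does an offline server produce an alert?"""
    # Suppression 1: a shutdown that looks intentional is treated as maintenance.
    if settings.maintenance_mode_allowed and looks_intentional_shutdown:
        return False
    # Suppression 2: if a router is configured and cannot be pinged, the
    # outage is written off as a connectivity problem.
    if settings.router_address is not None and not router_reachable:
        return False
    return True


if __name__ == "__main__":
    recommended = ServerSettings(maintenance_mode_allowed=False, router_address=None)
    # With the recommended settings, every offline event raises an alert.
    print(should_alert(recommended, looks_intentional_shutdown=True, router_reachable=False))
```

With the option removed and the router field blank, neither suppression branch can fire, which is the whole point of the two steps above.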

While this covers the areas to address regarding configuration, you should also verify the signal flow for the alert messages themselves:
  1. Have you configured your mail templates in the dashboard?
  2. If you are having the alerts addressed from your domain, have you updated your SPF record (if you have one) to include the RemoteManagement servers?
  3. Have you taken steps to ensure your spam filters allow these messages to pass through?
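
For item 2, a quick sanity check on your SPF record might look like the Python sketch below. It only inspects the text of a record you paste in and does no DNS lookup. The include domain `spf.rmm-provider.example` is a placeholder, not the real RemoteManagement value, so substitute whatever your provider's documentation tells you to use.

```python
def spf_includes(spf_record: str, required_domain: str) -> bool:
    """Return True if the SPF record authorizes `required_domain` via include:."""
    terms = spf_record.strip().split()
    # A valid SPF record starts with the version tag.
    if not terms or terms[0].lower() != "v=spf1":
        return False
    wanted = "include:" + required_domain.lower().rstrip(".")
    for term in terms[1:]:
        # "+" is the default (pass) qualifier; other qualifiers (-, ~, ?)
        # would not authorize the sender, so they are not counted here.
        if term.startswith("+"):
            term = term[1:]
        if term.lower().rstrip(".") == wanted:
            return True
    return False


if __name__ == "__main__":
    # Placeholder record and domain, for illustration only.
    record = "v=spf1 mx include:spf.rmm-provider.example ~all"
    print(spf_includes(record, "spf.rmm-provider.example"))
```

If the check comes back false for your real record, alerts sent from your domain may be failing SPF and landing in spam, which ties directly into item 3.
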
Once you've gone through this list, your chances of getting that dreaded call will be a lot smaller, restoring the natural order of things where you are the one calling your customers first.