Friday, February 3, 2012

Master Class: Outage Preparedness in Max MailProtection

MailProtection is generally thought of as a filtering tool, but used properly it is also invaluable for keeping mail flowing. The Mail Continuity feature of GFI Max MailProtection offers you a layer of insulation in the event of an internet outage, server crash or planned server migration.

Taking a few minutes now to understand how mail continuity works, and to learn the best steps to take before, during and after a disruption, will keep inconvenience to a minimum in these situations.

A Refresher on how MailProtection Handles Mail

To know how to prepare and respond, you need to know how MailProtection will react in a given situation so you can make the proper adjustment.

  • Connection issues or Temporary (400 level) SMTP error replies from destination server: GFI Max MailProtection will queue any mail to an added user for up to four days whenever the destination mail server is unreachable, times out, disconnects or gives a 400-level (temporary) SMTP error. During this time:
    • Users are able to log on to the MailProtection interface and view their mail, reply to messages and compose new mail.
    • Our system will automatically retry delivery of the queued messages at intervals in case the issue on the destination mail server’s end has been resolved.
    • If a message is still queued at the end of four days, it will be deleted from the system and a delivery failure notification will be sent to the sender.
    • A reference to this message will be kept in our logs for 10 days from the initial time of attempted delivery.
  • Delivery confirmation (250 OK) from destination server: GFI Max MailProtection will delete the message after receiving delivery confirmation since there should be no reason to attempt delivery a second time. A reference to this message will be kept in our logs for 10 days from the initial time of attempted delivery.
  • Permanent (500 level) SMTP rejection replies from destination server: GFI Max MailProtection will delete the message upon receiving a permanent rejection reply since by definition the permanent error should not change if delivery is attempted a second time. A delivery failure notification will be sent to the sender, and a reference to this message will be kept in our logs for 10 days from the initial time of attempted delivery.
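
The three cases above can be summarized as a small decision function. This is only a sketch of the behavior described in this post, not MailProtection's actual code: the function names and the action strings are made up for illustration, and treating every reply that is neither 2xx nor 5xx as temporary is an assumption.

```python
from datetime import datetime, timedelta

QUEUE_WINDOW = timedelta(days=4)    # queued mail is retried for up to four days
LOG_RETENTION = timedelta(days=10)  # log reference is kept for 10 days


def handle_reply(code, first_attempt, now):
    """Return the action for one delivery attempt.

    code is the SMTP reply code, or None when the destination server was
    unreachable, timed out or disconnected.
    """
    if code is not None and 200 <= code < 300:
        return "delete"                    # delivered, nothing left to retry
    if code is not None and 500 <= code < 600:
        return "delete_and_notify_sender"  # permanent rejection
    # Connection problem or temporary error: hold the message and retry,
    # unless the four-day window has already run out.
    if now - first_attempt >= QUEUE_WINDOW:
        return "delete_and_notify_sender"
    return "queue_and_retry"


def log_expiry(first_attempt):
    """The log reference is counted from the initial delivery attempt."""
    return first_attempt + LOG_RETENTION
```

Note that the 10-day log retention is measured from the first attempt in every case, so a message that sat in the queue for the full four days leaves a log entry for only six more days after it is deleted.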

Investigating the Outage with the Message Log Search Feature

Because your server's replies to mail delivery attempts are now collected in the MailProtection interface (without it, they would be scattered across the servers of everyone who tried to send you mail), the Message Delivery Log is an invaluable tool for investigating potential mail issues on your server. You can reach it from within the MailProtection interface.

You can search the log to quickly determine the fate of an individual message, and the default search gives you an up-to-date listing of all your messages in reverse chronological order.

The results screen will show your messages, sorted by time of receipt. You will also see the disposition of the message in question and the reply (if any) given by your server. Clicking on the disposition links will give you a full message transaction summary which includes a time-stamped message reply.

If you are experiencing a large-scale mail issue, examining the disposition of non-quarantined messages in this log will give you an idea of what immediate steps may be necessary to minimize data loss:

  • Queued (Temp Fail) messages require no additional intervention for proper handling by MailProtection, as the reply from the server triggers the proper response. Your customers can log into the control panel and view these messages for up to 4 days while you investigate the issue on their server.
  • Bounce (Perm Fail) messages were rejected when MailProtection attempted delivery. The message has already been deleted from the system and the sender was notified of the failed delivery. If the rejection is in error, you need to sever communications between MailProtection and the server as described in the next section.
  • Delivered messages are great... except when you can't find those messages on your server. In cases where the server is "eating" messages, you need to sever communications between MailProtection and the server as described in the next section.
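
During a large-scale issue it can help to tally what the log shows before deciding what to do. The sketch below assumes you have copied the disposition of each recent message into a list by hand; the short labels "Queued", "Bounce" and "Delivered" and the `triage` helper are illustrative names, not part of any MailProtection export or API.

```python
from collections import Counter

# Follow-up per disposition, paraphrasing the list above.
NEXT_STEP = {
    "Queued": "none; mail is held and retried for up to 4 days",
    "Bounce": "if the rejection is in error, force a queue",
    "Delivered": "if the mail is missing on the server, force a queue",
}


def triage(dispositions):
    """Count each disposition and pair it with its follow-up step."""
    counts = Counter(dispositions)
    return {
        name: (count, NEXT_STEP.get(name, "unknown disposition; check manually"))
        for name, count in counts.items()
    }
```

A sudden run of Bounce entries for addresses that should exist, or Delivered entries that nobody can find, is the signal that the server itself is misbehaving.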

Forcing a Queue in the Event of a Malfunctioning Server

By design, GFI Max MailProtection queues mail for network timeouts and temporary SMTP error replies for up to 4 days. However, it will not hold a message after a 250 OK or 500-level permanent failure reply from the destination server – even if that server gave the reply in error.

It is sometimes to your advantage to guarantee that mail does not reach your server for up to four days while you troubleshoot the issue locally. By changing your destination mail server listed in the interface to an invalid address (most easily done by adding “null.” to the front of the existing entry), you can guarantee the messages will not be delivered and therefore remain safe in our queue.

Once you are sure that your server is behaving normally, change the destination address back so that it points to your server again. If you see that the error is still present, you can re-force the queue and the remaining mail should again be protected. Note: when forcing a queue, do NOT change the destination to any local IP address, as the results are unpredictable. Use only an address that is completely non-resolvable.
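
Before relying on a placeholder address, it is worth confirming that it really does not resolve, since some DNS setups answer for names that should not exist. The following is a minimal sketch using only the Python standard library. The helper names are made up for illustration, and rejecting every IP literal (not just local ones) is a deliberate simplification of the advice above.

```python
import ipaddress
import socket


def forced_queue_host(current_host):
    """Prepend "null." to the configured destination to make it invalid."""
    return "null." + current_host


def is_ip_literal(host):
    """True if host is an IP address rather than a hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def resolves(host, lookup=socket.getaddrinfo):
    """True if the name resolves; lookup can be swapped out for offline testing."""
    try:
        lookup(host, 25)
        return True
    except socket.gaierror:
        return False


def safe_placeholder(host, lookup=socket.getaddrinfo):
    """Acceptable only if it is not an IP literal and does not resolve."""
    return not is_ip_literal(host) and not resolves(host, lookup)
```

For example, `safe_placeholder(forced_queue_host("mail.example.com"))` should be true on a network where the `null.` name does not resolve. Keep a note of the original destination so that restoring it is a simple copy and paste.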

Minimize Inconvenience and Potential Loss with Preplanning

It's easy to be loved by your customers when everything is going great, but you are truly judged by your ability to respond to disasters and minimize their impact. Taking a few minutes to get your systems in place while everything is running smoothly will ensure minimal disruption in the event of a downed server in your network. The following steps will make sure that everything is in place before unexpected downtime occurs:

  1. Verify that all MX entries point to MailProtection servers only. This will ensure that when the emergency brake is pulled, all mail will spool on our servers instead of being somehow routed to your unprepared server.
  2. Give all users the ability to access their MailProtection control panel and instruct them in basic usage. Introduce them to the MailProtection dashboard, provide their login credentials and give them a copy of the MailProtection End User Manual (contact your trusty sales rep if you don't have a copy!). Consider it the email equivalent of a fire drill.
  3. Perform a test run of the various steps needed to use MailProtection for disaster scenarios. Knowing that you are comfortable with investigating the Message Delivery Log and forcing a queue via the MailProtection Control Panel, and that end users are able to log in to their queued mail, allows you to spend those crucial first few minutes of downtime minimizing losses instead of determining an action plan.
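
Step 1 is easy to check with a short script. The sketch below parses the text printed by `dig +short MX yourdomain` (one "priority hostname." pair per line) and lists any MX host outside the filtering service's domain. The suffix you pass in is whatever domain your MailProtection MX records actually use; the `filter.example.net` value in the usage note is a placeholder, not the real one.

```python
def mx_hosts(dig_output):
    """Parse lines such as "10 mx1.example.net." into lowercase hostnames."""
    hosts = []
    for line in dig_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            hosts.append(parts[1].rstrip(".").lower())
    return hosts


def stray_mx(dig_output, filter_domain):
    """Return MX hosts that do not belong to the filtering service's domain."""
    base = filter_domain.lower().strip(".")
    return [
        host
        for host in mx_hosts(dig_output)
        if host != base and not host.endswith("." + base)
    ]
```

An empty result from `stray_mx(output, "filter.example.net")` means every MX entry points at the filtering service; any hostname it returns is a path by which mail could bypass the queue and reach your server directly.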

Taking the time now to understand and implement these steps will help you breathe easier knowing that you can quickly step in and save the day when disaster strikes, projecting the image of confidence and competence that will keep your customers loyal.