By: Ernest Byrd
I often work with a new Partner who has installed an MSP RMM agent on a Server platform, and I get the question: "How do I address these alerts?" In response, I first have to explain what the monitors focus on and what the alerts really mean.
Are you using Performance Monitoring Checks? If so, are your thresholds set appropriately? Let's find out.
Before beginning a configuration review, it is important to understand why it is good to monitor Performance in a System. Simply put, the items of review have a direct correlation to System Performance. This, in turn, has a direct correlation to your End User's Production and Efficiency. If a System is running slower than normal, employees are usually less productive - this means less profit for an Organization. Now that we have a good understanding of WHY we are monitoring, let's talk about HOW we can monitor effectively:
Step 1:
If you are just starting with the MSP RMM Dashboard, navigate to Settings > Alerts > Alert Policy. Here, choose "Do not fail check, report only" for all five of the Performance Monitoring Checks dropdowns.
Note: Only proceed to the following steps once your Server Agent has been monitored for at least eight days. This is important because it gives you enough history to set appropriate thresholds for your Performance Monitoring Checks.
Step 2:
Select the target Server Agent that you wish to configure within the MSP RMM Dashboard. Under the Checks tab, locate the Performance Monitoring Check you want to tune, then click the hyperlink in the "More Information" column.
Step 3:
After the "More information..." window appears, click the View Report button.
Tip: If you have been using Performance Monitoring for some time (and you did not originally use Step 1 in your configurations), you may want to click the "Outage History" tab and make note of any failure figures. This allows you to get a more precise number for threshold configurations.
Step 4:
Close the "More information..." window, while leaving the Performance Monitoring Reports open in a separate tab. This will allow you to toggle between the Report and thresholds while setting appropriate boundaries.
Step 5:
Right-click each Performance Monitoring Check and choose Edit Check. Set your thresholds just above the "Avg." figure shown in the Report tab (the Red line should sit above the Green line). There may be times when your Green line crosses the Red line, but we will account for this in a following step. Also, please note that the Memory Utilization configurations can be a little more complex than the other Performance Monitors. When setting these up, start with your static variables (3 of your 5 memory monitors should be static, and are easy to identify from their naming convention and threshold figures). The remaining two items under Memory monitoring require computation (you may need to pull out a calculator for a quick threshold setting).
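If you would rather not eyeball "just above the average," here is a minimal Python sketch of that arithmetic. It is not part of the MSP RMM Dashboard; the function name, the 10% headroom, and the cap at 100 are all my own assumptions, so adjust them to what your Report actually shows.

```python
import math


def suggest_threshold(average: float, headroom_pct: float = 10.0, ceiling: int = 100) -> int:
    """Suggest a whole-number threshold slightly above a reported average.

    Hypothetical helper: headroom_pct and ceiling are illustrative defaults,
    not values taken from the Dashboard.
    """
    if average < 0:
        raise ValueError("average must not be negative")
    # Add the headroom on top of the average, then round up to a whole number.
    raw = average + average * headroom_pct / 100.0
    # Never suggest more than the ceiling (for example, 100 for a percentage).
    return min(math.ceil(raw), ceiling)


# Example: a Report showing an average of 62% suggests a threshold of 69%.
print(suggest_threshold(62.0))
```

Treat the result as a starting point only. The Report and any Outage History figures should still drive the final number.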
Step 6:
Once you have configured your items from Step 5 (which I often refer to as the "Major thresholds"), you may configure your consecutive series. I refer to the consecutive series as the "Minor thresholds". When you right-click your focus device and choose Edit (or simply double-click the target), make note of your "24x7 Frequency" under the Check frequency section. After noting this item, choose "Alert Policy". This is where you can set the Performance Monitoring Check consecutive failure settings. To give an example of appropriate use, let's say that the Green line sits above the Red line for 30 minutes a day for a Memory variable. Perhaps this is a long SQL query, and this slight spike is OK. If your 24x7 frequency is set to 15 minutes, you would use "Only after 2 consecutive failures" for the 'Memory Usage:' section.
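The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name: it divides the length of the tolerated spike by the check frequency and rounds up, which reproduces the 30-minute / 15-minute case above. How the Dashboard counts the Nth failure is worth confirming in your own environment.

```python
import math


def consecutive_failures_for(spike_minutes: int, check_frequency_minutes: int) -> int:
    """Estimate the consecutive-failure setting that covers a tolerated spike.

    Hypothetical helper: spike length divided by check frequency, rounded up,
    with a minimum of 1.
    """
    if check_frequency_minutes <= 0:
        raise ValueError("check frequency must be a positive number of minutes")
    return max(1, math.ceil(spike_minutes / check_frequency_minutes))


# Example from the text: a 30-minute spike checked every 15 minutes.
print(consecutive_failures_for(30, 15))  # 2
```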
Step 7:
Determine if the System has any periods of abnormal consumption. A good example of this may be a long backup routine that occurs every Friday night for 3 hours. As a complement to your Performance Monitoring Checks configuration, you may want to set a recurring Maintenance Window. To do this, right-click the System and select Maintenance Mode > Schedule. Define the time that is appropriate for this abnormal utilization.
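Before committing to a window, it can help to check whether past failure times actually fall inside it. The Python sketch below is a stand-alone helper, not a Dashboard feature. The function name and the 22:00 start are assumptions for illustration, since the example above only says "Friday night for 3 hours".

```python
from datetime import datetime, timedelta


def in_weekly_window(moment: datetime, weekday: int, start_hour: int, duration_hours: int) -> bool:
    """Return True if moment falls inside a weekly recurring window.

    weekday uses Python's convention (Monday=0 ... Sunday=6).
    Hypothetical helper for reviewing failure timestamps by hand.
    """
    # Step back to the most recent occurrence of the window's weekday.
    days_back = (moment.weekday() - weekday) % 7
    start = (moment - timedelta(days=days_back)).replace(
        hour=start_hour, minute=0, second=0, microsecond=0
    )
    # If that start is still in the future, use the previous week's window.
    if start > moment:
        start -= timedelta(days=7)
    return start <= moment < start + timedelta(hours=duration_hours)


# Assumed window: Fridays (weekday 4) from 22:00 for 3 hours.
failure = datetime(2024, 1, 6, 0, 30)  # Saturday 00:30, still inside the window
print(in_weekly_window(failure, weekday=4, start_hour=22, duration_hours=3))
```

If most of your noted failures return True for the proposed window, the window is probably a good fit. Failures outside it deserve a closer look.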
Step 8:
Set up any alerts that you intend to use for these Performance Monitoring Checks. This can help you more quickly identify behavior that is abnormal for this System.
After following these steps, the next time you receive an alarm for a failed Performance Monitoring Check you can do something with it. Consider carrying it to the Organization and explaining that the Hardware is doing more than it was originally specced for. Perhaps there are more users than originally intended and/or newer applications have found their way into the System. At this point, it could be worth adding more resources to the System (such as Memory). Maybe moving some of the applications off of this box to a standalone Application Server is an option. Or, it may be time to implement a newer System (Project time?!).
Hopefully this adds a little more value to your offering. Have a great day!