Availability monitoring
Introduced in 2022.2
Updated in 2023.1
The Availability Monitoring integration is a pre-built workflow package for basic availability monitoring. It continuously polls IT infrastructure components critical for your organization, such as web servers, routers, or printers. Whenever any device becomes unavailable, the system triggers an alert or submits an incident ticket. This article describes how to configure the integration and how to use it.
How Availability Monitoring works
You select an Organization via workflow parameters. This must be the Organization associated with the Networks you want to monitor. Active devices on those Networks are then monitored automatically, filtered by the computer and hardware categories chosen via workflow parameters.
The following is required:
- An Organization record
- Network records that are linked to this Organization record (via the Organization field in Networks)
- Devices linked to the Network records that are linked to the Organization record (via the Network field in the device)
- Devices in active status
- Devices that have Primary IP Addresses
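The requirements above amount to a filter over device records. The following is a minimal sketch of that filter, not the integration's actual implementation: the `Device` class, its field names, the `is_eligible` function, and all sample values are hypothetical, and the Network-to-Organization link is flattened into a single field for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    name: str
    organization: Optional[str]  # Organization of the Network the device is linked to
    category: str                # computer or hardware category
    status: str
    primary_ip: Optional[str]


def is_eligible(device, organization, monitored_categories):
    """Return True if the device meets every requirement in the list above."""
    return (
        device.organization == organization
        and device.category in monitored_categories
        and device.status == "Active"
        and bool(device.primary_ip)
    )


# Hypothetical sample records
devices = [
    Device("web01", "Contoso", "Server", "Active", "10.0.0.5"),
    Device("prn07", "Contoso", "Printer", "Retired", "10.0.0.9"),
    Device("rtr02", "Contoso", "Router", "Active", None),
    Device("web02", "Fabrikam", "Server", "Active", "10.1.0.5"),
]

eligible = [d.name for d in devices if is_eligible(d, "Contoso", {"Server", "Router"})]
print(eligible)  # ['web01']
```

Only web01 passes: prn07 is in an unmonitored category and not active, rtr02 has no Primary IP Address, and web02 belongs to a different Organization.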
There are two scheduled tasks involved in this integration:
- Scheduled Task #1 - generates a list of unavailable devices
- Scheduled Task #2 - repeatedly pings unavailable devices and takes action
Scheduled Task #1 generates a list of Primary IP Addresses for the eligible devices and pings each one to test for availability. When a ping fails, that Primary IP Address is entered into the Integration_Keys table with a note that it has failed one time. From then on, Scheduled Task #1 ignores devices listed in the Integration_Keys table. This task is designed only to identify devices that have failed at least one time.
Scheduled Task #2 focuses only on the devices listed in the Integration_Keys table. Each time this job runs, it pings the Primary IP Address of each listed device until the number of failures reaches the threshold set in Workflow Parameters. Once the threshold is reached, action is taken. The action depends on what you have selected in the Workflow Parameters: either an Incident is created or an email notification is generated.
Once action is taken, the entry in the Integration_Keys table is removed and the process continues. The next time the threshold is reached, a new Incident is not created if an existing one is still in a status other than resolved or closed. The email notification, however, is sent again.
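The interaction between the two tasks can be sketched as a small simulation. This is an illustration under stated assumptions, not the product's code: the Integration_Keys table is modeled as a plain dictionary, the function names and IP addresses are hypothetical, the first failure recorded by Scheduled Task #1 is assumed to count toward the threshold, and a successful ping is assumed to clear the entry.

```python
def initial_check(eligible_ips, ping, failed):
    """Task #1: ping eligible devices that are not yet tracked; record first failures."""
    for ip in eligible_ips:
        if ip in failed:
            continue  # already tracked, left to Task #2
        if not ping(ip):
            failed[ip] = 1


def recheck_failed(ping, failed, threshold, action, open_incidents, actions):
    """Task #2: re-ping tracked devices and act once the threshold is reached."""
    for ip in list(failed):
        if ping(ip):
            del failed[ip]  # assumption: a successful response clears the entry
            continue
        failed[ip] += 1
        if failed[ip] < threshold:
            continue
        if action == "email":
            actions.append(("email", ip))  # email is sent every time
        elif ip not in open_incidents:
            open_incidents.add(ip)         # no new Incident while one is still open
            actions.append(("incident", ip))
        del failed[ip]  # entry is removed once action is taken


def ping(ip):
    """Stand-in for a real ping: one hypothetical device never responds."""
    return ip != "10.0.0.5"


failed, open_incidents, actions = {}, set(), []
for _ in range(2):  # two complete outage cycles
    initial_check(["10.0.0.5", "10.0.0.6"], ping, failed)
    for _ in range(2):
        recheck_failed(ping, failed, 3, "incident", open_incidents, actions)

print(actions)  # [('incident', '10.0.0.5')]
print(failed)   # {}
```

The second cycle reaches the threshold again, but because the first Incident is still open, no second Incident is recorded. With `action="email"`, each cycle would add another notification.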
What does it monitor?
Availability Monitoring monitors both computers and hardware, as long as they are assigned to a Network that is linked to the Organization chosen in your Workflow Parameters.
How often does it test for availability?
By default, Availability Monitoring checks whether devices are available every 10 minutes, every day. If a device is unavailable, it then checks that device every minute until the failure threshold (set in Workflow Parameters) is reached or a successful response is obtained.
These settings can be changed in the Scheduled Tasks area for each job.
- Availability Monitoring Device Checking (every day every 10 minutes) - checks to see if devices respond initially
- Availability Monitoring Failed Devices (every day every 1 minute) - checks to see if devices continue to fail
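The two intervals and the failure threshold together determine how long an outage can last before action is taken. The following is a rough estimate only. It assumes that the first failure recorded by the Device Checking job counts toward the threshold, that the Failed Devices job must run at least once because it is the job that takes action, and that ping timeouts and job run times are negligible. The function name and the example threshold of 3 are hypothetical.

```python
def worst_case_minutes(threshold, check_interval=10, recheck_interval=1):
    """Estimate the longest delay, in minutes, between an outage and the action."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # The first failure comes from the Device Checking job, which may run
    # up to one full interval after the device goes down.
    rechecks_needed = max(threshold - 1, 1)
    return check_interval + rechecks_needed * recheck_interval


print(worst_case_minutes(3))                     # 12
print(worst_case_minutes(5, check_interval=30))  # 34
```

With the default schedule and a threshold of 3, a device that goes down just after a check could stay unnoticed for about 12 minutes under these assumptions.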
How do I enable Availability Monitoring?
To enable Availability Monitoring, you must enable the two jobs associated with the functionality. You can find them under the Scheduled Tasks area in the web-based Admin Center or the desktop Settings App.
Do ports need to be opened?
You won't need to open ports or enable any protocols if you are monitoring devices on your own network. However, if you are trying to monitor your network from the cloud or from another office that is not connected to your network, you will have to make sure those devices can be pinged.
To ping a device, the following is required:
The device you are pinging must be routable. For example, hosts having IP addresses belonging to the private IP range of 192.168.0.0/16 cannot be reached from the outside unless the administrator adds specific network routes.
Pings do not use TCP or UDP ports because they use a different protocol named ICMP. To allow pings, you will need to allow incoming ICMP traffic or more specifically incoming ICMP echo requests.
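Both requirements can be illustrated with the Python standard library. The sketch below first checks whether an address falls in a private range (and so needs extra routing to be reached from outside), then builds an ICMP echo request to show that the packet carries a type, code, checksum, identifier, and sequence number but no port. It does not send anything, since raw ICMP sockets usually need elevated privileges. The function names and addresses are hypothetical, and this is not how the integration itself issues pings.

```python
import ipaddress
import struct


def is_private_address(address):
    """Return True if the address is in a private or otherwise non-public range."""
    return ipaddress.ip_address(address).is_private


def internet_checksum(data):
    """Compute the 16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload=b""):
    """Build an ICMP echo request: type 8, code 0, and no port fields."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


print(is_private_address("192.168.10.25"))  # True
print(is_private_address("8.8.8.8"))        # False

packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
print(packet[0], packet[1])       # 8 0
print(internet_checksum(packet))  # 0
```

A checksum of 0 over the finished packet confirms that it is internally consistent. A firewall that should allow these pings needs a rule for ICMP type 8, not for a TCP or UDP port.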
Workflow Parameters
Here is a list of the Workflow Parameters and how to configure them:
- Organization - The Organization used in the Networks you would like to monitor. The Organization selected must be the Organization of the Networks you want to monitor.
- Monitored Computer Categories - Computer categories to be monitored by network monitoring. The default values match the default categories. If your categories differ, you must add or change this parameter's values to match your existing categories.
- Monitored Hardware Categories - Hardware categories to be monitored by network monitoring. The default values match the default categories. If your categories differ, you must add or change this parameter's values to match your existing categories.
- Failed Attempt Threshold - The number of times a device should fail to respond before action is taken.
- Failed Threshold Action - Action to take when network device reaches the failed attempt threshold.
- Incident Medium - Used to determine the medium field value for the created incident.
- Incident Service - Used to determine the service field value for the created incident.
- Incident Status - Used to determine the status field value for the created incident.
- Incident Type - Used to determine the type field value for the created incident.
- Incident Category - Used to determine the category field value for the created incident.
- Incident Impact - Used to determine the impact field value for the created incident.
- Incident Urgency - Used to determine the urgency field value for the created incident.
- Assignee / E-mail Recipient - Role of the person the Incident will be assigned to or the Email notification will be sent to.
- Debug Logging Status - Used to enable debug logging.
Important to note
The Computer and Hardware category Workflow Parameters are static. They are set based on default categories related to Servers. If you remove or rename a default category, that category will no longer be monitored unless you also update the Workflow Parameter's static values. If you add new categories, you will have to add them to the Workflow Parameter values as well.
If you have duplicate Primary IP Addresses in your device records, Availability Monitoring considers each device individually. If an IP Address that is associated with two separate devices fails, you will receive two failures. If the duplicated IP Address passes, you will not receive any indication that you have duplicate Primary IP Addresses. If this is a problem, it's simple enough to create a view in Computers or Hardware to show duplicate IP Addresses, but ultimately this is expected to be a network management issue, and this functionality cannot account for it.
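Outside of a view, duplicates can also be found in an exported device list. The following is a minimal sketch that assumes the export has been reduced to pairs of device name and Primary IP Address. The function name and sample records are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(devices):
    """Map each Primary IP Address used by more than one device to those devices."""
    by_ip = defaultdict(list)
    for name, ip in devices:
        if ip:  # skip records without a Primary IP Address
            by_ip[ip].append(name)
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical export of (device name, Primary IP Address) pairs
records = [
    ("web01", "10.0.0.5"),
    ("prn07", "10.0.0.9"),
    ("web01-old", "10.0.0.5"),
    ("rtr02", None),
]

print(find_duplicate_ips(records))  # {'10.0.0.5': ['web01', 'web01-old']}
```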
The workflow requires an Alloy Navigator Service so that this process can set due dates and response dates. You can choose the name of that Service in Workflow Parameters (Default: Availability Monitoring). However, once the Service has been created, changing the name in this workflow parameter will cause another Service to be created, so it is best to set the name before enabling the job.
As with other scheduled tasks, the job should be enabled via workflow parameters, not at the job level.