op5 Monitor 6.1 is due for public release on May 14. It will contain some new features that have been requested by our customers, and we would like to take the opportunity to explain in a bit more detail what to expect.
We are constantly improving our HTTP API (REST), which was introduced in op5 Monitor 5.7. Our first steps were to make it possible to fetch status data and to query and modify the running configuration. With this release you can also fetch the historical data used for reports and use it in third-party applications (like Qlikview, Crystal Reports, Jasper Reports etc.) for further analysis.
Since we have historical data in different forms and for different purposes (like performance data, alert data, comments and report data), we decided to limit the scope for this release and focus on report data.
The data returned from the API call will be in a “raw” format and will usually have to be processed further. Our intention is to implement the first step in the ETL model (http://en.wikipedia.org/wiki/Extract,_transform,_load). This means that we will provide a way to Extract the data from your op5 Monitor system, but not the Transform or Load steps, since these will differ depending on the intended use of the fetched data.
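As a rough illustration of what a Transform step on your side might look like, the Python sketch below takes rows that have already been extracted and computes availability per host. The row layout (`host`, `state`, `duration` in seconds) and the function name are assumptions made up for this example; they are not the actual format returned by the op5 API.

```python
from collections import defaultdict

# Hypothetical raw rows, as if already extracted from the API.
# The field names are assumptions for illustration only.
raw_rows = [
    {"host": "web01", "state": "up", "duration": 3000},
    {"host": "web01", "state": "down", "duration": 600},
    {"host": "db01", "state": "up", "duration": 3600},
]


def availability_per_host(rows):
    """Transform step: return the percentage of time each host was 'up'."""
    up = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row["host"]] += row["duration"]
        if row["state"] == "up":
            up[row["host"]] += row["duration"]
    # Hosts with zero total duration are skipped to avoid dividing by zero.
    return {h: 100.0 * up[h] / total[h] for h in total if total[h] > 0}


if __name__ == "__main__":
    print(availability_per_host(raw_rows))
```

The Load step would then write the resulting dictionary to whatever reporting tool or database you use.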
We will continue to develop and enhance our API in the future. If you have, or are planning, any integration with your monitoring solution, the API is definitely the recommended, future-proof way forward.
Dynamic and adaptive thresholds with Bischeck
Until now, op5 Monitor has only supported static thresholds for checks. This means we are limited to defining one maximum or one minimum value as the threshold, valid in all situations for the service we monitor. A single value is not very likely to be correct for all days of the week and all hours of the day. The risk is that we will get too many or too few alarms, and there are even some service metrics that we won’t be able to set a threshold for due to their dynamic behaviour. This is especially true when trying to monitor application and business related services that follow the dynamics of the business load.
With the integration of Bischeck with Monitor 6.1, we now have a solution to enable dynamic and adaptive thresholds as a complement to the normal static threshold solution.
So what do dynamic and adaptive thresholds mean, and what benefits will we achieve?
With Bischeck we can define different threshold profiles depending on the time of day, the day of the week or the day of the month. This means that we can set thresholds for any service where we expect the metric to increase and/or decrease over the course of a day.
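The idea of time-dependent threshold profiles can be sketched in a few lines of Python. This is not Bischeck configuration; the profile table, the limits and the function name are all invented for the example.

```python
from datetime import datetime

# Hypothetical profiles: (weekdays, start hour, end hour, max allowed value).
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
# The first matching profile wins, so more specific rows come first.
PROFILES = [
    ({0, 1, 2, 3, 4}, 8, 18, 500),  # office hours on weekdays
    ({0, 1, 2, 3, 4}, 0, 24, 200),  # the rest of a weekday
    ({5, 6}, 0, 24, 100),           # weekends
]


def threshold_for(when):
    """Return the max threshold from the first profile matching the timestamp."""
    for weekdays, start, end, limit in PROFILES:
        if when.weekday() in weekdays and start <= when.hour < end:
            return limit
    raise LookupError("no profile covers this time")


if __name__ == "__main__":
    # May 14, 2013 was a Tuesday.
    print(threshold_for(datetime(2013, 5, 14, 12, 0)))
```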
We can define thresholds based on historical data. This enables us to express different kinds of threshold baselines. As an example, we can specify that the value measured at 12:00 should not be more than 5% higher or lower than the calculated average of the values measured at the same time on the 5 previous days. Bischeck supports several mathematical functions to calculate thresholds at run time.
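The 5% baseline example could be expressed roughly like this in Python. It only illustrates the arithmetic; the function name and sample values are made up, and this is not how Bischeck itself is configured.

```python
from statistics import mean


def within_baseline(current, previous_values, tolerance=0.05):
    """Return True if current is within +/- tolerance of the historical mean."""
    baseline = mean(previous_values)
    return abs(current - baseline) <= tolerance * abs(baseline)


# Made-up values measured at 12:00 on the 5 previous days; their mean is 100.
history_at_noon = [100, 104, 98, 102, 96]

if __name__ == "__main__":
    print(within_baseline(103, history_at_noon))  # 3% above the baseline
    print(within_baseline(110, history_at_noon))  # 10% above the baseline
```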
We can have multiple threshold rules for the same service. As an example, for a file system utilization service we can combine the classic 90% utilization threshold with a threshold that checks how quickly the utilization changes, using historical data to calculate a utilization delta over a time period.
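A minimal Python sketch of combining the two rules follows. The growth limit of 2 percentage points per hour, the sample layout and the function name are assumptions chosen for the example.

```python
def filesystem_alerts(samples, max_used=90.0, max_growth_per_hour=2.0):
    """Check two rules against the same service.

    samples is a list of (hours_since_start, percent_used), oldest first.
    Returns the names of the rules that were broken.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a delta")
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("samples must span a non-zero time period")

    broken = []
    # Rule 1: the classic static utilization threshold.
    if u1 >= max_used:
        broken.append("utilization")
    # Rule 2: utilization delta over the sampled period.
    if (u1 - u0) / (t1 - t0) > max_growth_per_hour:
        broken.append("growth")
    return broken


if __name__ == "__main__":
    # Only 70% full, but growing by 5 percentage points per hour.
    print(filesystem_alerts([(0, 50.0), (4, 70.0)]))
```

The second rule catches a disk that is filling up fast long before the static 90% limit is reached.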
We can use data collected for one or multiple services as input when calculating the threshold for a different service. This adaptiveness is excellent when you have a service metric that drives the business process. The number of visits to your web shop, for example, would probably have some effect on the number of expected orders, CPU utilization, application threads, etc. This means we can set the thresholds in relation to data that matters and not just a single value.
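To make the web shop example concrete, here is a small Python sketch where the accepted range for orders is derived from the number of visits. The 2% conversion rate, the 25% tolerance and the function names are invented for illustration.

```python
def expected_order_range(visits, conversion_rate=0.02, tolerance=0.25):
    """Derive a (low, high) threshold for orders from the visit count."""
    expected = visits * conversion_rate
    return expected * (1 - tolerance), expected * (1 + tolerance)


def orders_ok(orders, visits):
    """Return True if the order count fits what the visit count suggests."""
    low, high = expected_order_range(visits)
    return low <= orders <= high


if __name__ == "__main__":
    # 10000 visits suggest roughly 150 to 250 orders under these assumptions.
    print(orders_ok(180, 10000))
    print(orders_ok(90, 10000))
```

With a static threshold, 90 orders might look fine on a quiet day and go unnoticed on a busy one; tying the threshold to visits makes the difference visible.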
For a more in-depth presentation of the capabilities of Bischeck and testimonials etc, we recommend a visit to www.bischeck.org.
Please note that Bischeck won’t be shipped with op5 Monitor, but our customers will be able to install it through yum once a valid license has been installed on the system.
Filters and List View
As mentioned in a previous blog post (http://blogs.op5.com/news-in-the-user-interface-available-in-op5-monitor-6/), op5 Monitor 6 is capable of monitoring far bigger environments than ever before. This has made us realize that it’s time to abandon the previous concept of displaying everything and leaving it up to the user to spot the problems by scanning through the page and focusing on the colours. The search capabilities introduced a couple of years ago in op5 Monitor 5.0 were a great success and have always been much appreciated by many of our users. Now that we are monitoring thousands of hosts and even more services, the search functionality has reached its limits.
This is why we are now introducing Filters together with a new List View.
Basically, filters make it possible to customize your views instead of requiring that you adapt your eyes.
Let’s say you only want to view hosts and services with a name that starts with either “win” or “linux”, that are members of the servicegroup “databases” or the hostgroup “webservers”, that have a specific state, that have been checked, that are not in scheduled downtime and that have flap detection turned off. With the new List View you can now get exactly that selection in just a few clicks. It is also possible to create complex filters using regular expressions together with AND/OR in a way that wasn’t possible before. You can save a filter and even include it in another filter! We have also made it possible to mark a filter as “global” so that you can share it with other users on the same system. As you can see, the possibilities are practically endless.
As seen in the screenshot above, it is possible to create and edit the filters using a GUI or by using the rather intuitive query language (at least if you are familiar with other query languages). The result of your filter is always shown in the background and is updated as you type, making it very easy to find what you are looking for.
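To illustrate the idea of combining conditions with AND/OR and reusing a saved filter inside another one, here is a sketch in Python. Note that this is not the op5 query language; the helper functions, field names and sample hosts are all made up for the example.

```python
import re


def name_matches(pattern):
    """Filter on the object name using a regular expression."""
    regex = re.compile(pattern)
    return lambda obj: regex.search(obj["name"]) is not None


def in_group(group):
    """Filter on group membership."""
    return lambda obj: group in obj["groups"]


def all_of(*filters):
    """AND: every sub-filter must match."""
    return lambda obj: all(f(obj) for f in filters)


def any_of(*filters):
    """OR: at least one sub-filter must match."""
    return lambda obj: any(f(obj) for f in filters)


# A "saved" filter ...
win_or_linux = name_matches(r"^(win|linux)")

# ... included in a larger filter.
my_view = all_of(
    win_or_linux,
    any_of(in_group("databases"), in_group("webservers")),
    lambda obj: not obj["in_downtime"],
)

hosts = [
    {"name": "win-db1", "groups": ["databases"], "in_downtime": False},
    {"name": "linux-web1", "groups": ["webservers"], "in_downtime": True},
    {"name": "mac-db2", "groups": ["databases"], "in_downtime": False},
]

if __name__ == "__main__":
    print([h["name"] for h in hosts if my_view(h)])
```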
The only possible drawback is that you might miss some information in the “Status Totals” widget at the top of the listings. In previous versions, we also showed host information and a total count in the service listing. This had to be removed in this version for technical reasons, but we believe it is a small price to pay for the new possibilities that filters bring. Please let us know if this removed functionality is extremely important to you.
By performing calculations on historical graph data, we can now predict when a service will hit its warning or critical level. As this feature is added as a normal graph template, the information is automatically available in all standard reports, if you choose to include it.
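One simple way to make such a prediction is to fit a straight line through the historical values and see where it crosses the threshold. The Python sketch below does this with an ordinary least-squares fit. Please treat it as an illustration only: the linear trend is our assumption for the example, not a description of the calculation used by the graph template, and the function name and sample data are made up.

```python
def hours_until(level, samples):
    """Estimate hours from the last sample until the trend reaches level.

    samples is a list of (hour, value), oldest first. Returns None if the
    fitted trend is flat or falling and therefore never reaches the level.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need samples from at least two different times")

    # Least-squares slope and intercept of value as a function of time.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None

    time_of_hit = (level - intercept) / slope
    # If the trend line is already past the level, report zero hours left.
    return max(0.0, time_of_hit - samples[-1][0])


if __name__ == "__main__":
    # Made-up disk usage in percent, growing 2 points per hour.
    usage = [(0, 60.0), (1, 62.0), (2, 64.0), (3, 66.0)]
    print(hours_until(80.0, usage))  # hypothetical warning level
    print(hours_until(90.0, usage))  # hypothetical critical level
```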
I just wanted to mention some of the other changes that might be of interest:
- Sorting of comments in reverse chronological order
- Updated our Nagvis version to 1.7.3
- Made Ninja icon set default in Nagvis
- Create custom columns from custom variables in the List View
- Make it possible to select what columns to show (and in what order) in the new List View (per user setting under My account)
- Make number of columns in Tactical Overview customizable (global setting)
- Dynamic action button in service details (extinfo) by the use of custom variables
- Added a plugin to monitor domain expiry dates (does not work with .de, .no, .at or .ch, since these domains don’t publish the expiry date in whois)