op5 Monitor – Ready for IBM PureSystems!

Yesterday was the global launch of IBM PureSystems, a new and innovative solution from IBM that focuses on reducing the complexity of deploying new applications in medium to large enterprise IT environments.

“Congratulations to op5 for being among the early adopters of IBM PureSystems,” said Michael Riegel, vice president of global ISVs, IBM.  “This will enable them to offer an industry leading monitoring solution to clients in a way that cannot be matched by competitors.”

“op5 is all about making unified control of customers’ IT/IP services easy. This new solution from IBM adds to this goal, as it takes away uncertainties, the need for compliance testing and other time-consuming risks usually associated with deployment of a new application,” says Jan Josephson, CEO, op5 AB.

op5 Monitor Enterprise is among the very first applications fully certified for IBM PureSystems.

Read more about op5 Monitor on IBM PureSystems >>

Read more about IBM PureSystems >>

Posted in All Posts

Explanimation of the benefits of IT monitoring

We have created a very short animated video to explain the benefits of IT monitoring and of using op5 Monitor.

Posted in All Posts, Latest from op5, The Network Monitoring Blog

How to scale op5 Monitor

op5 Monitor is a highly scalable solution that enables distributed monitoring with automatic fail-over, load-balancing and redundancy. In this 30-minute webinar we will give you insights into why and how to scale op5 Monitor to your needs.

  • op5 Monitor – A scalable Monitoring Solution
    • Redundancy and load-balancing
    • Distributed monitoring
  • How to set up distributed monitoring
  • Example setups

Posted in All Posts, Latest from op5, Network Monitoring Monologue, The Network Monitoring Blog

Development work on the Nagios project to improve performance

We want to share some information and give a progress report on the development being carried out by Andreas Ericsson and the op5 development team on the Nagios project, a core component of op5 Monitor. The development aims to bring improvements to the project and to enable new possibilities that are important for us and for our solution, op5 Monitor.

We hope that this post will shed light on why this prioritized work matters to op5 users and what it will bring to the product further down the road. We also want to inform and engage the Nagios community on why the work is being carried out, and to give insight into what goodies are on their way to the Nagios project. Any feedback and suggestions are, as always, much appreciated.

In this post we describe the work on:

  • Complexity reduction by moving from multithreaded to single-threaded programming
  • Reducing latency
  • Disk I/O usage reduction
  • CPU usage reduction
  • Memory usage reduction

Decreasing complexity

In short, our aim as a software company is to provide our users with high-quality, high-performance and rock-solid products. To achieve this goal, we want to provide software with low complexity and without major performance bottlenecks, and to produce well-tested code that can be reused across a multitude of applications. That is why our development department is spending a significant amount of resources on contributing to the development of the Nagios project.

Here are some of the actions taken to remove complexities from the Nagios core:

Removal of multithreading

We have been working continuously to remove the multithreading code in the Nagios core in favor of a general-purpose I/O broker, and have done away with the cumbersome and disk I/O-intensive check result spoolfiles in favor of worker processes. This provides several benefits over the previous way of executing checks.

Multithreaded programming is a lot more complex than single-threaded programming, since multiple threads sharing the same resources have to deal with resource contention. Threads have to either wait for each other when they wish to use the same resource (making multithreading a moot point in the first place), create their own instance of the shared object (negating the benefits of resource sharing), or risk crashing when two threads try to use the same resource at the same time. Currently, all eventbroker modules that wish to communicate with external programs and update the status of the running Nagios with data from the external program are forced to handle these complexities. With the next generation Nagios core, several thousand lines of code can be removed from such addons and replaced with a simple, well-defined and well-tested library call provided by Nagios core, reducing complexity by several orders of magnitude.
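To illustrate why a single-threaded I/O broker sidesteps the locking problem described above, here is a minimal Python sketch. It is not Nagios code (Nagios core is written in C), and the names `IOBroker`, `register` and `poll_once` are made up for this illustration. One thread waits on several sockets and runs one handler at a time, so the shared `results` dictionary never needs a lock.

```python
import selectors
import socket


class IOBroker:
    """Single-threaded dispatcher: waits on many sockets, runs one handler at a time."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register(self, sock, handler):
        # The handler is stored as selector "data" and called when sock is readable.
        sock.setblocking(False)
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def poll_once(self, timeout=1.0):
        # Dispatch every socket that is ready right now; return how many were handled.
        events = self._selector.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)

    def close(self):
        self._selector.close()


def demo():
    results = {}  # shared state, no lock needed: only one handler runs at a time

    def make_handler(name):
        def handler(sock):
            results[name] = sock.recv(1024).decode()
        return handler

    broker = IOBroker()
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (reader, _writer) in enumerate(pairs):
        broker.register(reader, make_handler(f"worker-{i}"))

    # Simulate three workers reporting their results.
    for i, (_reader, writer) in enumerate(pairs):
        writer.sendall(f"OK from worker-{i}".encode())

    while len(results) < len(pairs):
        broker.poll_once()

    broker.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return results
```

Calling `demo()` returns one entry per simulated worker. The point of the design is that concurrency comes from waiting on many file descriptors at once, not from running several threads against the same data.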

Complexity separation

Since workers run in their own process space, bugs in the workers do not affect the stability of the core scheduler. This is a good thing, since it means one can experiment more freely with the worker code, and even assign external programs to work as prototype workers. The I/O broker also makes it a lot easier to move several previously hard-to-do tasks outside the core and into a separate process, leading to even further complexity reduction in the core and even better complexity separation.
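The isolation argument can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper `run_in_worker` is hypothetical, and a fresh Python interpreter stands in for a Nagios worker process. A bug that kills the worker shows up in the parent as a non-zero exit code, and the parent carries on.

```python
import subprocess
import sys


def run_in_worker(code):
    """Run a snippet in a separate Python process and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return proc.returncode, proc.stdout.strip()


def demo():
    # A buggy "worker" that dies with an unhandled exception.
    crashed = run_in_worker("raise RuntimeError('bug in worker')")
    # A healthy worker started afterwards; the scheduling process is unaffected.
    healthy = run_in_worker("print('check OK')")
    return crashed, healthy
```

Had the same bug been raised inside a thread of the scheduler itself, it would have shared memory with, and possibly corrupted, the scheduler's own state.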

Latency reduction

The scheduling core need only write a job request to one of its workers in order to execute the external script it wishes to run. Since notifications, eventhandlers and a slew of other actions are now executed asynchronously through one of the worker processes, the time it takes to run them doesn’t add to the master process’ latency numbers.
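A rough Python sketch of the idea follows. It assumes nothing about the actual Nagios job protocol, and `dispatch` and `SLOW_SCRIPT` are names invented for the example. The master hands the job to another process and goes straight back to scheduling; the result is collected later.

```python
import subprocess
import sys

# Hypothetical stand-in for a slow notification script: it waits for a line on
# stdin before finishing, so the example does not depend on timing.
SLOW_SCRIPT = "import sys; sys.stdin.readline(); print('notification sent')"


def dispatch(script):
    """Start the job and return a handle immediately, without waiting for it."""
    return subprocess.Popen(
        [sys.executable, "-c", script],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def demo():
    log = []
    job = dispatch(SLOW_SCRIPT)
    # The master keeps scheduling while the job is still running.
    log.append("scheduled next check")
    still_running = job.poll() is None
    # Later, the master lets the job finish and collects its output.
    output, _ = job.communicate("go\n")
    log.append(output.strip())
    return still_running, log
```

The time the script spends running is absent from the master's own loop, which is the effect the paragraph above describes for notifications and eventhandlers.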

Disk I/O usage reduction

Since worker processes communicate by copying pieces of memory from one process to another (through a socket, for those interested in details), we can do away with all the disk I/O generated by writing, scanning for and reading the check result spoolfiles.
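As a minimal sketch of socket-based communication between a master and a worker, assuming a POSIX system (it uses `os.fork`) and a made-up line-based JSON message format that is not the real Nagios wire protocol:

```python
import json
import os
import socket


def worker_loop(sock):
    """Child side: read one JSON job per line, reply with one JSON result per line."""
    with sock.makefile("rw") as stream:
        for line in stream:
            job = json.loads(line)
            # A real worker would execute a check plugin here; this one returns
            # a canned result so the example stays self-contained.
            result = {"id": job["id"], "status": "OK", "host": job["host"]}
            stream.write(json.dumps(result) + "\n")
            stream.flush()


def run_jobs(jobs):
    """Parent side: fork one worker, send jobs over a socket pair, collect results."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child process
        parent_sock.close()
        try:
            worker_loop(child_sock)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    child_sock.close()
    results = []
    with parent_sock.makefile("rw") as stream:
        for job in jobs:
            stream.write(json.dumps(job) + "\n")
            stream.flush()
            results.append(json.loads(stream.readline()))
        parent_sock.shutdown(socket.SHUT_WR)  # tell the worker there are no more jobs
    parent_sock.close()
    os.waitpid(pid, 0)
    return results
```

For example, `run_jobs([{"id": 1, "host": "web01"}])` returns the worker's result for that job. Nothing in the exchange touches the file system, which is the saving compared with writing, scanning for and reading spoolfiles.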

CPU usage reduction

Not having to scan for check result spoolfiles at frequent intervals saves some CPU. We save even more by implementing a cleverer way of executing external scripts, effectively cutting the number of fork() calls in half for every running Nagios installation. Since fork() can be a very expensive call, that is a considerable saving.

Memory usage reduction

Another benefit of fork()’ing less is that less memory is consumed. Since worker processes are extremely lightweight, the amount of memory used to launch each check is minimized, and we thereby provide a small saving in memory usage. However, since the worker processes and the communication between workers and master do incur some memory overhead, the net gain is small.

Code reuse

The worker process code is backed by several elegant, simple and well-tested libraries which can be reused to create other addons that want to communicate with Nagios core one way or another. This is a very good thing, since it means the core of such addons will be well-tested and that they can be written very, very quickly.

The changes will also bring several future benefits. Since workers now have their complexity separated from the main Nagios daemon, it will be possible to implement checks directly in the workers, bypassing external scripts altogether. This would mostly be of benefit for highly popular checks that are run frequently enough to warrant the added complexity of building them directly into the worker. check_nrpe (or a replacement for it) comes to mind, especially since NSClient++ can handle NRPE requests. Other good candidates for building in would be check_snmp and various other SNMP-based checks.

It will also be possible to write a small broker module that lets external programs subscribe to various types of events and have those events streamed directly from Nagios, avoiding unnecessary disk I/O. PNP4Nagios would be one potential user of such a subscriber service, allowing it to avoid the disk I/O-costly spoolfiles it currently uses, and as a nice bonus we would get rid of the delay between an executed check and the updated performance graph.
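The subscriber idea can be sketched as a tiny publish/subscribe hub. This is a hypothetical in-process illustration in Python: `EventStream`, the event type names and the payload fields are all invented here, and a real broker module would stream events to other processes.

```python
from collections import defaultdict


class EventStream:
    """Hypothetical in-memory publish/subscribe hub for monitoring events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        # Register a callback for one type of event, e.g. "perfdata".
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        # Push the event straight to every subscriber; nothing is written to disk.
        callbacks = self._subscribers[event_type]
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


def demo():
    graphed = []
    stream = EventStream()
    # A graphing tool subscribes to performance data only.
    stream.subscribe("perfdata", graphed.append)
    stream.publish("perfdata", {"host": "web01", "service": "load", "value": 0.42})
    stream.publish("notification", {"host": "web01", "state": "DOWN"})  # no subscriber
    return graphed
```

Because the subscriber receives each event as soon as it is published, there is no spoolfile to poll and no delay between the check and the data reaching the consumer.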

Conclusion

This work will be merged continuously into Nagios core as well as op5 Monitor, and will result in performance improvements and complexity reduction. Both the Nagios community and op5 Monitor users will benefit greatly from these changes, particularly in the long run, when old addon projects start catching on and new ones are created.

This document is intended to give technical staff and op5 customers an understanding of some of the ongoing development projects that will enhance and secure our op5 Monitor solution for the future. It should be viewed as a complement to our development roadmap.

This development is in some ways thankless, as there will be no visible new features presented in a nice user interface. The work does, however, lay the foundation for future feature enhancements and enables the realization of cool new ideas, so we thought it would be a good idea to share this information with you.

This is work behind the scenes: we see a need to change fundamental functionality in the Nagios project, the foundation of op5 Monitor.

Conclusion
This work will be included continuously during 2012 in the Nagios core project as well as in op5 Monitor.

Posted in Latest from op5, The Network Monitoring Blog

Scalability – when is IT considered BIG?

Scalability is a commonly used “feature” in all fields of IT, and there is no question that it will be a real challenge for many IT managers in the near future. IP is a shared, best-effort technology by default. Like any road – if you double the traffic, it will get jammed!

Posted in All Posts, ITOM and more

SOPA means garbage in Swedish…

Sopa is the Swedish word for garbage, or for a very underperforming person. I think that summarizes my thoughts on the SOPA discussion.

Here is a great article that is well worth reading:

http://mashable.com/2012/01/17/sopa-dangerous-opinion/

/ Jan

Posted in All Posts

The battle of Stability vs. Agility

Can Stability and Agility go hand in hand?

Business wants agile IT, fast and flexible. IT operations is all about maintaining stability. Can the two really meet?

An increasing number of IT organisations face the challenge of having to accept demands for more flexibility, and thus of using SaaS services from the public cloud, internal or external outsourcing, etc.

Posted in ITOM and more

How to qualify a vendor by looking at their manual

At a time when all vendors compete in being “extremely simple and very cheap”, it can be a challenge to make a quick comparison. I have a “simple, cheap and quick” tip to help make an initial judgement on the overall quality of a product, the company behind it and all the promises that are given…

Check out the manual!

We have all done it – bought a cheap remote control or downloaded an app, only to find that the setup instructions are a piece of really thin paper in Chinese, or a web page that has obviously been auto-translated. It’s extremely annoying, takes up our time and is just plain bad.

The same goes for software products. A bad manual tells you a few things:

  • The vendor really does not care about how, or even if, you use the product.
  • The vendor has little or no experience of its own in using the product for the purpose it is sold for.
  • The vendor might be a great code cruncher – but that is not what you bought. You bought a product that should, in most cases, save you time and/or money.

A good manual on the other hand should:

  • Save time in finding operational and functional answers about the product.
  • Make the product easy to use for more people in the company, thereby reducing training costs.
  • Reduce risk: if the application can be used by more people in the company, it reduces the risk of creating “a single super user that needs to be in on everything relating to the product” – what happens when he/she gets sick or leaves the company?

A good manual tells you that the vendor cares about its product and how it is being used, and tries to maximise its usage at the customer. In other words, the vendor cares about you.

And of course Google, forums, blogs etc. are great complements to a good manual, but they cannot be the starting point, as it takes way too long and gives way too many options to get the basic knowledge.

Needless to say… we do spend a good amount of time on our manuals, as we do think the above is true. Take a look at our manuals.

Cheers / Jan

Posted in All Posts, ITOM and more, The Network Monitoring Blog

Where does a monitoring system potentially save you money?

The saving can be identified in many places depending on your organisation and your needs. Here are some of the savings users of op5 Monitor have reported:

  • The largest savings can be found in the incidents that are avoided. These savings are hard to measure and are easier to estimate.
  • The most direct savings are time savings, such as:
    • Faster mean time to repair (MTTR) when an incident occurs
    • An easier-to-use system saves time for system administrators, who can use their time and skills more efficiently
    • Decreased need for maintenance of the monitoring solution
  • More efficient resource and investment planning, since statistics and facts are available when making decisions.
  • The possibility to monitor SLAs makes follow-up easier, ensuring that third-party services perform as expected and that compensation is paid when they underperform.
  • Decreased need for external consultants. If the system is easy to use and intuitive, the reduced need for product specialists can save large amounts of money.
  • Technical flexibility in the monitoring solution can decrease the need for different device managers and a flora of other specialised or limited software products, each of which costs money, time and resources.
  • Lower license costs.
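To make the MTTR and SLA points above concrete, here is a small worked example in Python. The incident log, the 30-day period and the function names are hypothetical and only illustrate the arithmetic; they are not figures reported by op5 users.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected, resolved) timestamps.
INCIDENTS = [
    (datetime(2012, 3, 2, 9, 0), datetime(2012, 3, 2, 9, 45)),      # 45 minutes
    (datetime(2012, 3, 14, 22, 10), datetime(2012, 3, 14, 23, 40)),  # 90 minutes
    (datetime(2012, 3, 27, 13, 30), datetime(2012, 3, 27, 13, 45)),  # 15 minutes
]


def total_downtime(incidents):
    """Sum of all outage durations."""
    return sum((end - start for start, end in incidents), timedelta())


def mean_time_to_repair(incidents):
    """Average outage duration."""
    return total_downtime(incidents) / len(incidents)


def availability(incidents, period):
    """Fraction of the period during which the service was up."""
    return 1 - total_downtime(incidents) / period
```

With this log, `mean_time_to_repair(INCIDENTS)` is 50 minutes and `availability(INCIDENTS, timedelta(days=30))` is roughly 99.65 %, which would miss a 99.9 % SLA target. Numbers like these are what make the follow-up with a third-party provider straightforward.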

Posted in All Posts, Network Monitoring Monologue, The Network Monitoring Blog

Maintenance, Improvements and Upgrades

We’d like to inform all our customers, partners and community members that we have a planned maintenance window on Thursday, 22 September, between 14:00 and 16:00 UTC (16:00 – 18:00 CEST), during which our website will be unavailable.

It’s been a while since IT made an appearance on the blog. Trust me, we’ve been keeping busy working on the internal infrastructure and the backend for our new web and customer systems.

As always, not every release has been perfect, but we’re making improvements week by week to ensure that things get more stable and functional for both our internal and external customers.

As a part of this, we initiated a project this spring, in the spirit of open source, to migrate from VMware to KVM as our virtualization solution. Over the last few months we’ve tested the new systems extensively and are now ready to migrate our business- and mission-critical services.

Since the website is our face towards the world and we’ve had issues coping with the load, migrating this service is our top priority, to further reduce load and availability issues.

We’re hoping to release a whitepaper later this year further describing the changes we’ve made, so stay tuned.

Regards, op5 IT

Posted in All Posts, From the server-room