Design, Development and Improvement of Nagios System Monitoring for Large Clusters

This document describes the design, development and improvement of the Nagios monitoring system carried out at Cineca and used for the Tier-1 systems participating in the PRACE projects. Starting from the issues raised by the complexity of HPC systems and the related monitoring activities, it explains the targeted solutions and their implementation. The most important aspects of the implementation and the issues specific to HPC are described, with particular attention to exascale clusters.