

Authors: Kataev A., Vikulin M.     Published in No. 2(92), 30 April 2021
Rubric: IT and education

Experience with control and monitoring systems for the computing resources of a corporate HPC system

This article discusses modern server monitoring systems. The subject area is the control and management of high-performance computing (HPC) systems, which are used in various branches of science and industry to model systems and their behavior under different conditions. Simulation speed depends on the technical solutions used in the computing complex: the type of internal network and the number and types of computing nodes. For computing nodes, parameters such as processor architecture and model and the amount of RAM are considered. Features of specific mathematical model implementations that affect calculation speed are outside the scope of this article. The paper analyzes existing market solutions and the main concepts behind management and monitoring systems for such complexes. The systems under consideration are evaluated from both economic and technical points of view. For the available systems, a full-scale study of their cluster management and status monitoring capabilities is carried out. The set of parameters recorded by the monitoring system is chosen based on the general HPC architecture and the approach to server administration. The practical part describes the experience of designing and implementing a prospective management system; the main focus of that system is management. The justification for a separate software product is given in the text of the article. Implementation details in specific program code and system environments are omitted, as they depend on the particular deployment. Building a custom monitoring system is considered of minor importance, given that existing solutions are available.

Key words

monitoring, HPCS, cluster management, HPC, SLURM

The author:

Kataev A.

Position:

Student, Department 316 "System modeling and computer-aided design", Institute 3 "Control systems, computer science and electric power engineering", Moscow Aviation Institute (National Research University)

Location:

Moscow, Russia

The author:

Vikulin M.

Position:

Lecturer, Department 316 "System modeling and computer-aided design", Institute 3 "Control systems, computer science and electric power engineering", Moscow Aviation Institute (National Research University)

Location:

Moscow, Russia