The relevance of this article lies in addressing the problem of identifying rare events when training sets are imbalanced. The purpose of the study is to analyze the capabilities of an ensemble of classifiers trained on different imbalanced data subsets. The article considers the specifics of analyzing the state of heterogeneous segments of the Internet of Things (IoT) network infrastructure using machine learning methods, and identifies the prerequisites for the emergence of imbalanced data when training samples are formed. A solution is proposed based on an ensemble of classifiers, each trained on a different training sample with an imbalance in the classified events. The article analyzes the possibility of using imbalanced training sets for such an ensemble, in which errors are averaged out through a collective voting procedure. An experiment was carried out using weak classification algorithms. The distributions of feature values in the test and training subsets were estimated, and classification results were obtained both for the ensemble and for each classifier separately. The imbalance investigated consists in a violation of the ratios between the numbers of events of a certain type within one class in the training data subsets. The absence of such data in a training sample increases the scatter of the classifiers' responses; this effect is averaged out by increasing the complexity of the model, that is, by including various classification algorithms in the ensemble. The proposed approach can be applied in information security monitoring systems. A distinctive feature of the proposed solution is that it can be scaled and extended by adding new classification algorithms. In the future, the composition of the classification algorithms can be changed during operation, which makes it possible to improve the accuracy of identifying potential destructive effects.
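The collective voting idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: decision stumps stand in for the unspecified weak classifiers, the two-feature data is synthetic, and the helper names (`train_stump`, `imbalanced_subset`, `majority_vote`) and subset sizes are hypothetical. Each ensemble member is trained on a subset with a different ratio of rare to normal events, and the ensemble label is the one predicted by most members.

```python
import random
from collections import Counter


def train_stump(samples):
    """Weak learner (hypothetical stand-in): a decision stump on one feature.

    samples is a list of (features_tuple, label) pairs, label 1 = rare event.
    Tries every (feature, threshold, polarity) and keeps the one with the
    fewest training errors.
    """
    n_features = len(samples[0][0])
    best = None
    for f in range(n_features):
        for t in sorted({x[f] for x, _ in samples}):
            for polarity in (1, -1):
                errors = sum(
                    1 for x, y in samples
                    if (1 if polarity * (x[f] - t) >= 0 else 0) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return lambda x: 1 if polarity * (x[f] - t) >= 0 else 0


def imbalanced_subset(normal, rare, n_normal, n_rare, rng):
    """Draw a training subset with a chosen (possibly skewed) class ratio."""
    return rng.sample(normal, n_normal) + rng.sample(rare, n_rare)


def majority_vote(classifiers, x):
    """Collective decision: the label predicted by most ensemble members."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data (assumption): rare events are shifted away from normal ones.
    normal = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(500)]
    rare = [((rng.gauss(2.5, 1), rng.gauss(2.5, 1)), 1) for _ in range(60)]
    normal_train, normal_test = normal[:400], normal[400:]
    rare_train, rare_test = rare[:45], rare[45:]

    # Each member sees a different (normal, rare) ratio; values are illustrative.
    ratios = [(200, 5), (200, 15), (100, 30), (150, 10), (50, 40)]
    ensemble = [
        train_stump(imbalanced_subset(normal_train, rare_train, n0, n1, rng))
        for n0, n1 in ratios
    ]

    test = normal_test + rare_test
    for i, clf in enumerate(ensemble):
        acc = sum(clf(x) == y for x, y in test) / len(test)
        print(f"member {i}: accuracy {acc:.3f}")
    acc = sum(majority_vote(ensemble, x) == y for x, y in test) / len(test)
    print(f"ensemble: accuracy {acc:.3f}")
```

An odd number of members is used so that a binary vote cannot tie. Whether the ensemble outperforms its members depends on the data; the sketch only shows the mechanics of training on differently imbalanced subsets and voting.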
Keywords
classification, anomaly detection, parasitic traffic, information security