
Authors: Yasnitsky L., Plotnikova E.
Published in No. 5(113), 30 October 2024
Rubric: Algorithmic efficiency

A neural network algorithm for identifying and removing outliers in noisy data sets

Outliers in statistical data, which result from erroneously collected information, often hinder the successful application of machine learning methods in many subject areas. Outliers in training data sets reduce the accuracy of machine learning models and, in some cases, make these methods inapplicable. Existing outlier detection methods are unreliable: they are fundamentally unable to detect some types of outliers, and they often classify observations that are not outliers as outliers. Recently developed neural network methods for outlier detection are free from this drawback, but they are not universal, since the ability of a neural network to detect outliers depends both on its architecture and on the problem being solved. The purpose of this study is to develop an algorithm for creating and using neural networks that correctly detect outliers regardless of the problem being solved. The algorithm relies on the property of some specially constructed neural networks to show their largest training errors on the observations that are outliers. This property, together with a series of computational experiments whose results were generalized in a mathematical formula (a modification of a corollary of the Arnold – Kolmogorov – Hecht-Nielsen theorem), made it possible to achieve this goal.
The developed algorithm proved especially effective in forecasting and controlling interdependent thermophysical and chemical-energy-technological processes of ore processing at operating industrial metallurgical enterprises. There, outliers in statistical data are almost inevitable, and without identifying and excluding them it is generally impossible to build neural network models of acceptable accuracy.
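
The core idea of the abstract, that a suitably small network shows its largest training errors on outliers, can be illustrated with a minimal sketch. This is not the authors' algorithm: the hidden-layer size, learning rate, epoch count, synthetic data, and the function names `fit_small_mlp` and `largest_training_errors` are all assumptions made for illustration, and the paper's formula for choosing the network size is not reproduced here.

```python
import numpy as np


def fit_small_mlp(X, y, hidden=3, epochs=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent on MSE.

    The hidden size is an illustrative guess, deliberately small so the
    network cannot memorize isolated points.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # hidden activations, shape (n, hidden)
        pred = H @ W2 + b2                   # network output, shape (n,)
        g = 2.0 * (pred - y) / n             # gradient of MSE w.r.t. the output
        gH = np.outer(g, W2) * (1.0 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2


def largest_training_errors(X, y, k, **fit_kwargs):
    """Return the indices of the k observations with the largest training error."""
    model = fit_small_mlp(X, y, **fit_kwargs)
    errors = np.abs(model(X) - y)
    return np.sort(np.argsort(errors)[-k:]), errors


# Synthetic example (assumed data): a smooth curve with two injected outliers.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.05, size=x.size)
injected = np.array([7, 29])
y[injected] += 4.0

flagged, errors = largest_training_errors(x[:, None], y, k=2)
print("injected:", injected, "flagged:", flagged)
```

In this sketch the number of observations to flag, `k`, is supplied by hand. A practical procedure would need a rule for deciding how many observations to remove and when to stop, which the abstract does not specify.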

Keywords

outlier in data, Arnold – Kolmogorov – Hecht-Nielsen theorem, thermophysical and chemical-energy-technological processes, neural network, training error

The author:

Yasnitsky L.

Degree:

Dr. Sci. (Eng.), Professor, Professor of Applied Mathematics and Informatics Department, Perm State National Research University; Professor of Information Technology in Business Department, National Research University Higher School of Economics in Perm

Location:

Perm, Russia

The author:

Plotnikova E.

Degree:

Dr. Sci. (Ped.), Professor, Head of Information Technologies in Business Department, National Research University Higher School of Economics (HSE University)

Location:

Perm, Russia