
Authors

Rogov O.

Degree: Junior Research Fellow, MIPT Research Centre
E-mail: olegrgv@yandex.ru
Location: Moscow
Articles

Models of corporate bankruptcy prediction with the ensemble of classifiers algorithm

Classification is a group of machine learning problems in which each instance is assigned a category or label. An individual classifier, such as a neural network or a decision tree, is conventionally trained on a pre-labelled or pre-processed data set. Depending on the distributions of the parameters, such a classifier may fail to learn all the indicators efficiently, which results in inconsistent performance on the test sets. An ensemble classifier is a set of individual classification algorithms that are trained simultaneously on the same classification problem. The aim of the paper is twofold. First, we present an ensemble of classifiers approach with high predictive power for bankruptcy prediction of Russian trade-related companies. At the first stage we split the data into a train set (70%) and a test set (30%). At the second stage we measure the precision of standard algorithms applied to the empirical indicators of the data. The algorithms are trained and tested, and then compared via the performance metrics. The standard algorithms include random forest, decision trees and their modifications: chi-square automatic interaction detection (CHAID), classification and regression trees (CRT, C5), Quick, Unbiased, Efficient, Statistical Tree (QUEST), linear discriminant analysis (LDA), support vector algorithms (LSVM, SVM), and neural networks (multilayer and radial). Based on the ROC-curve metrics and the predictive ability of the algorithms, we select the most efficient methods to form the ensemble of classifiers. The empirical data set includes 713 trade companies, 334 of which are known bankrupts. The results demonstrate the efficiency of the ensemble of classifiers based on simple voting: its precision exceeds that of the individual algorithms, e.g. random forest, SVM, and Logit.
We also show that including macroeconomic factors improves the predictive power of almost all studied algorithms by at least 8%. Moreover, more sophisticated classifiers, such as multilayer neural networks and random forests, achieve higher precision and recall when the external variables are used in the training process.
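The 70/30 split and the simple-voting ensemble described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_test_split`, `majority_vote`, `precision`), the tie-breaking rule, and the toy label vectors are all assumptions made for illustration, and the per-classifier predictions are hard-coded rather than produced by trained models.

```python
import random
from collections import Counter


def train_test_split(rows, test_share=0.3, seed=0):
    """Shuffle rows and split them into (train, test) parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per instance.

    predictions: one list of labels per classifier, all of equal length.
    Ties go to the label that appears first among the votes (an assumption).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not hits:
        return 0.0
    return sum(1 for t in hits if t == positive) / len(hits)


# Toy labels: 1 = bankrupt, 0 = non-bankrupt (illustrative values only).
y_true = [1, 0, 1, 1, 0]
clf_a = [1, 0, 0, 1, 1]  # hypothetical output of classifier A
clf_b = [1, 1, 1, 1, 0]  # hypothetical output of classifier B
clf_c = [0, 0, 1, 1, 0]  # hypothetical output of classifier C

ensemble = majority_vote([clf_a, clf_b, clf_c])

# Split sizes for a data set of 713 companies, as in the abstract.
train, test = train_test_split(range(713))
```

With an odd number of classifiers and binary labels, ties cannot occur, which is one reason simple voting is often run on three or five base models.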

Applications of the sentiment polarity dictionaries for the textual analysis

We apply a contemporary set of sentiment analysis methods to a dataset of Russia-related news texts and compare a variety of sentiment dictionaries on these texts. In this paper we evaluate the applicability of the AFINN, NRC, and Loughran and McDonald dictionaries for determining the impact of sentiment polarity on the stock and foreign exchange markets. The dictionaries are selected for their established use in textual analysis and for the number of sentiment polarity classes they cover. The empirical basis of the research is 2.5 million Russia-related news texts, acquired via Thomson Reuters authorized sources for the period from January 2012 to June 2018. Using the textual analysis method known as "bag-of-words", we evaluate the polarity of each news text with each of the selected dictionaries. We then determine the correlation between the oscillating polarities and the major stock market indicators. We show that Russia-related news sentiment has a substantial impact on the stock markets. In addition, negative news polarity predominantly affects the markets, making it a major media factor for market players. The NRC Emotion Lexicon fits the sentiment analysis of Russia-related news best.
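Dictionary-based polarity scoring over a bag-of-words, followed by a correlation with a market series, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `TOY_LEXICON` is a hypothetical word list with made-up integer scores in the style of AFINN (it is not taken from any of the named dictionaries), and the helper names `bag_of_words`, `polarity`, and `pearson` are introduced here for the example.

```python
import math
import re
from collections import Counter

# Hypothetical AFINN-style lexicon: word -> integer valence (made-up values).
TOY_LEXICON = {
    "gain": 2, "growth": 2, "stable": 1,
    "loss": -2, "sanctions": -2, "crisis": -3,
}


def bag_of_words(text):
    """Lower-case the text and count word occurrences, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def polarity(text, lexicon=TOY_LEXICON):
    """Sum lexicon scores over all word occurrences; unknown words score 0."""
    bag = bag_of_words(text)
    return sum(lexicon.get(word, 0) * count for word, count in bag.items())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


negative = polarity("Sanctions deepen the crisis; ruble loss widens")
positive = polarity("Stable growth and growth in exports")
```

In practice the per-text scores would be aggregated per day (for example, averaged) before being correlated with a daily index series via `pearson`; that aggregation step is an assumption and is not specified in the abstract.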