Degree | Dr of Economics, Professor, Department of Corporate Finance and Corporate Governance, Financial University; Department of Finance, Higher School of Economics |
---|---|
E-mail | ecolena@mail.ru |
Location | Moscow |
Articles |
Models of corporate bankruptcy prediction with the ensemble of classifiers algorithm
Classification problems are a group of machine learning tasks in which each instance is associated
with a certain category or label. An individual classifier, such as a neural network or a decision
tree, is conventionally trained on a pre-labeled or pre-processed data set. Depending on the
distributions of the parameters, a data set may contain indicators that such a classifier does not
learn efficiently, which results in inconsistent performance on the test sets. An ensemble of
classifiers is a set of individual classification algorithms that are trained simultaneously on the
same classification problem. The aim of the paper is twofold. We present an ensemble-of-classifiers
approach with high predictive power for predicting the bankruptcy of Russian trade companies. At the
first stage, we split the data into a training set (70%) and a test set (30%). At the second stage, we
measure the precision of standard algorithms applied to the empirical indicators in the data. The
algorithms are trained and tested and then compared using performance metrics. The standard
algorithms include random forest; decision trees and their modifications: chi-square automatic
interaction detection (CHAID), classification and regression trees (CRT), C5 and Quick, Unbiased,
Efficient Statistical Tree (QUEST); linear discriminant analysis (LDA); support vector machines
(LSVM, SVM); and neural networks (multilayer perceptron and radial basis function). Based on the
ROC-curve metrics and the predictive ability of the algorithms, we select the most efficient methods
to form the ensemble of classifiers. The empirical data set includes 713 trade companies, 334 of which
are known bankrupts. The results demonstrate the efficiency of the ensemble of classifiers based on
simple voting: its precision exceeds that of the individual algorithms, e.g. random forest, SVM and
logit. We also show that including macroeconomic factors improves the predictive power of almost all
the studied algorithms by at least 8%. Moreover, more sophisticated classifiers such as multilayer
neural networks and random forests demonstrate higher precision and recall when these external
variables are used in training.
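The pipeline described above (70/30 split, individual classifiers, ROC-based selection, simple voting) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper's classifiers (random forest, CHAID, QUEST, SVM, neural networks and others) are replaced by two hypothetical toy learners (a nearest-centroid scorer and a one-threshold decision stump), "most efficient" is assumed to mean the k models with the highest ROC AUC, and the data are synthetic. For brevity the sketch ranks models on the test set; a separate validation set would avoid optimistic estimates.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[List[float], int]            # (financial indicators, label: 1 = bankrupt, 0 = solvent)
Scorer = Callable[[List[float]], float]  # maps indicators to a bankruptcy score in [0, 1]


def split_70_30(rows: Sequence[Row], seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle the rows and return a 70% training set and a 30% test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_nearest_centroid(train: Sequence[Row]) -> Scorer:
    """Toy learner (hypothetical): the closer x is to the bankrupt-class mean, the higher the score."""
    def mean(vectors: List[List[float]]) -> List[float]:
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos = mean([x for x, y in train if y == 1])
    neg = mean([x for x, y in train if y == 0])

    def score(x: List[float]) -> float:
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        return d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5

    return score


def fit_stump(train: Sequence[Row], feature: int) -> Scorer:
    """Toy learner (hypothetical): one threshold on one indicator; score = bankrupt share on that side."""
    best = None
    for t in sorted({x[feature] for x, _ in train}):
        left = [y for x, y in train if x[feature] <= t]
        right = [y for x, y in train if x[feature] > t]
        # Predicting each side's majority class gives this many correct training labels.
        correct = max(sum(left), len(left) - sum(left)) + max(sum(right), len(right) - sum(right))
        if best is None or correct > best[0]:
            best = (correct, t, left, right)
    _, t, left, right = best
    p_left = sum(left) / len(left)
    p_right = sum(right) / len(right) if right else p_left
    return lambda x: p_left if x[feature] <= t else p_right


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC = P(a random bankrupt outscores a random solvent firm), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def build_voting_ensemble(
    models: Dict[str, Scorer], test: Sequence[Row], k: int = 3
) -> Callable[[List[float]], int]:
    """Keep the k models with the highest AUC and combine them by simple majority voting."""
    labels = [y for _, y in test]
    ranked = sorted(models, key=lambda name: roc_auc(labels, [models[name](x) for x, _ in test]),
                    reverse=True)
    chosen = [models[name] for name in ranked[:k]]

    def predict(x: List[float]) -> int:
        votes = sum(1 for model in chosen if model(x) >= 0.5)  # a model votes "bankrupt" at score >= 0.5
        return int(2 * votes > len(chosen))                    # strict majority of the chosen models

    return predict


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data only: 379 solvent + 334 bankrupt firms with three indicators;
    # bankrupts are shifted on the first two, and the third indicator is pure noise.
    rows = [([rng.gauss(-1.0 * y, 1.0), rng.gauss(-0.5 * y, 1.0), rng.gauss(0.0, 1.0)], y)
            for y in [0] * 379 + [1] * 334]
    train, test = split_70_30(rows)
    models = {"centroid": fit_nearest_centroid(train),
              **{f"stump_{j}": fit_stump(train, j) for j in range(3)}}
    ensemble = build_voting_ensemble(models, test, k=3)
    accuracy = sum(ensemble(x) == y for x, y in test) / len(test)
    print(f"ensemble accuracy on the synthetic test set: {accuracy:.3f}")
```

With four candidate models and k=3, the stump on the noise indicator would typically rank last by AUC and be left out of the vote; keeping k odd rules out tied votes.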
Applications of the sentiment polarity dictionaries for the textual analysis
We apply a contemporary set of sentiment analysis methods to a data set of Russia-related news
texts and compare a variety of sentiment dictionaries on these texts. In this paper, we evaluate the
applicability of the AFINN, NRC and Loughran-McDonald dictionaries for determining the impact of
sentiment polarity on the stock and foreign exchange markets. The dictionaries were selected because
of their use in textual analysis and the number of sentiment polarity classes they cover. The
empirical basis of the research is 2.5 million Russia-related news texts obtained from authorized
Thomson Reuters sources for the period from January 2012 to June 2018. Using the bag-of-words method
of textual analysis, we evaluate the polarity of each news text with each of the selected
dictionaries and then determine the correlation between the fluctuating polarities and the major
stock market indicators. We show that the sentiment of Russia-related news has a substantial impact
on the stock markets. In addition, negative news polarity predominantly affects the markets, being a
major media factor for market participants. The NRC Emotion Lexicon fits the sentiment analysis of
Russia-related news best.
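The bag-of-words scoring and the correlation step can be illustrated with a minimal Python sketch. Assumptions: the lexicon is a hypothetical handful of words in the AFINN style (integer valences), a text's polarity is taken as the mean valence of its matched words (one common choice; the paper's exact aggregation is not stated in the abstract), and polarities are averaged per date before being correlated with a market series via Pearson's coefficient. The headlines and the labels "day-1" and "day-2" are placeholders, not data from the study.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical miniature AFINN-style lexicon (word -> integer valence). The real AFINN,
# NRC and Loughran-McDonald dictionaries are much larger and use their own formats.
TOY_LEXICON: Dict[str, int] = {
    "growth": 2, "gain": 2, "recovery": 2, "strong": 1,
    "sanctions": -2, "loss": -2, "decline": -2, "crisis": -3, "weak": -1,
}

TOKEN = re.compile(r"[a-z]+")


def polarity(text: str, lexicon: Dict[str, int] = TOY_LEXICON) -> float:
    """Bag-of-words polarity: mean valence of the lexicon words in the text (0.0 if none match)."""
    hits = [lexicon[token] for token in TOKEN.findall(text.lower()) if token in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def daily_polarity(news: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average the polarity of all (date, text) items that share a date."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for date, text in news:
        buckets[date].append(polarity(text))
    return {date: sum(scores) / len(scores) for date, scores in buckets.items()}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long series, e.g. daily polarity vs index returns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical headlines, for illustration only.
news = [
    ("day-1", "New sanctions trigger a sharp decline"),
    ("day-1", "Analysts expect weak demand"),
    ("day-2", "Exporters report strong growth"),
]
print(daily_polarity(news))  # {'day-1': -1.5, 'day-2': 1.5}
```

To carry out the correlation step, align the daily scores with an index's daily returns over the same dates and pass the two lists to `pearson()`; swapping `TOY_LEXICON` for a full dictionary changes only the lookup table.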
Development of a tonal-thematic dictionary EcSentiThemeLex for the analysis of economic texts in Russian
The main goal of the research is to develop a publicly available tonal-thematic dictionary in Russian that makes it possible to identify the semantic orientation of groups of economic texts and to determine their sentiment (tonal) characteristics. The article describes the main stages of compiling the dictionary: machine learning methods (clustering, word frequency analysis, correlogram construction), expert evaluation to determine tonality, and expansion of the dictionary with terms from similar foreign dictionaries. The empirical base of the research includes companies' annual reports, news from ministries and the Central Bank of the Russian Federation, companies' financial tweets and RBC news articles in the "Economics, Finance, money and business" section. The compiled dictionary differs from previous ones in the following ways:
1. it is one of the first dictionaries that can be used to rate the tone of economic and financial texts in Russian on a five-degree tonality scale;
2. it rates the tonality and content of a text across 12 economic topics (e.g., macroeconomics, monetary policy, stock and commodity markets);
3. the final version of the EcSentiThemeLex dictionary is included in the software package (library) ‘rulexicon’ for the R and Python programming environments.
Step-by-step examples of using the library in the R environment are given; the library makes it possible to evaluate the tone and thematic focus of an economic or financial text with concise code. The structure of the library allows original texts to be assessed without prior lemmatization (reduction of words to their base forms). The resulting EcSentiThemeLex dictionary is included in the rulexicon software package for the R modeling environment. The tonal-thematic dictionary EcSentiThemeLex, with all the word forms compiled in this work, will simplify the solution of applied text analysis problems in the financial and economic sphere and can potentially serve as a basis for more studies of this kind in the Russian-language literature. |
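EcSentiThemeLex stores every word form, so a text can be scored without lemmatization. The minimal Python sketch below illustrates that idea for a tonal-thematic lexicon. It does not use or reproduce the rulexicon package, whose function names are not given in the abstract. The entries, the -2..+2 encoding of the five tonality levels, the topic labels and the averaging rule are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    tone: int   # five tonality levels, encoded here (an assumption) as -2 .. +2
    topic: str  # economic topic label (hypothetical names)


# Hypothetical entries. Every inflected form is listed separately, which is why the
# text can be scored without lemmatization: tokens are looked up exactly as written.
TOY_LEXICON: Dict[str, Entry] = {
    "рост": Entry(1, "macroeconomics"), "роста": Entry(1, "macroeconomics"),
    "кризис": Entry(-2, "macroeconomics"), "кризиса": Entry(-2, "macroeconomics"),
    "инфляция": Entry(-1, "monetary policy"), "инфляции": Entry(-1, "monetary policy"),
    "ставка": Entry(0, "monetary policy"), "ставку": Entry(0, "monetary policy"),
    "обвал": Entry(-2, "stock market"), "дивиденды": Entry(1, "stock market"),
}

LABELS = {-2: "very negative", -1: "negative", 0: "neutral", 1: "positive", 2: "very positive"}
WORD = re.compile(r"\w+")  # \w matches Cyrillic letters in Python 3 str patterns


def score_text(text: str, lexicon: Dict[str, Entry] = TOY_LEXICON) -> Tuple[Optional[float], Counter]:
    """Return the mean tone of matched word forms (None if nothing matched) and topic counts."""
    tones = []
    topics: Counter = Counter()
    for token in WORD.findall(text.lower()):
        entry = lexicon.get(token)
        if entry is not None:
            tones.append(entry.tone)
            topics[entry.topic] += 1
    return (sum(tones) / len(tones) if tones else None), topics


def tone_label(tone: float) -> str:
    """Map a mean tone onto the five-level scale (Python's round() sends halves to the even level)."""
    return LABELS[max(-2, min(2, round(tone)))]


tone, topics = score_text("Банк России повысил ставку из-за роста инфляции")
print(tone_label(tone), topics.most_common(1))  # neutral [('monetary policy', 2)]
```

Here the rate term and the inflation term both point to the monetary-policy topic, while the positive "growth" form and the negative "inflation" form cancel out in the tone.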