| Degree | student, Faculty of Mechanics and Mathematics, Lomonosov Moscow State University |
|---|---|
| E-mail | am_laz1@mail.ru |
| Location | Moscow, Russia |
| Articles | Development of a tonal-thematic dictionary EcSentiThemeLex for the analysis of economic texts in Russian. The main goal of the research is to develop a publicly available tonal-thematic dictionary for Russian that identifies the semantic orientation of groups of economic texts and determines their sentiment (tonal) characteristics. The article describes the main stages of compiling the dictionary: machine learning methods (clustering, word frequency analysis, correlogram construction), expert evaluation of tonality, and expansion of the dictionary with terms from similar foreign dictionaries. The empirical base of the research comprised annual reports of companies, news from ministries and the Central Bank of the Russian Federation, financial tweets of companies, and RBC news articles in the area of "Economics, Finance, money and business". The compiled dictionary differs from earlier ones in three ways: (1) it is one of the first dictionaries that can rate the tone of economic and financial texts in Russian on 5 degrees of tonality; (2) it rates the tonality and content of a text across 12 economic topics (e.g., macroeconomics, monetary policy, stock and commodity markets); (3) its final version is included in the software package (library) rulexicon for the R and Python programming environments. Step-by-step examples of using the library in R are given; the library evaluates the tone and thematic focus of an economic or financial text with concise code. The structure of the library makes it possible to assess original texts without prior lemmatization (reduction to base forms). The EcSentiThemeLex dictionary, compiled in this work with all word forms, will simplify applied text analysis problems in the financial and economic sphere and can potentially serve as a basis for more studies of this kind in the Russian-language literature. |
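
The abstract describes scoring a text directly against a dictionary that stores every word form, so no lemmatization step is needed. The following is a minimal Python sketch of that idea only. It does not use the rulexicon API or real EcSentiThemeLex entries: the lexicon contents, the tone scale of -2 to 2 standing in for the 5 degrees of tonality, the averaging rule, and the function name `score_text` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, NOT real EcSentiThemeLex entries.
# Each inflected form is its own key, so lookup needs no lemmatization.
# Tone values use an assumed 5-level scale: -2, -1, 0, 1, 2.
TOY_LEXICON = {
    "рост": (1, "macroeconomics"),
    "роста": (1, "macroeconomics"),
    "кризис": (-2, "macroeconomics"),
    "кризиса": (-2, "macroeconomics"),
    "ставка": (0, "monetary policy"),
    "ставки": (0, "monetary policy"),
    "убыток": (-1, "stock and commodity markets"),
}


def score_text(text, lexicon=TOY_LEXICON):
    """Return the mean tone and per-topic hit counts for one text."""
    # Lowercase and split into word tokens; \w matches Cyrillic in Python 3.
    tokens = re.findall(r"\w+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return {"tone": 0.0, "topics": {}, "matched": 0}
    # Assumed aggregation: average tone over matched tokens only.
    tone = sum(score for score, _ in hits) / len(hits)
    topics = Counter(topic for _, topic in hits)
    return {"tone": tone, "topics": dict(topics), "matched": len(hits)}


result = score_text("Повышение ставки замедлило темпы роста после кризиса")
print(result)
```

Here the inflected forms "ставки", "роста" and "кризиса" are matched as written, which is the practical benefit of a dictionary that lists all word forms. The actual library may weight, normalize, or aggregate scores differently.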