
Authors

Kozlov P.

Degree
PhD in Engineering, Assistant, The Branch of National Research University «MPEI» in Smolensk
E-mail
originaldod@gmail.com
Location
Smolensk
Articles

Formation of the structure of the intellectual system of analyzing and rubricating unstructured text information in different situations

Analyzing electronic text documents written in natural language is one of the most important tasks in systems for the automated analysis of linguistic information. The hardest problem today is the analysis of unstructured text documents that reach organizations and authorities through electronic communication channels. The growing volume of such documents makes it necessary to rubricate incoming messages, i.e. to solve a classification task. An analysis of the research in this field has shown that a single unified model cannot rubricate unstructured electronic text documents in every situation. The main reasons are the lack of statistical data, the changing thesaurus and the small size of incoming documents. To solve this problem, we propose a multimodel approach to rubrication that combines intelligent and probabilistic-statistical methods of text document analysis. A specific model is chosen by fuzzy logic algorithms based on the proposed characteristics (document size, the degree of rubric thesaurus intersection, the frequency of meaningful keywords, etc.). Implementing the proposed multimodel approach will improve the accuracy of assigning unstructured electronic text documents to specific rubrics, taking into account the specifics of the documents and the various practical objectives of the organization.
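
The model-selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the membership thresholds, the three model names and the rules that map document characteristics to a model are all assumptions made for illustration. The sketch uses `min` as the fuzzy AND and picks the model whose rule fires most strongly.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Complement of ramp_down: 0 below lo, 1 above hi."""
    return 1.0 - ramp_down(x, lo, hi)


def choose_model(size_words, intersection, keyword_freq):
    """Pick a rubrication model from fuzzy document characteristics.

    All thresholds and rules below are hypothetical placeholders.
    """
    small = ramp_down(size_words, 50, 200)          # document is small
    large = ramp_up(size_words, 50, 200)            # document is large
    overlapping = ramp_up(intersection, 0.2, 0.6)   # rubric thesauri overlap
    distinct = ramp_down(intersection, 0.2, 0.6)    # rubric thesauri are separate
    frequent = ramp_up(keyword_freq, 0.05, 0.2)     # keywords occur often

    # One rule per model; min() plays the role of fuzzy AND.
    scores = {
        "fuzzy_decision_tree": min(small, overlapping),
        "fuzzy_significance_scales": min(small, distinct),
        "probabilistic_statistical": min(large, frequent),
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(choose_model(size_words=30, intersection=0.8, keyword_freq=0.1))
```

In a real system the rule base and membership functions would be tuned on the organization's own documents; the point here is only that the choice of model can be made by comparing rule activations.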

Developing the economic information system for automated analysis of unstructured text documents

The tasks and methods of automated text rubrication were studied, and their prospects for analyzing unstructured electronic text documents were evaluated, taking into account the peculiarities of appeals that citizens send to the authorities. An architecture was developed for an information system that analyzes such documents automatically. It implements the proposed multimodel approach to rubrication, based on the integrated use of intelligent and probabilistic-statistical methods. The article also describes the procedure for processing citizens' appeals received by the authorities using the document management system together with the developed information system.

Using fuzzy decision trees to rubricate unstructured small-sized text documents

Every day, Internet portals and e-mail addresses of public authorities receive a large number of appeals (statements, proposals or complaints) submitted as unstructured text. The quality and speed of automatic processing of such electronic messages depend directly on how correctly they are classified (rubricated). Rubrication consists in assigning a received message to one or several thematic rubrics that correspond to the areas of responsibility of the departments. The choice of a mathematical approach to analysis and rubrication depends directly on the characteristics of the incoming appeals. An analysis of their specifics (small size, the presence of errors, free-form problem statements, etc.) has shown that classical approaches to text document classification cannot be used. The article suggests using fuzzy decision trees to rubricate small unstructured text documents arriving at Internet portals and e-mail addresses of public authorities. This allows classification when rubrics intersect and when there is too little statistical information to apply probabilistic and neural network methods. The proposed rubrication model is distinguished by its use of a binary fuzzy decision tree that takes into account the syntactic relationships and roles of words in sentences. The tree is constructed from the results of analyzing the degree of rubric thesaurus intersection and the distances between rubrics in the n-dimensional feature space.
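
The two quantities the abstract names as inputs to tree construction, the degree of rubric thesaurus intersection and the distance between rubrics in feature space, can be sketched as follows. The abstract does not define either measure, so this sketch assumes Jaccard overlap for the intersection degree and Euclidean distance between hypothetical rubric centroids. The `binary_split` function is likewise an assumed, simplified way to form one binary node: it seeds the two branches with the most distant pair of rubrics.

```python
import math
from itertools import combinations


def intersection_degree(thesaurus_a, thesaurus_b):
    """Jaccard overlap of two rubric thesauri (assumed measure)."""
    a, b = set(thesaurus_a), set(thesaurus_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distance(u, v):
    """Euclidean distance between two rubric centroids."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def binary_split(centroids):
    """Split rubrics into two groups around the two most distant rubrics.

    centroids maps a rubric name to its (hypothetical) feature vector.
    """
    names = sorted(centroids)
    if len(names) < 2:
        raise ValueError("need at least two rubrics to split")
    seed_a, seed_b = max(
        combinations(names, 2),
        key=lambda pair: distance(centroids[pair[0]], centroids[pair[1]]),
    )
    left, right = [], []
    for name in names:
        d_a = distance(centroids[name], centroids[seed_a])
        d_b = distance(centroids[name], centroids[seed_b])
        (left if d_a <= d_b else right).append(name)
    return left, right


if __name__ == "__main__":
    print(intersection_degree({"road", "repair", "asphalt"},
                              {"house", "repair", "roof"}))
    print(binary_split({"roads": (1.0, 0.0),
                        "housing": (0.0, 1.0),
                        "utilities": (0.1, 0.9)}))
```

Applying `binary_split` recursively to each group would yield a binary tree over the rubrics; how the article combines intersection degree and distance when building its tree is not stated in the abstract.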

Analysis of short unstructured documents using fuzzy significance scales and special procedures for economic information integration

The article proposes a new approach to the automatic analysis of short messages arriving at Internet portals and e-mail addresses of public authorities. The developed model makes it possible to classify short unstructured text documents when statistical information is lacking and thematic rubrics intersect only slightly. The input to the model-construction algorithm is the set of rubrics and the training sample. Its output is a set of fuzzy significance scales for the words in the rubric thesauri, which support a correct representation of document characteristics and the operation of the classification (rubrication) algorithm.
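
As a rough illustration of the input and output described here (rubrics plus a training sample in, per-rubric fuzzy word scales out), consider the sketch below. The way significance is computed is an assumption: a word's membership in a rubric is the share of that rubric's training documents containing it, normalized by the most frequent word. The article's actual scale construction is not given in the abstract, and the sample data are invented.

```python
from collections import Counter, defaultdict


def build_scales(training):
    """Build {rubric: {word: membership in [0, 1]}} from (rubric, text) pairs."""
    doc_counts = defaultdict(Counter)
    for rubric, text in training:
        # Count each word once per document.
        doc_counts[rubric].update(set(text.lower().split()))
    scales = {}
    for rubric, counts in doc_counts.items():
        top = max(counts.values())
        scales[rubric] = {word: n / top for word, n in counts.items()}
    return scales


def rubricate(text, scales):
    """Score a document against every rubric and return the best match."""
    words = set(text.lower().split())
    scores = {
        rubric: sum(scale.get(w, 0.0) for w in words) / max(len(words), 1)
        for rubric, scale in scales.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical training sample.
    sample = [
        ("roads", "pothole on the road"),
        ("roads", "road repair needed"),
        ("housing", "roof leak in house"),
        ("housing", "house heating broken"),
    ]
    scales = build_scales(sample)
    print(rubricate("the road has a pothole", scales))
```

The sketch does no stop-word filtering or morphological normalization, both of which a practical system for short, error-prone messages would likely need.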

Rubrication of text documents based on fuzzy difference relations

One of the key areas in the informatization of public authorities is the development and deployment of systems that automatically process electronic appeals (applications, complaints, suggestions) from individuals and legal entities arriving at official government websites and portals. Rubrication plays an important role in solving this problem. It consists in distributing appeals among thematic rubrics that correspond to the areas of activity of the departments that process them and prepare the response. An analysis of the specific features of such text messages (small size, lack of markup, the presence of errors, an unstable thesaurus, etc.) confirmed that traditional approaches to rubrication cannot be used and justified the use of data mining methods. The article proposes a new approach to the analysis and rubrication of unstructured electronic text documents arriving at official websites and portals of public authorities. It involves forming a tree-like structure of the rubric field based on fuzzy difference relations between the syntactic characteristics of documents. The analysis determines the fuzzy correspondence between a document's syntactic characteristics and the values of the cluster centers. It is carried out sequentially from the root to the leaves of the constructed fuzzy decision tree. The proposed rubrication method has been implemented in software and tested in the automated processing and analysis of citizens' appeals (applications, complaints and suggestions) submitted to the Administration of Smolensk Region. This made it possible to update rubrics and analyze documents promptly and with high quality when the composition of the thesaurus and the importance of rubric words change over time.
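
The root-to-leaf analysis described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the published method: documents are represented by a hypothetical numeric feature vector, each tree node stores an assumed cluster center, memberships among sibling nodes are computed from inverse squared distance (in the style of fuzzy c-means), and the degree of the final rubric is the minimum membership along the path.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    center: tuple                      # cluster center in feature space
    children: list = field(default_factory=list)


def memberships(x, centers):
    """Fuzzy membership of x in each sibling cluster (values sum to 1)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    if 0.0 in d2:
        # x coincides with at least one center: share full membership among hits.
        hits = d2.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in d2]
    inv = [1.0 / d for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def rubricate(x, root):
    """Descend from the root to a leaf, following the strongest membership."""
    node, degree, path = root, 1.0, [root.name]
    while node.children:
        mu = memberships(x, [child.center for child in node.children])
        best = max(range(len(mu)), key=mu.__getitem__)
        node = node.children[best]
        degree = min(degree, mu[best])   # fuzzy AND along the path
        path.append(node.name)
    return node.name, degree, path


if __name__ == "__main__":
    # Hypothetical rubric tree and feature vector.
    tree = Node("all", (0.5, 0.5), [
        Node("infrastructure", (0.8, 0.2), [
            Node("roads", (0.9, 0.1)),
            Node("utilities", (0.7, 0.3)),
        ]),
        Node("social", (0.2, 0.8)),
    ])
    print(rubricate((0.9, 0.1), tree))
```

Because each node only stores a center, updating a rubric in this sketch amounts to recomputing the centers on the affected branch, which is one plausible reading of how a tree structure helps when the thesaurus changes.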
