| Degree | PhD in Engineering, Assistant, The Branch of National Research University «MPEI» in Smolensk |
|---|---|
| E-mail | originaldod@gmail.com |
| Location | Smolensk |

Articles
Formation of the structure of the intellectual system of analyzing and rubricating unstructured text information in different situations
The analysis of electronic text documents written in natural language is one of the most important
tasks performed by systems for the automated analysis of linguistic information. Today the most complicated
problem is analyzing unstructured text documents that reach various organizations and authorities
through electronic communication channels. The growing volume of such documents creates
the need to rubricate incoming messages, i.e. to solve a classification task.
An analysis of scientific works in this field has shown that it is impossible to construct a unified
model for rubricating unstructured electronic text documents in all situations. The main reasons
are the lack of statistical data, the dynamism of the thesaurus and the small size of incoming
documents.
To solve this problem, we propose a multi-model approach to rubrication characterized
by the combined use of intelligent and probabilistic-statistical methods of text document analysis.
A specific model is chosen by fuzzy logic algorithms based on the proposed
characteristics (document size, the degree of rubric thesaurus intersection, the frequency of
meaningful keywords, etc.).
Implementing the proposed multi-model approach will improve the accuracy of assigning
unstructured electronic text documents to specific rubrics, taking into account their specificity and
the various objectives of practical application in the organization.
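The fuzzy choice of a model can be illustrated with a minimal Python sketch. Only the three input features come from the abstract; the ramp-shaped membership functions, their breakpoints, the min-based rule base and the three model labels (taken from the methods named in the other abstracts on this page) are hypothetical and are not the authors' settings.

```python
"""Toy fuzzy-logic selector choosing a rubrication model for one document.

Hypothetical sketch: membership breakpoints, the rule base and the model
labels are illustrative assumptions, not the authors' settings.
"""


def ramp_down(x: float, lo: float, hi: float) -> float:
    """Membership that is 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x: float, lo: float, hi: float) -> float:
    """Complement of ramp_down: 0 below lo, 1 above hi."""
    return 1.0 - ramp_down(x, lo, hi)


def select_model(n_words: int, thesaurus_overlap: float,
                 keyword_freq: float) -> tuple[str, dict[str, float]]:
    """Return the model whose rule fires most strongly, plus all rule strengths.

    n_words           -- document size in words
    thesaurus_overlap -- degree of rubric thesaurus intersection, in [0, 1]
    keyword_freq      -- share of meaningful rubric keywords, in [0, 1]
    """
    short = ramp_down(n_words, 30, 150)                 # "document is short"
    overlap_high = ramp_up(thesaurus_overlap, 0.2, 0.6)  # "rubrics overlap"
    keywords_rich = ramp_up(keyword_freq, 0.05, 0.25)    # "many keywords"

    # Hypothetical rule base: AND = min, NOT = 1 - membership.
    strengths = {
        # Long, keyword-rich documents: enough statistics.
        "probabilistic-statistical": min(1.0 - short, keywords_rich),
        # Short documents with intersecting rubrics.
        "fuzzy decision tree": min(short, overlap_high),
        # Short documents with weakly intersecting rubrics.
        "fuzzy significance scales": min(short, 1.0 - overlap_high),
    }
    best = max(strengths, key=strengths.get)
    return best, strengths
```

For example, `select_model(20, 0.8, 0.1)` fires the "fuzzy decision tree" rule with strength 1.0, because the document is short and the rubric thesauri overlap strongly. If no rule fires at all, `max` simply returns the first label, so a practical selector would need an explicit fallback.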
Developing the economic information system for automated analysis of unstructured text documents
Tasks and methods of automated text rubrication were studied, and their prospects for the analysis
of unstructured electronic text documents were evaluated, taking into account the peculiarities of
citizens' appeals to the authorities. The architecture of an information system for the automated
analysis of such documents was developed. It implements the proposed multi-model approach to
rubrication based on the integrated use of intelligent and probabilistic-statistical methods. The
procedure for processing citizens' appeals received by the authorities with the document management
system and the developed information system is described.
Using fuzzy decision trees to rubricate unstructured small-sized text documents
Every day, a large number of appeals (statements, proposals or complaints) submitted in unstructured
text form arrive at the Internet portals and e-mail addresses of public authorities. The quality and speed
of automatic processing of such electronic messages depend directly on the correctness of their classification
(rubrication), which consists in assigning each received message to one or several thematic rubrics
corresponding to the departments' areas of activity. The choice of a mathematical approach to analysis
and rubrication depends directly on the characteristics of incoming appeals. An analysis of their specifics
(small size, the presence of errors, free-form problem statements, etc.) has revealed that
classical approaches to the classification of text documents cannot be used. The article suggests
using fuzzy decision trees to rubricate small-sized unstructured text documents arriving
at the Internet portals and e-mail addresses of public authorities. This makes classification possible when
rubrics intersect and statistical information is insufficient for probabilistic and neural network
methods. The proposed document rubrication model takes into account the syntactic relationships
and roles of words in sentences by means of a binary fuzzy decision tree.
The tree is constructed from an analysis of the degree of rubric thesaurus intersection
and the distances between rubrics in an n-dimensional feature space.
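A minimal sketch of how a binary tree over rubrics could be grown from the two quantities named above. The specifics are assumptions, not the article's method: the degree of thesaurus intersection is taken as the Jaccard index, rubric distance as Euclidean distance between hypothetical rubric centroids, and the split rule (seed with the most dissimilar pair, attach every other rubric to the nearer seed) is illustrative. The fuzzy part, i.e. graded membership of a document in each branch, is not shown.

```python
"""Hypothetical sketch: growing a binary tree over rubrics.

Assumptions: Jaccard index for thesaurus intersection, Euclidean distance
between rubric centroids, and an illustrative seed-and-assign split rule.
"""
import math
from itertools import combinations
from typing import Union


def jaccard(a: set[str], b: set[str]) -> float:
    """Degree of intersection of two rubric thesauri (Jaccard index)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def split_rubrics(thesauri: dict[str, set[str]],
                  centroids: dict[str, list[float]]) -> tuple[list[str], list[str]]:
    """Split two or more rubrics into the two child groups of a tree node."""
    def dissimilarity(r1: str, r2: str) -> float:
        # Far-apart rubrics with little shared vocabulary score highest.
        dist = math.dist(centroids[r1], centroids[r2])
        return dist * (1.0 - jaccard(thesauri[r1], thesauri[r2]))

    seed_left, seed_right = max(combinations(thesauri, 2),
                                key=lambda pair: dissimilarity(*pair))
    left, right = [seed_left], [seed_right]
    for r in thesauri:
        if r in (seed_left, seed_right):
            continue
        if dissimilarity(r, seed_left) <= dissimilarity(r, seed_right):
            left.append(r)
        else:
            right.append(r)
    return left, right


def build_tree(rubrics: list[str], thesauri: dict[str, set[str]],
               centroids: dict[str, list[float]]) -> Union[str, tuple]:
    """Recursively split until every leaf holds a single rubric."""
    if len(rubrics) == 1:
        return rubrics[0]
    subset = {r: thesauri[r] for r in rubrics}
    left, right = split_rubrics(subset, centroids)
    return (build_tree(left, thesauri, centroids),
            build_tree(right, thesauri, centroids))
```

The tree is returned as nested tuples of rubric names; in a fuller implementation each internal node would also store the data needed to compute a document's fuzzy membership in its two branches.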
Analysis of short unstructured documents using fuzzy significance scales and special procedures for economic information integration
The article proposes a new approach to the automatic analysis of short messages arriving at the
Internet portals and e-mail addresses of public authorities. The developed model makes it possible to classify short
unstructured text documents when statistical information is lacking and the degree of thematic
rubric intersection is low. The algorithm that constructs the model takes as input the set of
rubrics and a training sample. Its output is a set of fuzzy significance scales for the words in the rubric
thesauri, which ensure a correct representation of document characteristics and support the
operation of the classification (rubrication) algorithm.
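The construction of fuzzy significance scales from a set of rubrics and a training sample can be sketched as follows. The abstract does not give the formula, so this sketch assumes (hypothetically) that a word's significance for a rubric is the share of its training occurrences that fall into that rubric, and that a document is assigned to the rubric with the highest mean significance of its words. Tokenization is a plain lowercase whitespace split.

```python
"""Hypothetical sketch: fuzzy significance scales of rubric words.

Assumption: the significance of a word for a rubric is the share of the
word's training occurrences that fall into that rubric (a value in [0, 1]);
a document is scored by the mean significance of its words.
"""
from collections import Counter, defaultdict


def build_scales(samples: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Map each rubric to {word: significance} from (text, rubric) pairs."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for text, rubric in samples:
        counts[rubric].update(text.lower().split())
    totals: Counter = Counter()
    for rubric_counts in counts.values():
        totals.update(rubric_counts)
    return {
        rubric: {word: n / totals[word] for word, n in rubric_counts.items()}
        for rubric, rubric_counts in counts.items()
    }


def classify(text: str,
             scales: dict[str, dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Assign text to the rubric with the highest mean word significance."""
    words = text.lower().split()
    scores = {
        rubric: sum(scale.get(word, 0.0) for word in words) / max(len(words), 1)
        for rubric, scale in scales.items()
    }
    return max(scores, key=scores.get), scores
```

Under this assumption, a word used in only one rubric gets significance 1.0 there, while words shared across rubrics (such as "the") are automatically down-weighted for each of them.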
Rubrication of text documents based on fuzzy difference relations
One of the key areas of informatization of public authorities is the development and implementation of systems for the automated processing of electronic appeals (applications, complaints, suggestions) from individuals and legal entities arriving at official government websites and portals. Rubrication plays an important role in solving this problem: it consists in distributing appeals among thematic rubrics that determine which departments process them and prepare the corresponding response. An analysis of the specific features of such text messages (small size, lack of markup, presence of errors, unstable thesaurus, etc.) confirmed that traditional approaches to rubrication cannot be used and justified the use of data mining methods. The article proposes a new approach to the analysis and rubrication of unstructured electronic text documents arriving at the official websites and portals of public authorities. It involves forming a tree-like structure of the rubric field based on fuzzy difference relations between the syntactic characteristics of documents. The analysis determines the fuzzy correspondence of documents, by their syntactic characteristics, to the values of the cluster centers and proceeds sequentially from the root to the leaves of the constructed fuzzy decision tree. The proposed rubrication method has been implemented in software and tested in the automated processing and analysis of appeals (applications, complaints and suggestions) submitted by citizens to the Administration of Smolensk Region. This made it possible to update rubrics and analyze documents promptly and with high quality when the composition of the thesaurus and the importance of rubric words are non-stationary.
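The root-to-leaf analysis can be sketched as a descent through a tree whose nodes carry cluster centers in a syntactic feature space. This is a hypothetical illustration rather than the article's algorithm: the Gaussian membership function, the `spread` parameter, the greedy choice of the best-matching child and the min-aggregation of memberships along the path are all assumptions, and the tree is supplied by hand instead of being built from fuzzy difference relations.

```python
"""Hypothetical sketch: root-to-leaf rubrication in a fuzzy decision tree.

Assumptions: Gaussian membership to cluster centers, greedy descent to the
best-matching child, and min (fuzzy AND) aggregation along the path.
"""
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Cluster center in a hypothetical syntactic feature space (unused at the root).
    center: list[float] = field(default_factory=list)
    # Rubric label, set only on leaves.
    rubric: Optional[str] = None
    children: list["Node"] = field(default_factory=list)


def membership(x: list[float], center: list[float], spread: float = 1.0) -> float:
    """Fuzzy correspondence of feature vector x to a cluster center, in (0, 1]."""
    return math.exp(-(math.dist(x, center) / spread) ** 2)


def rubricate(x: list[float], root: Node) -> tuple[Optional[str], float]:
    """Descend from the root to a leaf; return its rubric and the path membership."""
    node, degree = root, 1.0
    while node.children:
        mus = [membership(x, child.center) for child in node.children]
        best = max(range(len(mus)), key=mus.__getitem__)
        degree = min(degree, mus[best])  # fuzzy AND over the visited path
        node = node.children[best]
    return node.rubric, degree
```

Taking the minimum along the path means a document is only as strongly assigned to a leaf rubric as its weakest match on the way down; a product t-norm would be an equally plausible alternative under these assumptions.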