Degree | Cand. Sci. (Eng.), Associate Professor, Software and Systems Engineering Department, Institute of Mathematics and Computer Sciences |
---|---|
E-mail | m.s.cyganova@utmn.ru |
Location | Tyumen, Russia |
Articles |
Classification system for documents with mine surveying data
All enterprises engaged in exploration activities in the Russian Federation must formulate tasks for the mine surveying service and monitor their execution, which affects the enterprise’s workflow. This creates the problem of organizing efficient document processing in electronic document management systems (EDMS), namely the timely identification of documents containing mine surveying data. The article presents a possible solution to this problem: an automated document classification system integrated into an EDMS as an optional add-on for 1C:Document Management. As part of building the classification system, a preprocessing script for the source document texts (cleaning, lemmatization, and stop-word removal) and the preparation of input features for the classifier were developed and implemented. The applicability of different machine learning algorithms to this classification problem was studied, and the hyperparameter values yielding the highest ROC AUC were determined. The quality of all resulting models was assessed using the Precision, Recall, and F-measure metrics, and the stability of classification quality under changes in the input data was investigated. The identified instability of the classification results was resolved by building and deploying a machine learning model in the form of an ensemble of classifiers. The classification model (an ensemble of classifiers) was tested on a set of real documents from Gazprom nedra Ltd; the classification quality on the test sample was 0.91 by the ROC AUC metric. Besides the classification module itself, the developed system contains a database for storing learning outcomes, a function library for working with the database, and APIs that process classification requests coming from external systems.
These APIs, in particular, make it possible to load saved trained models, validate data coming from external systems, preprocess input text documents, train new models and assess their quality, and save both the trained models and their test results. Additional training of the models on new data is also supported.
Predicting the deterioration of the condition of patients with cardiovascular diseases based on machine learning methods
This study was carried out as part of a project to develop a subsystem for predicting the deterioration of the condition of patients with cardiovascular diseases on the platform of the medical information system "1C: Medicine. Hospital". The task is relevant because this group of diseases is particularly dangerous and timely decisions about hospitalization or treatment are needed when there is a risk that the patient’s condition will deteriorate. The goal of this work was to create a tool that allows the attending physician to quickly obtain a well-founded assessment of that risk based on the available medical indicators. As part of this study, more than 30 thousand records containing patient health indicators, downloaded from the regional medical information system, were analyzed. The data set was labeled according to the available information about the medical decisions made by attending physicians at the clinic and the hospital. Because health indicators are not entered into the medical system in a standardized way, a significant amount of work was required to preprocess the input data and prepare it for modeling. The prepared data was used to build a predictive model with machine learning methods. Based on the results of the computational experiments, gradient boosting was chosen as the learning algorithm, and its optimal parameters were selected.
The prediction quality of the trained models was tested on data from the labeled set that was not used in training. The quality indicators of the best model on the test data were precision = 0.87, recall = 0.96, and AUC-ROC = 0.97. The trained models were integrated with the attending physician’s automated workstation in the 1C: Medicine. Hospital system. As a result, an algorithm was developed that covers the processing of patient health indicators from downloading the primary data from the medical accounting system through to obtaining a forecast. It takes into account the peculiarities of data storage in the system and lets the doctor quickly receive information about identified risk cases after each update of indicator values in the system. It was shown that standardizing the values of medical test results entered into the system will help improve forecasting quality by making the model more stable under changes in the input data. |
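
Both abstracts report model quality in terms of precision, recall, F-measure, and ROC AUC. For readers unfamiliar with these metrics, the following is a minimal pure-Python sketch of how they can be computed. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} auc={auc:.4f}")
```

Precision, recall, and F1 depend on the chosen threshold, whereas ROC AUC is computed from the raw scores, which may be why it is convenient as the target when comparing algorithms and hyperparameters. The pairwise loop is quadratic in the sample size; for data sets of the size mentioned above, a library implementation would be the more practical choice.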