Degree
|
PhD in Engineering, Associate Professor, Department of Theoretical Informatics and Computer Technologies, Bauman Moscow State Technical University |
---|---|
E-mail
|
iubutenko@bmstu.ru |
Location
|
Moscow, Russia |
Articles
|
Method for resolving the lexical polysemous search query based on ontology
One factor affecting the relevance of search results is lexical ambiguity in a search query expressed in natural language. The ambiguity of a lexical unit manifests itself at the query stage. A method for resolving the ambiguity of lexical units in a search query using ontologies is proposed. It is argued that ontologies convey the semantic component of data in a subject area accurately enough. The proposed method works as follows. A user's search query arrives at the search engine input. The search engine consults the ontology library to look up the query. If a lexical unit from the query is ambiguous, the search engine offers the user a list of the subject areas in which that unit was found. Often the user already has a particular subject area in mind. Once the subject area is defined, the search engine determines the nearest elements in the ontology structure and takes their presence or absence into account when ranking the search results. Ontologies also make it possible to add synonyms and acronyms with the same meaning to the search query. The proposed approach resolves lexical ambiguity and substantially narrows the search results, leaving only the subject area of interest to the user.
Method for the extraction of Russian-language multicomponent terms from scientific and technical texts
The article presents a method for extracting Russian-language multicomponent terms from scientific and technical texts based on structural models of terminological collocations. Existing approaches to term extraction (methods based on extracting stable word combinations, statistical methods, and hybrid methods) are described, and the linguistic aspects of terminology not covered by these methods are noted.
The lexical composition of scientific and technical texts is characterized, and a classification of their special vocabulary is given. The structural features of terminological vocabulary are studied, and the most productive models of multicomponent terminological word combinations in Russian are presented. A method for extracting Russian-language multicomponent terms from scientific and technical texts is proposed, and its stages are described. The first stage is morphological and syntactic analysis of the text, which assigns grammatical characteristics to each word. Next, parts of speech that cannot be part of Russian multicomponent terms are excluded, along with stop-words that form free word combinations with a term. The resulting word chains are then matched against the templates of terminological word combinations in the database of structural models of terms, and checked against the terminological dictionary for the presence of the candidate term. The need to involve a terminologist to resolve ambiguous cases is substantiated. Each step of the method is illustrated with examples. Directions for further research are listed, and the need to make the extraction methods more sophisticated is substantiated, through further classification of terminological vocabulary by formal and semantic structure, types of anthropomorphic terms, nomenclature names, and normativity/non-normativity of terminological units. |
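
The stages listed in the second abstract (grammatical tagging, exclusion of unsuitable parts of speech and stop-words, matching word chains against structural models, dictionary lookup) can be illustrated with a minimal sketch. It is not the authors' implementation. The tag names, the stop-word list, the structural models in `TERM_PATTERNS`, and the function names are all hypothetical, and the input is assumed to be already tagged by some morphological analyzer. For simplicity, only whole chains are compared with the models.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word form, part-of-speech tag); tags are illustrative

# Assumption: only adjectives and nouns may occur inside a multicomponent term.
TERM_POS: Set[str] = {"ADJ", "NOUN"}
# Assumption: a tiny stop-word list; "данной" means "this/given".
STOP_WORDS: Set[str] = {"данной"}
# Assumption: a few illustrative structural models, not the models from the article.
TERM_PATTERNS: Set[Tuple[str, ...]] = {
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("NOUN", "ADJ", "NOUN"),
}


def split_into_chains(tokens: List[Token]) -> List[List[Token]]:
    """Cut the token sequence at every excluded part of speech or stop-word."""
    chains: List[List[Token]] = []
    current: List[Token] = []
    for word, pos in tokens:
        if pos in TERM_POS and word not in STOP_WORDS:
            current.append((word, pos))
        else:
            if current:
                chains.append(current)
            current = []
    if current:
        chains.append(current)
    return chains


def extract_candidates(tokens: List[Token], dictionary: Set[str]) -> List[Tuple[str, bool]]:
    """Return (candidate term, found in dictionary) for chains that match a model."""
    candidates: List[Tuple[str, bool]] = []
    for chain in split_into_chains(tokens):
        tags = tuple(pos for _, pos in chain)
        if len(chain) < 2 or tags not in TERM_PATTERNS:
            continue  # single words and unmatched chains are not candidates
        phrase = " ".join(word for word, _ in chain)
        candidates.append((phrase, phrase in dictionary))
    return candidates


# Pre-tagged toy sentence: "In this article a search query and
# a subject-area ontology are used."
tokens = [
    ("в", "PREP"), ("данной", "ADJ"), ("статье", "NOUN"),
    ("используется", "VERB"), ("поисковый", "ADJ"), ("запрос", "NOUN"),
    ("и", "CONJ"), ("онтология", "NOUN"), ("предметной", "ADJ"),
    ("области", "NOUN"),
]
for phrase, known in extract_candidates(tokens, {"поисковый запрос"}):
    print(phrase, "(in dictionary)" if known else "(needs terminologist review)")
```

In this sketch, a candidate that matches a structural model but is missing from the dictionary is flagged for review, which is one possible reading of the role the abstract assigns to the terminologist.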