The paper proposes an algorithm for the automated search and initial analysis of sociological information from Internet sources, aimed at studying the territorial identity of residents of city areas. Social network communities (e.g. on VKontakte) serve as the main data source, and the websites of topographic objects located in the areas under study serve as auxiliary sources. It is shown that, in terms of information support, public pages and groups with open or restricted-access walls have the greatest potential. The algorithm involves selecting relevant groups, finding content concerning area issues, and computing indices of community activity in discussing territorial problems. The required information is retrieved from the social network server through the official Application Programming Interface (API). Communities and posts are identified using methods of morphological text analysis. The software implementation of the algorithm in Python 3.8.5 is described, including original functions for acquiring community data by community identification numbers and for forming a set of urbanonyms (names of intracity objects such as streets and squares) for a specified area, among others. The program has been used to analyze territorial groups in three areas of Moscow; the results make it possible to estimate the degree of territorial identity of their residents. An error analysis shows good agreement between the automatically and manually obtained results: the error is 2.6% in identifying relevant groups and about 3% in identifying posts on area issues.
In addition, the much higher processing speed and the lower labor cost of routine operations make the algorithm and the program implementing it an effective tool for sociological research based on social network data.
Keywords
automated data mining, text analysis, Python, social networks, territorial identity, sociological research
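Several steps of the algorithm outlined in the abstract can be illustrated with a minimal Python sketch, built under explicit assumptions. It is not the authors' implementation. All function names (groups_by_id_url, is_area_post, activity_indices, error_percent) are hypothetical. The VK API method groups.getById and its group_ids, fields, access_token and v parameters follow the public VK API documentation, but the version string "5.131" and the requested fields are assumptions. The prefix-stem matching is a crude stand-in for real morphological analysis, which would normally use an analyser such as pymorphy2. The engagement-based activity index and the mismatch-based error measure are likewise illustrative guesses, not the indices defined in the paper.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlencode

VK_API_URL = "https://api.vk.com/method/"  # public VK API endpoint


def groups_by_id_url(group_ids, token, version="5.131"):
    """Build a GET URL for the VK API method groups.getById.

    The API version and the requested fields are illustrative assumptions.
    """
    params = {
        "group_ids": ",".join(str(g) for g in group_ids),
        "fields": "description,members_count",
        "access_token": token,
        "v": version,
    }
    return VK_API_URL + "groups.getById?" + urlencode(params)


TOKEN_RE = re.compile(r"\w+")  # \w matches Cyrillic letters in Python 3


def is_area_post(text, urbanonym_stems):
    """Crude stand-in for morphological analysis (hypothetical): a post is
    about the area if any lower-cased word starts with an urbanonym stem."""
    words = TOKEN_RE.findall(text.lower())
    return any(w.startswith(s) for w in words for s in urbanonym_stems)


@dataclass
class Post:
    text: str
    likes: int = 0
    comments: int = 0
    reposts: int = 0


def activity_indices(posts, urbanonym_stems):
    """Hypothetical indices: share of area posts among all posts and
    mean engagement (likes + comments + reposts) per area post."""
    area = [p for p in posts if is_area_post(p.text, urbanonym_stems)]
    share = len(area) / len(posts) if posts else 0.0
    engagement = (
        sum(p.likes + p.comments + p.reposts for p in area) / len(area)
        if area else 0.0
    )
    return share, engagement


def error_percent(auto_labels, manual_labels):
    """Percentage of items whose automated label differs from the manual one."""
    if len(auto_labels) != len(manual_labels) or not manual_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    mismatches = sum(a != m for a, m in zip(auto_labels, manual_labels))
    return 100.0 * mismatches / len(manual_labels)


if __name__ == "__main__":
    stems = {"марьин", "люблин"}  # hypothetical urbanonym stems
    posts = [
        Post("Ремонт дороги в Марьине затянулся", likes=12, comments=5),
        Post("Продам велосипед", likes=1),
        Post("Новый парк в Люблине", likes=30, reposts=4),
    ]
    share, engagement = activity_indices(posts, stems)
    print(round(share, 3), engagement)  # prints 0.667 25.5
```

Prefix matching with short stems can over-match unrelated words, which is one reason a dictionary-based morphological analyser is preferable in practice. In this sketch, the error measure counts per-item disagreements with a manual labeling, the kind of comparison behind the 2.6% and 3% figures, though the paper's exact formula is not given here.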