A Complex Neural Network Model for Predicting a Personal Success based on their Activity in Social Networks

The development and improvement of effective tools for predicting a person's real-life behavior from the features of their virtual activity opens up broad prospects for psychological support of the individual. Psychologists in educational, professional and other fields can use such tools to shape trajectories of harmonious personal development. Active research is currently underway on determining psychological characteristics from publicly available data; such studies advance the field known as the “psychology of social networks”. Various parameters obtained from people's personal pages in social networks serve as markers of their psychological characteristics (texts of posts and reposts, the number of different elements on the page, statistical information about audio and video recordings, information about groups, and others). Obtaining and analyzing a data set this large is difficult, because there are non-linear and hidden relationships between individual data elements. As a result, classical methods of information processing become inefficient. Therefore, to develop a comprehensive model of success based on the analysis of qualitative and quantitative data, we use an approach based on artificial neural networks. The labels of the input records are obtained by dividing the subjects of the study into five clusters with a clustering method (k-means). In the course of our work, we gradually expand the set of input parameters to include metrics of users’ personal pages, and compare the results to determine the impact of qualitative parameters on the accuracy of the artificial neural network.
The results reflect the solution of one of the tasks of the research carried out within the framework of the project of the Russian Science Foundation and serve as material for an information and analytical system for automatic forecasting of human life activity based on the metrics of his personal profile in the social network VKontakte.


INTRODUCTION
In this work we present interdisciplinary research carried out at the junction of two subject areas: psychology and information systems. The psychology of social networks is one of the attractive directions in modern psychology; it studies the features of the interaction between the I-real and the I-virtual personality.
The main goal of the research presented here is to predict the real behaviour of a person from the features of their virtual activity in social networks (Hiranyachattada & Kusirirat, 2020; Khandelwal & Gotlieb, 2021; Levina et al., 2019; Minakhmetova et al., 2017; Orekhovskaya et al., 2019; Owan et al., 2020; Piralova et al., 2020; Razumovskaya et al., 2018; Rubio et al., 2020; Yusupov, 2019). The markers are the metrics of the personal profile of a social network user, that is, the variety of digital content that a person leaves in the form of posts, audio and video recordings, photos, etc. The complexity of such studies lies in the need to collect and analyse a large amount of data from social networks and in the presence of nonlinear relationships between the behavioural characteristics of a person in real life and the indicators of their virtual activity (Du et al., 2018; Galchenko et al., 2020; Tugun et al., 2020). This results in the low efficiency of traditional methods of data processing and analysis. One possible solution is to use machine learning methods.
Mining data from social networks is currently one of the most rapidly developing areas of scientific research. Machine learning and artificial intelligence methods (especially artificial neural networks) make it possible to solve effectively such problems as classification, forecasting, anomaly detection and clustering (DiFranco & Santurro, 2020; Young et al., 2018). Various types of neural networks are used in the intelligent analysis of social networks, the most popular being classical perceptrons (Jabłońska & Zajdel, 2020) and various types of recurrent neural networks (RNN, GRU) (Balakrishnan & Geetha, 2020; Yang et al., 2017). It is important to note that recently developed graph-based deep learning networks (GCNs) (Tan et al., 2019) are also beginning to be used quite effectively. For the analysis of qualitative data (posts, comments), text analysis methods (for example, text embedding, word2vec) based on deep (DNN) and recurrent (LSTM) neural networks are also used intensively (DiFranco & Santurro, 2020; Khan & Chang, 2019).
In addition, there are works that use machine learning and neural networks to extract other characteristics from social network data. For example, in Mukhametshin et al. (2019), Ophir et al. (2020), Shatte et al. (2019), and Zheng et al. (2020), deep neural networks and other machine learning techniques are used to assess the risk of suicide from the texts of Facebook posts.

Purpose and Objectives of the Study
The aim of the study is to develop a complex neural network model of professional success based on the analysis of qualitative and quantitative metrics of the user's personal profile on the VKontakte social network. The term complex model in this work means the construction of a generalized neural network system for predicting the success of an individual through the features of his professional activity via qualitative and quantitative metrics.

METHODOLOGY
In this study we used machine learning and artificial neural network methods. Software modules were developed in the Python programming language, using the open API of the VKontakte social network (VK API) to download data. Initially, the subjects under analysis were divided by a clustering method (the K-Means algorithm) into separate clusters according to their professional success (Vakhitov et al., 2019).
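The clustering step can be illustrated with a minimal sketch of the K-Means (Lloyd's) algorithm. The text does not state which features were clustered, so the sketch assumes a hypothetical one-dimensional "success score" per subject; the function name `kmeans_1d` and the deterministic initialisation are likewise assumptions made for illustration, and in practice a library implementation would probably be used instead.

```python
def kmeans_1d(values, k=5, iterations=100):
    """Lloyd's algorithm on scalar values; returns (labels, centroids).

    `values` is a hypothetical list of per-subject success scores.
    """
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


scores = [1.0, 1.2, 0.9, 5.0, 5.1, 10.0, 10.2, 20.0, 19.8, 40.0, 41.0]
labels, centroids = kmeans_1d(scores, k=5)
# Cluster numbers 1..5 as used in the text are the zero-based labels plus one.
clusters = [label + 1 for label in labels]
```

The resulting cluster numbers then serve as the class labels for the neural networks described below.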
Our computational experiments are divided into 5 parts. The first part concerns setting up and training a neural network that can distinguish between two professional success clusters (1 and 5) based only on quantitative data. In this part the following set of quantitative input parameters is used: 'friends' (the number of friends), 'followers' (the number of subscribers), 'walls' (the number of posts on the wall), 'photos' (the number of photos published by the user), and 'pages' (the number of interesting pages to which the person is subscribed).
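As a small illustration of how such input records might be assembled, the sketch below turns one profile into a fixed-order feature vector. The dictionary layout of `profile` and the choice to treat a missing counter as 0 are assumptions; only the five parameter names come from the text.

```python
# Parameter names as listed in the text; the order is fixed so that every
# record presents its features to the network in the same positions.
FEATURES = ["friends", "followers", "walls", "photos", "pages"]


def to_feature_vector(profile, features=FEATURES):
    """Return the counters in a fixed order; a missing counter becomes 0 (assumption)."""
    return [float(profile.get(name, 0)) for name in features]


# Hypothetical profile record, for illustration only.
example_profile = {"friends": 120, "followers": 35, "walls": 48, "photos": 210}
vector = to_feature_vector(example_profile)
```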

Contribution to the literature
• This study presents a comprehensive model of success based on artificial neural networks and on qualitative and quantitative data obtained from social networks.
• This study provides a generalized system for predicting the success of an individual through the features of their professional activity via social network activity metrics.
• This study provides support for research based on big data analysis in cases where there are nonlinear and hidden connections between individual elements and classical information processing methods are ineffective.
• The results contribute to the understanding of the processes of interaction between the "Self-Real" and the "Self-Virtual" and to research on the psychology of social networks.

Next, we extend the set of parameters by adding the number of faces in the photo on a person's VKontakte profile (we call this parameter 'count_persons'). The number of faces in a photo was obtained using the YOLO library (Redmon & Farhadi, 2018), which recognizes objects in images very quickly. We took a pretrained model from the library's official website and used it to process images obtained from the personal pages of VKontakte users. For each photo, we then count the number of neural network outputs labeled "person". This count forms the parameter that we assign to the category of qualitative data and that should improve the accuracy of the neural network. Using this set of parameters, we train two neural networks. The first distinguishes only clusters 1 and 5 (that is, only people from success categories 1 and 5 enter the training and test sets). The second uses the six input parameters to distinguish between pairs of clusters (1 and 2, 1 and 3, 1 and 4, 1 and 5).
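The counting step can be sketched independently of the detector itself. The sketch below assumes that the detector output for one photo has already been converted to a list of (label, confidence) pairs; this format and the 0.5 confidence threshold are assumptions for illustration and are not taken from the YOLO library or from the text.

```python
def count_persons(detections, min_confidence=0.5):
    """Count detections labeled "person" in one photo.

    `detections` is a hypothetical list of (label, confidence) pairs;
    the confidence threshold is an assumed value.
    """
    return sum(
        1
        for label, confidence in detections
        if label == "person" and confidence >= min_confidence
    )


# Hypothetical detector output for a single profile photo.
photo_detections = [("person", 0.97), ("dog", 0.88), ("person", 0.61), ("person", 0.20)]
n_persons = count_persons(photo_detections)
```

The returned count is the value of 'count_persons' that joins the five quantitative parameters as the sixth input.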
In the third part of our work, we add to the input parameters the time a person spends on the VKontakte network ('online_time'). This information was obtained by periodically checking the online status of each person in our study sample. The neural network that distinguishes between clusters 1 and 5 is trained on this set of parameters.
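One way to turn periodic status checks into an 'online_time' value is sketched below. The sample format (a timestamp in seconds paired with an online flag) and the rule that a user observed online stays online until the next check are both assumptions; the text states only that the status was polled periodically.

```python
def estimate_online_seconds(samples):
    """Estimate total time online from periodic status checks.

    `samples` is a hypothetical list of (timestamp_seconds, is_online) pairs.
    Assumption: a user seen online at one check remains online until the next.
    """
    ordered = sorted(samples)
    total = 0
    for (t0, online0), (t1, _) in zip(ordered, ordered[1:]):
        if online0:
            total += t1 - t0
    return total


# Hypothetical checks made every 300 seconds.
checks = [(0, True), (300, True), (600, False), (900, True), (1200, False)]
online_time = estimate_online_seconds(checks)
```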
In the fourth part, we conducted an experiment in which the input data is information about the groups to which a person belongs. For each person we form a sparse vector of 0s and 1s whose length equals the number of distinct groups to which people in the studied sample belong. Groups with fewer than 50 members from the sample are excluded from the vector. An additional column with values from 1 to 5 serves as the label that gives the person's success category. A neural network trained on such data is used to distinguish between two arbitrary clusters (for example, the first and the fifth).
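The construction of the group vector can be sketched as follows. The mapping `memberships` from a user identifier to a list of group identifiers is a hypothetical data layout, and the function names are introduced only for illustration; the 50-member threshold comes from the text.

```python
from collections import Counter


def build_group_vocabulary(memberships, min_members=50):
    """Return the sorted list of groups with at least `min_members` sample members."""
    counts = Counter(
        group for groups in memberships.values() for group in set(groups)
    )
    return sorted(group for group, n in counts.items() if n >= min_members)


def encode_groups(user_groups, vocabulary):
    """Binary membership vector: 1 if the user belongs to the group, else 0."""
    present = set(user_groups)
    return [1 if group in present else 0 for group in vocabulary]


# Tiny hypothetical sample; the threshold is lowered to 2 so the example stays small.
memberships = {"u1": [10, 20], "u2": [20, 30], "u3": [20, 10]}
vocabulary = build_group_vocabulary(memberships, min_members=2)
vector_u2 = encode_groups(memberships["u2"], vocabulary)
```

With the real threshold of 50 the vocabulary, and therefore the input layer of the network, contains only groups that are shared widely enough within the sample to carry a usable signal.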
In the fifth experiment, we trained a neural network for which the 5 categories of success are reduced to two labels: the first label corresponds to data from success category 1, and the second to data from all other categories, that is, categories 2, 3, 4 and 5. Simply put, the purpose of such a neural network is to distinguish people in the first category of success from all other people. The reverse experiment was also conducted, in which the fifth category was distinguished from all the others (categories 1, 2, 3, 4). The input for this neural network is a set of 6 elements ('friends', 'followers', 'walls', 'photos', 'pages', 'count_persons').
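The label reduction used in this experiment amounts to a one-vs-rest relabeling, sketched below. Encoding the selected category as 1 and all other categories as 0 is an assumption; the text does not say which of the two labels is treated as the positive class.

```python
def one_vs_rest_labels(clusters, positive_cluster):
    """Map success categories 1..5 to binary labels.

    The selected category becomes 1 and every other category becomes 0
    (which class is called "positive" is an assumption).
    """
    return [1 if cluster == positive_cluster else 0 for cluster in clusters]


# Hypothetical success categories for seven subjects.
categories = [1, 3, 5, 1, 2, 4, 5]
first_vs_rest = one_vs_rest_labels(categories, positive_cluster=1)
fifth_vs_rest = one_vs_rest_labels(categories, positive_cluster=5)
```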
The main purpose of these computational experiments is to test whether adding qualitative parameters to the set of input parameters has a positive effect. If this effect is confirmed, it can be concluded that a complex neural network model that uses both quantitative and qualitative data works better than the basic one.
A three-layer perceptron was chosen as the main type of neural network for parts 1-3 and 5 of our experiments. We used two sequential Dense layers with the ReLU activation function, containing 30 and 10 units respectively, followed by an output layer of 1 neuron with the sigmoid activation function; the loss function was binary cross-entropy. For experiment 4, the same neural network was used, but with 50 units in the first layer.
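The shape of this network can be made concrete with a dependency-free sketch of its forward pass and loss. This is not the authors' implementation, and no training loop is shown. The weight initialisation, the function names and the reading of "30 and 10" as the number of units in each hidden layer are assumptions.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Numerically stable form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def build_mlp(n_features, hidden=(30, 10), seed=0):
    """Create (weights, biases) per layer; small random weights are an assumption."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, 1]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict(model, x):
    """Forward pass: ReLU hidden layers, then a single sigmoid output."""
    *hidden_layers, (out_weights, out_biases) = model
    for weights, biases in hidden_layers:
        x = dense(x, weights, biases, relu)
    return dense(x, out_weights, out_biases, sigmoid)[0]


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Loss for one example; the probability is clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


model = build_mlp(n_features=6)                  # six-parameter experiments
probability = predict(model, [0.5] * 6)          # hypothetical scaled input
loss = binary_cross_entropy(1, probability)
group_model = build_mlp(n_features=200, hidden=(50, 10))  # experiment 4; 200 is a made-up vocabulary size
```

The sigmoid output is read as the probability of one of the two clusters being compared, which is why a single output neuron and binary cross-entropy are sufficient for every experiment described here.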
It is also worth noting that all input datasets were balanced: if one data category contains far fewer records than the others, the number of records in the other categories is limited to the size of the smallest category. This makes the "Accuracy" metric representative, so it can be used as the main indicator of how successfully a neural network has been trained. In addition to Accuracy, we used the following metrics: Sensitivity, Specificity, Precision and F1-score. All of these metrics are calculated from the confusion matrix and are defined as follows (Fawcett, 2005), where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative outcomes:
1) Accuracy is the degree to which the categories predicted by the neural network correspond to the real categories: $\mathrm{Accuracy} = (TP + TN) / (TP + TN + FP + FN)$.
2) Sensitivity (or Recall) is the proportion of actual positives that are predicted positive: $\mathrm{Sensitivity} = TP / (TP + FN)$.
3) Specificity is the proportion of actual negatives that are predicted negative: $\mathrm{Specificity} = TN / (TN + FP)$.
4) Precision (or positive predictive value) is the proportion of predicted positives that are actually positive: $\mathrm{Precision} = TP / (TP + FP)$.
5) F1-score is the harmonic mean of Precision and Sensitivity; it reaches its best value at 1 and its worst at 0: $\mathrm{F1} = 2 / (\mathrm{Precision}^{-1} + \mathrm{Sensitivity}^{-1})$.
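The metrics derived from the confusion matrix can be computed with a few lines of code. The sketch below uses the standard definitions; the function name and the counts in the example are made up for illustration and are not results from this study. It assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics; assumes all denominators are non-zero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # true positive rate (recall)
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)        # positive predictive value
    # Harmonic mean of precision and sensitivity.
    f1 = 2 / (1 / precision + 1 / sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only (balanced test set of 50 positives and 50 negatives).
metrics = classification_metrics(tp=45, tn=40, fp=10, fn=5)
```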
In addition to the above metrics, receiver operating characteristic (ROC) curves were constructed for each version of the neural network. A ROC curve is a graphical plot of the true positive rate (TPR = Sensitivity) against the false positive rate (FPR = 1 - Specificity), obtained by varying the decision threshold. It is the curve most often used to represent the results of binary classification in machine learning. We used ROC curves to evaluate the quality of the trained neural networks. An example of a ROC curve is shown in Figure 1.
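How a ROC curve and the area under it are obtained from network outputs can be sketched as follows. The function names are introduced for illustration, the labels and scores in the example are made up, and the sketch assumes that both classes are present in `y_true`.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    Assumes binary labels (1 = positive) and at least one sample of each class.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and sigmoid outputs for six test records.
labels = [1, 1, 0, 1, 0, 0]
outputs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_curve_points(labels, outputs)
area = auc(curve)
```

An area close to 1 indicates that the network ranks almost every positive record above every negative one, while an area of 0.5 corresponds to random guessing.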

RESULTS
To achieve statistical reliability of the results we performed 9 iterations for all binary classification tasks (that is, in those experiments in which it is necessary to divide the collection into 2 categories), and then the average value of the neural network training success metrics was calculated. These values are presented in Table 1.
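The averaging over repeated runs can be sketched as below. The per-run dictionary layout and the function name are assumptions, and the values in the example are placeholders rather than results from Table 1.

```python
from statistics import mean


def average_metrics(runs):
    """Average each metric over repeated training runs.

    `runs` is a hypothetical list of dicts that all share the same metric names.
    """
    return {name: mean(run[name] for run in runs) for name in runs[0]}


# Placeholder values for two runs; the study used 9 runs per task.
runs = [
    {"accuracy": 0.5, "f1": 0.25},
    {"accuracy": 1.0, "f1": 0.75},
]
averaged = average_metrics(runs)
```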

The results of training the neural network (the average values of the performance metrics) that separates clusters 1 and 5 on a set of 5 parameters ('friends', 'followers', 'walls', 'photos', 'pages') are presented in the line "5 parameters, 1 and 5 cluster" in Table 1. The ROC curve for this neural network is shown in Figure 2a. The effect of adding the parameter 'count_persons' on the performance is presented in the line "6 parameters, 1 and 5 cluster"; the ROC curve is shown in Figure 2b. Next, we added the 'online_time' parameter and performed the same experiment; the results are given in the line "7 parameters, 1 and 5 cluster", and the ROC curve is shown in Figure 2c.
These results show that adding the qualitative parameter 'count_persons' to the input of the neural network increases accuracy from 78% to 80%. Including the quantitative parameter 'online_time' in the model also improved the performance of the neural network, as the increase in the area under the ROC curves in Figure 2 also indicates.
Next, we describe the results of our experiments on separating one cluster from all the others. Table 1 also shows the classification quality metrics for separating cluster 1 from the rest. The metric values are good even for the basic neural network setup (the line "6 parameters, 1 and 2,3,4,5 cluster"). However, separating cluster 5 from the rest with sufficient accuracy is not yet possible (the line "6 parameters, 5 and 1,2,3,4 cluster"). The ROC curves for these experiments are shown in Figure 3a and Figure 3b, respectively.
We also conducted computational experiments in which neural networks were trained to separate the two clusters in each of the following pairs: first and second, first and third, first and fourth, first and fifth. All of these experiments used 6 input parameters: 'friends', 'followers', 'walls', 'photos', 'pages', 'count_persons'. The performance metrics for these experiments are given in Table 2, and the ROC curves in Figure 4. We observed a steady increase in accuracy and in the other metric values as the difference between cluster numbers increases (clusters that are far from each other in terms of professional success have markedly different quantitative and qualitative characteristics).
A similar experiment was conducted with input data that includes only information about the groups of the subjects. The results are shown in Table 3, and the ROC curve in Figure 5. Interestingly, this neural network shows even better results than the neural network that works on 6 parameters.

DISCUSSION
In the course of the computational experiments, adding more input parameters for training was found to have a positive effect on the accuracy of the neural network. In addition, the experiments support the correctness of the distribution of the original sample of people into clusters of success, since the differences grow monotonically as the distance between the two distinguished clusters increases. Separating cluster 5 (the most professionally successful people) from all the others did not give good results; we intend to address this in future work. Other qualitative data will also be added later (a categorical network based on the most frequent words in the texts of posts and reposts in individual success clusters, as well as information about audio and video recordings).
The next stage in the development of the information and analytical system will be to combine the results of identifying markers of academic success and professional success into a single neural network model and use it to predict the personal activity of individuals.

CONCLUSION
In this work we built a complex neural network model that includes quantitative characteristics (friends, subscribers, posts, interesting pages, photos, videos, music), qualitative characteristics (photo content) and the groups of user profiles on the VKontakte social network. The results of the study reflect the solution to one of the tasks of the RSF project and serve as empirical material for the development of a theoretical and applied model for predicting the life activity of an individual in their professional activity. This model is based on integrating the psychological patterns in which the behavioral and cognitive processes of users manifest themselves in real and virtual space with the main metrics of their activity in social networks. The applied significance of the results is determined by the possibility of predicting the professional success of current or potential employees of various organizations.
Author contributions: All authors have sufficiently contributed to the study, and agreed with the results and conclusions.