Personalized recommendation service of social media based on collaborative filtering and gene map

Abstract

With the growing popularity of social media, personalized recommendation services have become increasingly important. However, traditional collaborative filtering recommendation algorithms face many challenges when dealing with social media data. To improve the accuracy and efficiency of recommendation, this paper presents a collaborative filtering recommendation service based on a social media gene map. The study builds the social media gene map by analyzing social media data and extracting user interests and behavioral characteristics. On this basis, a collaborative filtering recommendation model is constructed that takes into account the social network, content, and historical behavior of users. In performance testing, the proposed method was compared with a collaborative filtering algorithm based on alternating least squares, a collaborative filtering algorithm based on generative adversarial networks, and the singular value decomposition algorithm. In model training, the proposed model outperformed the other three algorithms in both convergence speed and maximum accuracy. In the recommendation tests on three resources, the proposed model also showed the best performance. The experiments indicate that the method performs well in recommendation accuracy and timeliness, providing an effective solution for social media recommendation services.

Keywords

Social media, personalized recommendation, gene map, collaborative filtering

1. Introduction

With the rapid development of the Internet, social media (SM) has become essential to people's daily lives. The large amount of information generated and exchanged by users on SM provides rich data resources for personalized recommendation services.^{1, 2} However, due to the diversity and dynamism of user interests, as well as the vast and complex nature of SM data, providing accurate and timely recommendations to users has become increasingly challenging.^{3, 4} Therefore, how to use SM data to improve the effectiveness and efficiency of recommendation services has become a hot topic in both academia and industry.^{5, 6} Against this background, personalized information recommendation systems have been developed. Personalized information recommendation services have been widely studied and applied in recent years because they are targeted, proactive, and flexible. Widely used examples include the e-commerce platforms JD and Taobao, short-video applications such as TikTok and Kuaishou, and music platforms such as NetEase Cloud Music and Kugou Music. On the one hand, a personalized recommendation system actively provides users with targeted information and services that meet their needs, saving them considerable time and effort in searching for information. On the other hand, such a system strengthens the interaction between users and service content without wasting resources, which improves the user experience, increases user stickiness, and ultimately generates profit. However, most existing recommendation methods rely on the similarity of users, labels, or items. When the data scale is large, recommendation accuracy drops sharply because these methods depend on a single similarity measure.

Traditional collaborative filtering (CF) algorithms do not take the time factor into account and often cannot update recommended items promptly when user interests change, resulting in low timeliness. Therefore, this paper proposes a CF algorithm based on the SM gene map (GM). The algorithm integrates item similarity, rating similarity, and semantic similarity, and introduces the weighted Slope One algorithm to obtain users' predicted ratings for items, thereby achieving more accurate and comprehensive recommendation of users and content.

The innovations of the study and its contributions lie in two aspects. First, compared with the traditional CF, the proposed method improves the accuracy and personalization of the recommendation system by integrating richer user behavior data and social network structure. The algorithm not only analyzes the user's historical interaction data, but also considers the user's location and connection patterns in the social network, which are often ignored by traditional methods. Second, although graph neural networks have been shown to perform well in processing graph-structured data, they tend to require a lot of computational resources in practical applications and are very sensitive to the adjustment of parameters. In contrast, the algorithm proposed in this study pays more attention to the computational efficiency and stability of the algorithm while ensuring the recommendation quality, making it more suitable for use in resource-constrained environments.

This study mainly has five parts. Part 1 is an overall overview of the research. Part 2 is a summary of relevant literature at home and abroad. Part 3 is separated into two sections. The first section introduces the construction of SM GM and user recommendation services. The second section constructs a CF model in view of SM GM. Part 4 verifies the presented model. Part 5 is a summary of this study and prospects for future research. This study aims to provide a new approach and method for SM recommendation services to more effectively meet the personalized needs of users, and also provide valuable references for research in related fields. The abbreviations and their full names in this study are shown in Table 1.

Table 1.
A collection of abbreviations used in this study.

Abbreviation	Full name
SM	Social media
CF	Collaborative filtering
GM	Gene map
KMC	K-means clustering
SOA	Slope one algorithm
DNA	Deoxyribonucleic Acid
SMGs	Social media genomes
BGs	Biological genes
SGs	Social genes
NGs	Network genes
ALS-CF	Collaborative filtering based on alternating least squares
GAN-CF	Collaborative filtering based on generative adversarial networks
SVD	Singular value decomposition
SMG-CF	Collaborative filtering of social media gene maps

2. Literature review

As people's personalized needs increase, improving the effectiveness of recommendation services has attracted the attention of many scholars. Sun et al. found that user preference information differs significantly across social networks, and that the various factors in each type of information carry different amounts of information about user annotation behavior. To address this issue, they proposed a deep neural network for label recommendation, and verified its effectiveness and superiority through experiments.⁷ Peng et al. found that existing software crowdsourcing recommendation mechanisms do not take contextual information into account. They therefore proposed a new framework that combines worker ability correction with a short-term attention network for recommendation. Experiments indicated that this framework markedly improved crowdsourcing recommendations.⁸ Zhang et al. found that it is difficult to achieve good retrieval results with text labels alone. They therefore proposed an image-processing-based method for tourist attraction location recognition and personalized recommendation, and verified the accuracy and effectiveness of the algorithm and model through experiments.⁹ Lv et al. proposed an interpretable recommendation model based on extreme gradient boosting trees. The method provides reasons for its recommendations while maintaining recommendation quality, which helps users understand the logic of the system and increases their trust.¹⁰

Among existing recommendation services, the CF algorithm makes personalized recommendations based on user behavior and preferences and does not rely on the content information of items. This independence from item content has attracted a lot of attention from scholars. Liu et al. argued that the manufacturing service industry typically operates in the form of composite services, for which existing web-service-based personalized recommendation technologies are poorly suited. They therefore proposed a clustering-based CF algorithm to quantify customer preferences, combined it with other algorithms to form a new hybrid algorithm, and verified its effectiveness through experiments.¹¹ Tripathi et al. proposed a new method for k-means clustering (KMC) to address clustering quality issues. The KMC algorithm is a commonly used partition clustering algorithm in CF recommendation systems. The researchers used singular value decomposition to solve the initialization problem of k-means and then refined the result. Experiments indicated that this method improved clustering quality.¹² Almu et al. observed that CF recommendation with common similarity measures can take a long time to make predictions. They experimentally evaluated four similarity measures on the same dataset, and the results showed that the Taxicab geometry similarity measure was highly efficient in application.¹³ Badis et al. found that, owing to the lack of a central server with a complete view of the network, decentralized social networks have difficulty fully supporting some advanced features. They therefore used CF to recommend content in social networks. The method supported privacy protection and addressed the related cold start issues, and was shown to be effective.¹⁴

In summary, CF recommendation services have undergone extensive research and optimization, with personalized recommendation being a key direction.¹⁵ However, it is found that SM recommendation services do not achieve ideal results due to their complex and multidimensional data. Therefore, this paper constructs a CF recommendation model for SM GM, starting from the essential characteristics of SM, exploring more accurate and effective recommendation methods to help meet the personalized needs of SM users.

3. CF recommendation service in view of SM GM

In the era of big data, personalized recommendation technology is crucial for connecting users with high-quality content, so this study explores a new model of CF recommendation system, i.e., a recommendation model based on SM GM. The design of this model is mainly divided into two parts, which are “3.1 Construction of SM GM and User Recommendation” and “3.2 CF Recommendation Model in view of SM GM”. The design and implementation of the whole recommendation model can be divided into five parts: data preprocessing, user feature extraction, similarity calculation and nearest neighbor selection, recommendation list generation, and privacy protection.

In the data preprocessing stage, basic user information and social interaction data, such as liking, commenting, and retweeting behaviors, are collected through SM APIs and web crawling techniques. Then, text cleaning methods are used to remove noisy data, including removing useless characters and error messages, to ensure data quality. In the user feature extraction stage, multi-dimensional information such as biological features (gender, age), social features (friend network, participation groups) and online behavioral features (posting frequency, content preference) are first extracted based on the user's behavioral data to construct a SM GM. Secondly, the extracted features are encoded and processed for subsequent similarity calculation and recommendation algorithm application. In the similarity calculation and nearest neighbor selection stage, an improved weighted slope one algorithm (SOA) is designed, and then the similarity between items is calculated based on user ratings and SM behavior data using content analysis and CF techniques. The final designed recommendation algorithm is implemented in the recommendation list generation phase and the performance of the algorithm is tested. Finally, in terms of privacy protection measures, data security in storage and transmission is mainly protected by data anonymization process and data encrypted storage and transmission. Through the above steps, this study aims to implement a CF recommendation system that can effectively utilize SM data while taking into account the privacy and security of the users. This approach based on SMG provides a novel personalized recommendation service strategy by deeply analyzing the social networks and behavioral patterns of the users.

3.1 Construction of SM GM and user recommendation

The “SM genetic map” is a novel concept that establishes a connection between people's behavior on SM and their intrinsic traits, much as biological deoxyribonucleic acid (DNA) determines physiological characteristics. Although the concept of the “SM genetic map” originated from genetic analogies in the existing literature, this study extends it by applying it to recommender systems through a more detailed analysis of user behavioral data.¹⁶ Unlike traditional recommendation systems that use simple user characteristics such as age and geographic location, this study delves deeper into user behavior patterns and social interactions on social networks, resulting in more complex and dynamic user profiles that capture, for example, content preferences and social circle characteristics.

People's activities on SM, such as posting, music selection, and online duration, can be seen as clues to reveal their personalities and interests. These behavioral data can be collected and analyzed like studying biological DNA to identify key factors representing individual intrinsic characteristics. These factors are considered to be the “genes” of network individuals.^{17, 18} By collecting, processing, and analyzing data to extract user features, the decisive genes that determine these features are identified, ultimately forming a gene set as shown in equation (1).

C = \{c_{1}, c_{2}, \dots, c_{n}\} \to G = \{g_{1}, g_{2}, \dots, g_{n}\}

(1)

In equation (1), $c_{i}$ represents a user characteristic, $i = 1, 2, \dots, n$, and $C$ is the set of user features. $g_{i}$ represents a certain gene of the user, and $G$ represents the set of genes of the user. According to the characteristics of users, SM genomes (SMGs) can be divided into three categories: biological genes (BGs), social genes (SGs), and network genes (NGs). Therefore, the SM genome can be represented as $SMG = \{BGs, SGs, NGs\}$, as shown in Figure 1.

Figure 1.

SM genome.

In Figure 1, in the SM GM, each gene represents a certain feature of the network entity. Gene fragments represent the characteristics of network entities in a certain aspect, and these gene fragments are unrelated to each other. By connecting these gene fragments in a specific way, a SM GM is formed. In view of the different characteristics of SM, GM can be divided into three parts, including biological gene fragments, social gene fragments, and network gene fragments. In SM, users generate a lot of behavioral data. By analyzing and mining these data to extract attributes that can stably represent the basic characteristics of users, a SM GM can be obtained. This process mainly includes two steps: first, modeling the behavioral data of SM users, and then classifying the stable attributes of users. The framework for constructing a SM GM is shown in Figure 2.

Figure 2.

The construction framework of SM GM.

This architecture comprehensively considers multiple factors for studying the characteristics of SM users, and further separates them into three sub-categories: BGs, SGs, and NGs. The biogenetic component covers inherent characteristics of SM users such as gender, geographic location, and age. The SGs include information about the user's interests, which are collected and quantified by analyzing the user's interactions on SM, such as the frequency of liking and commenting on specific topics. For example, a user's interests can be identified by how often he or she participates in discussions and activities on a particular topic, and this data can be converted into a quantitative score. NGs, on the other hand, describe how active and influential a user is in a social network, which is quantified by analyzing the frequency of a user's posts and the average number of likes and comments on a post.

These three gene sub-classes, through interconnection and interaction, together form an SM GM, providing each user with a unique identifier. In the SM GM, the selection of gene fragments also affects the matching degree of users. BGs are relatively stable and vary little between users. If they alone are used as the matching criterion, the recommended users tend to come from the same region and age group. Therefore, according to the different characteristics of SM, it is necessary to select appropriate core genes and assign them appropriate weights. To eliminate human bias, this study assigns weights to gene fragments using the entropy weight method.^{19, 20} The entropy weight method is an objective weight determination method that is widely used in multi-attribute decision analysis. It determines the importance of each gene segment from its information entropy, an index of the degree of disorder in the data. In this study, the information entropy of each gene segment reveals how effective that segment is in distinguishing users. A gene segment with high information entropy has values that are spread almost evenly across users, so it discriminates little between them and receives a lower weight. Conversely, a gene segment with low information entropy has values that differ markedly between users, so it carries more information for characterizing users and receives a higher weight. The indicators used to determine the weights of SM gene segments are shown in Table 2.

Table 2.

SM gene fragment weight confirmation system.

Item	Gene segment	Evaluating indicator
SM genetic map	Biological genes	Gender
	Biological genes	Age
	Social genes	Content
	Social genes	Hobby
	Network genes	Activity
	Network genes	Impact degree

When assigning weights with the entropy weight method, the evaluation indicators must first be standardized, as shown in equation (2).

r_{ij} = \frac{x_{ij} - x_{i\min}}{x_{i\max} - x_{i\min}}

(2)

In equation (2), $x_{ij}$ and $r_{ij}$ respectively represent the values of the $j$-th core gene evaluation index for the $i$-th fragment before and after standardization. $x_{i\max}$ and $x_{i\min}$ represent the maximum and minimum values of the $j$-th core gene evaluation index for the $i$-th fragment.²¹ The decision matrix is shown in equation (3).

R = (r_{i j})_{m \times n}

(3)

In equation (3), m serves as the quantity of gene fragments. n serves as the quantity of evaluation indicators on the gene segment.²² The value of the entropy of the $i$ -th evaluation indicator is shown in equation (4).

H_{i} = \frac{- 1}{\ln n} \sum_{j = 1}^{n} f_{i j} \ln f_{i j}

(4)

In equation (4), $i = 1, 2, \dots$ and $f_{ij} = \frac{r_{ij}}{\sum_{j = 1}^{n} r_{ij}}$. The weight of the $j$-th evaluation indicator is then calculated, as shown in equation (5).

W_{j} = \frac{1 - H_{j}}{n - \sum_{j = 1}^{n} H_{j}}

(5)

In equation (5), $W_{j}, j = 1, 2, 3, 4, 5, 6$ respectively represent the sizes of the six core weights in the gene fragment, with $W_{1} + W_{2} + W_{3} + W_{4} + W_{5} + W_{6} = 1$ . By calculating the matching degree of three gene fragments of a user, corresponding matching degree data can be obtained.²³ Then, the weights obtained by the entropy weight method are fused with the matching data to obtain the final SM user matching data. Finally, the user recommendation list is generated by ranking it from largest to smallest, as shown in equation (6).

\mathrm{match}(u, v) = \sum_{i = 1}^{4} (\mathrm{match}_{i} * w_{i}) + (X_{u} + X_{v}) * w_{5} + (Y_{u} + Y_{v}) * w_{6}

(6)

In equation (6), $\mathrm{match}(u, v)$ represents the GM matching data of user u and user v, and $\mathrm{match}_{i}$ is the matching data of the $i$-th core gene. $X_{u}$ and $X_{v}$ represent the activity levels of user u and user v. $Y_{u}$ and $Y_{v}$ represent the degree of influence of user u and user v. $w_{i}$ is the weight value of each gene. The user recommendation model based on the SM GM is shown in Figure 3.

Figure 3.

A user recommendation model based on the SM GM.

Figure 3 shows that the user recommendation model first collects basic information and behavioral data of users through API calls and web crawler technology. Then, the initial data is organized through text preprocessing and Chinese word segmentation to extract effective information. It then calculates the matching degree of the six core genes and calculates the overall matching degree of the SM GM. Finally, it arranges the matching data in descending order and recommends the top N users to the target users.
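The weighting and ranking steps of equations (2) to (6) can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the implementation used in this study: the input is assumed to be a table whose rows are users and whose columns are the six evaluation indicators (the usual orientation of the entropy weight method), $0 \ln 0$ is taken as 0, and the function names `entropy_weights`, `match_score`, and `top_n` are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weights for a users-by-indicators table, after eqs. (2)-(5).

    Assumption: each row is one user, each column one evaluation indicator.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    entropies = []
    for j in range(n_cols):
        col = [float(row[j]) for row in rows]
        lo, hi = min(col), max(col)
        # Eq. (2): min-max standardisation of the indicator.
        r = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]
        total = sum(r)
        # Share of each user; a constant indicator is treated as uniform,
        # which gives it maximal entropy and hence (near) zero weight.
        f = [v / total for v in r] if total > 0 else [1.0 / n_rows] * n_rows
        # Eq. (4): normalised entropy, with 0 * ln 0 taken as 0.
        h = -sum(p * math.log(p) for p in f if p > 0) / math.log(n_rows)
        entropies.append(h)
    # Eq. (5): lower entropy gives a larger weight; the weights sum to 1.
    denom = n_cols - sum(entropies)
    return [(1.0 - h) / denom for h in entropies]


def match_score(core_matches, x_u, x_v, y_u, y_v, w):
    """Eq. (6): four core-gene matches plus activity (X) and influence (Y)."""
    core = sum(m * wi for m, wi in zip(core_matches, w[:4]))
    return core + (x_u + x_v) * w[4] + (y_u + y_v) * w[5]


def top_n(scores, n):
    """Return the n candidate user ids with the highest match score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In this sketch an indicator whose standardized values are concentrated in a few users receives a larger weight than one whose values are spread evenly, which is the behavior equation (5) prescribes.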

3.2 CF recommendation model in view of SM GM

Among current recommendation models, the CF recommendation model has particular advantages in privacy protection. First, CF algorithms rely primarily on user preferences or behavioral data rather than personally identifiable information. For example, a movie recommendation system works from a user's movie ratings or viewing history, not from the user's name or address. This reliance on anonymous user preferences helps protect the privacy of users' personal identities. Second, CF generates recommendations by aggregating data from a large number of users rather than relying on detailed data from individual users. Because the data of a single user is merged into a large data set, the risk of personal data being identified is reduced. Third, CF is based on similarities between groups of users rather than specific information about individual users, so recommendations come from pattern recognition over user behavior rather than in-depth analysis of one user's data. For these reasons, this study builds on CF: it improves the SOA to obtain an improved weighted SOA, and then introduces the SM GM to obtain a CF recommendation model based on the SM GM, which is applied to both person and item recommendations. By deeply analyzing SM data, the model not only extracts behavioral and content features that are directly related to users' interests, but also takes into account the structure of users' social networks. This integrated approach enables the system to consider the user's social interactions when recommending potential social connections, and to use the user's content preferences and historical behavioral data when recommending items, thereby creating a unified and more efficient recommender system.

In traditional CF recommendation, user-based CF determines the set of neighboring users, and item-based CF determines the set of neighboring items.²⁴ This study uses the SM GM to determine the set of neighboring users, and then uses an improved weighted SOA and item similarity to determine the set of neighboring items. The SOA is similar to the item-based CF algorithm, and its calculation process is simple and efficient. The first step of the SOA is to calculate the average deviation, as shown in equation (7).²⁵

\mathrm{dev}(i, j) = \frac{\sum_{u \in U(i) \cap U(j)} (S_{ui} - S_{uj})}{| U(i) \cap U(j) |}

(7)

In equation (7), $S_{ui}$ denotes the rating of user u for item i. The rating data are derived from user interaction records, including direct user ratings of items on the platform. To ensure data completeness and accuracy, data cleansing and preprocessing steps are applied to exclude invalid or anomalous rating data. Similarly, $S_{uj}$ represents the rating of user u for item j. $U(i) \cap U(j)$ is the set of users who have rated both item i and item j, and $| U(i) \cap U(j) |$ is the number of such users. $\mathrm{dev}(i, j)$ is the average deviation between the two items. The second step of the SOA is to generate recommendations for users, as shown in equation (8).

\mathrm{pre}(u, j) = \frac{\sum_{j \in S(u \to i)} (S_{ui} - \mathrm{dev}(i, j))}{| S(u) - \{i\} |}

(8)

In equation (8), $S(u)$ is the set of items rated by user u. $S(u \to i)$ is the set of items for which item i and at least one other item are rated by user u. $\mathrm{pre}(u, j)$ represents the predicted rating of user u on item j.
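The two SOA steps can be sketched in Python as follows. This is an illustrative sketch rather than the code used in this study. It follows the standard Slope One formulation, in which the sum runs over the items i that the user has already rated and that share at least one rater with the target item j, and the average is taken over those items. The names `average_deviations` and `predict`, the nested-dictionary rating layout, and the optional `weighted` flag (which switches to the co-rater weighting of equation (9)) are assumptions made for the example.

```python
from collections import defaultdict


def average_deviations(ratings):
    """Eq. (7): dev[i][j] is the mean of (S_ui - S_uj) over common raters.

    ratings maps user -> {item: rating}. Also returns the number of
    common raters for every ordered item pair.
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, s_ui in user_ratings.items():
            for j, s_uj in user_ratings.items():
                if i != j:
                    diff_sum[i][j] += s_ui - s_uj
                    count[i][j] += 1
    dev = {i: {j: diff_sum[i][j] / count[i][j] for j in diff_sum[i]}
           for i in diff_sum}
    return dev, count


def predict(ratings, dev, count, user, target, weighted=False):
    """Eq. (8): average of (S_ui - dev(i, target)) over the user's items.

    With weighted=True each term is weighted by the number of users who
    rated both items, as in eq. (9).
    """
    num = den = 0.0
    for i, s_ui in ratings[user].items():
        if i == target or target not in dev.get(i, {}):
            continue  # no user has rated both i and the target item
        w = count[i][target] if weighted else 1
        num += (s_ui - dev[i][target]) * w
        den += w
    return num / den if den else None


# Toy data: three users, three items.
toy = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = average_deviations(toy)
lucy_on_a = predict(toy, dev, count, "lucy", "A")
```

On this toy data the unweighted prediction for Lucy on item A is 5.25, the mean of 2.5 (from item B) and 8 (from item C). Because new ratings only change the running sums and counts, the deviations can be updated incrementally without recomputing the whole model.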

Relative to other algorithms, the SOA has an intuitive and simple principle, does not require complex mathematical knowledge, and can easily process new rating data.²⁶ When new rating information appears, there is no need to recalculate the entire model; only the relevant statistics need to be updated.²⁷ However, because the SOA is relatively simple, it may not fully capture the complex relationships between users and items, and it does not take into account how many users have rated both items, which greatly affects the reliability of the average deviation. Therefore, this study enhances the SOA by introducing a weighted value, as illustrated in equation (9).

\mathrm{pre}(u, j) = \frac{\sum_{j \in S(u \to i)} (S_{ui} - \mathrm{dev}(i, j)) * | S_{ij} |}{\sum_{j \in S(u \to i)} | S_{ij} |}

(9)

In equation (9), $| S_{ij} |$ is the number of users who have rated both item i and item j. $\mathrm{pre}(u, j)$ is the predicted rating of user u on item j. However, as the number of items increases, the weighted SOA still cannot ensure that every user has rated every item. To improve the recommendation, this study combines user rating similarity with semantic similarity to calculate the similarity between items. User rating similarity is calculated with the Pearson correlation, as shown in equation (10).²⁸

\mathrm{RatingSim}_{i, j} = \frac{\sum_{u \in U_{i, j}} (R_{ui} - \bar{R}_{i}) * (R_{uj} - \bar{R}_{j})}{\sqrt{\sum_{u \in U_{i, j}} (R_{ui} - \bar{R}_{i})^{2} * \sum_{u \in U_{i, j}} (R_{uj} - \bar{R}_{j})^{2}}}

(10)

In equation (10), $R_{ui}$ is the rating of user u for item i, $U_{i, j}$ is the set of users who have rated both item i and item j, and $\bar{R}_{i}$ and $\bar{R}_{j}$ represent the average ratings of item i and item j. When $\mathrm{RatingSim}_{i, j} > 0$, the larger the value, the greater the similarity between the two items. When $\mathrm{RatingSim}_{i, j} < 0$, the smaller the value, the lower the similarity between the two items. Semantic similarity is calculated with a measure similar to the Jaccard coefficient, where the semantic similarity of two different items depends on the ratio of their shared semantic descriptors to their total descriptors, as shown in equation (11).²⁹

\mathrm{SemSim}(i, j) = \frac{| N(i) \cap N(j) |}{| N(i) \cup N(j) |}

(11)

In equation (11), $N (i)$ represents the semantic description set of item i. After determining the similarity of project scores and semantic similarity, the study uses a linear weighting method to calculate project similarity, as illustrated in equation (12).

\mathrm{Sim}(i, j) = \gamma * \mathrm{RatingSim}_{i, j} + (1 - \gamma) * \mathrm{SemSim}(i, j)

(12)

In equation (12), $\gamma$ is the weight value, which is determined through an optimization process. The determination process takes into account the similarity between items and the history of users' ratings of these items, and ultimately yields a weight that reflects the effect of the similarity between different items on the predicted ratings. Combinations of items with lower rating variance are given higher weights, because low variance indicates that users rate these items more consistently, which increases the reliability of the predictions.
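Equations (10) to (12) can be combined in a short Python sketch. It is illustrative only and rests on several assumptions that are not stated in the text: item means are taken over the users who rated both items (which keeps the Pearson value within [-1, 1]), pairs with fewer than two common raters or zero variance get a rating similarity of 0, and the default `gamma=0.5` is a placeholder rather than the optimized weight described above. The function names are hypothetical.

```python
import math


def rating_sim(ratings, i, j):
    """Eq. (10): Pearson correlation of two items over their common raters.

    ratings maps user -> {item: rating}.
    """
    common = [u for u, r in ratings.items() if i in r and j in r]
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no similarity
    mean_i = sum(ratings[u][i] for u in common) / len(common)
    mean_j = sum(ratings[u][j] for u in common) / len(common)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j)
              for u in common)
    den = math.sqrt(
        sum((ratings[u][i] - mean_i) ** 2 for u in common)
        * sum((ratings[u][j] - mean_j) ** 2 for u in common)
    )
    return num / den if den else 0.0


def sem_sim(desc_i, desc_j):
    """Eq. (11): shared semantic descriptors over all descriptors."""
    union = desc_i | desc_j
    return len(desc_i & desc_j) / len(union) if union else 0.0


def item_sim(ratings, descriptors, i, j, gamma=0.5):
    """Eq. (12): linear blend of rating and semantic similarity.

    gamma=0.5 is a placeholder; the study determines it by optimization.
    """
    return (gamma * rating_sim(ratings, i, j)
            + (1 - gamma) * sem_sim(descriptors[i], descriptors[j]))
```

For example, two items whose ratings rise together across three common raters and that share one of three descriptors obtain a rating similarity of 1, a semantic similarity of 1/3, and a blended similarity of 2/3 at the placeholder weight.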

In the final constructed model, semantic similarity is integrated by analyzing the text content of the item descriptions. First, the description of each item is disambiguated and vectorized using natural language processing techniques. This process involves extracting keywords and converting them into vector form, which is usually done with the word embedding method Word2Vec. Next, the cosine similarity between the vectors of each pair of item descriptions is calculated and used as the semantic similarity. Finally, the computed semantic similarity is linearly combined with the traditional user-rating-based similarity to form a comprehensive similarity measure. With this method, the rating prediction model considers both the user's direct rating of an item and the semantic information of its content, which reflects the user's preferences more accurately and improves the performance of the recommendation system. The similarity between two items is thus expressed more reasonably, and the resulting item similarity is more credible. The similarity calculation process is shown in Figure 4.

Figure 4.

Similarity calculation process in view of project association.

This paper integrates the SM GM into the improved SOA to build a CF recommendation service model based on the SM GM. When calculating the rating deviation between items, this study introduces the SM user matching degree, as shown in equation (13).

\mathrm{dev}(i, j) = \frac{\sum_{u \in U(i) \cap U(j)} (S_{ui} - S_{uj}) * \mathrm{match}(u, v)}{| U(i) \cap U(j) |}

(13)

In equation (13), $\mathrm{match}(u, v)$ represents the GM matching degree between user u and user v. $S_{ui}$ and $S_{uj}$ are user u's ratings for item i and item j, respectively. $U(i)$ and $U(j)$ are the sets of users who have rated item i and item j, and $| U(i) \cap U(j) |$ is the number of users who have rated both. Based on equation (13), the study introduces item similarity to predict user ratings, as shown in equation (14).

\mathrm{pre}(u, j) = \frac{\sum_{j \in S(u \to i)} (S_{ui} - \mathrm{dev}(i, j)) * \mathrm{sim}(i, j)}{| S(u) - \{i\} |}

(14)

In equation (14), $\mathrm{pre}(u, j)$ represents the predicted rating of user u on item j. $S(u)$ represents the set of items rated by user u. $S(u \to i)$ represents the set of items for which item i and at least one other item are rated by user u. $\mathrm{sim}(i, j)$ is the similarity between item i and item j.
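A possible reading of equations (13) and (14) is sketched below in Python. The sketch makes assumptions that the text leaves open: user v in $\mathrm{match}(u, v)$ is taken to be the target user who receives the recommendation, the matching degrees are supplied as a precomputed dictionary, the sum in equation (14) runs over the items the target user has already rated, and the result is divided by the number of those items that share raters with the target item. The names `matched_deviation` and `predict_rating` are hypothetical.

```python
def matched_deviation(ratings, match_to_target, i, j):
    """Eq. (13): rating deviation between items i and j, where each common
    rater's difference is scaled by that rater's GM match with the target
    user (assumed reading of match(u, v)).

    ratings maps user -> {item: rating}; match_to_target maps user -> match.
    """
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return None
    total = sum((ratings[u][i] - ratings[u][j]) * match_to_target.get(u, 0.0)
                for u in common)
    return total / len(common)


def predict_rating(ratings, match_to_target, sim, user, target):
    """Eq. (14): similarity-weighted Slope One style prediction.

    sim is a callable sim(i, j) returning the item similarity of eq. (12).
    """
    total, used = 0.0, 0
    for i, s_ui in ratings[user].items():
        if i == target:
            continue
        dev = matched_deviation(ratings, match_to_target, i, target)
        if dev is None:
            continue  # no user has rated both i and the target item
        total += (s_ui - dev) * sim(i, target)
        used += 1
    return total / used if used else None
```

Sorting the predicted ratings of the unrated items in descending order and keeping the first K then gives the item recommendation list.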

This study analyzes user behavior data in SM, generates user and item datasets, and organizes the calculation results to form user matching data. The N best-matched users are recommended to the target user.³⁰ These users also form the neighbor set of the target user, and the item datasets of the users in the Top N list are obtained. The semantic and rating similarity between these item datasets and the target user's item dataset is calculated, and the item similarity data are obtained through weighted calculation. Users' ratings of items are then predicted from the user matching data and the item similarity data. Finally, the K items with the highest predicted ratings are recommended to the user, which completes the item recommendation service. The CF recommendation service model based on the SM GM built in this study is shown in Figure 5.

Figure 5.

CF recommendation service model in view of SM GM.
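The selection steps of this procedure (Top-N neighbors, weighted project similarity, and the final K recommendations) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the weight `alpha` that balances semantic and scoring similarity is an assumed parameter, since the weighting used in the study is not stated.

```python
import heapq
from typing import Callable, Dict, Iterable, List


def top_n_neighbors(target: str, users: Iterable[str],
                    match: Callable[[str, str], float], n: int) -> List[str]:
    """Return the n users whose gene map best matches the target user."""
    candidates = [u for u in users if u != target]
    return heapq.nlargest(n, candidates, key=lambda u: match(target, u))


def combined_similarity(semantic: float, scoring: float,
                        alpha: float = 0.5) -> float:
    """Weighted mix of semantic and scoring similarity (alpha is assumed)."""
    return alpha * semantic + (1.0 - alpha) * scoring


def top_k_projects(predicted: Dict[str, float], k: int) -> List[str]:
    """Return the k projects with the highest predicted ratings."""
    best = heapq.nlargest(k, predicted.items(), key=lambda kv: kv[1])
    return [project for project, _ in best]


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.2, "c": 0.5}
    neighbors = top_n_neighbors("t", ["a", "b", "c", "t"],
                                lambda t, u: scores[u], n=2)
    print(neighbors)
    print(top_k_projects({"x": 3.0, "y": 4.5, "z": 1.0}, k=2))
```

Using a heap-based selection avoids sorting the full candidate list when N and K are small relative to the number of users and projects.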

4. Performance verification of improved CF recommendation model

The data required for the experiment come from Sina Weibo. Relevant information is collected through API calls and web crawling, including personal information, follower lists, fan lists, and tweets posted or forwarded. To ensure the accuracy of the experiment, users with fewer than 15 fans and followers and users with no recent interaction behavior are excluded. The final dataset consists of 4,650 users, 89,214 tweets, and 53,246 interactions, and is divided into a training set and a test set at a ratio of 8:2. To compare the recommendation systems objectively, all experiments are conducted on the same device. The computer device information is shown in Table 3.

Table 3.
Computer equipment information.

Name	Configuration
Video card	GTX 1080 Ti
CPU	Intel Xeon E5
GPU-accelerated library	CUDA 10.0
Memory	64 GB
Operating system	Windows 10
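The 8:2 division of the dataset into training and test sets can be illustrated with a short sketch. It assumes the split is applied to the interaction records and uses a seeded random shuffle; both choices, and the function name, are assumptions, since the study does not state how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(records: Sequence[T], train_ratio: float = 0.8,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the records reproducibly and split them by train_ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # 53,246 placeholder interaction ids, matching the reported dataset size.
    train, test = split_train_test(range(53246))
    print(len(train), len(test))
```

Applied to 53,246 interactions, this yields 42,596 training records and 10,650 test records.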

The algorithms used for performance comparison are alternating least squares CF (ALS-CF), generative adversarial network CF (GAN-CF), singular value decomposition (SVD), and the proposed SM GM CF (SMG-CF). The training accuracy of each algorithm over the iterations on the dataset is shown in Figure 6.

Figure 6.

Comparison of iterative accuracy curves.

Figure 6 shows how the training accuracy of each algorithm changes as the number of iterations increases. The accuracy of every algorithm rises significantly with the number of iterations, and the SMG-CF algorithm is superior to the others in both convergence speed and maximum accuracy. The SMG-CF algorithm approaches convergence at the 130th iteration, with a convergence accuracy of 93.15%. The ALS-CF algorithm approaches convergence at the 270th iteration, with a convergence accuracy of 81.23%. The GAN-CF algorithm approaches convergence at the 290th iteration, with a convergence accuracy of 82.04%. The SVD algorithm approaches convergence at the 390th iteration, with a convergence accuracy of 80.18%. These results show little disparity in the convergence accuracy of ALS-CF, GAN-CF, and SVD, but ALS-CF and GAN-CF converge faster than SVD. Compared with ALS-CF, GAN-CF, and SVD, the SMG-CF algorithm needs 51.85%, 55.17%, and 66.67% fewer iterations to approach convergence, respectively, and its convergence accuracy is higher by 14.67%, 13.54%, and 16.18% in relative terms, respectively. The experiment validates the effectiveness of this study.
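The percentages quoted for Figure 6 follow from the reported iteration counts and convergence accuracies when each improvement is taken relative to the baseline value. The short script below reproduces that arithmetic; the variable and function names are illustrative only.

```python
# Iterations to approach convergence and convergence accuracy (%),
# as reported for Figure 6.
results = {
    "SMG-CF": (130, 93.15),
    "ALS-CF": (270, 81.23),
    "GAN-CF": (290, 82.04),
    "SVD": (390, 80.18),
}


def relative_change(baseline: float, proposed: float) -> float:
    """Change of `proposed` relative to `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


def compare_to(baseline_name: str, proposed_name: str = "SMG-CF"):
    """Return (percent fewer iterations, percent higher accuracy)."""
    base_iters, base_acc = results[baseline_name]
    prop_iters, prop_acc = results[proposed_name]
    fewer_iterations = -relative_change(base_iters, prop_iters)
    higher_accuracy = relative_change(base_acc, prop_acc)
    return round(fewer_iterations, 2), round(higher_accuracy, 2)


if __name__ == "__main__":
    for name in ("ALS-CF", "GAN-CF", "SVD"):
        print(name, compare_to(name))
```

For example, ALS-CF needs 270 iterations against 130 for SMG-CF, so the reduction is (270 − 130) / 270 ≈ 51.85%.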

The 89,214 collected tweets are divided into three categories: film and television resources, music resources, and entertainment news resources. After data cleaning and a series of preprocessing operations, the final numbers of records for film and television resources, music resources, and entertainment news resources are 16,593, 14,365, and 18,215, respectively. The film and television resources dataset contains comprehensive information about movies and TV dramas, covering title, director, starring actors, year of production, genre, duration, rating, and user comments. The dataset also includes the viewing history and preferences of the audience, as well as the number of times each production has been viewed and the percentage of users who have liked it, which helps to analyze users' viewing habits and preference patterns. The music resources dataset includes music of various genres, with the data for each song including the song title, artist, release date, music style, duration, ratings, and user comments. Similarly, the dataset records users' listening history and ratings, making it possible to analyze which factors influence users' music choices and preferences. The entertainment news dataset contains various types of entertainment information, such as movie release information, celebrity updates, and music festival events. Each news item includes the news title, release time, news source, news category, content summary, and full text. The dataset also collects users' clicks, shares, and comments on the news, which helps to assess the popularity of the news and the attention it receives from users. The three datasets are used to construct a recommender system. By analyzing users' interactive behavior with the content, the recommender system learns users' preferences and predicts new content that may be of interest.

In the experiments, all baseline algorithms are trained and tested on the same dataset as the proposed method to ensure fairness. Algorithms such as SVD do not directly support heterogeneous information processing. Therefore, SM data are transformed through feature engineering into structured feature vectors, such as user activity levels and social connections, so that these algorithms can make use of the complex information in SM. To compare the recommendation performance of different models, the study uses prediction accuracy as the evaluation metric. The prediction accuracy is measured by calculating the difference between the user ratings predicted by the system and the actual user ratings. After several experiments, the prediction accuracies of the four models on the three datasets (film and television resources, music resources, and entertainment news resources) are obtained, as shown in Figure 7.

Figure 7.

Prediction accuracy of different models.
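The text describes prediction accuracy as being derived from the difference between predicted and actual ratings but does not give the formula. One common way to turn such differences into an accuracy-style score is to normalize the mean absolute error by the width of the rating scale. The sketch below illustrates that approach as an assumption, not as the metric actually used in the study; the 1 to 5 rating range and all names are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def accuracy_from_mae(predicted: Sequence[float], actual: Sequence[float],
                      r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Map MAE to [0, 1]: 1 means exact predictions, 0 the worst possible."""
    return 1.0 - mean_absolute_error(predicted, actual) / (r_max - r_min)


if __name__ == "__main__":
    predicted = [4.0, 3.0, 5.0, 2.0]
    actual = [5.0, 3.0, 4.0, 2.0]
    print(accuracy_from_mae(predicted, actual))
```

In the toy example the mean absolute error is 0.5 on a scale of width 4, which gives a score of 0.875.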

Figure 7(a) to (c) show the prediction accuracies of the four models SMG-CF, ALS-CF, GAN-CF, and SVD for film and television resources, music resources, and entertainment news resources, respectively. In Figure 7(a), the prediction accuracy of the SMG-CF model is significantly higher than that of the other three recommendation models. The prediction accuracy of SMG-CF is 92.11%, which is 23.44%, 36.72%, and 40.89% higher than that of ALS-CF, SVD, and GAN-CF, respectively. In Figure 7(b), the prediction accuracy of SMG-CF for music resources is 94.67%, which is 3.78%, 3.42%, and 3.09% higher than that of ALS-CF, SVD, and GAN-CF, respectively. In Figure 7(c), the prediction accuracy of SMG-CF for entertainment news resources is 63.12%, which is 47.32%, 17.89%, and 56.81% higher than that of ALS-CF, SVD, and GAN-CF, respectively. In summary, SMG-CF has the highest prediction accuracy on all three datasets, which shows that the model has the best recommendation effect.

To further validate the performance of the recommendation models, this study calculates the recall rates of the different recommendation models from the obtained accuracy and plots the precision-recall (PR) curve, as shown in Figure 8.

Figure 8.

Comparison of classifiers PR curves.

The PR curve is used to validate the performance of the model: the larger the area under a model's PR curve, the better the performance of the model. Figure 8(a) shows the PR curves of the algorithms in film and television resource recommendation, where the models rank from best to worst as SMG-CF, ALS-CF, GAN-CF, and SVD. Figure 8(b) shows the PR curves of the algorithms in music resource recommendation. The SMG-CF recommendation model has the best performance, the ALS-CF and GAN-CF recommendation models have similar performance, and the SVD recommendation model has the worst performance. Figure 8(c) shows the PR curves of the algorithms in entertainment news recommendation, where the models again rank from best to worst as SMG-CF, ALS-CF, GAN-CF, and SVD. Overall, in the recommendation tests of the three resources, the SMG-CF recommendation model shows the strongest performance, followed by the ALS-CF recommendation model, the GAN-CF recommendation model, and finally the SVD recommendation model.
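The area criterion used to compare the PR curves can be illustrated with a small sketch that builds a PR curve from ranked recommendation scores and integrates it with the trapezoidal rule. The inputs (a relevance score per recommended item and a binary label for whether the user actually liked it) and the function names are assumptions made for illustration; the study does not describe how its curves were computed.

```python
from typing import List, Sequence, Tuple


def pr_points(scores: Sequence[float],
              labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(recall, precision) after each item, ranked by descending score."""
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    true_positive = 0
    for k, idx in enumerate(order, start=1):
        true_positive += labels[idx]
        points.append((true_positive / total_positive, true_positive / k))
    return points


def area_under_pr(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the PR curve, starting from recall 0."""
    curve = [(0.0, points[0][1])] + list(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


if __name__ == "__main__":
    perfect = area_under_pr(pr_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))
    mixed = area_under_pr(pr_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
    print(perfect, mixed)
```

A ranking that places every liked item first reaches the maximum area of 1, while interleaving liked and disliked items lowers the area, which is why a larger area indicates a better recommendation model.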

In this study, user satisfaction is assessed through an online questionnaire and real-time feedback mechanism. The study is conducted with 50 volunteers as experimental subjects, who are surveyed about their satisfaction with the recommendation service. The questionnaire includes several questions about the relevance of the recommended content and the user experience, and is rated on a 5-point Likert scale. Meanwhile, a real-time feedback mechanism within the system allows users to directly rate whether they liked or disliked the recommended content to collect instant satisfaction data. The satisfaction scores for different recommendation models are shown in Figure 9.

Figure 9.

User satisfaction evaluation.

Figure 9 plots the user satisfaction survey results as areas, with larger areas indicating higher user satisfaction. The SMG-CF recommendation service receives the highest satisfaction, with an average satisfaction rate of 91.89%, followed by GAN-CF with an average of 89.02% and ALS-CF with an average of 87.88%. SVD's recommendation service receives the lowest satisfaction, with an average of 86.34%. This indicates that the SMG-CF recommendation service has high recommendation accuracy and achieves high user satisfaction, verifying the effectiveness of the study.

In previous optimization of recommendation models, the integration of different algorithms resulted in an increase in CPU execution time (ET) and a decrease in efficiency. Therefore, this study also conducts CPU ET comparison experiments with different recommendation models, and the relevant outcomes are shown in Figure 10.

Figure 10.

Comparison of CPU ET.

Figure 10 shows that as the number of resources grows, the CPU ET of each recommendation model increases at an accelerating rate. When the number of resources reaches 100, the CPU ET is 1.88 s for the SMG-CF recommendation model, 2.84 s for the SVD recommendation model, 3.91 s for the ALS-CF recommendation model, and 5.81 s for the GAN-CF recommendation model. Compared with SVD, ALS-CF, and GAN-CF, the CPU ET of the SMG-CF recommendation model is lower by 33.80%, 51.92%, and 67.64%, respectively. This verifies that the SMG-CF recommendation model also performs better in CPU ET.
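The reported reductions follow directly from the CPU ET values at 100 resources, taking each reduction relative to the baseline model. In the worked calculation below, $T_m$ is shorthand introduced here for the CPU ET of model $m$ and is not notation from the study.

```latex
\text{reduction}(m) = \frac{T_m - T_{\text{SMG-CF}}}{T_m} \times 100\%

\begin{aligned}
\text{SVD:}    &\quad \frac{2.84 - 1.88}{2.84} = \frac{0.96}{2.84} \approx 33.80\% \\
\text{ALS-CF:} &\quad \frac{3.91 - 1.88}{3.91} = \frac{2.03}{3.91} \approx 51.92\% \\
\text{GAN-CF:} &\quad \frac{5.81 - 1.88}{5.81} = \frac{3.93}{5.81} \approx 67.64\%
\end{aligned}
```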

Figure 11 shows the accuracies of the four algorithms on the test set and the training set when the number of recommendations finally presented to the user is set to 10, 20, and 30. In Figure 11(a), SMG-CF has the highest accuracy for every number of recommendations compared with the other algorithms, with an average recommendation accuracy of 88%. In Figure 11(b), when the number of recommendations is 10, 20, and 30, the recommendation accuracy of SMG-CF on the test set is 81%, 88%, and 89%, respectively, which is significantly better than that of the other algorithms.

Figure 11.

Accuracy of each model under different recommended numbers.

The performance comparison of SMG-CF with other baseline algorithms is given in Table 4. SMG-CF has the highest average recommendation accuracy (98.3%) and the shortest average recommendation time (0.6 s) in completing the recommendation task. The methods of Literature³¹, Literature³², and Literature³³ and graph neural networks also achieve an average recommendation accuracy of more than 90%, but these recommendation algorithms take longer, especially graph neural networks. This is because graph neural networks tend to require substantial computational resources in practical applications and are very sensitive to parameter tuning, so their time consumption is far greater than that of the other baseline algorithms.

Table 4.

Performance comparison of other baseline algorithms with SMG-CF.

Algorithm name	Average recommendation accuracy (%)	Average recommendation time (s)
SMG-CF	98.3	0.6
Literature³¹	91.2	1.8
Literature³⁴	89.5	2.2
Literature³²	95.0	3.8
Literature³⁵	88.6	3.1
Literature³⁶	88.2	2.4
Literature³³	92.3	2.6
Graph neural networks	90.4	3.5

5. Conclusion

The personalized recommendation service of SM is mainly aimed at providing users with a better user experience. With the significant increase in SM resources and user numbers, the accuracy of existing recommendation methods has begun to decline. This paper aimed to solve the challenges in SM recommendation services. By constructing an SM GM and combining it with the CF algorithm, a new recommendation service method was proposed. The results demonstrated that the SMG-CF recommendation model exhibited the strongest performance in recommendation tests for film and television resources, music resources, and entertainment news, followed by the ALS-CF, GAN-CF, and SVD recommendation models. In addition, the SMG-CF recommendation service received the highest satisfaction, with an average satisfaction rate of 91.89%, followed by GAN-CF (89.02%) and ALS-CF (87.88%). SVD's recommendation service received the lowest satisfaction, with an average of 86.34%. The experiment indicated that utilizing the rich information resources of SM, combined with appropriate algorithms and models, could better meet users' personalized recommendation needs. This study provides a new perspective and method to address the issue of SM recommendation services, providing valuable references for further research and practical applications. However, while the SMG-CF model performs well in the current experiments, its ability to handle large-scale datasets has not been fully validated. The massive growth of SM data can take a toll on the computational efficiency of models and the quality of recommendations. With the rapid growth of SM data, storing and efficiently retrieving all the data becomes the primary challenge for practical applications.
Traditional databases struggle to meet the real-time processing requirements of large-scale data, and large-scale data processing will significantly increase the demand on computational resources such as CPU, memory, and GPU. Moreover, the construction and updating of SMG-CF involve complex matrix operations and iterative optimization, whose cost will also increase dramatically as the data size grows. Therefore, future research should focus on improving the scalability of the model and its ability to handle large datasets, and should comprehensively consider optimization algorithms to reduce computational overhead. In addition, there are still many challenges for the scalability of models in practical applications. To achieve large-scale dynamic data processing, the model should be able to scale linearly with the increase of data volume and improve the processing capability by adding resources on a single node. Meanwhile, in the dynamically changing SM environment, the model should be able to adaptively learn new user behaviors and preferences while maintaining the memory of old data. Therefore, future research should consider introducing online learning or incremental learning mechanisms to update the model parameters in real time and increase the scalability of the model. The dynamic and complex nature of SM data is a further challenge: users' behaviors and preferences change over time, and existing models may not be able to reflect these changes in a timely manner. Future research could explore real-time data processing and dynamic learning mechanisms to ensure that recommendation systems can adapt to these rapidly changing environments. In addition, despite the use of anonymization in this study, how to effectively use personal data without violating user privacy is still a question that needs further discussion.
Future research should focus on how to optimize data utilization efficiency while protecting user privacy. Therefore, although this study has achieved certain results in the field of SM recommendation services, it still needs further exploration and improvement in the aspects of algorithm scalability, real-time performance and user privacy protection. Future research can further explore these areas to more fully address the challenges of SM recommendation services.

Footnotes

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Shaw, Mayo. Music education and distance learning during COVID-19: a survey. Arts Educ Policy Rev 2022; 123: 143–152.

2. Aichner, Grünfelder, Maurer, et al. Twenty-five years of social media: a review of social media applications and definitions from 1994 to 2019. Cyberpsychol Behav Social Networking 2021; 24: 215–222.

3. Meier, Reinecke. Computer-mediated communication, social media, and mental health: a conceptual and empirical meta-review. Communic Res 2021; 48: 1182–1209.

4. Liu, Zeng. Knowledge graph-based multi-context-aware recommendation algorithm. Inf Sci (Ny) 2022; 595: 179–194.

5. Bath, Daubney, Mackrill, et al. The declining place of music education in schools in England. Child Soc 2020; 34: 443–457.

6. Yang, Gao. Personalized content recommendation in online health communities. Ind Manag Data Syst 2022; 122: 345–364.

7. Sun, Zhu, Jiang, et al. Hierarchical attention model for personalized tag recommendation. J Assoc Inf Sci Technol 2021; 72: 173–189.

8. Peng, Wan, Wang, et al. Deep learning-based recommendation method for top-K tasks in software crowdsourcing systems. J Ind Manage Optim 2023; 19: 6478–6499.

9. Zhang, Liu, et al. Location identification and personalized recommendation of tourist attractions based on image processing. Trait Signal 2021; 38: 197–205.

10. Yang, Zeng. An interpretable mechanism for personalized recommendation based on cross feature. J Intell Fuzzy Syst 2021; 40: 9787–9798.

11. Liu, Wang, et al. A multi-attribute personalized recommendation method for manufacturing service composition with combining collaborative filtering and genetic algorithm. J Manuf Syst 2021; 58: 348–364.

12. Tripathy, Champati, Patnaik. SVD-initialised K-means clustering for collaborative filtering recommender systems. Int J Mange Dec Mak 2022; 21: 71–91.

13. Almu, Bello. An experimental study on the accuracy and efficiency of some similarity measures for collaborative filtering recommender systems. Int J Comput Eng Res Trends 2021; 8: 33–39.

14. Badis, Amad, Assani, et al. P2PCF: a collaborative filtering based recommender system for peer to peer social networks. J High Speed Netw 2021; 27: 13–31.

15. Salloum, Rajamanthri. Implementation and evaluation of movie recommender systems using collaborative filtering. J Adv Inf Technol 2021; 12: 189–196.

16. Thompson, Johnson. Enhancing social media analytics with behavioral genomics for advanced user profiling. J Comput Soc Sci 2022; 5: 204–219.

17. Choudhuri, Adeniye, Sen. Distribution alignment using complement entropy objective and adaptive consensus-based label refinement for partial domain adaptation. Artif Intell Appl 2023; 1: 43–51.

18. DTK, EHL, Chu SKW. Engaging students in creative music making with musical instrument application in an online flipped classroom. Educ Inf Technol 2022; 27: 45–64.

19. Moldovan. Digital technologies and their impact in music education/aportul tehnologiilor digitale în educația muzicală. Tehnologii Informatice şi de Comunicaţii în Domeniul Muzical 2021; 12: 13–19.

20. Ouyang, Zheng, Jiao. Artificial intelligence in online higher education: a systematic review of empirical research from 2011 to 2020. Educ Inf Technol 2022; 27: 7893–7925.

21. Joe MCV, Raj DJS. Location-based orientation context dependent recommender system for users. J Trends Comput Sci Smart Technol 2021; 3: 14–23.

22. Gao, Liu. An introduction to key technology in artificial intelligence and big data driven e-learning and e-education. Mobile Netw Appl 2021; 26: 2123–2126.

23. Javed, Shaukat, Hameed, et al. A review of content-based and context-based recommendation systems. Int J Emerg Technol Learn (iJET) 2021; 16: 274–306.

24. Business model innovation and experimentation in transforming economies: ByteDance and TikTok. Manage Organ Rev 2021; 17: 382–388.

25. Larson, Harvey, Rubin, et al. Regulatory frameworks for development and evaluation of artificial intelligence–based diagnostic imaging algorithms: summary and recommendations. J Am Coll Radiol 2021; 18: 413–424.

26. Zhang, Cai. A potential friend recommendation algorithm for obtaining spatial information. J Softw 2021; 16: 46–54.

27. Nitu, Coelho, Madiraju. Improvising personalized travel recommendation system with recency effects. Big Data Min Anal 2021; 4: 139–154.

28. Prathiba, Raja, Anbalagan, et al. Federated learning empowered computation offloading and resource management in 6G-V2X. IEEE Trans Netw Sci Eng 2021; 9: 3234–3243.

29. Wei, Zhou. A hybrid probabilistic multiobjective evolutionary algorithm for commercial recommendation systems. IEEE Trans Comput Soc Syst 2021; 8: 589–598.

30. Zhan, Liu, et al. A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion. IEEE J Biomed Health Inform 2021; 26: 1928–1936.

31. Dong, Chawla, Swami. metapath2vec: scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 2017, pp. 135–144.

32. Zhao, Jin, Liu, et al. Heterogeneous information network embedding for user behavior analysis on social media. Neural Comput Appl 2022; 34: 5683–5699.

33. Zheleva, Guiver, Mendes Rodrigues, et al. Statistical models of music-listening sessions in social media. In: Proceedings of the 19th international conference on the World Wide Web, 2010, pp. 1019–1028.

34. et al. Graph neural network for customer engagement prediction on social media platforms, 2021.

35. Burke. Hybrid recommender systems: survey and experiments. User Model User Adapt Interact 2002; 12: 331–370.

36. Guy. Social recommender systems. In: Recommender systems handbook, 2023, pp. 835–870. New York, NY: Springer US.