A hybrid approach for article recommendation in research social networks

Abstract

With the prevalence of research social networks, determining effective methods for recommending scientific articles to online scholars has become a challenging and complex task. Current studies on article recommendation focus on digital libraries and reference sharing websites, whereas research social networking websites have seldom been studied. Existing content-based and collaborative filtering approaches suffer from the problem of data sparsity. In addition, the quality of articles has been largely ignored in previous studies, which raises the need for a unified recommendation framework. We propose a hybrid approach that combines relevance, connectivity and quality to recommend scientific articles. The effectiveness of the proposed framework and methods is verified through a user study on a real research social network website. The results demonstrate that our proposed methods outperform baseline methods.

Keywords

Article recommendation; connectivity analysis; quality analysis; relevance analysis; research social networks

1. Introduction

The rapid development of information technology, especially Web 2.0, has led to a tremendous growth in online content. The presence of massive amounts of information has posed significant challenges to the discovery of new information, particularly in academia. The number of scientific articles has experienced explosive growth. Over 66 million digital object identifiers (DOIs) have been registered, with each DOI linked to one distinct scientific item; these DOIs include more than 65 million journal articles.¹ Researchers complain of information overload and are frustrated by the search for relevant scientific information. Such an environment urgently requires an efficient and effective information acquisition technology that can help researchers to quickly find relevant articles of interest.

The methods that researchers use for information acquisition have dramatically transformed with the development of information technologies. Researchers in the print age traditionally searched for relevant articles by using library catalogues. This method was ineffective because of the limited number of journal articles and books available in libraries. Online access services, such as Google Scholar² and Web of Science database,³ provide a powerful search tool for finding relevant articles. Digital libraries make academic resources available online and provide a variety of convenient functions for effective information retrieval. For instance, keyword-based search is a convenient and efficient approach to retrieve information. However, forming queries for finding new research articles can be difficult when researchers are uncertain about what they are looking for.

With the spread of social sharing websites (such as CiteULike⁴) and social networking services (such as Researchgate⁵ and Scholarmate⁶), researchers can now easily share references of interest and promote their own publications. Social networking applications facilitate information search and provide online users with freedom while creating new challenges related to information overload. In this article, we attempt to provide an automatic process of recommending the most relevant articles to scholars in research social networks.

Article recommendation is a hot research topic that has been intensively studied in different contexts [1]. Earlier studies proposed content-based (CB) methods that leverage textual information related to the target user’s interests and employed keyword weighting techniques to retrieve relevant articles in digital libraries [2–5]. However, keywords generated through a researcher’s query often have semantic ambiguities. Such semantic relationships may also exist in the documents. Therefore, traditional CB approaches suffer from a mismatch problem, that is, relevant documents that do not exactly match the researcher’s query are discarded. In several studies, collaborative filtering (CF) approaches were employed to leverage the preferences of like-minded users in recommending interesting articles in social computing contexts [6–9]. However, these approaches cannot achieve the performance observed in taste-related domains (movies, videos and online purchases) because of the sparsity of the preference matrix, in which the number of scholars (users) is small relative to the number of articles (items). Hence, current studies have focused on hybrid recommendation approaches that leverage the advantages of CB and CF approaches and alleviate their disadvantages [10–14]. Current hybrid methods combine relevance and connectivity features to build recommendation models while ignoring information quality.

In this article, a hybrid approach is proposed to tackle the key challenges highlighted above. The proposed approach leverages relevance, connectivity and quality features in research-oriented social networks to profile online users. Three analysis modules are employed to model the recommendation process and to ensure that a satisfactory recommendation list is provided. An experiment is conducted on a Chinese research social network website to verify the effectiveness of the proposed method. The results show that the proposed approach outperforms the baseline methods in terms of recommendation accuracy metrics.

The rest of the article is organised as follows. In section 2, we survey the related work on article recommendation. The details of the social-network-empowered article recommendation method are introduced in section 3, while section 4 presents the design and methodology used in the experiments. The results are analysed and discussed in section 5. We conclude the work and present the future work in section 6.

2. Related work

Personalisation techniques enable the tailoring of content and services to individuals based on their preferences and tastes. As a primary personalisation tool, a recommender system matches potentially interesting content with user expectations [15]. Recommender systems have gained increased popularity, resulting in huge profits for the industry. They are designed to recommend microblogs [16], news [17], stories [18], movies [19] and so forth. These systems show great prospect and potential to serve scientific communities. In academic contexts, such systems are often used to recommend relevant scientific information to researchers and to consequently reduce search efforts. Several researchers have focused on article recommendation from various perspectives. The related work is reviewed in terms of application domains and recommendation techniques.

2.1. Application domains

Application domains of article recommendation fall into three main categories: digital libraries, reference sharing websites and academic social networking websites.

2.1.1. Digital libraries (D1)

With the prevalence of the Internet, digital libraries have been frequently used by diverse communities of users and have thus become a source of scientific information for scholars [20]. Digital libraries can be informally defined as collections of information with associated services that are delivered to user communities using a variety of technologies [21]. Digital libraries have evolved rapidly over the past several decades but are still limited to providing only basic search functions. As the volume of information managed by digital libraries increases, the needs of users have also become increasingly complex. Furthermore, users have become frustrated with the limitations of basic facilities. Currently, several digital libraries have begun to offer personalisation functionalities. These functionalities include personalised alert services that notify users with a list of new and relevant documents. Recommender systems are also incorporated into these libraries to meet information needs. Bollacker et al. [22] developed several recommendation strategies to aid researchers in quickly discovering relevant scientific literature.

2.1.2. Reference sharing websites (D2)

Resource sharing systems have become increasingly widespread with the development of Web 2.0 technologies. These systems allow users to upload various types of resources and select objects of interest with personalised tags. Mainstream reference sharing websites include CiteULike, Bibsonomy,⁷ Zotero⁸ and Mendeley. A myriad of studies have proposed different kinds of paper recommendation methods based on these websites. CiteULike is a social tagging website that offers a free service to aid researchers in storing, organising and sharing the scholarly papers they are reading. Articles are often stored with their metadata (e.g. title, author and year) and include links to the pages of publishers. Bibsonomy is another social bookmarking and publication sharing system that is focused on bookmarking references and team-oriented publication management. The Bibsonomy system has gained much attention in academia because it has provided an open application programming interface from the beginning [23]. Zotero and Mendeley focus on the reference management function that assists researchers in organising bibliographic documents.

2.1.3. Academic social network websites (D3)

Reference sharing websites help readers share and find relevant papers, whereas academic social networking websites (e.g. Researchgate, Academia.edu⁹ and Scholarmate) focus on the producers of these papers. Researchgate is a typical social networking site for researchers and is often described as a mixture of ‘Facebook, Twitter, and LinkedIn’ because of similar social features.¹⁰ Madisch [24] noted that Researchgate as a social network is the first step towards Science 2.0. This social network has also been distinguished for its academic discussion functions. Academia.edu is an academic social networking site founded by an Oxford University philosopher. The primary rationale of the site is to connect readers to authors to facilitate the raising of queries on recently read articles [25]. Previous studies investigated the two academic social networking websites in terms of their facilities and the implications of their use. However, research that explores recommender systems in these websites is scarce.

2.2. Recommendation techniques

Three types of recommendation techniques have been employed in the recommendation of scientific articles to researchers. The choice of recommendation technique often depends on the type of feature used. CB techniques commonly use relevance features to search for relevant articles, whereas CF techniques focus on connectivity features. Hybrid techniques often combine two or more features to build recommendation models.

2.2.1. CB methods

CB filtering methods have been proposed to discover interesting articles. Some studies [5,26–29] have suggested that CB approaches are effective in locating textual documents that are relevant to a topic. These methods often require the extraction of personal preferences to build user profiles from document content. User profiles can be constructed through either explicit declaration by users or observation of users’ actions. A successful digital library system, CiteSeer [22], was designed to perform information-filtering and knowledge-discovery functions for online users. CiteSeer uses the personal profiles of researchers to track and recommend relevant research articles. In the study, the different information sources on papers and reviewers were combined for recommendation with the help of a word-based information representation language system. The CB approach used in the work of Soo Kim [30] calculated keyword scores based on locations and frequencies within the text. The semantic expansion method [26] and concept-based method [5,31] have been proposed to address the problem of inadequate information as well as to enhance user profiling and achieve highly relevant recommendations. Most of these methods face the problem of generalisability because they rely on existing dictionaries and taxonomies, such as WordNet and ACM Taxonomy. In this study, we construct a keyword similarity matrix for semantic expansion to address the mismatch problem.

2.2.2. CF methods

CF methods have attracted increasing attention in recent years because of the prevalence of social bookmarking and social networking websites. Bogers and Van den Bosch [32] applied the traditional CF approach in recommending scientific publications in the CiteULike database and found that a user-based approach is better than an item-based approach because of data distribution factors. Parra and Brusilovsky [33] proposed and evaluated three variants of user-based CF article recommendation algorithms. In their study, BM25-boosted CF achieved better performance compared with the other two CF methods (classic CF and neighbour-weighted CF) because of tag contribution. To overcome the data sparsity problem, Vellino [34] proposed usage-based and citation-based methods for recommending research articles. In addition to collective relations involved in user–item pairs, other social relations have also been analysed and incorporated into CF methods to improve performance. The literature shows that social relations can be used to improve recommendation quality. Guan et al. [35] mined tagging data (user–tag–item assignments) and proposed a graph-based learning algorithm for document recommendation.

2.2.3. Hybrid methods

CF and CB approaches have unique advantages and disadvantages. Several researchers have attempted to combine both techniques into hybrid ones to improve performance [11,12,14,36]. The main assumption of hybrid methods is that fusing the algorithms could provide more accurate recommendations than a single algorithm could and that the disadvantages of each algorithm could be overcome by the other algorithms. Bogers and van den Bosch [37] considered tagging information and content metadata information, and proposed different fusion strategies for different algorithms. They found that fusing methods significantly improves recommendation accuracy. To incorporate relevance and connectivity features into a unified model, Wang and Blei [8] proposed a hybrid method that combines the merits of traditional CF and probabilistic topic modelling and is capable of providing recommendations on both existing and newly published articles. By contrast, Li et al. [12] proposed a topic regression MF (tr-MF) model based on the assumption that users share similar preferences when they bookmark similar articles. In Lee et al. [14], a hybrid method that combines the CB and graph-based approaches was proposed for paper recommendation in DBpia.

Table 1 shows the selected major works on article recommendation, from which general conclusions can be drawn. For application domains, traditional article recommendation studies have focused on digital libraries while recent works have paid attention to reference sharing websites (or social tagging sites). However, few studies have investigated article recommendation in research-oriented social networking websites. Such studies are low in number because of the absence of a public dataset for article recommendation in research-oriented social networks and the high cost of conducting user studies on research-oriented social networks. For recommendation techniques, three types of recommendation approaches in E-commerce contexts have been borrowed for use in academic contexts. CB approaches often provide recommendations based on relevance features, and direct keyword weighting methods have been widely employed in previous studies. CF approaches usually build recommendation models based on connectivity features, especially behaviour connections. Hybrid approaches combine two types of features and often outperform basic CB and CF approaches.

Table 1.

Taxonomy of article recommendation studies.

Previous studies	Domains			Techniques			Dataset
Previous studies	D1	D2	D3	CB	CF	Hybrid	Dataset
Bollacker et al. [22]	√			TFIDF and CCIDF			CiteSeer
Hwang et al. [3]	√			Association rule			NSYSU-ETD system
Hwang and Chuang [6]	√			FP, ARHP and ACHP		Weighted	NSYSU-ETD system
Gori and Pucci [38]	√				PageRank		ACM portal
Lin and Wilbur [4]	√			Probabilistic model			PubMed portal
Liang et al. [26]	√			SAM			NCLT
Bogers and Van den Bosch [7]		√		TFIDF	UCF and ICF		CiteULike
Lee and Brusilovsky [39]		√				Mixed	CiteULike
Parra and Brusilovsky [33]		√				Combined	CiteULike
Bogers and Van Den Bosch [37]		√				Weighted	CiteULike
Wang and Blei [8]		√				Combined	CiteULike
Doerfel et al. [11]		√			FolkRank		CiteULike
Tian and Jing [40]		√			RWR		CiteULike
Vellino [34]	√					Switch	Synthese
Huang et al. [28]	√	√		Global and local			CiteSeer and CiteULike
Chakraborty et al. [41]		√			DiSCern Ranking		Microsoft Academic Search
Xia et al. [9]		√				Combined	CiteULike
Zhao et al. [5]	√			Spreading activation model			ACM portal
Our Approach			√	SeCon	RWRF	Weighted	Scholarmate

TFIDF: term frequency-inverse document frequency; CCIDF: common citation inverse document frequency; NSYSU-ETD: national Sun Yat-sen University - electronic theses and dissertations; FP: feature partitioning; ARHP: association rule hypergraph partitioning; ACHP: article clustered hypergraph partitioning; SAM: spreading activation model; NCLT: national central library in Taiwan; UCF: user-based collaborative filtering; ICF: item-based collaborative filtering; ACM: Association for Computing Machinery; SeCon: semantic content filtering; RWR: random walk with restart; RWRF: RWR-based filtering.

From Table 1, several important research gaps are identified. First, discovering scientific information from research-oriented social networks has become increasingly common with the prevalence of social networking sites. However, the issues of article recommendation in research-oriented social networking websites have seldom been addressed. To our knowledge, only one work has conducted document recommendation in social trust networks, with the proposed method evaluated in simulation studies. Second, direct keyword weighting methods dominate previous CB approaches, although keyword mismatch problems often emerge. Thus, a semantic content filtering method is needed to address these issues. Third, behaviour connections have been widely used, but additional information has not been mined. Fourth, the quality of information in articles has largely been ignored in previous studies, thereby raising the need for a unified recommendation framework that combines relevance, connectivity and quality features.

3. Hybrid article recommendation

3.1. Overview of recommendation framework

In this study, an integrated approach is proposed for building the article recommender system. Figure 1 shows the main components and procedures of the proposed article recommendation approach.

Figure 1.

Overview of unified framework for article recommendation.

For the target user, the proposed recommender system outputs a list of relevant articles. A two-stage recommendation strategy is employed to provide article recommendations effectively and efficiently. In the first stage, initial results are produced by matching the user profile with the profiles of candidate articles such that irrelevant articles are filtered out. In the second stage, the relevance score, connectivity score and quality score derived from the analysis modules are aggregated with an appropriate weighting distribution. The aggregated score determines the final article ranking list.
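The two-stage strategy can be sketched in Python as follows. This is only an illustration: the function name `recommend`, the relevance threshold `min_relevance` and the default weights are hypothetical and are not values reported in this study.

```python
from typing import Dict, List, Tuple


def recommend(relevance: Dict[str, float],
              connectivity: Dict[str, float],
              quality: Dict[str, float],
              weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),
              min_relevance: float = 0.0,
              top_n: int = 10) -> List[Tuple[str, float]]:
    """Two-stage ranking: filter by relevance, then aggregate three scores.

    The weights and the threshold are illustrative assumptions.
    """
    w_r, w_c, w_q = weights
    # Stage 1: drop candidates whose relevance does not exceed the threshold.
    candidates = [a for a, r in relevance.items() if r > min_relevance]
    # Stage 2: weighted sum of the three scores (a missing score counts as 0).
    scored = [
        (a, w_r * relevance[a]
            + w_c * connectivity.get(a, 0.0)
            + w_q * quality.get(a, 0.0))
        for a in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

In this sketch, the first stage only reduces the candidate set, so the more expensive connectivity and quality scores need to be available only for articles that pass the relevance filter.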

3.2. Profiling

In general, profiling is the process of identifying and determining relevant information and attributes that can be used to characterise a given object. In the article recommendation context, we focus on the means of collecting the necessary data to construct a comprehensive researcher profile. According to Vivacqua et al. [42], researcher profiles can be constructed in two ways: declaration and inference based on observation of research activities. Declared profiles are reflected by subjective information that often contains self-claimed interests, expertise and skills. The subjective information is often represented by structured keywords. However, obtaining this information imposes extra work on the researcher. Thus, this information is often incomplete and difficult to update. The observation and interpretation of research activities has the potential to build accurate profiles that can be constructed automatically and objectively. In research-oriented social networks, users promote their own publications and bookmark articles of interest for reading. These user activities allow service providers to design recommender systems that can assist users in discovering relevant scientific information. Online users often create their homepages and claim areas of expertise and research interest to easily communicate with other scholars. Researchers connect with others or participate in social groups to promote outputs, share experience and ultimately expand their academic influence. Scientific articles are usually collected from standard academic databases by service providers or from those imported by users. The content (such as title and abstract) and metadata (such as published journal and year) of these articles are often collected to provide users with an overview. The graph representation of the data on research-oriented social networks is presented in Figure 2. These data are utilised in building recommendation models, as described in the next section. 
In our context, researchers represent users. The terms researchers and users are used interchangeably in this study.

Figure 2.

Graph representation of data on research social networks.

3.3. Modelling

3.3.1. Relevance analysis module

This module proposes a semantic keyword weighting method to determine the content relevance of candidate articles. Natural language processing (NLP) procedures are initially used to preprocess articles and construct a keyword–article (KA) matrix, in which matrix elements represent weighted term frequencies. The similarities of keywords are then calculated to build a keyword correlation matrix. In this study, we consider the frequency of keywords in the title, abstract and keyword list as well as social tags to address the keyword sparsity problem. The elements in the KA matrix denote weighted frequency scores (FSs). FS can be calculated as

F S_{ka} = λ (α f_{tit} + β f_{abs} + γ f_{key}) + (1 - λ) f_{tag}

(1)

where $f_{tit}$ is the frequency of the keyword in the article title, $f_{abs}$ is its frequency in the article abstract, $f_{key}$ is its frequency in the author-assigned keyword list, and $f_{tag}$ denotes the number of users who have tagged the article. $λ \in [0, 1]$ is the weight parameter that controls the relative importance of self-description and social tagging keywords. Here, we set $λ = 0.5$ for simplicity, which gives equal weight to objective and social information. $α$, $β$ and $γ$, subject to $α + β + γ = 1$, are the weights of $f_{tit}$, $f_{abs}$ and $f_{key}$, respectively. The values of $α$, $β$ and $γ$ are determined through an initial test. First, 10 queries are submitted, and relevant scientific publications are retrieved. Second, three types of keyword weighting methods (namely, $f_{tit}$, $f_{abs}$ and $f_{key}$) are employed to return relevant documents. Third, the number of relevant documents returned by each of the three methods is recorded. Finally, the values for the 10 queries are aggregated, and the relevance ratios of the three methods are calculated. This test yields $α = 0.4$, $β = 0.4$ and $γ = 0.2$.

To compute keyword similarity, we employ the method proposed by Quattrone et al. [43], which relies on the mutual reinforcement principle. The method uses an iterative approach whereby the similarity between any two objects (tags or resources) is computed from the similarities already computed in the previous iteration. The similarity computation is performed as follows.

In the initial step

s k^{0} (k_{m}, k_{n}) = θ_{mn}, s a^{0} (a_{m}, a_{n}) = θ_{mn}

(2)

In the pth step

s k^{p} (k_{m}, k_{n}) = \frac{S K^{p} (k_{m}, k_{n})}{\sqrt{S K^{p} (k_{m}, k_{m})} \cdot \sqrt{S K^{p} (k_{n}, k_{n})}}

(3)

s a^{p} (a_{m}, a_{n}) = \frac{S A^{p} (a_{m}, a_{n})}{\sqrt{S A^{p} (a_{m}, a_{m})} \cdot \sqrt{S A^{p} (a_{n}, a_{n})}}

(4)

where

S K^{p} (k_{m}, k_{n}) = \sum_{i, j = 1}^{n_{a}} F S_{mi} \cdot φ_{ij} \cdot s a^{p - 1} (a_{i}, a_{j}) \cdot F S_{nj}

(5)

S A^{p} (a_{m}, a_{n}) = \sum_{i, j = 1}^{n_{k}} F S_{im} \cdot φ_{ij} \cdot s k^{p - 1} (k_{i}, k_{j}) \cdot F S_{jn}

(6)

Keyword similarity $s k^{0} (k_{m}, k_{n})$ and article similarity $s a^{0} (a_{m}, a_{n})$ are defined in the initial step, where $θ_{mn}$ equals 1 if m = n and 0 otherwise. That is, each keyword (article) is similar only to itself and is dissimilar to all other keywords (articles). At the pth step, let $s k^{p} (k_{m}, k_{n})$ ( $s a^{p} (a_{m}, a_{n})$ ) be the keyword (article) similarity between $k_{m}$ and $k_{n}$ ( $a_{m}$ and $a_{n}$ ). In equations (5) and (6), $φ_{ij}$ is equal to 1 if i = j; otherwise, it is equal to $φ$ , where $φ \in [0, 1]$ is the mutual reinforcement factor. This factor is designed to give high relevance to keywords that represent the very same articles (and to articles represented by the very same keywords). Following Quattrone et al. [43], parameter $φ$ is learned from experiments. In this study, the best performance is achieved when $φ$ is set to 0.4. In this way, the keyword correlation matrix can be constructed and used to compute the matching degree between two profiles, as presented below.
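Equations (1) to (6) can be sketched with NumPy as follows. The function names, the fixed number of iterations `steps` and the treatment of all-zero rows (their similarities are set to 0) are assumptions made for illustration; the default parameter values are those reported above.

```python
import numpy as np


def frequency_score(f_tit, f_abs, f_key, f_tag,
                    lam=0.5, alpha=0.4, beta=0.4, gamma=0.2):
    """Equation (1): weighted frequency score of one keyword in one article."""
    return lam * (alpha * f_tit + beta * f_abs + gamma * f_key) + (1 - lam) * f_tag


def _normalise(S):
    """Equations (3) and (4): divide S[m, n] by sqrt(S[m, m]) * sqrt(S[n, n])."""
    d = np.sqrt(np.diag(S))
    denom = np.outer(d, d)
    out = np.zeros_like(S, dtype=float)
    # Assumption: entries with a zero denominator are left at 0.
    np.divide(S, denom, out=out, where=denom > 0)
    return out


def mutual_reinforcement(FS, phi=0.4, steps=5):
    """Iterate equations (2) to (6) on a keyword-by-article FS matrix."""
    n_k, n_a = FS.shape
    sk, sa = np.eye(n_k), np.eye(n_a)      # equation (2)
    # phi_ij is 1 on the diagonal and phi elsewhere.
    phi_k = np.full((n_k, n_k), phi)
    np.fill_diagonal(phi_k, 1.0)
    phi_a = np.full((n_a, n_a), phi)
    np.fill_diagonal(phi_a, 1.0)
    for _ in range(steps):
        # Both updates use the similarities from the previous step.
        SK = FS @ (phi_a * sa) @ FS.T      # equation (5)
        SA = FS.T @ (phi_k * sk) @ FS      # equation (6)
        sk, sa = _normalise(SK), _normalise(SA)
    return sk, sa
```

For example, with three keywords and two articles where the first keyword occurs only in the first article, the third only in the second, and the second in both, the first and third keywords have zero similarity after one step but a positive similarity after two steps, because the two articles have become similar through the shared keyword.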

The relevance score between the researcher profile and the article profile is calculated as follows

RS (u, a) = \sum_{i = 1}^{N_{u}} F S_{u} (i) \cdot Min [\sum_{j = 1}^{N_{a}} sim (i, j), 1] \cdot F S_{a} (i)

(7)

where $N_{u}$ ( $N_{a}$ ) is the number of distinct keywords in the target user profile (article profile), $F S_{u} (i)$ ( $F S_{a} (i)$ ) represents the FS of keyword i in the target user profile (article profile), $sim (i, j)$ denotes the similarity of keywords i and j in the keyword correlation matrix, and the $Min$ function imposes a constraint on the incorporation of the similarities of keywords representing the target user and article profiles. This function limits the sum of the similarity measures to 1.0 and ensures that one keyword alone does not represent all keywords.

3.3.2. Connectivity analysis module

In this study, three types of connectivity features (behavioural, social and semantic features) are utilised to improve article recommendation performance. We represent behavioural, social and semantic connectivity as user–article ( $UA$ ), user–user ( $UU$ ) and user–keyword ( $UK$ ) matrices, respectively. The entry in the $UA$ matrix is a binary value (0 or 1) that reflects the authoring or bookmarking behaviour of a user towards an article. The entry in the $UU$ matrix is also a binary value (0 or 1) that reflects the social connections (friends or groups) between two users. The entry in the $UK$ matrix is an FS that reflects the semantic relevance of a user to a keyword. On the basis of these matrices, random walk with restart (RWR) CF methods are employed to recommend scientific articles.

We can derive an undirected tripartite graph $G_{UAK} = (U \cup A \cup K, E)$ based on the $UU$ , $UA$ and $UK$ matrices. This graph is defined as follows

G_{UAK} = \begin{bmatrix} UU & UA & UK \\ AU & 0 & 0 \\ KU & 0 & 0 \end{bmatrix}

(8)
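The block structure of equation (8) can be assembled with NumPy as in the sketch below. It assumes that the $AU$ and $KU$ blocks are the transposes of $UA$ and $UK$, which follows from the graph being undirected; the function name `build_graph` is hypothetical.

```python
import numpy as np


def build_graph(UU, UA, UK):
    """Assemble the symmetric block adjacency matrix of equation (8).

    Assumption: AU = UA^T and KU = UK^T because the graph is undirected.
    Node order: users first, then articles, then keywords.
    """
    n_a, n_k = UA.shape[1], UK.shape[1]
    return np.block([
        [UU,   UA,                   UK],
        [UA.T, np.zeros((n_a, n_a)), np.zeros((n_a, n_k))],
        [UK.T, np.zeros((n_k, n_a)), np.zeros((n_k, n_k))],
    ])
```

With this node ordering, article a occupies index |U| + a and keyword k occupies index |U| + |A| + k in the resulting matrix.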

The RWR method is applied to the graph $G_{UAK}$ to find relevant articles for the target user. The process can be interpreted as follows: starting from node u, a random walker moves at each step to another node through one of the available edges. In addition, at each step the walker jumps back to node u with a certain probability and restarts from that node. Let $s$ be a column vector that denotes the steady-state probabilities of the nodes for a walk that starts from node u. Let $c \in (0, 1)$ be the probability of restarting the random walk and $h$ be the restart vector. The steady-state probability (column) vector $s$ is defined as the solution of the following equation

s = (1 - c) G_{UAK}^{'} s + c h

(9)

where $G_{UAK}^{'}$ is a row-normalised version of $G_{UAK}$ , in which the elements of each row sum to one. The entries of $s$ fall into three classes: entries related to users U, entries related to articles A and entries related to keywords K. Therefore, the value of the entry for article a in $s$ , $s (a)$ , refers to the long-term stationary probability that a random walk starting from node u is at node a. This value can be considered a measure of relatedness between user u and article a. The RWR algorithm is shown in Table 2.

Table 2.

The algorithm of Neighbour Selection with RWR

Algorithm. Neighbour Selection with RWR

Input: target user u; adjacency matrix $G_{UAK}$ ; restart probability c; convergence threshold $ε$

Output: article ranking vector $s_{a}$

Process:

1. Let $N = | U | + | A | + | K |$ .

2. Let $h$ be a vector of length N.

3. Identify the set of articles $A_{u}$ preferred by user u.

4. For i = 1 to N: if $u_{i} = u \lor u_{i} \in A_{u}$ , then $h (u_{i}) = 1 / (| A_{u} | + 1)$ ; otherwise, $h (u_{i}) = 0$ .

5. Initialise $s = h$ .

6. Compute the row-normalised adjacency matrix $G_{UAK}^{'}$ of $G_{UAK}$ .

7. Repeat $s = (1 - c) G_{UAK}^{'} s + c h$ until the change $Δ s$ between two successive iterations is smaller than $ε$ .

8. Set $s_{a}$ to the entries of $s$ that correspond to the articles in A.

9. Return $s_{a}$ .
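The procedure in Table 2 can be sketched with NumPy as follows. The function name, the default values of `c`, `eps` and `max_iter`, and the use of the L1 norm as the convergence measure are assumptions. The sketch also applies the transpose of the row-normalised matrix in the update so that $s$ remains a probability distribution over nodes; for the undirected graph used here this is equivalent to column normalisation, but it is an interpretation rather than a detail stated in the text.

```python
import numpy as np


def rwr_article_scores(G, user, preferred, article_idx,
                       c=0.15, eps=1e-8, max_iter=1000):
    """Random walk with restart on adjacency matrix G.

    user: index of the target user node.
    preferred: indices of the article nodes preferred by the user (A_u).
    article_idx: indices of all article nodes, in output order.
    """
    n = G.shape[0]
    # Steps 2-4: equal restart mass on the user and the preferred articles.
    h = np.zeros(n)
    seeds = [user] + list(preferred)
    h[seeds] = 1.0 / len(seeds)
    # Step 6: row-normalise G; rows without edges stay all-zero.
    row_sums = G.sum(axis=1, keepdims=True)
    P = np.divide(G, row_sums, out=np.zeros_like(G, dtype=float),
                  where=row_sums > 0)
    # Steps 5 and 7: iterate until the change falls below eps.
    s = h.copy()
    for _ in range(max_iter):
        # Assumption: the transpose moves probability along outgoing edges.
        s_next = (1 - c) * (P.T @ s) + c * h
        delta = np.abs(s_next - s).sum()
        s = s_next
        if delta < eps:
            break
    # Steps 8-9: keep only the entries that belong to articles.
    return s[article_idx]
```

Articles that the user already authored or bookmarked receive restart mass directly, so a practical recommender would presumably exclude them before presenting the ranked list.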

3.3.3. Quality analysis module

Quality-based retrieval has received increasing attention in recent information retrieval research [44]. The quality of documents strongly influences how well a user’s information needs are satisfied. In this study, three measures are proposed for evaluating the quality of a scientific article: recency, citation and journal impact factor (JIF).

Users often want to read recent papers that are related to their own research interests. The recency measure reflects the freshness of an article and is an important factor in evaluating the quality of articles. It is defined as follows

Q_{r} (a) = Y_{u} - Y_{a}

(10)

where $Y_{u}$ denotes the year in which the target user u requested recommendations, and $Y_{a}$ denotes the year in which article a was published. Because $Q_{r} (a)$ is a cost metric (a smaller value indicates a more recent article), we apply a log-scale transformation that converts it into a benefit measure. The new recency measure $Q_{r}^{'} (a)$ is defined as follows

Q_{r}^{'} (a) = \frac{1}{1 + \log (1 + Q_{r} (a))}

(11)

Most search engines consider the links between webpages and employ PageRank to represent the authority of a webpage. Analogously, the network of article citations can be inspected to evaluate the quality of an article. In this study, citation count is employed to represent the authority of an article. The citation measure is defined as

Q_{c} (a) = C_{a}

(12)

where $C_{a}$ denotes the number of times article a was cited. The difference between one and five citations is more important than that between 101 and 105 citations. Thus, we also transform the citation measure in the log-space. The new measure $Q_{c}^{'} (a)$ is defined as

Q_{c}^{'} (a) = \log (1 + Q_{c} (a))

(13)

The quality of an article is often evaluated by its publication venue, such as a journal. The JIF is a tool for ranking journals and is also marketed as a tool for evaluating single articles. Although the impact factor is not a perfect tool for measuring the quality of articles, it is the only one deemed effective and has the advantage of already being in existence; therefore, it is a good technique for scientific evaluation [45]. The use of the impact factor as a measure of quality is widespread because it fits well with the opinions that researchers in each field hold of the best journals in their specialty. The JIF measure $Q_{j} (a)$ is defined as follows

Q_{j} (a) = JI F_{a}

(14)

where $JI F_{a}$ denotes the impact factor of the journal in which article a is published. We also transform this measure in the log-space. The new measure $Q_{j}^{'} (a)$ is defined as

Q_{j}^{'} (a) = \log (1 + Q_{j} (a))

(15)

Finally, all three measures are aggregated into a scoring function that ranks the retrieved articles. The score function is essentially a weighted sum of quality measures

QS (a) = w_{1}^{q} Q_{r}^{'} (a) + w_{2}^{q} Q_{c}^{'} (a) + w_{3}^{q} Q_{j}^{'} (a)

(16)

where $w_{1}^{q}$ , $w_{2}^{q}$ and $w_{3}^{q}$ ( $w_{1}^{q} + w_{2}^{q} + w_{3}^{q} = 1$ ) are the weights of $Q_{r}^{'} (a)$ , $Q_{c}^{'} (a)$ and $Q_{j}^{'} (a)$ , respectively. Before aggregating the three quality measures, we use zero-one scaling to normalise these log-space quality measures based on the following equation

Scale (Q_{i}^{'} (a)) = \frac{Q_{i}^{'} (a) - Min (Q_{i}^{'} (a))}{Max (Q_{i}^{'} (a)) - Min (Q_{i}^{'} (a))}

(17)

where the subscript i represents different types of quality measures, and $Min (\cdot)$ and $Max (\cdot)$ are the minimum and maximum functions, respectively.
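The quality score in equations (10) to (17) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the article does not state the logarithm base, so the natural logarithm is assumed; the dictionary keys `year`, `citations` and `jif`, the function names, the clamping of negative ages to zero and the handling of a constant measure are all assumptions made for the example.

```python
import math


def recency(request_year, pub_year):
    """Eq. (11): inverted, log-scaled age of an article (natural log assumed)."""
    age = max(0, request_year - pub_year)  # Eq. (10); clamping at 0 is an assumption
    return 1.0 / (1.0 + math.log(1.0 + age))


def log_measure(value):
    """Eqs. (13) and (15): log(1 + x) for citation counts and impact factors."""
    return math.log(1.0 + value)


def min_max(values):
    """Eq. (17): zero-one scaling over the candidate set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a measure that is constant over all candidates is mapped to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quality_scores(articles, request_year, weights):
    """Eq. (16): weighted sum of the three scaled quality measures."""
    r = min_max([recency(request_year, a["year"]) for a in articles])
    c = min_max([log_measure(a["citations"]) for a in articles])
    j = min_max([log_measure(a["jif"]) for a in articles])
    w1, w2, w3 = weights
    return [w1 * ri + w2 * ci + w3 * ji for ri, ci, ji in zip(r, c, j)]


# Hypothetical candidates: a new, uncited article and an older, highly cited one.
candidates = [
    {"year": 2017, "citations": 0, "jif": 1.0},
    {"year": 2007, "citations": 100, "jif": 5.0},
]
scores = quality_scores(candidates, 2017, (1 / 3, 1 / 3, 1 / 3))
```

Because the scaling in equation (17) is applied after the log transform, the choice of base does not affect the scaled citation and JIF measures, but it does affect the recency measure, whose transform is not linear in the logarithm.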

The weights of the measures ( $w_{1}^{q}$ , $w_{2}^{q}$ and $w_{3}^{q}$ ) are determined by employing the standard-deviation-based weight updating approach used in the image retrieval and video recommendation domains [46,47]. The weights are obtained from the reference lists in the user’s publication history. The references of one publication can be considered as the user’s relevance feedback on his or her information needs. Thus, the importance of the three quality measures can be defined by

w_{i}^{q} = \frac{1}{σ_{i}^{q}}

(18)

where $i \in \{1, 2, 3\}$ and $σ_{i}^{q}$ is the standard deviation of the ith quality measure values in the reference lists. This updating approach means that a small variance yields a large weight, and vice versa. Intuitively, if all preferred articles have similar values for quality measure $Q_{i}$ , then $Q_{i}$ is a good indicator of the user’s information needs. $w_{i}^{q}$ is then normalised to the interval $[0, 1]$ as follows

w_{i}^{q} = \frac{w_{i}^{q}}{\sum_{i} w_{i}^{q}}

(19)
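The standard-deviation-based weighting of equations (18) and (19) can be illustrated as follows. This is a sketch under stated assumptions: the article does not say whether the population or sample standard deviation is used, so the population form (`statistics.pstdev`) is assumed, and the small constant `eps`, which guards against a zero deviation, is an addition for the example and not part of the original method.

```python
import statistics


def quality_weights(reference_measures, eps=1e-9):
    """Inverse-standard-deviation weights, normalised to sum to 1.

    reference_measures holds one list per quality measure (recency, citation,
    JIF), each containing that measure's values over the articles in the
    user's reference lists.
    """
    # Eq. (18): a measure with little spread among the references gets a large weight.
    raw = [1.0 / (statistics.pstdev(values) + eps) for values in reference_measures]
    # Eq. (19): normalise so that the weights sum to 1.
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical scaled measure values over a two-article reference list.
weights = quality_weights([[0.4, 0.6], [0.2, 0.8], [0.0, 1.0]])
```

In this example the first measure varies least across the references and therefore receives the largest weight.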

3.4. Ranking

As the number of articles in research-oriented social networks can be very large, comparing each article with the profile of a target user would be inefficient, and the computation could significantly prolong the recommendation process. To reduce the processing time and to increase the efficiency of the computation, we apply a pre-filtering strategy in the articles within the collections to generate a subset of candidate articles for recommendation in the next stage. If an article contains at least one keyword that exactly matches or is highly similar to one of the keywords in the target user profile, it can be considered as a candidate. We employ a reduced version of the keyword correlation matrix that contains 30% of the most frequently occurring keywords to identify highly similar keywords. After filtering out irrelevant articles, we compute recommendations by aggregating the relevance, connectivity and quality scores of the candidate articles with appropriate weights. The aggregated score is defined as follows

S (u, a) = w_{1} RS (u, a) + w_{2} CS (u, a) + w_{3} QS (a)

(20)

where $RS (u, a)$ , $CS (u, a)$ and $QS (a)$ are the relevance, connectivity and quality scores of target user u on candidate article a; $w_{1}$ , $w_{2}$ and $w_{3}$ are the weights of $RS (u, a)$ , $CS (u, a)$ and $QS (a)$ , respectively. The weights reflect the different focuses of the target user’s information needs and enable users to specify their information needs precisely. In this study, the weights are updated based on a user’s relevance feedback, as this strategy has been widely employed in the areas of image retrieval and video recommendation. Let $L^{S}$ be the list of relevant articles returned by the aggregated score $S$

L^{S} = [L_{1}^{S}, \dots, L_{p}^{S}, \dots L_{N}^{S}]

(21)

Let $F b_{p}$ be the set containing the relevance feedback provided by the target user for $L_{p}^{S}$ ; it is defined as

F b_{p} = \begin{cases} 1, & \text{if the } p\text{th article is a `positive' example} \\ - 1, & \text{if the } p\text{th article is a `negative' example} \end{cases}

(22)

Then, let $I \in \{RS, CS, QS\}$ ; $L^{I}$ is the article list returned by the Ith score

L^{I} = [L_{1}^{I}, \dots, L_{p}^{I}, \dots, L_{N}^{I}]

(23)

In calculating the weight $w_{i}$ of the Ith score, we first initialise it and then apply the following update for each article in $L^{I}$

w_{i} = \begin{cases} w_{i} + F b_{p}, & \text{if } L_{p}^{I} \text{ is in } L^{S} \\ w_{i} + 0, & \text{if } L_{p}^{I} \text{ is not in } L^{S} \end{cases}

(24)

where $p = 1, \dots, N$ . Here, all articles outside $L^{S}$ are treated as having no opinion and contribute a value of 0. After this procedure, if $w_{i} < 0$ , then we set it to 0. The weights obtained through the above procedure are then normalised by the total weight so that the normalised weights sum to 1. The normalisation is defined as

w_{i} = \frac{w_{i}}{\sum_{i} w_{i}}

(25)

A large overlap of positively rated articles between $L^{I}$ and $L^{S}$ therefore yields a large weight $w_{i}$ . That is, a score receives greater emphasis when it reflects the user’s information need well.
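The feedback-driven weight update of equations (22) to (25) and the aggregation of equation (20) can be sketched as below. This is one reading of the procedure, not the authors' code. The article does not give the initial weight value, so 0 is assumed; feedback is looked up by article identifier, which is an interpretation of matching $L_{p}^{I}$ against $L^{S}$ ; the fallback to equal weights when every weight is clipped to 0 is an assumption; and the function and variable names are hypothetical.

```python
def update_weights(lists_by_score, feedback, initial=0.0):
    """Relevance-feedback weight update, one weight per individual score.

    lists_by_score maps "RS", "CS" and "QS" to the top-N article ids returned
    by that score alone (the lists L^I). feedback maps article ids from the
    aggregated list L^S to +1 (positive example) or -1 (negative example).
    """
    weights = {}
    for name, ranked in lists_by_score.items():
        w = initial
        for article in ranked:
            # Eq. (24): articles outside L^S count as "no opinion" and add 0.
            w += feedback.get(article, 0)
        weights[name] = max(w, 0.0)  # negative weights are set to 0
    total = sum(weights.values())
    if total == 0:
        # Assumption: fall back to equal weights when no score earned positive feedback.
        return {name: 1.0 / len(weights) for name in weights}
    # Eq. (25): normalise so that the weights sum to 1.
    return {name: w / total for name, w in weights.items()}


def aggregate(rs, cs, qs, weights):
    """Eq. (20): weighted sum of relevance, connectivity and quality scores."""
    return weights["RS"] * rs + weights["CS"] * cs + weights["QS"] * qs


# Hypothetical lists and feedback: the user liked "a" and "b" and disliked "c".
lists = {"RS": ["a", "b", "c"], "CS": ["a", "d", "e"], "QS": ["c", "f", "g"]}
fb = {"a": 1, "b": 1, "c": -1}
w = update_weights(lists, fb)
score = aggregate(0.8, 0.4, 0.9, w)
```

In this example the quality list contributes only a disliked article, so its weight is clipped to 0 and the remaining weight is shared equally between relevance and connectivity.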

4. Experimental design

4.1. Implemented system in Scholarmate

Scholarmate is an online professional social networking community platform in China. The platform aims to foster a knowledge-sharing cyberspace for researchers to allow them to collect and share different resources [48,49]. The proposed approach is implemented as one of the application services in Scholarmate. The system provides main interfaces to extract article-related data, including titles, keywords and abstracts. Once the system has gathered the required information, matching degrees between article and researcher profiles are calculated. The system also extracts different relationships among online researchers to support social-network-empowered recommendation. Figure 3 presents the interface of the article recommendation application with descriptive features (content and social features). As shown in the right panel of Figure 3, additional details about article quality are also provided by the system. The decision button (accept or reject) is displayed in the last column.

Figure 3.

Homepage and article recommendation interface in Scholarmate.

4.2. Data and methodology

A user study was run in Scholarmate to verify the effectiveness of the proposed method. For the performance comparison of recommendation methods, we implemented our methods and the baselines. They are listed as follows:

Semantic content (SeCon) filtering method: This method has been presented in section 3.3.1.

RWR-based filtering (RWRF) method: This type of model-based CF approach employs RWR to combine connectivity features. This method has been presented in section 3.3.2.

RCQ_SUM recommendation method: This integrated method employs CombSUM strategy [50] to combine the results from the relevance analysis, connectivity analysis and quality analysis modules. It is a type of weighted hybrid recommendation method that sums the results of the three separate analysis modules with equal weights.

RCQ_MNZ recommendation method: This integrated method employs CombMNZ strategy [50] to combine the results from the relevance analysis, connectivity analysis and quality analysis modules. This method considers not only the scores in each of the ranked lists but also the amount of supporting evidence (the number of lists in which the article has a non-zero score).

RCQ_RF recommendation method: The RCQ_RF method is our proposed hybrid recommendation method that determines weights by leveraging the user’s relevance feedback. This method has been presented in section 3.4.
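The two fusion baselines can be illustrated with a short sketch. CombSUM adds an article's scores across the three modules, and CombMNZ multiplies that sum by the number of modules that gave the article a non-zero score. The function names, the example scores and the assumption that module scores are already scaled to a common range are illustrative and not taken from the implemented system.

```python
def comb_sum(scores):
    """CombSUM: sum of an article's scores over all modules (equal weights)."""
    return sum(scores)


def comb_mnz(scores):
    """CombMNZ: CombSUM multiplied by the number of non-zero scores."""
    nonzero = sum(1 for s in scores if s != 0)
    return nonzero * sum(scores)


def rank(score_table, fuse):
    """Order article ids by fused score, highest first."""
    return sorted(score_table, key=lambda a: fuse(score_table[a]), reverse=True)


# Hypothetical (relevance, connectivity, quality) scores for two articles.
table = {
    "article_1": (0.9, 0.0, 0.0),  # strong on one module only
    "article_2": (0.3, 0.3, 0.2),  # moderate support from all three modules
}
by_sum = rank(table, comb_sum)
by_mnz = rank(table, comb_mnz)
```

The example shows the difference in emphasis: CombSUM ranks `article_1` first because its single score is the largest total, whereas CombMNZ ranks `article_2` first because it is supported by all three modules.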

All the recommendation methods mentioned were evaluated in the user study. The user study consisted of two main stages (Figure 4). In the first stage, we collected the publications and social activity data of users to compute the scores in the three analysis modules. Using the calculated scores, we provided three recommendation lists (SeCon1, RWRF1 and RCQ_SUM1) to users and investigated their levels of satisfaction with the recommendation lists. We then conducted a comparison analysis on the results of the three methods. In the second stage, we collected relevance feedback on the recommendations in the first stage and computed new recommendations based on adaptive weights. We conducted the second-round survey by providing recommendation lists based on the three hybrid methods (RCQ_SUM2, RCQ_MNZ2 and RCQ_RF2) and conducted another comparison analysis on the results of the three hybrid methods.

Figure 4.

The user study process in Scholarmate.

Our aim is to validate the proposed recommendation method by having online users assess the accuracy of the issued recommendations. Users taking part in the evaluation are referred to as subjects. At each stage, recommendation lists from the three methods were computed, and the top 10 recommendations from each method were presented. Each subject then assessed the combined recommendation list (not more than 30 articles), which was randomised to ensure that the rank of a recommendation did not influence the subject’s perception. The subject rated each recommended article on a five-point Likert scale ranging from 1 to 5. A rating of 1 means that the subject is not at all interested in the recommended article, that is, the recommendation is not relevant. Conversely, a high rating indicates that the recommendation is highly relevant. We could thus use the obtained feedback data to evaluate the effectiveness and accuracy of our proposed approach.

We considered active registered users in Scholarmate from 12 distinct disciplines, such as information systems and management science, as subjects (refer to the National Natural Science Foundation of China discipline tree¹¹). A total of 100 active users were randomly selected, subject to the condition that each should have at least three publications in Scholarmate. In the first-round survey, we conducted an online survey of their perceptions of the quality of the recommendation results obtained by the integrated recommendation method and by the other two alternatives. We collected 76 valid responses. The response rate was 76%, which is an acceptable rate. Among the subjects, 14% held the title of professor, 38% held the title of associate professor and 48% held the title of lecturer. In the second-round survey, personal weights were adjusted separately for each user using the relevance feedback adaption mechanism. Then, recommendation lists were recomputed and again presented to users 1 month later. The articles recommended during the first stage were excluded from this stage. In total, 45 users completed our second-round survey.

4.3. Evaluation metrics

As in traditional recommendation and search systems, we recommended a list of relevant articles to users and asked them to rate the recommendations. The average rating (AR) score and the normalised discounted cumulative gain (NDCG) were selected as performance metrics [51]. These metrics were computed over the top 5 and 10 recommended articles. AR was computed from the ratings of all the users and indicates the average rating over all the recommendations. The NDCG is a commonly adopted metric for evaluating a search engine’s performance and is suited to graded relevance judgements (i.e. documents are non-relevant or more or less relevant to the query). The metrics used are defined as follows

AR = \frac{1}{| U |} \sum_{i = 1}^{| U |} \frac{1}{N} \sum_{j = 1}^{N} r_{ij}

(26)

NDCG = \frac{1}{| U |} \sum_{i = 1}^{| U |} Z \sum_{j = 1}^{N} \frac{2^{r_{ij}} - 1}{\log (1 + j)}

(27)

where $| U |$ denotes the number of researchers in the survey, N is the number of recommended articles and is set to N = 5 or 10, $r_{ij}$ represents the rating of researcher i on article j and Z is a normalisation constant and is chosen so that the NDCG value of the perfect ranking is 1.
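Equations (26) and (27) can be computed as in the following sketch. It is illustrative only and makes several assumptions: each user's ratings are given in ranked order and contain at least N entries; the ideal ranking used for the normalisation constant Z is the user's own ratings sorted in descending order; and the logarithm is taken to base 2, a choice that cancels out in the normalised value.

```python
import math


def average_rating(ratings_per_user, n):
    """Eq. (26): mean over users of the mean rating of their top-n articles."""
    per_user = [sum(ratings[:n]) / n for ratings in ratings_per_user]
    return sum(per_user) / len(per_user)


def dcg(ratings):
    """Discounted cumulative gain with gain 2^r - 1 and discount log(1 + j)."""
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(ratings, start=1))


def ndcg(ratings_per_user, n):
    """Eq. (27): mean over users of DCG divided by the DCG of the ideal order."""
    total = 0.0
    for ratings in ratings_per_user:
        # Z is 1 / ideal, so a perfectly ordered list scores exactly 1.
        ideal = dcg(sorted(ratings, reverse=True)[:n])
        total += dcg(ratings[:n]) / ideal if ideal > 0 else 0.0
    return total / len(ratings_per_user)


# Hypothetical ratings on the 1-5 scale, listed in recommendation order.
perfect_order = [[5, 4, 3, 2, 1]]
reversed_order = [[1, 2, 3, 4, 5]]
```

Both example lists have the same AR at 5, but only the first reaches an NDCG of 1, which shows that NDCG rewards placing highly rated articles near the top of the list.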

5. Result analysis and discussion

In this section, we present the detailed comparison of the results from the online user study. According to Buckley and Voorhees [52], the evaluation of a search engine should involve at least n = 25 queries to ensure the robustness of the retrieval practice. We have 76 valid responses in the first stage and 45 valid responses in the second stage. Both sample sizes exceed this threshold and are therefore sufficient for statistical analyses. The performance results of the three methods are shown in Tables 3 and 4.

Table 3.

Performance of three methods in terms of AR.

	SeCon1	RWRF1	RCQ_SUM1	Improvements over the best baseline
AR at 5	3.68	3.52	3.99	8.4%**
AR at 10	3.28	3.12	3.46	5.5%*

AR: average rating.

*p-value significant at alpha = 0.05; **p-value significant at alpha = 0.01.

Table 4.

Performance of three methods in terms of NDCG.

	SeCon1	RWRF1	RCQ_SUM1	Improvements over the best baseline
NDCG at 5	0.78	0.74	0.88	12.8%*
NDCG at 10	0.69	0.62	0.75	8.7%**

NDCG: normalised discounted cumulative gain.

*p-value significant at alpha = 0.05; **p-value significant at alpha = 0.01.

5.1. Evaluation of the integrated recommendation framework

The proposed integrated method, RCQ_SUM1, achieves the best performance in terms of both the AR metric and the NDCG metric. The AR scores obtained by the SeCon1 and RWRF1 methods are 3.68 and 3.52, respectively, when recommending the top five articles to users. Although these results are acceptable, the integrated method achieves an improvement of 8.4% over the best baseline (SeCon1) because it also considers the additional article quality factors. The improvement in AR for the top 10 recommendations also indicates that the integrated method can recommend more relevant scientific articles than the two other methods. We further evaluate the rank performance of the three methods. The NDCG scores reflect the browsing efforts of the users before locating the relevant scientific articles. In terms of the NDCG values, the RCQ_SUM1 method achieves a 12.8% improvement for the top 5 recommendations and an 8.7% improvement for the top 10 recommendations. The improvements in the NDCG value show that the RCQ_SUM1 method is more effective than the SeCon1 and RWRF1 methods, as it ranks the relevant articles higher in the recommendation list.

We also test the significance of the improvement of the results of the integrated method over baseline methods by means of paired t-tests. Tables 3 and 4 show that improvements of the integrated method over the best baseline method in terms of AR and NDCG are all statistically significant.
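The "improvements over the best baseline" column can be reproduced from the reported scores as the relative gain of the integrated method over the higher of the two baseline values. The short check below uses the figures from Tables 3 and 4; the function name is hypothetical, and the relative-gain formula is inferred from the reported numbers, which it matches to one decimal place.

```python
def improvement(proposed, baselines):
    """Relative gain (in percent) of the proposed method over the best baseline."""
    best = max(baselines)
    return (proposed - best) / best * 100.0


# (SeCon1, RWRF1) baselines and RCQ_SUM1 scores as reported in Tables 3 and 4.
reported = {
    "AR at 5": (3.99, (3.68, 3.52)),
    "AR at 10": (3.46, (3.28, 3.12)),
    "NDCG at 5": (0.88, (0.78, 0.74)),
    "NDCG at 10": (0.75, (0.69, 0.62)),
}
gains = {name: round(improvement(p, b), 1) for name, (p, b) in reported.items()}
```

In all four rows the best baseline is SeCon1, so the gains are measured against the content-based method.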

5.2. Evaluation of relevance feedback fusion

The analysis above indicates that the RCQ_SUM1 method outperforms the other two non-hybrid recommendation methods (SeCon1 and RWRF1). We further investigate the performance of the hybrid method that employs the fusion strategy of relevance feedback. The three hybrid methods (RCQ_SUM2, RCQ_MNZ2 and RCQ_RF2) are compared, and the results are listed in Tables 5 and 6. The RCQ_RF2 method clearly outperforms the other two hybrid methods. The performance improves significantly in terms of AR and NDCG for the top 5 and 10 recommendation lists when relevance feedback weight adaption is used. Therefore, we can conclude that employing users’ feedback can help improve recommendation accuracy.

Table 5.

Performance of three hybrid methods in terms of AR.

	RCQ_SUM2	RCQ_MNZ2	RCQ_RF2	Improvements over the best baseline
AR at 5	3.72	3.69	3.96	6.4%**
AR at 10	3.23	3.14	3.42	5.9%*

AR: average rating.

*p-value significant at alpha = 0.05; **p-value significant at alpha = 0.01.

Table 6.

Performance of three hybrid methods in terms of NDCG.

	RCQ_SUM2	RCQ_MNZ2	RCQ_RF2	Improvements over the best baseline
NDCG at 5	0.76	0.72	0.84	10.5%*
NDCG at 10	0.66	0.63	0.72	9.1%*

NDCG: normalised discounted cumulative gain.

*p-value significant at alpha = 0.05.

6. Conclusion

Personalisation has become a major trend in industry and academia, and recommender systems have become mainstream in academic communities. In this research, we propose an integrated recommendation framework to help scholars discover relevant articles. The designed recommender system was implemented in a real social networking website. To overcome the shortcomings of the traditional CB- and CF-based methods, we propose a three-dimensional recommendation framework and relevance feedback techniques to fuse the results. The proposed framework and methods were evaluated in a user study. The results show their promising performance in terms of accuracy metrics.

This research has several limitations. The first limitation is related to the semantic content filtering method in the relevance analysis module. In this study, we employed keyword similarity to expand the user profile. The use of a domain ontology would help to resolve the semantic ambiguity in keyword matching. In the future, a research domain ontology can thus be constructed to support extended profile matching. The second limitation is related to the evaluation. The number of active subjects in the user study was limited. With the growing prevalence of Scholarmate, an increasing number of researchers will be actively involved in the website to connect with other scholars for intelligent research. Thus, we will expand the size of the experiment to obtain more reliable results.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

This work is supported by the National Natural Science Foundation of China (71501057, 71490725, 91546114 and 71371062), the National Key Technology Support Program (2015BAH26F00), Innovative Research Groups of the National Natural Science Foundation of China (71521001), the Humanity and Social Science Foundation of Ministry of Education (15YJC630111), Anhui Provincial Natural Science Foundation (1608085QG166) and Hefei University of Technology (JZ2014HGBZ0368 and JZ2017HGTB0185).

Notes

References

1. Beel, Gipp, Langer, et al. Research-paper recommender systems: a literature survey. Int J Digit Libr 2016; 17: 305–338.

2. Basu, Hirsh, Cohen, et al. Technical paper recommendation: a study in combining multiple information sources. J Artif Intell Res 2001; 14: 231–252.

3. Hwang S-Y, Hsiung W-C, Yang W-S. A prototype WWW literature recommendation system for digital libraries. Online Inform Rev 2003; 27: 169–182.

4. Lin, Wilbur. PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics 2007; 8: 423.

5. Zhao, Liu. Paper recommendation based on the knowledge gap between a researcher’s background knowledge and research target. Inform Process Manag 2016; 52: 976–988.

6. Hwang S-Y, Chuang S-M. Combining article content and Web usage for literature recommendation in digital libraries. Online Inform Rev 2004; 28: 260–272.

7. Bogers, Van den Bosch. Recommending scientific articles using CiteULike. In: Proceedings of the 2008 ACM conference on recommender systems, Lausanne, 23–25 October 2008, pp. 287–290. New York: ACM.

8. Wang, Blei. Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, 21–24 August 2011, pp. 448–456. New York: ACM.

9. Xia, Liu, Lee, et al. Scientific article recommendation: exploiting common author relations and historical preferences. IEEE Trans Big Data 2016; 2: 101–112.

10. Cabanac. Accuracy of inter-researcher similarity measures based on topical and social clues. Scientometrics 2011; 87: 597–620.

11. Doerfel, Jäschke, Hotho, et al. Leveraging publication metadata and social data into FolkRank for scientific publication recommendation. In: Proceedings of the 4th ACM RecSys workshop on recommender systems and the social web, Dublin, 9–13 September 2012, pp. 9–16. New York: ACM.

12. Yang, Zhang. Scientific articles recommendation. In: Proceedings of the 22nd ACM international conference on conference on information & knowledge management, San Francisco, CA, 27 October–November 2013, pp. 1147–1156. New York: ACM.

13. Tang, Wan, Zhang. Cross-language context-aware citation recommendation in scientific articles. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, Gold Coast, QLD, Australia, 6–11 July 2014, pp. 817–826. New York: ACM.

14. Lee Y-C, Yeom, Song, et al. Recommendation of research papers in DBpia: a Hybrid approach exploiting content and collaborative data. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC), Budapest, 9–12 October 2016, pp. 2966–2971. New York: IEEE.

15. Kim, Chen. A scientometric review of emerging trends and new developments in recommendation systems. Scientometrics 2015; 104: 239–263.

16. Jia, Zhang, et al. Combining tag correlation and user social relation for microblog recommendation. Inform Sciences 2017; 385: 325–337.

17. Bai, Cambazoglu, Gullo, et al. Exploiting search history of users for news personalization. Inform Sciences 2017; 385: 125–137.

18. Bach, Do Hai, Phuong. Personalized recommendation of stories for commenting in forum-based social media. Inform Sciences 2016; 352: 48–60.

19. Lee Y-C, et al. Improving the accuracy of top-N recommendation using a preference model. Inform Sciences 2016; 348: 290–304.

20. Thong, Hong, Tam. What leads to user acceptance of digital libraries? Commun ACM 2004; 47: 78–83.

21. Renda, Straccia. A personalized collaborative Digital Library Environment: a model and an application. Inform Process Manag 2005; 41: 5–21.

22. Bollacker, Lawrence, Giles. Discovering relevant scientific literature on the web. IEEE Intell Syst App 2000; 15: 42–47.

23. Benz, Hotho, Jäschke, et al. The social bookmark and publication management system bibsonomy. VLDB J 2010; 19: 849–875.

24. Madisch. ResearchGATE scientific network: a first step towards science 2.0. Clin Exp Immunol 2008; 154: 214.

25. Mangan. Social networks for academics proliferate, despite some doubts. Chron High Educ 2012; 58: 1–7.

26. Liang T-P, Yang Y-F, Chen D-N, et al. A semantic-expansion approach to personalized knowledge recommendation. Decis Support Syst 2008; 45: 401–412.

27. Kim H-N, Lee K-S, et al. Collaborative user modeling for enhanced content filtering in recommender systems. Decis Support Syst 2011; 51: 772–781.

28. Huang, Mitra, et al. RefSeer: a citation recommendation system. In: Proceedings of the 14th ACM/IEEE-CS joint conference on digital libraries, London, 8–12 September 2014, pp. 371–374. New York: IEEE.

29. Martín, Schockaert, Cornelis, et al. Using semi-structured data for assessing research paper similarity. Inform Sci 2013; 221: 245–261.

30. Soo Kim. Text recommender system using user’s usage patterns. Ind Manag Data Syst 2011; 111: 282–297.

31. Chandrasekaran, Gauch, Lakkaraju, et al. Concept-based document recommendations for CiteSeer authors. In: International conference on adaptive hypermedia and adaptive web-based systems, Hannover, 29 July–1 August 2008, pp. 83–92. Berlin: Springer.

32. Bogers, Van den Bosch. Collaborative and content-based filtering for item recommendation on social bookmarking websites. In: Proceedings of the ACM RecSys ’09 workshop on Recommender Systems and the Social Web, New York, 2009. New York: ACM.

33. Parra, Brusilovsky. Evaluation of collaborative filtering algorithms for recommending articles on CiteULike. In: Proceedings of Workshop on Web 3.0: Merging Semantic Web and Social Web 2009, Turin, Italy, 29 June 2009, CEUR Workshop Proceedings. New York: ACM.

34. Vellino. Usage-based vs. citation-based methods for recommending scholarly research articles. In: ACM Recommender Systems Workshop 2012, Dublin, 2012. New York: ACM.

35. Guan, Wang, et al. Document recommendation in social tagging services. In: Proceedings of the 19th international conference on world wide web, Raleigh, NC, 26–30 April 2010, pp. 391–400. New York: ACM.

36. Özbal, Karaman, Alpaslan. A content-boosted collaborative filtering approach for movie recommendation based on local and global similarity and missing data prediction. Comput J 2011; 54: 1535–1546.

37. Bogers, Van Den Bosch. Fusing recommendations for social bookmarking web sites. Int J Electron Comm 2011; 15: 31–72.

38. Gori, Pucci. Research paper recommender systems: a random-walk based approach. In: WI 2006. IEEE/WIC/ACM international conference on web intelligence, Hong Kong, China, 18–22 December 2006, pp. 778–781. New York: IEEE.

39. Lee, Brusilovsky. Using self-defined group activities for improving recommendations in collaborative tagging systems. In: Proceedings of the fourth ACM conference on recommender systems, Barcelona, 26–30 September 2010, pp. 221–224. New York: ACM.

40. Tian, Jing. Recommending scientific articles using bi-relational graph-based iterative RWR. In: Proceedings of the 7th ACM conference on recommender systems, Hong Kong, China, 12–16 October 2013, pp. 399–402. New York: ACM.

41. Chakraborty, Modani, Narayanam, et al. DiSCern: a diversified citation recommendation system for scientific queries. In: 2015 IEEE 31st international conference on data engineering, Seoul, South Korea, 13–17 April 2015, pp. 555–566. New York: IEEE.

42. Vivacqua, Oliveira, De Souza. i-ProSE: inferring user profiles in a scientific context. Comput J 2009; 52: 789–798.

43. Quattrone, Capra, De Meo, et al. Effective retrieval of resources in folksonomies using a new tag similarity measure. In: Proceedings of the 20th ACM international conference on Information and knowledge management, Glasgow, 24–28 October 2011, pp. 545–550. New York: ACM.

44. H-J, Yoon Y-C, Kim. Finding more trustworthy answers: various trustworthiness factors in question answering. J Inf Sci 2013; 39: 509–522.

45. Hjørland. Methods for evaluating information sources: an annotated catalogue. J Inf Sci 2012; 38: 258–268.

46. Rui, Huang, Ortega, et al. Relevance feedback: a power tool for interactive content-based image retrieval. IEEE T Circ Syst Vid 1998; 8: 644–655.

47. Mei, Yang, Hua X-S, et al. Contextual video recommendation by multimodal relevance and user feedback. ACM T Inform Syst 2011; 29: 10.

48. Silva, Yang, et al. A profile-boosted research analytics framework to recommend journals for manuscripts. J Assoc Inform Sci Technol 2015; 66: 180–200.

49. Jiang, Yang, et al. A social voting approach for scientific domain vocabularies construction. Scientometrics 2016; 108: 803–820.

50. Fox, Shaw. Combination of multiple searches (Special publication). Gaithersburg, MD: NIST, 1994, pp. 243–252.

51. Adomavicius, Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 2005; 17: 734–749.

52. Buckley, Voorhees. Evaluating evaluation measure stability. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, Athens, 24–28 July 2000, pp. 33–40. New York: ACM.