Abstract
Social networks are an essential phenomenon that can be studied from many perspectives. These networks contain a large number of users, termed nodes, and connections between users, termed edges. For efficient information processing and retrieval, identifying the influential nodes is essential for improving the diffusion process. To identify the influential node inside a heterogeneous community, the proposed method, Support Vector Bayesian Machine (SVBM), incorporates probability metrics into a regression classifier. Node metrics such as degree centrality and closeness centrality are first measured to eliminate nodes. A standardized index is computed from the centrality values and fed into SVBM. After the standardized index, similarity and dissimilarity index values are evaluated by combining the Euclidean distance, Hamming distance, and Pearson coefficient for valued relations and the Jaccard index for binary relations. This results in a single index value, the power degree value (p). The value p determines the node's boundedness, which indicates its range of influence within the community. Outlier nodes in the bounded region are eliminated, and the remaining nodes are passed to the final phase of SVBM, in which a probability regression line predicts the node exhibiting the most influential nature. In the experimental evaluation, the proposed system and the existing Support Vector Machine (SVM) technique achieved an Area Under Curve (AUC) of 0.95 and 0.41, respectively, indicating that SVBM separated the true influential nodes from the other nodes more accurately than SVM. Compared with the existing SVM, SVBM detected nodes that achieved a higher diffusion rate within the networks.
Keywords
Introduction
Online platforms that play an essential role in everyone's day-to-day life are built on networking platforms termed social networks. Most of the unique properties of such massive networks can be explored only by dividing a large network into sub-communities, and these communities play an essential role in information diffusion inside the network. Social networks represent the relationship of an individual user to other users in the world [8, 9, 10]. The properties exhibited by sub-communities are termed community structure properties [12]. Sub-communities also comprise a large number of nodes and edges connected through various topologies. The structural hole theory was put forth by Ronald Burt to analyze problems that occur in social networks [1]. The theory states that clusters form within a social network among individuals who have strong ties. When the nodes of a network belong to separate clusters, a common node that passes messages between them and acts as the bridge between the two networks is called a structural hole node or influential node.
Evaluating a node's influence is an important task in complex network analysis. The evaluation mainly depends on four properties of the nodes: the local property, the global property, location, and random walks over the network. Among these four, the local property has low time complexity and depends mainly on information about the current node and its neighborhood nodes. It considers both adjacent and nonadjacent nodes according to their dependence strength. The spreading capability of users in a complex network has also been evaluated with respect to their local structure [13], considering both the connections and the topologies of the neighborhood nodes. Treating structural holes as k-brokers that connect sub-networks resulted in lower time complexity for information search [2]. Numerous parameters contribute to detecting the influential node, including betweenness centrality, closeness centrality, PageRank centrality, and so on. Detection methodologies include accounting for variability through data normalization [3], data mining methodologies [4], and cluster optimization methodologies [2, 4]. Business-related, IT-related, and healthcare applications also deal with detecting influential nodes across a network.
Motivation
The main objective is to find an influential node through an index that attains higher accuracy. The network is usually represented as a graph consisting of a set of nodes and a set of edges.
Global attributes of a node determine whether the node is influential. Each attribute denotes an important factor in the interaction between two nodes. Betweenness indicates the availability of the node [5], closeness indicates the node's capability of making other nodes influential [6], and eigenvector centrality corresponds to the prestige of a single node compared with all other nodes inside the community. In a community, each node has either weak links or strong links. A weak link denotes a low likelihood of direct interaction between two nodes or two communities, whereas a strong link denotes a high likelihood of communication between them.
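As a minimal sketch of the two centralities that SVBM uses for its first elimination step, the Python below computes normalized degree centrality and closeness centrality on an undirected graph stored as an adjacency dictionary. The adjacency-dictionary representation and the function names are illustrative assumptions, and closeness is computed within each node's reachable component, which is one common convention for graphs that may be disconnected.

```python
from collections import deque

def degree_centrality(adj):
    """Normalized degree: number of neighbours divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(reachable nodes - 1) / (sum of shortest-path distances), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Path graph a - b - c: the middle node is the most central.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(degree_centrality(adj))     # {'a': 0.5, 'b': 1.0, 'c': 0.5}
print(closeness_centrality(adj))  # a and c: 2/3, b: 1.0
```

Betweenness and eigenvector centrality are not covered by this sketch.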
Contribution
The proposed system, Support Vector Bayesian Machine (SVBM), is designed by combining the characteristics of the Support Vector Machine (SVM) and Bayesian statistics. Irrelevant nodes are separated from relevant nodes using the probability value. Based on the SVBM values, the performance of the system is assessed with a Receiver Operating Characteristic (ROC) curve, and the classification accuracy of the model is given by the Area Under Curve (AUC). Bayesian statistics applied to the network provide a probability value that identifies the nodes with the maximum likelihood of spreading information. Combining the SVM with these likelihood values outputs the nodes with the highest information values for the spread.
The indexing mechanism begins by traversing the network to evaluate the similarity between pairs of nodes inside the community. Nodes are first eliminated on the basis of their degree and closeness centrality. Node similarity is then measured through four metrics: Euclidean distance, cosine similarity, the Jaccard index, and Pearson's correlation coefficient. Various datasets describing the relationships between nodes are also evaluated from the perspectives of network structure [26, 27], managerial employee relations [28, 29], centrality values [30, 31], and social messages [31, 32]. The nodes are placed on concentric layers according to the influential ranges given by their boundedness values, and the outlier nodes are decided based on the probability values on the regression curve.
The paper is organized as follows. The first section frames the idea of a structural hole and its detection process. The second section reviews the literature on influential node detection. The third section explains the proposed SVBM methodology, followed by the experimental evaluation, including comparisons and outcomes. The conclusion section outlines the work done and future enhancements.
Related works
Structural holes play a key role in rapid information diffusion. This section reviews various structural hole detection methods. Influential nodes in a social community structure can be detected based on the structure of the network, on node centrality metrics such as betweenness and closeness, or on algorithmic approaches to node detection. Methods for classifying influential nodes from other nodes, and datasets that describe the relationships and tie strengths between nodes in various types of communities, are also presented.
Based on network structure
Co-citation analysis uses cluster analysis of the similarity among documents to retrieve documents from the literature. Zhu et al. enhanced influence maximization by reducing the number of seed nodes through the Structure Hole based Influence Maximization algorithm (SHIM). A Laplacian matrix is formed as the difference between the degree matrix and the adjacency matrix, and the node with the minimum value in the matrix is taken as the influence maximization node, which overcomes the complexity of Monte Carlo simulation and traditional algorithms. The conditions framed were considered discordant [8]. Masahiro Kimura et al. [18] designed a system for classifying the influential nodes from other nodes in the network by combining bond percolation and graph theory. This method overcame the time complexity of the greedy approach. Malm et al. [22] surveyed marijuana growers to find the relationships between them and detect the influential node in their co-worker network. The approach observed two measures, the amount of information and the quality of information, together with the trust within the network and the self-awareness of the growers. It detected the nodes whose ties were weak inside the network, and the research suggested that brokers could access more information than others in their network. The study does not address perceived deterrence and risk.
Xu et al. [20] designed a model based on structural diversity for finding the influential nodes inside networks. The approach addressed the growth in polynomial time needed to find k seed nodes as the number of activated nodes is maximized. Influential nodes are detected in two steps: a seed set is selected through multiple iterations, and the influence of each node is then calculated individually through Monte Carlo simulations. The node with the highest probability is added to the set, and seeds are compared through a greedy approach that chooses the highest-probability node. The drawback of this approach is that it did not address time complexity. Colladon and Remondi [23] proposed an analytical method for detecting suspicious activities in financial transactions and potential criminals in the network, using a large dataset of frequent transactions and the behavior of notorious users. The factors used for detection included degree measures, betweenness, risks taken, and constraints on geographical areas, transactions, and economic sectors. The method solved the problem of studying single attributes of social units, which have less informative power than relational data between users. Its main disadvantage is that no machine learning methods were implemented or addressed, which would have enabled quick detection of fraudulent actions. Figueiredo et al. [9] explained how network dimensions such as node centrality, structural holes, and ties determine the influence of a node and also help in content selection. This approach also helped identify novelty and represent user behavior through content selection.
Shiau et al. [10] studied significant factors such as measures of complex social networks, community structure, and strong and weak ties by employing cluster and co-citation analysis.
Based on node centrality metrics
Lu et al. designed an evaluation system based on Fuzzy Social Network Centrality Analysis, which combines metrics such as node distribution, node connection strength, node condensation, and clustering based on fuzzy graph theory. Three conditions are studied. The equilibrium of the friend-group nodes is analyzed with a fuzzy comentropy index. The centrality status and types of nodes are evaluated with a fuzzy node degree index, which gives the connection information between BBS group regional nodes. Differences in centrality are measured using the fuzzy condensation degree, fuzzy cluster coefficient, and fuzzy geographic concentration indexes [7]. Zhu et al. gave an efficient method for finding structural holes by combining the influence matrix and closeness centrality, which detects structural holes using both local and global information of the nodes [8]. Aili Yang and Xie [19] proposed a multi-objective method for finding crucial nodes in a social network. The method considers the closeness centrality of each node and ranks nodes by combining five indicators selected from global and local attributes of the network, information dissemination, and location. It also serves as an emergency management decision support system based on the resulting node importance. However, the method did not deal with undirected relational networks.
Yu et al. [21] addressed the problems that arise when global metrics such as betweenness and closeness centrality are used in networks of unknown topology and in large-scale networks. Their Improved Structural Holes (ISH) method overcomes these problems by using only the node degrees and information from the nearest neighbors. The Susceptible-Infected-Recovered (SIR) model was used to evaluate the spreading of the influential nodes through local information. The drawback of the method is that it considers very few metrics for determining the influential node; identifying nodes precisely with only degree and neighbor information as key attributes results in lower accuracy. Zhu et al. [24] sketched a method for evaluating the influential node by integrating structural hole theory, closeness centrality, and an influence matrix. Ranking the nodes by their influential degrees determines the relative influence between nodes. The time complexity of the algorithm is also determined by the need to find the shortest paths between pairs of nodes. Their evaluation on three large-scale networks concluded that the algorithm is more effective at mining key nodes. The major drawback is the exclusion of local and global information of the key nodes from the evaluation process.
Node detection algorithms
Xu et al. compared various algorithms for detecting the structural holes inside a network with a proposed greedy algorithm. The algorithm finds structural holes using the shortest-path distances between nodes, which are given by articulation points. The benchmark algorithms AP_BICC, Central, PathCount, 2-step, PageRank, Constraint, and HAM were compared with the proposed algorithm, AP_Greedy, which found structural holes efficiently within time constraints [12]. Bekett S.J. LPAwb
Node classification methods
Hu and Mei framed a method for ranking influential nodes using the E-Burt method. The product of the degrees of the two endpoints of an edge is taken as the edge weight, which converts the unweighted network graph into a weighted one. The nodes are evaluated through the Susceptible-Infected-Recovered (SIR) model. The method was applied to three different types of networks, and E-Burt was found to be efficient on the Zachary network [11]. Dutta et al. developed an attribute-based feature selection for distinguishing typical users from spam users in Online Social Networks (OSNs) [5]. Finding smaller attribute subsets leads to better classification. Three types of attributes are used: text-based, URL-based, and user-profile-based characteristics such as active user participation. The method does not generalize well because the attributes are of different data types. Veni et al. proposed a method for classifying the various URL attacks on the web using link structures, web page contents, DNS information, and network traffic [14]. The main classifiers used are SVM, RAkEL, and ML-KNN, which rely on a linear decision function, random label sets, and nearest neighbors, respectively, and classify by observing DNS flexibility.
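The E-Burt weighting step described above can be sketched as follows: each undirected edge (u, v) receives the weight deg(u) * deg(v). This is a minimal illustration assuming an edge-list input and a hypothetical function name; it covers only the weighting, not the subsequent Burt constraint ranking or the SIR evaluation.

```python
from collections import defaultdict

def degree_product_weights(edges):
    """Weight each undirected edge (u, v) by deg(u) * deg(v)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): degree[u] * degree[v] for u, v in edges}

# Toy graph: node 0 is a hub (degree 3), node 3 links the hub to node 4.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
weights = degree_product_weights(edges)
print(weights)  # {(0, 1): 3, (0, 2): 3, (0, 3): 6, (3, 4): 2}
```

The edge between the two best-connected nodes (0 and 3) receives the largest weight, which is the intent of the transformation.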
Li et al. proposed a prediction model showing that a host with dense connections inside social groups can enable efficient information forecasting [15]. The model combines fields significant for determining an influential node: egocentric analysis of social networks, structural hole theory, and survival analysis. A three-step analytical approach determines the focal node, determines the position of the structural hole, and decides whether to accept or reject the node as influential based on its status, which depends on factors such as recovery, failure, and maturity. The approach also determined the weights of the predictors applied to recommender systems. Its outcome was evaluated only on a user-user network and not on other organizational network structures such as co-authorship and co-membership networks.
Relational datasets with various features of social networks
The networks that define the relationship between two nodes, and the interpersonal relations concerning communication between them, are used to evaluate the proposed system. Seven datasets describing the relationships between nodes are considered. Table 1 lists the relational datasets that are suitable for network formation and for testing the index, in order to identify the datasets that contain the most relational and influential information in the form of tie strength and data communication.
Krackhardt's high-tech managers network comprises the 21 managers of a company that has over 100 employees. It defines three relationships between pairs of nodes: advice, friendship, and reports-to. The influential node is identified by the documents the other nodes obtain from each node [26]. Padgett's Florentine Families dataset, prepared by Breiger and Pattison, contains information about 16 rival families, including the business ties and marital ties that account for their relationships [27]. The well-known dataset collected by Wayne W. Zachary, containing 34 members and 78 ties, describes the members' interactions outside the club and the reason for its split. It contains only two kinds of information: the presence or absence of ties and the interactions that occurred between the nodes [28]. Corporate interlocks are represented by Stokman et al. as an asymmetric binary matrix, also known as the Dutch network [29]. Freeman developed a communication dataset from the interactions between fifty academics through an Electronic Information Exchange System (EIES), in which the level of friendship was determined from time values [30]. Another dataset was collected from workers at 95 different companies who had different types of relationships involving both information and money exchange. This dataset is directed and asymmetric [31].
Challenges identified
For a homogeneous graph
Flowchart of the SVBM index mechanism for detecting influential nodes.
The proposed methodology begins with creating the network from the essential nodes and edges, followed by extracting the attributes of the individual nodes. The individual nodal values are then used to calculate the boundedness of the nodes, which determines their influential range. Figure 1 shows the flowchart of the SVBM index mechanism for the influential node detection framework.
After boundedness is determined, the resulting value gives the influence factors of the nodes. The probability value of each node is calculated and then used in the linear regression equation, which identifies the most influential node inside a community.
Architecture of influential node detection by the SVBM index mechanism.
The key step in the classifier is setting the value of the classifier equation. The SVBM classifier uses both the regression line value and the probability value of each user node. The network is created with attributes such as indegree, outdegree, and ties. It includes the essential users, and their relationships denote who reports to whom; it also provides the relationship between each pair of nodes as values. The SVBM mechanism is represented diagrammatically in Fig. 1. The network, created with 'n' nodes connected by their corresponding edges 'e', is represented as
The degree of each node 'n' is computed. The degree of each node is represented through
The eigenvalues of the given nodes are calculated as criteria for describing the topological organization. The eigenvalues reveal the connected components formed from the values of Eq. (2), and the resulting normalized Laplacian matrix gives the topology.
The attributes of each node range from
For the evaluation, the similarity and dissimilarity between nodes are taken as input. Similarity measures identify nodes that are equivalent in their attributes, and dissimilarity measures identify nodes whose attributes differ. The similarity measures used between nodes are listed in Table 2 with their functionality.
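The link between Laplacian eigenvalues and connected components can be checked numerically. The sketch below, which assumes NumPy and a graph without isolated nodes, builds the normalized Laplacian I - D^(-1/2) A D^(-1/2) and counts its numerically zero eigenvalues; their multiplicity equals the number of connected components. The function names and the tolerance are illustrative choices. The unnormalized Laplacian D - A has the same property.

```python
import numpy as np

def normalized_laplacian(A):
    """L_sym = I - D^(-1/2) A D^(-1/2); assumes every node has degree > 0."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # Element (i, j) of the scaled matrix is A[i, j] / sqrt(d_i * d_j).
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def count_components(A, tol=1e-9):
    """Number of eigenvalues of L_sym that are zero up to the tolerance."""
    eigenvalues = np.linalg.eigvalsh(normalized_laplacian(A))
    return int(np.sum(np.abs(eigenvalues) < tol))

# Two disjoint triangles: {0, 1, 2} and {3, 4, 5}.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1
print(count_components(A))  # 2
```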
Similarity measures
The Euclidean distance gives the distance between two nodes. The distance is inversely related to the likelihood of a relationship between two nodes: as the distance increases, a relationship or strong tie between the nodes becomes less likely. In the Euclidean-based similarity value (A), a value greater than or equal to 1 denotes that the nodes have similar ties at all times, a value of 0 denotes no similarity at all, and a value between 0 and 1 denotes some similarity between the users. The Jaccard index, the second metric (B), compares the sets of two members and reveals the members they share; higher values indicate greater similarity. The Jaccard distance, derived from the Jaccard index, gives the dissimilarity between nodes. The Jaccard index has a minimum value of 0, which occurs when two users have no similarity, and a maximum value of 1, representing a strong tie between the nodes; intermediate values represent partial similarity (D). The Hamming distance is used to find the difference between the codes of two communities (C). The connectivity of the nodes inside a community is represented as a matrix whose binary entries encode node communication inside the community, and the similarity of two communities is evaluated from the resultant matrix of differences between their two matrices.
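A minimal sketch of the Euclidean, Jaccard, and Hamming measures applied to rows of a binary adjacency matrix, where each row lists a node's ties to the other actors. The row representation and the example vectors are assumptions, and because the text does not give the mapping from raw Euclidean distance to its 0-to-1 similarity scale, the sketch reports raw distances.

```python
import math

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def jaccard_index(x, y):
    """|shared ties| / |ties of either node| for binary vectors; 0.0 if both empty."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0

def hamming_distance(x, y):
    """Number of positions at which the two binary codes differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

# Hypothetical ties of nodes u and v to five actors.
u = [1, 1, 0, 1, 0]
v = [1, 0, 0, 1, 1]
print(euclidean_distance(u, v))  # 1.4142135623730951
print(jaccard_index(u, v))       # 0.5
print(1 - jaccard_index(u, v))   # Jaccard distance: 0.5
print(hamming_distance(u, v))    # 2
```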
Similarity between two nodes is calculated to find the nearest neighbors in a community, and it also distinguishes strong and weak ties. The result of (6) ranges from 0 to 1. A value of zero indicates no similarity between the users at all, a value between 0 and 1 indicates partial matches in their distances and ties, and a value of 1 indicates that the two users have the same ties irrespective of time.
The Pearson correlation coefficient (4) is bounded between -1 and +1. A negative value denotes that the two actors' ties are inversely correlated, a positive value indicates that the actors share the same features, and a value of zero indicates no linear relationship between the two nodes.
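A minimal sketch of Pearson's coefficient for valued ties, assuming each node is described by a vector of tie values (for example, interaction counts) to the same set of actors. The example vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return cov / (sx * sy)

a = [3, 0, 2, 5]
b = [6, 0, 4, 10]  # same pattern, scaled: r is 1 (up to rounding)
c = [0, 5, 2, 0]   # roughly opposite pattern: r is negative
print(pearson(a, b))
print(pearson(a, c) < 0)  # True
```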
Eccentricity layers are plotted according to the centrality and distance values, and each layer denotes a different range of relationship values. Nodes are placed on the eccentric circles according to their values, and the nodes in each range, together with their edges, are used to classify the weaker ties.
The node similarity measurement inside the community is explained in Algorithm 1.
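The text does not state how the layer ranges are chosen, so the sketch below assumes equal-width bins over the observed centrality values, with layer 0 as the innermost (highest-value) ring. The function name, the bin rule, and the example values are hypothetical.

```python
def assign_layers(values, num_layers):
    """Hypothetical rule: equal-width bins; layer 0 holds the highest values."""
    hi, lo = max(values.values()), min(values.values())
    width = (hi - lo) / num_layers or 1.0  # guard against all-equal values
    return {node: min(int((hi - v) / width), num_layers - 1)
            for node, v in values.items()}

closeness = {"a": 0.95, "b": 0.80, "c": 0.55, "d": 0.30}
print(assign_layers(closeness, 3))  # {'a': 0, 'b': 0, 'c': 1, 'd': 2}
```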
Nodes are grouped into subgraphs based on structural equivalence, automorphic equivalence, and regular equivalence. Automorphic equivalence identifies subgraphs that have equal weight, and regular equivalence reveals equally important nodes. Generate the value of the
Community with boundedness levels.
Community with stronger ties after classification.
Influential node detection mechanism by SVBM, a. Formation of the network using a random topology, b. Removal of the node from the network due to low degree and centrality values, c. Nodes lie on their respective influential range through their boundedness values, d. Influential nodes classified based on the bounded regions (SVBM Classification).
After nodes are eliminated using the similarity metrics, the remaining nodes are fitted onto layers according to their influential range values. Distant nodes are eliminated, and the remaining nodes are classified by their influential probability values from the regression. The degree distribution is calculated to identify the node with the highest degree dissipation, which has the greatest influence. The nodal power is the ratio of the number of nodes to the power degree of each node. The power degree is given by Eq. (5)
Equation (5) derives the vertices whose mean distribution makes them candidates for becoming influential nodes. The probability mass function (PMF) of the Poisson distribution gives the distribution of the essential edges inside the network.
The boundedness of each node denotes the level of ties through which the node is connected, that is, the range of the node inside the network.
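Eq. (5) itself is not reproduced in the text, so the following only illustrates the Poisson PMF it relies on, P(k) = lam^k e^(-lam) / k!, with lam assumed to be the network's mean degree. The degree values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical node degrees; lam is assumed to be the mean degree.
degrees = {"a": 3, "b": 1, "c": 2, "d": 2}
lam = sum(degrees.values()) / len(degrees)  # 2.0
for node, k in degrees.items():
    print(node, k, round(poisson_pmf(k, lam), 4))
```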
The concentric circle layers denote the ranges of influential closeness centrality values. Fig. 3 shows that all nodes in the network are placed on their respective ranges according to their centrality values. Boundedness denotes the stronger ties connected to each individual node, and the value of a node gives the likelihood of the community to which it belongs. Each eccentric circle specifies a community value, and the nodes that lie in the eccentric layer of a circle denote that community. Fig. 4 shows the boundedness classification based on stronger ties, that is, the nodes remaining after classification, which are connected by stronger ties inside the network. The dataset contains the information of who reports to whom. Stronger ties are taken as the criterion for retaining the nodes placed in each range; nodes with weaker ties are considered false nodes in the influential node detection process, and the nodes with the fewest relationships are eliminated.
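A minimal sketch of the weak-tie elimination step, assuming tie strengths are available as edge weights (as in the valued relations of the datasets) and using a hypothetical threshold; nodes left without any strong tie are reported as eliminated.

```python
def prune_weak_ties(ties, threshold):
    """Keep ties with weight >= threshold; return them and the nodes left with no tie."""
    strong = {edge: w for edge, w in ties.items() if w >= threshold}
    all_nodes = {node for edge in ties for node in edge}
    kept_nodes = {node for edge in strong for node in edge}
    return strong, all_nodes - kept_nodes

# Hypothetical valued ties (e.g., interaction counts) and threshold.
ties = {("a", "b"): 5, ("a", "c"): 4, ("c", "d"): 1, ("b", "e"): 2}
strong, removed = prune_weak_ties(ties, threshold=3)
print(strong)           # {('a', 'b'): 5, ('a', 'c'): 4}
print(sorted(removed))  # ['d', 'e']
```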
a. The standardized degree and centrality values of the nodes in the network, b. Average influential value of the node (similarity values based on Euclidean, Jaccard, Hamming, and Pearson) and c. SVBM index values for classification of the influential nodes from other nodes inside the community.
Influential nodes classified from the community by the SVBM index process. Based on the influential values, two nodes with the same value are classified as the most influential nodes in the community.
The influential node detection in Fig. 5 represents the SVBM process, beginning with the formation of the network using a random topology in which nodes have similar tie strengths. The initial phase of node classification uses the closeness and centrality values, and the similarity evaluation places the nodes on their influential ranges according to their boundedness values. Based on the probability values, the influential nodes are then labeled from the other nodes inside the bounded region.
Fig. 6 gives the values derived from the phases of the SVBM influential node detection mechanism. The standardized degree and closeness values are shown in Fig. 6a, the similarity values in Fig. 6b, and the SVBM index values used for classification in Fig. 6c.
Fig. 7 shows the influential nodes extracted by the SVBM index process. Two nodes in the community are detected as influential. Their probability values are equal, which indicates that the two nodes have similar properties for becoming influential nodes.
Algorithm 2 describes the structural hole detection mechanism for a graph with nodes, edges, and communities, after structural equivalence has been measured. The graph is traversed within each community using three node metrics: Euclidean distance, cosine similarity, and Pearson's correlation coefficient. The traversal proceeds from the node with the highest value.
Each node's class is derived from the Support Vector Machine equation. The SVM equation generally has two parameters, a and b, which define the characteristics of the separating line. Here, the centrality values of the nodes are passed into Eq. (7).
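Cosine similarity, one of the three metrics used in this traversal, can be sketched over adjacency rows as follows. The helper most_similar, which picks the node a traversal step might move to, is hypothetical; its selection rule is an assumption and not Algorithm 2 itself.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two tie vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def most_similar(target, rows):
    """Hypothetical helper: the node whose tie vector is closest in angle to target's."""
    scores = {node: cosine_similarity(rows[target], row)
              for node, row in rows.items() if node != target}
    return max(scores, key=scores.get)

rows = {
    "u": [1, 1, 0, 1, 0],
    "v": [1, 0, 0, 1, 1],
    "w": [0, 0, 1, 0, 1],
}
print(most_similar("u", rows))  # v
```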
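Eq. (7) is not reproduced in the text. As an illustration of a linear SVM separating influential from other nodes on two centrality features, the sketch below trains a soft-margin linear SVM with Pegasos-style stochastic subgradient descent on the hinge loss. The feature values, labels, function names, and hyperparameters (lam, epochs) are hypothetical, and folding the bias into the weight vector is a common simplification rather than the paper's formulation.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    A constant 1 is appended to every feature vector so the last weight acts as the
    bias b (it is regularized along with w, a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if margin < 1.0:                          # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]

def decision(w, b, x):
    """Signed score of x relative to the separating line w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical standardized (degree, closeness) features; +1 = influential.
X = [(1.5, 1.2), (1.2, 1.6), (1.8, 1.4), (-1.0, -0.8), (-1.4, -1.2), (-0.6, -1.5)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([1 if decision(w, b, x) > 0 else -1 for x in X])  # [1, 1, 1, -1, -1, -1]
```

Because the toy classes are well separated, the learned line classifies all six nodes correctly; on real data the regularization strength lam would need tuning.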
The two communities are taken as
The values of
To develop an SVM with higher prediction accuracy, a model that combines a Bayesian mechanism with SVM is introduced. The Bayesian approach gives the probability that a node succeeds, and the prediction is made from this probability through Eq. (10). Combining the Bayesian approach with SVM classifies the nodes through prediction, which increases the classification rate and accuracy for the nodes in a community.
The probability values derived from the two characteristic equations are then combined with the SVM values to give the SVBM value, which denotes the predicted likelihood that each node in the community becomes the influential node. SVBM enhances node classification with this predictor line, predicting whether each node has the characteristics of an influential node. Combining the values gives the highest accuracy for the linear classification of influential nodes from the other nodes in the community.
The prediction value of each node is derived using Eq. (11), which represents the linear regression of the SVM combined with the probability of influential node occurrence, reflecting the nodal attributes. The regression line value enhances the probability value, which defines how predictable the node is as an influential node.
Removing false nodes plays a major role in the classification process, because nodes with similar values can have different natures, and removing them reduces the complexity of the detection process. The edges are also reduced so that only the stronger ties are retained. Weaker ties are removed because they provide little information about influential nodes. Weaker ties also carry information about acquaintances, which can hold details of healthier relationship values. An acquaintance may become closer, but a distant relationship between nodes does not necessarily make a node influential. Based on the eccentric values, the SVBM classifier labels the nodes with higher probability as influential, separating them from the other nodes in the community. The nodal attributes are imposed on the regression line in combination with the probability values, and the classifier line produces a prediction value that determines whether a node is influential. The model thus provides a powerful way to predict influential nodes.
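Eq. (10) is not reproduced in the text, so this is only a sketch of a Bayesian step under stated assumptions: a node's standardized centrality score is modelled with a Gaussian likelihood for each class, and Bayes' rule combines these likelihoods with a prior to give P(influential | score). The class means, standard deviations, prior, and function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior_influential(x, prior, params_inf, params_other):
    """P(influential | x) by Bayes' rule with Gaussian class-conditional likelihoods."""
    joint_inf = gaussian_pdf(x, *params_inf) * prior
    joint_other = gaussian_pdf(x, *params_other) * (1 - prior)
    return joint_inf / (joint_inf + joint_other)

# Hypothetical: score ~ N(1.5, 0.5) for influential nodes, N(-0.5, 0.8) otherwise;
# 10% of nodes are assumed influential a priori.
for score in (-1.0, 0.5, 2.0):
    print(score, round(posterior_influential(score, 0.1, (1.5, 0.5), (-0.5, 0.8)), 3))
```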
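Eq. (11) is not reproduced in the text. One standard, well-documented way to turn SVM decision values into probabilities is Platt scaling, which fits a sigmoid P(y = 1 | s) = 1 / (1 + exp(A s + B)) to the decision values s. The sketch below swaps in Platt scaling purely as an illustration of combining a linear score with a probability; it is not the authors' Eq. (11), it omits Platt's target smoothing, and it uses plain gradient descent. The scores and labels are hypothetical.

```python
import math

def platt_probability(score, A, B):
    """Sigmoid mapping of an SVM decision value to a probability."""
    return 1.0 / (1.0 + math.exp(A * score + B))

def fit_platt(scores, labels, lr=0.1, iters=5000):
    """Fit A and B by gradient descent on the log loss (labels are 0/1).

    With z = A*s + B, the derivative of the log loss w.r.t. z is (y - p).
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        grad_A = grad_B = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, A, B)
            grad_A += (y - p) * s
            grad_B += (y - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Hypothetical SVM decision values and true labels (1 = influential).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
A, B = fit_platt(scores, labels)
for s in (-2.0, 0.0, 2.0):
    print(s, round(platt_probability(s, A, B), 3))
```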
Experimental evaluation
The centrality and degree values of the nodes are first used to determine which vertices are outliers. The standardized index values of both centrality and degree are then evaluated to increase the precision of the input data to the SVBM model.
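The standardized index is described only as being computed from the centrality and degree values. A minimal sketch, assuming it is the usual z-score (each value minus the mean over all nodes, divided by the population standard deviation), applied separately to degree centrality and closeness centrality on an unweighted, undirected graph. The helper names and the example graph are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree divided by the maximum possible degree n - 1."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of distances to them; 0.0 for isolated nodes."""
    result = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        result[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def standardized_index(values):
    """z-score of each node's value relative to all nodes."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    return {u: (x - mu) / sigma if sigma > 0 else 0.0 for u, x in values.items()}


# Hypothetical path graph A - B - C - D
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
z_degree = standardized_index(degree_centrality(adj))
z_closeness = standardized_index(closeness_centrality(adj))
```

On the path graph the two inner nodes get a standardized index of +1 and the two end nodes -1 for both measures, which is the kind of spread Table 3 summarizes.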
Experimental setup
The network was implemented in SocNetV using the Krackhardt dataset, which contains 24 actors. The dataset describes each actor's attributes, such as communication, relationships, tie information, and reports from one node to another. The network also conveys which nodes each node informs. The centrality of every node is measured, together with the Hamming, Jaccard, and Euclidean distances, which capture both similarity and dissimilarity. The similarity and dissimilarity measures together give the nodal values for the influence of each node. The standardized index of each centrality measure is taken, which gives the range of influence of each node with greater accuracy. SVBM index classification is applied to the relational datasets described in Table 2 to determine which dataset contains the most influential and relational information.
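The similarity and dissimilarity measures named here can be computed between two nodes' adjacency rows, as sketched below in plain Python. How they are merged into the single power degree value p is not specified in this section, so the sketch stops at the individual measures; using adjacency rows as the compared vectors is an assumption. Jaccard and Hamming suit binary relations, while Euclidean distance and the Pearson coefficient also apply to valued relations.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two relation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hamming(u, v):
    """Number of positions where the two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def jaccard_distance(u, v):
    """For binary vectors: 1 - |u AND v| / |u OR v| (0.0 if both are empty)."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - inter / union if union else 0.0


def pearson(u, v):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


# Hypothetical adjacency rows of two actors over five alters
u = [0, 1, 1, 0, 1]
v = [1, 1, 0, 0, 1]
measures = {
    "euclidean": euclidean(u, v),
    "hamming": hamming(u, v),
    "jaccard": jaccard_distance(u, v),
    "pearson": pearson(u, v),
}
```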
Table 3 lists the standardized index values of the centrality and degree measures. These standardized measures increase the accuracy with which the properties of nodes inside the network are captured.
Standardized index values for centrality and degree
Table 4 lists the boundedness value of each node, which describes the node's range of influence. Boundedness classifies the nodes after their attribute values are integrated.
Influential values for SVBM index –
This section presents the results of the SVBM approach in comparison with SVM in terms of the classification model, accuracy measured by the confusion matrix, and performance measured by the AUC curve. The relational datasets are also compared to find precise relational information, by feeding their influential and relational values into the SVBM model for influential node detection.
The values in Table 4 are the influential node values derived from the similarity metrics (Euclidean, Jaccard, Hamming, and Pearson) for the node relations in the network shown in Fig. 7a. The power value of a node determines the influence range within which the node settles in the bounded area.
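The section does not give the rule for deciding which nodes fall outside the bounded region. As a stand-in, the sketch below uses Tukey's fences on the power degree value p: nodes below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are treated as outliers. This is a swapped-in technique, not the authors' criterion, and the power values are made up for illustration.

```python
from statistics import quantiles


def bounded_nodes(power, k=1.5):
    """Split nodes by Tukey fences on their power degree value p.

    Returns (inside, outliers): the nodes whose p lies within
    [Q1 - k*IQR, Q3 + k*IQR], and the sorted list of nodes outside it.
    """
    q1, _, q3 = quantiles(power.values(), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = {u: p for u, p in power.items() if low <= p <= high}
    outliers = sorted(u for u in power if u not in inside)
    return inside, outliers


# Hypothetical power degree values for six nodes
power = {"A": 0.62, "B": 0.70, "C": 0.74, "D": 0.68, "E": 0.71, "F": 0.05}
inside, outliers = bounded_nodes(power)  # F falls outside the bounded region
```

Quartile-based fences are used here because, unlike mean ± k·std, a single extreme node does not inflate the bounds that are meant to exclude it.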
a. SVM classification for the given network nodes. b. SVBM classifier for classifying the influential node from the network.
Figure 8a shows the SVM classification and Fig. 8b the SVBM classification, with the nodes labeled as influential or non-influential. Nodes with influential attributes are marked with circles, and nodes eliminated for lacking influential capability are marked with crosses. SVM classifies the nodes into only two classes, 0 or 1, because its classification does not consider the various relational properties of the nodes. SVBM produces graded classes, because introducing the probability value into the linear regression line makes the classification more efficient. Nodes inside the influential region have greater tie strength with the other nodes in the community. Nodes with high regression values that are nevertheless not classified as influential have weaker tie strength and are egocentric.
a. Confusion matrix showing the positive predictive value and the false discovery rate. b. Confusion matrix showing the true positive rate and the false-negative rate.
a. AUC for the SVM classifier (AUC 
The confusion matrix tabulates actual classes against predicted classes and reports the true positives, false positives, false negatives, and true negatives. It gives a more detailed analysis of accuracy than a single score and represents the efficiency of the classification. The matrix also gives the positive predictive value and the false discovery rate. The rate of correctly predicted positives is much higher than that of false predictions, and the accuracy of the model is much higher than that of the standard SVM classifier. The confusion matrices based on the SVBM classifier values are shown in Fig. 9a and b. The ratio of the true class to the predicted class reached 100%. With respect to the positive predictive value and the false discovery rate (Fig. 9a), the true positive rate attained most values at 0.17389, which is consistent with the ratio of true positive to false negative values shown in Fig. 9b.
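The counts and summary rates behind Fig. 9 can be computed as follows. Rows correspond to actual classes and columns to predicted classes; the column summary gives the positive predictive value (PPV) and false discovery rate (FDR = 1 - PPV), and the row summary gives the true positive rate (TPR) and false-negative rate (FNR = 1 - TPR). The labels are hypothetical and only illustrate the arithmetic.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels (1 = influential)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif p:  # predicted influential, actually not
            fp += 1
        elif a:  # actually influential, predicted not
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summary_rates(tp, fp, fn, tn):
    """Column summary (PPV, FDR), row summary (TPR, FNR), and accuracy."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    return {
        "PPV": ppv,
        "FDR": 1.0 - ppv if tp + fp else 0.0,
        "TPR": tpr,
        "FNR": 1.0 - tpr if tp + fn else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical ground truth and predictions for eight nodes
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
rates = summary_rates(*counts)
```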
The performance of the developed model is given by the area under the ROC curve (AUC), where the ROC curve plots the true positive rate against the false positive rate. The SVBM classifier achieved an AUC of 0.95 (Fig. 10b), whereas the standard SVM, based only on the centrality values, achieved 0.41 (Fig. 10a). This shows that node classification in the network is better with SVBM than with SVM. Nodes classified with influential attributes have a higher chance of becoming influential nodes.
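The AUC equals the probability that a randomly chosen positive node receives a higher score than a randomly chosen negative node (the Mann-Whitney formulation). The sketch below uses that equivalence directly; the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four nodes (1 = influential)
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is O(P·N), which is adequate for networks of a few dozen nodes; a rank-based formula scales better for large inputs.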
Average clustering coefficient and eccentricity values for relational datasets
Figure 11 shows the average clustering coefficients of the relational datasets. The clustering coefficient measures the tendency of a node's neighbors to form a cluster, that is, the fraction of pairs of neighbors that are themselves connected. The clustering coefficient values reflect the influential nature of a dataset. Figure 11 shows that the Freeman dataset has a higher potential for influence to spread to other nodes, because its ratio of edges to nodes is very high: only 32 users, with 650 connections or traversals among them, form the backbone of its influential nature. The nature of a dataset determines the transitivity of data inside its network.
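The average clustering coefficient plotted in Fig. 11, and the eccentricity values listed with it, can be computed directly from an adjacency structure. A sketch for an unweighted, undirected graph stored as a dict of neighbor sets; eccentricity here assumes a connected graph (otherwise it covers only the reachable nodes). The example graph is hypothetical.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves linked."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)


def eccentricity(adj, u):
    """Largest hop distance from u to any node reachable from it (BFS)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())


# Hypothetical graph: triangle A-B-C with a pendant node D attached to C
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
avg_cc = average_clustering(adj)
ecc = {u: eccentricity(adj, u) for u in adj}
```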
Influential Range Values (IRV), derived through eigenvector centrality, determine the influential range of nodes. The eigenvector centrality values indicate whether each node has a low, moderate, or high influence on the other nodes. Table 6 lists the influential range values used to determine the influential nature of the datasets.
Influential Range Values (IRV) of datasets for determining the influential range of nodes in network datasets
Average clustering coefficients of datasets.
Eigencentrality values of datasets.
The eigenvector centrality of a node measures its influence in the network. It is computed by assigning relative scores to all nodes, such that connections to high-scoring nodes contribute more to a node's score than connections to low-scoring nodes. A high eigenvector value therefore indicates that a node is connected to many other high-scoring nodes. Figure 12 shows the eigenvector values of nodes at different levels of influence for seven datasets.
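Eigenvector centrality can be computed by power iteration. The sketch below iterates on A + I rather than A: the shift leaves the leading eigenvector unchanged but prevents oscillation on bipartite graphs such as stars. The subsequent split into least, moderate, and high influence levels, with cut-offs at one third and two thirds of the maximum score, is a hypothetical stand-in for the IRV ranges, whose boundaries are not given here.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), normalized to unit Euclidean length each step."""
    x = {u: 1.0 for u in adj}
    for _ in range(max_iter):
        # (A + I) x: each node keeps its own score and adds its neighbors' scores
        nxt = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {u: val / norm for u, val in nxt.items()}
        converged = max(abs(nxt[u] - x[u]) for u in adj) < tol
        x = nxt
        if converged:
            break
    return x


def influence_levels(scores, cuts=(1 / 3, 2 / 3)):
    """Hypothetical IRV-style binning relative to the highest score."""
    top = max(scores.values())
    levels = {}
    for u, s in scores.items():
        r = s / top
        if r < cuts[0]:
            levels[u] = "least"
        elif r < cuts[1]:
            levels[u] = "moderate"
        else:
            levels[u] = "high"
    return levels


# Hypothetical star graph: hub H linked to four leaves
adj = {"H": ["a", "b", "c", "d"], "a": ["H"], "b": ["H"], "c": ["H"], "d": ["H"]}
scores = eigenvector_centrality(adj)
levels = influence_levels(scores)
```

On the star, the hub's score is twice each leaf's, so the hub lands in the high band and the leaves in the moderate band.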
Eccentricity of nodes.
The influential value of each dataset is calculated from its probability range values shown in Fig. 11. Evaluating the datasets for highly influential data showed that the dataset containing relational information about managerial employee relationships contains more influential data. The Wasserman_Faust dataset [29], which contains the most relational data explaining node behavior in the community, showed that the probability range it provides is fully adequate for detecting the influential value of the nodes.
Relational evaluation of datasets.
The values plotted in Fig. 14 are supported by Table 5, which gives the number of ties connected to the vertices with the highest, average, and least influential values, where the influential value is derived from a combination of relational and influential metrics fed into SVBM. In Table 7, the Wasserman_Faust dataset has the highest influential values across nearly all vertices, indicating that its nodes are connected equally through stronger ties, with the least value of 0.53 at only one vertex. These plots show that vertices with higher relational values have a higher probability of becoming influential nodes.
Plot matrix for SVBM index values.
The plot matrix in Fig. 12 shows the SVBM index values for the various relational network datasets; the values in the matrix indicate which dataset contains the best relational values inside the network. The matrix is plotted from the ratio of influential value to probability value. It shows that the Zachary Karate Club dataset best represents the relational values between the actors inside the network: it attains the highest influential values for the relationships between actors and contains in-depth data about the tie strength of individual nodes. The other datasets score only between 0.7 and 0.9 for the relational data about their nodes.
Relational datasets with Highest, Average, and Least influential values and the number of ties connected to them
From the results above, the detection of influential nodes in a community of a social network is determined by the following criteria:
The influential value probability of the nodes inside the network ranges from 0.6 to 1. Datasets containing more relational values tend to be more influential. The probability that a node becomes influential is directly proportional to the relational tie strength between the nodes inside the community. User attributes such as behavior, trust, and information sharing can influence the range of that user's probability. The SVBM index classifies nodes by their similarity measures, and including the major properties of relational network nodes, such as distance and node similarity, increases classification performance, as shown by the AUC. The main factor that makes a dataset more contiguous lies in its eccentricity values. The clustering coefficient and eigenvector centrality determine the power of neighborhood adaptation of a node. The Wasserman Faust dataset, with 120 edges connecting 16 nodes, has the highest influential information diffusion rate, and it is also suitable for obtaining results under multi-relational constraints.
An important criterion for maximizing information diffusion is the detection of structural holes. This work explained the importance of finding structural holes and framed a novel method for detecting them. The proposed SVBM method outperforms the Support Vector Machine (SVM) in classification performance, achieving higher accuracy on the observed nodal values and a higher true positive rate than SVM. SVBM achieved 95% accuracy in detecting the influential node. The values from the Bayesian calculation are incorporated into the regression line values. Because both relational and influential values determine the influential node in a community, relational datasets were evaluated to find their influential gain. The Wasserman Faust dataset was found to give a detailed description of the relational strengths between nodes, and the Zachary dataset was the most suitable for influential values. Three major measures of the influential nature of nodes, namely eccentricity, clustering coefficient, and eigenvector centrality, were used to verify the results. These matched the SVBM classification results, revealing that the Wasserman Faust dataset contains more influence data. The method also improved the prediction of node probability and avoided misclassifying outliers. Future work will extend influential node prediction to random networks.
