Modeling dynamic social networks using concept of neighborhood theory

Abstract

Dynamic social network analysis studies how the nodes, edges, and associations within a network change over time, which makes dynamic social networks a special category of social network. Geometrical analysis has been applied on various occasions, but the approximate distances between nodes differ. In such studies, snapshots of the social network are taken at each time slot and then bound together. This paper discusses an efficient way of modeling dynamic social networks using the neighborhood theory of cellular automata. To the best of our knowledge and according to our literature survey, no model that uses the concept of neighborhood has been proposed so far. Moreover, cellular automata, which have been an important tool in various applications, have remained unexplored in this area of modelling. To this extent, the paper is the first attempt at modelling a social network that is evolving in nature. A link prediction algorithm based on basic graph theory concepts is additionally proposed for the emergence of new nodes within the network. Theoretical and programming simulations are explained in support of the model. Finally, the paper discusses the model with a real-life scenario.

Keywords

Dynamic social network; social networks; link prediction; neighborhood theory; cellular automata

1. Introduction

A collection of associations among individuals is defined as a social network. The nature of how these individuals, commonly referred to as social entities, interact with one another is explained by social networking. Most of the social networks are dynamic. Individual associations in such networks tend to develop, persist for a period of time, and then degrade [1]. SNA (social network analysis) is a process for investigating social systems using networks and graph theory. Graph theory depicts a network structure as nodes, which might be individual actors, people, or things inside the network, and ties, edges, or links, which define linkages or interactions that connect them [2].

Although several aspects of “social” and “group” dynamics have been studied in the sociological literature [3, 4], the mathematical techniques used in SNA have concentrated on graph-theoretic features of social networks without paying attention to the dynamic behavior that exists among them. The majority of social network models have focused on special processes such as random walks, contagion, and percolation [5, 6]. During the past few years, some important work has been done in bridging the gap between SNA and dynamical systems, thereby conceptualizing novel approaches to Dynamical Social Network Analysis (DSNA) [7] and temporal or evolutionary networks [8].

Modeling the way a network changes over time affects the understanding of the entire network, because it clarifies the processes that drive the development of these social networks. Computing all the paths in a graph representing a social network is expensive, which makes path-based analysis inefficient. One possible solution is to use spectral methods to embed the graph in a geometric space and carry out the subsequent calculations within that geometry [9].

Researchers have been using automata theory to develop methods for describing and analyzing the dynamic behavior of discrete systems. A special category of automata, referred to as cellular automata, has been widely applied in social science applications beyond their geographic features, mainly because they can be successfully applied in simulation. In ecology, biology, and medical science, cellular automata models have also proven useful and successful. The space-time representation of cellular automata has been explored in anthropology for modelling the development of societies, in political science and sociology for exploring civil violence, and in economics for representing the process of urban agglomeration. Each of these applications involves humans, and the social interaction among them is what is being explored. This has motivated the authors to develop a model based on cellular automata.

Cellular automata are characterized as a simplified model of a spatially extended decentralized system made up of many individual parts known as cells; they form a type of dynamic model with discrete time, space, and state. They are also defined as a type of finite-state machine in which every cell takes one of a finite number of states. A cellular automaton is a mathematical model with simple rules that govern replication and destruction, which makes it applicable to modelling complex systems composed of simple units. The existence of simple rules makes cellular automata applicable to modelling self-organizing networks that evolve with time. Their parallel computation, locality, and homogeneity add to their effectiveness in modelling dynamic networks and in understanding the global behavior and complex phenomena of several systems (networks) [10].

The objective of this paper is two-fold. First, it attempts to develop a dynamic social network model using the concept of neighborhood from cellular automata. Second, while defining the emergence of new nodes within the network with this notion, the authors propose a novel link prediction approach that chooses only nodes that lie within the von Neumann neighborhood.

The paper is organized as follows. Section 2 discusses the literature available in the field of study. Considerations in modeling dynamic social networks and some required concepts are presented in Sections 3 and 4. Section 5 gives a full description of the dynamic social network model and discusses the proposed link prediction algorithm. Section 6 discusses the theoretical and simulation results obtained by implementing the proposed model. Finally, the paper presents a real-life example related to the proposed model and ends with a general conclusion and directions for future study.

2. Literature review

Quite a few studies have addressed the modeling of the structure and dynamic behavior of social networks in the past decade. A few such works are briefly described in this section. Kumar et al. [11] studied node-based approaches, concentrating on the growth of the structure of online social networks (Flickr and Yahoo 360!) and explaining it through the dynamics of invitations. On the basis of the outcomes, a model was derived that describes this process as a composition of two steps. The model accounts for the addition of nodes and edges and for associations that persist without fading. Furthermore, Leskovec et al. [12] conducted an analysis at a local level on different platforms. Among the various factors affecting the dynamics of networks, edge locality was found to have a significant impact on the development of networks. Maximum likelihood was used to compare models according to the probability with which they generate the observed data. A model of network development was built in which the arrival of new nodes in the graph is described by an exponential or polynomial function. Toivonen [13], in his thesis, aims to improve the understanding of how large social networks are organized and how these networks can be explained, which includes the following: 1) a model based on simple local mechanisms that produces realistic community structure was devised to meet the requirements of social network models; 2) a well-defined comparative analysis of social network models was conducted to judge how well they fit real-world data; 3) various competing options in the formation of communities, which had remained unexplored, were studied.

Considering frequent pattern-based methods, Bringmann et al. [14] offered an alternative method that defines rules for modeling structural changes within a network. The method represents the data set by collapsing all graphs into a single undirected graph and annotating every association with the time slot of its first appearance in the network. In order to illustrate how a network evolves, an attempt was also made to find association rules among the frequent patterns of interaction (sub-graphs). The researchers developed GERM (Graph Evolution Rule Miner) to extract evolution rules from graphs, and it was tested on four real-world networks (Flickr, Y360, DBLP, and arXiv) with varying time periods. The authors of [15] considered how node centrality affects the way structure changes over time. Experiments were carried out on large email networks by sampling sub-graphs for one-day data intervals. Porter and Smith [16] proposed a technique for representing the structure of large social networks using the concept of ego-centered network neighborhoods. Their study provides a localized picture of the network with a focus on the vertices and their $k$th-order neighborhoods, enabling the discovery of interesting patterns in network properties that are typically missed during global network analysis. The approach was verified on several real-life scenarios and was found to be successful in determining network behavior at a local level. Some additional approaches were presented for using these concepts to identify unexpected anomalies within dynamic networks.

In [17], the authors characterized the behavior of a network by examining the associations among the existing triads. To describe the development patterns within a network, a probabilistic model was derived for calculating the probabilities of transitions between triads of nodes. The mean values of the triad transition matrix (TTM), which gives the probability of a given pattern ( $i$ ) at time $t$ changing to another pattern ( $j$ ) at a later time, were used for the prediction of links.

Some work on bio-inspired methods has been done by the authors of [18]. They modeled the dynamics of a coauthor network using the forgetting curve and swarm intelligence. Professional relations were modeled on the social behavior of ants. Investigations were conducted on a set of author collaborations from the DBLP network. To describe how email-based social networks develop, Budka et al. [19] used a molecular-inspired socio-dynamical model. The formation and abandonment of associations were considered, and changes in relationship strength over time were also studied. The research approach in [20] has been to study the social dynamics that prevail in various circumstances with organizational tools derived from complexity theory. Additionally, given that online social media serve as a platform for individual expression as well as public dialogue, it was expected that studying them as complex adaptive systems will meaningfully aid the understanding, prediction, and observation of social phenomena that occur in various online and offline social networks.

Structural changes within social networks were discussed by Aouay et al. [21], who classified methodologies into three broad categories: methods that use node attribute information; methods for extracting patterns inside a graph and modeling structural changes; and methods for simulating network growth over time using bio-inspired sources. Skillicorn et al. [9] investigated spectral techniques for modeling time-varying directed graphs. The snapshots of the network for every time period are joined together to form a single graph in such a way that the structures associated with time are kept. This global graph is then spectrally embedded. The similarity among a given set of nodes is observed in order to track them over time, so that changing associations and clusters can be seen, with the idea that a meaningful trajectory across time is obtained for each node. It was demonstrated how these approaches can be applied to understand how a social network changes as a result of internal dynamics and in response to cooperative law enforcement operations.

In their tutorial, the authors of [8] highlight concepts of control theory and the way they apply to social systems. The tutorial also covers classical models of social dynamics and how they relate to recent achievements in multi-agent systems (MAS). The same authors add a discussion of the most recent comprehensive studies on MAS and complex networks. In addition to studying the potential for control in social and techno-social systems, the tutorial places a strong emphasis on social processes that have arisen concurrently with MAS theory.

While modelling any dynamic social network, time is an important constraint that researchers have treated in various ways. Time has mainly been considered for link prediction among the existing nodes within the network. Some attention has also been paid to whether new nodes can come into the picture. For link prediction, various temporal methods have been proposed. These temporal models ask whether, given the link data from time 1 to $T$ , the model can predict the links at later times, i.e., $T+1$ , $T+2$ , and so on. For networks in which links evolve over a period of time, the model proposed in [25] describes the development of the network on the basis of network contents together with local or proximity information as a function of time. The work in [26] considers a time-awareness factor in modelling evolving social networks. The authors of [27] integrated three important characteristics, namely temporal information, community structure, and node centrality, for modelling. The work described in [28] is based on the combination of multi-domain topological features and the temporal dimension. While trying to simulate dynamic social networks, it was found that many diverse networks have been modeled using cellular automata, an extension of classical automata theory. Some studies of this kind are discussed below.

The authors [22] have combined a discussion on the diverse categories of cellular automata that have been used in modeling with a discussion on the various analytical methods for prediction of global behaviour from local configurations. A detailed sketch for the configuration of local settings under a global situation was also illustrated.

By using a parameter that represents the dimension of the hypercube that unites neighboring cells, the authors [23] established a general neighborhood for a d-dimensional cellular automaton that ranges from Von Neumann’s to Moore’s neighborhood. After the study of finite hypercube and hypertoruses, calculations are made on the quantity of neighbors within the boundary and connections within cells. The method makes use of ad-hoc software, which creates a Petri net sketch for the hypercube and hypertorus models.

3. Essential considerations in modeling dynamic social network

In this section, the authors give a brief overview of the concepts required for developing the proposed model. First, some concepts that are useful in defining network dynamics are introduced:

Local density. The fraction of first neighbors of an agent that are in state A (B, AB) is referred to as the local density of A (B, AB) and is represented by $\sigma_{a}$ ( $\sigma_{b}$ , $\sigma_{ab}$ ).

Interface density. The degree of order within a system is described by the fraction of links that join agents in different states. This fraction is referred to as the interface density. It decreases as homogeneous domains grow in size and vanishes when one of the options prevails. In a regular network topology, this quantity indicates the average domain size, whereas in complex networks it gives a rough indication of domain growth: a low interface density suggests a higher degree of order. This quantity is used to study the growth of domains in individual realizations of the stochastic dynamics, whereas the average behavior of a system is described by the average interface density, where the average is taken over a collection of realizations starting from different random initial conditions [13].

Absorbing state. This is a state reached by interacting agents when they no longer change their states and the dynamics halt. In some cases, a configuration in which all agents are in the same state is absorbing because no agent has neighbors in a different state, so that the probability of a node changing its state becomes zero.

Coarsening. This refers to the formation and growth of ordered domains and is indicated by a decrease in interface density, since that decrease corresponds to growth in the average domain size.

Meta-stable states. A meta-stable state is a state of a dynamic system that persists for a long duration although the system has not reached an absorbing state. There are various types of meta-stable states, such as dynamical and trapped [13].

The link prediction algorithm relies on some basic features of graphs. These features are described below:

Degree. For an undirected graph, the degree of a vertex ‘ $v$ ’ is defined as the number of vertices that are adjacent to ‘ $v$ ’. In simple words, the degree is the number of edges that are incident on that vertex. It is denoted by deg( $v$ ) and, for a simple graph, takes a value in the range 0 to $n-1$ , where $n$ is the total number of vertices in the graph; 1 is subtracted because a node cannot be adjacent to itself when self-loops are excluded. If the graph is directed, there are two types of degree, namely indegree and outdegree. The indegree of a vertex ‘ $v$ ’, denoted by $\deg^{-}(v)$ , is the number of edges coming into ‘ $v$ ’. The outdegree of ‘ $v$ ’, denoted by $\deg^{+}(v)$ , is the number of edges going out of ‘ $v$ ’.

Neighbors. A node $v$ is said to be a neighbor of $u$ if they are directly connected by a single edge. If $G=(V,E)$ is a directed graph with vertex set $V$ and edge set $E$ , then vertex ‘ $u$ ’ is an in-neighbor of vertex ‘ $v$ ’ if there exists an edge $(u,v)\in E$ , while if $(v,u)\in E$ then ‘ $u$ ’ is an out-neighbor of ‘ $v$ ’.

Common neighbor. Let ‘ $u$ ’ and ‘ $v$ ’ be two non-adjacent nodes in a given graph. Another vertex ‘ $w$ ’ within the same graph is called a common neighbor of ‘ $u$ ’ and ‘ $v$ ’ if both are connected to ‘ $w$ ’ by a single edge, i.e., both are adjacent to ‘ $w$ ’.

Average degree of the network. If the graph is undirected, the average degree of the network is defined as the sum of the degrees of all nodes divided by the total number of nodes in the network. This measure is often used to compare the number of edges with the number of nodes in a graph.

Maximum path length: If the graph is a weighted connected graph and there exists ‘ $n$ ’ number of paths (through any number of nodes) between two nodes ‘ $s$ ’ and ‘ $t$ ’, then the path which has maximum weight will be considered as having maximum path length between two nodes ‘ $s$ ’ and ‘ $t$ ’.

Path with maximum traversed node: If there exist ‘ $n$ ’ number of ways to reach a node ‘ $v$ ’ from node ‘ $u$ ’, then among $n$ paths the path which passes through maximum number of nodes of the graph is referred to as the path with maximum traversed node.
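The degree-based features defined above can be illustrated with a small sketch. The adjacency-set representation, the function names, and the four-edge example graph are assumptions made purely for illustration; they are not part of the proposed model.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected simple graph stored as: node -> set of adjacent nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def degree(adj, v):
    """Number of vertices adjacent to v."""
    return len(adj[v])

def common_neighbors(adj, u, v):
    """Vertices adjacent to both u and v."""
    return adj[u] & adj[v]

def average_degree(adj):
    """Sum of all node degrees divided by the number of nodes."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Hypothetical example graph: a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)

print(degree(adj, "c"))                # 3
print(common_neighbors(adj, "a", "d")) # {'c'}
print(average_degree(adj))             # 2.0
```

Because every edge contributes to the degree of two nodes, the average degree of an undirected graph equals twice the number of edges divided by the number of nodes (here 2 x 4 / 4).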

4. Preliminaries in DSN modeling

4.1 Defining dynamic social networks

Consider $G=(V,E)$ as an undirected, unweighted graph representing a social network consisting of $N$ nodes and $M$ edges. Also consider $C=(C_{1},C_{2},C_{3},\ldots,C_{k})$ as a disjoint partition of $V$ , where $C_{i}\in C$ denotes a community of $G$ . For every vertex $u\in V$ , its degree, the community containing $u$ , and the set of its neighboring communities are denoted by $d_{u}$ , $C(u)$ , and $\textit{NC}(u)$ , respectively. For every $S\subset V$ , $m_{s}$ , $d_{s}$ , and $e_{s}^{u}$ indicate the number of links within $S$ , the total degree of the vertices within $S$ , and the sum of influences from $u$ to $S$ , respectively.

With these considerations, $G^{S}=(V^{S},E^{S})$ denotes the network snapshot captured at a discrete time $s$ . $\Delta V^{S}$ and $\Delta E^{S}$ denote the sets of vertices and links that are introduced (or removed) at time $s$ . Therefore, $\Delta G^{S}=(\Delta V^{S},\Delta E^{S})$ designates the change in the entire network at that time. The subsequent network snapshot $G^{S+1}$ is defined as $G^{S+1}=G^{S}\cup\Delta G^{S}$ . A dynamic social network $g$ is therefore a sequence of network snapshots that develops with time: $g=(G^{0},G^{1},G^{2},\ldots,G^{S},\ldots)$ .
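As a minimal sketch of how a snapshot sequence could be built from successive changes, the following example applies a change set to one snapshot to obtain the next. Splitting each change into separate "added" and "removed" sets, the function name `next_snapshot`, and the example data are assumptions; the text above only states that the change may introduce or remove vertices and links.

```python
def next_snapshot(snapshot, added_nodes=(), added_edges=(),
                  removed_nodes=(), removed_edges=()):
    """Return the snapshot that follows `snapshot` after applying one change set.

    A snapshot is a pair (nodes, edges); an undirected edge is stored as a
    frozenset of its two endpoints so that (u, v) and (v, u) coincide.
    """
    nodes, edges = snapshot
    new_nodes = (set(nodes) | set(added_nodes)) - set(removed_nodes)
    new_edges = {frozenset(e) for e in edges} | {frozenset(e) for e in added_edges}
    new_edges -= {frozenset(e) for e in removed_edges}
    # An edge cannot survive the removal of one of its endpoints.
    new_edges = {e for e in new_edges if e <= new_nodes}
    return new_nodes, new_edges

# Hypothetical three-snapshot sequence.
G0 = ({1, 2, 3}, {frozenset((1, 2)), frozenset((2, 3))})
G1 = next_snapshot(G0, added_nodes=[4], added_edges=[(3, 4)])
G2 = next_snapshot(G1, removed_nodes=[1])
g = [G0, G1, G2]

for s, (nodes, edges) in enumerate(g):
    print(s, sorted(nodes), len(edges))
```

Each call leaves the earlier snapshot untouched, so the list `g` keeps the full history rather than only the latest state.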

4.2 Objective function

To quantify the quality of a detected community structure, a widely used measure referred to as modularity $Q$ [11] is used, which is defined as

$\displaystyle Q=\sum\limits_{c\in C}\left(\frac{m_{c}}{M}-\frac{d_{c}^{2}}{4M^{2}}\right)$ (1)

Here $m_{c}$ is the number of links within community $c$ and $d_{c}$ is the total degree of its vertices. $Q$ is the fraction of all links that lie within communities minus the expected value of the same fraction in a graph whose nodes have the same degrees but whose links are distributed at random. A higher value of $Q$ indicates a better network community structure.
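Equation (1) can be evaluated directly from an edge list and a partition. The sketch below is one straightforward way to do so; the function name and the example graph (two triangles joined by a single bridge edge) are assumptions chosen for illustration.

```python
def modularity(edges, communities):
    """Evaluate Q = sum_c (m_c / M - d_c^2 / (4 M^2)) for an undirected graph.

    edges       -- list of (u, v) pairs, each undirected edge listed once
    communities -- list of disjoint node sets covering all nodes
    """
    M = len(edges)
    community_of = {v: idx for idx, nodes in enumerate(communities) for v in nodes}
    m = [0] * len(communities)  # m_c: links with both endpoints inside c
    d = [0] * len(communities)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        d[cu] += 1
        d[cv] += 1
        if cu == cv:
            m[cu] += 1
    return sum(m_c / M - d_c ** 2 / (4 * M * M) for m_c, d_c in zip(m, d))

# Two triangles joined by the bridge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_single = modularity(edges, [{1, 2, 3, 4, 5, 6}])
print(round(q_split, 4), round(q_single, 4))
```

For this graph $M=7$ , and each triangle has $m_{c}=3$ and $d_{c}=7$ , so the two-community partition gives $Q=2(3/7-49/196)=5/14$ , while placing all nodes in one community gives $Q=0$ .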

4.3 Problem definition

Given a dynamic social network $g=(G^{0},G^{1},G^{2},\ldots,G^{S},\ldots)$ where $G=(V,E)$ denotes the original network and $G^{0},G^{1},G^{2},\ldots,G^{S}$ are the network snapshots obtained through the changes $\Delta G^{0},\Delta G^{1},\ldots,\Delta G^{S}$ , the aim of this work is to develop a model based on the emergence or detachment of nodes or links. In addition, the authors focus on developing a link prediction algorithm for forming links between the existing nodes of the network.

4.4 Methods

A dynamic social network reflects the changes within a social network whose underlying structure is frequently updated by the insertion and removal of nodes or edges. The introduction or removal of a group of nodes (or edges) over time can be decomposed into a number of individual insertions and deletions of nodes (or edges). This makes it easier to think of network changes as a collection of elementary operations, where each operation is one of newNode, removeNode, newEdge, or removeEdge. These events are explained below:

newNode: Introduction of a new node u together with its accompanying edge. u can come into existence with only a single edge connecting it to an existing network node. u should be chosen within the von Neumann neighborhood of the existing nodes of the network.

removeNode: The removal of a node u and its subsequent associated edges from the network.

newEdge: Insertion of a new edge that connects two existing nodes. The link prediction algorithm explained in the following section is used to select the nodes for link formation.

removeEdge: Removal of an existing edge e from the existing network [24].
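The four events listed above can be sketched as methods of a small class. Placing nodes on cells of a two-dimensional grid (the `pos` dictionary), the `attach_to` parameter, and the class name `DynamicNetwork` are hypothetical choices made for this illustration; the link prediction step that selects the endpoints for newEdge is not shown.

```python
class DynamicNetwork:
    """Undirected network whose nodes occupy cells of a 2-D grid (an assumption)."""

    def __init__(self):
        self.pos = {}  # node -> (row, col) grid cell
        self.adj = {}  # node -> set of adjacent nodes

    def new_node(self, u, cell, attach_to=None):
        """newNode: add u with a single edge to an existing node whose
        von Neumann neighborhood (Manhattan distance 1) contains `cell`."""
        if u in self.adj or cell in self.pos.values():
            raise ValueError("node or cell already in use")
        first = not self.adj  # the very first node has nothing to attach to
        if not first:
            if attach_to not in self.adj:
                raise ValueError("attach_to must be an existing node")
            r, c = self.pos[attach_to]
            if abs(cell[0] - r) + abs(cell[1] - c) != 1:
                raise ValueError("cell is outside the von Neumann neighborhood")
        self.pos[u] = cell
        self.adj[u] = set()
        if not first:
            self.new_edge(u, attach_to)

    def remove_node(self, u):
        """removeNode: delete u together with all of its edges."""
        for v in self.adj.pop(u):
            self.adj[v].discard(u)
        del self.pos[u]

    def new_edge(self, u, v):
        """newEdge: connect two existing, distinct nodes."""
        if u == v or u not in self.adj or v not in self.adj:
            raise ValueError("both endpoints must be distinct existing nodes")
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        """removeEdge: delete the edge between u and v if it exists."""
        self.adj[u].discard(v)
        self.adj[v].discard(u)

net = DynamicNetwork()
net.new_node("a", (0, 0))
net.new_node("b", (0, 1), attach_to="a")
net.new_node("c", (1, 1), attach_to="b")
net.new_edge("a", "c")
net.remove_node("b")
print(net.adj)
```

After the last call only the edge between "a" and "c" remains, which shows that removing a node also removes its incident edges.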

4.5 The concept of neighborhood

Consider time-dependent undirected binary graphs $G_{t}=(V_{t},E_{t})$ , where $V_{t}$ denotes the set of $N_{t}$ vertices and $E_{t}$ denotes the set of $M_{t}$ edges at time $t$ . Each graph can be represented by an adjacency matrix $A_{t}=\{A_{t}(i,j)\}_{i,j\in V_{t}}$ where $A_{t}(i,j)=1$ if $(i,j)\in E_{t}$ , i.e., there is an edge between vertices $i$ and $j$ , and 0 otherwise.

Let $d_{t}(u,v)\in\{0,1,\ldots\}$ be the shortest path distance between the vertices $u$ and $v$ at time $t$ . For a vertex, the $k$th-order neighborhood is defined as the collection of vertices that are at a distance of at most $k$ . Formally, $N_{t}[v,k]=\{u\in V_{t}:d_{t}(u,v)\leqslant k\}$ defines the $k$th-order closed neighborhood, comprising vertex $v$ itself and the surrounding vertices at time $t$ .

The graph $g(N_{t}[v,k])$ denotes the neighborhood sub-graph induced by the neighborhood, comprising the vertices $N_{t}[v,k]$ and the edges $\{e_{t}(i,j)\in E_{t}:i,j\in N_{t}[v,k]\}$ . In addition, $n_{t}[v,k]=|N_{t}[v,k]|$ defines the size of the neighborhood. The neighborhood definition thereby generates a collection of sub-graphs, denoted $G_{t}=\{g(N_{t}[v,k]):v\in V_{t},k=0,1,\ldots\}$ , for every time period. Some vertex-dependent neighborhoods are illustrated in Fig. 1.

Figure 1.

The darkened edges and vertices correspond to the elements in the network neighborhood anchored at vertex 6. The 3rd order neighborhood (i.e. $k=$ 3) corresponds to the full network.

A neighborhood statistic $S_{t}(v,k)\equiv S(g(N_{t}[v,k]))$ is calculated for every sub-graph and describes the $k$th-order neighborhood around vertex $v$ [16].
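The $k$th-order neighborhood and its induced sub-graph can be computed with a breadth-first search that stops at depth $k$ . The sketch below assumes an adjacency-set representation and uses the number of induced edges as an example statistic; both choices, as well as the function names and the five-node path graph, are illustrative assumptions rather than the statistic used in [16].

```python
from collections import deque

def neighborhood(adj, v, k):
    """Return all vertices within shortest-path distance k of v (v included)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond depth k
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def induced_edges(adj, nodes):
    """Edges of the sub-graph induced by `nodes`, as unordered pairs."""
    return {frozenset((u, w)) for u in nodes for w in adj[u] if w in nodes}

# Hypothetical path graph 1 - 2 - 3 - 4 - 5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}

for k in range(3):
    nodes = neighborhood(adj, 3, k)
    print(k, sorted(nodes), len(induced_edges(adj, nodes)))
```

For the path graph anchored at vertex 3, the neighborhood grows from the vertex itself ( $k=0$ ) to the full graph ( $k=2$ ).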

4.6 Neighborhoods in cellular automata

Consider an infinite integer lattice of dimension $d$ , comprising nodes with coordinates $\vec{i}=(i_{1},i_{2},\ldots,i_{d})$ , $i_{j}\in\mathbb{Z}$ , $1\leqslant j\leqslant d$ . Cellular automata [10] define these nodes as cells, where every cell is a unit-sized $d$ -hypercube with its center located at the coordinate $\vec{i}$ . To study the neighborhoods of a cell methodically, some definitions of distance in multidimensional space are considered:

Minkowski distance: $L^{p}(\vec{i}^{\prime},\vec{i})=\left(\sum_{j=1}^{d}|i_{j}^{\prime}-i_{j}|^{p}\right)^{1/p}$ .

Manhattan distance: $L^{1}(\vec{i}^{\prime},\vec{i})=\sum_{j=1}^{d}|i_{j}^{\prime}-i_{j}|$ .

The Minkowski distance is written $L^{p}$ with parameter $p$ , and the remaining distances are its special cases: the Manhattan or ‘taxicab’ distance $L^{1}$ is obtained when $p=1$ , whereas the Chebyshev distance $L^{\infty}$ is obtained as $p\to\infty$ .

Moore’s neighborhood for a cell $\vec{i}$ is defined as a collection of cells that are located at a Chebyshev distance of 1 whereas von-Neumann’s neighborhood defines a collection of cells that are located at Manhattan distance of 1, from $\vec{i}$ . Figure 2 demonstrates the associations of cells through the sides of squares within von Neumann’s neighborhood, whereas through the sides and vertices of squares in the case of Moore’s neighborhood.
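The distances and the two classical neighborhoods can be made concrete with a short sketch for cells given as integer coordinate tuples. The function names are assumptions; the neighborhoods are generated by enumerating all offsets in $\{-1,0,1\}^{d}$ and keeping those at Manhattan (respectively Chebyshev) distance 1 from the cell.

```python
from itertools import product

def minkowski(a, b, p):
    """L^p distance between two cells given as coordinate tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """L^1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """L^infinity distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def von_neumann(cell):
    """Cells at Manhattan distance exactly 1 from `cell`."""
    offsets = product((-1, 0, 1), repeat=len(cell))
    return [tuple(c + o for c, o in zip(cell, off))
            for off in offsets if sum(map(abs, off)) == 1]

def moore(cell):
    """Cells at Chebyshev distance exactly 1 from `cell`."""
    offsets = product((-1, 0, 1), repeat=len(cell))
    return [tuple(c + o for c, o in zip(cell, off))
            for off in offsets if max(map(abs, off)) == 1]

print(sorted(von_neumann((0, 0))))   # 4 cells sharing a side with (0, 0)
print(len(moore((0, 0))))            # 8 cells sharing a side or a corner
print(len(von_neumann((0, 0, 0))), len(moore((0, 0, 0))))  # 6 and 26
```

In two dimensions this reproduces the 4-cell and 8-cell neighborhoods of Figure 2, and the same functions work unchanged for any dimension $d$ .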

A square is a special case of a hypercube in two dimensions, concisely called a 2-cube. Although originally defined for the 2-dimensional case, von Neumann’s neighborhood generalizes to the $d$ -dimensional case in a clear-cut way, using the principle that the value of a single coordinate changes. In two dimensions, “two coordinates” also means “all coordinates”, which leads to ambiguity in the generalization of Moore’s neighborhood. In the $d$ -dimensional case, the meaning “any set of coordinates” is sometimes sacrificed for simplicity. Altering fewer coordinates, on the other hand, gives a neighborhood in which the number of coordinate changes is a parameter.

The surface of a finite $d$ -dimensional hypercube consists of $2d$ facets that are ( $d-1$ )-dimensional hypercubes, each of which has 2( $d-1$ ) facets that are ( $d-2$ )-dimensional hypercubes, and so on. Finally, there are $2^{d}$ 0-dimensional hypercubes (vertices). In von Neumann’s neighborhood, cells are connected only through facets, i.e., ( $d-1$ )-dimensional hypercubes, whereas in Moore’s neighborhood cells are connected through boundary elements that are ( $d-j$ )-dimensional hypercubes, where $1\leqslant j\leqslant d$ [23].
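As a quick check of these counts, the following standard combinatorial identities may help; they are stated here as a reader's aid and are not taken from [23]. A $d$ -dimensional hypercube has

$\displaystyle F(d,j)=2^{d-j}\binom{d}{j}$

faces of dimension $j$ , which gives $F(d,d-1)=2d$ facets and $F(d,0)=2^{d}$ vertices. Assuming that a neighborhood is parameterized by the maximum number $r$ of coordinates that may change by $\pm 1$ (the symbol $r$ is introduced here only for illustration), the number of neighbors of a cell in the infinite lattice is

$\displaystyle \sum\limits_{j=1}^{r}\binom{d}{j}2^{j}$

which equals $2d$ for $r=1$ (von Neumann’s neighborhood) and $3^{d}-1$ for $r=d$ (Moore’s neighborhood). For $d=2$ these are the familiar 4 and 8 neighbors.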

Figure 2.

Classical neighborhoods (2-dimensional case): a) von Neumann’s neighborhood; b) Moore’s neighborhood.

5. Proposed model

The proposed model of a dynamic social network is described in this section. To the best of our knowledge and according to the literature review, no framework that makes use of the notion of neighborhood has yet been proposed. Moreover, cellular automata, which have been a crucial tool in many applications, have not been fully investigated in this area of modelling. This study represents the first attempt to model a social network that is evolving in nature.

Figure 3 illustrates the proposed model.

Figure 3.

Model of the proposed system.

The above model can be distinctly separated into various phases:

Initialization phase

Development phase

Termination phase

During the initialization phase, the nodes and edges are defined. A set of nodes that reside within a particular dimension of a hypercube is taken into consideration for setting up links within the network. At the same time, the nodes and edges are counted and the counter values for nodes and edges are set accordingly. The initial counter values are taken at time $t=0$ . With these counter values the network proceeds to the next phase.

During the development phase, there are three possibilities: new nodes can come into existence within the hypercube, new links can develop between the existing nodes, or new nodes and new links can develop at the same time. The three options can show up simultaneously or separately. In each of these three cases, there is a change in the counter value, and the time $n$ at which it occurs is noted. New links come into existence through the application of the link prediction algorithm given below. In the same fashion, the time is noted whenever there is a change in the counter value of nodes, edges, or both. The proposed link prediction algorithm (DVNN-LPA) discussed below integrates some basic features of graph generation that, to the authors' knowledge, have not been considered together by other researchers to date.

Algorithm: Proposed Degree and von-Neumann neighborhood based Link prediction algorithm (DVNN-LPA)

procedure predictLinkFormation
1. Declare and initialize $G_{i}=(V_{i},E_{i})$, $n=$ total number of nodes within the graph, and a rectangular grid of cells of order $\sqrt{n}$.
2. Calculate the Manhattan distance of every $V_{i}$ within $G_{i}$.
3. For each selected node $V_{k}$ having the highest number of von-Neumann neighbors, with $\textit{Deg}(V_{k})\leqslant\frac{\sum\textit{Deg}(V_{i})}{n}$, where $n$ is the total number of nodes in the network:
4. Find the neighbor nodes $\{V_{l},V_{m},V_{n},V_{o}\}$ within the von-Neumann neighborhood (having maximum Manhattan distance from $V_{k}$ ) and where the edge ( $E_{k}$ ) does not exist.
5. Select a node among $\{V_{l},V_{m},V_{n},V_{o}\}$ to form a link whose $E_{k}$ weight is least from $V_{k}$.
6. Select a node among $\{V_{l},V_{m},V_{n},V_{o}\}$ to form a link on the basis of their degrees $\textit{Deg}(V_{i})$.
   Require: calculate the degree $\textit{Deg}(V_{i})$ of each node.
   Require: choose the node $V_{k}$ for link with $\textit{Deg}(V_{k})\leqslant\frac{\sum\textit{Deg}(G_{i})}{n}$.
   If two or more nodes with the same $\textit{Deg}(V_{k})$ exist:
   Require: compute the mean of $\textit{Deg}(V_{k})$ as $\textit{Mean}(\textit{Deg}(V_{k}))=\frac{\sum V_{m}}{m}$ for the set of $m$ nodes whose $\textit{Deg}(V_{i})\leqslant\frac{\sum\textit{Deg}(V_{i})}{n}$, i.e., whose degree is less than the average degree of the nodes of the entire network.
   Select a single node $(V_{k})$ for link formation.
   Continue to step 7.
7. For two unconnected nodes, $V_{u}$ and $V_{v}$:
   Require: find the ways in which $V_{v}$ is reachable from $V_{u}$.
   If there exist maximum ways to reach $V_{v}$ from $V_{u}$, $u$ and $v$ are selected for link formation.
   If there exist two $V_{v}$'s reachable from $u$ with the same number of ways, select the $u$ to $v$ path having maximum path length, $E_{\textit{max}}=E_{w}+E_{y}+E_{z}+E_{v}$, if $u$ has traversed through $w$, $y$, $z$ to reach $v$.
   Select the $u$ to $v$ path having the maximum number of traversed nodes, $V_{\textit{travs}}=\textit{count}(\textit{max}(V_{i}))$.
8. From the nodes that satisfy criteria 5, 6 and 7, select the node $V_{l}$ having the highest Manhattan distance from $V_{k}$, i.e., the node from where the link is to be formed.
   If no node $V_{l}$ of highest Manhattan distance exists, select the node $V_{l}$ of next higher Manhattan distance in the next iteration.
   If two nodes $V_{l}$ having the same Manhattan distance exist, select the node having the least $\textit{Deg}(V_{l})$ for link formation.
9. Calculate the edge length $(E_{l})$: calculate the path length to reach the node $V_{l}$ from $V_{k}$ through its selected neighbors. This entire process is carried out at time $t=1$.
10. Repeat the process for selection of nodes for link establishment (nodes with the next highest number of von-Neumann neighbors) at successive time slots; go to step 3.
11. Repeat the same process for every node $V_{i}$ having the next higher Manhattan distance up to $\sqrt{n-1}$.
12. Repeat the entire process to recheck whether any node is left for link formation, starting with the node having the highest Manhattan distance and proceeding to the next lower one.
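The candidate-selection part of the algorithm (finding von-Neumann neighbors with no existing edge, ordering them by degree, and checking the average-degree criterion) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the adjacency-set representation and the four-node toy graph are assumptions made for the example.

```python
def average_degree(adj):
    """Average degree: sum of Deg(V_i) over all nodes, divided by n."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def unlinked_neighbors(adj, grid_neighbors, k):
    """Von-Neumann neighbors of k that have no edge with k yet,
    ordered so that the neighbor with the least degree comes first."""
    candidates = [v for v in grid_neighbors[k] if v not in adj[k]]
    return sorted(candidates, key=lambda v: len(adj[v]))


def within_average_degree(adj, v):
    """Degree criterion: Deg(v) <= average degree of the network."""
    return len(adj[v]) <= average_degree(adj)


# Hypothetical toy data: four nodes on a 2 x 2 grid.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}          # existing edges
grid = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}  # von-Neumann neighbors

print(average_degree(adj))                # 1.0
print(unlinked_neighbors(adj, grid, 4))   # [3, 2]
print(within_average_degree(adj, 2))      # False
```

Keeping the grid neighborhood separate from the edge set mirrors the model: the neighborhood is fixed by the cell layout, while the edges evolve over time.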

The next phase, i.e., the termination phase, begins after the previous phase has continued for a prolonged period. In this phase, no new nodes or edges develop for some time, and the counter value stays constant. This can happen either because all the nodes within the hypercube have perished and there is no further chance of a new node appearing, or because all the edges between the existing nodes have been set up and the above algorithm can establish no further edge. Compared with earlier algorithms, the proposed algorithm takes more factors into account for link generation, namely degree, von-Neumann neighborhood, Manhattan distance, reachability and path length.
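One way to detect the termination phase from a sequence of counter values is sketched below. It is an assumption-laden illustration: the paper only says that the counter stays constant "for some time", so the `patience` parameter (how many consecutive equal snapshots count as termination) and the function name are hypothetical.

```python
def termination_time(counter_values, patience=3):
    """Return the index of the first snapshot of a run of `patience`
    consecutive equal counter values, or None if no such run exists."""
    run = 1
    for t in range(1, len(counter_values)):
        if counter_values[t] == counter_values[t - 1]:
            run += 1
        else:
            run = 1
        if run >= patience:
            return t - patience + 1
    return None


# Hypothetical counter values (e.g. number of edges per snapshot).
print(termination_time([3, 5, 8, 8, 8, 8]))  # 2
print(termination_time([1, 2, 3]))           # None
```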

6. Results and discussions

This section describes the results obtained with the proposed model and the Degree and Neighbor based algorithm. In the first part of the section, the algorithm is verified on a random graph of 20 nodes. In the second part, real-life data sets are considered and the algorithm is validated on them. The last part of the section compares the results with earlier approaches.

6.1 Theoretical results

A random graph is considered, and the above algorithm is applied to check whether links form among its set of nodes. A set of 20 nodes within a two-dimensional hypercube is laid out on graph paper ( $t=1$ ). For each node, candidates for link formation are selected only from among its von-Neumann neighbors. Nodes are processed starting with those having the highest number of neighbors and proceeding in decreasing order. In this section, the results are discussed for two different scenarios.
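The processing order described above can be reproduced with a short sketch. The 5-row by 4-column layout with row-wise numbering from 1 to 20 is an assumption inferred from the neighbor lists used in the walkthrough (for example, node 6 has neighbors 2, 5, 7 and 10); the function names are illustrative.

```python
from collections import defaultdict

ROWS, COLS = 5, 4  # assumed layout, inferred from the neighbor lists


def von_neumann_neighbors(node, rows=ROWS, cols=COLS):
    """Nodes directly above, left, right and below `node` on the grid."""
    r, c = divmod(node - 1, cols)
    steps = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    return [
        (r + dr) * cols + (c + dc) + 1
        for dr, dc in steps
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def phase_order(rows=ROWS, cols=COLS):
    """Group nodes by neighbor count, highest count first."""
    groups = defaultdict(list)
    for node in range(1, rows * cols + 1):
        groups[len(von_neumann_neighbors(node, rows, cols))].append(node)
    return [groups[k] for k in sorted(groups, reverse=True)]


print(von_neumann_neighbors(6))  # [2, 5, 7, 10]
for group in phase_order():
    print(group)
# [6, 7, 10, 11, 14, 15]
# [2, 3, 5, 8, 9, 12, 13, 16, 18, 19]
# [1, 4, 17, 20]
```

Under this assumed layout, the three groups coincide with the nodes handled in the 4-neighbor, 3-neighbor and 2-neighbor phases of both scenarios.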

Scenario 1: The 2nd criterion of the algorithm, i.e., that the degree of the neighbor selected for link formation is $\leqslant$ the average degree of the entire graph, is treated as a preferred but not mandatory criterion for link formation. The phase-wise formation of the dynamic social network is illustrated in Fig. 4.

Figure 4.

Phase wise illustration of the formation of new Edges within Dynamic Social Networks according to Scenario 1.

Phase 1 ( $t=2$ ): nodes with 4 neighbors

For node 6, there are 4 neighbors, namely 2, 5, 7 and 10, and edges exist with 2 and 7, each with path length 2. Nodes 5 and 10 are reachable from 2 and 7, respectively. The degree of 10 (2) is $\leqslant$ the average degree of the network, i.e., (1 $+$ 2 $+$ 1 $+$ 2 $+$ 3 $+$ 4 $+$ 4 $+$ 3 $+$ 2 $+$ 2 $+$ 2 $+$ 1 $+$ 1 $+$ 3 $+$ 3 $+$ 2 $+$ 2 $+$ 3 $+$ 2 $+$ 1)/20 $=$ 2.2. The path length to 10 through 7 is the minimum (2 $+$ 3 $=$ 5 $\leqslant$ 2 $+$ 4 $=$ 6). Therefore, 10 is chosen for link formation. The edge length is 2 $+$ 3 $=$ 5.
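The minimum-path comparison in the node 6 step can be checked with a small sketch. Only the four edge lengths quoted in that step are included; the dictionary layout and names are illustrative assumptions rather than the authors' code.

```python
# Edge lengths quoted in the node 6 step: 6-2 and 6-7 have length 2,
# 2-5 has length 4, and 7-10 has length 3.
weights = {
    frozenset({6, 2}): 2,
    frozenset({6, 7}): 2,
    frozenset({2, 5}): 4,
    frozenset({7, 10}): 3,
}


def path_length(path):
    """Sum of the edge lengths along consecutive nodes of `path`."""
    return sum(weights[frozenset(pair)] for pair in zip(path, path[1:]))


# Unlinked von-Neumann neighbors of node 6 and the path reaching each.
candidate_paths = {5: [6, 2, 5], 10: [6, 7, 10]}

target = min(candidate_paths, key=lambda v: path_length(candidate_paths[v]))
new_edge_length = path_length(candidate_paths[target])

print(target, new_edge_length)  # 10 5
```

The length of the chosen path becomes the length of the new edge, which is why the edge between 6 and 10 carries length 5 in the later steps.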

For node 7, there are 4 neighbors, namely 3, 6, 8 and 11, and edges exist with 6 and 8, one with path length 2 and the other with path length 3. Only 11 is reachable from 8. The degree of 11 (2) is $\leqslant$ the average degree of the network, i.e., (1 $+$ 2 $+$ 1 $+$ 2 $+$ 3 $+$ 5 $+$ 4 $+$ 3 $+$ 2 $+$ 3 $+$ 2 $+$ 1 $+$ 1 $+$ 3 $+$ 3 $+$ 2 $+$ 2 $+$ 3 $+$ 2 $+$ 1)/20 $=$ 2.3. Only one path to 11 exists, through 8 (3 $+$ 2 $=$ 5). Therefore, 11 is chosen for link formation. The edge length is 3 $+$ 2 $=$ 5.
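The change in the average degree between the node 6 and node 7 steps follows from adding a single edge, as the sketch below shows. The degree sequence is the one quoted for nodes 1 to 20 in the node 6 step; the helper names are illustrative.

```python
# Degree sequence of nodes 1..20 as quoted in the node 6 step.
degrees = [1, 2, 1, 2, 3, 4, 4, 3, 2, 2, 2, 1, 1, 3, 3, 2, 2, 3, 2, 1]


def average_degree(deg):
    """Sum of all node degrees divided by the number of nodes."""
    return sum(deg) / len(deg)


def add_edge(deg, u, v):
    """Return a new degree sequence after linking nodes u and v (1-based)."""
    updated = list(deg)
    updated[u - 1] += 1
    updated[v - 1] += 1
    return updated


after_6_10 = add_edge(degrees, 6, 10)

print(round(average_degree(degrees), 2))     # 2.2
print(round(average_degree(after_6_10), 2))  # 2.3
```

Each new edge raises the degree sum by 2, so with 20 nodes the average degree grows by 0.1 per edge.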

For node 10, there are 4 neighbors, namely 9, 6, 11 and 14, and edges exist with 6 and 14, one with path length 5 and the other with path length 6. Nodes 9 and 11 are reachable from 6 and 14, respectively. The degree of 11 (3) exceeds the average degree of the network (2.4), so the 2nd condition is not satisfied; since this is not an essential condition for link formation, we proceed to the next step. The path length to 11 through 14 is the minimum (6 $+$ 2 $=$ 8 $\leqslant$ 5 $+$ 4 $=$ 9). Therefore, 11 is chosen for link formation. The edge length is 6 $+$ 2 $=$ 8.

For node 11, there are 4 neighbors, namely 7, 10, 12 and 15, and edges exist with 7 and 10, one with path length 5 and the other with path length 8. Node 15 is reachable through two paths, one through 10 and 14 and the other through 14. The degree of 15 (3) exceeds the average degree of the network (2.5), so the 2nd condition is not satisfied; since this is not an essential condition for link formation, we proceed to the next step. The path length to 15 through 14 is the minimum (2 $+$ 7 $=$ 9 $\leqslant$ 8 $+$ 6 $+$ 7 $=$ 21). Therefore, 15 is chosen for link formation. The edge length is 2 $+$ 7 $=$ 9, which is the minimum.

For node 14, there are 4 neighbors, namely 10, 13, 15 and 18, and edges exist with 10 and 15, one with path length 6 and the other with path length 7. Node 18 is reachable from 15. The degree of 18 (3) exceeds the average degree of the network (2.6), so the 2nd condition is not satisfied; since this is not an essential condition for link formation, we proceed to the next step. Therefore, the path to 18 through 15 is taken. The edge length is 7 $+$ 4 $=$ 11.

For node 15, there are 4 neighbors, namely 11, 14, 19 and 16, and edges exist with 11, 14 and 16, with path lengths 9, 7 and 4, respectively. Its neighbor 19 is not reachable from 11, 14 or 16. Therefore, no node is selected for edge formation.

Phase 2 ( $t=3$ ): nodes with 3 neighbors

For node 2, there are 3 neighbors, namely 1, 6 and 3, and an edge exists with 6, with path length 2. The degree of 3 (1) is $\leqslant$ the average degree of the network (2.6). The path to 3 through 6 is taken. The edge length is 2 $+$ 4 $=$ 6.

For node 3, there are 3 neighbors, namely 2, 7 and 4, and an edge exists with 2, with path length 6. The degree of 7 (5) exceeds the average degree of the network (2.8), so the 2nd condition is not satisfied; since this is not an essential condition for link formation, we proceed to the next step. To reach 7 from 3 there are 2 paths (one through 2 and 6, the other through 6). The path through 6 (4 $+$ 2 $=$ 6 $\leqslant$ 6 $+$ 2 $+$ 2 $=$ 10) is the minimum. Therefore, 7 is chosen for link formation through 6. The edge length is 2 $+$ 4 $=$ 6.

For node 5, there are 3 neighbors, namely 1, 6 and 9, and edges exist with 9 and 1, with path lengths 2 and 4, respectively. The degree of 6 (5) exceeds the average degree of the network (3), so the 2nd condition is not satisfied; since this is not an essential condition for link formation, we proceed to the next step. Therefore, 6 is chosen for link formation. The edge length is 2 $+$ 4 $=$ 6.

For node 8, there are 3 neighbors, namely 4, 7 and 12, and edges exist with 4 and 7, with path lengths 5 and 3, respectively. Node 12 is not reachable from 7 by any means. Therefore, no node is chosen for link formation.

For node 9, there are 3 neighbors, namely 5, 10 and 13, and an edge exists with 5, with path length 2. An edge is possible with 10. The degree of 10 (4) exceeds the average degree of the network (3.2), so the 2nd condition is not satisfied; since this is not an essential condition for link formation, we proceed to the next step. To reach 10 from 9 there are 2 paths (one through 5 and 6, the other through 6). The path through 6 (4 $+$ 5 $=$ 9 $\leqslant$ 2 $+$ 6 $+$ 5 $=$ 13) is the minimum. Therefore, 10 is chosen for link formation. The edge length is 4 $+$ 5 $=$ 9.

For node 12, there are 3 neighbors, namely 8, 11 and 16, and an edge exists with 16, with path length 6. An edge is possible with both 11 and 8. The degree of 11 (5) exceeds the average degree of the network (3.4), so the 2nd condition is not satisfied for 11; since this is not an essential condition for link formation, we proceed to the next step. The degree of 8 (3) is $\leqslant$ the average degree of the network (3.4). To reach 11 from 12 there is 1 path (through 16 and 15). To reach 8 from 12 there is 1 path (through 16, 15 and 11). Therefore, the path through 16, 15 and 11 to reach 8 is chosen for link formation, as it traverses the maximum number of nodes and also satisfies the 2nd condition. The edge length is 6 $+$ 4 $+$ 9 $+$ 2 $=$ 21.

For node 13, there are 3 neighbors, namely 9, 14 and 17, and an edge exists with 17, with path length 6. An edge is possible with both 14 and 9. The degree of 14 (4) exceeds the average degree of the network (3.6), whereas the degree of 9 (2) is $\leqslant$ the average degree of the network (3.6). To reach 14 from 13 there is 1 path (through 17 and 18). To reach 9 from 13 there is 1 path (through 17, 18, 14 and 10). Therefore, the path through 17, 18, 14 and 10 to reach 9 is chosen for link formation, as it traverses the maximum number of nodes and also satisfies the 2nd condition. The path length is 6 $+$ 2 $+$ 11 $+$ 6 $+$ 9 $=$ 34.

For node 16, there are 3 neighbors, namely 12, 15 and 20, and edges exist with 12 and 15, with path lengths 6 and 4, respectively. An edge is possible with 20. The degree of 20 (1) is $\leqslant$ the average degree of the network (3.8). To reach 20 from 16 there is 1 path (through 15, 18 and 19). Therefore, the path through 15, 18 and 19 to reach 20 is chosen for link formation. The path length is 4 $+$ 4 $+$ 7 $+$ 5 $=$ 20.

For node 18, there are 3 neighbors, namely 17, 14 and 19, and edges exist with all of them, with path lengths 2, 11 and 7, respectively. Therefore, no neighbor node is left for link formation.

For node 19, there are 3 neighbors, namely 15, 18 and 20, and edges exist with 18 and 20, with path lengths 7 and 5, respectively. An edge is possible with 15. The degree of 15 (4) is compared with the average degree of the network (4), and the 2nd condition is taken as not satisfied; since this is not an essential condition for link formation, we proceed to the next step. To reach 15 from 19 there are 2 paths (through 18 and 14) and (through 20 and 16). Therefore, the path through 18 and 14 to reach 15 is chosen for link formation, as its path length is the minimum (7 $+$ 11 $+$ 7 $=$ 25 $\leqslant$ 5 $+$ 20 $+$ 4 $=$ 29). The edge length is 7 $+$ 11 $+$ 7 $=$ 25.

Phase 3 ( $t=4$ ): nodes with 2 neighbors

For node 1, there are 2 neighbors, namely 2 and 5, and an edge exists with 5, with path length 4. An edge is possible with 2. The degree of 2 (3) is $\leqslant$ the average degree of the network (4.2). To reach 2 from 1 there are 2 paths (through 5) and (through 5 and 6). Therefore, the path through 5 to reach 2 is chosen for link formation, as its path length is the minimum (4 $+$ 4 $=$ 8 $\leqslant$ 4 $+$ 6 $+$ 2 $=$ 12). The edge length is 4 $+$ 4 $=$ 8.

For node 4, there are 2 neighbors, namely 3 and 8, and an edge exists with 8, with path length 5. An edge is possible with 3. The degree of 3 (3) is $\leqslant$ the average degree of the network (4.4). To reach 3 from 4 there are 2 paths (through 7) and (through 8 and 7). Therefore, the path through 7 to reach 3 is chosen for link formation, as its path length is the minimum (6 $+$ 6 $=$ 12 $\leqslant$ 5 $+$ 3 $+$ 6 $=$ 14). The path length is 6 $+$ 6 $=$ 12.

For node 17, there are 2 neighbors, namely 13 and 18, and edges exist with both of them, with path lengths 6 and 2, respectively. Therefore, no neighbor node is left for link formation.

For node 20, there are 2 neighbors, namely 16 and 19, and edges exist with both of them, with path lengths 20 and 5, respectively. Therefore, no neighbor node is left for link formation.

Phase 4 ( $t=5$ ): nodes with 4 neighbors (rechecking if any node is left for link formation)

For node 6, there are 4 neighbors, namely 2, 5, 7 and 10, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 7, there are 4 neighbors, namely 6, 3, 8 and 11, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 10, there are 4 neighbors, namely 6, 9, 11 and 14, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 11, there are 4 neighbors, namely 7, 10, 12 and 15, and edges exist with 7, 10 and 15, with path lengths 5, 8 and 9, respectively. An edge is possible with 12. The degree of 12 (2) is $\leqslant$ the average degree of the network (4.6). To reach 12 from 11 there are 3 paths (through 8), (through 7 and 8) and (through 15 and 16). Therefore, the path through 15 and 16 to reach 12 is chosen for link formation, as its path length is the minimum (9 $+$ 4 $+$ 6 $=$ 19 $\leqslant$ 2 $+$ 27 $=$ 29 $\leqslant$ 5 $+$ 3 $+$ 27 $=$ 35). The path length is 9 $+$ 4 $+$ 6 $=$ 19.

For node 14, there are 4 neighbors, namely 10, 13, 15 and 18, and edges exist with 10, 15 and 18, with path lengths 6, 7 and 11, respectively. An edge is possible with 13. The degree of 13 (2) is $\leqslant$ the average degree of the network (4.8). To reach 13 from 14 there are 2 paths (through 10 and 9) and (through 18 and 17). Therefore, the path through 18 and 17 to reach 13 is chosen for link formation, as its path length is the minimum (11 $+$ 5 $+$ 6 $=$ 22 $\leqslant$ 6 $+$ 9 $+$ 37 $=$ 52). The path length is 11 $+$ 5 $+$ 6 $=$ 22.

For node 15, there are 4 neighbors, namely 11, 14, 16 and 19, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

Phase 5 ( $t=6$ ): nodes with 3 neighbors (rechecking if any node is left for link formation)

For node 2, there are 3 neighbors, namely 1, 6 and 3, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 3, there are 3 neighbors, namely 2, 7 and 4, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 5, there are 3 neighbors, namely 1, 6 and 9, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 8, there are 3 neighbors, namely 4, 7 and 12, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 9, there are 3 neighbors, namely 5, 10 and 13, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 12, there are 3 neighbors, namely 8, 11 and 16, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 13, there are 3 neighbors, namely 9, 14 and 17, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 16, there are 3 neighbors, namely 12, 15 and 20, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 18, there are 3 neighbors, namely 17, 14 and 19, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 19, there are 3 neighbors, namely 15, 18 and 20, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

Phase 6 ( $t=7$ ): nodes with 2 neighbors (rechecking if any node is left for link formation)

For node 1, there are 2 neighbors, namely 2 and 5, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

For node 4, there are 2 neighbors, namely 3 and 8, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

For node 17, there are 2 neighbors, namely 13 and 18, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

For node 20, there are 2 neighbors, namely 16 and 19, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

Since no neighbor node is left for link formation within the graph, the graph is complete.

Since no new edges develop within the graph after $t=5$ , the graphs for the later time slots have not been drawn.

Scenario 2: The 2nd criterion of the algorithm, i.e., that the degree of the neighbor selected for link formation is $\leqslant$ the average degree of the entire graph, is treated as a mandatory criterion for link formation. The phase-wise formation of the dynamic social network is illustrated in Fig. 5.

Figure 5.

Phase wise illustration of the formation of new Edges within Dynamic Social Networks according to Scenario 2.

Phase 1 ( $t=2$ ): nodes with 4 neighbors

For node 6, there are 4 neighbors, namely 2, 5, 7 and 10, and edges exist with 2 and 7, each with path length 2. Nodes 5 and 10 are reachable from 2 and 7, respectively. The degree of 10 (2) is $\leqslant$ the average degree of the network, i.e., (1 $+$ 2 $+$ 1 $+$ 2 $+$ 3 $+$ 4 $+$ 4 $+$ 3 $+$ 2 $+$ 2 $+$ 2 $+$ 1 $+$ 1 $+$ 3 $+$ 3 $+$ 2 $+$ 2 $+$ 3 $+$ 2 $+$ 1)/20 $=$ 2.2. The path length to 10 through 7 is the minimum (2 $+$ 3 $=$ 5 $\leqslant$ 2 $+$ 4 $=$ 6). Therefore, 10 is chosen for link formation. The edge length is 2 $+$ 3 $=$ 5.

For node 7, there are 4 neighbors, namely 3, 6, 8 and 11, and edges exist with 6 and 8, one with path length 2 and the other with path length 3. Only 11 is reachable from 8. The degree of 11 (2) is $\leqslant$ the average degree of the network, i.e., (1 $+$ 2 $+$ 1 $+$ 2 $+$ 3 $+$ 5 $+$ 4 $+$ 3 $+$ 2 $+$ 3 $+$ 2 $+$ 1 $+$ 1 $+$ 3 $+$ 3 $+$ 2 $+$ 2 $+$ 3 $+$ 2 $+$ 1)/20 $=$ 2.3. Only one path to 11 exists (3 $+$ 2 $=$ 5). Therefore, 11 is chosen for link formation. The edge length is 3 $+$ 2 $=$ 5.

For node 10, there are 4 neighbors, namely 9, 6, 11 and 14, and edges exist with 6 and 14, one with path length 5 and the other with path length 6. Nodes 9 and 11 are reachable from 6 and 14, respectively. The degree of 11 (3) exceeds the average degree of the network (2.4). The edge to 11 through 14 therefore cannot be formed, as the 2nd condition is not satisfied.
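The only difference between the two scenarios is how the degree criterion is enforced. The sketch below contrasts the two for the node 10 step, using the degree (3) and average degree (2.4) quoted there; the function name and the `mandatory` flag are illustrative assumptions.

```python
def may_link(target_degree, avg_degree, mandatory):
    """Decide whether link formation may proceed to the path-selection step.

    The degree criterion is Deg(target) <= average degree. When it is
    mandatory (Scenario 2), a violation blocks the link; when it is not
    (Scenario 1), the procedure continues regardless.
    """
    satisfied = target_degree <= avg_degree
    return satisfied or not mandatory


# Node 10 considering node 11: degree 3, average degree 2.4.
scenario_1 = may_link(3, 2.4, mandatory=False)
scenario_2 = may_link(3, 2.4, mandatory=True)

print(scenario_1, scenario_2)  # True False
```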

For node 11, there are 4 neighbors, namely 7, 10, 12 and 15, and edges exist with 7 and 10, one with path length 5 and the other with path length 8. Node 15 is reachable through two paths, one through 10 and 14 and the other through 14. The degree of 15 (3) exceeds the average degree of the network (2.4). The edge to 15 through 14 therefore cannot be formed, as the 2nd condition is not satisfied.

For node 14, there are 4 neighbors, namely 10, 13, 15 and 18, and edges exist with 10 and 15, one with path length 6 and the other with path length 7. Node 18 is reachable from 15. The degree of 18 (3) exceeds the average degree of the network (2.4). The edge to 18 through 15 therefore cannot be formed, as the 2nd condition is not satisfied.

For node 15, there are 4 neighbors, namely 11, 14, 19 and 16, and edges exist with 11, 14 and 16, with path lengths 9, 7 and 4, respectively. Its neighbor 19 is not reachable from 11, 14 or 16. Therefore, no node is selected for edge formation.

Phase 2 ( $t=3$ ): nodes with 3 neighbors

For node 2, there are 3 neighbors, namely 1, 6 and 3, and an edge exists with 6, with path length 2. The degree of 3 (1) is $\leqslant$ the average degree of the network (2.4). The path to 3 through 6 is taken. The edge length is 2 $+$ 4 $=$ 6.

For node 3, there are 3 neighbors, namely 2, 7 and 4, and an edge exists with 2, with path length 6. The degree of 7 (5) exceeds the average degree of the network (2.6). To reach 7 from 3 there are 2 paths (one through 2 and 6, the other through 6). The path through 6 (4 $+$ 2 $=$ 6 $\leqslant$ 6 $+$ 2 $+$ 2 $=$ 10) is the minimum, so 7 would be the candidate for link formation with edge length 6. However, the edge to 7 through 6 cannot be formed, as the 2nd condition is not satisfied.

For node 5, there are 3 neighbors, namely 1, 6 and 9, and edges exist with 9 and 1, with path lengths 2 and 4, respectively. The degree of 6 (5) exceeds the average degree of the network (2.6). The edge to 6 therefore cannot be formed, as the 2nd condition is not satisfied and is mandatory in this scenario.

For node 8, there are 3 neighbors, namely 4, 7 and 12, and edges exist with 4 and 7, with path lengths 5 and 3, respectively. Node 12 is not reachable from 7 by any means. Therefore, no node is chosen for link formation.

For node 9, there are 3 neighbors, namely 5, 10 and 13, and an edge exists with 5, with path length 2. An edge is possible with 10. The degree of 10 (3) exceeds the average degree of the network (2.6). To reach 10 from 9 there are 2 paths (one through 5 and 6, the other through 6). The path through 6 (4 $+$ 5 $=$ 9 $\leqslant$ 2 $+$ 6 $+$ 5 $=$ 13) is the minimum. However, the edge to 10 cannot be formed, as the 2nd condition is not satisfied.

For node 12, there are 3 neighbors, namely 8, 11 and 16, and an edge exists with 16, with path length 6. An edge is possible with both 11 and 8. The degree of 11 (4) and the degree of 8 (3) both exceed the average degree of the network (2.6), so the 2nd condition is not satisfied for either. To reach 11 from 12 there is 1 path (through 16 and 15). To reach 8 from 12 there is 1 path (through 16, 15 and 11). The edge to 8 through 16, 15 and 11 therefore cannot be formed, as the 2nd condition is not satisfied.

For node 13, there are 3 neighbors, namely 9, 14 and 17, and an edge exists with 17, with path length 6. An edge is possible with both 14 and 9. The degree of 14 (4) exceeds the average degree of the network (2.6), whereas the degree of 9 (2) is $\leqslant$ the average degree of the network (2.6). To reach 14 from 13 there is 1 path (through 17 and 18). To reach 9 from 13 there is 1 path (through 17, 18, 14 and 10). The path through 17, 18, 14 and 10 to reach 9 is chosen for link formation, as it traverses the maximum number of nodes and also satisfies the 2nd condition. The edge length is 6 $+$ 2 $+$ 11 $+$ 6 $+$ 9 $=$ 34.

For node 16, there are 3 neighbors, namely 12, 15 and 20, and edges exist with 12 and 15, with path lengths 6 and 4, respectively. An edge is possible with 20. The degree of 20 (1) is $\leqslant$ the average degree of the network (2.8). To reach 20 from 16 there is 1 path (through 15, 18 and 19). Therefore, the path through 15, 18 and 19 to reach 20 is chosen for link formation. The edge length is 4 $+$ 4 $+$ 7 $+$ 5 $=$ 20.

For node 18, there are 3 neighbors, namely 17, 14 and 19, and edges exist with all of them, with path lengths 2, 11 and 7, respectively. Therefore, no neighbor node is left for link formation.

For node 19, there are 3 neighbors, namely 15, 18 and 20, and edges exist with 18 and 20, with path lengths 7 and 5, respectively. An edge is possible with 15. The degree of 15 (4) exceeds the average degree of the network (3). To reach 15 from 19 there are 2 paths (through 18 and 14) and (through 20 and 16). The edge to 15 through 18 and 14 therefore cannot be formed, as the 2nd condition is not satisfied and is mandatory in this scenario.

Phase 3 ( $t=4$ ): nodes with 2 neighbors

For node 1, there are 2 neighbors, namely 2 and 5, and an edge exists with 5, with path length 4. An edge is possible with 2. The degree of 2 (3) is $\leqslant$ the average degree of the network (3). To reach 2 from 1 there are 2 paths (through 5) and (through 5 and 6). Therefore, the path through 5 to reach 2 is chosen for link formation, as its path length is the minimum (4 $+$ 4 $=$ 8 $\leqslant$ 4 $+$ 6 $+$ 2 $=$ 12). The edge length is 4 $+$ 4 $=$ 8.

For node 4, there are 2 neighbors, namely 3 and 8, and an edge exists with 8, with path length 5. An edge is possible with 3. The degree of 3 (3) is $\leqslant$ the average degree of the network (3.2). To reach 3 from 4 there are 2 paths (through 7) and (through 8 and 7). Therefore, the path through 7 to reach 3 is chosen for link formation, as its path length is the minimum (6 $+$ 6 $=$ 12 $\leqslant$ 5 $+$ 3 $+$ 6 $=$ 14). The path length is 6 $+$ 6 $=$ 12.

For node 17, there are 2 neighbors, namely 13 and 18, and edges exist with both of them, with path lengths 6 and 2, respectively. Therefore, no neighbor node is left for link formation.

For node 20, there are 2 neighbors, namely 16 and 19, and edges exist with both of them, with path lengths 20 and 5, respectively. Therefore, no neighbor node is left for link formation.

Phase 4 ( $t=5$ ): nodes with 4 neighbors (rechecking if any node is left for link formation)

For node 6, there are 4 neighbors, namely 2, 5, 7 and 10, and edges exist with 2, 7 and 10. An edge is possible with 5. The degree of 5 (3) is $\leqslant$ the average degree of the network (3.4). To reach 5 from 6 there are 2 paths (through 2) and (through 2 and 1). Therefore, the path through 2 to reach 5 is chosen for link formation, as its path length is the minimum (2 $+$ 4 $=$ 6 $\leqslant$ 2 $+$ 8 $+$ 4 $=$ 14). The path length is 2 $+$ 4 $=$ 6.

For node 7, there are 4 neighbors, namely 6, 3, 8 and 11, and edges exist with 6, 8 and 11. An edge is possible with 3. The degree of 3 (3) is $\leqslant$ the average degree of the network (3.6). To reach 3 from 7 there are 4 paths (through 6), (through 4), (through 6 and 2) and (through 8 and 4). Therefore, the path through 6 to reach 3 is chosen for link formation, as its path length is the minimum (2 $+$ 4 $=$ 6 $\leqslant$ 2 $+$ 6 $=$ 8 $\leqslant$ 6 $+$ 12 $=$ 18 $\leqslant$ 3 $+$ 5 $+$ 12 $=$ 20). The edge length is 2 $+$ 4 $=$ 6.

For node 10, there are 4 neighbors, namely 6, 9, 11 and 14, and edges exist with 6 and 14. An edge is possible with 9 and 11. The degree of 9 (3) is $\leqslant$ the average degree of the network (3.8). The degree of 11 (4) exceeds the average degree of the network (3.8), so the 2nd condition is again not satisfied for 11. To reach 9 from 10 there are 3 paths (through 6), (through 6, 5) and (through 14, 13, 9). Therefore, the path through 6 to reach 9 is chosen for link formation, as its path length is the minimum (5 $+$ 4 $=$ 9 $\leqslant$ 5 $+$ 6 $+$ 2 $=$ 13 $\leqslant$ 6 $+$ 22 $+$ 34 $=$ 62). The edge length is 5 $+$ 4 $=$ 9.

For node 11, there are 4 neighbors, namely 7, 10, 12 and 15, and an edge exists with 7. An edge is possible with 10 and 15. The degree of 10 (3) is $\leqslant$ the average degree of the network (4). The degree of 15 (3) is $\leqslant$ the average degree of the network (4). To reach 10 from 11 there are 3 paths (through 7), (through 7 and 6) and (through 14). To reach 15 from 11 there is 1 path, through 14. The path through 7 to reach 10 is chosen for link formation, as its path length is the minimum (5 $+$ 3 $=$ 8 $\leqslant$ 2 $+$ 7 $=$ 9). The edge length is 5 $+$ 3 $=$ 8.

For node 14, there are 4 neighbors, namely 10, 13, 15 and 18, and edges exist with 10 and 15. An edge is possible with 13 and 18. The degree of 13 (2) is $\leqslant$ the average degree of the network (4.2). The degree of 18 (3) is $\leqslant$ the average degree of the network (4.2). To reach 13 from 14 there is 1 path (through 10 and 9). To reach 18 from 14 there is 1 path (through 15). Therefore, the path to 18 through 15 is chosen for link formation, as its path length is the minimum (7 $+$ 4 $=$ 11 $\leqslant$ 6 $+$ 9 $+$ 34 $=$ 49). The edge length is 7 $+$ 4 $=$ 11.

For node 15, there are 4 neighbors, namely 11, 14, 16 and 19, and edges exist with 14 and 16. An edge is possible with 11 and 19. The degree of 11 (4) is $\leqslant$ the average degree of the network (4.4). The degree of 19 (2) is $\leqslant$ the average degree of the network (4.4). To reach 11 there is 1 path, through 14. To reach 19 there are 2 paths, through 18, and through 14 and 18. Therefore, the path through 14 to reach 11 is chosen for link formation, as its path length is the minimum (7 $+$ 2 $=$ 9 $\leqslant$ 4 $+$ 7 $=$ 11 $\leqslant$ 7 $+$ 11 $+$ 7 $=$ 25). The edge length is 7 $+$ 2 $=$ 9.

Phase 5 ( $t=6$ ): nodes with 3 neighbors (rechecking if any node is left for link formation)

For node 2, there are 3 neighbors, namely 1, 6 and 3, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 3, there are 3 neighbors, namely 2, 7 and 4, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 5, there are 3 neighbors, namely 1, 6 and 9, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 8, there are 3 neighbors, namely 4, 7 and 12, and edges exist with 4 and 7. An edge is possible with 12. The degree of 12 (1) is $\leqslant$ the average degree of the network (4.6). To reach 12 there are 2 paths (through 11, 15, 16) and (through 11, 14, 15, 16). The path length is the same in both cases, so either of the paths is chosen for link formation. The edge length is 2 $+$ 9 $+$ 4 $+$ 6 $=$ 21.

For node 9, there are 3 neighbors, namely 5, 10 and 13, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 12, there are 3 neighbors, namely 8, 11 and 16, and edges exist with 8 and 16. An edge is possible with 11. The degree of 11 (5) exceeds the average degree of the network (4.8). The 2nd criterion is not satisfied; therefore, the path cannot be chosen for link formation.

For node 13, there are 3 neighbors, namely 9, 14 and 17, and edges exist with 9 and 17. An edge is possible with 14. The degree of 14 (4) is $\leqslant$ the average degree of the network (4.8). To reach 14 there are 2 paths (through 9, 10) and (through 17, 18). Therefore, the path through 17 and 18 to reach 14 is chosen for link formation, as its path length is the minimum (6 $+$ 2 $+$ 11 $=$ 19 $\leqslant$ 34 $+$ 9 $+$ 6 $=$ 49). The edge length is 6 $+$ 2 $+$ 11 $=$ 19.

For node 16, there are 3 neighbors, namely 12, 15 and 20, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 18, there are 3 neighbors, namely 17, 14 and 19, and edges exist with all of them. Therefore, no neighbor node is left for link formation.

For node 19, there are 3 neighbors, namely 15, 18 and 20, and edges exist with 18 and 20. An edge is possible with 15. The degree of 15 (4) is $\leqslant$ the average degree of the network (5). To reach 15 there are 2 paths (through 18) and (through 20, 16). The path through 18 to reach 15 is chosen for link formation, as its path length is the minimum (4 $+$ 7 $=$ 11 $\leqslant$ 5 $+$ 20 $+$ 4 $=$ 29). The edge length is 4 $+$ 7 $=$ 11.

Phase 6 ( $t=7$ ): nodes with 2 neighbors (rechecking if any node is left for link formation)

For node 1, there are 2 neighbors, namely 2 and 5, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

For node 4, there are 2 neighbors, namely 3 and 8, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

For node 17, there are 2 neighbors, namely 13 and 18, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

For node 20, there are 2 neighbors, namely 16 and 19, and edges exist with both of them. Therefore, no neighbor node is left for link formation.

Phase 7 ( $t=8$ ): nodes with 4 neighbors (rechecking if any node is left for link formation)

For nodes 6, 7, 10, 14 and 15, edges exist with all their immediate neighbors. Therefore, no neighbor node is left for link formation.

For node 11, there are 4 neighbors, namely 7, 10, 15 and 12, and edges exist with 7, 10 and 15. An edge is possible with 12. The degree of 12 (2) is $\leqslant$ the average degree of the network (5.2). To reach 12 there are 3 paths (through 8), (through 7, 8) and (through 15, 16). Therefore, the path through 15 and 16 to reach 12 is chosen for link formation, as its path length is the minimum (9 $+$ 4 $+$ 6 $=$ 19 $\leqslant$ 2 $+$ 21 $=$ 23 $\leqslant$ 5 $+$ 3 $+$ 21 $=$ 29). The edge length is 9 $+$ 4 $+$ 6 $=$ 19.

Phase 8 ( $t=9$ ): nodes with 3 neighbors (rechecking if any node is left for link formation)

For nodes 2, 3, 5, 8, 9, 12, 13 and 16, edges exist with all their neighbors. Therefore, no neighbor node is left for link formation.

Phase 9 ( $t=10$ ): nodes with 2 neighbors (rechecking if any node is left for link formation)

For nodes 1, 4, 17 and 20, edges exist with all their neighbors. Since no neighbor node is left for link formation within the graph, the graph is complete.

Since no new edges develop within the graph after $t=8$ , the graphs for the later time slots have not been drawn.

Comparison of the time wise edge development for the two scenarios has been illustrated in Fig. 6a. Comparison of the time wise total number of edges within the graph for the two scenarios has been illustrated in Fig. 6b.

Figure 6.

(a) Comparison of the number of edges developed per time slot for the two scenarios. (b) Comparison of the total number of edges within the graph with time for the two scenarios.

Figure 7.

Variation of clustering coefficient for scenario 1 and 2.

The clustering coefficient of each of the 20 nodes of the graph used for the theoretical explanation of the above link generation algorithm is shown in Fig. 7. Although the value of $c_{i}$ was low for some nodes in the initial time slots, it changed as time progressed, and in the final time slot (indicated by the blue line) the $c_{i}$ value of almost all nodes is higher than in earlier time periods. The clustering coefficient of the 20 nodes was also calculated for scenario 2, where a special case has been considered. The same observation holds: the value of $c_{i}$ changed from low values in the initial time slots to higher values in the final time slot, as indicated by the violet line. In some cases, however, $c_{i}$ decreased relative to its earlier value. These changes result from the evolution of new edges among the neighboring nodes through the application of the proposed algorithm.

Figure 8.

Comparison of change in clustering co-efficient value for two different scenarios.

To gain deeper insight into the clustering coefficient, the average $c_{i}$ over all 20 nodes was calculated for each time period (Fig. 8). A steeper rise in the $c_{i}$ value was observed for scenario 1 than for scenario 2, and the final $c_{i}$ value obtained was also greater for scenario 1 than for scenario 2. Since $c_{i}$, which is an important criterion in any graph, has grown with time as the algorithm has been applied repeatedly, it can be concluded that the algorithm is efficient in generating new links.

The degree centrality values of all 20 nodes in Fig. 9 show that, in scenario 1, the $dc_{i}$ value has increased with time and reaches its maximum for each node at t5, i.e., the final time slot of scenario 1. In scenario 2, the $dc_{i}$ value has likewise increased with time, and the maximum value for every node is seen at t8.

Figure 9.

Comparison of change in Degree Centrality value for two different scenarios.

Figure 10.

Comparison of change in average centrality distribution for two different scenarios.

The same increasing trend was observed for the average centrality distribution over successive time slots in both scenarios (Fig. 10). The rise of the average degree centrality score was greater for scenario 1 than for scenario 2 during the initial time slots. This behavior was observed until t5, i.e., the last time slot considered for scenario 1. After t5, the average centrality value for scenario 2 grew very little and was almost constant during the final three time slots.

6.2 Simulation results

6.2.1 Description of the dataset

In order to implement the proposed algorithm and to compare its effectiveness with the earlier approaches, the authors have used the Arxiv HEP-TH (high energy physics theory) citation graph from the Stanford University dataset collection. The graph is taken from the e-print arXiv and covers all the citations within a dataset of 27,770 papers with 352,807 edges. If a paper $i$ cites paper $j$ , the graph contains a directed edge from $i$ to $j$ . If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this. The data covers papers in the period from January 1993 to April 2003 (124 months) [29, 30].

Since the dataset is too large, the authors have considered three discrete portions for the implementation of the proposed DVNN-LPA. The first portion consists of the 1st 100 edges, the 2nd portion consists of the edges numbered from 17500 to 17620, and the last portion consists of the edges numbered from 34890 to 35000. Moreover, the dataset has been treated as an undirected graph without any edge length or weight, so the portions of the algorithm that estimate new edges on the basis of weights or edge lengths have not been considered. All the other notions described within the algorithm have been taken into consideration for the implementation. A square grid of size 10 $\times$ 10 has been considered.
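
A minimal sketch of how such portions could be extracted from an edge-list file is given below. It is not the authors' code. It assumes a whitespace-separated edge list with optional `#` comment lines, 0-based half-open edge ranges (chosen so that the middle and last portions contain 120 and 110 edges, as stated later in the paper), and it drops edge direction, self-loops and repeated pairs; the names `read_edges`, `undirected_slice` and `PORTIONS` are hypothetical.

```python
from itertools import islice

# Assumed 0-based, half-open ranges for the three portions.
PORTIONS = {"first": (0, 100), "middle": (17500, 17620), "last": (34890, 35000)}


def read_edges(lines):
    """Yield (u, v) pairs from edge-list lines, skipping comments and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        yield int(u), int(v)


def undirected_slice(lines, start, stop):
    """Return edges start..stop-1 as undirected pairs, in their original order."""
    seen, result = set(), []
    for u, v in islice(read_edges(lines), start, stop):
        if u == v:
            continue  # ignore self-loops
        edge = (min(u, v), max(u, v))  # drop the direction of the citation
        if edge not in seen:
            seen.add(edge)
            result.append(edge)
    return result


# Tiny in-memory example instead of the real dataset file.
sample = ["# FromNodeId ToNodeId", "1 2", "2 1", "3 4", "5 6"]
print(undirected_slice(sample, 0, 3))  # [(1, 2), (3, 4)]
```

With the real file, `lines` would simply be the open file object, and `undirected_slice(f, *PORTIONS["middle"])` would return the middle portion under the stated assumptions.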

The authors have also considered a Real and Toy Network dataset, i.e., 26KeroNetwork [31], which maps the favor exchange network in a rural village in India. The network consists of 135 nodes and 251 weighted edges. This dataset is used to implement all the criteria that could not be implemented on the previous dataset. For the sake of simplicity, only a Manhattan distance of 1 is taken into consideration.

6.2.2 Comparison parameters used

Figure 11.

Demonstration of clustering coefficient.

For the purpose of comparison with the earlier approaches, the following parameters were taken into consideration:

Clustering coefficient. This concerns how many of the neighbors of a given node are connected to each other; it can be defined as the degree to which the neighbors of a given node are linked to each other. It is given by the formula $C_{i}=\frac{2L_{i}}{k_{i}(k_{i}-1)}$ , where $L_{i}$ is the number of links between the neighbors of node $i$ and $k_{i}$ is the number of neighbors (degree) of node $i$ .

Initially, in Fig. 11 Clustering Co-efficient for node $C$ is given as, CC $=$ 2 $\times$ 0 $/$ (4 $\times$ 3) $=$ 0. Further, when the red nodes are newly introduced, CC becomes $=$ 2 $\times$ 2 $/$ (4 $\times$ 3) $=$ 4 $/$ 12 $=$ 0.333. We can verify the fact that when all the nodes are fully connected i.e all nodes are connected to each other, then the value of the clustering co-efficient becomes 1. The value of Clustering co-efficient can only vary between 0 and 1.

Degree centrality distribution. The degree of a node is the number of neighbors that it has. The degree centrality is the number of neighbors divided by the number of all possible neighbors that the node could have. Depending on whether self-loops are allowed, the set of possible neighbors of a node could also include the node itself.

Jaccard coefficient. It is calculated as the number of common neighbors of two nodes divided by the total number of nodes in their combined neighborhoods.

Adamic-Adar index. It is an important parameter that predicts links in a social network based on the number of shared links between two nodes. It is defined as the sum of the inverse logarithm of the degrees of the two nodes’ common neighbors.

Preferential attachment. The term refers to the fact that the more connected a node is, the more likely it is to obtain additional links. Higher degree nodes have a greater capacity for attracting links in the network.

Diameter. The diameter of a graph is the greatest shortest-path distance between any two vertices, i.e., the maximum distance over all pairs of vertices in the graph.

Effective diameter. A graph’s effective diameter $d_{\mathrm{eff}}(G)$ is the minimum number of hops within which 90% of all connected pairs of nodes can reach each other.

Average path length (APL). It is the average number of edges along the shortest paths for all possible pairs of nodes in the network. Determining the APL, denoted $l$ , for a general graph requires traversing the entire graph numerous times.
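
The neighborhood-based parameters listed above can be illustrated with a small standard-library sketch. It is not the implementation used for the experiments: the function names are hypothetical, the graph is assumed to be undirected and without self-loops, and the example graph is a toy star around a node `C` with two extra links between its neighbors, chosen so that $L_{i}$ goes from 0 to 2 as in the worked example.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets of an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)); taken as 0 when k_i < 2."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def degree_centrality(adj, i):
    """Degree divided by the n - 1 other nodes (self-loops not allowed)."""
    return len(adj[i]) / (len(adj) - 1)


def jaccard(adj, u, v):
    """Common neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbors of u and v."""
    return sum(1 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """Product of the degrees of u and v."""
    return len(adj[u]) * len(adj[v])


star = [("C", n) for n in ("n1", "n2", "n3", "n4")]
adj0 = build_adjacency(star)
adj1 = build_adjacency(star + [("n1", "n2"), ("n3", "n4")])
print(clustering_coefficient(adj0, "C"))            # 0.0
print(round(clustering_coefficient(adj1, "C"), 3))  # 0.333
print(preferential_attachment(adj1, "n1", "n2"))    # 4
```

A common neighbor of two distinct nodes always has degree at least 2, so the logarithm in `adamic_adar` is never zero for such pairs.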

6.2.3 Output obtained

Figure 12.

Sample output for Successive time intervals obtained on implementing the proposed algorithm for the 1st 100 edges of dataset.

The Arxiv HEP-TH dataset is a huge dataset consisting of more than 35,000 edges, so implementing the proposed DVNN-LPA on the entire dataset is a huge task. To simplify the process, we have considered collections of edges at three different instances. The initial 100 edges were considered first; thereafter, 120 edges from the middle of the dataset were considered. Finally, 110 edges from the final part of the dataset were considered for study and analysis. The screenshots in Fig. 12 demonstrate the sample output obtained on implementing the proposed algorithm for the 1st 100 edges of the Arxiv HEP-TH dataset. It can be seen that edges develop continuously in each successive time slot. To verify the applicability of the algorithm, various parameters have been extracted for the successive time slots.

The values of the different parameters for the 1st 100 edges of the Arxiv HEP-TH dataset after algorithm implementation for six successive time slots have been summarized in Table 1.

Table 1

Parameter values for 100 edges obtained from the 1st portion of the Arxiv HEP-TH dataset

Time slots	Jaccard coefficient	Adamic adar index	Pref. attachment	Degree centrality	Avg path length	Diameter	Avg. effective diameter	Clust. coefficient
$t=$ 0	0.7174	0.1848	1.8225	0.0224	2.1922	4	2.1053	0
$t=$ 1	0.7934	0.3895	5.8290	0.0428	1.9572	2	2.8838	0.9577
$t=$ 2	0.7647	0.5527	12.9231	0.0620	1.9380	2	3.6301	0.9552
$t=$ 3	0.7506	0.7134	21.5983	0.0802	1.9199	2	4.3420	0.9477
$t=$ 4	0.7400	0.8709	31.5957	0.0977	1.9024	2	5.0043	0.9401
$t=$ 5	0.7312	1.0258	42.8515	0.1147	1.8853	2	5.6282	0.9336

Figure 13.

Comparison of the average parameter values. (a) Jaccard coefficient, Adamic Adar index, degree centrality and clustering coefficient, (b) preferential attachment and (c) average path length, diameter, average effective diameter obtained while implementing the proposed algorithm for the 1st 100 edges of Arxiv HEP-TH dataset.

Based on the parameter data obtained, an analysis has been carried out, as demonstrated in Fig. 13.

Figure 14.

Comparison of Graph obtained while implementing DVNN-LPA for the middle 120 edges of Arxiv HEP-TH dataset.

Figure 13a shows that the Jaccard coefficient increases in the 1st time slot and then decreases in the subsequent time intervals. The reduction is very small, however, and is almost negligible considering that a large number of edges is added in the subsequent time intervals. The Adamic-Adar index increases in successive time intervals because, as more and more links are added to the network, the graph becomes denser. The increase in the average degree centrality of all nodes likewise confirms that the network has become denser in subsequent time intervals, while the clustering coefficient, after its rise in the 1st time slot, remains almost constant and has no effect on the overall configuration of the graph. Figure 13b shows that the preferential attachment value increases with the attachment of additional links in each time interval. Figure 13c shows that the diameter stays at 2 from $t=$ 1 onwards, whereas the average effective diameter increases in each time slot as the graph becomes denser with the addition of edges. The APL has remained almost the same in each time slot, as no edges are deleted from the graph.

Table 2

Parameter values for 120 edges obtained from the middle portion of the Arxiv HEP-TH dataset

Time slots	Jaccard coefficient	Adamic adar index	Pref. attachment	Degree centrality	Avg path length	Diameter	Avg. effective diameter	Clust. coefficient
$t=$ 0	0.1200	0.0795	5.1872	0.0258	3.5728	6	2.3647	0.0539
$t=$ 1	0.1507	0.1495	13.4244	0.0425	2.7775	5	2.9386	0.8280
$t=$ 2	0.1279	0.1794	25.1271	0.0569	2.7253	5	3.2745	0.8164
$t=$ 3	0.1043	0.1930	38.3900	0.0694	2.7128	5	3.4586	0.8102
$t=$ 4	0.0879	0.2004	51.6830	0.0799	2.7023	5	3.5506	0.8083
$t=$ 5	0.0750	0.2033	64.5342	0.0890	2.6933	5	3.5875	0.8043

Figure 15.

In the second instance, the middle 120 edges, i.e., edges 17500 to 17620 of the Arxiv HEP-TH dataset, have been considered for parametric analysis. Figure 14 shows the graph obtained at the initial $t=$ 0 and at 5 successive time slots.

The parameter values obtained from this portion after algorithm implementation for six successive time slots have been summarized in Table 2.

Based on the parameter data obtained, an analysis has been carried out. Fig. 15 demonstrates the analytical comparisons being made.

Figure 15a shows that the Jaccard coefficient initially increased but decreased slightly in the later time slots. The Adamic-Adar index has increased in each of the time slots as new edges are attached. The values of preferential attachment have increased steeply, as shown in Fig. 15b. The other parameters, which include degree centrality, average effective diameter and clustering coefficient, have also increased over the time intervals due to the addition of new edges to the graph. In Fig. 15c, a decrease was found only in the average path length, although this decrease was small enough for the value to be regarded as almost constant. The diameter has maintained a value of 5 from $t=$ 1 onwards.

In the final instance, 110 edges, i.e., edges 34890 to 35000 of the Arxiv HEP-TH dataset, have been considered for parametric analysis. The values of the different parameters for these last 110 edges after algorithm implementation for six successive time slots have been summarized in Table 3.

Table 3

Parameter values for 110 edges obtained from the final portion of the Arxiv HEP-TH dataset

Time slots	Jaccard coefficient	Adamic adar index	Pref. attachment	Degree centrality	Clust. coefficient
$t=$ 0	0.1528	0.0882	4.6393	0.0275	0.1470
$t=$ 1	0.2304	0.2052	12.2626	0.0474	0.9158
$t=$ 2	0.2304	0.2934	23.8140	0.0659	0.8942
$t=$ 3	0.1959	0.3337	40.2696	0.0831	0.8765
$t=$ 4	0.1724	0.3620	59.5694	0.0989	0.8671
$t=$ 5	0.1420	0.3706	80.9842	0.1131	0.8620

Based on the values of the different parameters, an analysis has been carried out. Fig. 16 briefly summarizes the comparisons being made.

Figure 16.

Comparison of the average parameter values. (a) Jaccard coefficient, Adamic Adar index, degree centrality and clustering coefficient. (b) Preferential attachment obtained while implementing the proposed algorithm for the 110 edges from last portion of Arxiv HEP-TH dataset.

Figure 17.

Comparison of the Total number of Edges developed while implementing the proposed algorithm for different instances of Arxiv HEP-TH dataset.

Figure 16 shows that the Jaccard coefficient initially increased but then declined slightly in subsequent time slots. As fresh edges are attached, the Adamic-Adar index has increased across all time slots, and the preferential attachment values have increased sharply. Most other parameters, such as degree centrality and clustering coefficient, have increased over the time frames as a consequence of the addition of new edges to the graph.

The authors have also compared the number of edges developed per time slot with the application of DVNN-LPA. Figure 17 shows that the number of edges developed decreases with each time slot. This decreasing trend is seen throughout the three different ranges, and the number is expected to decrease further if more time intervals are taken into consideration.

Table 4

Parameter values generated for subsequent time slots while implementing DVNN-LPA on Real and Toy Network dataset

Time slots	Jaccard coefficient	Adamic adar index	Pref. attachment	Degree centrality	Avg path length	Diameter	Avg. effective diameter	Clust. coefficient
$t=$ 0	0.0169	0.0689	16.7684	0.0346	4.5066	12	3.3616	0.2792
$t=$ 1	0.0176	0.0754	19.3749	0.0372	4.3120	11	3.5099	0.2974
$t=$ 2	0.0176	0.0756	19.6624	0.0375	4.3060	11	3.5071	0.3127
$t=$ 3	0.0176	0.0756	19.6624	0.0375	4.3060	11	3.5071	0.3127
$t=$ 4	0.0176	0.0756	19.6624	0.0375	4.3060	11	3.5071	0.3127

Figure 18.

Comparison of Graph obtained while implementing DVNN-LPA on Real and Toy Network dataset.

The proposed DVNN-LPA was also implemented on the Real and Toy Network dataset, and the outputs obtained for some subsequent time slots have been illustrated in Fig. 18.

On the basis of the graphs obtained in Fig. 18, an in-depth analysis was carried out on the different parameter values, and the results described in Table 4 were obtained.

Based on the values of the different parameters, an analysis has been carried out. Figure 19 briefly summarizes the comparisons being made.

Figure 19.

Comparison of the average parameter values. (a) Graphical parameter values. (b) Edge based parameters. (c) Preferential attachment obtained while implementing the proposed algorithm for the Real and Toy Network dataset.

Figure 19a shows that the values of the Jaccard coefficient and degree centrality increase only very slightly, while the increase over subsequent time slots is slightly higher for the Adamic-Adar index and the clustering coefficient. Comparing the edge-based parameter values in Fig. 19b, a very small decrease was found in the average path length and the diameter, while the average effective diameter showed a small increase. The preferential attachment values increase and finally settle at a constant value in the final time slots, as shown in Fig. 19c.

6.3 Comparison of the results

In order to verify the appropriateness of the proposed DVNN-LPA against earlier approaches, several comparisons were drawn. In [7], the comparison was done on the basis of the average diameter of the undirected network over time. The diameter of a graph is defined as the largest shortest path distance in the graph. In other words, it is the maximum value of $d(u,v)$ over all pairs $u$ , $v$ , where $d(u,v)$ denotes the shortest path distance from vertex $u$ to vertex $v$ .

In [11], the average and effective diameter of the giant component of the Flickr and Yahoo! 360 timegraphs were reported by week. The effective diameter (the 90th percentile of the distribution of shortest path lengths) of a graph is approximated by performing BFS from NTestNodes random starting nodes. This has been demonstrated in Fig. 20a, b and c respectively.
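
The distance-based quantities defined above can be sketched with plain breadth-first search. This is an exact, exhaustive version for small undirected graphs, not the sampled approximation mentioned above, and it uses an integer effective diameter without interpolation; both choices, as well as all function names, are assumptions made for illustration.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pairwise_distances(adj):
    """Sorted d(u, v) over all connected ordered pairs with u != v."""
    distances = []
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                distances.append(d)
    return sorted(distances)


def diameter(distances):
    """Maximum shortest-path distance."""
    return distances[-1]


def average_path_length(distances):
    """Mean shortest-path distance over connected pairs."""
    return sum(distances) / len(distances)


def effective_diameter(distances, fraction=0.9):
    """Smallest hop count within which `fraction` of connected pairs lie."""
    needed = math.ceil(fraction * len(distances))
    return distances[needed - 1]


# Path graph 1 - 2 - 3 - 4 - 5 as a toy example.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
d = pairwise_distances(path)
print(diameter(d), average_path_length(d), effective_diameter(d))  # 4 2.0 3
```

Each unordered pair is counted twice (once in each direction), which leaves the maximum, the mean and the percentile unchanged for an undirected graph.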

With the application of DVNN-LPA, an increase in the average effective diameter was observed over subsequent time slots in both ranges (Fig. 21a), and this behavior is similar to the one observed in [7, 11]. The diameter also shows a uniform trend over subsequent time slots for both Flickr and Yahoo! 360, and a similar uniform trend in diameter was observed for DVNN-LPA in Fig. 21b. Therefore, it can be concluded that DVNN-LPA is efficient in comparison with the earlier approaches.

Figure 20.

(a) Diameter obtained with the application of algorithm in [7]. (b) Average and Effective Diameter obtained for Flickr dataset. (c) Average and Effective Diameter obtained for Yahoo! 360 timegraph dataset [11].

Figure 21.

(a) Diameter and effective diameter obtained with the application of DVNN-LPA for the 1st 100 edges. (b) Diameter and effective diameter obtained with the application of DVNN-LPA for the 120 edges from the middle range.

Figure 22.

(a) Clustering coefficient and (b) degree centrality value for nodes as described for flickr dataset in [12]. Distribution of (c) clustering coefficient and (d) degree centrality value for initial 100 nodes for respective time slots with DVNN-LPA.

As illustrated in [12], i.e., Fig. 22a and b, during the first process (PA) a node is selected preferentially for edge addition, with probability proportional to its degree. During the next process (RR), the random-random triangle-closing model is used, where a node is first selected preferentially and a node located two hops away is then selected using the random-random model. In DVNN-LPA, by contrast, a node is selected within the Von Neumann neighborhood having a Manhattan distance of 1. This process is repeated successively for the selection of new nodes within the same neighborhood. Figure 22c and d show how the clustering coefficient and degree centrality values have increased over subsequent time slots for each of the nodes.

Figure 23.

(a) Comparison of new edges developed for different Dataset with GERM algorithm as described in [14]. (b) Comparison of new edges developed for three different Intervals of the ArXiv dataset with DVNN-LPA.

As discussed in [14], the table within Fig. 23a provides a data analysis of the target prediction time frame used in the experiments on three different datasets. The table indicates the number of edges connecting two old nodes (old-old, i.e., nodes that were present during the training period), one old and one new node (old-new), and two new nodes (new-new). The old-old category involves cases that can be handled both by traditional link prediction and by the GERM framework of [14]; the old-new category involves cases that the GERM framework can handle but traditional link prediction cannot. Neither the GERM framework nor the traditional link prediction framework is capable of handling the new-new category. As shown in the table within Fig. 23a, the old-new category accounts for a large proportion of the freshly generated edges across all datasets. Comparing the results in Fig. 23b, average growth rates of 36.5%, 27% and 30.4% were found for the three different intervals with the proposed DVNN-LPA, while the results for dblp 92-02, flickr-month and flickr-week show a growth rate of 33.3% for the specified period. The results are quite comparable, and the growth would have increased further if longer intervals had been considered in implementing DVNN-LPA.

7. Real time scenario

Let us consider the case of 20 students who are admitted to the first year of a Bachelor of Engineering course at a university. Their first day of university is considered time $t=$ 0. At this time, only two students are associated, as they come from the same school. No other association exists between the students, as they are completely unknown to each other. Very few associations are found to exist among them at the end of their first semester ( $t=$ 1). The associations then show a growing nature at the end of their second semester ( $t=$ 2). The growth or development of associations can be regarded as a function of mutual understanding, likes, dislikes, shared hobbies, and common interests, as these factors are essential for the development of friendship among students. An additional 5 students are admitted to the same course after $t=$ 2, so they appear as a new set of nodes within the network. At $t=$ 3, i.e., by the end of the third semester, two students opt out of the course for personal reasons, so nodes are deleted during this time. This process of forming associations and breaking past associations continues at the end of each semester, i.e., $t=$ 4, 5, 6, and 7. Some clusters can be found among these students, and students from two different clusters can also be associated. Finally, by the end of the final semester, i.e., after $t=$ 8, very few associations exist among the students, as they are all engaged either in full-time jobs or in higher studies and have very little time to associate among themselves. The development of an association between nodes, here students, depends upon whether the node with which the association is to be made is a neighbor of the given node. In addition, the association function must have a value of 1 for the association to come into existence.
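
The two conditions stated at the end of this scenario can be sketched as a simple predicate. Everything specific in the sketch is hypothetical: the grid positions of the students, their interests, and in particular the association function `shares_interest`, which is only one possible stand-in for the function of mutual understanding, likes and common interests described above. Neighborhood is taken as a Manhattan distance of 1, as used elsewhere in the paper.

```python
def manhattan(p, q):
    """Manhattan distance between two grid positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def can_associate(a, b, position, association):
    """True when b is a neighbor of a and the association function equals 1."""
    return manhattan(position[a], position[b]) == 1 and association(a, b) == 1


# Hypothetical students placed on a grid, with hypothetical interests.
position = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (3, 3)}
interests = {
    "A": {"chess", "music"},
    "B": {"music"},
    "C": {"football"},
    "D": {"music"},
}


def shares_interest(a, b):
    """Assumed association function: 1 if the students share an interest."""
    return 1 if interests[a] & interests[b] else 0


print(can_associate("A", "B", position, shares_interest))  # True
print(can_associate("B", "C", position, shares_interest))  # False
print(can_associate("A", "D", position, shares_interest))  # False
```

In the second call the students are neighbors but share no interest, and in the third they share an interest but are not neighbors, so no association is formed in either case.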

8. Conclusion

Over the last two decades, social networks have gained considerable popularity and have succeeded in creating efficient communication among people. This usefulness has attracted researchers to investigate possible ways of modeling social networks. Using static methods to model the dynamism of social networks is a challenge. Throughout the paper, the authors discuss the applicability of neighborhood theory for modeling dynamic social networks. To the best of the authors' knowledge and according to the literature survey presented within the paper, this model is the first attempt at modelling a network with the neighborhood concept of cellular automata. This type of automata has been a significant tool in numerous applications but has remained ignored in the domain of network modelling. This study presents an initial attempt at simulating an evolving network. Although the link prediction approaches discussed previously by various researchers consider time-bound functionalities or various higher-order features of graphs, an integration of some of the basic graph features with the neighborhood concept of cellular automata has not been done. To this extent, the modelling and the link prediction approach presented within this paper are novel in nature. It has also been shown, through theoretical and programming simulations and through a real-life problem, that the proposed link prediction algorithm works better than the previously proposed approaches in terms of applicability. Future work will involve applying the method and the algorithm to a real-world dataset in order to observe the growth of the network for each year, considering it as a larger snapshot that includes several smaller snapshots taken whenever a new link appears.

Footnotes

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Mitra, Paul, Panda, Padhi. A Study on the representation of the various models Dynamic Social Networks. In: International Conference on Communication, Computing and Virtualization 2016 (ICCCV 2016). Mumbai; 2016. pp. 624-631.

2. Otte, Rousseau. Social network analysis: a powerful strategy, also for the information sciences. Journal of Information Science. 2002; 28(6): 441-453.

3. Lewin. Frontiers in group dynamics: Concept, method and reality in social science; social equilibria and social change. Human Relations. 1947; 1(5): 5-41.

4. Sorokin. Society, culture, and personality: Their structure and dynamics, a system of general sociology. New York and London: Harper & Brothers Publ; 1947.

5. Newman. The structure and function of complex networks. SIAM Review. 2003; 45(2): 167-256.

6. Newman, Barabasi, Watts. The structure and dynamics of networks. Princeton University Press; 2006.

7. Breiger, Carley, Pattison, editors. Dynamic social network modeling and analysis: workshop summary and papers. Washington, DC: National Acad. Press; 2003.

8. Proskurnikov, Tempo. A tutorial on modeling and analysis of dynamic social networks. Part I. Annual Reviews in Control. 2017; 43: 65-79. ISSN 1367-5788.

9. Skillicorn, Zheng, Morselli. Modeling dynamic social networks using spectral embedding. Social Network Analysis and Mining. 2014; 4: 182.

10. Zhang. Chapter 12: Cellular Automata. In: Fundamentals of Network Biology. 2018. pp. 229-236.

11. Kumar, Novak, Tomkins. Structure and evolution of online social networks. In: KDD’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, USA: ACM; 2006. pp. 611-617.

12. Leskovec, Backstrom, Kumar, Tomkins. Microscopic evolution of social networks. In: Li, Liu, Sarawagi, editors. KDD’08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2008. pp. 462-470.

13. Toivonen. Social Networks: Modeling Structure and Dynamics. Helsinki University of Technology, Dissertation for public examination and debate. 2009.

14. Bringmann, Berlingerio, Bonchi, Gionis. Learning and predicting the evolution of social networks. IEEE Intelligent Systems. 2010; 25(4): 26-35.

15. Hill, Braha. Dynamic model of time-dependent complex networks. Physical Review E. 2010; 82(4): 46-105.

16. Porter, Smith. Network neighborhood analysis. In: 2010 IEEE International Conference on Intelligence and Security Informatics. 2010. pp. 31-36.

17. Juszczyszyn, Budka, Musial. The dynamic structural patterns of social networks based on triad transitions. In: 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE Computer Society; 2011. pp. 581-586.

18. Kudelka, Horak, Snášel, Krömer, Platos, Abraham. Social and swarm aspects of co-authorship network. Logic Journal of the IGPL. 2012; 20(3): 634-643.

19. Budka, Juszczyszyn, Musial. Molecular model of dynamic social network based on e-mail communication. Social Network Analysis and Mining. 2013; 3(3): 543-563.

20. Lymperopoulos, Lekakos. Analysis of Social Network Dynamics with Models from the Theory of Complex Adaptive Systems. In: 12th Conference on e-Business, e-Services, and e-Society (I3E). IFIP Advances in Information and Communication Technology. AICT-399. Athens, Greece: Springer; 2013. pp. 124-140.

21. Aouay, Jamoussi, Gargouri, Abraham. Modeling dynamics of social networks: A survey. In: 2014 6th International Conference on Computational Aspects of Social Networks. 2014. pp. 49-54.

22. Ganguly, Sikdar, Deutsch, Canright, Chaudhuri. A Survey on Cellular Automata. In: Conference Proceedings. 2003.

23. Zaitsev. A generalized neighborhood for cellular automata. Theoretical Computer Science. 2016; 666: 21-35.

24. Nguyen, Dinh, Shen, Thai. Dynamic Social Community Detection and Its Applications. PLoS One. 2014; 9(4): e91431.

25. Gao, Denoyer, Gallinari. Temporal link prediction by integrating content and structure information. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM; 2011 October. pp. 1169-1174.

26. Tylenda, Angelova, Bedathur. Towards timeaware link prediction in evolving social networks. In: Proceedings of the 3rd workshop on social network mining and analysis. ACM; 2009 June. p. 9.

27. Ibrahim NMA, Chen. Link prediction in dynamic social networks by integrating different types of information. Applied Intelligence. 2015; 42(4): 738-750.

28. Niladri, Saptarshi, Sukumar, Sanasam. Temporal link prediction in multi-relational network. World Wide Web Journal. 2018; 21(2): 395-419.

29. Leskovec, Kleinberg, Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 2005.

30. Gehrke, Ginsparg, Kleinberg. Overview of the 2003 KDD Cup. SIGKDD Explorations. 2003; 5(2): 149-151.

31. Jackson, Barraquer, Tan. Social Capital and Social Quilts: Network Patterns of Favor Exchange. American Economic Review. 2011; forthcoming.