Abstract
The social transmission of beliefs, behaviors, and technologies is a central function of dark networks, just as it is in legitimate networks. One motivation for disrupting dark networks is to break the flow of information and learning. It is often unclear, however, which network should be targeted for disruption because individuals inhabit multiple and correlated networks, and the most relevant network for a given cultural process must be inferred from limited empirical data. Three analytic methods potentially are able to distinguish among alternative network diffusion processes: autoregression, dyadic regression with permutations, and dyadic regression with or random effects. All three rely on having measureable cultural outcomes and network or tree-like connections among the data points. We tested the ability of each method to infer cultural diffusion correctly within 4000 simulated datasets generated on two historical networks that linked violent and pacifist Anabaptist religious groups. Under both frequentist and Bayesian inference procedures, regression of dyadic matrices with random effects exhibited the best statistical performance. We found similar results in a more comprehensive search of the network parameter space that simulated both network structures and the diffusion of traits.
1. Introduction
A perennial issue within comparative anthropology is determining which of several transmission pathways most influence the spread (or diffusion) of cultural traits. Specifically, we can ask which of several alternative networks govern the diffusion of some sets of traits, and whether the traits are inherited along a multigenerational tree.1–7
This topic is not merely of theoretical interest to a small group of statistically minded social scientists. Blocking the diffusion or inheritance of information on dark networks plays a prominent role in disrupting them.8–11 For example, in-depth statistical network analyses have explored the role of networks in political violence among 16th century radical European Anabaptists, who sought to overthrow secular governments to establish their version of Christian theocratic rule.12,13 This historical case study has multiple parallels with radical religious extremists today and, by providing known final outcomes, can serve as a complementary research resource to studies of contemporary movements.
In addition to being integral to the diffusion of violent ideologies, networks are crucial to the diffusion of home-grown weaponized technologies such as improvised explosive devices (IEDs), which themselves are the necessary instruments for modern terrorists or other non-state actors to carry out violence. 8
Structural features of dark networks may differ from the networks of mainstream governments, businesses, religions, or other forms of legitimate and non-covert social activity if dark network participants need to conceal their activities to prevent infiltration of the network by moles. Empirical analyses, however, point to many similarities in the network structures of dark and legitimate social networks,14–16 likely because to be useful for its members’ goals, both must perform much the same functions. The diffusion of information and beliefs can be viewed as one such core function that dark networks share with most other human social networks.
Participants in dark networks, as any other people, can have multiple types of relationships stemming from family, work, religious association, online and offline communication, etc. To disrupt diffusion processes on dark networks, it would be useful to have analytic tools that distinguish which types of relationships are the most important to any particular observed diffusion or inheritance process. Although a number of different statistical methods are published for this purpose, they have never been evaluated against each other with a common set of simulations where the actual diffusion or inheritance processes are known. In empirical datasets, real processes are not known with certainty because these statistical methods draw inferences from cross-sectional data patterns. Using simulated data solves this problem, since the researchers then know the nature of the simulated process and therefore can assess straightforwardly the adequacy of each statistical method. Although one may argue longitudinal data would allow for direct empirical assessment of methods to infer diffusion, cross-sectional data and data sampled sparsely through time are much more common. This is true particularly for datasets on clandestine organizations. Thus, applied researchers have a need for valid statistical methods to infer diffusion processes on networks from cross-sectional data.
In this article, we attempt the first such systematic evaluation of published methods potentially useful for inferring which networks or trees govern the diffusion or inheritance of cultural traits. We define cultural traits as socially learned information, behaviors, or beliefs.
1.1 Empirically grounded simulations
We first evaluated the performance of each method with simulated data on an empirical dark network from the Anabaptist case. Anabaptist dark networks have been the subject of substantial empirical research using some of the same methods we aim to verify.
Matthews et al. 12 examined how theological and liturgical traits were distributed among 11 groups of Anabaptists that operated during the Protestant Reformation. The Anabaptists were fairly extreme sects that rejected secular governments and supported the establishment of Christian theocracies. About half of them advocated using violence to establish such theocracies, even taking over the city of Münster by force at one point. 17 The other half were radical pacifists committed against any use of warfare. The violent sects were brutally suppressed to the point of extinction by the kings, Catholics, and Lutherans, while the pacifist sects, although suppressed as well, managed to leave biological and cultural descendants in the USA as today’s Amish and Mennonites.
A central question is whether traits like violence were diffusing through congregation leaders (ministers) via written letters of correspondence or being inherited as characteristics of the congregations as they diverged. Variants of this question could be applied to many extremist religious groups active today. Figures 1 and 2 illustrate some of the complexity of the Anabaptist networks and especially the differences between the network of group leaders and the schism tree of congregations. 12

The network of personal correspondence among Anabaptist leaders. Numbers in nodes give the year in which each leader’s congregation originated.

The schism history of Anabaptist congregations expressed as a phylogenetic tree. Numbers at the tips show the number of social connections for each congregation’s leader in the correspondence network (Figure 1). Numbers at nodes show the density of correspondence ties amongst the leaders above that node on the tree.
We relied on these structures known from historical records in our empirically grounded simulations. We simulated the diffusion or inheritance of cultural characteristics on the network and tree using standard published approaches.6,18,19 To replicate the empirical dataset, we simulated data sets of 45 binary characters. We also simulated continuous characters (aka traits or variables) because they are commonly encountered situation. In both cases we evaluated the performance of frequentist and information theoretic (Bayesian) inference procedures.
1.2 Simulations of broader network parameter space
We used frequentist statistics to assess the performance of the methods across a broader range of network space by simulating both the network structures and the tree structures, and the diffusion of characteristics on those structures.
2. Potentially applicable statistical methods
Three methods are applicable to this research problem and currently are implemented for empirical social network analysis: autoregression, dyadic regression with permutations, and dyadic regression with random effects. They are derived from various disciplines including evolutionary biology, anthropology, and social network science, and each makes an attempt to account for the nonindependence of network data.
2.1 Autoregression methods
These methods use pairwise matrices of expected similarity from the tree or network to transform the unexplained variation in an otherwise standard linear regression model.6,20,21 The unexplained variation (the residual term) is the variation in the dependent (outcome) variable that is not predicted by the independent variables in the model. In the case of a model with only the intercept term, the residual encapsulates any deviation from the mean value for the dependent variable.
One appealing feature of autoregression is that it makes it straightforward to include independent attribute variables along with network and tree type variables. In such cases, the network or tree effect is estimated on the reduced residual of variation not explained by the independent variables. In this paper, we examine solely the intercept-only model to focus on the issue of determining the importance of a transmission tree or network.
The implementation of autoregression has a significant history in anthropology, network science, and biological phylogenetics. In all these fields, autoregression techniques were developed principally to deal with the problem of statistical non-independence among data points. Statistical non-independence occurs when the values of data points in a model are correlated based on some kind of a relationship among them. For example, data points close together in space may have similar values if the individuals are thereby exposed to similar environments. 21
By the same token, groups or individuals who are close in a network or who share close historical ties in a cultural tree might exhibit similar attributes. Such attributes can arise if the people influence one another to be similar (social influence or contagion); or alternatively, if they choose to connect because of the similarity of their attributes (homophily). Either of these processes will lead to what is termed “data point non-independence.” Within comparative anthropology, the case of historical ties among societal or culture-group data points has a particular intellectual history known as “Galton’s Problem.”22–24
In short, standard regression models assume the data are independent, but network and phylogenetic tree data usually violate this assumption. Thus, dealing with data point non-independence has been a major motivating force for the development of autoregression methods. When the independence assumption is violated, significance tests of standard regression models become unreliable.20,21
2.1.1 Network autoregression
Network autoregression models were developed based on insights from spatial autoregressions. 21 The technique, called the simultaneously autoregressive (SAR) model, was developed to assess how networks influence the diffusion of a continuous trait and to correct for the non-independence of residuals in regression models with network data. 6 The most commonly used implementation of this model is the lnam function in the “sna” package.7,25
Several implementations of autoregression have been produced by the evolutionary biology community, but none of them allows for statistical testing of multiple networks or trees. Instead, they typically allow the user to conduct tests on a single tree. One method incorporates both spatial and phylogenetic autocorrelations, and so potentially could be flexibly adapted, but its current implementation is fairly rigid and does not allow for statistical testing of multiple putatively important networks. 26
2.2 Dyadic regression methods
Another set of techniques, known as dyadic or matrix regression, construct pairwise distances out of the dependent cultural measure and then regress those distances on one or more matrices that reflect how closely connected the data points are on the network and tree variables. The dyadic regression thus creates pairwise distances from the dependent variable. There will be N×(N– 1)/2 undirected distances from this procedure, where N is the number of individuals studied. Besides the change in sample size as compared to the original cultural measure, the distances also are nonindependent, because each individual’s cultural measures are used over and over again to calculate distances to the various other individuals. This repetition of individuals throughout the data is similar to repeated measures study designs common in physiological and biomedical studies, in which individuals are measured on the same variable over multiple occasions through time.
2.2.1 Dyadic regression with permutation
The first solution to these problems for dyadic data was proposed by Mantel, 27 who created a null distribution by permuting the distances from the dependent variable. This procedure remains in semi-common use in evolutionary biology, anthropology, and social network science, despite known statistical performance problems.28,29 Many researchers may be unaware that other permutation techniques perform much better than the classical Mantel method, which is susceptible to inflated false positive rates when the independent network or tree variables themselves somewhat are correlated. 28 Although this condition, called multicollinearity, is technically a violation of standard regression models, standard regression models are highly robust up to even moderate levels of multicollinearity. Dekker et al. 28 showed that the Mantel permutation lacks such robustness. Instead, they found that permuting the residuals of each independent predictor was robust to a collinear and spurious network being included in the model.
2.2.2 Dyadic regression with random effects
Recall that the repetition of the original cultural measure in the pairwise distances is conceptually similar to a repeated measures design where individuals have been measured over time. This observation has led to a wholly different approach to dyadic regression that uses random effects mixed models to correct for the repetition of individuals’ identities. The random effects allow all the dyadic distances linked to a given individual to have a somewhat higher or lower intercept than the overall intercept in the regression model. 30 Both O’Malley and Christakis 31 and de Nooy 32 independently developed this approach in the context of regressing the formation of network ties on multiple relationship matrices and distances derived from attribute data. They offered some mathematical proofs for the validity of the approach and empirical results, but no simulation work has been conducted on their method.
3. Methods
3.1 Simulated datasets on an empirical dark network
For the case of a single continuous trait, we simulated inheritance of 1000 characters under random motion on a phylogenetic tree using the sim.char function from the “geiger” R package. 18 To simulate diffusion on a network, we first generated a random normal variable for each node and then added 50% of the observed difference from their value to the average of their immediate social ties. We replicated this process 1000 times to create 1000 simulated continuous characters affected by diffusion. This reflects a pure horizontal diffusion process in which all nodes simultaneously adopt a portion of the trait value of their social connections. 6
To simulate multiple discrete traits, we again used the sim.char function to evolve traits on a tree. 18 We simulated 1000 datasets each containing 45 characters. To simulate the network diffusion of discrete traits, we used the code from Franz and Nunn’s network-based diffusion model, 19 setting the stopping point in their model to a randomly drawn proportion of the total number of nodes.
All the statistical and simulation work was performed in R version 3.2.0 or R version 3.3.1 and the code necessary to perform the simulations is provided in the supplementary materials. 33
3.2 Simulated datasets on simulated networks and trees
We conducted the same simulations of character diffusion and inheritance but on a broader set of the network and tree parameter space. To simulate network structures, we used the barabasi.game function of the “igraph” R package. The simulated networks differed based on the number of nodes and undirected links (edges) between the nodes that they contained. The number of nodes was set to 20, 50, or 100, while the average number of edges per node was 2, 5, or 10. Note that the average edge counts did not always exactly equal 2, 5, and 10, but when setting the simulation parameters, we specified the number of edges per node that would produce networks with averages closest to these targets. Each of the resulting nine network types (i.e., 20 nodes with 2 edges per node, 20 modes with 5 edges per node etc.) was replicated 1000 times, yielding a total of 9000 simulated networks.
For simulations of the tree structures we used the rcoal function from the R “ape” package with default parameters. 34
3.3 Statistical implementations
3.3.1 Frequentist approach
We implemented network autoregression with the lnam function from the commonly used “sna” R package. We implemented dyadic regression with the semi-partial permutation recommended by Dekker et al. 28 with the netlm function in the “sna” R package. 25 Lastly, we coded the random effects dyadic regression with the lmer function from the “lme4” R package. 35
Autoregression techniques require a single continuous variable as the outcome to be modeled. The simulations of continuous characters produced this directly. To accommodate the simulations of 45 binary characters, we reduced each of those sets of binary characters to a single continuous variable by applying principal components analysis to each 45-character matrix and extracting the first principal component, as is sometimes performed in Cultural Consensus Analysis. 36
In the case of dyadic regression methods, it is necessary to convert the simulated characters into distances. For the continuous characters, we used the Euclidean distance, while for the binary characters, we used Jaccard distances, both of which are very commonly applied transformations to obtain distances from these respective data types. 37
In the cases of network autoregression with lnam or the dyadic regression with random effects, we calculated p-values using the Wald test, which is the standard for these implementations and asymptotically equivalent to the likelihood ratio test. 38 The distribution from permutations in the netlm function provided a direct estimate of p-values for the permutation-based dyadic regression.
3.3.2 Information theoretic approach with the Bayesian Information Criterion
The autoregression method implemented through lnam and the dyadic regression through random effects both provide means to use an information theoretic approach to evaluating hypotheses. With this approach, the likelihood (probability) of the data was calculated under each of four alternative models: (1) a model with an intercept term only and no network or tree variables; (2) a network plus intercept model; (3) a tree plus intercept model; and (4) a network, tree, and intercept model. The likelihoods were converted to the Bayesian Information Criterion (BIC), which allows for easy calculation of Bayes Factors. 39 The latter express the predictive utility of the preferred model over the next most preferred alternative. The result of this search of alternative models is a count of how many times the correct model was selected through the BIC.
4. Results
4.1 Results from simulations on the empirical dark network of Anabaptists
These results reflect simulated continuous or binary data that had either diffused on the ties of the Anabaptist leader network or had been inherited along the ties of the Anabaptist congregational tree. We then applied the statistical methods as described above. Correct inferences for significance of the network or tree were used to calculate statistical power, while incorrect inferences were considered false positives.
4.1.1 Frequentist approach
4.1.1.1 Autoregression
The results indicated that network autoregression did not perform well for this task (Table 1). The lnam method demonstrated unacceptably high false positive rates.
Performance of alternative methods on simulated Anabaptist data.
It is worth noting that the performance seemed improved with the binary characters as compared to the single continuous character, even though the autoregression methods were developed for the case of single continuous outcomes. This is likely because the multitude of binary characters that have all evolved on the same network or tree contain more signal relative to random noise than a single continuous character, since over the 45 runs of the binary characters random noise will cancel itself out and the signal will come through.
4.1.1.2 Dyadic regression
The dyadic regression techniques both performed reasonably well for this task. The permutation method showed a somewhat high false positive rate for inferring a significant tree variable when discrete traits were simulated on a network (Table 1). The random effects method had elevated false positives for a significant network variable in conditions where traits were simulated on a tree, but the elevation of the false positive rate in these cases was modest (<0.1). Dyadic regression demonstrated substantial statistical power, especially considering that all simulations had only 11 data points and involved a component of random noise. Although power was low for the single continuous character case, it was still demonstrable. Low power likely resulted from there being less signal:noise as compared to the case of multiple binary characters.
4.1.2 Information theoretic approach with the BIC
4.1.2.1 Autoregression
The autoregression method implemented through lnam showed a similarly poor performance when conducted in an information theoretic framework. For discrete characters simulated to diffuse on a network, the autoregression method selected the tree-only model about twice as often as it selected the correct network-only model (Table 2). When the correct network-only model was preferred, the median BIC difference to the second best model was 1.58, which corresponds to a Bayes Factor of 3.14. This is considered “positive support” (the lowest category of support) under Kass and Raftery’s recommendations for the interpretation of Bayes Factors. 39 Lnam performed better for simulations of discrete characters being inherited on a tree, since the tree-only model was selected 10 times more frequently than the network-only model (Table 2).
Counts of models selected as best in the Bayesian inference approach.
When tested through simulations with continuous characters, lnam exhibited little ability to detect the correct model (Table 2).
4.1.2.2 Dyadic regression
As in frequentist tests, dyadic regression with random effects performed much better than did lnam when inferences were made through the information theoretic approach. In the case of network diffusion simulations with discrete characters, dyadic regression nearly always selected the correct network-only model (Table 2). When the correct network-only model was preferred, the median difference in the BIC to second best model was 14.5, corresponding to a Bayes Factor of 29.0, which is considered the highest qualitative category of support, “very strong,” under Kass and Raftery’s schema. 39
The results for discrete characters inherited on a tree were more conservative, since slightly more than half of the simulations had the null model as preferred, and nearly all the remaining half selected the correct tree-only model. When the correct tree-only model was preferred, the BIC difference to the next best model was 7.54, corresponding to a Bayes Factor of 15.0 and “very strong” support.
Note that in Bayesian statistics the null (intercept only) model can be preferred over research hypotheses. This is because in the information theoretic framework, the probability of each hypothesis is evaluated, unlike under frequentist approaches that evaluate the probability of the data only under the null hypothesis and never directly calculate the data probability under research hypotheses. Thus, Bayesian approaches tend to be more conservative when the data are improbable not only under the null, but also under the research hypotheses. In such situations, when the probability of the outcome under the null is sufficiently small (e.g., p < 0.05), frequentist approaches would infer support for the research hypothesis, whereas Bayesian approaches would not make this inference if the probability of the data were not sufficiently greater under the research hypotheses.
The only systematic errors for dyadic regression appeared to be for the condition of continuous characters simulated through inheritance on a tree (Table 3). In that situation, the null model was mostly favored, but the network-only model was incorrectly selected more frequently than was the tree-only model.
Model performance on simulated networks and trees. We include simple linear regression throughout to show, as is well known, that the non-independence of network and tree data produce substantial error rates. Network size (m.n) indicates the number of nodes (m) and the average number of edges per node (n) used to generate networks under the barabasi.game function of the “igraph” R package.
4.2 Results from simulations on simulated networks and trees
We next conducted comparable simulations and statistical assessments on a much larger set of parameter space than was possible on the Anabaptist networks. Our goal was to assess the generalizability of our findings. Because the frequentist versus Bayesian analyses and continuous versus discrete conditions were largely consistent in the Anabaptist case, and to keep the analysis for the broader simulations more tractable, we conducted only frequentist tests on continuous characters for these analyses.
The results of our tests on simulated networks and trees were consistent with our findings from the specific case of the Anabaptists.
We conducted one set of simulations where we had two network structures in the statistical models, one being the network on which character diffusion was simulated and the other being a random network from the set of simulated networks. In this case, lnam exhibited elevated type I errors (false positive), but both random effects and permutation approaches behaved appropriately (Table 3).
The same pattern was found for cases where we simulated diffusion on a network and included a random tree in the model. False positives in this situation arise from a method incorrectly inferring significance for the tree. Here again, lnam exhibited elevated false positives while both methods for dyadic regression performed appropriately. It is notable, though, that across all conditions in which data were simulated on networks, the random effects approach exhibits statistical power 0.135 greater on average as compared to the comparable application of the permutation method (p < 0.001, two-tailed paired t-test).
In the situation where we simulated the inheritance of traits on a tree and included a random network, we found that only the random effects approach exhibited suitable levels of false positives, while the Both lnam and permutation methods showed elevated false positive rates.
5. Discussion
Under traditional frequentist (p-value) inference procedures, none of the techniques tested fully reduced type I error to expected levels while maintaining statistical power for all conditions we simulated. The dyadic regression with the random effects technique, however, demonstrated high power and reduced false positives to acceptable or near-acceptable levels across nearly all conditions.
The random effects method also performed generally well in a Bayesian inference paradigm that evaluated alternative models against each other more directly than possible with frequentist approaches. The only systematic errors were in the condition of continuous characters simulated to be inherited through trees. False positive for dyadic regression with permutations was a close second in performance to the use of random effects, but autoregression with lnam vastly underperformed the other two methods. In some conditions lnam failed even to improve false positive rates beyond the rate exhibited by simple linear regression that assumed full data point independence (Table 3). Although we have simulated a broad range of conditions, we note that autoregression may perform better for selection among alternative networks in some specific contexts. For example, Matthews et al. 40 showed that lnam performed largely as it should using a Bayesian inference procedure on an international Indo-European linguistic network. However, lnam demonstrated systematic biases in some simulations in that study as well. 40
The general failure of autoregression, contrasted with the success of dyadic regression with random effects, recommends that empirical researchers should proceed by first assessing which networks or trees are important using dyadic regression with random effects, and then apply autoregression to correct for nonindependence in a broader model with attribute predictor variables. This procedure was followed by Matthews et al. 12 in their study of violent and pacifist Anabaptist sects. These authors first used dyadic regression with random effects to find that violence was generally more correlated on the lineages of the Anabaptist tree rather than the network connections of sect leaders. Having determined the tree was a better autocorrelation structure, they proceeded to apply a standard autoregression model to test hypothesized associations with violence while simultaneously correcting for data point nonindependence. Their results failed to support hypotheses that more socially peripheral groups, or groups with a history of more schisms or rapid ideological change, would be more violent.
It is disappointing that autoregression methods performed so poorly for inferring the best network or tree because, by allowing for the simultaneous inclusion within the model of network data with traditional independent variable data, 20 autoregression provides a more elegant statistical solution to research questions such as those just described in Matthews et al. 12 This presents an opportunity, however, for statistical programmers to advance network science by revising lnam’s implementation or by providing a new implementation that is similarly flexible but that performs correctly in simulation tests. Although some other implementations of autoregression are available from the spatial statistics and phylogenetics communities, none are suitably flexible for network analysis, which requires an ability to test multiple networks and/or trees against one another. Our study also provides a potential methodological lesson that simulations can perform a critical testing function of statistical implementations used by researchers working on dark networks or other applied network analyses. At the end of the day, practitioners want statistical methods that perform as advertised for their clients and stakeholders. Peer reviewers agreeing that a study’s methods are acceptable should be regarded solely as a means to the end of ensuring correct inferences. Although mathematical proofs for a statistical method may be correct, implementations still may not perform as expected. For this reason, we think the use of simulations to test statistical performance of actual implementations for methods is crucial to their evaluation. Simulation testing can provide a relatively straightforward way to check the realized statistical performance of implementations under a range of conditions that, although likely simplified compared to the real world, are theoretically plausible.
Footnotes
Acknowledgements
We thank Quentin Atkinson for his feedback on an earlier version of the manuscript. We thank Mathias Franz for providing advice and code for network based diffusion analysis.
Funding
The writing of this work was funded by the RAND Corporation’s Investment in People and Ideas program. Support for this program is provided, in part, by the generosity of RAND’s donors and by the fees earned on client-funded research.
