Abstract
The methodological approach proposed here combines textual statistical analysis and agent-based simulations. The controversy concerning the unusual disappearance of pollinating bees (Apis mellifera), as covered in the French press over a period of 13 years, was chosen to describe the different stages of this heuristic framework. First, the articles are categorized according to the origin they attribute to the phenomenon: a univariate cause, a multivariate cause or a lack of understanding. Then, the proportions of agents supporting each of these explanations are obtained and reproduced with a dynamic opinion model, which shows the existence of agents with pre-determined choices. The change in the proportions over time leads to questioning whether the data are consistent with the independence of all agents. In particular, it becomes possible to assume the existence of networks of influence, a hypothesis that could be verified by other techniques such as qualitative, ethnographic or interview methods.
Empirical Data Dealing with the Precautionary Principle Application
Some published reports demonstrate the value of text-mining analysis methods for addressing controversies (Chateauraynaud and Torny, 1999; Chateauraynaud, 2011; Chateauraynaud et al., 2003; Bastin and Bouchet-Valat, 2014). Furthermore, agent-based simulation has already been used to analyze complex phenomena (Casilli and Tubaro, 2012). Moreover, a good deal of research has been devoted to the theoretical study of collective threshold phenomena (Watts and Dodds, 2009) and to analyses within the context of sociophysics (Castellano et al., 2009; Galam, 2008; Galam, 2012).
But not many real cases have been investigated by agent-based simulation using authentic empirical textual data. The challenge of comparing descriptive text-mining analysis with theoretical modeling still stands. Indeed, our approach aims to answer the demanding question of the interpretation of big data (Boyd and Crawford, 2012). The current paper presents a case study on a subject that deals with controversial environmental risks. The treatment of public problems in the media makes public opinion complex to understand. In our particular context, the application of the precautionary principle is a sensitive issue, which mobilized public opinion and allowed us to gather enough data for modeling (Delanoë and Galam, 2014).
The empirical data used in this paper concern abnormal bee deaths, also called colony collapse disorder (CCD) in some countries, including the US. The controversy is emblematic of the question of risk and of the burden of arbitrating whether to ban the use of specific chemical products. Moreover, the implementation of real innovations has led to public debates, which are inevitably driven by incomplete scientific data. The question arose as to how possible risks were translated (Callon, 1986) into solid facts (Tönnies, 1922) during the ongoing associated public debate (Dewey, 1927) in France from 1998 to 2010. We therefore studied the social aspect of the bees’ disappearance, as discussed by French-speaking journalists, using a corpus of 1,467 articles published in newspapers.
From a systematic textual analysis, we assigned each article to one of three stances explaining the phenomenon: a uni-factor cause, namely the use of pesticides; a multi-factor cause; or the absence of clear understanding. On this basis, we obtained the evolution of the proportion of each category over 13 consecutive years. Then, we confronted the data with a model (Galam and Jacobs, 2007; Galam, 2005) to re-question the social meaning of the dynamics of the phenomenon. Our hypothesis asserts that the evolution over time of each view in journalists’ reports results from social interactions, which we have to elucidate.
Assuming journalists may change their way of reporting the facts, according to their dispositional or positional influence (Bourdieu, 1973), we extracted the associated proportions from the data, applying an agent-based model of dynamic opinion. The variations exhibited by the data suggest that the number of dispositional agents (who hardly changed their mind) alters from year to year. The varying numbers of agents were inferred by applying the model. From these distributions, we could analyze the possible social interactions among journalists and other externalities. Applying the model to the empirical data built from a corpus of articles published in newspapers provided a frame to return to the data (with an interview or ethnography) and question its social meaning, with the possible mechanisms behind the data.
The rest of this paper is structured as follows: the problem is set out in the first section, where the opinion dynamics is quantitatively evaluated from empirical data; the second section examines the newspaper level to show the dispositional or positional agent profiles; the model is adapted to the problem in section three; the proportion of dispositional agents as a function of time is extracted in section four; in section five, the social meaning behind the data is addressed, focusing on the determinant role of social interactions and other externalities; the results are discussed in the last section.
Many Hypotheses behind the Dynamics
A Boolean search query dealing with the abnormal death of bees in France from 1998 to 2010 led to the extraction of a corpus of almost 1,500 French articles from the complementary Lexis-Nexis and Factiva databases. The collection of reports is taken from the daily, weekly and monthly French-speaking press. The annual distribution of the number of articles is shown in Figure 1.

Figure 1. Number of articles published each year from 1998 to 2010 by the French daily, weekly and monthly press that dealt with bee deaths. Over the thirteen years, the total amounts to 1,467 reports.
A systematic textual analysis of all the collected papers showed that of the articles dealing with the question of abnormal bee death, some indicated particular scientific results suggesting a single cause, namely the use of pesticides (Chateauraynaud, 2004; Delanoë, 2004), while others emphasized alternative specific results, suggesting the association of several different causes (Chiron and Hattenberger, 2008; Maxim and van der Sluijs, 2010; Debaz, 2012). Accordingly, a combination of words has been established to categorize each view and to assign each article to one of the different classes.
This required several examinations of the corpora (Delanoë, 2004; Delanoë, 2007; Delanoë, 2010), with a multivariate co-word analysis such as that previously published in BMS (Bastin and Bouchet-Valat, 2014). However, the approach described in this paper is closer to another way of building a representation of the corpus, in which each article is classified according to the “bag of words” it contains. Bag-of-words representations of this kind also underlie topic models such as Latent Dirichlet Allocation. According to the joint presence or absence of certain words in an article, the paper can be classified into one of the following groups:
Articles containing the words “pesticides”, “insecticides” or “chemicals” that do not reference other factors are categorized in the uni-factor class.
Articles containing at least one word from the following list are placed in the multi-factor class: “Foulbrood” (a bacterial disease); “Nosema” or “Nosemose” (a fungus); “Varroa” (a parasite); “Virus” (mainly the Israel acute paralysis virus); “Predators” or “galleria mellonella” or “aethina tumida” or “Asian predatory wasp”; “Monoculture” or “natural sunflower toxin” (which refer to agricultural practices); “Pollution”, “climate change” or “meteorology” (which represent external or environmental causes); “Multi-factors” or “many factors”.
Articles containing sentences such as those below are assigned to the class for which there is no clear understanding yet: “While it would be impossible to formally accuse the pesticide as being exclusively responsible for the fall in the hive population”; “It is not an element of new evidence of CCD”; “The data analyzed does not incriminate formally and exclusively the treatment of sunflower seeds”; “The pesticide was evaluated on two occasions over the last three years, and we believe that there is no causal relationship between the product and the problems of bee orientation”.
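The keyword rules above can be sketched as a small classifier. This is only an illustration, not the semi-automatic tool used in the study: the term lists are transcribed from the text, while the cue phrases for the no-proof class, the simple substring matching and the order of precedence between classes are all assumptions made for the example.

```python
# Term lists transcribed from the text; matching by lowercase substring is an assumption.
UNI_FACTOR_TERMS = ["pesticide", "insecticide", "chemical"]
MULTI_FACTOR_TERMS = [
    "foulbrood", "nosema", "nosemose", "varroa", "virus",
    "predator", "galleria mellonella", "aethina tumida", "asian predatory wasp",
    "monoculture", "natural sunflower toxin",
    "pollution", "climate change", "meteorology",
    "multi-factor", "many factors",
]
# Hypothetical cue phrases standing in for the "no clear understanding" sentences.
NO_PROOF_CUES = [
    "no causal relationship",
    "does not incriminate",
    "impossible to formally accuse",
]


def contains_any(text, terms):
    """Return True if at least one term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def classify_article(text):
    """Assign an article to one class; the precedence order is an assumption."""
    if contains_any(text, NO_PROOF_CUES):
        return "no-proof"
    if contains_any(text, MULTI_FACTOR_TERMS):
        return "multi-factor"
    if contains_any(text, UNI_FACTOR_TERMS):
        # Pesticide terms with no reference to any other factor.
        return "uni-factor"
    return "untagged"
```

Articles that match none of the lists stay untagged, which mirrors the fact that only part of the corpus could be assigned to a class.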
Semi-automatic textual analysis tools combined with 3 human readings for validation have enabled 84% of the corpus articles to be tagged. We have restricted the corpus to the tagged articles shown in Figure 2.

Figure 2. The proportions of articles published each year in the daily press dealing with uni-factor, multi-factor or no-proof categories
We observe that the categories map onto distinct periods of public debate (Gusfield, 1981) within a single dynamics of the controversy. The first period, from 1998 to 2004, focuses on the fact that bees were disappearing (and therefore that a precautionary principle is needed), whereas the second period, from 2005 to 2010, focuses on the causes of the bees’ disappearance. The two periods are separated by an event in 2004: the application of the precautionary principle.
The years are numbered from $T = 0$ for 1998 to $T = 12$ for 2010. The corresponding proportions of articles in the uni-factor class, denoted $P_T$, are respectively 0.500, 0.60, 0.677, 0.513, 0.627, 0.831, 0.769, 0.517, 0.540, 0.422, 0.544, 0.40, 0.255 for $T = 0, 1, \ldots, 12$. A denotes the opinion of journalists who belong to the uni-factor class, and B the opinion of journalists belonging to either of the two other classes, the multi-factor and the no-proof ones.
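For readers who want to manipulate the series, the thirteen proportions can be held in a small structure such as the one below. The values are transcribed from the paragraph above; the helper names are hypothetical and the rounding to three decimals is only a convenience.

```python
# Uni-factor proportions transcribed from the text (T = 0 is 1998, T = 12 is 2010).
P_UNI = [0.500, 0.60, 0.677, 0.513, 0.627, 0.831, 0.769,
         0.517, 0.540, 0.422, 0.544, 0.40, 0.255]
YEARS = list(range(1998, 2011))


def opinion_shares(p_uni=P_UNI, years=YEARS):
    """Map each year to (share of opinion A, share of opinion B = 1 - A)."""
    return {year: (p, round(1.0 - p, 3)) for year, p in zip(years, p_uni)}


def yearly_changes(p_uni=P_UNI, years=YEARS):
    """Change in the share of opinion A between consecutive years."""
    changes = {}
    for index in range(len(years) - 1):
        key = (years[index], years[index + 1])
        changes[key] = round(p_uni[index + 1] - p_uni[index], 3)
    return changes
```

The thirteen yearly values give twelve year-to-year intervals, which is the unit on which the model is applied later in the paper.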
Newspaper Publications over Time Highlight the Main Profile Types
In his seminal criticism of public opinion (Bourdieu, 1973), Bourdieu highlighted theoretical warnings that can be mobilized to deconstruct public opinion. Indeed, he analytically discussed two social facts that could theoretically produce an opinion: a dispositional fact and a positional one. The dispositional fact highlights the case in which agents do not strictly follow the accepted opinion, because of their inherited disposition in their own social field (such as habitus), whereas other agents follow a social contextual influence, because of their (actual or expected) position in their respective social field. The coexistence of dispositions and positions would prevent analysts from literally interpreting public opinion by reading the results of polls. We used this distinction to question the data, supposing that agents may have valid dispositional (Bourdieu, 1973) or positional reasons to change, or not, their opinion. We tested this hypothesis at the newspaper level to determine whether it is robust enough for the following modeling step.
The corpus of selected articles was sourced from almost 60 different newspapers. To model the dynamics of opinion, we needed to check whether contributions exhibit different profiles depending on the newspaper. The contributions from “Le Monde” (Figure 3) show large variations. Indeed, in some years (2001, 2006, 2007, 2008 and 2009) one of the two types of article, A or B, was entirely absent. This means that during those years, journalists were either all dispositional agents, or positional agents, possibly together with dispositional agents present only on the side that was advocated at 100 percent. In other words, 0 percent support for one opinion implies the absence of dispositional agents on that side. In the years where the dynamics did not reach 100 percent for one opinion, dispositional agents may have been present on both sides.
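The inference rule used in this paragraph can be written down explicitly. The function below is a minimal sketch with a hypothetical name: given the share of a newspaper's articles supporting A in a given year, it returns the sides on which dispositional agents may have been present.

```python
def sides_with_possible_dispositional_agents(share_a):
    """Return the opinion sides on which dispositional agents may exist.

    share_a is the fraction of a newspaper's tagged articles supporting
    opinion A in one year. A side with 0 percent support cannot contain
    dispositional agents, since such agents never abandon their opinion.
    """
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must lie between 0 and 1")
    sides = set()
    if share_a > 0.0:
        sides.add("A")
    if share_a < 1.0:
        sides.add("B")
    return sides
```

The rule only excludes possibilities: a non-zero share on one side is compatible with dispositional agents there but does not prove their presence.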

Figure 3. The proportions of articles published each year in Le Monde (a total of 67 tagged articles), Sud Ouest (a total of 209 tagged articles) and Le Figaro (a total of 62 tagged articles) newspapers that were uni-factor or not uni-factor
While 4 years (2001, 2006, 2007 and 2008) were characterized by zero dispositional agents for the uni-factor cause, only 1 year, 2009, featured zero dispositional agents for the multi-factor cause. When the year's contribution did not reach 100 percent, and was split between support for A and B, we inferred the possible existence of dispositional agents on one or both sides, as occurred in 1998, 1999, 2000, 2002, 2003 and 2004.
The contributions from the Sud Ouest newspaper (Figure 3) revealed the possible presence of dispositional agents on each side every year. Contributions from Le Figaro newspaper (Figure 3) and Le Monde both exhibited 100 percent polarization in several years, namely 1998, 1999, 2001, 2005, 2007, 2008, 2009 and 2010. These years are characterized by zero dispositional agents for the uni-factor cause (while for Le Monde it occurs for both sides). This position was modified in 2000 and reversed in 2002-2004, with a slight surge in 2006.
The results shown in Figure 3 suggest a key role played by dispositional agents in the generation of the data in Figure 2. This would call into question the social meaning of these data. Thus, it is important to extract the values of the proportions of dispositional agents present each year. Specifically, the successive abrupt changes of trend exhibited in Figure 2 indicate a change in the proportion of dispositional agents.
Since the goal is to re-build the actor networks from the dynamics, implementing the GUF model (Galam Unifying Frame) seems to be appropriate, as this does not infer a priori structures of networks. Indeed, this model only incorporates the effects of dispositional agents on the dynamics of opinion among positional agents (Galam, 2010; Galam and Jacobs, 2007; Galam, 2005). Moreover, the size of the local update groups does not affect the major results, since increasing the group size reduces the number of updates required to reach the attractors. To keep the equations analytically solvable, group updates of size 3 were used.
Using a Model to Re-interpret a Problem
The Framework of Basic GUF Depends on the Group Size Distribution
The GUF model investigates the competition between two opposite opinions within a population of “inflexible” and “flexible” agents, which are respectively our dispositional and positional agents. In that heuristic social space, each agent has a single opinion, i.e. one way to report the facts in the case of journalists. Diffusion rules assert that positional agents can shift opinion. Indeed, within a group of agents, a positional agent adopts the majority opinion, since the positional effect leads the agent to adapt its position, whatever its “good” reasons for doing so. The dynamics are implemented via repeated chance meetings of agents in small groups of various sizes in a randomized network. At each redistribution, agents’ opinions are locally updated according to the local majority in their own group. In the case of a tie in even-sized groups, agents preserve their current way of describing the facts.
In real life, people meet and discuss in groups of different sizes, but these are generally small. In the case of journalists considered here, the meetings occur within the social network in which they can interact. Although in principle an infinite number of size distributions are possible, the dynamics are qualitatively unchanged, with the two attractors and the tipping point always being invariant. The main difference is the number of iterations required to reach either attractor. Larger groups accelerate the polarization effect. On this basis, to keep calculations simple and tractable we restrict the group size to 3 in the rest of this paper.
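The local majority rule with groups of size 3 can be illustrated by a small Monte Carlo sketch. It is not the authors' implementation: the function name, the population size and the way agents are stored are assumptions, and the paper itself works with the analytical version of the dynamics rather than with an explicit population.

```python
import random


def simulate_majority_rule(n_agents, p0, a, b, n_updates, seed=0):
    """Share of opinion A after each update (index 0 is the initial share).

    p0 is the initial share of A, a and b are the shares of dispositional
    (inflexible) agents holding A and B; they must satisfy a <= p0 <= 1 - b.
    """
    rng = random.Random(seed)
    n_agents -= n_agents % 3  # keep a multiple of the group size
    n_a_total = round(p0 * n_agents)
    n_a_fixed = round(a * n_agents)
    n_b_fixed = round(b * n_agents)
    if not n_a_fixed <= n_a_total <= n_agents - n_b_fixed:
        raise ValueError("need a <= p0 <= 1 - b")
    n_b_total = n_agents - n_a_total
    # opinions[i] is True for A; fixed[i] is True for dispositional agents.
    opinions = [True] * n_a_total + [False] * n_b_total
    fixed = ([True] * n_a_fixed + [False] * (n_a_total - n_a_fixed)
             + [True] * n_b_fixed + [False] * (n_b_total - n_b_fixed))
    history = [sum(opinions) / n_agents]
    order = list(range(n_agents))
    for _ in range(n_updates):
        rng.shuffle(order)  # random redistribution into groups of 3
        for start in range(0, n_agents, 3):
            group = order[start:start + 3]
            majority_is_a = sum(opinions[j] for j in group) >= 2
            for j in group:
                if not fixed[j]:  # only positional agents follow the majority
                    opinions[j] = majority_is_a
        history.append(sum(opinions) / n_agents)
    return history
```

With no dispositional agents, an initial majority is amplified towards unanimity, which is the polarization effect described above; dispositional agents cap the share the opposite opinion can reach.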
Series of Continuity in Discontinuity
The uni-factor distributions shown in Figure 2 reveal a series of abrupt variations in 2000, 2001, 2003, 2005, 2006, 2007 and 2008. According to the model, by contrast, the opinion that reaches the majority should become ever more dominant. The evolution of the empirical data is inconsistent with this interpretation of the model. In fact, this observation reveals a discontinuity point after a series of continuity in the dynamics. To counter the increasing trend, external parameters must be integrated to produce a topological modification of the dynamics. In the model, we suppose that dispositional agents do not have more powerful arguments than positional agents (at first sight). This point highlights how our approach complements the methods of argumentative sociology (Chateauraynaud, 2011), which assert that arguments have different impacts in the public sphere. With our approach, it is only possible at the end of the process to question the network topologies and their respective arguments according to empirical data. Indeed, arguments do not have the same weight, but we first assume that they do and refute this hypothesis at the end of the process (Popper, 1991). However, each person holds one position in a group discussion (i.e., there is a pressure to describe the facts in a single way). Once every agent in a local group has written, dispositional agents do not follow the local majority rule if they are in the minority. In the present study, we considered a population that is a mixture of positional and dispositional agents. The proportions of dispositional agents are external parameters, while the respective proportions of positional agents in favor of A or B are internal parameters driven by the dynamics of local discussions. The possibility of making dispositional agents an internal parameter has been studied in (Martins and Galam, 2013), but this question is not introduced here. Accordingly the equation becomes,
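$$p_{t+1} = 3p_t^2 - 2p_t^3 + a(1 - p_t)^2 - b\,p_t^2 \tag{1}$$
where $p_t$ denotes the proportion of agents holding opinion A after $t$ updates, and $a$ and $b$ denote the respective proportions of A and B dispositional agents. The associated dynamics was extensively studied in (Galam and Jacobs, 2007; Galam, 2005).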
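The behaviour of the update rule can be explored numerically. The sketch below assumes the standard size-3 form of the rule with dispositional agents, $p' = 3p^2 - 2p^3 + a(1-p)^2 - b\,p^2$, as derived from the local majority rule; the function names are hypothetical.

```python
def guf_update(p, a=0.0, b=0.0):
    """One update of the share p of opinion A for groups of size 3.

    Assumed form: p' = 3p^2 - 2p^3 + a(1 - p)^2 - b p^2, where a and b are
    the shares of dispositional agents holding A and B respectively.
    """
    return 3 * p**2 - 2 * p**3 + a * (1 - p)**2 - b * p**2


def trajectory(p0, a=0.0, b=0.0, n_updates=10):
    """Successive values of p, starting from p0 (index 0)."""
    values = [p0]
    for _ in range(n_updates):
        values.append(guf_update(values[-1], a, b))
    return values
```

Without dispositional agents the tipping point sits at 0.5: `trajectory(0.6)` climbs towards 1 and `trajectory(0.4)` falls towards 0. Adding dispositional agents on one side shifts the tipping point in their favour, which is why their proportions matter for the fit to the data.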
Fitting the Model to the Empirical Data
The proportions of dispositional agents can be modified every year as a result of the activation of external pressures in favor of either opinion. Thus, each year a positional agent may change to a dispositional status, and vice versa. Within each year, the proportions of dispositional agents are kept fixed for each successive update. The opinion dynamics are implemented in three steps. Firstly, fixed proportions of dispositional agents are given. Secondly, n consecutive updates of positional agents are carried out, keeping the dispositional agents’ proportions unchanged. Finally, the proportions of dispositional agents are modified before n new updates are performed. This multi-step dynamic is implemented by modifying Equation 1 into
$$p_{T,t+1} = 3p_{T,t}^2 - 2p_{T,t}^3 + a_T(1 - p_{T,t})^2 - b_T\,p_{T,t}^2 \tag{2}$$
where $p_{T,t}$ and $(1 - p_{T,t})$ denote the proportions of journalists in favor of A and B respectively, during year $T$ and intra-time $t$. The associated proportions of dispositional agents $a_T$ and $b_T$ are independent of the intra-time and depend only on the year $T$. Given $p_{T,t}$, the model determines $p_{T,t+1}$, obtained after one update of opinions for fixed values of $a_T$ and $b_T$. A detailed study of the properties of Equation 2 was performed in (Galam, 2010; Galam and Jacobs, 2007).
To account for the interplay between the two timescales, we note that, since $T = 0, \ldots, 12$ for the years and $t = 1, 2, \ldots, n$ for the intermediate intra-time within a year, we have the congruence $(T, n) = (T+1, 0)$.
In addition, we note that only the fraction $p_{T,t} - a_T$ is positional, i.e. able to shift opinion under convincing local arguments. The same argument holds true for opinion B. We thus have $p_{T,t} \ge a_T$ and $1 - p_{T,t} \ge b_T \iff p_{T,t} \le 1 - b_T$, which combine to give
$$a_T \le p_{T,t} \le 1 - b_T$$
with the constraints $0 \le a_T \le 1$, $0 \le b_T \le 1$ and $0 \le a_T + b_T \le 1$.
Implementing the Model to Rediscover the Data
We do not aim to reproduce the data shown in Figure 2. The objective of the method is to evaluate the minimum values of both proportions of dispositional agents $a_T$ and $b_T$, and the intra-time $n$, which are compatible with the data for every pair of successive years. Given a pair of values $P_T$ and $P_{T+1}$, we determined the minimum values of $a_T$, $b_T$ and $n$ which, starting from $p_{T,0} = P_T$, reached $p_{T,n} = P_{T+1}$ with a precision of $10^{-3}$ after $n$ successive iterations of Equation 2. In a second step, writing $p_{T,n} = p_{T+1,0}$, we evaluated the minimum values of $a_{T+1}$ and $b_{T+1}$ that allowed us to obtain $p_{T+1,n} = P_{T+2}$ starting from $p_{T+1,0}$.
More precisely, we started from $p_{0,0} = P_0$ to evaluate $a_0$ and $b_0$ such that $p_{0,n} = p_{1,0} = P_1$. Then we evaluated $a_1$ and $b_1$ such that $p_{1,n} = p_{2,0} = P_2$, and so on up to the evaluation of $a_{11}$ and $b_{11}$ such that $p_{11,n} = p_{12,0} = P_{12}$.
To determine which value of $n$ to use, we took advantage of the fact that the years fall into 3 groups according to their number of articles: fewer than 100 (10 years), between 100 and 300 (2 years), and more than 300 (1 year), as seen in Table 1. For each group we determined the minimum value of $n$ that allowed $p_{T,n} = P_{T+1}$ to be reached starting from $p_{T,0}$ for all cases in the group. We found $n = 3, 5, 8$ respectively, as shown in Table 1.
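The fitting step for one pair of consecutive years can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the size-3 update $p' = 3p^2 - 2p^3 + a(1-p)^2 - b\,p^2$, takes $n$ as given, uses a single non-zero parameter per interval, and finds it by bisection, which is valid because the value reached after $n$ updates increases with $a$ and decreases with $b$ inside the allowed range. All function names are hypothetical.

```python
def guf_update(p, a, b):
    """Assumed size-3 update: p' = 3p^2 - 2p^3 + a(1 - p)^2 - b p^2."""
    return 3 * p**2 - 2 * p**3 + a * (1 - p)**2 - b * p**2


def iterate(p, a, b, n):
    """Apply n successive updates with fixed a and b."""
    for _ in range(n):
        p = guf_update(p, a, b)
    return p


def fit_interval(p_start, p_target, n, tol=1e-3):
    """Return (a, b), one of them zero, such that n updates lead from
    p_start to p_target within tol. Raises ValueError if n is too small."""
    free = iterate(p_start, 0.0, 0.0, n)
    if abs(free - p_target) <= tol:
        return 0.0, 0.0
    push_up = p_target > free  # A needs dispositional support if True

    def final(x):
        if push_up:
            return iterate(p_start, x, 0.0, n)
        return iterate(p_start, 0.0, x, n)

    # Constraint a <= p_start <= 1 - b bounds the search interval.
    low, high = 0.0, (p_start if push_up else 1.0 - p_start)
    end = final(high)
    if (push_up and end < p_target - tol) or (not push_up and end > p_target + tol):
        raise ValueError("target not reachable with this n")
    for _ in range(60):
        mid = (low + high) / 2
        if (final(mid) < p_target) == push_up:
            low = mid   # effect still too weak: increase the parameter
        else:
            high = mid  # overshoot: decrease the parameter
    x = (low + high) / 2
    return (x, 0.0) if push_up else (0.0, x)
```

For instance, `fit_interval(0.5, 0.6, 3)` returns a strictly positive `a` with `b = 0`: starting from the balanced situation of 1998, the model can only reach 0.60 if some dispositional agents support the uni-factor side. The value `n = 3` is used here purely for illustration; the value actually associated with each year is the one reported in Table 1.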
Dispositional Agent Proportions at Each Year to Reach the Following One, Covering 12 Annual Intervals
From Table 1 it can be seen that for each year, given $n$, only one fitting parameter is used, since either $a_T$ or $b_T$ always equals zero. The variation of $p_{T,n}$ as a function of successive iterations is shown in Figure 4. The error bars are also indicated in the figure, although the GUF values of the $p_{T,n}$ series recover perfectly the $P_T$ data values for all 13 years. Figure 5 exhibits the simultaneous variations of $a_T$ and $b_T$ as a function of $T$.

Figure 4. Evolution of the proportion of journalists advocating the uni-factor cause of bee death as a function of updates, using Equation 1 with $9 \times 3 + 2 \times 5 + 1 \times 8 = 45$ updates

Figure 5. Variation of the proportions $a_T$ and $b_T$ of dispositional (i.e. inflexible) agents as a function of the year
Circles show the overlap with the data calculated per year and vertical lines indicate error approximation using the variance of a binomial generator.
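The error approximation mentioned here can be illustrated with the usual binomial standard error. This is a sketch under the assumption that each tagged article is an independent draw supporting A with probability p; the function name and the article count used in the example are hypothetical, since the yearly counts are not listed in the text.

```python
import math


def binomial_standard_error(p, n_articles):
    """Standard error of a proportion p estimated from n_articles
    independent draws: sqrt(p * (1 - p) / n_articles)."""
    if n_articles <= 0:
        raise ValueError("n_articles must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n_articles)
```

With a hypothetical year of 100 tagged articles split evenly, `binomial_standard_error(0.5, 100)` is about 0.05, so years with few articles carry wide error bars.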
Behind the Data, Another View of the Networks
The picture drawn from this heuristic framework leads to the opposite conclusion to that expected a priori. The proportions of categorized articles evolve differently from the proportions of dispositional agents. As a consequence, we cannot interpret the evolution of text occurrences to infer actors’ behavior without questioning their network topologies.
Starting from a balance in 1998 and running until 2010, the public debate around the controversy can be divided into two phases. The first phase focuses on the disappearance of the bees, whereas the second one focuses on multi-factor causes. These two phases are separated by the precautionary principle, which was applied in 2004. The first phase is a conclusion period focused on the hazard to the bees (from environment to human risks). The second phase is a premise phase focused on the causes (from pesticides alone to other possible factors). In the first year of the public controversy, the proportions of words in papers were equally distributed ($P_0 = 0.50$). However, the uni-factor agents appear to have had dispositional supporters on their side (beekeepers and their local networks), while none were present on the other side. Multiple interviews with industrialists revealed this: at that point in time, industrialists did not consider the beekeepers’ alert a threat to their business. This highlights the decisive advantage gained by the first whistle-blowers in a new controversy and the importance of taking into account possible negative externalities of commercial products.

Variation of dispositional agents’ proportions $a_T$ and $b_T$ as a function of the year, with sociological interpretation
Once the controversy was launched by an “anti-gaucho” collective, industrialists tried to communicate about the bias of scientific studies. They accused pesticides, whereas the beekeepers’ side turned down the pressure. But given the threshold nature of the dynamics (Galam and Jacobs, 2007; Delanoë, 2010), the following years brought the uni-factor side back to rather high values in 2002 and 2003. From then on, the beekeepers’ view was translated into the public media through experts or political spokesmen during the 2004 regional and European elections. Indeed, the sale of pesticides was banned that year.
After 2005, another kind of informant focused on other causes. New scientific studies and lobbying of industrialists were questioned: focus was directed towards many factors other than pesticides. With new scientific studies in following years, the multi-factor side re-launched the debate in the media. Finally, we questioned the possible existence of interactions between journalists and the people who intervene in the public debate.
Conclusion
In this paper we have shown how empirical text-mining, descriptive analysis and theoretical modeling together can produce a heuristic framework. Our focus was on the disappearance of bees.
First, we performed a text-mining analysis of published articles to categorize them. The facts reported about the causes of the critical phenomenon were used to classify the papers. In one category, papers advocate that the cause is uni-factor, namely the use of pesticides. In the other category, the causes are multi-factor or there is no identified cause to date. In the approach used, we do not only consider the risks of chemicals, but also focus on the harm to honey bees and on how people translate the context of the issue in public debate. Quantitative data analysis revealed two phases during the 13 years of the controversy: one public debate focused on the conclusion (i.e., honey bees die because of a single cause, pesticides) and another public debate focused on the premises (i.e., many factors are implicated in the phenomenon). Despite this result, the discontinuity of the data does not enable sociological induction. Hence, the data have been confronted with a complementary approach.
Second, the evolution of each proportion of categorized articles is assumed to be rebuilt with the dynamics of interactions among journalists. Two types of agents are considered. Some never change their mind with a dispositional behavior depending on their social anchorage (habitus, “good reasons” or other explanations). Some have a positional behavior since they may shift their opinion according to the related networks. The respective proportions of agents follow a function of time and only vary on a yearly basis. Between each pair of consecutive years, the fraction of journalists in each class is inferred from the distribution of opinions, using a model of opinion diffusion. The evolution of respective proportions of dispositional agents is thus obtained for each year. With its randomized network modeling and analytical solution, the model does not assume any a priori network to question quantitative text-mining evolution with the results of the simulation. In that context, actor networks can be questioned with scenarios that include external pressures, social structure and the frame of the debate around the non-human actants.
Third, the proportions of dispositional agents extracted from the model are turned back towards the empirical data, to question the interactions between agents and network topology. Finally, the results are confronted with qualitative analysis and/or interviews. Moreover, we could question possible pressure on the journalists from various involved parties to keep on exploring the controversy. In conclusion, starting from non-human actants mentioned in published articles (i.e., bees, pesticides, fungi and other factors), this framework offers a methodology to re-question actor networks (journalists, whistle-blowers, spokesmen) involved in a debate dealing with controversial innovations.
Acknowledgements
We thank the native English speakers of the INIST-CNRS for proofreading this text.
