The question of what lurks in the shadows is a perennial one. In the case of data, although bringing data “into the open” is often accepted uncritically as an obvious good, it is important to establish why doing so is important not only for those who produce and share data but also for policymakers (on the “open science” movement more generally, see, for instance, The Royal Society 2012). What counts as “quality” data also warrants deeper analysis, because not just any random data are considered important by creators and (re)users. It is notable in the papers in this special issue that definitions both of data and of what counts as quality data appear to differ across the relevant communities of practice, within communities as they evolve over time, and between situational contexts (see also Borgman 2012; Leonelli 2016). What makes data count as “data” (and perhaps even what makes data visible) is the outcome of negotiations within a discipline, field, or community of practice about underlying concepts and the goals of research, as, for instance, Alison Wylie’s paper stresses in the context of archaeological evidence and Linsey McGoey’s article explores with regard to data on wealth distribution. Conflicts about these matters can take place even between those in closely related fields or communities of practice, as is detailed in all of the papers in this issue. Hence, it is essential to consider how data (or more precisely, a community of practice’s views on data) serve to define and delimit the community of practice. It is also necessary to go further and examine how data are tied in with the establishment of identity (Hackett 2005) and claims to expertise (a complex and politically plagued issue in science; see, for instance, Hilgartner 2000; Brown 2009), including in negative cases where some entities are not in fact considered to be high-quality data (or even data at all).
Communities often distinguish themselves from one another, and mark off their territories, by focusing on weaknesses in the quality of the data created or used by others: consider the examples in Nadine Levin and Sabina Leonelli’s paper about valuing biological data. Hence, what sometimes begins as a conflict between practitioners over techniques, methods, and standards in fact can become a marker of differing epistemologies and in turn diverse identity claims (Hackett 2005). In short, the transformation of some data into “shadow data” is a complex process and clearly is not just (or primarily) about the data themselves.
Proponents of open data often link collaboration directly to openness without detailing the other mechanisms that must be in place for such practices to occur. Thus, a critical second point that emerges from this special issue is that the simple transparency or availability of data does not necessarily result in what many would consider to be collaboration or coproduction, let alone shared collectivity in any deep sense (an interesting example of the conflicts that arise can be found in Peled 2011, which examines US open government initiatives). For some situations in which data collection is a central activity, it is not the data themselves that are critical, but the interchange and accumulation activities associated with their management; consider, for instance, many contemporary citizen science movements or historic efforts such as those described in Elena Aronova’s paper. So even when data become open, other aspects of the associated processes can remain opaque, namely, what abilities or capacities are required to produce and use the data: as analyzed in several of the contributions, notably Levin and Leonelli’s article (see also Leonelli 2016), the skills, labor, and care required to allow data to be reusable often become obscured and disappear in the shadows. These issues in turn encourage us to consider not just the ontology of data but also the underlying epistemologies and political commitments of producers, curators, and users, including potential conflicts in their epistemologies and worldviews.
Perhaps most importantly, as all of the papers underscore in different ways, the value of openness is something that needs to be gauged in a particular context and within a complex network of relationships that constitute communities of practice. In some instances, these communities have preferences for “not knowing,” in other words for not seeking out more data and bringing them into the open, particularly where additional data are likely to create disadvantage, including by undermining existing findings or theories: a key example can be found in McGoey’s analysis of neglected or “absent” data on wealth disparities. There are additional forces that present impediments to openness, including commercial advantages (Evans 2010), security concerns (Roberts 2006; Balmer 2013; Rappert and Balmer 2015), and desires to maintain scientific priority. In other cases, there are apprehensions about certain types of data: for instance, should we conceptualize negative results as a form of “data,” and if so, how might this affect what counts as a publishable unit, which is already an extremely complex issue with regard to data?
Data from the so-called gray literature are an interesting case of shadow data (for instance, in Aronova’s case, reports, policies, and other nonscholarly literature serve as data points). This type of literature is produced outside of the commercial publishing venues common in academia, and so includes research produced for internal purposes within companies, governmental bodies, and other organizations (Farace and Schöpfel 2010), but which usually lacks systematic means for distribution, collection, or archiving as well as differential methods of quality control (Lawrence et al. 2014); open access preprints also are an increasingly important form of gray literature. One might ask if even the concept of what counts as “gray” literature is in effect an attribution of power, particularly about what are considered to count as “real” data. In certain fields, in terms of amount of use and reuse, gray literature arguably is on par with more traditional sources of evidence and data: consider engineering, where it accounts for approximately 40 percent of citations; in economics, it represents 9-17 percent, and in biology 5-13 percent (Schöpfel and Farace 2010). It may well be that this distinction between gray and “real” literature is partially eroding, particularly with Internet technologies, which allow us readier access to data produced in a variety of ways. Thus we must ask how these changes in what is accessible in terms of data affect our understandings of discipline formation and scientific practice, as well as conversations about critical issues that previously tended to be hidden from view due to industrial, commercial, governmental, and other interests.
Finally, this special issue encourages us to consider in much more detail the performative and embodied nature of data (how they are made and used) and hence takes debates about data well beyond the usual focus on their ontologies. These questions are particularly useful in cases where data have both internal-facing relevance (within the relevant scientific field itself) and external implications and applications, for instance, in public and social policy. Thus, it is essential to continue to question how data and the processes associated with them are shaping not only scientific practices but also a range of other social, technological, and political processes. By focusing on data and the practices that surround them, we can continue to analyze which points of view are being privileged, which are being silenced, and in turn which data that might have been produced (but have not been) have thus been rendered nonexistent and returned to the shadows.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
