Abstract

In the last six years, “big data” has attracted a great deal of public attention from politicians and scholars. The growing interest is in part a response to a series of corporate announcements and public summits on the topic, advocacy efforts by privacy researchers and watchdogs, the Obama Administration's proposed Blueprint for a Privacy Bill of Rights to protect consumers online, the Snowden revelations about National Security Agency (NSA) surveillance, the release of various national government reports on the subject, and the announcement of multi–million–dollar national initiatives funded by the Obama Administration, such as the “Big Data Research and Development Initiative” ($200m), as well as executive orders regarding big data, security, and privacy. Despite the attention devoted to this emerging and evolving concept, the discourse on what “big data” can deliver, whether scientific breakthroughs, epistemological paradigm shifts expected to solve societal problems, or new economic opportunities, remains limited and lacks theoretical grounding.
There are various definitions of “big data.” Most, however, would agree that “big data” is the accumulation of incalculable bits of structured and unstructured data points that are tracked, recorded, and stored in real time by an interconnected and interactive network of digital computer devices. Together these make up a dynamic and massive collection of databases that capture the operation of systems and everyday personal activity, and that can be linked and mined through advanced computational data analytics, disaggregation and aggregation processing techniques, and re–identification processes.
Most analysts, especially professional “data scientists” and corporate and government advocates of big data, emphasize its volume, treating quantity as the source of value. A popular belief is that “big data” is objective, representative, and generalizable because it captures everything (n = all). The presumption is clear: the more data there is about everyone and everything, the more high–quality information and knowledge can be produced. By the same logic, more data makes it easier to create new products and services that lead to economic growth and development; faster to predict social phenomena and address a wide range of societal issues (health care, economic growth, security, etc.); and easier to achieve “data–driven” decision making that compensates for imperfect human intuition and biased special interests.
However, addressing long–standing social, economic, political, and institutional problems takes more than information; political will, among other factors, is a vital component of positive change. Furthermore, big data is not neutral, and it does not adequately reflect or represent all of society. In fact, it is alarmingly incomplete with respect to certain social and ethnic groups: millions of people, primarily low–income, less–educated, older, and ethnic minority populations (African Americans, Latinos, and Native Americans), are still offline. Thus, the claim that data–driven decisions are superior is misleading. Big data is not objective and does not speak for itself. Nor is it an independent entity; it is a reflection of subjective decisions by its producers, designers, and consumers. It is inherently political and contextual, built on the specific ideologies and assumptions of its creators and analysts. Indeed, it is tied to particular economic and political purposes and is frequently used in circumstances its creators did not fully foresee.
Consequently, more scholarly attention needs to be given to the social–institutional structures that govern big data about institutions, organizations, people, and their behaviors, and to the effects of those structures. The agency behind data creation and the methods of data production must be unveiled. It matters who owns, designs, manages, analyzes, interprets, and regulates big data. Similarly, it is vital to determine how big data is sustained and when, with whom, and under what conditions it is shared (or not). The questions being raised, and how big data is framed or sold to the public, are also relevant to this study. The implications and impacts of “big data” on society can only be understood by grounding this unregulated political economy in the social and institutional structure under which advanced capitalist information societies operate, and by taking into account the interests of those who own and regulate the production, consumption, and exchange of data via the Internet. In other words, those interested in advancing the field of big data studies should avoid overly simplified conceptions of big data. The context and setting in which big data is created and (re)used, how it is categorized and classified, the questions addressed (or not asked), and how benefits and costs are framed all produce strikingly different impacts. Ultimately, “big data” must be examined within this framework.
Scholars can advance knowledge by creating a comprehensive cross–disciplinary theoretical framework of big data that analyzes and describes the mechanisms that facilitate the production, storage, and commodification of personal data and how institutions use information for capital and control. Such a grounded conceptualization of big data must account for, among other things, the interconnectivity of social, institutional, economic, political, and technological systems; the dynamic effects of social context, governance, and policy by place; the function of networked information and telecommunications technology in recording, transferring, and connecting past, present, and future bits of information; the nature of information as a commodity; the privatization of the technical infrastructure; the limitations of the legal system with regard to corporations’ treatment of personal information, including undisclosed tracking and unregulated third–party data use; the trade–offs between security and privacy in government surveillance; sophisticated advancements in predictive analytics; the obscurity and proprietary safeguards of algorithms; continuous improvement in search and retrieval functions; the latent value and monetization of big data and its reuses; misinformation and disinformation online; the underrepresentation of producers and overrepresentation of consumers by race, ethnicity, income, and country; the ideology of difference in society by place, class, race, and ethnicity; the differing ways people process information; and the range of individuals’ views and online experiences. In doing so, we can focus on those processes that are new or changing.
Ultimately, the pressing task for a viable sociological philosophy is to outline how big data's social–institutional infrastructure and management system function, in whose interest, and to whose detriment. Unveiling the players, their practices, strategies, techniques, (re)uses, and interpretations of personal data, as well as the potential effects of predictable violations of privacy, can also expand our conceptual understanding of the industry and of the nature of the information age. We know very little about how influential players make decisions concerning data construction, or about how, and which, data can be collected and distorted to achieve certain outcomes. The assumptions made about the value of storing and selling personal information without limits and without consumer protection safeguards need to be questioned. We must also account for the biases embedded in databases and how these are (re)produced as data are distributed and go viral. In short, this analysis will help make visible the impacts and implications of big data's uses for human and social welfare.
To hold influential decision makers accountable, we can also build conceptual tools and simulations that help the public grasp how the bits of information that make up someone's personal profile are transmitted, collected, and stored, and the potential negative effects of their leakage. To accomplish this, we can illustrate (via metaphors as well as literal descriptions) how technology works within the frames of networked structures and social–institutional systems. For instance, technology is more than a tool. It is a process of production, consumption, and exchange; it always has been. What is new is the integration of networked information and telecommunications technologies that facilitate the transmission and extraction of information and content online, along with the technical mechanisms and human skill sets required to use them. Also new is the public policy and governance structure by which social engagement, economic production, political participation, and institutional communications and interactions with the public are regulated online. Moreover, the severity of negative impacts in society can depend on the social and economic location of individuals. Certain populations are more vulnerable to monitoring and predatory behavior: new entrants who are inexperienced online and lack embodied knowledge of how the Internet functions, already disadvantaged racial and ethnic groups, and the poor.
The question remains: how does owning and controlling big data recreate preexisting socio–economic divisions and uneven development patterns by people and place? Currently, the U.S. government and the information, telecommunications, and cable industries in advanced information societies are profiting because they have the capital and human resources to manipulate the productive function of big data (see Castells 2011 for an empirical analysis of these agents). They do so to meet their own needs and interests. It follows that high–resource institutions, corporations, organizations, and individuals with direct access to big data, who own the tools of the trade and embody the expertise to manipulate information, have a competitive advantage over those who do not, especially at a time when big data systems are poorly understood and rarely examined critically by the majority. These few agents are advantaged because they can extract and make sense of data in ways that others cannot. This matters because data are used to reify information and to legitimate ideas and claims that can in turn validate policies determining the allocation and distribution of resources, all of which can affect social welfare and development trajectories. Consequently, low–resource institutions, corporations, nonprofit organizations, and passive consumers of technology who lack the tools and intellectual capacity to manipulate big data are disadvantaged and disempowered in productive and development processes.
Much can be learned about the conditions under which society does or does not benefit from big data. The challenge, and the opportunity, is to recognize the critical importance of how big data (its benefits and costs) is constructed and framed, for this determines outcomes. It is imperative to be critical of the conditions under which big data is collected, stored, interpreted, and used by institutions. The results for some individuals and places can be harmful and irreversible.
