Abstract
This paper analyzes how enthusiasm for data science methods in organization science has masked divergences between data science and organization science epistemologies, and it considers the likely consequences of adopting some or all of the precepts of data science epistemology for the future direction of organization science theory and research. Possible outcomes are framed in terms of three scenarios: marginalization of data science within organization science; integration of data science and organization science epistemologies and methodologies; and bifurcation, with some organization science scholars affirming the current epistemology and others adopting data science epistemology and methods. The paper concludes with suggestions to shape the conversations needed to set a path for the future of organization science theory and research.
Big data and associated analytical methods have received considerable attention among academics in a wide variety of disciplines. Interest in big data began with scholars in engineering and computer science working on algorithm development and database optimization. It has evolved into a formal academic discipline that integrates model development and knowledge discovery. The term data science (DS) is used here to represent this emerging field. It includes the aggregation and management of large, often heterogeneous datasets; the methods used to analyze those data; and the interpretation of results generated by those analytical methods.
This shift in focus from model optimization to knowledge discovery highlighted the salience of epistemology. The initial foray into epistemology was to define a new science that signaled the end of theory, as all knowledge was thought to be embedded in data (Anderson, 2008). The death of theory was short-lived, as data scientists have since acknowledged the value of theory in interpreting the results of their empirical models. However, theory is applied post hoc to assist in making sense of the complex relationships that DS algorithms typically identify (Elragal & Klischewski, 2017).
Organization science (OS) has more established epistemological underpinnings. Nonetheless, persistent problems with OS theory and research have been identified. OS theory and research have been characterized as lacking relevance, and as moribund and fragmented (Aguinis & Vandenberg, 2014; Kim et al., 2018; Leavitt et al., 2021). Recent criticisms have highlighted an epistemological defensiveness in OS that protects objectivity, causality and rationality at the expense of imaginative thinking and humanism (Cunnliffe, 2022).
Disquiet about the current state of OS theory and research has resulted in receptivity to new models, methods and mindsets, with considerable attention directed toward DS. In considering how DS can enhance OS theory and research, DS models and methods have been characterized as welcome additions to our “epistemological toolbox” (cf., Leavitt et al., 2021). Further, although they are not seen as a panacea, the value of DS methodologies in addressing long-standing problems in OS research has been highlighted with respect to theory development and testing (Tonidandel et al., 2018) and to building empirical models that improve management practice (McAbee et al., 2017).
Incorporating DS methods into OS, however, goes beyond methodology because DS epistemology cannot be easily separated from DS methodology. As such, increasing interest in DS methods in OS increases the likelihood of a collision between DS and OS epistemologies. The prevailing view in OS is either that, with a change in mindset, DS (presumably both its epistemology and methodologies) can enhance OS theory and research (Tonidandel et al., 2018) or that DS and OS epistemologies and methodologies can peacefully coexist (McAbee et al., 2017).
These views seem overly sanguine. To begin with, epistemologies have implications for values, assumptions and accepted practices (Longino, 2002), so that integration of DS into OS involves more than a changed mindset. In addition, the disruptive effect of DS on the process of knowledge discovery has been recognized by data scientists (Kitchin, 2014), so that a collision of DS and OS epistemologies might result in a redefined OS with unintended, undesirable consequences.
The purpose of this paper is to gain a better understanding of the potential influence of DS epistemology on OS theory and research. The notion of a collision of epistemologies is used to connote the process of divergent forces coming together with both positive and negative outcomes. Those outcomes raise important questions for OS, thereby adding to the conversation about the future direction of the discipline. Further, because, unlike methodology, epistemology reflects our values and expectations, it is relevant not only to how we define our science, but also to our identity as scientists.
Epistemologies in organizational research
Organizational research is guided by multiple epistemological frameworks that differ in terms of how knowledge is defined and discovered. The term “organization science” is used here to refer to the prevailing epistemological paradigm, which is defined as model-centered and grounded in scientific realism. Realism is based on the premise that phenomena are real and exist objectively so that the task of the organization scientist is to attempt to explain those phenomena. As such, OS epistemology codifies processes and procedures for developing and evaluating possible explanations for phenomena, with the objective of discovering objective, generalizable explanations for outcomes of interest (McKelvey, 2002).
Scientific inquiry is guided by established epistemological paradigms which prevail until they are superseded (Kuhn, 1962). Although the model-centered OS epistemology is prominent in organizational research (Cunnliffe, 2022; Hambrick, 2007), it is not the only epistemological framework used to study organizational phenomena, nor is it universally accepted. As such, the term “organization studies” is used here to capture the epistemological pluralism that is evident in the discipline.
Organization studies encompasses a variety of frameworks that include action research, ethnographic research, critical management studies and postmodern thought, feminist theory and humanism (Clegg et al., 2006). Although there are nuances in these epistemological frameworks, there is consensus that knowledge is context bound, transitory and relative (Adler et al., 2007). Thus, they challenge a fundamental premise of scientific realism, embedded in OS epistemology, that explanation results from the “abstraction of essential commonalities that can be generalized across contexts” (Cunnliffe, 2022, p. 3).
The distinction between organization science and organization studies is significant because it frames the conceptual domain of this paper. Specifically, data science epistemology is positioned as an alternative to the scientific realism (or normal science) that underpins OS epistemology (Elragal & Klischewski, 2017). Consequently, our definition of OS epistemology establishes boundary conditions that ensure that OS epistemology is not conflated with the epistemologies that comprise organization studies.
Theory and theoretical models
Realism results in an epistemology and associated values that include a formalized research process, validation of measures and methods, and the accumulation of empirically generated knowledge by studies guided by the parsimonious application of theoretical concepts and assumptions. The objective of scientific inquiry is to develop simplified representations of reality that are isomorphic with real-world phenomena of interest so that those phenomena can be better understood (Campbell, 1988).
In OS epistemology, theories and theoretical models are the vehicles needed to generate knowledge about the empiric world. Model-centeredness refers to hypothesized linkages among concepts and their hypothesized isomorphism with reality (McKelvey, 2002) that explain phenomena (Shapira, 2011). Bacharach (1989) defines theory as “a statement of relations among concepts within a set of boundary conditions and constraints” (p. 496). Boundaries define the phenomena under study and the conditions under which proposed relationships among constructs hold. Concepts are abstractions that cannot be observed directly (Kaplan, 1964) so that they must be transformed into variables that are closer to the empiric world to enable the testing that is essential to theory development (Bacharach, 1989; Bunge, 1967a).
An important discussion in OS about what theory is not has proven useful in defining what theory is and, in turn, in clarifying OS epistemology. In defining what theory is not, Sutton and Staw (1995) point out that theory is not data, a listing of constructs or variables, diagrams or hypotheses. Rather, they view theory as a linguistic connection among constructs based on logical operations that define processes that explain phenomena. In essence, theories are logical systems that are hypothesized to mirror empiric referent systems.
Theories have been viewed in terms of families of models, so that the abstract concepts that comprise a theory can be used to develop multiple theoretical models through logic and reasoning (McKelvey, 2002). Theoretical models are clear, comprehensive statements that provide explanations of phenomena of interest (Collins, 2006) and are composed of less abstract constructs (i.e., variables) that are operationalized with an auxiliary measurement theory. Empirical testing is essential to theory development, and theoretical models are evaluated by assessing whether predictions about phenomena are true or false (Bunge, 1967b; Shapira, 2011).
This process is summarized in Figure 1. As indicated in the Figure, a conceptual system (i.e., theoretical model) is developed and tested in relation to a referent system in the empiric world. A theoretical model is assessed with respect to whether the hypothesized isomorphism between the theoretical model and referent system generates accurate predictions about specific outcomes in the referent system (Bunge, 1967b; Shapira, 2011; Sutton & Staw, 1995).

Figure 1. Structure of a theoretical model.
A recent historical review of the employee turnover literature provides a good example of how OS epistemology guides OS research. In particular, Hom et al. (2017) trace the theoretical underpinnings of turnover research from its inception to the present time. As studies evolved, they were guided by foundational turnover models (i.e., theoretical models) in which job satisfaction and job alternatives were identified as key variables in an explanatory process that defined the decision to leave. This process began with dissatisfaction with one's job, which triggered job search and the intention to leave. This theoretical model of turnover was augmented with theory tied to rational decision-making based on subjective expected utilities. The (theoretical) turnover model was tested and refined over a multi-year cycle that Hom and colleagues (2017) refer to as normal science, with refinements to turnover processes including the addition of other antecedent variables such as organizational commitment and job performance. Later refinements included a revised job search process based on a better understanding of labor market conditions and the addition of job embeddedness as an influence on turnover.
This historical account of turnover research captures how OS epistemology guided turnover studies. Theoretical concepts were distilled into more specific explanatory processes within the framework of a theoretical model that began with job satisfaction and job alternatives. That model was later refined through empirical testing to include additional concepts, resulting in richer explanatory processes.
Data science epistemology and the new science
Data science is a rapidly evolving academic discipline that emerged from the integration of big data and advances in artificial intelligence and machine learning. Data science methods are focused on the efficient recognition of patterns in (usually) very large datasets. As the connection between data and knowledge came into focus, data scientists were faced with the prospect of codifying the process of knowledge discovery. In addressing this challenge, a new epistemology emerged in which knowledge generation was defined as liberating and creating meaning from raw data (Donoho, 2017). Termed the fourth paradigm (Hey et al., 2009; Kitchin, 2014), data science epistemology provided the foundation for a new science in which theoretical models were replaced with analytical models built with large scale datasets and pattern recognition algorithms (Vallverdu, 2009).
Data science epistemology was influenced by philosophies and methods that meld empiricism and inference with knowledge discovery (Desai et al., 2022). The influence of constructive empiricism on data science epistemology is evident in the shared focus on empiricism and pragmatism. Specifically, constructive empiricism, an alternative to scientific realism, is based on the premise that theories must be interpreted literally so that all factual content must be reduced to observational levels to be understood. As is the case with data science epistemology, knowledge is found in observables (i.e., data) so that the abstract concepts that comprise theoretical models are seen as having no explanatory or predictive power (van Fraassen, 1980). Thus, constructive empiricism focuses attention on empirical patterns in data that are relevant to predicting outcomes of interest (Gutting, 1983). Data science epistemology also draws upon statistical theory and the logic of probability to make inferences from data and to justify conclusions reached from those inferences (Desai et al., 2022). Statistical theory and probabilistic inference provide a framework for assessing whether one result is more or less plausible than others, which can be used to evaluate the reliability of knowledge claims (Mayo, 2018).
Values and assumptions in the new science
This new science has been characterized as a post-truth epistemology. Post-truth refers to moving beyond truth as it is defined in scientific realism, so that truths in data science are contextual, transitory and embedded in data (Maruyama, 2021). These different truths revolve around a value system that places much greater emphasis on correlation than on causality (Kitchin, 2014; Lowrie, 2017). Rather than the ultimate form of explanation, causality is seen as an abstract inference that cannot match the greater predictive accuracy generated by algorithmic analyses (Mazzocchi, 2015; Schonberger & Cukier, 2013). The empiricism that underpins correlational analyses is also justified with the view that truths change as data accumulate and as algorithms become more proficient at finding patterns in those data (Maruyama, 2021).
Analytical models as alternatives to theoretical models
Analytical models are the engine of knowledge discovery in data science. In contrast to theory and theoretical models, analytical models are computational systems in which algorithms are developed and trained to identify patterns in data. As such, in an analytical model, the referent system is defined in terms of input data, and those data are analyzed at the empiric level so that an analytical model fuses empiric referent systems and modeling systems (Schonberger & Cukier, 2013).
An analytical model consists of input data, pattern recognition algorithms, and an outcome or outcomes of interest. As indicated in Figure 2, the level of abstraction does not go beyond observables in the empiric world, where all knowledge (i.e., truth) is thought to reside (Lowrie, 2017). Analytical models are refined to increase predictive accuracy by adjusting the parameters of pattern recognition algorithms (Wheeler, 2016). That is, new data are added to an analytical model and/or algorithms are refined, and the increased predictive accuracy is interpreted as knowledge discovery (Donoho, 2017; Schonberger & Cukier, 2013).

Figure 2. Data science epistemology.
Analytical models have been characterized as incomplete by OS scholars because they do not provide an explanation for the predictions they generate (Shapira, 2011). Data scientists reject this position, arguing that knowledge resides in empirically derived patterns in data. In cases where analytical models generate findings that are inconclusive, Elragal and Klischewski (2017) offer the notion of lightweight theory as a tool to improve their interpretability. Lightweight theory refers to the post hoc application of theoretical concepts and models to "sort out" patterns in data generated by analytical models. These concepts and models provide an interpretative context that goes beyond observed empirical relationships. Lightweight theory is grounded in pragmatism, so that the choice of theories, models and constructs requires no justification other than their value in interpreting results from analytical models. Thus, because theories or hypotheses are not being tested empirically, multiple theories and/or constructs from different theories can be used to interpret results generated by one analytical model.
Analytical models are not common in OS research. To gain a better understanding of how analytical models are used to study phenomena of interest and how they differ from theoretical models, a study of employee turnover using an analytical model is briefly summarized. Raza et al. (2022) compared multiple analytical models built with machine learning algorithms to predict employee turnover. The objective of this study was to compare and calibrate several neural computing paradigms to determine those with the highest levels of predictive accuracy, and then to identify key variables within the most efficacious analytical model.
Study methodology was presented as a workflow diagram explaining how data were extracted and analyzed. Analytical models were built and refined using forty opportunistic input variables, taken from a company database, that were primarily descriptive (e.g., age, monthly income, job and company tenure). Although job satisfaction was included in the dataset, no mention was made of its connection to turnover theory. Several analytical models, using different pattern recognition algorithms, were compared and discussed. The paper concluded by identifying the key variables in the analytical model that best predicted turnover, and those variables were interpreted as representing new knowledge to be applied to managing employee turnover.
This research design mirrors the depiction of data science epistemology presented in Figure 2. It begins with data agglomeration and processing and then proceeds to building and refining analytical models using pattern recognition algorithms. As depicted in the Figure, analytical models were refined (i.e., trained) by comparing predictions with actual values, and algorithms were adjusted to improve predictive accuracy. The process concluded when it was no longer possible to achieve meaningful gains in predictive efficacy and the finalized analytical model was then deemed fit to be applied in organizational settings. As such, the legitimacy of the analytical model rests on the epistemological assumptions that knowledge resides in data and not concepts, algorithms unlock that knowledge by identifying patterns in data, and those patterns are sufficient to explain the phenomena under study so that theories and theoretical models are tangential to the process of knowledge creation.
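The refine-until-no-gain loop described above can be sketched in a few lines. This is a minimal illustration, not a reconstruction of the Raza et al. (2022) workflow: it assumes synthetic records with two hypothetical descriptive inputs (tenure and monthly income), uses a logistic regression trained by stochastic gradient descent as a stand-in for the neural computing paradigms compared in that study, and stops once validation accuracy fails to improve by a chosen margin (`min_gain`) for several consecutive passes (`patience`). All names, parameter values and the data-generating rule are assumptions made for illustration.

```python
import math
import random


def make_data(n, seed=0):
    """Generate hypothetical employee records: ((tenure_years, monthly_income_k), left)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tenure = rng.uniform(0, 10)
        income = rng.uniform(2, 10)
        # Assumed data-generating rule, used only to produce labels for this sketch.
        p_leave = 1 / (1 + math.exp(-(2.0 - 0.4 * tenure - 0.3 * income)))
        rows.append(((tenure, income), 1 if rng.random() < p_leave else 0))
    return rows


def predict_prob(w, b, x):
    """Logistic model: probability that the employee leaves."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1 / (1 + math.exp(-z))


def accuracy(w, b, rows):
    """Share of rows whose predicted class (threshold 0.5) matches the actual outcome."""
    hits = sum((predict_prob(w, b, x) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)


def train(train_rows, valid_rows, lr=0.01, min_gain=0.005, patience=3, max_epochs=500):
    """Refine parameters until validation accuracy stops improving by at least min_gain."""
    w, b = [0.0, 0.0], 0.0
    best_acc, best_params, stale = -1.0, (w[:], b), 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        # One pass over the training data: compare each prediction with the
        # actual value and nudge the parameters to reduce the error.
        for x, y in train_rows:
            err = predict_prob(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        acc = accuracy(w, b, valid_rows)
        if acc > best_acc + min_gain:
            best_acc, best_params, stale = acc, (w[:], b), 0
        else:
            stale += 1
            if stale >= patience:  # no meaningful gain for `patience` passes
                break
    return best_params, best_acc, epoch


if __name__ == "__main__":
    rows = make_data(400, seed=1)
    (w, b), acc, epochs = train(rows[:300], rows[300:])
    print(f"stopped after {epochs} passes; validation accuracy {acc:.2f}")
    print("weights (tenure, income):", [round(v, 2) for v in w], "bias:", round(b, 2))
```

The stopping rule makes the epistemological point concrete: the model is judged finished when predictive accuracy plateaus, and nothing in the loop refers to an explanatory concept.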
Data science epistemology as a disruptive force
Given differences in values, accepted practices and research traditions, a collision of DS and OS epistemologies seems highly likely. Moreover, as DS is inherently disruptive (Desai et al., 2022), the outcome of that collision will, almost certainly, affect the future of OS theory and research. The desired result, championed by proponents of DS, is a hybridized epistemology in which the scientific realism that guides OS research is enhanced with new methods of knowledge discovery (Leavitt et al., 2021). Anticipated benefits include more complete and more relevant theory, and research that is more closely tied to practice (McAbee et al., 2017; Oswald et al., 2020).
The disruptive and unpredictable nature of DS, however, suggests that integration is only one of several possible outcomes. The two most likely alternatives are marginalization and bifurcation. Marginalization results from strict adherence to the current OS epistemology such that resistance to change limits the influence of DS on OS theory and research. Bifurcation occurs if integration is not successful so that OS epistemology is affirmed by some OS scholars, but rejected by others who adopt DS epistemology.
Marginalization
There is no question that pattern recognition algorithms generate higher levels of predictive efficacy when compared to multivariate linear methods (cf., Tonidandel et al., 2016). However, when epistemology is considered, the question of how DS methodologies produce those results becomes salient. The answer to that question is worrisome because, in many cases, even data scientists are not able to explain how the algorithms that they built generated those superior results (Lipton, 2018; Desai et al., 2022).
It is hardly surprising that organization scientists have raised concerns about the incomprehensibility of the results generated by pattern recognition algorithms. Lindebaum and Ashraf (2024) point out that science entails both predicting variance and then explaining the predicted variance. Studies guided by DS epistemology, thus, highlight the problem of translating unexplainable variance (Schoppe et al., 2016) into explained variance so that the value unlocked in data can be realized. This problem is exacerbated when the process by which algorithms learn or recognize patterns in data is examined in detail. Changes in how algorithms are configured and trained can have a dramatic effect on predictive efficacy and on the identification of key predictors, and concerns have been raised about this indeterminacy with respect to OS research (Scarborough & Somers, 2006).
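The indeterminacy noted above can be reproduced in miniature. In the sketch below, two hypothetical predictors (`x1` and `x2`) are noisy measures of the same underlying construct, and a third (`x3`) is only weakly related to the outcome. A deliberately crude selection rule (the predictor with the largest absolute correlation with the outcome is labeled the key predictor) is applied to repeated bootstrap resamples of one dataset, so the only thing that changes between runs is the sample the rule is trained on. The data-generating rule, the selection rule and all names are assumptions made for illustration; real algorithms and their configuration settings add further sources of variation that this sketch does not model.

```python
import random
from collections import Counter


def make_data(n=200, seed=0):
    """Hypothetical data: x1 and x2 both measure one latent construct; x3 is weakly related to y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        latent = rng.gauss(0, 1)
        x1 = latent + rng.gauss(0, 0.3)
        x2 = latent + rng.gauss(0, 0.3)
        x3 = rng.gauss(0, 1)
        y = latent + 0.2 * x3 + rng.gauss(0, 1)
        rows.append({"x1": x1, "x2": x2, "x3": x3, "y": y})
    return rows


def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def key_predictor(rows, names=("x1", "x2", "x3")):
    """Label as 'key' the predictor with the largest absolute correlation with y."""
    ys = [r["y"] for r in rows]
    return max(names, key=lambda name: abs(corr([r[name] for r in rows], ys)))


def selection_counts(rows, n_runs=50, seed=0):
    """Apply the same rule to bootstrap resamples and count which predictor wins."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        counts[key_predictor(sample)] += 1
    return counts


if __name__ == "__main__":
    print(selection_counts(make_data()))
```

Because `x1` and `x2` carry nearly the same information, one would expect the "key predictor" label to move between them from run to run even though the underlying dataset never changes, which is the kind of instability that makes substantive interpretation of key predictors difficult.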
One possibility, therefore, is that the collision of epistemologies takes the form of a glancing blow such that DS remains at the margin of OS. In this case, select DS methods will be gradually incorporated into the “toolbox” of OS researchers as one analytical option, while basic tenets of OS epistemology remain unchanged. The DS methods most likely to be adopted are those tied to text, speech and visual images because the analytical methods used in OS research for these data sources are not as powerful in identifying patterns in these types of data as are DS methods.
Early trends in the adoption of DS methods in OS are consistent with this scenario. Specifically, Leavitt and colleagues (2021) assessed the use of DS methods in OS and closely related disciplines, and found that analysis of text and/or images was the most common application of DS methods, representing 56 percent of studies cited. This finding is not surprising in that studies using text data typically offer a new perspective in established areas that complements existing quantitative research. For example, analysis of text and facial analyses using machine learning provided new insights into CEO oral communication styles that can be incorporated into existing theory and research (Choudhury et al., 2019).
Integration
Integration is based on the premise that DS has a place in OS because it will enhance and extend OS theory and research. Although we are at an early stage in the process of building connections between DS and OS, there are clues in the OS literature about the direction integration might take. The scope of the proposed integration can be viewed along a continuum. At one end of the spectrum, preservationists advocate leaving the basic tenets of OS epistemology (i.e., scientific realism) intact while, at the other, revisionists propose modifications to OS epistemology that alter how knowledge is defined and how research is conducted.
Enthusiasm for DS among preservationists stems from the power of DS methods and from frustration with the current state of OS theory and research. Although they emphasize the need for improving theory development and testing, preservationists affirm the epistemology that guides OS research. For example, Tonidandel and colleagues (2018) point to how DS can refine and extend theory development by discovering new relationships among theoretical concepts, and by opening new domain areas with new sources of data. Thus, the analytical models that underpin DS epistemology are seen as vehicles for generating knowledge to refine explanatory processes (i.e., conversion processes) in theoretical models, and to guide the development of new theoretical models. Scarborough and Somers (2006) make a similar argument emphasizing the value of analytical models in assessing the almost universal assumption of linearity in OS theory and research.
The preservationist mindset is represented in the OS literature. For example, an early study built and tested analytical models of employee turnover (Somers, 1999). However, consistent with OS epistemology, variable selection was guided by turnover theory and included established predictors of turnover such as job satisfaction, organizational commitment, and turnover intentions. Further, results from analytical models were interpreted within the context of turnover theory and research, and the focus of this study was to use analytical models to advance theory development. To that end, observed patterns of nonlinearity suggested that there were discontinuities in relationships between turnover antecedents and turnover. These tipping points were interpreted as resulting from shocks that trigger the decision to leave, suggesting that turnover processes might not always be linear. Analytical models, thus, were used to assess and possibly refine processes in a theoretical model.
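The kind of discontinuity described above can be illustrated with a toy comparison. The sketch below assumes a hypothetical, noise-perturbed turnover propensity score that jumps when job satisfaction (on an assumed 1 to 5 scale) drops below 2.5; it fits an ordinary least squares line and a one-split regression tree (a decision stump) and compares their errors. The stump is a deliberately simple stand-in for the pattern recognition algorithms used in studies of this kind, not a reconstruction of the analysis in Somers (1999), and the threshold, scale, noise level and function names are all illustrative assumptions.

```python
import random


def make_data(n=200, threshold=2.5, seed=0):
    """Hypothetical data: turnover propensity jumps when satisfaction falls below threshold."""
    rng = random.Random(seed)
    xs = [1 + 4 * i / (n - 1) for i in range(n)]  # satisfaction on an assumed 1-5 scale
    ys = [(0.8 if x < threshold else 0.2) + rng.gauss(0, 0.05) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least squares line; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_stump(xs, ys):
    """One-split regression tree: the threshold that minimizes SSE of two group means."""
    pairs = sorted(zip(xs, ys))
    best_sse, best_t = float("inf"), None
    for k in range(1, len(pairs)):
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best_t = sse, (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_t, best_sse


if __name__ == "__main__":
    xs, ys = make_data()
    _, _, linear_sse = fit_linear(xs, ys)
    t, stump_sse = fit_stump(xs, ys)
    print(f"linear SSE {linear_sse:.2f}; stump SSE {stump_sse:.2f}; split at satisfaction {t:.2f}")
```

In the preservationist reading, the recovered split is not the end point: it is a pattern that turnover theory (here, the notion of shocks) is then asked to explain.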
Revisionists are more accepting of DS epistemology, so that integration takes the form of varying degrees of a hybridized science. Enthusiasm for DS among revisionists stems from the pragmatism inherent in DS epistemology as a means to drive more relevant research and to foster new forms of knowledge and knowledge discovery. The result is a reimagined OS epistemology that includes multiple forms of knowledge and knowledge discovery that enhance OS theory and research.
Leavitt and colleagues (2021) offer a relatively conservative revisionist perspective grounded in the notion of local theory. Local theory is focused on specific, contextualized problems, so that the resultant insights are not generalizable. As an example, Leavitt and colleagues (2021) suggest that a study that predicts the popularity of applications in Apple's App Store using analytical models might not generalize beyond Apple's ecosystem, but has the potential to provide valuable insights for the company's stakeholders. They go on to suggest that studies building local theories using analytical models can benefit OS theory and research by increasing precision in theory development. For example, with respect to Apple's App Store, identifying patterns in the popularity of applications by developer and domain is useful in gaining a better understanding of usage and users. Those insights can then be used in theory development with respect to product development and adoption. Importantly, while Leavitt and colleagues (2021) acknowledge the pragmatism associated with local theory, they do propose a linkage to the development of traditional OS theory.
Others have broadened the scope of the proposed integration in their conceptualization of a hybridized OS epistemology. Specifically, Oswald and colleagues (2020) state that: "To be clear, there certainly remains a firm place for deductive research.... But if IOP and HRM researchers and practitioners keep themselves situated within the research paradigm of NHST [null hypothesis significance testing; added by author] and confirmatory modeling, then we will rarely avail ourselves of the multidisciplinary engagement in big data opportunities and challenges, such as those raised above, that may also contribute to advancing both science and practice in organizations." (p. 517).
This statement is significant because it raises the prospect that OS will be left behind if DS (both epistemology and methodology) is not integrated into OS, a view also offered by McAbee and colleagues (2017). It is also significant because knowledge discovery is reshaped such that the search for patterns in data without preconceived notions of what might emerge is legitimized. Thus, from a revisionist perspective, OS is comprised of theoretical and analytical models so that theory driven research and empirically driven research coexist within one discipline.
To summarize, integration occurs when DS and OS epistemologies collide and become intermingled. Variation in the level of integration is a function of the extent to which the empiricism and relativism in data science epistemology influence theory, knowledge, and knowledge discovery in OS.
Bifurcation
There is risk in adopting DS models and methods in OS research. Those risks revolve around differences in values, accepted practices and desired outcomes between OS scholars who accept DS models, methods and epistemology, and those who do not. Thus, one possible outcome is a schism that results in the bifurcation of OS. Although one might be inclined to dismiss this outcome as very unlikely, there are internal and external forces that can result in a bifurcated OS. Internal forces stem from the widely held belief that OS theory and research is deficient and in need of new ideas and new directions (Kim et al., 2018). Put simply, some OS scholars see the need for change, and proponents of DS have argued that it can address the perceived stagnation and lack of relevance in OS research (Leavitt et al., 2021; McAbee et al., 2017; Tonidandel et al., 2018).
Turning to external forces, there is a growing body of research by data scientists using analytical models and opportunistic data sources in OS domain areas. As this research is published in outlets that are unfamiliar to OS scholars, most OS scholars are not aware of studies across a wide range of topics such as decision-making (Gou et al., 2021), turnover (Yuan, 2021), absenteeism (Lawrance et al., 2021) and work attitudes (Rustam et al., 2021). This body of research is significant because OS scholars who break from accepted practices in OS will have scholarly outlets in which to publish their research.
Studies building and testing analytical models are also beginning to appear in the OS literature. For example, a recent turnover study closely mirrors the design and interpretation of the analytical model of turnover described earlier in this paper. Specifically, El-Rayes et al. (2020) used a dataset taken from glassdoor.com to compare and refine multiple analytical models of employee turnover using pattern recognition algorithms. Variables were selected opportunistically based on the contents of the glassdoor.com website, and the focus of the research was on identifying the analytical models with the most predictive efficacy and the most predictive variables within those models. As such, the data, study objectives, analyses and interpretation of results aligned closely with studies of turnover conducted by data scientists (cf., Raza et al., 2022). Although they are not yet common, it is noteworthy that there are instances where analytical models have replaced theoretical models in established OS domains with no apparent connection to theory and accumulated research results.
Bifurcation results when the collision of DS and OS epistemologies generates enough energy to reach the escape velocity necessary to create a schism in OS. In considering this possibility, it is unlikely that any one factor will lead to bifurcation, but when these factors are considered in concert, it is an outcome that is not so easily dismissed. At this point, the absence of a conversation among organization scientists about the possibility of bifurcation is more worrisome than the outcome itself, because a bifurcated discipline might be mostly in place before we realize what has taken place.
Implications
Although a consensus about how and where DS fits into OS has yet to emerge, there is general agreement that DS cannot be ignored. The strong focus on methodology in prior work has directed the conversation toward the role of new data sources and new methods as tools for enhancing OS theory and research. Shifting the focus to epistemology changes that conversation from enhancing to potentially redefining our science. In so doing, the epistemological narrative raises important questions about our values, accepted practices and desired outcomes, issues that have not received much attention in the methodological narrative.
The implications of DS epistemology for OS theory and research fall into two areas. The first area, epistemological pluralism, crystallizes long-standing arguments about the limits of scientific realism in OS research. The second area, the relative value attributed to explanation and prediction, follows from epistemological pluralism and has implications for how research is conducted and interpreted, for the identity of organization scientists, and for the identity of the discipline.
Epistemological pluralism
The adoption of DS methods has led to cracks in the hegemony of scientific realism that defines OS epistemology. That is, the case for incorporating DS models and methods into theory and research guided by OS epistemology includes an (often implicit) acceptance of DS epistemology, resulting in a new, potentially disruptive addition to the epistemological pluralism that is already present in the discipline. Interest in DS and DS epistemology thus revisits questions raised in the organization studies literature about whether scientific realism is an adequate epistemology for studying organizational phenomena (cf., Cunliffe, 2022).
Although the paradigm that guides OS research has not yet shifted, it might be cracking. A paradigm shift that alters an established epistemology is not a simple matter; many questions remain to be answered and many issues to be resolved. Most center on the value and place of analytical models within OS, as analytical models are the cornerstone of DS epistemology. Specifically, if a modified OS epistemology is to include both analytical and theoretical models, clarity is needed about how different forms of knowledge and different methods of knowledge discovery will coexist within a unified discipline.
Given the complexity of this issue, it is not surprising that there is confusion among those advocating for integrating DS and OS. For example, Leavitt and colleagues (2021) emphasize the value of DS in theory building, yet propose local theories built with analytical models as a separate and valuable type of knowledge. Tonidandel and colleagues (2018) make a similar argument, while Oswald and colleagues (2020) argue for two distinct and separate types of knowledge that are placed on an equal footing.
Sorting out the nature of the integration of DS and OS requires agreement on values and accepted practices (Longino, 2002). To date, the rules of the road (i.e., accepted practices) are unclear, so a nascent pluralism in OS driven by DS epistemology remains ill-defined. For example, at present there are no mechanisms for distinguishing the problems and domain areas that are suitable for analytical models from those that are suitable for theoretical models. Consequently, OS scholars can presumably use analytical and theoretical models as they see fit. This is very likely to produce disparate and contradictory findings among studies guided by analytical models, because studies using localized and opportunistic data sources with indeterminate methodologies are unlikely to converge. Moreover, those results are not likely to converge with those of studies based on theoretical models. It is not clear how such contradictions are to be resolved, especially because studies building analytical models and studies testing theoretical models operate on different timelines (Oswald et al., 2020). Thus, one is left to wonder how a discipline in which theoretical research spanning several years contradicts the most recent findings from analytical models would become stronger, more vibrant and more relevant.
It is also important to be mindful that data scientists themselves see DS as a disruptive, unpredictable force. The disruptive capabilities of DS stem from rapid developments in the power of the pattern recognition algorithms that drive analytical models. Preliminary studies using artificial intelligence (AI) in human resources management research had weak theoretical grounding (Pan & Froese, 2023), indicating a possible weakening of the role of theory in OS research. Given that AI is at an early stage in its development, embracing DS methods and the associated epistemology appears to carry risks that are not fully understood. As such, when Tonidandel and colleagues (2018) point to the value of DS in improving "our" science, it is also important to be mindful that integrating DS and OS involves adopting key elements of "their" science, and that this decision comes with risks that need to be evaluated.
Explanation and prediction
Incorporating a local perspective into OS research in which analytical models coexist with or eclipse theory has significant implications for the work and the identities of OS scholars as well as for the identity of the discipline. Analytical models are, by definition, empirically driven, so their value is defined in terms of predictive efficacy, even if the basis for that efficacy is not clear (Vallverdu, 2009). Given that the search for patterns is not equivalent to the search for truth as currently defined in OS, research driven by the goal of maximizing predictive accuracy changes the nature of the work and the identity of the scholars who adopt this mindset. Indeed, while data scientists have embraced their identity as algorithm "tuners" (Lowrie, 2017), it is not clear that this orientation aligns well with the goals and objectives of organization scientists.
With respect to the perceived value and identity of OS, it has been noted that important problems require deeper levels of understanding grounded in theory, and that such understanding is not gained quickly or easily (cf., Weick, 1989). A greater focus on empiricism and predictive efficacy raises questions about whether OS has shifted its emphasis away from difficult problems toward more circumscribed problems that generate immediate results. A commitment to this type of research raises the question of whether we are defining data or data are defining us. As such, if DS and OS are integrated, it is important to ensure that the next wave of criticism of OS is not that we have taken the easy way out by pursuing "unexplainable" variance at the expense of more meaningful research.
Summary
A science grounded in theory-based explanation does not preclude the adoption of elements of DS epistemology and methodology. Thus, a hybridized, pluralistic epistemology that enhances OS theory and research is achievable, but there is much to be resolved before this outcome can be realized. With respect to epistemological pluralism, it is necessary to begin by building a consensus in OS about where and how DS epistemology and methods fit into OS theory and research. In particular, the current conversation about DS methodology must advance to include OS epistemology so that there is widespread agreement about values, accepted practices and quality standards (cf., Longino, 2002). With regard to the role of analytical models in OS, consensus must be reached about whether they serve as a precursor to formal theory development in the form of protoscience (cf., Bunge, 1967a), as an engine for discovering new but less generalizable knowledge (Leavitt et al., 2021), or as a new form of knowledge that expands the scope of OS research (Oswald et al., 2020).
These conversations will necessarily occur at a time of significant change in DS. Given the flux and risks associated with these changes, the interaction between DS and OS epistemologies was characterized as a collision rather than as an ordered process. To get a clearer picture of how DS might change OS research, the marginalization, integration and bifurcation scenarios are presented as a preliminary framework for mapping the potential influence of DS on our discipline and our science (see Table 1). This summary is intended to assist in continuing the conversation about our values, accepted practices, expectations and desired outcomes as we chart the future direction of OS.
Table 1. Comparison of scenarios for data science and organization science.
Conclusion
The mindset associated with the adoption of DS in OS appears to be following a trajectory similar to that associated with the adoption of the Internet in business. At the early stages of the development of the Internet, marginalization and controlled integration were seen as the only possible outcomes with respect to its influence on commercial activity. The possibility of transformative disruption was met with either derision or disbelief (Levine et al., 1999). Yet the unimaginable came to pass in a comparatively short time. We are at a similar early stage with respect to DS and OS. Consequently, OS scholars would do well to pay more attention to DS epistemology and its influence on our science and our identity.
Author note
There are no conflicts of interest to disclose.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
