Restricted Racial Realism: Heterogeneous Effects and the Instability of Race

Abstract

This paper challenges the view that race is a reliable scientific variable or kind for the purpose of inductive inference within the social sciences. I characterize stability in terms of Extended Conditional Independence (ECI) and show that the heterogeneity and instability of racial categories across different background circumstances undermine their ability to support robust inductive inference and their explanatory power. I claim this, in turn, undermines racial categories' status as real scientific variables or kinds. Race has local stability within restricted sets of target systems, and thus its reality is limited to those domains. I argue for a restricted form of racial realism, a view I call Restricted Racial Realism (RRR).

Keywords

social kinds philosophy statistics race causal inference

1. Introduction

In the philosophy of race, much of the literature has argued that race is a real social kind (Haslanger 2008, 2012; Jeffers 2013; Mills 1998; Root 2000; Sundstrom 2002; Taylor 2004). Often implicit in these views is the belief that race as a social kind has stability sufficient to support reliable inference in the social sciences. Further, this literature contains minimal discussion of how strong our epistemic reasons or scientific support are for believing race is a real social kind. Few have posed the following question: Based on our best scientific knowledge, particularly social science findings, how real is race as a social kind? This paper challenges the view that race is a real social scientific kind and fills in the epistemic conceptual gap by arguing for a restricted form of racial realism with respect to race as a kind. I call this view Restricted Racial Realism (RRR). The main thesis of this paper is that race, as a kind, fails to meet important epistemic criteria for being considered real. Namely, it fails to be robustly stable. From the framework of scientific metaphysics (Ladyman 2012), we have good reasons to think a scientific variable or kind is real just if it figures into or supports reliable scientific generalizations. Stability is, roughly, the property of a relationship (causal or probabilistic) that holds under some range of different background conditions, and it comes in degrees. It is an important characteristic that serves as the basis of what it means to generalize scientifically. This is because a kind or scientific variable possesses stability only if its core properties remain invariant across a wide range of different background conditions. This property, in turn, underwrites the scientific kind’s explanatory power (Woodward 2010). Because stability comes in degrees, what we have good epistemic reasons to believe is real based on scientific generalization may also come in gradations.
More generally, justificatory belief states come in degrees, and science is no different. As a result, metaphysical reality that is informed by scientific epistemology is not all or nothing. We can have kinds that we have good reasons to believe are locally real but lack sufficient scientific generalization capacities to believe in them having a global reality or global realness. This is because these kinds do not have stability over a wide range of different background conditions and thus lack robust scientific generalizability. I claim that race as a kind lacks a large degree of stability needed for it to be considered as an epistemically valuable kind for robust scientific generalizations, even in the social sciences, where the category often has the most scientific uptake as a legitimate category to use for inductive purposes. I acknowledge that while race may appear stable within certain restricted regimes defined by a scientific research program’s (SRP) specific scientific epistemic goals and aims, it does not have the stability needed to be considered robustly real. I call this type of stability border crossing or global stability, which is stability across a wide range of regimes.

This picture implies a restricted form of realism with respect to race. This paper is organized as follows. Section 2 introduces the argument against racial realism. Section 3 argues that real kinds must be stable. Section 4 defends the status of race as a putative kind. Section 5 argues for the instability of race through examples of heterogeneous effects and the difficulty race has in maintaining stability across different regimes.

2. The Argument Against Racial Realism

In this section, I will provide an argument schema for my argument against racial realism. It is stated as follows:

Premise 1: For all putative kinds x, if x is real, then x is stable.

Premise 2: Race is a putative kind.

Premise 3: Race is not stable.

Conclusion: Therefore, race is not real.
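The logical form of this schema can be checked mechanically. The following Lean sketch is offered only as an illustration of the argument's validity (universal instantiation followed by modus tollens), not of the truth of its premises. The names Kind, Real, Stable, and race are hypothetical labels introduced here; Premise 2 is encoded simply by typing race as a member of Kind.

```lean
-- Illustrative sketch: Kind, Real, Stable, and race are assumed labels,
-- not notation taken from the paper.
theorem race_not_real
    (Kind : Type) (Real Stable : Kind → Prop)
    (race : Kind)                           -- Premise 2: race is a putative kind
    (p1 : ∀ x : Kind, Real x → Stable x)    -- Premise 1: real kinds are stable
    (p3 : ¬ Stable race)                    -- Premise 3: race is not stable
    : ¬ Real race :=                        -- Conclusion: race is not real
  fun hReal => p3 (p1 race hReal)
```

Since the schema is valid, the philosophical work lies in defending Premises 1 and 3, which is the task of the sections that follow.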

3. Real Kinds Are Stable

The premise that “for all putative kinds x, if x is real, then x is stable” asserts that stability is a necessary condition for the reality of any putative kind. To explain this premise, I will draw on a notion of stability, as characterized by Dawid (2021). Dawid (2021) defines stability as a probabilistic relationship that holds across different regimes. Extended Conditional Independence (ECI), developed by Dawid (2021), provides a mathematical characterization of stability.

Dawid (2021) formally characterizes stability roughly as follows: consider two random variables, X, representing an intervention, and Y, representing the outcome. Further, consider a regime indicator F_X, a non-stochastic variable that indexes different distributions corresponding to different contexts under which a system is observed. Regime indicators are different from random variables in that they are fixed and do not vary randomly. Regime indicators function similarly to statistical parameters and index the different contexts under which a data distribution is generated. For example, the regime indicator F_X can describe three distinct data-generating distributions: the observational regime F_X = ∅, where data is collected without external intervention, representing the system’s natural state; the regime F_X = 0, representing an external intervention that sets X = 0, allowing the observation of this specific intervention’s effects on Y; and the regime F_X = 1, in which X is set to 1, representing a different intervention and its effects on Y. With this formalism, stability can be expressed with ECI as Y ⫫ F_X | X. This means Y is conditionally independent of the regime indicator F_X given X. That is, once X is known, the distribution of Y remains the same across all regimes. The relationship between X and Y is unaffected by the specific regime, showing the stability of Y across different regimes. Said differently, if the conditional distribution of Y given X remains consistent across all regimes indexed by F_X, then Y is stable across the regimes represented by F_X.
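To make the condition Y ⫫ F_X | X concrete, the following Python sketch compares the conditional probability P(Y = 1 | X = x) across the three regimes described above. It is a minimal illustration under stated assumptions: the probabilities are invented, the regime labels and the function name eci_holds are hypothetical, and exact agreement up to a small numerical tolerance stands in for the statistical testing that real data would require.

```python
from itertools import combinations

# Hypothetical values of P(Y = 1 | X = x) in each regime (not real data).
# Under the regime that sets X = 0 only X = 0 occurs, and likewise for X = 1.
STABLE = {
    "observational": {0: 0.20, 1: 0.60},
    "set_x_0": {0: 0.20},
    "set_x_1": {1: 0.60},
}
UNSTABLE = {
    "observational": {0: 0.20, 1: 0.60},
    "set_x_0": {0: 0.20},
    "set_x_1": {1: 0.35},  # differs from the observational regime
}


def eci_holds(cond_by_regime, tol=1e-9):
    """Return True if P(Y = 1 | X = x) agrees across every pair of regimes
    in which the value x can occur, up to the tolerance tol."""
    for (_, a), (_, b) in combinations(cond_by_regime.items(), 2):
        for x in a.keys() & b.keys():
            if abs(a[x] - b[x]) > tol:
                return False
    return True


print(eci_holds(STABLE))    # True
print(eci_holds(UNSTABLE))  # False
```

In the second case the distribution of Y given X = 1 depends on which regime generated the data, so knowing X is not enough to predict Y and the ECI condition fails.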

Woodward (2010) provides a related concept of stability but explicitly focuses on causal relationships. For Woodward (2010, 291-292), a causal relationship (or relationship of counterfactual dependence) is stable if it is invariant under some large range of different background conditions determined to be important. The regime indicator, as mentioned earlier, does not need to refer to interventions. It can simply refer to the different contexts under which a distribution is observed without invoking interventions, making Dawid (2021)’s ECI, which is used to characterize stability, more general than Woodward (2010)’s characterization of stability insofar as it does not require the regime indicator to represent interventions.

It is worth noting that many objections regarding race in causal models focus on the inability to manipulate race as a variable. However, regardless of where one comes down in that debate, since the regime indicator F_X does not need to represent an intervention and can index different observable distributions, the regime indicator may be able to be applied to race without invoking any potential conceptual difficulty around race as a non-manipulable variable (Dawid 2022). According to Dawid (2022), using ECI, it is possible to discuss the stability of probabilistic regimes without requiring that regime indicators represent interventions over non-manipulable variables.

To say more, Dawid (2015, 2021) characterizes modularity as the stability of the conditional relationship Y | X across different regimes indexed by F_X. Specifically, Dawid treats the conditional distribution of Y given X = x as a modular component, one that can transfer from one regime to another without change, essentially remaining invariant. This relationship is expressed as (Y | X = x; F_X = x) ≈ (Y | X = x; F_X = ∅), where X = x with probability 1 in the regime F_X = x. This statement means that the effect of X on Y remains consistent whether F_X indicates an observational or interventional regime. Dawid views this modular interpretation of causality as a pragmatic approach to a complex philosophical concept (Dawid 2015, 284).

Dawid (2010, 71) states that modularity, understood as transferable deterministic relationships between variables, is often regarded as an important feature of causality. However, Dawid points out that scholars have challenged this view of modularity. He explicitly references Cartwright (2007, 75–78), who questions if modularity should be regarded as a fundamental property of causality. Dawid acknowledges these criticisms and does not commit to modularity as a metaphysical necessity for understanding causality. However, he maintains that modularity, particularly as captured through the stability of the conditional relationship Y | X across different regimes indexed by F_X, is sufficiently comprehensive to address most aspects of what can be termed statistical causality (Dawid 2010, 71).

Dawid (2010) explains that while modularity has often been associated with deterministic and interventionist views of causality, it can also be understood through the framework of ECI, where relationships such as Y | X remain stable across different observational regimes indexed by F_X. Dawid suggests that this approach effectively handles complex causal structures, especially in non-manipulable scenarios (Dawid 2010, 71). To support this stance, Dawid (2010, 71) references Russo (2008), who examines causality in terms of structural stability. In later work, Dawid (2022, 299) discusses how non-manipulable causation can be understood through the transferability, transportability, and invariance of probabilistic relationships across different regimes, without necessarily involving interventions. He cites the work of Bühlmann (2020), who looks at invariance, and Pearl and Bareinboim (2011), who examine transportability across different contexts.

Dawid uses this work to motivate his approach to non-manipulable causation using ECI (Dawid 2022, 299). Dawid (2022, 299) provides several examples to illustrate his view, such as clinical tests that yield stable false-positive and false-negative rates regardless of the population to which they are applied, the consistency of Mendelian inheritance patterns across different matings, and the application of Newton’s laws to various physical phenomena such as the motion of the moon and the behavior of the tides. According to Dawid (2022, 299), these examples show how invariances across regimes can be effectively expressed using extended conditional independencies involving regime indicators. However, Dawid does warn that attaching a regime indicator to every domain variable in a problem is inappropriate and may lead to poor modeling decisions. He gives no specific recommendations, though, for best practices regarding how to make these modeling decisions (Dawid 2022, 299).

Nevertheless, I concur with Dawid’s core observation that stability, particularly as described by the conditional relationship Y | X, plays an important role in understanding causality. Further, I claim that this provides a straightforward way to understand and examine the invariance, or lack thereof, of race as a variable. Recall that ECI allows us to express modularity across different regimes indexed by F_X. In this framework, F_X represents different observational contexts (contexts in which data are drawn from a distribution), such as racial groups, while X denotes some variable of interest (e.g., income, education level). The relationship Y | X observed across these regimes reveals how Y depends on X in different contexts. This setup does not involve setting or manipulating X; instead, it examines how Y varies with X across various groups indexed by F_X.

The literature above supports the view that the invariance of probabilistic relationships is at least a very valuable characteristic in understanding causality, even if it is not the full story of causation some philosophers may want. Further, the literature suggests that Dawid’s approach to statistical causality is a sensible, pragmatic position that one may take to assess how probabilistic relationships hold across different contexts. This is because adopting Dawid’s perspective has the added benefit of being able to use ECI to express modularity without invoking specific causal terminology. ECI allows for articulating certain relationships without engaging in metaphysical debates about the meaning of “cause” and may even avoid using contested terms like “cause” altogether. I believe we can extend Dawid’s perspective on non-manipulable causation to debates regarding the supposed non-manipulability of race as a variable and help elucidate much of the conceptual confusion around the subject. This is because, as I have previously mentioned, the regime indicator F_X does not need to relate to interventions on race; instead, it can index different contexts where racial distributions are observed and examine how stable they are across different racial regimes.

Returning to the primary claim of this section, I believe that stability, as articulated, can also allow us to examine the inductive success and, thus, the reality of putative kinds. A paradigmatic example of an unstable kind is phlogiston. Phlogiston is a substance that was at one time claimed to explain combustion. Phlogiston was eventually discarded as a real kind because it could not account for observed phenomena across different contexts, and its inadequacy was made especially evident with the discovery of oxygen since oxygen could provide stable and, thus, robust explanatory power. Phlogiston’s inability to provide consistent relationships across different scientific observations and experiments led to its rejection as a real scientific kind. This example supports the central claim of the first premise, namely that a kind that lacks stability does not possess generalizability. Thus, we have a good reason to pause with respect to thinking race is a real scientific kind. Stability is valuable because it allows the kind’s core characteristics to remain invariant across different regimes, which is important for robust scientific generalizations.

To further illustrate this, consider the case of a paradigmatic scientific kind that is considered real: chemical elements. Chemical elements, such as carbon, oxygen, and hydrogen, possess stable properties across a wide range of different conditions. Irrespective of their contexts, these kinds have the same properties, more or less. These stable characteristics allow scientists to develop reliable generalizations, such as the periodic table and principles of chemical bonding, which are foundational to the field of chemistry. The stability of chemical elements is one of the primary reasons they are considered real kinds within the well-ordered SRP of chemistry.

However, this section’s primary claim raises an important question: How much stability is necessary for a kind to be considered real? Specifically, how much variability or heterogeneity in causal or probabilistic relationships can be allowed before a kind is determined to be unstable and, therefore, unreal? By heterogeneity, I mean a variation in effects or outcomes across different subgroups within a population, where these subgroups (our kinds in this context) are defined by specific covariates or features such as BMI, race, gender, ethnicity, age, or genetic markers. Just as chemical elements or species serve as traditional examples of kinds in the philosophy of science, these covariates categorize individuals into groups.

The concept of stability is closely related to heterogeneity. Recall that stability refers to the consistency of causal or probabilistic relationships within a kind; a stable kind exhibits uniform effects across its members. When heterogeneity increases, meaning the effects vary significantly among the subgroups defined by our covariates, the stability of the kind decreases. If the variability becomes too large, the kind’s capacity for inductive generalizations decreases, and the kind may no longer be a useful category for scientific inference.

For example, consider a new drug to treat hypertension. Clinical trials may show that the drug reliably lowers blood pressure in younger patients but has a diminished effect in older patients because of age-related physiological changes. In this example, age is a covariate defining different kinds within the patient population. The heterogeneity in the drug’s effect based on age indicates that the broader kind, patients taken as a whole, may lack stability. Realizing this, researchers might redefine the kinds more narrowly, such as patients under 50 and patients over 50, to achieve greater stability within each group.
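The effect of narrowing a kind can be illustrated numerically. In the Python sketch below, the patient records are invented for illustration and are not trial data, the helper name summarize is hypothetical, and assigning patients aged exactly 50 to the older group is an arbitrary choice. The sketch compares the spread of the drug's effect in the pooled population with the spread inside each age-defined group.

```python
from statistics import mean, pstdev

# Hypothetical (age, change in systolic blood pressure in mmHg) records.
patients = [
    (34, -18), (41, -20), (47, -19), (45, -17),
    (58, -6), (63, -4), (71, -5), (66, -7),
]


def summarize(changes):
    """Return (mean effect, population standard deviation of the effect)."""
    return mean(changes), pstdev(changes)


pooled = summarize([change for _, change in patients])
under_50 = summarize([change for age, change in patients if age < 50])
over_50 = summarize([change for age, change in patients if age >= 50])

for label, (avg, spread) in [("all patients", pooled),
                             ("under 50", under_50),
                             ("50 and over", over_50)]:
    print(f"{label}: mean effect {avg:.1f} mmHg, spread {spread:.2f} mmHg")
```

With these invented numbers, the spread of the effect is far smaller within each age group than in the pooled population, which is the sense in which the narrower kinds are more stable than the broad one.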

Now consider a probabilistic case where we are interested in the risk of developing type 2 diabetes. The probability varies among individuals based on BMI, race, and lifestyle choices. For example, individuals with a higher BMI may have a greater probability of developing the disease compared to those with a lower BMI. In this instance, BMI defines different kinds with varying probabilities of the outcome. The heterogeneity in risk across these kinds affects the stability of the probabilistic relationships we observe. I believe that the threshold for stability is not universal but context-sensitive. Researchers must consider how much variability is acceptable based on their specific scientific objectives. No a priori universal principle exists for determining how much heterogeneity is acceptable.

It is worth noting that determining how much heterogeneity is acceptable is different from methods that allow one to detect when heterogeneity is present. For example, economists often use significance testing to detect heterogeneity; however, this does not tell them how much heterogeneity to allow. Another example of this observation is Pearl (2015 [2022]), who provides methods to detect heterogeneity; however, these methods do not specify the amount of heterogeneity that would be significant, leaving this question open. In fields like epidemiology and the medical sciences, scientists are often interested in criteria that constitute a minimum clinically relevant difference. For example, if a blood pressure medication causes a −20 mmHg shift in some patients and a +20 mmHg shift in others, this level of heterogeneity is significant and suggests instability. However, a shift of −2 mmHg versus +2 mmHg might be clinically irrelevant and below the precision of measurement, suggesting stability. The point here is that each specific local context of inquiry often determines the threshold for stability in scientific practice, and what differences are considered meaningful is based on the particular goals of scientific inquiry in that local context.
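The dependence of the stability verdict on a context-supplied threshold can be sketched in a few lines of Python. The function name is_stable and the threshold values of 5 mmHg and 3 mmHg are assumptions made purely for illustration; they are not clinical recommendations and do not come from the literature cited here.

```python
def is_stable(subgroup_effects, min_relevant_difference):
    """Count effects as stable when the gap between the largest and smallest
    subgroup effect is below the context-supplied threshold."""
    spread = max(subgroup_effects.values()) - min(subgroup_effects.values())
    return spread < min_relevant_difference


# Subgroup effects in mmHg, taken from the two cases in the text.
large_shift = {"subgroup_a": -20.0, "subgroup_b": 20.0}
small_shift = {"subgroup_a": -2.0, "subgroup_b": 2.0}

# Hypothetical threshold of 5 mmHg for one context of inquiry.
print(is_stable(large_shift, 5.0))  # False: a 40 mmHg gap exceeds the threshold
print(is_stable(small_shift, 5.0))  # True: a 4 mmHg gap falls below it

# A stricter hypothetical context (3 mmHg) reverses the second verdict.
print(is_stable(small_shift, 3.0))  # False
```

The same data are judged stable under one threshold and unstable under another, which is the point of the passage: detecting heterogeneity is a statistical matter, while deciding how much of it to tolerate depends on the aims of the inquiry.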

This context sensitivity is also illustrated by how often different studies set seemingly arbitrary statistical criteria for determining stability. For example, Villar et al. (2014) used the criterion that babies’ sizes should not differ by more than half a standard deviation from the global distribution to conclude that all babies are essentially the same size. However, this choice of 0.5 SD was not based on any inherent property of the data but was rather adopted as a convenient threshold.¹

Determining stability requires consideration of the goals of scientific inquiry. My emphasis on the context dependence of inductive inference is inspired by learning theory, which employs a means-ends analysis. Learning theory asks: for a given empirical problem and a set of cognitive goals, what method best achieves those goals? It is normative and context-sensitive and seeks to provide standards for assessing methods in contexts of inquiry rather than trying to provide universal principles of induction. In learning theory, methodological recommendations depend on research aims, background assumptions, available observational methods, and an agent’s cognitive abilities (Schulte 2002 [2022]). This claim would roughly mean that a scientific variable or kind’s acceptable level of stability will vary based on the specific research inquiries and goals.

For instance, consider a case where medical research aims to identify a treatment that consistently produces some desired effect across various patient populations. Suppose a treatment’s effect varies significantly across populations. In that case, it may fail to meet the necessary stability threshold and be classified as unstable, even if the observed variation is not statistically significant in a broader sense.

Conversely, a higher tolerance for heterogeneity might be acceptable in a research context where variability is anticipated or considered beneficial. As a result, the threshold for stability is determined by the specific goals of the inquiry. The important point here is that stability must be sufficient to achieve the intended epistemic goals within the context of a local SRP. Given the discussion above, the stability threshold necessary for a kind to be considered real is determined by how well the kind supports a scientific community’s coherent and well-motivated research aims. Here, Spencer (2016) can provide valuable conceptual resources to help us understand this threshold. Spencer argues that for a kind to be genuine, it must be stable enough to achieve the epistemic goals of the SRP within which it is used.

A genuine kind, according to Spencer, is valid within a well-ordered scientific research program (SRP) (Spencer 2016, 165). A genuine kind is “epistemically useful” and “justified” within that well-ordered SRP. The epistemic usefulness of a kind within an SRP is determined by its ability to underpin scientific generalizations, which can manifest as observational laws, theoretical generalizations, or presuppositions (Spencer 2016, 166). Drawing on Nelson Goodman’s work (Goodman 1955 [1983]), Spencer characterizes observational laws as generalizations about specific observations that are “lawlike” due to their relevant counterfactual stability. Stability here refers to the generalization’s robustness under various counterfactual conditions.

Theoretical generalizations form part or whole of a theory and provide a framework for predicting future observations or phenomena (Spencer 2016, 166). Presuppositions are the underlying assumptions or principles on which the SRP operates. They form the SRP’s conceptual foundation and provide a basic framework (Spencer 2016, 166). Whether a kind is suitable for use in an SRP depends on whether it aligns with the SRP’s aims and epistemic values (for relativized a priori kinds) or whether it can be used to “explain or predict observational laws in the SRP” (for properly empirical kinds) (Spencer 2016, 167).

Spencer further explains that a well-ordered SRP must possess three important characteristics to ensure long-term scientific progress: coherent and well-motivated aims, competitive predictive power, and rigorous cross-checks (Spencer 2016, 168). The characteristic of coherent and well-motivated aims states that the aims of the SRP should guide the research direction and provide a benchmark for evaluating progress. The characteristic of competitive predictive power asserts that the SRP must make accurate and reliable predictions to advance scientific knowledge. The characteristic of rigorous cross-checks involves getting consistent results using different assumptions, such as changing experimenters, methods, or theoretical frameworks.

Although Spencer does not explicitly state that stability determines degrees of kindhood or “realness,” his framework has the conceptual resources for this purpose and is well suited to it. As has been articulated, stability comes in degrees. This claim means the extent to which a kind supports scientific generalizations across different contexts varies in degrees. This fact allows for an empirical focus on the distinction between social and so-called “natural” kinds without relying on the notion of mind-independence.

Consider the paradigmatic genuine kinds of chemical elements. Their properties, such as atomic number and chemical reactivity, remain invariant across different experimental conditions, supporting observational laws like the periodic table. The stability of these properties under various conditions allows chemical elements to be categories that allow inductions about them to be generalized across multiple regimes. This ability affords scientists accurate predictions and understanding of chemical phenomena, which in turn supports chemical elements’ status as genuine kinds.

In contrast, consider economic markets. While markets may exhibit stable relationships, such as supply and demand laws, within certain target systems, these patterns can vary significantly across different economic conditions or policies. The variation of these patterns under different target systems and background conditions means that markets, as a kind, have limited inductive capacities, which suggests that markets are locally stable and may only warrant restricted realism. This observation means that markets possess some degree of stability, but their stability is limited to a restricted set of target systems and thus does not have the same level of invariance as chemical kinds. The main takeaway is that the reality of a kind is tied to its stability and ability to underpin robust scientific generalizations. It is worth noting that I am only imposing a necessary condition on real kinds, not a sufficient one.

3.1. Local and Global Kinds

I will now explain how the ECI framework enables the classification of real kinds into two broad categories: local kinds and global kinds (“border-crossing kinds”). Stability is the defining characteristic that distinguishes these categories. Local kinds exhibit stability only within a narrow range of conditions, while global kinds possess stability across a wide range of regimes. For example, the relationship between variables of a local kind may be stable only under a particular target system or set of target systems, within certain geographic locations, or with a limited set of experimental parameters. The ECI framework allows one to examine these relationships by examining the conditional independence between variables and regime indicators.

To illustrate further, consider the earlier example of chemical elements. Defined by their atomic numbers X and chemical properties Y (such as atomic radius, ionization energy, etc.), chemical elements are typically considered what I would call global kinds. The regime indicator F_C may represent various experimental conditions, including temperature, pressure, and other environmental factors. Chemical elements are global kinds because their properties remain invariant across a fairly wide range of different experimental conditions. The conditional independence property Y ⫫ F_C | X holds under most background conditions. This property means that the relationship between atomic numbers and chemical properties is stable irrespective of specific conditions.

However, if we consider markets in economics, we see something different. For example, let X represent the price of a commodity, and Y represent the demand for that commodity. The regime indicator F_M might include various market conditions such as economic policies, consumer preferences, and other external factors. Markets are better understood as local kinds because their stability is more limited in scope. The ECI property Y ⫫ F_M | X may only hold across a smaller number of regimes. This claim would mean that the demand for a commodity, given its price, is independent of market conditions only within a limited set of regimes. For example, market dynamics in a financial target system might display stable patterns only under specific economic policies or within particular timeframes, suggesting markets are local rather than global kinds.

To drive the point home more, consider how researchers in a well-ordered SRP focused on economic systems may try to understand market behavior, mechanisms, and dynamics of markets in different target systems, such as finance, commodities, or real estate. Within these target systems, the kind market may show some type of limited or restricted stability within the target system. This type of local stability suggests that within specific market conditions, markets function according to discernible rules, mechanisms, and behaviors that can be observed and analyzed. For example, short-term price volatility in financial markets and patterns of investor behavior, such as herding or information asymmetry, influence market dynamics. These stable patterns are restricted to some small range of local target systems.

However, these characteristics may not hold across all markets or in different economic target systems. For instance, markets in the real estate domain may exhibit different dynamics, and different factors causing their behavior, compared to financial markets. The stability and behavior of the social kind market within a specific target system are contingent on the specifics of that regime. This variation is clear in how different markets respond to policy interventions, economic shocks, or shifts in consumer sentiment. For example, an abrupt change in interest rates might alter the housing market but have little effect on the commodity market.

The point is that stability determines whether a kind is classified as local or global. Local stability is a stable probabilistic relationship only within a limited range of different background conditions. It is often restricted to things like particular environmental conditions, geographic locations, or experimental parameters. On the other hand, border-crossing stability (or global stability) refers to stable probabilistic relationships across a wide range of conditions and regimes. These two types of stability characterize two types of kinds, global and local. Global kinds are characterized by border-crossing stability, and local kinds by local stability.
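The contrast between local and border-crossing stability can be given a rough quantitative reading: the share of regimes in which the conditional relationship Y | X stays close to its value in a reference regime. The Python sketch below is only an illustration under stated assumptions. The probabilities are invented, the regime labels r0 to r3 and the function name stable_share are hypothetical, and the tolerance of 0.05 is arbitrary, in keeping with the earlier point that thresholds are set by the context of inquiry.

```python
def stable_share(cond_by_regime, reference, tol):
    """Fraction of non-reference regimes in which P(Y = 1 | X = x) stays
    within tol of its value in the reference regime, for every x."""
    ref = cond_by_regime[reference]
    others = [name for name in cond_by_regime if name != reference]
    matching = sum(
        all(abs(cond_by_regime[name][x] - ref[x]) <= tol for x in ref)
        for name in others
    )
    return matching / len(others)


# Hypothetical P(Y = 1 | X = x) under four regimes for two putative kinds.
global_like = {
    "r0": {0: 0.20, 1: 0.60},
    "r1": {0: 0.21, 1: 0.60},
    "r2": {0: 0.20, 1: 0.59},
    "r3": {0: 0.19, 1: 0.61},
}
local_like = {
    "r0": {0: 0.20, 1: 0.60},
    "r1": {0: 0.20, 1: 0.60},
    "r2": {0: 0.45, 1: 0.30},
    "r3": {0: 0.70, 1: 0.10},
}

print(stable_share(global_like, "r0", tol=0.05))  # 1.0
print(stable_share(local_like, "r0", tol=0.05))
```

On this reading, a kind whose share is close to 1 across a wide range of regimes behaves like a global kind, while a kind whose relationship holds in only a few regimes behaves like a local kind. Where the line falls between the two is not fixed by the calculation itself.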

4. Race Is A Kind

Premise 2 asserts that race is a putative kind. This is actually an assumption, but it is defensible and dialectically sound, as it is supported by several scholars within the philosophy of race who affirm that race is a kind. For example, some scholars argue that race is a biological kind and contingently real. These perspectives can be divided into those who see race as a biosocial kind (Kendig 2011; Outlaw 1996) and those who consider race a biological kind (Hardimon 2017; Spencer 2014). Another portion of the philosophy of race literature holds that race is real but is not grounded in biological facts. On these views, race is contingently real but as a social, political, or cultural kind (Alcoff 2006; Haslanger 2008; Jeffers 2013; Mills 1998; Root 2000; Sundstrom 2002; Taylor 2004). Despite their disagreements, these views all hold that race is a kind.

One of the most well-known theories of race in the philosophy of race literature is Sally Haslanger’s Social/Political Race (SPR) account. Her account focuses on the sociopolitically constructed nature of race, particularly social structures and hierarchies. According to Haslanger, racial groups are defined not by biological characteristics but by socially constructed attributes linked to perceived physical features and associated with specific geographical regions. These social constructs position groups within a hierarchy of privilege and subordination, shaping the lived experiences of individuals in important ways (Glasgow et al. 2019, 25-26). Haslanger’s theory claims that geographical associations and perceived physical characteristics form racial categories in the United States and determine how members of racial groups are viewed and treated. The process of racialization places racialized groups within a social hierarchy, where their positions are justified by prevailing ideologies (Glasgow et al. 2019, 26), and these racialized groups constitute “races” according to Haslanger. An underappreciated aspect of Haslanger’s theory is its subtle epistemic reliance on empirical inquiry. She draws a parallel between the discovery of social kinds and natural kinds, suggesting that just as we discover chemical kinds through empirical research, we also discover social kinds through a similar process:

The kinds in question are social because they exist in the social world (and so, in some sense, depend on us). But we discover these kinds through empirical inquiry, just as we discover chemical kinds through empirical inquiry. (Glasgow et al. 2019, 5)

This epistemic commitment grounds Haslanger’s theory, to some extent, in empirical inquiry, particularly the social sciences. It means that the study of race should align with broader scientific practices. Returning to the main claim of this section, “race is a putative kind,” a survey of the philosophy of race literature shows that this is a dialectically sound assumption.

5. Race Is Not Stable

In this section, I defend the claim that “race is not stable.” I will show that race should not be considered a global kind. Essentially, the presence of heterogeneous effects within racial categories indicates a violation of the ECI condition, thereby demonstrating that race lacks the stability required to be considered a global kind. Recall that heterogeneous effects challenge stability. As noted earlier, heterogeneous effects are observed where the impact of X on Y varies across different subgroups or contexts. Such variation indicates that Y is not independent of F_X given X, which violates the ECI condition and demonstrates instability.
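Stated symbolically, the condition and its violation might be written as follows. This is a sketch under one assumption about notation: F_X is read as an index over regimes or background conditions, in the spirit of the decision-theoretic framework of Dawid (2015, 2021), and f and f' denote two arbitrary values of that index.

```latex
% ECI (stability) condition, with F_X read as a regime index (assumption):
\[
Y \perp\!\!\!\perp F_X \mid X
\quad\Longleftrightarrow\quad
P(Y = y \mid X = x,\, F_X = f) \;=\; P(Y = y \mid X = x,\, F_X = f')
\quad \text{for all } x,\, y \text{ and all regimes } f, f'.
\]

% Heterogeneous effects: the condition fails for at least one pair of regimes.
\[
\exists\, x,\, y,\, f \neq f' : \quad
P(Y = y \mid X = x,\, F_X = f) \;\neq\; P(Y = y \mid X = x,\, F_X = f').
\]
```

On this reading, a single pair of regimes in which the conditional distribution of Y given X differs is enough to violate the condition across the full set of regimes, although the condition may still hold within a restricted subset of them.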

For example, heterogeneous effects in medical trials are illustrated by risk models for coronary heart disease (CHD), such as the Framingham Risk Score (FRS), Atherosclerosis Risk in Communities (ARIC), and Reynolds Risk Score (RRS). This is also an example of the reference class problem: different risk models may predict different individual risks and so lead to different treatment recommendations for the same patients (Kent and Shah 2012). A study applying these risk models to a nationally representative population found that only a small fraction of individuals received unanimous high-risk or low-risk scores across all models (Kent and Shah 2012). For at least 60 percent of patients who might receive aspirin, the recommendation depended on which tool was applied, with disproportionately more model agreement among men than women.

I will now consider the scientific aims of different SRPs to examine whether race is locally stable, globally stable, or neither. Recall that stability depends on the epistemic aims of the SRP inquiry. Suppose we have race as a predictive attribute and a variety of prediction tasks for which it might be used: death from disease X, income at age 35, and the highest level of education achieved by age 30. For each prediction task, we consider our predictor (such as race) to be reliable if it achieves an acceptable accuracy level for how we plan to use the prediction. Whether we consider the predictor reliable depends not just on its measured accuracy but also on whether that accuracy meets the standards required for our particular purposes. Once we have set acceptable levels of reliability for each task based on our specific aims, determining whether the predictor is reliable for each task becomes straightforward. We can then compile a list of tasks for which the predictor is reliable. This tabulation can provide significant insights into local and global patterns among the range of tasks and may even suggest causally relevant factors. For example, suppose we have a specific accuracy threshold for predicting mortality from disease X (perhaps a type of cancer). We might discover that race is a reliable predictor of mortality from disease X in certain geographically defined communities but not in others. For another outcome, mortality from disease Y (such as renal failure), the communities where race is a reliable predictor might significantly overlap with those for disease X. These patterns could lead us to investigate whether disease X has environmental causes and whether the connection between race and the disease is mediated by factors like zip code, which might determine proximity to hazardous substances. For disease Y, we might explore whether the link between race and the disease is influenced by misconceptions within the healthcare system about how certain conditions affect different groups.²
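The tabulation described above can be sketched in a few lines. The sketch assumes that each prediction task has its own accuracy threshold and that accuracy has been measured separately in each community. All task names, community labels, accuracy figures, and thresholds are hypothetical placeholders, not empirical results.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical measured accuracy of one predictor, per (task, community).
accuracy: Dict[Tuple[str, str], float] = {
    ("mortality_disease_X", "c1"): 0.82,
    ("mortality_disease_X", "c2"): 0.55,
    ("mortality_disease_X", "c3"): 0.79,
    ("mortality_disease_Y", "c1"): 0.76,
    ("mortality_disease_Y", "c2"): 0.58,
    ("mortality_disease_Y", "c3"): 0.61,
    ("income_at_35", "c1"): 0.52,
    ("income_at_35", "c2"): 0.50,
    ("income_at_35", "c3"): 0.49,
}

# Hypothetical acceptable accuracy per task, set by the aims of the inquiry.
threshold: Dict[str, float] = {
    "mortality_disease_X": 0.75,
    "mortality_disease_Y": 0.70,
    "income_at_35": 0.70,
}


def reliable_communities(
    accuracy: Dict[Tuple[str, str], float],
    threshold: Dict[str, float],
) -> Dict[str, List[str]]:
    """For each task, list the communities where the predictor meets the threshold."""
    table: Dict[str, List[str]] = {task: [] for task in threshold}
    for (task, community), acc in accuracy.items():
        if acc >= threshold[task]:
            table[task].append(community)
    return {task: sorted(communities) for task, communities in table.items()}


def overlap(table: Dict[str, List[str]], task_a: str, task_b: str) -> Set[str]:
    """Communities in which the predictor is reliable for both tasks."""
    return set(table[task_a]) & set(table[task_b])


table = reliable_communities(accuracy, threshold)
shared = overlap(table, "mortality_disease_X", "mortality_disease_Y")
```

With these placeholder numbers, the predictor is reliable for disease X in communities c1 and c3, for disease Y in c1 only, and for income nowhere. Such a table is a starting point for asking what c1 and c3 share, not evidence of any causal factor.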

In some scientific regimes, race may show local stability and be a good predictor of certain social outcomes. The broad goal of studies on racial disparities in healthcare in the United States is often to understand how racial discrimination may affect health outcomes for African Americans. For example, studies have shown that African Americans may have higher rates of hypertension than White Americans because of social determinants such as less access to healthcare, more exposure to chronic stress, and socioeconomic inequality (Abrahamowicz et al. 2023; Carnethon et al. 2017; Forde et al. 2020; Yearby 2018).

However, this picture is complicated because we also see heterogeneity with race in other SRPs, such as medical genetics and epidemiology, where the interest is in the effectiveness of medication and how it varies across racial groups due to genetic differences affecting drug metabolism. A well-known case is the varying response to antihypertensive drugs like ACE inhibitors, which are less effective in African American patients than in Caucasian patients (Peck et al. 2013). Further, the risk of certain diseases, such as sickle cell anemia, differs greatly among racial groups, implying that the probabilistic relationships within the kind are not uniform (Pokhrel et al. 2023).

Heterogeneity also exists within racial categories such as “Black.” The descendants of Africans enslaved in the U.S. often have different social outcomes in the U.S. than African immigrants (Darity and Mullen 2020 [2022]). This difference arises in part because some African immigrants arrive with higher education levels and professional skills. Many can access better education systems in their home countries before migrating, and the education and skills gained there can be leveraged in the U.S. job market. Selective immigration policies, such as the Diversity Visa Lottery, may also favor highly educated and skilled individuals. This demographic advantage results in higher educational attainment and income levels than those of the native U.S. Black population. These factors play a role in explaining why African immigrants sometimes have higher socioeconomic status (SES) than descendants of Africans enslaved in the U.S.

Further, De La Cruz-Viesca et al. (2016) examined racial wealth disparities in Los Angeles. In contrast with the descendants of Africans enslaved in the U.S., Black immigrants, particularly those from Africa and the Caribbean, often have different SES profiles. Consistent with this, Tamir and Anderson (2022) show that 41 percent of African immigrants in the U.S. have at least a bachelor’s degree, surpassing the 30 percent rate for the overall U.S. population. Massey et al. (2007) likewise show that Black immigrants generally have higher academic achievement than African Americans in selective institutions, particularly the Ivy League. However, the aggregate variable “Black,” often used to categorize both African Americans and Black immigrants, can obscure these important differences.

Health research has also shown that African immigrants and Afro-Caribbeans have better outcomes with respect to cardiovascular disease (CVD) than African Americans, despite all three groups being categorized as “Black” (Baptiste et al. 2022). In Baptiste et al. (2022), an analysis of trends in CVD risk factors among Black adults showed considerable heterogeneity among three Black ethnic subgroups compared with White adults. The heterogeneity within Black subgroups may interfere with drawing reliable conclusions about CVD for any individual belonging to these subgroups. The diversity within these racial categories may prevent race from providing the necessary stability even at the local level.

For race, as characterized by any theory such as Haslanger’s, to exhibit border-crossing stability across a wide range of scientific aims, the relationship between race and outcomes must remain stable across different scientific contexts, such as health, education, and socioeconomic studies. However, race does not maintain consistent predictive power across diverse scientific inquiries. For example, the way race functions as a predictor of wealth varies across inquiries, which suggests that race lacks global stability. While race may be a reliable predictor of wealth inequality in the U.S., this relationship cannot be generalized to other national contexts without accounting for the specific historical, social, and political factors that shape dynamics in those regions. While race may possess some local stability, heterogeneity within racial categories often undermines global stability, particularly when the scientific inquiry demands attention to subgroup differences.

6. Conclusion

This paper has argued for RRR by examining the stability of race as a scientific variable. Race fails to meet the epistemic criterion of stability required for it to be considered a scientific kind and thus real. Because of heterogeneous effects and the overall instability of racial classifications, race does not support scientific generalizations across a wide range of different background conditions. However, I claim that racial categories may possess stability and inductive capabilities within a restricted set of target systems of interest.

Acknowledgements

I am deeply grateful to the many participants who commented on an earlier version of this paper at the Philosophy of Social Science Roundtable. I am particularly indebted to Jonathan Y. Tsou, Kareem Khalifa, Yosef Washington, and Quayshawn Spencer.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Correction (January 2025):

This article has been updated with minor grammatical corrections since its original publication and has been updated to remove one of the footnotes.

Author Biography

Alexander Williams Tolbert is an Assistant Professor in the Department of Quantitative Theory and Methods at Emory University. He specializes in the philosophy of science, applied ethics (particularly AI ethics), and quantitative methods, including causal inference, machine learning, and decision science. Dr. Tolbert earned his Ph.D. in Philosophy and his M.A. in Statistics from the University of Pennsylvania. His research integrates quantitative models, data analysis (including machine learning and causal inference), and philosophical analysis to reason about social issues.

References

Abrahamowicz, Aleksandra A., Joseph Ebinger, Seamus P. Whelton, Yvonne Commodore-Mensah, and Eugene Yang. 2023. “Racial and Ethnic Disparities in Hypertension: Barriers and Opportunities to Improve Blood Pressure Control.” Current Cardiology Reports 25 (1): 17-27.

Alcoff, Linda Martín. 2006. Visible Identities: Race, Gender, and the Self. Oxford University Press.

Baptiste, Diana-Lyn, Ruth-Alma Turkson-Ocran, Oluwabunmi Ogungbe, et al. 2022. “Heterogeneity in Cardiovascular Disease Risk Factor Prevalence Among White, African American, African Immigrant, and Afro-Caribbean Adults: Insights from the 2010-2018 National Health Interview Survey.” Journal of the American Heart Association 11 (18): e025235.

Bühlmann, Peter. 2020. “Invariance, Causality and Robustness.” Statistical Science 35 (3): 404-426.

Carnethon, Mercedes R., Jia Pu, George Howard, et al. 2017. “Cardiovascular Health in African Americans: A Scientific Statement from the American Heart Association.” Circulation 136 (21): e393-e423.

Cartwright, Nancy. 2007. Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge University Press.

Darity, William A., Jr., and A. Kirsten Mullen. 2020 (2022). From Here to Equality: Reparations for Black Americans in the Twenty-First Century. 2nd ed. University of North Carolina Press.

Dawid, A. Philip. 2010. “Beware of the DAG!” In Proceedings of Workshop on Causality: Objectives and Assessment, edited by Isabelle Guyon, Dominik Janzing, and Bernhard Schölkopf, 59-86. ML Research Press.

Dawid, A. Philip. 2015. “Statistical Causality from a Decision-Theoretic Perspective.” Annual Review of Statistics and Its Application 2: 273-303.

Dawid, Philip. 2021. “Decision-Theoretic Foundations for Statistical Causality.” Journal of Causal Inference 9 (1): 39-77.

Dawid, Philip. 2022. “Decision-Theoretic Foundations for Statistical Causality: Response to Pearl.” Journal of Causal Inference 10 (1): 296-299.

De La Cruz-Viesca, Melany, Zhenxiang Chen, Paul M. Ong, Darrick Hamilton, and William A. Darity Jr. 2016. The Color of Wealth in Los Angeles. A Joint Publication of Duke University, The New School, University of California, Los Angeles, and the Insight Center for Community Economic Development. https://www.aasc.ucla.edu/besol/Color_of_Wealth_Report.pdf

Forde, Allana T., Mario Sims, Paul Muntner, et al. 2020. “Discrimination and Hypertension Risk Among African Americans in the Jackson Heart Study.” Hypertension 76 (3): 715-723.

Glasgow, Joshua, Sally Haslanger, Chike Jeffers, and Quayshawn Spencer. 2019. What Is Race? Four Philosophical Views. Oxford University Press.

Goodman, Nelson. 1955 (1983). Fact, Fiction, and Forecast. 4th ed. Harvard University Press.

Hardimon, Michael O. 2017. Rethinking Race: The Case for Deflationary Realism. Harvard University Press.

Haslanger, Sally. 2008. “A Social Constructionist Analysis of Race.” In Revisiting Race in a Genomic Age, edited by Barbara A. Koenig, Sandra Soo-Jin Lee, and Sarah S. Richardson, 56-69. Rutgers University Press.

Haslanger, Sally. 2012. Resisting Reality: Social Construction and Social Critique. Oxford University Press.

Jeffers, Chike. 2013. “The Cultural Theory of Race: Yet Another Look at Du Bois’s ‘The Conservation of Races.’” Ethics 123 (3): 403-426.

Kendig, Catherine. 2011. “Race as a Physiosocial Phenomenon.” History and Philosophy of the Life Sciences 33 (2): 191-221.

Kent, David M., and Nilay D. Shah. 2012. “Risk Models and Patient-Centered Evidence: Should Physicians Expect One Right Answer?” JAMA 307 (15): 1585-1586.

Ladyman, James. 2012. “Science, Metaphysics and Method.” Philosophical Studies 160 (1): 31-51.

Massey, Douglas S., Margarita Mooney, Kimberly C. Torres, and Camille Z. Charles. 2007. “Black Immigrants and Black Natives Attending Selective Colleges and Universities in the United States.” American Journal of Education 113 (2): 243-271.

Mills, Charles W. 1998. Blackness Visible: Essays on Philosophy and Race. Cornell University Press.

Outlaw, Lucius. 1996. “‘Conserve’ Races? In Defense of W. E. B. Du Bois.” In W. E. B. Du Bois on Race and Culture: Philosophy, Politics, and Poetics, edited by Bernard W. Bell, Emily Grosholz, and James B. Stewart, 15-37. Routledge.

Pearl, Judea. 2015 (2022). “Detecting Latent Heterogeneity.” In Probabilistic and Causal Inference: The Works of Judea Pearl, edited by Hector Geffner, Rina Dechter, and Joseph Y. Halpern, 483-506. Association for Computing Machinery.

Pearl, Judea, and Elias Bareinboim. 2011. “Transportability of Causal and Statistical Relations: A Formal Approach.” In Proceedings of the 25th AAAI Conference on Artificial Intelligence, 247-254. AAAI Press.

Peck, Robert N., Luke R. Smart, Rita Beier, Anthony C. Liwa, Heiner Grosskurth, Daniel W. Fitzgerald, and Bernhard M. W. Schmidt. 2013. “Difference in Blood Pressure Response to Ace-Inhibitor Monotherapy between Black and White Adults with Arterial Hypertension: A Meta-Analysis of 13 Clinical Trials.” BMC Nephrology 14: 201.

Pokhrel, Akriti, Adeniran Olayemi, Stephanie Ogbonda, Kiron Nair, and Jen Chin Wang. 2023. “Racial and Ethnic Differences in Sickle Cell Disease within the United States: From Demographics to Outcomes.” European Journal of Haematology 110 (5): 554-563.

Root, Michael. 2000. “How We Divide the World.” Philosophy of Science 67: S628-S639.

Russo, Federica. 2008. Causality and Causal Modelling in the Social Sciences: Measuring Variations. Springer.

Schulte, Oliver. 2002 (2022). “Formal Learning Theory.” In Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta and Uri Nodelman. https://plato.stanford.edu/entries/learning-formal/

Spencer, Quayshawn. 2014. “A Radical Solution to the Race Problem.” Philosophy of Science 81 (5): 1025-1038.

Spencer, Quayshawn. 2016. “Genuine Kinds and Scientific Reality.” In Natural Kinds and Classification in Scientific Practice, edited by Catherine Kendig, 157-172. Routledge.

Sundstrom, Ronald R. 2002. “Race as a Human Kind.” Philosophy and Social Criticism 28 (1): 91-115.

Tamir, Christine, and Monica Anderson. 2022. “One-in-Ten Black People Living in the U.S. Are Immigrants: Immigrants – Particularly Those from African Nations – Are a Growing Share of the U.S. Black Population.” Pew Research Center. 20 January. https://www.pewresearch.org/race-and-ethnicity/2022/01/20/one-in-ten-black-people-living-in-the-u-s-are-immigrants/

Taylor, Paul C. 2004. Race: A Philosophical Introduction. Polity Press.

Tsou, Jonathan Y. 2020. “Social Construction, HPC Kinds, and the Projectability of Human Categories.” Philosophy of the Social Sciences 50 (2): 115-137.

Tsou, Jonathan Y. 2022. “Biological Essentialism, Projectable Human Kinds, and Psychiatric Classification.” Philosophy of Science 89 (5): 1155-1165.

Villar, José, Leila Cheikh Ismail, Cesar G. Victora, et al. 2014. “International Standards for Newborn Weight, Length, and Head Circumference by Gestational Age and Sex: The Newborn Cross-Sectional Study of the INTERGROWTH-21st Project.” Lancet 384 (9946): 857-868.

Woodward, James. 2010. “Causation in Biology: Stability, Specificity, and the Choice of Levels of Explanation.” Biology & Philosophy 25 (3): 287-318.

Yearby, Ruqaiijah. 2018. “Racial Disparities in Health Status and Access to Healthcare: The Continuation of Inequality in the United States Due to Structural Racism.” American Journal of Economics and Sociology 77 (3-4): 1113-1152.