Abstract
Are corporate social responsibility (CSR) initiatives providing the societal good that they promise? After decades of CSR studies, we do not have an answer. In this review, we analyze the progression of the CSR literature toward assessing the performance of CSR initiatives, identify factors that have limited the literature’s progress, and suggest a new approach to the study of CSR that can overcome these limits. We begin with comprehensive bibliometric mapping illustrating that although social impact has infrequently been its explicit focus, the CSR literature has measured outcomes other than firm performance, especially in the current decade. Thereafter, we conduct a more fine-grained analysis of recent CSR studies. Adapting a logic model framework, we show that even the most highly cited studies have stopped short of assessing social impact, often measuring CSR activities rather than impacts and focusing on benefits to specific stakeholders rather than to wider society. In combination, our analyses suggest that assessment of the performance of CSR initiatives has been driven by the availability of large, public secondary data sources. However, creating more such databases and turning to “big data” analyses are inadequate solutions. Drawing from the impact evaluation literature of development economics, we argue that the CSR field should reconceive itself as a science of design in which researchers formulate CSR initiatives that seek to achieve specific social and environmental objectives. In accordance with this pursuit, CSR researchers should move toward “small data” research designs, which will enable studies to better determine causation rather than just identify correlation.
The road to hell is paved with good intentions. Corporate involvement in addressing targeted [social] problems is no guarantee of improvement . . . we need to understand the conditions under which a corporation’s efforts benefit society.
Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.
Corporate social responsibility (CSR) entails the “policies and practices of corporations that reflect business responsibility for some of the wider societal good” (Matten & Moon, 2008: 405). Firms commonly engage in a myriad of CSR initiatives that promise a variety of societal benefits. For example, Cisco (2019) states that through CSR, “We empower social change agents with technology and expertise. Our goal: Accelerate global problem solving to benefit people, society, and the planet.” Prudential (2019) claims that through its corporate giving program, “We’re invested in creating long-term partnerships that strengthen communities, help tackle social challenges and solve complex problems.” Through its Sustainable Living Plan, Unilever (2019) intends to improve health and well-being for more than 1 billion people, reduce environmental impact by half, and enhance livelihoods for millions.
Are CSR initiatives providing the societal good that they promise? With corporations filling institutional voids and better positioned than governments to address many social problems (Besley & Ghatak, 2007), it is important to understand how well these initiatives fulfill their aims. Moreover, given that the resources firms allocate to CSR are limited and societal needs are not, it is essential to understand how these initiatives can be more effective, so that greater societal good can be generated from them.
In this review, we determine what is known about the performance of CSR initiatives, and we develop a path forward to improve CSR performance. We first illustrate patterns and changes in how CSR performance has been addressed over time. Thereafter, we delve into recent studies to identify the forefront of CSR performance. From these analyses, we determine that although the literature has advanced in specific ways, it continues to provide inadequate insight into the effectiveness of CSR initiatives, and we uncover factors that have stalled its progression. We then put forth a revised research agenda, based in design science, that can overcome these limitations, enabling scholars to determine how effectively CSR initiatives provide their promised benefits and identifying ways to make these initiatives more effective.
We begin by visually mapping the 6,254 articles addressing CSR performance published in the past 50 years. As expected, the analysis shows that studies have overwhelmingly focused on firm financial performance throughout. However, it also shows that scholars have not ignored social performance. Especially in the years since Margolis and Walsh (2003), Blowfield (2007), Wood (2010), and others decried the field’s focus on financial performance as the primary measure of the effectiveness of CSR initiatives, the literature has attended to various CSR performance outcomes. Studies have addressed environmental outcomes, such as changes in toxic (Li & Zhou, 2017) and carbon emissions (Wright & Nyberg, 2017); green innovations (Lampikoski, Westerlund, Rajala, & Moller, 2014); social innovations (Mithani, 2017); and human resource outcomes, such as diversity and gender equality (Nie, Lamsa, & Pucetaite, 2018).
Thereafter, we analyze the specific CSR and social performance measures used in recent studies. Drawn from development studies, logic models (Weiss, 1972) portray the steps that link good intentions to realized outcomes and are widely used to evaluate the performance of governmental and nonprofit initiatives (Ebrahim & Rangan, 2014; Wry & Haugh, 2018). We create a CSR logic model and categorize the dependent variables used in the most-cited CSR performance studies over the past decade according to their placement within it. We find that nearly all of the papers fit into the early stages of the CSR logic model, assessing activities or immediate outputs. A handful of papers assessed intermediate outcomes, but none of these studies demonstrated that CSR initiatives achieved their ultimate intended impact.
Overall, our findings indicate that although the massive and ever-growing CSR literature has progressed beyond its long-standing focus on firm financial performance, it has stalled short of providing adequate insight into how effectively CSR initiatives fulfill their promises to society. Two trends in the literature, exploiting a growing number of publicly available secondary databases that provide aggregate measures of social performance and turning to “big data” analyses, have proven inadequate to move the literature beyond this problematic state.
To chart a path forward, we look beyond the management literature, to the impact evaluation literature of development economics (Banerjee & Duflo, 2009; Duflo, Glennerster, & Kremer, 2007; Khandker, Koolwal, & Samad, 2010). An “evaluation revolution” has enabled the field of development economics to assess and improve the impact of many governmental and nonprofit initiatives. “This ‘evaluation revolution’ has made it possible to measure whether a given program or policy works, thereby turning the spotlight on the question of how to go about designing programs that are likely to work” (Datta & Mullainathan, 2014: 8).
Developing field-tested knowledge and solutions (R. King, 2012; Rousseau, 2012) based on rigorous impact evaluation is at the root of what Herbert Simon called “the science of design,” in which scholars turn “existing situations into preferred ones” (Simon, 1988: 67). Accordingly, we argue that the CSR field should reconceive itself as a science of design in which researchers formulate initiatives that seek to achieve specific social and environmental objectives. In doing so, CSR researchers will move toward “small data” research designs that enable studies to better determine causation rather than just identify correlation. This revised approach can make the tremendous resources continuously devoted to CSR scholarship more effective, so that the good intentions of business can better be realized by society.
The Big Picture: Mapping CSR Performance Over Time
Skeptical of government, people are increasingly turning to corporations to address social problems (Stewart, 2018). Whether calling on Domino’s Pizza to fix potholes (see pavingforpizza.com) or Facebook to stabilize democracy (Ohme, 2018), many are asking firms to ameliorate more and more of what ails society, big and small (Eggers & Macmillan, 2013). According to BlackRock CEO Larry Fink’s (2018) much-publicized letter to CEOs, public expectations of corporate responsiveness to social problems have never been greater.
Given fiscal pressures and the rise of political ideologies based on market liberalization, many governments have ceded to firms the responsibility for addressing a myriad of social problems (Haufler, 2013). As the Business Roundtable’s (2019) Statement on the Purpose of a Corporation openly declared, many leading firms have embraced social responsibility. Heeding the popular advice of prominent business scholars, managers view CSR initiatives as a way to soften attacks on the “capitalist system [that] is under siege” while “creating shared value” that benefits both firms and society (Porter & Kramer, 2011: 4), and as strategic investments in stakeholder relations (Freeman, 1984). Thus, firms typically take on a portfolio of CSR initiatives. Overall, the scale and scope of CSR are staggering. Annual spending by Fortune 500 firms on corporate philanthropy alone exceeds $15 billion; combined with time and money spent on other CSR initiatives, the total corporate investment is “countless” (Davidson, Dey, & Smith, 2019).
How effective is all of this CSR? Since CSR initiatives may serve as a substitute for government in providing a variety of critical social services (Gond, Kang, & Moon, 2011), it is important to assess their performance. Yet the literature that has exploded over the past several decades has shown little interest in key aspects of CSR performance. Scholars have shown that CSR can provide relational, reputational, and financial benefits to the firms undertaking it (Fombrun, Gardberg, & Barnett, 2000) but have seldom sought to validate the presumed benefits to society. The thousands of studies that have exhaustively sought to explain if (e.g., Orlitzky, Schmidt, & Rynes, 2003), how (e.g., Peloza & Shang, 2011), and when (e.g., Grewatsch & Kleindienst, 2017) it “pays to be good” have done so without giving due heed to how much “good” has actually been produced.
As Margolis and Walsh (2003: 289) lamented, “Although the financial effects of corporate social performance have been extensively studied, little is known about any other consequences of corporate social initiatives.” Blowfield (2007: 683) later asserted that “we know most about CSR’s impact on business itself and the benefits for business, and least about how CSR affects the major societal issues it was intended to tackle.” More recently, Wood (2010: 76), who, like Margolis and Walsh (2003), focused on the closely related concept of corporate social performance (CSP), argued that social impact continues to be overlooked: “The whole idea of CSP is to discern and assess the impacts of business-society relationships. Now it is time to shift the focus away from how CSP affects the firm, and towards how the firm’s CSP affects stakeholders and society.”
Yet, not every study has completely ignored the social good produced by CSR initiatives. The CSR literature began about 50 years ago (Wood, 2010), and “the theme of improving society was certainly in the mind of early theorists and practitioners” (Carroll & Shabana, 2010: 91). Though several reviews have rightfully criticized the well-established literature’s predominant focus on firm financial performance, they have not closely analyzed what the studies that looked beyond financial performance have revealed over time. Moreover, years have passed since many of these calls to look beyond financial performance were made, and in that time, thousands more studies have been published. In the face of rising public expectations and mounting scholarly calls for change, the literature may have advanced in assessing CSR performance.
Given that the concept of CSR itself has gone through many iterations over the past several decades (Carroll, Lipartito, Post, Werhane, & Goodpaster, 2012), it seems likely that the literature’s approach to CSR performance has also evolved over this time. To develop a comprehensive picture of this vast and dynamic field, we therefore begin with a broad analysis of performance measures used in the CSR literature over time. Prior CSR reviews have examined a short time period (e.g., Mellahi, Frynas, Sun, & Siegel, 2016) or focused on a limited range of journals (e.g., Aguinis & Glavas, 2012). We do not bound the time frame or similarly constrain the set of journals in our review.
To analyze a literature of this size, we use VOSviewer, a bibliographic mapping program that creates item co-occurrence maps by text mining article titles and abstracts (Van Eck & Waltman, 2018). Bibliometric cartography visualizes the intellectual structure of a discipline through maps of its conceptual field, which allows us to see changes in how concepts have been used over time (Ding, Chowdhury, & Foo, 2001: 819; Linnenluecke, Marrone, & Singh, 2020). In this methodology, co-occurrence refers to counting how often a pair of items appears together within a collection unit, here an article’s title and abstract. When two items co-occur, this suggests an association between them. A single pairing indicates a tenuous or possibly spurious association, but the association strengthens as the pairing recurs across more articles. Refer to the appendix for further description of VOSviewer’s methodology.
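The counting idea behind a co-occurrence map can be sketched in a few lines. This is not VOSviewer’s implementation (see the appendix and Van Eck & Waltman, 2018); it is a minimal illustration that assumes items have already been extracted from each title and abstract and that a pair counts at most once per article (binary counting). The toy corpus and the function name `cooccurrence_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for every pair of items, the number of documents containing both.

    Each document is an iterable of items (e.g., terms from one title and abstract).
    Binary counting: a pair is counted at most once per document.
    """
    counts = Counter()
    for doc in documents:
        items = sorted(set(doc))  # deduplicate and fix a canonical pair order
        for pair in combinations(items, 2):
            counts[pair] += 1
    return counts

# Hypothetical toy corpus: each set stands for the terms found in one article.
docs = [
    {"csr", "financial performance", "stakeholder"},
    {"csr", "financial performance"},
    {"csr", "financial performance", "employee"},
    {"csr", "poverty"},
]
counts = cooccurrence_counts(docs)
print(counts[("csr", "financial performance")])  # recurring pairing across articles
print(counts[("csr", "poverty")])  # single pairing: tenuous or possibly spurious
```

The first pairing recurs in three articles and the second appears only once, which mirrors the distinction drawn in the text between a strengthening association and a tenuous one.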
We searched the fields of business, management, ethics, economics, and environmental studies from 1968 to 2018 for articles containing the keywords “CSR,” “corporate social responsibility,” or “corporate social-responsibility” and “performance.” We chose performance as a keyword in order to find articles addressing any type of CSR performance, including social and environmental, as well as financial. Our objective was to be inclusive at this stage. We took articles from the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) databases from the Web of Science Core Collection on October 16, 2018. The search returned 6,254 articles with 76,925 items. Figure 1 displays a count of articles published each year. Note that no published articles met our criteria until 1973. The CSR–performance field started coalescing in the 1990s, with 65 articles published in that decade. In the 2000s, 831 CSR–performance articles were published. A staggering 5,314 CSR–performance articles have been published thus far in the 2010s.
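The keyword logic of the search, followed by a tally by decade, can be sketched as follows. This is a hypothetical approximation applied to local records, not Web of Science query syntax or the exact query the review used; the record format, the regular expressions, and the function names are assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for the stated keyword logic:
# ("CSR" OR "corporate social responsibility" OR "corporate social-responsibility")
# AND "performance". Not Web of Science syntax.
CSR_PATTERN = re.compile(r"\bcsr\b|corporate social[ -]responsibility", re.IGNORECASE)
PERF_PATTERN = re.compile(r"\bperformance\b", re.IGNORECASE)

def matches_query(text):
    """True if the text contains a CSR keyword and the keyword "performance"."""
    return bool(CSR_PATTERN.search(text)) and bool(PERF_PATTERN.search(text))

def decade(year):
    """Label a publication year by its decade, e.g., 1973 -> "1970s"."""
    return f"{year // 10 * 10}s"

def tally_by_decade(records):
    """records: iterable of (year, title_and_abstract) tuples."""
    return Counter(decade(year) for year, text in records if matches_query(text))

# Hypothetical records for illustration only.
records = [
    (1973, "Corporate social responsibility and firm performance"),
    (1995, "CSR and employee attitudes"),  # lacks "performance": excluded
    (2005, "Corporate social-responsibility performance in supply chains"),
    (2012, "CSR performance and carbon emissions"),
    (2015, "Does CSR improve social performance?"),
]
print(tally_by_decade(records))
```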

Figure 1. Corporate Social Responsibility–Performance Articles Published in Past 50 Years (6,254 Articles)
Figure 2 provides a bibliographic text map of the full sample (76,925 items). Items are the objects of interest, which in our case are text data taken from the titles and abstracts of the 6,254 articles. Relationships between pairs of items are indicated by links. For example, the item “responsibility” is linked to a host of other items, including “norm,” “ethic,” “justice,” and “governance.” The strength of a link is the number of publications in which the two items occur together; the higher the strength, the stronger the link. Items are clustered by the software (see appendix), which reports the number of items in each cluster and the link strength of every item. The clustering process identifies important themes in the research field. Visually, the larger an item’s label and circle, the more important the item is. Clusters located close to each other on the map indicate closely related subfields. Clusters are distinguished by color, and each item’s circle is displayed in the color of its cluster. Thus, Figures 2 to 5 must be viewed in color.
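As a hedged illustration of link strength, the sketch below takes a link’s strength to be the number of publications in which two items co-occur, as described above, and computes each item’s total link strength as the sum of the strengths of its links (our reading of the attribute VOSviewer reports under that name). The link values are hypothetical, and VOSviewer’s layout and clustering steps, described in the appendix, are not reproduced.

```python
from collections import defaultdict

def total_link_strength(pair_counts):
    """Sum the strengths of all links attached to each item.

    pair_counts maps (item_a, item_b) -> number of publications in which
    the two items co-occur (the link strength).
    """
    totals = defaultdict(int)
    for (a, b), strength in pair_counts.items():
        totals[a] += strength  # each link contributes to both of its endpoints
        totals[b] += strength
    return dict(totals)

# Hypothetical link strengths among four items.
links = {
    ("responsibility", "governance"): 5,
    ("responsibility", "ethic"): 3,
    ("responsibility", "norm"): 2,
    ("ethic", "norm"): 1,
}
totals = total_link_strength(links)
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking[0])  # the most strongly linked item in this toy network
```

In this toy network, “responsibility” has the largest total link strength, which corresponds to the more prominent label and circle an item of that kind receives on a map.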

Figure 2. Cluster Analysis of Corporate Social Responsibility–Performance, 1968 to 2018 (6,254 articles)

Figure 3. Cluster Analysis of Corporate Social Responsibility–Performance, 1990s (65 articles)

Figure 4. Cluster Analysis of Corporate Social Responsibility–Performance, 2000s (831 articles)

Figure 5. Cluster Analysis of Corporate Social Responsibility–Performance, 2010s (5,314 articles)
At the root of the map is the item responsibility. To whom is the firm responsible? Four overall clusters containing 356 items give us insight into how the CSR literature has tended to answer this core question. The red cluster (132 items), which we label “Ethics & Governance,” includes responsibility to a host of societal actors, including the supply chain, community leaders, nongovernmental organizations (NGOs), the environment, entrepreneurs, and the social values and norms of the region (e.g., Van Bockstael, 2018). The “Ethics & Governance” cluster reflects a firm’s responsibility to secondary stakeholders (Clarkson, 1995). The pink cluster (74 items), “Responsible Marketing,” is another dominant theme, with items relating to customer satisfaction, perceptions, intentions, and loyalty (e.g., Habel, Schons, Alavi, & Wieseke, 2016). This cluster reflects responsibility to the customer. The orange cluster (58 items), “Responsible HR,” reflecting responsibility to employees, has leadership, job satisfaction, personal values, and human resource management relating to CSR as areas of focus (e.g., Ali & Jung, 2017). Thus, the CSR literature has indeed been concerned with responsibility beyond the firm’s own doorstep throughout its lengthy history. However, the primary item of the “Results” cluster (blue; 92 items), financial performance (e.g., Delmas, Etzion, & Nairn-Birch, 2013), indicates that this broad concern for responsibility has not tended to translate into the assessment of CSR performance, which has instead focused on the bottom line, reflecting responsibility to the shareholder.
Figures 3 to 5 divide the vast CSR–performance literature by decade, enabling comparison across the three decades since the literature began to grow in the 1990s. Because each cluster lists its items, we can systematically track items to determine whether earlier items persist and how new items enter the literature.
Figure 3, drawn from 65 articles published in the 1990s, shows 23 items grouped into three clusters. The first cluster (red; nine items), “Ethics & Governance,” includes items relating to the manager, business ethics, regulation, stakeholders, and interests (e.g., Smith & Hasnas, 1999). The second cluster (green; eight items), which we denote as “CSR Antecedents,” includes items such as relationships, attitude, and community (e.g., Besser, 1999). The third cluster (blue; six items), Results, includes performance, R&D, change, and impact (e.g., Poesche, 1998). The largest item, CSR, is linked to every other item except attention.
Figure 4, depicting the cluster map of the 831 articles published in the 2000s, shows an explosion of items, with 103 items grouped into five clusters. The Ethics & Governance cluster (red; 29 items) has grown to include such items as governance, legitimacy, business ethics, stakeholders, and corporate responsibility (e.g., Choi & Jung, 2008), with the largest item in the cluster being responsibility. The CSR Antecedents cluster (green) has also expanded (23 items) and includes items such as ability, commitment, community, and competitive advantage (e.g., Tracey, Phillips, & Haugh, 2005), with benefit being the largest item. The third cluster (pink; 20 items) introduces “Responsible Marketing” (e.g., Mao, Luo, & Jain, 2009) and includes items relating to customer, marketing, dialogue, and quality, with customer being the largest item. The fourth cluster (yellow; 20 items) introduces another subfield, “Environmental & International CSR” (e.g., Etzion, 2007), and includes such items as sustainable development, multinational enterprise (MNE), supplier, and adoption. The largest item in this cluster is pressure. The Results cluster (blue; 11 items) includes such items as CSP, financial performance, investment, profit, and return. Relative to the 1990s (Figure 3), this cluster has expanded to include socially responsible investment and environmental performance (e.g., Renneboog, Ter Horst, & Zhang, 2008). During this period, the literature focused on why firms undertake CSR (CSR Antecedents) and the targets of these CSR activities (Ethics & Governance, Environmental & International CSR, and Responsible Marketing). However, the main focus of results continued to be economic or financial performance.
The 2010s have seen a remarkable 5,314 CSR–performance articles published through mid-2018. Figure 5 depicts the cluster map, with a total of 321 items divided into four clusters. The Ethics & Governance cluster (red; 153 items) embodies many of the items found in Figure 4 but includes new items, such as poverty, education, mining, climate change, and partnership, indicating that the literature is now delving more deeply into the social aspects of CSR (e.g., Berrone, Gelabert, Massa-Saluzzo, & Rousseau, 2016). It is also in this cluster that we find interest in expanding responsibility to include environmental and social impacts (e.g., Nguyen, Boruff, & Tonts, 2018). Issues of social impact appear to have come from the literature on extractive and agricultural industries, where communities and NGOs are demanding that firms address the social impact of their operations (e.g., Esteves, 2008a, 2008b; Hofmann, Schleper, & Blome, 2018).
The Responsible Marketing cluster (pink; 74 items) continued to expand, including cause-related marketing, emotion, and loyalty (e.g., Pritchard & Wilson, 2018). A cluster emerged (orange; 16 items) that we denote Responsible HR. This emerging cluster entails the role of employees and human resource management in addressing CSR challenges (e.g., Carnahan, Kryscynski, & Olson, 2017). The Results cluster (blue; 78 items) contains the same items mentioned previously (e.g., Shabana, Buchholtz, & Carroll, 2017).
Table 1 summarizes the differences across time. Comparing Figure 4 to Figure 5, we see two clusters disappear: CSR Antecedents and Environmental & International CSR. This suggests that CSR research has begun to ask not just why firms undertake CSR but also how, when, and whether such activities have any impact from a conceptual (e.g., Wickert & De Bakker, 2018), quantitative (e.g., Shu & Wong, 2018), or qualitative (e.g., Reinecke & Ansari, 2015) perspective. Items related to costs and benefits and competitive advantage have been replaced with moral imperatives, justice, human rights, and reputation, absorbing the CSR Antecedents into the Ethics & Governance cluster. Items addressing environmental and international CSR are now bundled in the Ethics & Governance cluster, suggesting that environmental and international CSR issues are now being examined from an ethical or governance perspective.
Table 1. Comparison of the CSR–Performance Literature Over Time
Note. CSR = corporate social responsibility; EPA TRI = Environmental Protection Agency Toxics Release Inventory; ESG = environmental, social, and governance; HR = human resources.
Our longitudinal comparative analysis of text maps suggests that the availability of large databases has driven the spread of specific CSR performance measures. For example, the appearance of the Environmental & International CSR cluster in the 2000s map coincides with the diffusion of the KLD database (see https://www.msci.com/msci-kld-400-social-index) and the U.S. Environmental Protection Agency’s Toxics Release Inventory database (see https://www.epa.gov/toxics-release-inventory-tri-program/history-toxics-release-inventory-tri-program-list). Many studies merged these social and environmental data with existing financial performance data from large databases, such as Compustat (available since 1962). The 2000s also saw an explosion of worldwide environmental, social, and governance (ESG) databases, such as Asset4, CSRHub, the European Union’s Environment Agency, Bloomberg ESG data service, Vigeo-Eiris, and others. A recent review of the corporate philanthropy literature found a paucity of studies on the impact of corporate philanthropy on society because that literature concentrates on areas where data are readily available (Liket & Simaens, 2015). Likewise, it appears that large secondary databases attract CSR scholars, thereby limiting work in areas that lack such data.
With broad evidence that the CSR literature has made some progress, we next take a closer look at the specific social performance measures used in recent studies. In the next section, we analyze the dependent variables used in high-impact CSR studies over the past 10 years to determine the current forefront in assessing the effectiveness of CSR initiatives.
A Closer Look at Recent High-Impact CSR Studies
Narrowing focus to a 10-year window enables a finer-grained approach than was possible in the multidecade analysis of the prior section but still leaves much ground to cover. From 2008 to 2018, 5,788 CSR–performance papers were published. For the sake of feasibility, we focus our analysis on the most-cited articles during this period. The Web of Science Core Collection employs a citation-based evaluation tool, InCites, to determine which papers are most influential in a given field. Highly cited papers are the top 1% in each of the 22 Essential Science Indicators subject areas per year. As of July/August 2018, 106 papers received enough citations to place in the top 1% and so be considered Web of Science Highly Cited Papers for the business, management, ethics, environmental studies, and economics fields and publication year. Of these, we removed review papers (e.g., Bromley & Powell, 2012), papers focused on consumer perceptions or products (e.g., Nuttavuthisit & Thøgersen, 2017), and conceptual papers specifically focused on defining or criticizing CSR activities (e.g., Du, Bhattacharya, & Sen, 2010). Omitting these 37 papers left 69 to analyze.
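Clarivate determines Highly Cited Paper status with its own citation thresholds in Essential Science Indicators, which are not reproduced here. The sketch below is only a simplified, hypothetical illustration of selecting the top 1% of papers within each subject-area and publication-year group, followed by the screening arithmetic reported above. The field labels, record keys, and tie-breaking rule are assumptions.

```python
from collections import defaultdict

def highly_cited(papers, top_percent=1):
    """Return ids of the most-cited top_percent of papers within each (field, year) group.

    Simplified, hypothetical rule: keep the ceil(n * top_percent / 100) most-cited
    papers per group; ties at the cutoff keep input order (sorting is stable).
    """
    groups = defaultdict(list)
    for paper in papers:
        groups[(paper["field"], paper["year"])].append(paper)
    selected = []
    for group in groups.values():
        k = (len(group) * top_percent + 99) // 100  # integer ceiling, avoids float error
        ranked = sorted(group, key=lambda p: p["citations"], reverse=True)
        selected.extend(p["id"] for p in ranked[:k])
    return selected

# Hypothetical data: 200 papers in one field-year, 50 in another.
papers = [{"id": f"a{i}", "field": "field_a", "year": 2015, "citations": i} for i in range(200)]
papers += [{"id": f"b{i}", "field": "field_b", "year": 2016, "citations": i} for i in range(50)]
print(highly_cited(papers))

# Screening step reported in the text: 106 highly cited papers, 37 removed.
highly_cited_total, excluded = 106, 37
analyzed = highly_cited_total - excluded
print(analyzed)
```

Selecting within each field-year group, rather than across the whole pool, keeps older papers and citation-heavy fields from crowding out the rest, which is the rationale for a per-field, per-year cutoff.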
To structure our analysis of these 69 highly cited papers, we adapt the logic model, a tool rooted in the development studies literature. In the 1960s, the U.S. Agency for International Development developed the logic model to map and evaluate complex aspects of aid interventions (Ebrahim & Rangan, 2014; Rogers, 2008; Weiss, 1972). The logic model illustrates that inputs feed into activities that result in immediate outputs, leading to medium- to long-term outcomes that have impacts on communities and ecosystems. This model has been adapted to examine the social performance of nonprofit organizations, philanthropy, and social enterprises (Ebrahim & Rangan, 2014).
Figure 6 presents our logic model, as adapted to analyze CSR impact. It contains four categories: (a) CSR activities; (b) immediate outputs from CSR activities, such as tax deductions, number of beneficiaries served, emissions, or financial performance; (c) outcomes, defined as the change in the output, such as reduced emissions, improved work environment, or improved quality of life; and (d) impacts, defined as the change in the output that is caused by the CSR activity (Khandker et al., 2010). Each of the authors read, assessed, and categorized each paper according to its fit within this model, based on the dependent variable used. The few discrepancies were discussed and resolved in a group meeting.
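The text reports that each author coded every paper and that the few discrepancies were resolved in a meeting, but it does not report an agreement statistic. The sketch below shows one hypothetical way to flag papers needing discussion and to compute simple percent agreement; the paper identifiers, the three-coder setup, and the function names are illustrative assumptions. Chance-corrected statistics such as Fleiss’ kappa are a common alternative to raw percent agreement.

```python
# Stage labels of the CSR logic model (Figure 6), indexed 1 to 4.
STAGES = {1: "activity", 2: "output", 3: "outcome", 4: "impact"}

def discrepancies(codings):
    """Return papers whose coders assigned different stages.

    codings maps a paper identifier to the list of stages (1-4) assigned by
    each coder, based on the paper's dependent variable.
    """
    flagged = {}
    for paper, stages in codings.items():
        if any(s not in STAGES for s in stages):
            raise ValueError(f"{paper}: unknown stage in {stages}")
        if len(set(stages)) > 1:
            flagged[paper] = sorted({STAGES[s] for s in stages})
    return flagged

def percent_agreement(codings):
    """Share of papers on which all coders assigned the same stage."""
    agreed = sum(1 for stages in codings.values() if len(set(stages)) == 1)
    return agreed / len(codings)

# Hypothetical codings by three coders; identifiers are placeholders.
codings = {
    "paper_a": [1, 1, 1],
    "paper_b": [2, 2, 2],
    "paper_c": [2, 3, 2],
    "paper_d": [3, 3, 3],
}
print(discrepancies(codings))
print(percent_agreement(codings))
```

Here only `paper_c`, coded as output by two coders and outcome by one, would go to the group meeting.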

Figure 6. A Corporate Social Responsibility Logic Model
Table 2 lists each of the 69 highly cited papers within their respective categories. Category 1 studies analyzed CSR activity as the dependent variable. Papers looking at the impact of CEO political ideology (e.g., Chin, Hambrick, & Trevino, 2013), ecological responsiveness (e.g., Hamann, Smith, Tashman, & Marshall, 2017), greenwashing (e.g., Delmas & Burbano, 2011), CSR ratings (e.g., Bear, Rahman, & Post, 2010), and political CSR (e.g., Scherer, Rasche, Palazzo, & Spicer, 2016) on CSR activities are included in this category. Of the 69 papers, 30 (43%) examined factors influencing CSR activities.
Table 2. Highly Cited CSR Articles by Category, 2008 to 2018
Note. References available from authors upon request.
Category 2 comprises the 35 papers (51%) that had a result or output of a CSR activity as the dependent variable. Of these, 80% had financial performance as the dependent variable (e.g., Flammer, 2015). That is, the focus was on the business case for CSR rather than on the social output of the activity. In the seven papers that did not focus on financial performance, food waste (Devin & Richards, 2018), emissions (Chatterji, Levine, & Toffel, 2009), employee engagement (Flammer & Luo, 2017), societal goals (Benabou & Tirole, 2010), stakeholder relations (Bhattacharya, Korschun, & Sen, 2009), and environmental innovations (e.g., Kesidou & Demirel, 2012; Varadarajan, 2017) were the dependent variables.
Moving to Category 3, only four papers addressed the outcome of CSR activities. Jamali, Lund-Thomsen, and Khara (2017) examined whether CSR initiatives among small and medium-sized enterprises (SMEs) in an industrial cluster in India served the intended purpose of improving the quality of life of marginalized workers. Drawing on fieldwork in the Jalandhar football manufacturing cluster, they interviewed marginalized workers to understand working conditions from their perspective. Jamali et al. (2017: 480) found that “local SME manufacturers have circumvented local labor laws and outsourced the most labor-intensive aspects of football manufacturing—the stitching of footballs—to home-based locations,” which led to deteriorating social outcomes for the stitchers.
Kitzmueller and Shimshack (2012) provide an economic perspective on CSR and its outcomes. The standard economic argument is that the market’s invisible hand allows consumers and corporations to pursue their self-interests and that government intervenes to address market failures resulting from externalities by redistributing income and wealth via the tax system and regulations. Unfortunately, governments can fail to provide ample oversight, creating a void wherein citizens and corporations voluntarily undertake CSR activities beyond their legal and contractual obligations (Benabou & Tirole, 2010). Kitzmueller and Shimshack (2012) argue that more research is needed to determine the welfare properties of CSR. More specifically, the questions of who, what, when, and how to measure changes in welfare remain to be addressed. McWilliams and Siegel (2011), drawing from economics, suggest that hedonic pricing and contingent valuation techniques are methods managers can use to measure the value of a CSR action to society. Firms can capture the value of providing such social goods through gains to reputation (Fombrun & Shanley, 1990). Finally, Devika, Jafarian, and Nourbakhsh (2014) propose a supply chain network design in which the focus is not solely on minimizing total costs or maximizing profit but includes environmental and social impacts. Here the social and ecological dimensions are quantified and embedded as distinct objectives along with total costs. This requires researchers to correctly model and provide initial values for the system being examined and to subjectively account for trade-offs between objectives.
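Why trade-offs between cost, environmental, and social objectives must be settled subjectively can be shown with a small example. The sketch below uses weighted-sum scalarization, one common way of combining objectives; it is not the solution method of Devika et al. (2014). The three network designs, their objective values, and the weights are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Design:
    name: str
    cost: float       # total cost (lower is better)
    emissions: float  # environmental burden (lower is better)
    social: float     # social benefit, e.g., jobs created (higher is better)

def normalize(values, higher_is_better=False):
    """Rescale to [0, 1] where 0 is the best value and 1 the worst."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    if higher_is_better:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]

def best_design(designs, weights):
    """Pick the design minimizing a weighted sum of normalized objectives."""
    w_cost, w_env, w_soc = weights
    cost = normalize([d.cost for d in designs])
    env = normalize([d.emissions for d in designs])
    soc = normalize([d.social for d in designs], higher_is_better=True)
    scores = [w_cost * c + w_env * e + w_soc * s for c, e, s in zip(cost, env, soc)]
    return designs[scores.index(min(scores))].name

# Hypothetical candidate network designs.
designs = [
    Design("centralized", cost=100, emissions=80, social=10),
    Design("regional", cost=120, emissions=50, social=40),
    Design("local", cost=150, emissions=30, social=60),
]
for weights in [(1.0, 0.0, 0.0), (0.5, 0.25, 0.25), (0.2, 0.4, 0.4)]:
    print(weights, best_design(designs, weights))
```

With identical data, weighting cost alone selects the centralized design, balanced weights select the regional design, and emphasizing environmental and social objectives selects the local design. The choice of weights, not the model, decides which design looks best.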
Concerning the impact of CSR activities, the pipeline goes dry; no papers fit Category 4. Thus, these studies focused almost entirely on CSR activities and their immediate outputs, the latter often drawn from secondary data sources that take account of relative levels of corporate spending toward various social issues, with only a handful addressing outcomes. This is consistent with Ebrahim and Rangan’s (2014: 123) broader insight that “outcome measurement is less common and more difficult to do given that organizations have the most control over their immediate activities and outputs, whereas outcomes are often moderated by events beyond their organizational boundaries.”
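The category shares reported for the 69 highly cited papers follow directly from the counts (30, 35, 4, and 0), as the arithmetic sketch below confirms. The list of codings is constructed only to reproduce those totals and does not identify individual papers; the function name is hypothetical.

```python
from collections import Counter

# Logic model stages in causal order (activity -> output -> outcome -> impact).
STAGE_ORDER = ["activity", "output", "outcome", "impact"]

def stage_shares(codings):
    """Count papers per stage and express each count as a rounded percentage."""
    counts = Counter(codings)
    total = len(codings)
    return {stage: (counts[stage], round(100 * counts[stage] / total)) for stage in STAGE_ORDER}

# Constructed to match the reported totals for the 69 highly cited papers.
codings = ["activity"] * 30 + ["output"] * 35 + ["outcome"] * 4
shares = stage_shares(codings)
print(shares)

# Within Category 2: 80% of 35 papers used financial performance,
# leaving seven papers with nonfinancial dependent variables.
financial = 35 * 80 // 100
print(financial, 35 - financial)
```

The tally makes the skew explicit: about 94% of the papers sit at the activity or output stage, and the impact stage is empty.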
As a check on this lack of impact measures among highly cited papers, we performed an additional analysis of all 5,314 CSR–performance articles published in the past decade, using “social impact” as the keyword. The search returned 24 papers. Two could not be obtained because they were published in obscure journals. Of the remaining 22 papers, only 3 actually evaluated and measured the social impact of the CSR activity, and all 3 are very recent. Sinha and Chaudhari (2018) examined the impact of a CSR program that develops special classes for weaker students. The study includes a pre- and post-test but no control group, so it cannot determine what would have happened to students in the absence of the initiative. Loosemore and Bridgeman (2018) examined employee volunteering in schools. Like the Sinha and Chaudhari (2018) study, it evaluates students before, during, and after participation in the program but also lacks a control group. Both studies allow the analyst to determine whether students attained educational goals but not to infer that the programs caused those gains. Luo, Kaul, and Seo (2018) examined the impact of corporate philanthropy on oil spills. In a rigorously designed study that uses panel data to infer causality with confidence, they theorize about adverse selection and moral hazard and find that philanthropy increases oil spills, the opposite of the intended result. Thus, after an extensive analysis of the thousands of studies populating the CSR–performance literature, we find that the literature has advanced, but we do not find a single study that adequately demonstrates that CSR initiatives resolved the social problems they intended to address.
Designing a New Approach to CSR Studies
The analyses we have presented show that although the literature has advanced along the framework depicted in Figure 6, it has stalled far short of offering substantive insights into the degree to which CSR initiatives achieve their intended impact. We have gone beyond prior reviews through uniquely comprehensive and current analyses that map the literature’s progression and show the specific ways that it has advanced. In this section, we go yet further by building on our prior analyses to explain how to move the literature past its long-standing limitations.
To push the CSR literature firmly into impact assessment, we return to the development literature, from which we drew the logic model. Impact evaluation is a central concern in the development literature, given the interest of national governments and international development organizations, such as the World Bank, in the effectiveness of development initiatives (Banerjee & Duflo, 2009; Duflo et al., 2007; Khandker et al., 2010; Ravallion, 2008). Vast sums of money are spent on development initiatives, and taxpayers demand accountability. As Khandker et al. (2010: 4) explain, “The main question of impact evaluation is one of attribution—isolating the effect of the program from other factors and potential selection bias.” Similarly, in the CSR literature, we seek to make attributions about the effects of CSR initiatives (e.g., Wood, 2010). A number of studies have noted the parallels and complementarities between the CSR and development literatures (Oetzel & Doh, 2009; Sagebien & Whellams, 2010). Several authors have examined CSR through the development lens (Blowfield, 2005; Newell & Frynas, 2007). There is even a debate regarding whether CSR is good or bad for development (Sagebien & Whellams, 2010). Interestingly, the arguments against CSR in this literature rest on CSR’s failure to focus on impacts.
Khandker et al. (2010) divide impact evaluations into ex ante and ex post approaches. Ex ante evaluation consists of methods to forecast the impact of a program before it is implemented, while ex post evaluation uses data derived from the intervention to evaluate actual outcomes. In either case, the key to impact evaluation is the identification of a counterfactual, which is what would have happened to subjects if they had not participated in the program. This goes beyond simply comparing outcomes with initial objectives. A single observation after the program has been implemented may determine whether the objective of the program was met, but it cannot say that this objective was reached because of the program.
Because firms engage in specific programs and target beneficiaries based on specific needs, selection bias makes causal inference in impact evaluation of CSR initiatives difficult. For example, early and late participants in voluntary environmental programs (VEPs) systematically differ in their substantive or symbolic implementation of such programs (Delmas & Montes-Sancho, 2010). Firms that adopt VEPs are often dirtier than firms that do not, so a simple comparison between participants and nonparticipants may provide an inaccurate assessment of impact. If participants were assigned to programs randomly, there would be no selection bias and impact evaluation would be more straightforward (Duflo et al., 2007). However, selection bias can occur either because the firm purposely invites targeted groups to participate or because beneficiaries voluntarily decide to join. Where groups are invited, a characteristic tied to the program’s objective, such as family income, may serve as a selection criterion, which makes assignment to the CSR initiative nonrandom. Where groups choose to participate, unobserved factors may influence their choice.
Given that randomization of assignment to treatment and control groups is not always possible, statistical controls must be employed to account for observed and unobserved factors that influence self-selection (Duflo et al., 2007; Duflo & Kremer, 2005). An intermediate step is results-based monitoring, in which firms establish specific goals, indicators, and targets for their CSR initiatives. Observation before and after the intervention can determine whether it has reached its goals; however, no causal inference is possible.
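The gap between results-based monitoring and causal inference can be shown with a few lines of arithmetic, using hypothetical test scores for participants and a comparison group. A before/after comparison, like the pre- and post-tests in the studies discussed earlier, attributes the participants’ entire gain to the initiative. Subtracting the comparison group’s gain over the same period, a difference-in-differences estimate (a standard impact-evaluation technique, swapped in here for illustration), attributes only the extra gain, and even that estimate holds only under the assumption that both groups would have followed parallel trends without the initiative.

```python
from statistics import mean


def pre_post_change(before, after):
    """Results-based monitoring: change in a group's mean outcome."""
    return mean(after) - mean(before)


def difference_in_differences(treat_before, treat_after, comp_before, comp_after):
    """Participants' change minus the comparison group's change over the same period.

    Valid as an impact estimate only if both groups would have followed
    parallel trends in the absence of the initiative (an assumption).
    """
    return pre_post_change(treat_before, treat_after) - pre_post_change(comp_before, comp_after)


# Hypothetical test scores, for illustration only.
participants_before, participants_after = [55, 60, 65], [66, 70, 74]
comparison_before, comparison_after = [54, 58, 62], [62, 66, 70]

print("pre/post change:", pre_post_change(participants_before, participants_after))
print("difference-in-differences:",
      difference_in_differences(participants_before, participants_after,
                                comparison_before, comparison_after))
```

Monitoring alone would report a 10-point gain; once the comparison group’s 8-point gain over the same period is netted out, only 2 points are plausibly attributable to the initiative.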
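The selection bias problem can be illustrated with a simple model adapted to the CSR context from Duflo et al. (2007) and Khandker et al. (2010). Subjects either participate in a CSR initiative or do not. The observed treatment effect of participating in the initiative is the difference in average outcomes between those who participate and those who do not (the control group). We can write this as follows:

$$I = E\left[O_i^{\mathrm{CSR}} \mid \mathrm{CSR}\right] - E\left[O_i^{C} \mid C\right]$$

where $I$ is the impact or the treatment effect of the CSR initiative, $O_i$ is the outcome for participant $i$, $\mathrm{CSR}$ is the CSR initiative (the treatment), and $C$ is the control group; the superscript indicates whether the outcome is measured with or without the initiative. To this equation, we add and subtract $E[O_i^{C} \mid \mathrm{CSR}]$, the average outcome that participants would have experienced had they not taken part:

$$I = E\left[O_i^{\mathrm{CSR}} \mid \mathrm{CSR}\right] - E\left[O_i^{C} \mid \mathrm{CSR}\right] + E\left[O_i^{C} \mid \mathrm{CSR}\right] - E\left[O_i^{C} \mid C\right]$$

The average treatment effect is

$$E\left[O_i^{\mathrm{CSR}} - O_i^{C} \mid \mathrm{CSR}\right]$$

The extent of the selection bias is

$$E\left[O_i^{C} \mid \mathrm{CSR}\right] - E\left[O_i^{C} \mid C\right]$$

If subjects are randomly assigned to the CSR initiative, participants and nonparticipants would have had the same expected outcome without it, so the selection bias term is zero and $I$ equals the average treatment effect.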
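The decomposition can be checked by simulation because, unlike real data, a simulation reveals each subject’s outcome both with and without the initiative. Every number in the sketch is an assumption: outcomes without the initiative are drawn from a normal distribution, the initiative raises every participant’s outcome by 5 points, and under self-selection only subjects with below-average outcomes join (much as dirtier firms are likelier to adopt VEPs). The naive participant/nonparticipant difference always equals the effect on participants plus the selection-bias term, that is, the gap between the two groups’ no-initiative outcomes. Under self-selection that bias swamps the true effect; under random assignment it shrinks toward zero.

```python
import random
from statistics import mean


def simulate(n=20_000, effect=5.0, randomize=False, seed=0):
    """Return (observed difference, effect on participants, selection bias).

    Hypothetical data-generating process, chosen only for illustration:
    O^C ~ Normal(50, 10) is a subject's outcome without the initiative,
    and O^CSR = O^C + effect is the outcome with it.
    """
    rng = random.Random(seed)
    part_with, part_without, nonpart_without = [], [], []
    for _ in range(n):
        o_c = rng.gauss(50.0, 10.0)  # outcome if the subject does not participate
        o_csr = o_c + effect         # outcome if the subject participates
        if randomize:
            joins = rng.random() < 0.5  # coin-flip assignment
        else:
            joins = o_c < 50.0          # self-selection: worse-off subjects join
        if joins:
            part_with.append(o_csr)     # observed for participants
            part_without.append(o_c)    # counterfactual: never observed in real data
        else:
            nonpart_without.append(o_c)  # observed for the control group
    observed = mean(part_with) - mean(nonpart_without)             # E[O^CSR|CSR] - E[O^C|C]
    effect_on_participants = mean(part_with) - mean(part_without)  # E[O^CSR - O^C|CSR]
    selection_bias = mean(part_without) - mean(nonpart_without)    # E[O^C|CSR] - E[O^C|C]
    return observed, effect_on_participants, selection_bias


for label, rand in [("self-selected", False), ("randomized", True)]:
    obs, eff, bias = simulate(randomize=rand)
    print(f"{label:>13}: observed={obs:6.2f}  effect={eff:5.2f}  bias={bias:6.2f}")
```

Under self-selection the naive comparison comes out strongly negative even though the initiative helps every participant, because the participants would have fared worse anyway; under random assignment the same comparison recovers roughly the true 5-point effect.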
From this simple model, we can identify four elements that are central to an impact evaluation. First, impact evaluations need a baseline, which consists of the measurement of, or information about, the level of the outcome variable before participation in the CSR initiative. Second, impact evaluation requires a control or comparison group. The control group should be as similar as possible to the participants in the CSR initiative. Third, random assignment of participants to the CSR initiative ensures that participants and nonparticipants are equivalent in expectation. Where randomization does not occur, careful attention must be given to selection bias. Fourth, impact evaluation requires a counterfactual, that is, an estimate of what would have happened to the affected parties had they not received the treatment (Khandker et al., 2010). Randomized designs that establish such a counterfactual are the gold standard in, for example, medical research, in which randomization is used to isolate the treatment’s effect on subjects (Hariton & Locascio, 2018). Table 3 summarizes these four requirements for effective impact evaluation and provides examples from the CSR literature that have sought to address them.
Table 3. Methodological Problems, Solutions, and Examples in Impact Evaluation
Note. References available from authors upon request. CSR = corporate social responsibility.
In addition to these four elements of impact evaluation, development studies offer further insights to consider. First, development research does not draw from large, secondary data sets. Rather, it develops appropriate measures based on the specific impacts that it seeks to evaluate. Many of these measures are drawn from surveys of participants and nonparticipants in specific initiatives regarding health, education, and other outcomes. Quantitative CSR research currently relies heavily on large, secondary data sets, which do not provide the fine-grained data needed to evaluate specific types of CSR initiatives (Blowfield, 2007). Instead, these data sets often aggregate complex data into composite scores that ignore whether the underlying dimensions are commensurable and how they should be weighted (Capelle-Blancard & Petit, 2017). In contrast, the experimental designs of development studies entail primary data that are specific to each initiative. This sort of “small data” is also needed for CSR impact evaluation.
A second insight from development studies is that impacts are not evaluated at the level of the sponsoring institution, be it government, NGO, or in the case of CSR, firm. Instead, they are evaluated at the initiative level in relation to the intended beneficiaries. For example, an educational initiative could result in changes in student test scores, which could be evaluated at the level of the student (student score), classroom (average score for classroom), school (average score for school), or community (average scores for community), among others. In contrast, most CSR research occurs at the firm level; few studies take into account CSR at a community or regional level (for exceptions, see Husted, Jamali, & Saffar, 2016; Marquis, Glynn, & Davis, 2007).
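Because the unit of analysis is a choice, the same beneficiary-level records can be summarized at any of these levels. The sketch below uses a handful of hypothetical student test scores (all identifiers and values are invented) and averages them by student, classroom, school, or community with a single helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one row per student, tagged with each higher level.
RECORDS = [
    {"student": "s1", "classroom": "c1", "school": "A", "community": "north", "score": 70},
    {"student": "s2", "classroom": "c1", "school": "A", "community": "north", "score": 80},
    {"student": "s3", "classroom": "c2", "school": "A", "community": "north", "score": 60},
    {"student": "s4", "classroom": "c3", "school": "B", "community": "south", "score": 90},
    {"student": "s5", "classroom": "c3", "school": "B", "community": "south", "score": 50},
    {"student": "s6", "classroom": "c4", "school": "C", "community": "north", "score": 85},
]


def mean_score_by(records, level):
    """Average the student-level score within each unit of the chosen level."""
    groups = defaultdict(list)
    for row in records:
        groups[row[level]].append(row["score"])
    return {unit: mean(scores) for unit, scores in groups.items()}


for level in ("student", "classroom", "school", "community"):
    print(level, mean_score_by(RECORDS, level))
```

Schools A and B have the same average even though B’s scores are far more dispersed, a small reminder that the level chosen for evaluation shapes what the evaluation can see.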
Using development studies as a guide, we see that the focus should be on specific CSR initiatives and their impacts on intended beneficiaries, whether individuals, communities, or ecosystems, not on firms or other sponsors of the initiatives. These impacts may be scalable at a societal level where governments can coordinate and mandate the implementation of such initiatives, but the studies focus on the impact of specific initiatives on intended beneficiaries. Development initiatives may be evaluated at a national or an international scale, but even at such a broad level of analysis, evaluations test specific initiatives in randomly selected areas to determine whether they have an impact before scaling. See, for example, Behrman, Sengupta, and Todd (2005), who randomly selected 506 rural villages to evaluate how the Mexican Progresa program affected enrollment rates and educational achievement; the program made transfer payments to families based on their children’s enrollment in school.
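Selecting areas, rather than individual families, at random is usually called cluster randomization. The sketch below assigns a list of village identifiers to treatment or control by drawing a fixed number of villages at random, and each household then inherits its village’s assignment. The village identifiers, the even split, the household mapping, and the seed are all hypothetical; only the count of 506 villages comes from the text.

```python
import random


def assign_clusters(cluster_ids, n_treated, seed=None):
    """Randomly assign whole clusters (e.g., villages) to treatment or control."""
    ids = list(cluster_ids)
    if not 0 <= n_treated <= len(ids):
        raise ValueError("n_treated must be between 0 and the number of clusters")
    rng = random.Random(seed)
    treated = set(rng.sample(ids, n_treated))
    return {cid: ("treatment" if cid in treated else "control") for cid in ids}


def household_arm(household_village, assignment):
    """Each household takes the arm of the village it lives in."""
    return {hh: assignment[village] for hh, village in household_village.items()}


villages = [f"village_{i:03d}" for i in range(1, 507)]  # 506 hypothetical IDs
assignment = assign_clusters(villages, n_treated=253, seed=42)
households = {"hh_1": "village_001", "hh_2": "village_001", "hh_3": "village_002"}
print(household_arm(households, assignment))
```

Because outcomes within a village tend to be correlated, the later analysis would also need to account for clustering (for example, with cluster-robust standard errors); the sketch covers only the assignment step.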
Overall, how well can these methods be applied in the CSR literature? Development projects differ from CSR initiatives in terms of size and sponsorship. Most CSR projects are relatively small in comparison. For example, Banerjee et al. (2015) report on six randomized controlled trials that examine the “Graduation Program,” which provided training, mentoring, encouragement to save, and health education and/or services. These trials took place in six countries and included follow-up surveys to measure whether the impact was sustained 1 year after the program ended. The scale of evaluation was immense, reflecting the size of the program. As the size of a project decreases, sophisticated impact assessment becomes less cost-effective. This is the situation many firms face with their CSR initiatives.
Furthermore, the sponsors of impact evaluation in development studies are usually governments or intergovernmental organizations. For CSR, the sponsor is a firm or group of firms. This difference in sponsorship matters because development projects are accountable to the citizens who elect governments or to the governments that support intergovernmental organizations. These groups may be more concerned about the effectiveness of tax dollars spent on social welfare, which is core to the purpose of governments and intergovernmental organizations, than stockholders are about corporate dollars spent on CSR initiatives, which are seen as peripheral to the corporate mission and may serve purposes of legitimization rather than of achieving social impact. Thus far, firms have largely gained the benefits of CSR without providing clear evidence of substantive social impact (cf. Margolis & Walsh, 2003).
These differences suggest that impact evaluation for CSR initiatives will need to be adapted to the scope and scale of corporate initiatives. Impact evaluation is increasingly being undertaken by social enterprises (Rawhouser, Cummings, & Newbert, 2019), which are smaller in scale and possibly more comparable to corporate initiatives. However, these differences do not undercut the basic idea of impact evaluation as it is practiced in development studies. The methods to evaluate social impact exist, but using them will require CSR scholars to alter course after many decades of largely ignoring them.
A further insight from impact evaluation as practiced within the development studies literature is that it constitutes a science of design. As Simon (1988: 67) wrote, “Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.” Development studies offer different methods for testing alternatives to achieve changes in development objectives. Development thus shares a common focus with other disciplines, such as medicine, architecture, and engineering: searching for alternative solutions and rigorously testing these alternatives (R. King, 2012; Roth, 2002). This systematic search for alternatives and their testing provides a road map for research in the CSR field, which we next discuss.
Discussion
As portrayed in their reports, press releases, websites, and other public statements, businesses intend to do a tremendous amount of good for society through their countless CSR initiatives. How well do these initiatives fulfill their promises? Our review finds that the massive CSR literature offers little evidence of the actual impact of CSR initiatives. The ever-growing field is evolving but still primarily assumes, rather than validates, that the myriad CSR activities that firms undertake generate the positive impacts that they intend.
Even when they have purported to focus on impact, CSR studies have stopped short of doing so. For example, in their popular article that seeks to explain social change by putting “the S back in corporate social responsibility,” Aguilera, Rupp, Williams, and Ganapathi (2007: 841) developed a complex figure that shows the myriad drivers of change in CSR. Yet, the figure concludes with only a single bold arrow connecting change in CSR to social change. The study simply presumes that CSR initiatives help society. Peloza and Shang (2011) reviewed the literature to determine how CSR creates value for stakeholders, but they focused on financial returns to the firm from improved stakeholder relationships, not on the benefits to stakeholders. Aguinis and Glavas (2012), in clarifying “what we know and don’t know about corporate social responsibility,” reviewed the outcomes of CSR actions across different levels of analysis, yet no outcomes extended beyond the firm and its employees. Thus, CSR studies have demonstrated at best that it can pay to intend to be good. To move beyond good intentions and determine how to assess and improve the impact of CSR initiatives, “Colleagues, we have a lot more work to do” (Wood, 2010: 76).
After thousands of studies, how is it possible that CSR scholars still have a lot more work to do? Our analyses suggest that the analytical approach used in the CSR literature, while increasingly sophisticated, is inherently limited. As Margolis and Walsh (2003: 268) noted, we face a world that “cries out for repair,” and repair requires not just analysis but action. Despite decades of insistent calls to refocus CSR research, action-oriented research remains scarce. There is still no theory of CSR to test, at least as it relates to the firm (cf. Freeman, Phillips, & Sisodia, 2020), and so, under the current norms of the literature, there is no basis for taking the scholarly actions necessary to ensure that CSR is impactful.
To move forward, we must take a different approach. Our findings suggest that to bridge the gap between analysis and impact, enabling studies to become action oriented and solution focused without sacrificing their scholarly underpinnings, CSR research should be reconceptualized as a science of design. According to Simon (1988), design involves two problems: searching for alternatives and choosing among alternatives (either based on optimizing or satisficing). Thus, CSR research as design involves the development and testing of the impacts of alternative CSR initiatives. Taking a design approach, CSR scholars transform from passive observers and assessors of organizations into active agents in designing and redesigning organizations to create a better world. Guiding managerial decision making toward the most efficient and effective means of achieving specific impacts—positive social changes—becomes the objective of CSR research.
In turning toward design in CSR research, we may advance not only the field of CSR but also the field of organization design. Organization design entails “explicit efforts to improve organizations” (Dunbar & Starbuck, 2006: 171). The original purpose of seeking to improve organizations concerned “both the effects of organizations and how to extract more benefits from them” (Dunbar & Starbuck, 2006: 171). Yet, in the ensuing decades of research on organizations, “few organization researchers, for example, have focused on the social problems associated with organizations. . . . They have concentrated on making organizations more efficient or profitable and have not devoted resources to the effects of organizations on their employees, their communities, or their societies” (Dunbar & Starbuck, 2006: 171).
A design approach requires CSR researchers to account for each firm’s unique situation, as no two are the same (Romme, 2003; van Aken, 2005). It is therefore difficult to derive generalizable implications. However, through a design approach, scholars may develop field-tested technological rules and solution concepts that provide strong grounding for application to other settings (Bunge, 1967). This may be unsatisfying to theory-driven scholars. However, we have been disputing the drivers of CSR performance for decades (Barnett, 2007), and our leading framework is described by its originator as atheoretical (Freeman et al., in press). Thus, there is little downside in moving beyond correlational models to design models that seek to draw out the causal link between CSR initiatives and some dimension of societal well-being.
A design approach entails a focus on what Romme (2003: 567) calls an ideal-target system that “can inspire, motivate, and enable agents to develop new organizational processes and systems.” According to Dunne (2018: 5), this requires reflective conversation in which the “success or failure of each solution attempt reveals more information and builds a tacit understanding of the problem.” Thus, it is an iterative and comparative process of learning from active interventions (Schön, 1983). If a firm seeks, for example, to eliminate hunger, it may engage in an array of CSR activities, such as breakfast programs for schoolchildren, employees volunteering in soup kitchens, and so on. Firms should use the most efficient and effective means to do the most good with their limited resources, but the CSR literature offers little guidance to firms seeking to do so. CSR scholars using a design approach could accumulate knowledge about the most effective practices for ending hunger by designing several different pilot programs, measuring and monitoring the performance of each, and then comparing and contrasting across designs to determine effectiveness in various settings.
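A minimal sketch of that comparative step, under loudly hypothetical assumptions: three pilot designs for reducing hunger (two echo the examples above, the third is invented), each with site-level food-security scores (higher is better) compared against a shared comparison group, and invented costs. The helper estimates each design’s effect as a simple difference in means and ranks designs by effect per 1,000 currency units spent. For these differences to be read causally, the pilots would also need the randomization and baseline measurement discussed earlier.

```python
from statistics import mean


def effect(treated_scores, comparison_scores):
    """Difference in mean outcome between a pilot's sites and the comparison sites."""
    return mean(treated_scores) - mean(comparison_scores)


def rank_pilots(pilots, comparison_scores):
    """Return (name, effect, effect per 1,000 spent), most cost-effective first.

    `pilots` maps a design name to (site scores, total cost).
    """
    rows = []
    for name, (scores, cost) in pilots.items():
        est = effect(scores, comparison_scores)
        rows.append((name, est, 1000 * est / cost))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical food-security scores and costs, for illustration only.
comparison = [50, 52, 48]
pilots = {
    "school_breakfast": ([58, 60, 62], 20_000),
    "volunteer_soup_kitchen": ([53, 55, 54], 5_000),
    "food_vouchers": ([62, 64, 66], 40_000),
}
for name, est, per_1000 in rank_pilots(pilots, comparison):
    print(f"{name:<24} effect={est:5.1f}  per 1,000 spent={per_1000:.2f}")
```

In this invented example the design with the largest effect (vouchers) is the least cost-effective, which is the kind of trade-off a design approach would bring to the surface.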
One well-documented and even praised assessment of the impact of a series of CSR initiatives is the case study of Unilever in Indonesia undertaken jointly by Oxfam and Unilever (Clay, 2005). Unilever was engaged in numerous projects intended to alleviate poverty. The assessment covered Unilever’s effects on employment, its value chain, its low-income consumers, and its community engagement initiatives. The report is full of data and figures about the number of jobs generated, taxes paid, and monetary value distributed in the value chain as well as amounts invested in various philanthropic activities. This report has rightfully been praised for its transparency and comprehensiveness. Nevertheless, it is impossible to assess impact because these many initiatives lack a counterfactual or comparison groups. In some cases, baseline amounts were recorded (e.g., employment in a region), but in others they were not. This joint impact assessment thus exemplifies current best practice but falls short of what causal inference requires: evidence that these goals would not have been achieved without Unilever’s intervention.
Much as corporate foundations increasingly demand that organizations receiving their funding demonstrate impact on the social problems they seek to address, such as climate change, inequality, and human rights (Ebrahim & Rangan, 2014), corporations should be expected to demonstrate impacts associated with their CSR investments. According to Frynas (2008: 276), “Indeed, if a firm chooses to spend a significant proportion of their funds on CSR-related initiatives, it would be in its own interest to have objective data to demonstrate any societal benefits from CSR. The linking of CSR to development requires a new repertory of tools and mechanisms by which such private interventions can be justified, planned, executed, and evaluated. Of course, such tools already exist in development schools and the public sector. But, until now, such tools are missing from private sector initiatives, and the claims about the contribution of CSR to international development cannot be verified.”
Moving down this path is not easy, though, which helps explain why the literature has not yet embarked on it. Design processes are exploratory, not exploitative. Design research requires the researcher to “understand user experience, explore alternative problem frames, and work toward solutions” (Dunne, 2018: 3-4).
While it is certainly a complex undertaking to reorient the massive CSR field in this way, there is plentiful low-hanging fruit to harvest as we initiate this turn. Using established data sets like Bloomberg ESG, for example, scholars could relate the varying policies and practices that different firms use to pursue the same social impact and compare their efficacy in achieving it. Clearly, it is important to distinguish the indicators of CSR activities and outputs that dominate this and other data sets from indicators that actually describe outcomes or impacts (e.g., Graafland & Smid, 2019), but significant opportunities nonetheless exist to move in the right direction. Thus, even though a design approach requires active primary data gathering, we may embark down this path from the starting point of secondary data analyses.
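One way to begin that sorting is to tag each data-set field with the logic-model stage it measures and keep only the fields that reach the outcome or impact stages. The field names below are hypothetical and are not actual Bloomberg ESG field codes, and the five stage labels are a generic logic-model sequence assumed for illustration; they may not match the review’s own category scheme.

```python
# Generic logic-model sequence (an assumption), ordered from what a firm
# controls most to what lies furthest beyond its organizational boundaries.
STAGES = ("input", "activity", "output", "outcome", "impact")

# Hypothetical indicator fields, each tagged with the stage it actually measures.
INDICATORS = {
    "community_investment_usd": "input",
    "has_human_rights_policy": "activity",
    "employee_volunteer_hours": "activity",
    "students_served_by_program": "output",
    "participant_test_score_change": "outcome",
    "regional_literacy_rate_change": "impact",
}


def indicators_from(indicators, stage):
    """Return the indicators at or beyond the given stage, sorted by name."""
    threshold = STAGES.index(stage)
    return sorted(name for name, s in indicators.items() if STAGES.index(s) >= threshold)


def share_by_stage(indicators):
    """Fraction of indicators falling in each stage."""
    total = len(indicators)
    return {st: sum(1 for s in indicators.values() if s == st) / total for st in STAGES}


print(indicators_from(INDICATORS, "outcome"))
print(share_by_stage(INDICATORS))
```

Run against a real data set, the stage shares would give a rough picture of how much of it measures activities and outputs rather than outcomes or impacts, which is the distinction the paragraph asks researchers to draw.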
Beyond the low-hanging fruit of secondary data analysis, a variety of methods and settings are available for CSR scholars taking a design approach. Adapted from R. King’s (2012: 280) guidelines for economic design scholarship, the following guidelines can be used:
Formulate a creative CSR initiative with a clear purpose.
Design the initiative to solve a relevant social or environmental problem.
Establish the effectiveness of the initiative with robust evaluation methods.
Provide contributions that are novel and compelling.
Develop and evaluate the initiative with rigorous research methods.
Iteratively search for the functionality of the initiative, taking into account the context in which it operates.
Disseminate the results in a compelling way.
These guidelines can be used, for example, to assess the performance of social impact bonds, which are pay-for-performance contracts in which the government agrees to pay a service provider only after it has demonstrated that it has achieved a specific set of social outcomes (Pandey, Cordes, Pandey, & Winfrey, 2018). Each contract stipulates the performance metrics associated with a social outcome, such as reduced recidivism rates in criminal justice systems or improved graduation rates in school systems. Tse and Warner (in press) compared three early childhood care and education social impact bonds operating in the United States using public documents, loan agreements, press releases, and interviews with persons involved in their design, launch, and implementation. They found that social impact bonds brought about greater public and political support for broader public funding for early childhood care and education, based on “their capacity to translate a complex service into quantifiable, investable variables that can change policy” (Tse & Warner, in press: 11).
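The pay-for-performance logic of a social impact bond can be expressed as a small payment rule. The sketch assumes a deliberately simple all-or-nothing contract: the government repays the investors’ principal plus a fixed return only if every contracted outcome target is met, and pays nothing otherwise. The metric names, targets, and amounts are hypothetical, and actual contracts may instead use graduated payment schedules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeTarget:
    metric: str
    target: float
    higher_is_better: bool

    def met(self, observed_value: float) -> bool:
        """True if the observed value reaches the contracted target."""
        if self.higher_is_better:
            return observed_value >= self.target
        return observed_value <= self.target


def sib_payment(targets, observed, principal, success_return):
    """All-or-nothing rule: repay principal plus return only if every target is met."""
    if all(t.met(observed[t.metric]) for t in targets):
        return principal * (1 + success_return)
    return 0.0


# Hypothetical criminal-justice contract, for illustration only.
targets = [
    OutcomeTarget("recidivism_rate", 0.30, higher_is_better=False),
    OutcomeTarget("employment_rate", 0.50, higher_is_better=True),
]
print(sib_payment(targets, {"recidivism_rate": 0.28, "employment_rate": 0.55}, 1_000_000, 0.05))
print(sib_payment(targets, {"recidivism_rate": 0.33, "employment_rate": 0.55}, 1_000_000, 0.05))
```

Because the payment hinges entirely on the measured outcomes, the counterfactual questions raised earlier in this review determine whether the government is paying for impact or for improvements that would have occurred anyway.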
Another step forward is to move beyond impacts on specific stakeholders by linking firm-level outcomes to impacts at the population level. The key is to understand the relationship between changes at the initiative level and changes at a population or ecosystem level. Conceptual models and spatial methods are essential to linking these levels (Doh & Hahn, 2008). With the advent of big data, it is more feasible to apply spatial econometrics and other methods to study the firm–population relationship. However, the development of spatial methods has outpaced understanding of the mechanisms driving the relationships, so there is a significant opportunity for CSR scholars to contribute.
Individual firm-level impacts can also be aggregated at a local level. Even across widely diverse industries, firms located in the same area can cooperate to achieve impact. One example is the Five-Percent Club in the Twin Cities area of Minnesota, which consisted of local firms that committed to contributing 5% of profits to local community causes (Galaskiewicz, 1997). Community pressures can amplify or diminish the impact of CSR initiatives in communities such as those studied by Marquis, Glynn, and Davis (2007) in Cleveland and Columbus, Ohio. CSR initiatives aggregated at the community level form part of what Ostrom (2010) called “polycentric governance of complex economic systems,” which allows diverse actors in the public sector, private sector, and civil society to act together to govern common-pool resources and public goods at different levels of analysis, from local communities to nation states and the planet as a whole. Although her institutional analysis and development framework provides a vocabulary to include these diverse actors, Ostrom argued that more specific theories are needed to examine the relationships among variables and actors.
Industries also commonly work together to address social problems. Environmental and social certifications exemplify this kind of collective action. One well-known example is the Responsible Care initiative that emerged from the chemical industry after the infamous 1984 toxic gas leak at a Union Carbide plant in Bhopal, India. Whether Responsible Care had an impact on environmental performance is debatable (A. King & Lenox, 2000; Prakash, 2000), but the chemical industry sought to create change through a program whose effects can be assessed. Thus, industry activity, not just location, is also a basis of collective action for social impact.
Conclusion
Our review finds that despite many calls to change course, adherence to established analytical approaches has left the CSR literature unable to assess the effectiveness of CSR initiatives. By reorienting as we have suggested—(re)turning to management’s roots in organization design and small data—the CSR literature can assess impact. This will provide firms the insights needed to select and design CSR initiatives that can realize their good intentions.
Society faces many grand challenges. To address them requires a grand (re)design of CSR research, as we have put forth. Just as in the 19th century, business experiments with old-age pensions and health insurance provided the basis for modern social insurance (McCreary, 1968), so, too, business experimentation to develop well-designed CSR initiatives can provide solutions to the grand challenges that we face today (George, Howard-Grenville, Joshi, & Tihanyi, 2016). Experimentation and design are the keys. Though the extensive CSR literature has stalled, if reoriented toward an exploratory, experimental design approach, guided by what works, it may yet help people to live better lives.
