Abstract
Social innovations (SIs) frequently bring previously unrelated actors, ideas, and practices together in new configurations with the goal of addressing social needs. However, the dizzying variety of definitions of SI and their dynamic, exploratory character raise dilemmas for evaluators tasked with evaluating them. This article is based on a systematic review of research on evaluation within SI contexts, specifically an analysis of 28 published peer-reviewed empirical studies. Given that design considerations are becoming increasingly important to evaluators as the complexity of social interventions grows, our objectives were to identify influences on the design of evaluations of SI and to clarify which SI features should be taken into account when designing evaluations. We ultimately developed a conceptual framework to aid evaluators in recognizing some differences between SI and conventional social interventions and, correspondingly, implications for evaluation design. This framework is discussed in terms of its implications for ongoing research and practice.
“Everything we evaluate is designed. Every evaluation we conduct is designed. In our profession, design and evaluation are woven together to support the same purpose—making the world a better place,” wrote John Gargani, the President of American Evaluation Association (AEA), in his call for proposals for the 2016 annual conference. As noted in the call, social programs now take many different forms, ranging from those operated by nongovernmental organizations to provide services to individuals, to collective impact initiatives sponsored by philanthropic organizations, and private sector initiatives such as social enterprises, sometimes aimed at system-level change. As the range of initiatives diversifies, so too must the designs evaluators employ.
Initiatives running under the banner of “social innovation” (SI) are in particular need of attention. Presently, conceptual challenges exist with respect to understanding the differences between SIs and conventional social programs (Preskill & Beer, 2012). Because design decisions in evaluation ideally flow from conceptual clarity about the evaluand, we seek in this article to contribute to ongoing work aimed at delineating SI (e.g., Cunha, Benneworth, & Oliveira, 2015; Edwards-Schachter & Wallace, 2017; Lorinc, 2017; Pol & Ville, 2009; Westley, 2013) but with a special focus on the implications for evaluation design. We draw on findings from a review of empirical evidence from 28 published studies on evaluation in SI contexts to explore factors that influence evaluation design in these contexts. We begin by introducing the concepts of SI and “evaluation design” before delving into our methods and findings. We then discuss implications for evaluators and present a heuristic framework developed through this study for identifying features of SI that are relevant to evaluation design. We propose this framework as a contribution to efforts to improve conceptual clarity of SIs and as a practical tool for evaluators.
SI
Over the past 15 years, there has been an increased, explicit emphasis on developing and supporting SIs, although, arguably, SIs have been around for centuries (Mulgan, 2006). Today, the term SI is used to refer to a plethora of models and entities including microcredit organizations, eco-innovation partnerships, and new medical practices, to name a few. In the United States, the Social Innovation Fund alone invested over US$295 million in federal grants and collected over US$627.5 million in partner commitments between 2009 and 2016 (Corporation for National and Community Service, 2016). In Canada, governments and foundations annually allocate tens of millions of dollars to SI projects (Government of British Columbia, 2017; Government of Canada, 2014; J.W. McConnell Family Foundation, 2017a; Social Science and Research Council, 2016).
There is no single, widely accepted definition of SI, although multiple efforts have been made to advance conceptual clarity in scholarly and practitioner communities (Cunha et al., 2015; Edwards-Schachter & Wallace, 2017; Lorinc, 2017; Pol & Ville, 2009; Westley, 2013). The term SI has a varied and sometimes imprecise application across contexts (Pol & Ville, 2009). For example, Westley and Antadze (2010) note the term is sometimes used interchangeably with “social enterprise” and “social finance,” despite there being significant differences in the configurations, purposes, and funding bodies of these initiatives. For some observers, SI has earned the label of a catchword applied without any effort at conceptual precision (Godin, 2015).
Among the most prominent definitions, Westley and Antadze (2010) define SI as a “complex process of introducing new products, processes or programs that profoundly change the basic routines, resource and authority flows, or beliefs of the social system in which the innovation occurs” (p. 2); while Mulgan, Tucker, Ali, and Sanders (2007) highlight that SIs are primarily motivated by meeting a social need and are developed through a range of organizational bodies such as governments, the private sector, nonprofit organizations, and social movements, differentiating SIs from business innovations primarily motivated by profit maximization (Mulgan, 2006). For our purposes, we defined SI relatively broadly as a process or product aimed at achieving social good by enabling actors to collaborate across conventional boundaries, to alter relationships and/or other resourcing to make positive change. We used this definition to include or exclude studies in our sample (see Methods section).
What Do We Mean by Evaluation Design?
In basic terms, an evaluation design typically “identifies what questions will be answered by the evaluation, what data will be collected, how the data will be analyzed to answer the questions, and how the resulting information will be used” (Wholey, Hatry, & Newcomer, 2010, p. 2). The choice of design in an evaluation ought to be closely linked to the specific needs of the program or initiative in question (Devaney & Rossi, 1997). Each design component is meant to shed light on specific aspects of the program and its real-life applications, thus making design quite context-specific (Devaney & Rossi, 1997; Wholey et al., 2010). Given this emphasis on context specificity, design decisions should be tailored in a way that makes sense for the program and attendant information needs of the stakeholders in question. The two most frequently discussed broad evaluation types are formative evaluation (which includes implementation and process evaluation) and summative evaluation (also sometimes referred to as outcome or impact assessment). More recently, developmental approaches to evaluation have also gained prominence (Patton, 1994, 2011). The following section provides a brief overview of related design options, all of which have implications for the evaluation of SI (Antadze & Westley, 2012; Preskill & Beer, 2012).
Evaluation Approaches and Design Implications
Formative and summative approaches to evaluation are well known within the evaluation community and require little comment here. Despite Scriven’s (1991) reminder that the term formative evaluation was intended to connote a preliminary summative evaluation early in the process of program development and implementation, the field concurs that formative evaluations are essentially program improvement oriented (Devaney & Rossi, 1997; Patton, 2011). In contrast, summative evaluations are considered to be judgment oriented and necessarily implicate the comparison of program observations against something (e.g., alternative program or control condition, baseline performance, and an external standard or theory of change). Despite widespread agreement about form and function, Mark (2009) reminds us that purely summative evaluations responding to “fork-in-the-road” program decisions (e.g., program renewal or termination) are relatively rare in practice. Rather, formative and summative evaluation objectives are commonly integrated within the same design.
Evaluations that are predominantly formative tend to draw from multiple streams of inquiry using qualitative methods (e.g., interviews, site observations, focus groups, and document analysis), quantitative methods (e.g., longitudinal data on service provision and field surveys; Devaney & Rossi, 1997; Henry, Smith, Kershaw, & Zulli, 2013), or both. Randomized controlled trials (RCTs) are often regarded as the “gold standard” for summative evaluation, although this assertion has been hotly contested in evaluation forums such as EVALTALK. Some argue that quasi-experimental designs using comparison groups may be utilized when an RCT design is not feasible (Devaney & Rossi, 1997). Given the variety and scope of interventions, some summative evaluations also rely on nonexperimental designs, such as contribution analysis (Mayne, 2012). Such design choices are regarded by some as a weaker alternative for causal analysis (Craig et al., 2012), although, as mentioned, this assertion is controversial. To bolster the strength of evaluation findings, evaluators working with nonexperimental designs may use multiple sources of evidence (Craig et al., 2012).
Over the past few decades, and due to an increased interest in utilization-focused evaluation (Patton, 2008, 2012), discussions about “design” have come to include the conscious decision to invite program staff and stakeholders to contribute to the process by engaging with evaluation question formation, planning, data collection, analysis, and reporting (Cousins & Chouinard, 2012; Greene, 1987). Participatory evaluation approaches deliberately stress the participation of stakeholders in design activities and emphasize the importance of rapport, trust, and credibility, and paying close attention to social as well as technical processes within the evaluation cycle (Cousins & Chouinard, 2012; Greene, 1987). Also important is participatory evaluation’s contribution to conceptualizing degrees of participation in terms of diversity (all legitimate groups vs. primary users), depth of engagement (deep participation vs. light touch consultation), and level of control regarding the evaluation technical decision-making (practitioner controlled vs. researcher/evaluator controlled; Cousins & Chouinard, 2012; Cousins & Whitmore, 1998).
Since its inception in the mid-1990s (Patton, 1994), developmental evaluation (DE) has grown in importance. It has been distinguished from formative and summative evaluation, essentially as a third, markedly different approach (Antadze & Westley, 2012; Patton, 1996, 2011; Preskill & Beer, 2012). DE does not focus on making punctual improvements (as per formative evaluation) or final judgments about program worth or effectiveness (as per summative evaluation). Instead, DE emphasizes working collaboratively within program teams to generate current data to support decisions or explore alternate solutions and to embed evaluative thinking into development of the initiative (Henry et al., 2013; Patton, 1996, 2001, 2011). With respect to design, DE is guided less by a preference for specific methods or insistence on the inclusion of comparative elements and more by a commitment to establishing strong relationships and understanding the unique features of program contexts as well as how program actors and stakeholders intend to use evaluation data and findings (Patton, 1994, 2011).
Evaluation Design and SI
The evaluation of SI remains an understudied stream of inquiry, but this is likely to change with the growing interest in SI. Evaluations conducted on SIs have employed a variety of designs, including those often used for summative evaluation such as RCTs, quasi-experimental designs, and social impact assessment (e.g., social return on investment; Corporation for National and Community Service, n.d.; Spila et al., 2016; The Young Foundation, 2012). It has been argued, however, that SIs do not fit the mold for linear “conventional” evaluation designs and such designs may inadvertently thwart the momentum of the nonlinear processes of experimentation and learning that characterize SI contexts (Preskill & Beer, 2012). In this debate, DE has emerged as an important player, with DE practitioners and theorists claiming the collaborative, adaptive stance at the heart of DE is well suited to working in complex social contexts, where social interventions are more fluid and less programmatic and where solutions are emergent (Gamble, 2008; Patton, 2011; Patton, McKegg, & Wehipeihana, 2015; J.W. McConnell Family Foundation, 2017b).
The struggle of how to approach SI evaluation may stem in part from the differences that those working in (or with) SI communities imply or claim to exist between SIs and conventional social programs. The latter typically have a well-defined structure that governs “how people and materials jointly interact in trying to do something for program recipients” (Alkin, 2011, pp. 60–61); whereas, following Preskill and Beer (2012), “social innovation strategies often cross sectors, involve changing the dynamics, roles, and relationships between many players, and challenge conventional wisdom about the nature of the problem and its solutions” (pp. 2–3). SIs are seen to be distinct because of their explicit emphasis on experimentation and on the fusion of new ideas and the work of diverse actors to tackle unprecedented, complex, or intractable challenges (Mulgan, Tucker, Ali, & Sanders, 2007). If a metaphor of a program is a ship following a course laid out on a map, then SIs are ships navigating new waters, in unpredictable weather conditions. The argument follows that evaluations of SIs need to be approached with exploration and rough seas in mind as they trace the journey and ports of call.
The concept of “complexity” is a recurring theme in the literature on SI (e.g., Antadze & Westley, 2012; Mulgan et al., 2007; Westley & Antadze, 2010) and in evaluation of SIs (e.g., Milley, Szijarto, Svensson, & Cousins, 2018; Mowles, 2014; Patton, 2011; Patton et al., 2015). Some elements of complexity mentioned in the evaluation literature include lack of control over the innovation; context sensitivity and unpredictable conditions making it difficult to end up with a similar result from an intervention even when the steps to achieve it are carefully repeated; and highly interdependent relationships between actors, structures, and processes, leading to unanticipated results or “ripple” effects (Preskill & Beer, 2012, p. 2). It remains an ongoing challenge to weave complexity thinking into evaluation design, despite growing interest. Mowles (2014), who advocates for greater use of complexity thinking in evaluation, notes a tendency in practice to homogenize elements of various disciplinary perspectives on complexity or to use them selectively. This risks oversimplifying inputs from the complexity sciences. It also has the potential to distort the adoption and application of conceptual inputs by decontextualizing them. Dahler-Larsen (2016) assumes a more guarded stance towards complexity thinking, observing that complex conditions do not automatically require evaluators to accept “that all goals and objects and all knowledge about links between means and ends became irrelevant” (p. 8) in such contexts. Differences of perspective among experts illustrate some of the difficulties evaluators face when trying to design evaluations for those working with SIs.
Given the existing gaps in the literature, the purpose of our study was to draw on the empirical knowledge base on evaluation in SI contexts to understand what influences evaluators’ design choices, thereby contributing to the ongoing conversation. The following sections detail the study methodology, report the findings, and end with a discussion of implications for practice and future inquiry on the topic.
Method
As described by Milley et al. (2018) and Szijarto, Svensson, Milley, and Cousins (2018), our study followed multiple rounds of literature search, review, and culling, followed by independent coding, thematic analysis, comparison, and consensus on findings. We searched the English language peer-reviewed empirical literature on evaluation in SI contexts, covering January 2000 to October 2015. Following a full-text review of 84 publications, we retained 28 in our sample (see Milley et al., 2018, for more detail).
For our inclusion criteria, we elected to be broad in our conceptualizations of SI in order to capture the spectrum of SI initiatives detailed in the empirical literature. This was in keeping with the exploratory intent of the study and the transdisciplinary, cross-sectoral nature of the field. Our initial search included empirical, peer-reviewed studies using terms related to “SI” and “evaluation.” During full-text review, we excluded any studies that did not meet our broad working definition of SI. We also excluded any studies that did not meet a broad definition of evaluation, namely “systematic inquiry to serve a range of policy and program purposes, such as enabling learning, development, improvement and capacity building, informing judgements about the merit, worth and significance of policies and programs, and supporting oversight, accountability and compliance” (Milley et al., 2018; Szijarto et al., 2018). This led us to a sample of 41 studies. During the first phase of analysis, we identified two subgroups of articles that differed on characteristics important to our study. At that stage, we set aside 13 articles on the evaluation of social enterprises (Szijarto et al., 2018). We continued with in-depth analysis of the remaining 28 studies.
Themes were drawn by two team members working independently and later modified into a more detailed coding framework, which was applied recursively to narrative summaries of the 28 articles in the sample. The thematic results were compared and discussed as a full team and are reported in Milley et al. (2018). These results also drew our attention to the importance of evaluation design in SI contexts, congruent with the growing awareness of the importance of design in the evaluation field generally (e.g., AEA, 2016). We therefore revisited our analysis from this new perspective and focus the remainder of the article on this lens.
Findings
The studies in our empirical sample focused on evaluations of SIs in a number of sectors such as education, health, agriculture, poverty reduction, and public policy. The SIs ranged in scale and were based in various countries (e.g., Italy, Finland, Israel, Ethiopia, Australia, New Zealand, the United States, and Canada), with most authors based in Europe or North America. The majority of studies in our sample were reflective case narratives (75%). As noted by Cousins, Goh, Clark, and Lee (2004), reflective case narratives “are based on observation and interpretation of lived experiences with evaluation, yet authors do not specify methods for capturing their observations nor other relevant sources of evidence supporting the case” (p. 108). Just seven articles (25% of the sample) used single-case, multicase, or other study designs in which authors included specific methodological details.
What influences evaluation design decisions in SI contexts?
As noted earlier, we take design to include the purpose driving the evaluation, the types of questions that are asked, the data that are collected to answer those questions, and the analysis that takes place (Wholey et al., 2010). Our analysis showed that these were in many ways directly related to actors’ understanding of the nature of the particular SI and SIs generally. We noted explicit acknowledgment of a complexity perspective in 64% of studies. Other notable themes influencing design, in order of frequency, were the focus on learning (61%), the responsiveness to context and stakeholder needs (46%), and accountability to the funder (39%). We summarize these findings next.
Understanding of the nature of the SI
Our study findings reflect SI evaluations taking place in a “messy” landscape characterized by ambiguity and trial and error. The majority of the studies (82%) described evaluations with developmental goals. Authors wrote about the importance of sharing control over the evaluation process, codeveloping questions, and iterative feedback to guide users through an evaluation of SI processes that were understood to be inventive and emergent. In a number of instances, the initial evaluation designs were changed or adapted to address new needs that emerged over time in light of evaluators’ experiences with the SIs. For example, Wilson-Grau, Kosterink, and Scheers (2015) describe experimenting with a number of approaches and adopting outcome harvesting, later making the connection between their work and DE. In another study, Gopal, Mack, and Kutzli (2015) describe the choice of DE by project leaders who felt that traditional evaluation would be ineffective because “the initiative was in its infancy; the strategies were fluid; the relationships were fragile” (p. 49); yet who wanted evaluative input to support the initiative’s development and help them be accountable to investments being made. In this case, the evaluation design included multiple methods (e.g., systems mapping, focus groups, interviews, and survey) that were adapted to changing needs as the intervention matured. Methods were guided by a set of learning questions under an overarching question of how the initiative was developing.
Formative and summative aims were noted far less frequently in this sample, with improvement of programs and initiatives being noted in 11% of the articles, and the summative impact of the initiative being a prevalent theme in just 7% of the articles. However, we noted considerable variation even among the few articles bearing the “formative” label, signaling that considerable experimentation may be happening within those evaluation contexts. Ramstad (2009) describes using a DE framework; however, the evaluation’s purpose, as described in the study, appears to have been formative. Mathie and Peters (2014) describe participatory formative evaluation that used mixed methods with the most significant change technique (Dart & Davies, 2003) to elicit learning from the trial and error taking place in a developing initiative. Ambiguity was also something we noticed in the summative evaluations in the sample. For example, Tan et al. (2014) described a context in which the evaluation began as an RCT for a summative purpose but was later adapted due to feasibility issues within the SI that made the intended experimental design difficult to conduct. This involved negotiation among multiple partners in an initiative, which included a “patchwork” of funding sources, to arrive at a longitudinal observational design with matched comparison groups (p. 7).
Complexity perspective
A complexity perspective was the most prominent influence on design choice in the studies in our sample. Authors often used the language of systems-thinking and made linkages to complexity as a rationale for the evaluation design choices. This was linked to their understanding of the nature of the intervention (above). In 18 of the 28 studies (e.g., Anzoise & Sardo, 2016; Asher, Foote, Radner, & Warren, 2015; Poth, Pinto, & Howery, 2012), design was influenced by the novel, emergent, iterative nature of the SIs being evaluated, and by the dynamic contexts surrounding the SIs which featured a diversity of stakeholders, multiple influences, and competing objectives. For example, Anzoise and Sardo (2016) attributed their choice of a comparative case study design to complexity of the context, which ruled out quasi-experimental designs.
Focus on learning
Learning was a recurrent theme influencing evaluation design, as noted by 17 of the 28 studies in our sample (e.g., Cherniss & Fishman, 2004; Marcellus, 2004; Moore & Cady, 2015; Saari & Kallio, 2011). By purposefully structuring design around unpacking the complexity of the SI and sharing knowledge across partners, the evaluations were said to have contributed to conceptual use about the innovation being evaluated. For example, in a DE, Langlois, Blanchet-Cohen, and Beer (2013) used a hybrid design that included action research and qualitative methodologies with a focus on eliciting learning. Bimonthly group phone calls were utilized for reflection, sharing learning between sites, and course correction. Saari and Kallio (2011) describe combining DE and impact evaluation in a single participatory evaluation framework, to better orient impact evaluation from “warranting and justifying actions already taken” to learning for strategic renewal (p. 229).
By extracting lessons for individuals and organizations involved in decision-making and program development, learning also had an instrumental essence, that is, a type of learning that translates lessons into action. Besides conceptual and instrumental forms of use, authors also made reference to learning that contributes to capacity building (e.g., Allen et al., 2015; McKegg, Wehipeihana, Becroft, & Gill, 2015; Rey, Tremblay, & Brouselle, 2014), for example, learning how to conduct evaluations and engage in evaluative thinking through participating in the evaluation, often called process use (Patton, 1998). Process use is often treated as a positive unintended consequence (Kirkhart, 2000); however, we noted that by expressly structuring design around learning, process use was a premeditated or intentional effect (Patton, 2008) in some of the SI evaluations in our sample.
Responsiveness
In 13 of 28 studies (e.g., Cabaj, Leviten-Reid, Vocisano, & Rawlins, 2015; Murphy, 2015; Ramirez, Kora, & Shephard, 2015), authors highlighted that design was influenced by the needs of the participants. In 8 of the 13 studies, designs were collaboratively negotiated between evaluators and actors in the SI context, an approach that squared with the collaborative character of many of the initiatives themselves. Responsiveness also included being culturally sensitive to the context, active monitoring of changing needs of the organization (e.g., Gopal, Mack, & Kutzli, 2015), and diversifying data collection methods to meet the needs of multiple participant groups (e.g., McKegg et al., 2015). Timely feedback was a factor commonly mentioned in the studies in that the clients sought up-to-date information that would inform decision-making (Poth et al., 2012); designs incorporated a reliance on data under changing conditions. As such, evaluation designs were also frequently developed with the anticipation of need for iteration and flexibility (e.g., Marcellus, 2004).
Accountability
The need to meet the accountability requirements of funders was reported as an important influence on design in 11 studies in our sample (e.g., Dickson & Saunders, 2014; Ramstad, 2009; Saari & Kallio, 2011), a surprisingly small proportion. Funder preference for a particular design sometimes played a role, but design was sometimes also an outcome of negotiation between the funder and the evaluator, contingent on the degree of rapport between the two. At times, a conventional (summative) evaluation design was stressed by the funder (e.g., Allen et al., 2015); however, the evaluators negotiated to expand the design to also accommodate the needs for data to help actors within the SI to develop solutions. Cabaj, Leviten-Reid, Vocisano, and Rawlins (2015) and Ramirez, Kora, and Shephard (2015) observed that the funders in their particular evaluations were inclined to support organizational learning, which allowed for flexibility in design.
On a similar note, Wilson-Grau et al. (2015) noted that adherence to donor requirements for evaluation methods “clashed with reality” (p. 96) of the initiative and its context, suggesting that funder pressure on evaluation design could act as an impediment to the initiative. For example, the use of log framework plans, which relied on predetermined SMART outcomes, was required but was later augmented with other activities to better fit the needs of the initiative (Wilson-Grau, Kosterink, & Scheers, 2015). A number of funding bodies appeared to be cognizant of the “complex” and “dynamic” labels these initiatives garner and, in response, appeared willing to negotiate design options.
Discussion
Despite growing interest in SI and evaluation approaches appropriate for SI contexts, the empirical research on these topics remains somewhat underdeveloped; we were only able to locate 28 studies that shed light on evaluation design considerations in SI contexts. From our analyses of the sample, two observations stand out. First, our search revealed a lack of diversity with respect to the types of evaluation designs being discussed, and second, in published research, there appears to be a lingering confusion about what does or does not constitute an SI. In the following section, we discuss some implications from these findings and possible directions for future research.
A Need for Research on Evaluation (RoE) on a More Diverse Set of Designs
Our findings indicate that evaluation scholars and practitioners publishing in academic venues about their work are strongly influenced by DE or “developmental thinking.” These studies suggest that evaluators working in SI contexts tended to make ongoing adaptive choices about the evaluation process, guided to some extent by principles (e.g., focusing on collaboration among evaluators, funders, and users of evaluations) rather than closely adhering to designs formulated or prescribed at the outset. Several authors attribute this to how they understood SIs and SI contexts. We noted a heavy emphasis on dynamism between actors, uncertainty and unpredictable circumstances, experimentation with various approaches, and linking accountability to various forms of learning (see, e.g., Patton, 2011). Analysis of the citation patterns of these studies showed authors drawing consistently on DE literature (Szijarto et al., 2018). Other approaches have been proposed as useful for complex, innovative initiatives (Bonell, Fletcher, Morton, Lorenc, & Moore, 2012), yet these did not appear in our sample. For example, we expected to see examples of theory-based evaluation approaches such as realist evaluation (Pawson & Tilley, 1997, 2004) or contribution analysis (Mayne, 2012). As interest in SIs and their evaluation gains momentum, we hope to see peer-reviewed accounts of a greater range of designs and approaches used in practice.
Confusion About What Constitutes SI
During the analysis process, we realized that authors were defining SI in different ways, sometimes in terms that were ambiguous. This reflects the lack of precision and conceptual clarity in discussions of SI in the broader literature (see above). We also read several accounts of actors struggling through trial and error to find evaluation methods to fit the needs of an SI. We think their challenges relate to at least three factors: the uniqueness of each SI and its context, the hype and “buzzwordiness” in SI discourse that serve to obscure SI as a phenomenon for the actors involved, and the open question of how SIs differ from conventional social programs in ways that matter to evaluation design.
The field of program evaluation has developed alongside conventional social programs and is said to be “deeply enmeshed in a project/program mentality” (Patton et al., 2015, p. 63), a mentality said to be reflected in many of our assumptions as evaluators. We propose that to better design evaluations of SIs, it is worthwhile to consider how SIs may differ from conventional programs in ways that are important for evaluation design. While we do not consider there to be an absolute distinction between SIs and conventional programs, we developed a framework as a heuristic “thinking tool” and tangible starting point for approaching evaluation in this domain. In what follows, we elaborate the framework and then draw connections to evaluation design.
We began with conceptual work in innovation studies by Baregheh, Rowley, and Sambrook (2009) and Cai (2015). These authors describe six key attributes of innovation. From this, we developed an analytical framework as a starting point. Using a matrix, we assembled conceptual definitions of SI from multiple sources along this framework; in particular, detailed work on SI concepts by Cunha, Benneworth, and Oliveira (2015), Edwards-Schachter and Wallace (2017), and Hubert (2010) with reference to other leading theoretical work reviewed in our study and reported elsewhere (Milley et al., 2018). We then compared results of the theoretical analysis to descriptions of SIs in our sample of empirical studies, arriving at the following 5-point framework (Table 1).
Table 1. A Five-Point Framework for Comparing Social Innovations and Conventional Social Programs.
Focal point. SI is described as more about a process, or even a worldview, than a specific organizational or program structure, or product or service. While the outputs of an initiative (such as ideas, products, or services) are an important locus of attention, the predominant focus of actors in SI tends to be on the process by which those outputs are developed. This was noted by Hubert (2010) who described the process dimension as the “sine qua non” in defining SI. This contrasts with conventional social programs which, in our experience, more often identify with their structure, with the services they provide, or specific outcomes of those services.
Means. SI is described as a change to the way innovation is done. The influence of “design thinking” is very evident in the SI discourse. SI actors are likely to report experimentation, iteration, and interactive learning among relevant actors as key means to achieving social change. There is also likely to be an expressed intent to leverage diversity in order to increase creativity, and therefore more intense efforts to foster collaboration among groups that may not typically come together. This can involve actors crossing multiple boundaries between social groups, sectors, and/or levels of a system. The means are largely social: “social means to meet social needs” (European Commission, 2013). Compared to the processes described in SI, conventional program design and implementation are more likely to favor applications of existing models, evidence, or theory with some degree of fidelity or with careful and controlled adaptation. While efforts at collaboration are increasingly common, in our experience, they are still more likely to be within-sector, or among nearer neighbors (e.g., physicians and social workers), with priority placed on substantive expertise.
Some authors point to the use of information and communications technology (ICT) as a distinguishing feature of SI, particularly to “transform how people interact,” collaborate, and co-create (Cunha et al., 2015, p. 622). There may be greater use of ICT to enable collaboration in SIs; however, we did not observe this in the empirical sample and would suggest that this may be a larger trend and a new aspect of the collaborative process that is not necessarily exclusive to SI. Concerning “financial” means, SIs are said to draw from a more diverse range of funding sources generally, including the private sector and private foundations, as well as public sources, in comparison to conventional social programs, which typically draw on public support. These differences in funding sources may impact evaluation through a link to expectations of funders and to their needs for outcomes and accountability measures.
Outcomes. Although there are exceptions, the language surrounding outcomes among SIs differs from conventional programs with respect to the type and the level of change envisaged and the degree of advance specificity. The term “social change,” in terms of changed relationships, resource flows, social practices, and/or social organization, is commonly used in SI contexts, as opposed to “social betterment,” a term commonly used to mean the improvement of conditions (see Mark, Henry, & Julnes, 2000). Where conventional programs often describe “scaling-out” laterally to achieve more effective or broader reach in a client population, for instance, through replication of the program at more sites, SIs are more likely to add vertical scaling as well, that is, up and down through levels of a system, to get at root or systemic causes, and to ensure the durability of the change. This was also a striking commonality among many of the studies in our sample (e.g., Langlois et al., 2013; Murphy, 2015).
Lastly, while efforts to achieve clarity about intended outcomes and targets in advance are the norm in social programs, at least in principle, some actors in the SI sphere argue against being specific about outcomes in advance, so that these can emerge from the process and be defined with or by participants (Gamble, 2008).
Vision. Our review of the literature made us cognizant of the language around transformation or disruption in the vision of SIs. Also evident is use of terms such as “adaptation” or “resilience” in the face of changing conditions. The use of these terms differs from the emphasis on ongoing incremental improvement or spread of effective practice more often encountered with conventional social programs. This relates to some underlying assumptions, related to complexity, and the interest of SIs in tackling systems-level issues with new approaches.
Assumptions. The explicit use of complexity language is prevalent in the SI literature, as is the assumption of change, which is often an underlying characteristic in descriptions of SI. This is particularly the case in North America, where authors such as Patton (2011), Westley and Antadze (2010), and Westley (2013) have been quite influential. This is applied in the rationale for the focus on process, iteration, and systems change or adaptation. Complexity is understood to be present at multiple levels: intrinsic to the SI process (involving multisectoral or multilevel collaboration), to the problem the SI seeks to address (the so-called wicked or intractable social issues, such as poverty, that are influenced by many interdependent variables), and/or to the overall context in which the SI and the problem are embedded (seen as dynamic, subject to changes in demographics, economies, etc.; Preskill & Beer, 2012).
In our experience with conventional social programs, we more often encounter focus on complicated aspects of the process, problem, or context, which more easily lend themselves to modeling or are more amenable to prediction and control.
Implications for Evaluation Design
Focal point
Evaluators would be advised to anticipate process to be a core interest of SI stakeholders. Particularly in the early years of an initiative, evaluation can play an important supporting role. For example, evaluation can serve to illuminate and dispel the so-called magical thinking by helping stakeholders articulate and assess linkages between their activities and their vision, surfacing and questioning assumptions about how processes are believed to advance the initiative, and drawing on related research if available (e.g., on the nuanced relationships between diversity and creativity; see, e.g., Roberge & van Dick, 2010) to help practitioners build robust theories of change to underpin their processes.
Means
The emphasis on experimentation and iteration in SI is likely to vary in degree among specific initiatives. Where strongly emphasized, the evaluation can focus closely on capturing and facilitating systematic interpretation of feedback after each iteration to advance learning (see, e.g., Gamble, Van Sluys, & Watson, 2015).
Cross-sectoral collaboration is a common feature of the SI initiatives described in our sample of studies. This can include the involvement of multiple types of funders. Collaboration may also be more intensive than in other interventions. Perhaps unsurprisingly, our findings also indicate the likelihood of conflict occurring among actors in SIs and SI evaluations. A utilization focus, noted in several of the studies reviewed, can ground the evaluation in stakeholder needs and increase the likelihood of use (Patton, 2008) but can be especially challenging to apply when primary stakeholders’ needs and interests sharply diverge (see, e.g., Cabaj et al., 2015), or if some stakeholders are not at the table (see, e.g., Williams, 2015). Evaluators can support productive collaborations by helping stakeholders identify shared principles of engagement (e.g., Murphy, 2015) and using these as a form of “common grammar” (Snowden, 2011) among collaborating groups. Systems approaches can support systematic assessment of the inclusion of perspectives (e.g., Williams, 2015). Assessing actions over time against principles and providing credible feedback can be a purpose of the evaluation design (Patton, 2017).
The positioning of the evaluator may also be important when navigating relationships among actors in an SI. We noted that most of the studies in our sample described evaluators embedded in the SI teams, or evaluation teams with a hybrid structure (members “inside” the SI and members “outside”). Close interaction or partnership with the SI team can be useful for understanding context and building trust (see, e.g., McKegg et al., 2015) but can make it more challenging to “stay outside the fray” or to be seen as neutral with “no personal or organizational agenda” where that adds to credibility (Langlois et al., 2013, pp. 42–49). It has been suggested that the evaluator role be explicitly “named” to make it easier for stakeholders to accept critical feedback (Patton, 2011, p. 219). Hybrid evaluation teams (e.g., Asher et al., 2015; Togni et al., 2015) may be an option to consider in planning an evaluation that includes embedded evaluators.
Where persistent or entrenched differences in perspective between stakeholders translate to different information needs (i.e., in terms of what is credible, their level of decision-making, and timing of decision-making cycles), incorporating multiple methods and tailored reporting into the evaluation design may help (e.g., Allen et al., 2015; McKegg et al., 2015). Well-established methods for stakeholder analysis in evaluation can be useful here, although stakeholder needs may emerge over time as collaboration evolves. Allocating resources to coaching stakeholders throughout the evaluation (i.e., revisiting purpose, questions, feasibility, and methodological scope) is reported by study authors to be advisable (e.g., Cabaj et al., 2015). The degree and nature of diverging perspectives, and how they will impact the evaluation process, are not always easy to identify at the beginning; negotiating flexibility into the design, so that it can be adapted over time, may be warranted.
Outcomes
SI commentators (e.g., Lorinc, 2017) advocate for patience on the part of funders while SIs experiment and learn from mistakes, suggesting longer than usual time horizons for observable outcomes (e.g., a decade), and “suspending the fetish for outcomes” in the meantime (p. 3). Although accountability was not a primary purpose of evaluation in most of the studies in our sample, it remained an important secondary driver. In SI literature, risk aversion is frequently positioned as a fault of the “traditional” social sector and the enemy of innovation (e.g., Lorinc, 2017). In the case of SI, where vulnerable people are likely to be at the receiving end of interventions, accountability may be a healthy driver if it promotes critical thinking. The question is, accountability to whom or what (Patton, 2011)? Evaluation can serve a useful function by clarifying the focus of accountability, for example, to diligence in the learning and adaptation process and to reducing risk of adverse outcomes in innovative initiatives (e.g., Gamble et al., 2015; Togni et al., 2015). Evaluation can mitigate the risk of longer-term investment in innovative approaches by providing technical support to experimentation, raising awareness of the potential for unanticipated adverse effects and monitoring for signals, helping to ensure purposeful and careful trial, and ensuring learning is captured and applied to adjust course where indicated (e.g., Gamble et al., 2015).
Helping stakeholders to differentiate between outcomes objectives at different levels (e.g., at the level of SI processes, the initiative as a whole, the organization or collaboration, or wider system impacts; Ebrahim & Rangan, 2014) and relating these to the scope and maturity of the SI (Preskill & Beer, 2012; Urban, Hargraves, & Trochim, 2014) can help direct stakeholders to useful and meaningful outcome-related evaluation questions. Unexpected outcomes (positive and negative) may be important to capture, especially in novel initiatives, and/or when an initiative’s effects may be distributed among many groups (as may be the case in multilevel/cross-sectoral SIs; see, e.g., the use of outcome harvesting for this purpose by Wilson-Grau et al., 2015). Also important are time and capacity to adapt the evaluation as learning accrues. Asher, Foote, Radner, and Warren (2015) and Gopal et al. (2015) offer examples of designing evaluation around multiple rounds of structured discussions to make sense of the evolving initiative, informed by field data.
Vision
Although latitude may be needed for outcomes objectives to change over time, either due to dynamic contexts or an SI’s emphasis on co-creation, an initiative’s vision may provide needed cohesion and direction for SI actors. Evaluation design that integrates theory of change approaches and/or systems frameworks (see, e.g., Williams & Hummelbrunner, 2009), and includes participatory methods, is likely to be valuable where it can help actors articulate and engage with the implications of a vision (e.g., for social change or vertical systems transformation vs. lateral spread) for the initiative’s actions. Articles in our sample describe helping initiatives stay “on course” over time by questioning deviations between actions and stated intentions (e.g., Langlois et al., 2013), and encouraging consideration of potential adverse outcomes, so that stakeholders’ thinking frame is wide. Murphy’s (2015) account of evaluation work with the Homeless Youth Collaborative is an example of developing principles to guide a collaboration working toward systems-level change.
Assumptions
Our findings suggest that assumptions of actors are unlikely to be homogeneous within the SI and may change over time as the SI matures. However, awareness of the implications of complexity thinking for evaluation is well advised for evaluators entering the SI arena. Some assert that a complexity lens requires rethinking fundamental ideas in evaluation; however, discussion of this topic is beyond the scope of this article and is available elsewhere (e.g., Callaghan, 2008; Imam, LaGoy, & Williams, 2006; Patton, 2011; Mowles, 2014; Westhorp, 2013; Williams, 2015).
The focus on process, iteration, continuous change, vertical scaling, transformation, and assumptions of complexity all have obvious implications for evaluation design. However, it is essential to note that no two SIs are the same (Spila et al., 2016), and the differences between SIs and conventional social programs may be viewed as interrelated continua, not a set of binary distinctions.
We also assert that it is important not to form conclusions too early in an evaluation process about what is appropriate for evaluation on the basis of use of the term SI. SI initiatives may change over time. As noted by Mulgan et al. (2007), “much of what we take for granted in social policy and service delivery began as radical innovation: promising ideas and unproven possibilities” (p. 9). Thus, some SIs, having gone through stages in a sequence (Mulgan et al., 2007; National Endowment for Science Technology and the Arts [NESTA], 2014), or an “adaptive cycle” (Social Innovation Generation, 2016; Westley et al., 2006), may in time become established interventions (Preskill & Beer, 2012; Urban et al., 2014; Westley & Antadze, 2010), even though they may still consider themselves SIs (see Figure 1).

Figure 1. The “adaptive cycle” of social innovations. Adapted from Westley and Antadze (2010).
As pictured in Figure 1, at early stages (closer to the center of the cycle), when the context is very dynamic or the SI is less stable, evaluation aimed at development and rapid feedback among stakeholders might be most useful. In contrast, at later stages (closer to the outward arrow on the right), a more stable SI might be looking to refine and consolidate, so a retrospective formative evaluation might fit. As noted by Preskill and Beer (2012), an SI may have components at different stages of maturity at any given time. In summary, evaluators tasked with evaluation of SIs may wish to remain flexible and pay attention to the characteristics that these innovative projects exhibit and adjust their design accordingly. This is aided by soliciting input from stakeholders throughout the process to ensure that the design of the evaluation reflects the changing nature of the initiative.
Limitations
Our search was confined to English language peer-reviewed articles and edited book chapters published between 2000 and 2015. This meant that systematic integration of gray literature was outside the scope of this study. However, we recognize that this realm of literature may be capturing important findings that have not entered the academic sphere. The use of secondary data was another limitation as we were confined to the authors’ published accounts of the evaluations being studied. Our final sample yielded only 28 studies, and many of them did not discuss design of their studies in great detail. Our findings provide a snapshot of the current state of research in this area: Evaluators working in SI contexts are treading on new terrain, and RoE of SI is still in its formative years.
Concluding Remarks
Defining SI is controversial. On the one hand, too much defining of SI might be counterproductive if it limits our imagination of what is possible and where solutions might lie (J.W. McConnell Family Foundation, 2016). On the other hand, definitions do and will continue to matter. Dahler-Larsen (2016) notes that “innovation” is a concept that carries with it unquestioned assumptions, for example, that innovation always results in a positive change. Definitions can also be political; therefore, a degree of care is crucial in how they are constructed and used. Built-in assumptions can hinder the development of good evaluation design and create a level of immunity to the feedback and critique necessary for evaluation to perform its core function.
There is plenty of room for productive debate about what is or is not an SI and we encourage this debate to continue. However, hard distinctions might not be useful if they limit or distort how we imagine or evaluate these new initiatives. For this reason, rather than push for an absolute definition, we have sought to develop a flexible heuristic framework (see Table 1) to help us when we are faced with the opportunity to contribute as evaluators in SI contexts. Just like social innovators who fashion new solutions and interventions out of existing ideas and resources, evaluators have a wealth of unconventional and conventional approaches and tools available that might be useful for SI evaluation. A starting point, however, is to understand what we are evaluating in each case. As such, this ongoing conversation is in need of more high-quality research. Having found preliminary empirical support in our sample of SI evaluation studies for the framework presented in this article, we propose it as a jumping off point for future research to further develop, refine, and/or confirm its components.
Furthermore, we invite scholars to explore how design decisions influence the outcomes of evaluation and what consequences these may have on how SI develops. We also encourage evaluators working in the SI domain to continue to share their experiences from the field. For example, more research is needed on identifying what influences the adaptive cycle of SIs and if and how evaluation plays a role in this transformation.
Footnotes
Authors’ Note
A previous version of this article was presented at the annual meeting of the American Evaluation Association, Atlanta, GA, USA, October 2016.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has benefited from financial support from the University of Ottawa and infrastructure support from the Centre for Research on Educational and Community Services (CRECS). The views and opinions expressed in this article do not necessarily reflect those of the sponsoring organization.
