Abstract
When evaluating the effectiveness of an intervention, the evaluation approach must match the intervention complexity, ensuring that the chosen evaluation is “fit for purpose.” For simple interventions, evaluating short-, mid-, and long-term outcomes is appropriate. However, for complex interventions, an additional outcome that must be considered is the essential system property that emerges as a result of the interaction of interdependent intervention components. By focusing on the emergent system property, evaluators are better able to assess the holistic effectiveness of a complex intervention. This article illustrates this principle through a comparative case study of a simple intervention and a complex intervention within a National Institutes of Health (NIH) funded Clinical Translational Research center. The analysis shows that when a complex intervention is deemed to be operating and functioning as a system, evaluating it as a system yields a more effective and appropriate evaluation than treating each component as independent and evaluating the short-, mid-, or long-term outcomes of each component.
What we already know:
• A systems approach is a better fit for evaluating complex interventions operating and functioning as systems.
• System Evaluation Theory (SET) is a framework for practitioners to evaluate complex interventions acting as systems.
• There is available literature detailing how to define and evaluate complex intervention interdependencies.
The original contribution the article makes to theory and/or practice:
• Evaluators first evaluating complex interventions struggle with understanding what an emergent outcome is; they most often want to equate it with the long-term outcomes in logic models with which they are familiar.
• This is the first article that raises evaluator awareness about the difference between evaluating “outcomes” and “emergent properties” using an actual evaluation to illustrate the difference.
Introduction
This article uses a comparative case study to illustrate how adopting a systems approach when designing and implementing outcome evaluations of complex interventions produces significantly different practices and results than evaluating multiple components as simple, independent interventions using traditional approaches such as logic modelling.
Interventions go by many names, including strategies, services, activities, programs, policies, partnerships, and collaborations (Scheirer, 2013). The goal of an intervention is to bring about some change in its participants; that change is the focus of an outcome evaluation. Interventions can also vary in terms of their complexity. A simple intervention has “simple linear pathways linking the intervention and its outcome” (Petticrew, 2011). A complex intervention “involves at minimum multiple components and a complex pathway…” (Guise et al., 2017).
The evaluation literature is characterized by considerable debate as to whether an intervention with many moving parts should be labeled as complex or complicated (e.g., Williams & van ‘t Hof, 2016 vs. Rogers, 2008). However, it may be more important for evaluators to define and evaluate the operation and function of an intervention (Renger, 2015; 2022), rather than worrying about how it is labeled. That is, it is important to understand how the component parts of a complex intervention interact with each other (i.e., their operational purpose) and the purpose of the interaction of these parts (i.e., the functional purpose). Whether the operation and function of a complex intervention is labeled as complex or complicated is purely an academic exercise in every pejorative sense (Checkland, 2000).
The key for evaluators is to match the evaluation approach to the intervention complexity, ensuring that the chosen evaluation is “fit for purpose.” Sometimes evaluators only focus on a component of a complex intervention because of their desire to achieve clarity through simplicity (Hummelbrunner, 2011) or because the prospect of evaluating all components, their interactions, and the purpose of those interactions is overwhelming (Renger, 2022). As Renger (2022) noted, while it is important that the evaluation approach be appropriately aligned to capture the complexity of interventions, there is no need to overcomplicate the evaluation of simple interventions. For example, simple interventions (e.g., physician reminders) have been demonstrated to increase behavioral compliance in diverse medical and social settings (e.g., Benger & Pierce, 2002; Kessler, 2016). However, complex social problems like homelessness require multi-component interventions that must coordinate efficiently with each other to be effective (Renger, 2022). It is not difficult to imagine that the approach needed to evaluate a homelessness intervention would therefore be very different from the approach used to evaluate a reminder system. But in what ways would the evaluation approaches differ, and what influence does that have on evaluation practice?
Renger (2022) argues that an evaluation approach aligned to an understanding of system properties is best suited for evaluating complex interventions, assuming they meet the system test; that is, that they are operating and functioning as a system and that the multiple components are not just “a bunch of stuff” (Meadows, 2008). System Evaluation Theory (SET) is one approach tailored for evaluating complex interventions that is deliberately aligned to the two system properties of interdependence and emergence (Renger, 2015, 2022). Within SET, evaluating interdependence involves defining the intervention components and how they interact with each other. System principles like feedback loops, cascading events, and reflex arcs are then applied to evaluate whether the intervention components are operating as intended. The interested reader is referred to Renger (2022) for detail as to the methods and steps in defining and evaluating the interdependence of a complex intervention.
Once it is determined that a complex intervention is operating as intended, then the focus can switch to evaluating its effectiveness (Renger, 2022). One aspect of evaluating a complex intervention’s effectiveness is to evaluate whether its essential property is emerging (Renger, 2022). Evaluating the emerging system property involves defining the product that emerges as a result of system interdependencies. Because this property is the product, not the summation, of the interactions among the parts, the adage that the whole is greater than the sum of its parts is inaccurate (Ackoff, 1994).
This article uses a comparative case study of an ongoing evaluation to illustrate how an outcome evaluation differs when treating multiple intervention components as independent versus a single, complex intervention consisting of multiple interdependent components and how an understanding of emergence influences evaluation practice (i.e., data collection and recommendations).
The case example
The Dakota Cancer Collaborative on Translational Activity (DaCCoTA) is a National Institutes of Health (NIH) funded Clinical Translational Research (CTR) center. The DaCCoTA consists of six “cores,” including the Biostatistics, Epidemiology, and Research Design Core (BERDC). Each core provides specific support to facilitate the overall goal of the DaCCoTA: to increase the amount of clinical and translational research focused on cancer treatment in the region. To achieve this goal, the DaCCoTA helps novice researchers develop sustainable clinical or translational research programs.
Simple intervention
The evaluation approach prescribed by NIH is to develop a logic model for each core. To illustrate, consider the BERDC as though it were operating independently. The BERDC provides biostatistical support to medical researchers through a team of expert statisticians and researchers. Providing statistical support can be considered a simple intervention because the underlying theory of change is relatively straightforward and linear (Petticrew, 2011). For example, one key piece of statistical support is training researchers how to conduct a power calculation. The theory of change for this support is as follows: if a researcher receives power analysis training, then they will be better able to complete a power analysis, and this in turn will lead to a more competitive research proposal. In such cases, where the intervention is simple and the theory of change is linear, the prescribed NIH logic model is appropriate and fit for purpose.
A logic model was developed for each core in conjunction with their respective leadership. The BERDC logic model is shown in Figure 1. Each of the logic model outcomes is evaluated using mixed methods. For example, the short-term, intermediate, and long-term outcomes in Figure 1 are evaluated using a knowledge test, a self-efficacy survey, and grant review scores received on the statistical criterion, respectively.
Figure 1. The Biostatistics logic model for a simple intervention.
Complex intervention
As noted above, the DaCCoTA consists of six interdependent, not independent, cores. Success hinges on cores interacting with each other, “handing off” the researcher to other cores at certain key points in the researcher’s journey. The DaCCoTA cores and strategies for evaluating their interdependence have previously been described (Souvannasacd et al., 2022; Renger et al., 2020).
From a system perspective, then, the DaCCoTA is a complex intervention of which the BERDC is now understood to be one interdependent component. The BERDC is no longer viewed as operating independently, but rather as working collaboratively with other cores in support of the researchers. Understanding that the DaCCoTA is operating as a system, the authors (evaluation team) recognized that evaluating the effectiveness of the DaCCoTA required going beyond the prescribed NIH logic models and the simple collection of data on the short-, mid-, and long-term outcomes of each core independently. Instead, evaluating the effectiveness of the DaCCoTA required evaluating the collective effect of all of the cores and their interactions, that is, the emerging essential system property. Thus, the authors worked with the DaCCoTA leadership to identify the emerging essential system property, which was defined as Researcher Self-Efficacy (RSE): the knowledge, skills, and abilities of the researcher in the pursuit of sustainable clinical or translational research programs.
Self-efficacy has been used as a way of predicting pursuit and participation in a variety of fields, educational programs, and professions. Self-efficacy is a cognitive construct that refers to the beliefs that individuals hold about their abilities to successfully execute those activities necessary to achieve desired outcomes (Bandura, 1977; Hutchison et al., 2006). Self-efficacy beliefs can influence an individual’s behavior either negatively or positively, depending on how the individual perceives their ability in relation to a given task. For example, an individual who has high self-efficacy would be more willing to engage, work harder, and persist longer in the face of failure, challenges, and difficulties than an individual who doubts their own abilities (Renninger & Hidi, 2016).
To measure RSE, we adopted the Self-Efficacy in Research Measure (SERM) developed by Phillips and Russell (1994). Our underlying theory of change posited that if the DaCCoTA is functioning effectively, then the interaction among the cores and the researchers should result in increased RSE, a precursor to future clinical and translational research programs.
While each core contributes to increasing task-specific self-efficacy for specific components of research (e.g., statistical analysis, negotiating with an institutional review board, recruiting research subjects from challenging populations), RSE emerges as the product of interdependent core interactions with each other and the researcher (Ackoff, 1994). No single core can independently create RSE (Figure 2).
Figure 2. RSE emerging from the product of the interactions between interdependent DaCCoTA cores.
Baseline RSE data are collected on new researchers entering the DaCCoTA system. Changes in RSE over time are evaluated by administering the SERM annually throughout a researcher’s tenure with the DaCCoTA.
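The baseline-and-annual-follow-up design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation team's actual analysis: the function name, the researcher identifiers, and the scores are all hypothetical, and the scores are assumed to be a single summary SERM value per researcher per year (the scoring rules of the SERM are not described in this article).

```python
from statistics import mean

def rse_change_from_baseline(scores_by_year):
    """Return each researcher's most recent score minus their baseline score.

    scores_by_year maps a researcher ID to {year: summary SERM score}.
    Researchers with only a baseline measurement are skipped, because
    no change can be computed for them yet.
    """
    changes = {}
    for researcher, scores in scores_by_year.items():
        if len(scores) < 2:
            continue  # baseline only, no follow-up administration yet
        years = sorted(scores)
        changes[researcher] = scores[years[-1]] - scores[years[0]]
    return changes

# Hypothetical data: IDs, years, and scores are invented for illustration.
example = {
    "R01": {2020: 3.0, 2021: 3.5, 2022: 4.0},
    "R02": {2021: 2.5, 2022: 2.5},
    "R03": {2022: 3.0},  # newly entered researcher, baseline only
}

changes = rse_change_from_baseline(example)
mean_change = mean(changes.values())
print(changes, mean_change)
```

Because RSE is treated as a property of the whole system, a summary such as `mean_change` would be computed once at the level of the entire intervention rather than separately by each core.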
Discussion
This comparative case study illustrates important points about evaluating outcomes of simple versus complex interventions. The long-term outcome of simple interventions as depicted in a logic model is qualitatively different than the emerging essential system property of a complex intervention functioning as a system. In the authors’ experience, it is a common error among evaluators making the transition from program to system evaluation to equate a long-term outcome with an emergent outcome. Many, if not most, long-term outcomes listed in a logic model, by definition, have a chronological trait. The essential system property is not chronological in nature; it emerges as a product of the interactions of intervention parts (Ackoff, 1994). By placing the evaluation focus on the emerging essential system property rather than on a series of short-, mid-, and long-term outcomes, we can assess the effectiveness of the complex intervention as a whole, rather than solely the effectiveness of specific intervention parts.
Another common problem with logic models is that they encourage leadership to hold programs accountable for long-term outcomes which they do not control (Huntington & Renger, 2003). Long-term outcomes found in program logic models are often not only unrealistic for a program to change (Scriven, 1991), but are qualitatively different than an emergent outcome. Evaluating a complex intervention’s emergent property does not preclude holding individual intervention components responsible for changing and evaluating outcomes within their direct and immediate control. In our DaCCoTA evaluation, we developed the prescribed logic models to hold cores responsible for those outcomes in their direct and immediate control to change as well as evaluating their collective effect, that is RSE, the emergent DaCCoTA property (see Figure 2).
Interdependence is a prerequisite for emergence (Renger, 2022). Thus, there is a logical sequence for applying a system approach to evaluating a complex intervention. In our case example, a necessary first step was to define the complex intervention that is the DaCCoTA. To do this, we followed the first step of SET (Renger, 2022): working with leadership to define DaCCoTA components and with core directors and coordinators to define the interdependencies of core standard operating procedures. Readers interested in the details of this first step are referred to Renger et al. (2020). As Williams (2014) correctly warned, blocks and arrows do not a system make. The evaluator must therefore begin by establishing that an intervention with many components isn’t simply “a bunch of stuff” (Meadows, 2008) and is in fact operating and functioning as a system (Renger, 2022). Because this first step established that the DaCCoTA operates as a system, investments in evaluating the DaCCoTA emergent property were justified.
In our evaluation of the DaCCoTA effectiveness, RSE was defined as an emergent system property. It is important to note that one cannot assume, and we do not assert, that this is the “correct” or only emergent property. Based on the DaCCoTA core structure, leadership reasoned RSE would be a reasonable and important emergent property to analyze. Chalmers (2006) refers to essential system properties like RSE that might be predictable from an understanding of the system interdependencies as “weak emergence.” However, essential system properties can also be more difficult to predict because they emerge as a result of the product, not the summation, of the interactions among system components (Ackoff, 1994). Chalmers (2006) uses the term “strong emergence” to refer to an emergent property that is less predictable or that is unexpected. If our evaluation should reveal that RSE is not emerging, then the results of the interdependency evaluation would need to be brought to bear to determine the recommended course of action (Renger, 2022). For example, if the evaluation of interdependencies revealed core inefficiencies, then the recommendation would be to first improve the interdependencies (a prerequisite for emergence). Alternatively, should the evaluation of core interdependencies suggest the cores are operating optimally, then our recommendation would be to engage in deeper inquiry with DaCCoTA awardees, staff, and leadership to identify and define other, unexpected (strong) emergent properties.
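The decision sequence described in this paragraph can be written out as a small Python function. This is a hedged sketch of the logic as stated in the text, not a tool used in the DaCCoTA evaluation: the function name, the two boolean inputs, and the wording of the returned recommendations are assumptions made for illustration, and the branch for the case where RSE is emerging is not described in the article and is included only to make the function complete.

```python
def recommend_next_step(rse_emerging: bool, interdependencies_efficient: bool) -> str:
    """Map the two evaluation findings onto a recommended course of action."""
    if rse_emerging:
        # Assumption: this case is not spelled out in the article.
        return "continue monitoring RSE"
    if not interdependencies_efficient:
        # Interdependence is a prerequisite for emergence, so fix it first.
        return "improve core interdependencies"
    # Cores operate as intended yet RSE is not emerging: look for other,
    # unexpected (strong) emergent properties.
    return "inquire with awardees, staff, and leadership about other emergent properties"

print(recommend_next_step(rse_emerging=False, interdependencies_efficient=False))
print(recommend_next_step(rse_emerging=False, interdependencies_efficient=True))
```

The ordering of the checks mirrors the logical sequence in the text: the emergent property is interpreted only in light of what the interdependency evaluation found.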
Evaluating complex interventions from a system perspective has significant implications for evaluation practice. In terms of data collection, the responsibility for data collection cannot fall on individual intervention components. Friedman (2005) notes that the responsibility for collecting data and evaluating the emergent property must reside at a higher system level, which he refers to as the community level. In practice this means that emergent data must be collected by those overseeing the entire intervention. In the DaCCoTA context, the authors (i.e., the evaluation team) assumed responsibility for collecting RSE data. Contrast this outcome evaluation approach to that where each core is treated as an independent, simple intervention. Under the scenario where cores are treated as independent and not interdependent, each core would assume responsibility for evaluating the outcomes listed in their logic model. Not only does treating cores as independent, simple interventions add data collection burden, it also fails to evaluate the collective effect, that is, the emergent system property.
Evaluating the DaCCoTA as a complex intervention acting as a system also influences the nature of the evaluation recommendations. While logic models might be sufficient to determine how well certain tasks are performed within a core, only RSE, the emergent property, can tell how well the system itself is functioning in relation to the researchers. By using RSE to inform evaluation practice and feeding the results back into the system in an ongoing fashion, we were able to identify operational gaps that would not have been apparent from simple logic models alone. For example, the SERM, our tool to evaluate RSE, might indicate that a researcher lacks confidence in completing a human subjects approval application. This information is relayed to the cores in the form of an evaluation recommendation. The cores would then be prompted to reach out to the researcher to provide them with the extra assistance they might need to successfully complete a human subjects approval application. We would then evaluate the success of these interactions through structured interviews with researchers and core coordinators focusing on using system principles like the feedback loop, cascading events, and reflex arcs to evaluate core interdependencies (Souvannasacd et al., 2022).
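The feedback step in the example above, in which a low-confidence SERM response becomes a recommendation relayed to the cores, could be sketched as follows. Everything specific here is an assumption for illustration: the function name, the item labels, the scores, the cutoff value, and the mapping from tasks to cores are hypothetical (the article does not say which core supports human subjects applications, so a placeholder name is used).

```python
def flag_low_confidence_tasks(item_scores, task_to_core, threshold):
    """Return (core, task) pairs for tasks scored below the threshold.

    item_scores maps a research task to one researcher's self-efficacy score.
    task_to_core maps a task to the core assumed to support it; tasks with
    no mapping are reported as "unassigned" so they are not silently dropped.
    """
    recommendations = []
    for task, score in item_scores.items():
        if score < threshold:
            core = task_to_core.get(task, "unassigned")
            recommendations.append((core, task))
    return sorted(recommendations)

# Hypothetical responses from one researcher; the scale is invented.
item_scores = {
    "statistical analysis": 4.2,
    "human subjects approval application": 1.8,
    "recruiting research subjects": 3.9,
}
# Hypothetical routing table; "Core A" is a placeholder, not a DaCCoTA core name.
task_to_core = {
    "statistical analysis": "BERDC",
    "human subjects approval application": "Core A",
}

flags = flag_low_confidence_tasks(item_scores, task_to_core, threshold=3.0)
print(flags)
```

In this sketch the flagged pair would prompt the responsible core to reach out to the researcher; whether that outreach worked would still be assessed separately, through the structured interviews described above.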
Only by recognizing and adapting evaluation to the complexity of the intervention can an evaluator avoid either unnecessary complexity (e.g., by treating a simple intervention as complex) or over-simplifying the evaluation (e.g., treating a complex intervention as a composite of simple, linear, independent interventions). An evaluation must be appropriately tailored to the level of intervention complexity to be optimally useful (Renger, 2022).
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research and/or publication of this article: This work was supported by the National Institutes of Health [award number U54GM128729].
