Introduction
This book summarizes the foundations of program evaluation as represented in classic works by evaluation pioneers and seeks to bring their ideas up to date with reflections from the book's authors. It republishes seminal papers and adds to them reflections on how their ideas sit in today's world. There is a focus on diversity and cultural issues in program evaluation. Authors whose seminal work is reproduced command their own section and theme: they include Michael Scriven, Carol Weiss, Stafford Hood, Michael Quinn Patton, and Eleanor Chelimsky. The reader is given a study and reflection guide at the end of each section. In summary, the contents follow this pattern: classic work by seminal author—contemporary reflection by Wingate et al.—pedagogical guide to learning from the foregoing. It is both history and curriculum.
The Context of This Publication
Program evaluation lies at a “cusp” moment—hence the importance of books like this which create space for us to reflect on our theory and practice. The artificial intelligence (AI) revolution gathers pace, threatening conventional notions of validity and authenticity; political populism reduces debate to assertion and accusation; the public sphere is in rapid decline—and advances in intercultural understanding are rolled back. This evaluation landscape is different from that when the classic works as set out in this book were first written. Are we evaluators still connected to our history? Or are we entering a new world in which we have to learn to reposition ourselves? The basic promise of books like this remains: how do we reflect on evaluation “roots” and recontextualize them in the contemporary world? Let me cite a historical moment.
In the 1990s, Bob Stake was giving a seminar on the state of program evaluation at the Center for Applied Research in Education (CARE—United Kingdom)—a pioneering center for qualitative evaluation, cousin to Stake's Center for Instructional Research and Curriculum Evaluation (CIRCE—United States). To stunned silence, he opened with, “It's over.” I was one of a group under the leadership of Barry MacDonald, promoting Democratic Evaluation. This was not what we were thrilled to hear.
Stake continued by talking of his father who owned an old-fashioned drugstore that sold everything from aspirin to ice cream and cola. This all-embracing eccentricity lasted a single generation before being overwhelmed by commercial specialization on the high street. It disappeared. Stake likened the two: drugstore and program evaluation, a one-generation phenomenon. Evaluation was conceived in an era of expansionism and the return to progressivism in the United Kingdom and the United States. That era was fast ebbing, leaving us stranded.
If Stake was ahead of the time, time catches up. We live in a world of ever-expanding authoritarianism and assertive government, hardly friends to open, inquisitive evaluators. This is a post-truth world in which AI displaces data-management jobs with machine ruthlessness. Why bother with expensive evaluators when ChatGPT can do the work at minimum cost and with no argument? I talk, mostly, of political contexts in the Global North. In any event, and more so in the Global South, evaluation is co-opted into the administrative system—whether by straitjacket contracts or straight internalization. To whom do we report—to elected community representatives? Rarely, if at all.
Do we put up the “CLOSED DUE TO CIRCUMSTANCE” sign in our shop window, or do we seek ways to evolve to match the environment? Either way, the future of ethical independence and “responsiveness” seems flaky, at best. “Bad news”—even complex news—is outlaw. Why commission a study of the “value” of a program if value is assumed and proclaimed to start with? More disturbingly, does the evaluator risk dismissal, or even censure, for highlighting cultural and ethnic variables? If this is the context, what, then, for a book which aligns us with our professional history and with our sensitivity to cultural variables?
Reviewing This Book
The book is structured around chapters pulled from seminal works by authors including Michael Scriven, Eleanor Chelimsky, Stafford Hood, and Michael Quinn Patton. Each seminal contribution brings with it a single theme, which is then elaborated by our authors and brought up to date. The themes are: Value in evaluation; Stakeholder engagement; Politics in evaluation; Cultural responsiveness; Utilization; and Meta-evaluation (defined by one author as reflexivity). We see, in each case, how a seminal thought can be extended and adapted in the light of experience with what we have learned from the practice of evaluation. Much of the book rests on a tension between the theory and the practice of our discipline: for example, “the connection between ethics as a discipline and the professional practice of evaluation, or the practical-moral dimensions of practice.” The authors reflect on some core concepts, holding them up for judgment against today's issues and dilemmas.
It is important to add that there is, indeed, a chapter that reflects the world we increasingly live in—by Rakesh Mohan. As the director of an evaluation office at the U.S. state level, he shows how his office has, hitherto, been able to withstand political and legal assaults aimed at closing him down.
So, how do we read this, and other new books on program evaluation? Core Concepts in Evaluation is both out of its time and in its time. “Out,” in the sense that much of its content has been overtaken by the circumstances I talk of. The book shows fine scholarship, an elaborate but tight structure, and a carefully considered take on issues of evaluation roles, goals, culture, and, to a lesser extent, methods. It brings together a group of intellectuals from the hemispheric North and South, as well as those from hegemonic and indigenous cultures. But it lifts off into immediate political turbulence.
However, the book is “in” time in the important sense that it stakes out a position statement, reminding us of what is threatened with loss, what is worth struggling to protect as independent evaluation faces its suppression.
A Critical Reading
The book starts with Michael Scriven's much-cited definition of evaluation as assessing the merit, worth and significance of a program or product. The brilliance of Scriven's foundational work is interrupted only by the puzzling absence (in most of his work) of the political and ethical dimensions of his definition. What are the politics and ethics of telling hard-working program people that their work has no “merit,” or is “insignificant”?
But this is to signal one of the important contributions of this book. It looks critically at foundational ideas. Scriven's definition of evaluation is expanded, for example, to include “ethics” and “value” (not that these were ignored by him). The authors point to Scriven's intellectual orientation, which was to Philosophy, and most significantly to the use of logic to address issues of the validity of practical knowledge. His tendency, therefore, was toward theory and logical process. Nonetheless, as the book states clearly, he made many epistemological foundations for evaluation firm and solid, while leaving space for exploring the practical complexities of program evaluation—the politics and ethics.
It is not frivolous to suggest that Scriven's commitment to logic, which underpins so much evaluation thinking, forms part of the fabric of “colonial” values dealt with critically in this book. The authors recognize that revisiting seminal works risks “amplifying privileged voices and reinforcing a historically inequitable canon of evaluation literature,” and so they establish cultural responsiveness and critique as an underlying and much-unifying meta-theme throughout.
The authors do note, but do not fully explore, that criteria of merit and worth will be contested—as will the agenda of evaluation (which is, in fact, argued strongly in a chapter on evaluation in Africa). One author points to “contestation in the way different actors within the evaluation system value and prioritize use.” In early experimentation with program evaluation, conflicts over “value” were widely addressed—perhaps more by Eleanor Chelimsky. But Ernest House and Barry MacDonald had already concluded that politics was not simply a feature of evaluation, but that evaluation is a political process per se. “Political” in the sense that evaluation has to position itself in the battleground for meaning and value. This is reaffirmed by this book as it enjoins evaluators to enter the political fray to advocate for minority views and indigenous/community values and priorities—to “decolonize” evaluation, and to acknowledge liminal groups as solid stakeholders.
For the evaluator to make a judgment of the kind promoted by Scriven is a political intervention, certain to define winners and losers. MacDonald resolved the tension with a focus on negotiation as a tool for methodology and validity. (The authors of this book note that Chelimsky's term was “translation,” that is, each constituency being made aware of others’ realities.) The premise of negotiation defines the evaluation role and makes it challenging to arrive at Scriven's “final synthesis,” which this book identifies as the final determination of value.
Over time, evaluation validity came to be seen as bound up with the democratic necessity to represent divergent views—for the evaluation to remain both independent (of the sponsor) and impartial (in relation to likely outcomes). Patton emphasized the practical utility of evaluation as a key criterion for evaluation. There is, in fact, a short section in the book directed at the issues of Validity. But the discussion barely touches on the pragmatics of “contestation [among] different actors.” The identification of evaluation criteria is a fundamental pillar of the authors’ take on validity, as it is in the validity literature. Yet, where much of the rest of the book firmly places criteria in the camp of the political and cultural validity of evaluation—how evaluation resolves the inevitable competition over what counts as a plausible criterion of judgment—this section avoids the issue and, thereby, introduces a fascinating tension in the book.
The overlap between politics and ethics in evaluation is obvious here, and both are continuously addressed in this book. Political realities are sometimes treated more lightly than we need today, although the book was written before the political maelstrom that now engulfs program and policy evaluation. The concept of “politics” in this book is mostly confined to the exercise of hierarchical power in culturally hegemonic systems, whereas today, evaluators encounter a more dispersed kind of power in the form of ideological tensions which arise from the competition over criteria for judging the quality of a program. Much of this has been deliberated by evaluators under a consideration of democracy: democracy in evaluation; evaluation in a democracy. This receives too little attention in the book. The same goes for ethics. The book talks a great deal about evaluation ethics, not least in collisions between divergent cultural value systems. But, again, it misses out on some of the complexities involved. Central to this is the issue frequently addressed by Ernest House and closely linked to contestation over judgment criteria: that a non-contingent rule of ethics cannot prevail in particular circumstances in local contexts. Ethical norms too easily contradict each other; they are frequently mutually bracketing. In the end, evaluation ethics have to be negotiated in situ. Again, this dimension of evaluation is severely tested in a world where ethical rules are increasingly stipulated and controlled by government and the administrative system.
Many of these issues are seen throughout the book through the lens of engagement, a term drawn from the work of Carol Weiss, another of the seminal authors cited (though it was later fully articulated methodologically in many forms, including those such as Democratic, Dialogic, Responsive, and Empowerment Evaluation). This perspective on engagement surfaces through discussions of stakeholder involvement, evaluation utilization (with special reference to Michael Quinn Patton), and sponsor relations along the dimension of “colonial values”—and inclusion; “The intersections of [evaluation] use and culture,” as the authors put it. Evaluation is promoted as “translating” culture and values across stakeholder groups—especially sponsors, whose pre-formulated questions too easily suppress indigenous and diverse views.
One of the book's more powerful takes on the stresses and strains of culture in evaluation implicates the field of international development—most vulnerable to claims of colonial baggage. Here, the critique of evaluation is at its sternest in the book, a challenge for Western evaluators that is intensifying: “working with people to fundamentally reimagine what development systems should look like goes beyond the scope of most evaluations and involves positioning the evaluation function outside the structure of existing projects and programs.”
That is, we need to resist the co-option of program and policy evaluation by sponsors, freeing up the evaluator to be responsive to the communities on whose democratic behalf they work. (Shockingly, the Millennium Development Goals and the current Sustainable Development Goals were/are not subject formally to a program evaluation embracing the views of those being “developed.”) How, in a development context or back home, do we respond to the lofty aspiration of this book, that the evaluator needs to be “the custodian of the rigorous and independent process of facilitating imaginations and choices in support of a well-governed development process”?
“Custodian of … facilitating imaginations.” How far this book carries us from early, workaday conceptions of the evaluation task! The book bravely stands in confrontation with contemporary political values. The source of today's retreat into populism is economic austerity and the erosion of common wealth. But this has, inevitably, cast us into austerity of imagination and compassion, certainly in the United States, where denial of diversity is part of the federal policy agenda. The book's call for “the decolonization of evaluation funding practices” argues that cultural exclusivity seeps into evaluation through the stated priorities of sponsors.
This theme is revisited in the section on meta-evaluation. Here, again, we hear from the developing world in a piece which, once more, argues persuasively that evaluation in the Global South differs in “scope, ownership, and purpose(s)” from that of the North. Meta-evaluation is promoted to expose evaluation approaches that smuggle in Western values, and to distinguish the approaches and capacity needs particular to the South. A second paper in this section offers meta-evaluation as an offshoot of Donald Schön's Reflective Practice, partially expressed, for example, in the tendency for country evaluation associations to publish evaluation standards.
The seminal paper leading this particular theme is taken from Daniel Stufflebeam, whose treatment of the subject is far from comprehensive, in my view. Much more influential on the field were House's meta-evaluations of the Follow-Through and Push-Excel programs, and Stake's outstanding review (titled Quieting Reform) of the Charles Murray evaluation of the Cities-in-Schools program. These elevated the program evaluator's role from that of a reporter on the quality and productivity of a program to that of a theorist of change. Not to that of an engineer of change, as some advocate today, but to that of its critical analyst. As MacDonald and, later, Cronbach argued, evaluation was “the means by which society learns about itself.” It was realized (in a way that, once again, some chapters in this book just edge on saying) that evaluating a program makes change empirically observable. What would the public sphere be like without such a resource?
Why This Book Is Relevant for Students of Evaluation
Today's student of evaluation hovers at the door to a changing academy, one of dwindling tolerance for diversity, experimentation, and risk. As I have said, much of this book gives today's student a historical perspective and a pedagogical guide for reflecting on it. Where have we come from? What will be your response to this political context? It is to the evaluation student of today that we look for a redemptive culture. Holding to the ethical and cultural standards that this book suggests is no easy task for the research student. Unlike in my time, we see universities increasingly seeking to control research studies, including through sometimes over-policing research ethics; through restrictive regulatory expectations (e.g., that doctoral research must be preceded by a detailed design and definitive research questions, which is inappropriate for emergent designs such as case study); and through insisting on often prescriptive methods training. Students may expect this to intensify in response to external threats to the university, such as the so-called “war on woke.” So, the advice for evaluation students is not just to read this book, its review of seminal ideas and their modern-day adaptations, but also to develop a personal reason for reading it. How we read evaluation texts is perhaps as important as what we read.
For Evaluation Teachers and Supervisors
“Core Concepts in Evaluation” has equal promise for evaluation supervisors. Let me explain. A little snippet of history gives yet more context to the book under review. In 1978, at the University of York, United Kingdom, Lawrence Stenhouse orchestrated a pioneering conference on case study methods. At that time, the educational research and evaluation community had almost no practitioners and little experience of case study, which was, however, rapidly being embraced by research students. One of the conference's principal themes was this question: How do we supervise case-based research dissertations if few supervisors (and examiners) have experience of the method? Of course, the issue is only relevant given the expectation that doctoral students, in particular, may not only come up with “original knowledge,” but also do so in original ways, i.e., methodologically. As for supervisors, the advice on reading this book is to do so with an expert eye on changing contexts, since what once stood as a recognized accomplishment might now be taken to be outlaw. This book is a valid and useful teaching text, for sure. As such, it can be treated as a space within which to look at how good evaluation practice today is meeting these new contexts.
It is to students and supervisors of evaluative inquiry that we look for creative advances. Core Concepts in Evaluation edges on that insight mentioned earlier (“facilitating imaginations”) without fully articulating it. Yes, it implicitly promotes the concept of “custodianship of imagination,” surely needed now more than ever, and it is saturated with this potential for program evaluation. Again, the book is timely in reminding us of just what is at stake in today's world of menace. Sherman Alexie once wrote: “Imagination is the politics of dreams; imagination turns every word into a bottle rocket….” At its most effective, this book encourages us to light the fuse on that rocket.
