Abstract
This ‘roundup’ review of evaluation blogs, podcasts and webinars covers the second half of 2024. The roundup is organised around key trends that emerged over the second half of the year: (1) multi-perspectival storytelling, (2) sustainability amid the polycrisis and (3) the promise and potential pitfalls of feminist evaluation. It then discusses two meta-trends, related to the challenges of drawing evaluative boundaries and of seeking parsimonious explanations, that cut across the trends identified in the three roundups so far. The article ends with a brief reflection on some potentially promising initiatives in the evaluation field for addressing some of the bigger challenges of the day.
Different ways of knowing and storytelling
Previous reviews raised questions about approaches to indigenous evaluation discussed recently in the United Nations Development Programme’s M&E Sandbox (Tolmer, 2024), particularly a lack of reflexivity on positionality and power. I pointed to the European Evaluation Society (EES) conference in Rimini as a promising venue for responding to some of my questions, and I was not disappointed. Much of the discussion in this first Review of 2025 on multiple perspectives, different ways of knowing and storytelling is a dialogue between the online and offline worlds, the latter represented by EES 2024.
A recent blog by the Inclusive Rigour Co-Lab reporting back on EES 2024 captured many of these issues (see Inclusive Rigour Co-Lab, 2024). The authors noted that they
noticed a significant shift in discourse on this main stage of evaluation, especially since presenting our work for the first time at EES 2022 in Copenhagen. The shift towards more plural epistemologies and decolonising evaluation builds on the work of many over many years.
The blog highlighted a keynote conversation at EES between Zenda Ofir, Bagele Chilisa and Yvonne Pinto on ‘indigenous ways of knowing and decolonising the sector’. Pinto emphasised being aware of whose story is being told, encapsulated in her quotation of the African proverb that ‘until the lions have their own storytellers, stories of the hunt will always favour the hunter’. This connects to the broader issue of whose reality counts, a topic raised in several EES sessions I attended that emphasised the importance of storytelling.
Zenda Ofir’s remarks included a caution about the dangers of a single story(line). This reflection on narrative pluralism also resonates with Tapella and Menese’s (2024) blog on participatory evaluation, discussed below. Indeed, it fits squarely with the title of last year’s American Evaluation Association Conference, The Power of Story. And yet, surprisingly little has been published on storytelling in evaluation scholarship.
EES 2024 sessions that I attended spanned many countries and evaluation contexts: Canada, Colombia, Hawaii, Iraq, New Zealand, Senegal and the United Kingdom. Larry Bremner and Kate McKegg both called attention to how evaluators should engage with storytelling. Reflecting on evaluation practice in Māori communities in New Zealand, McKegg argued that we must start by considering who sets the evaluation criteria and who has decision-making power over the process and over what is produced, used and shared. Bremner further affirmed that if you pass a story on, you take on the responsibilities of the storyteller.
In an earlier Footprint blog on the Better Evaluation website, Bremner and Lee (2023) noted that ‘the foundation to telling and listening to stories is trust . . . Trust takes time to build . . . Trust is also built through reciprocity. As the evaluator, you too must share your story and give of yourself’. This raises the question of whether individuals subjected to traditional interview techniques are likely to provide ‘valid’ responses if they do not trust the interviewer. At EES, both Larry Bremner and Kate McKegg stressed the importance of reciprocity and relationality in the evaluative process itself, not merely as an ontological or epistemic perspective. The idea of evaluators reciprocating – ‘sharing part of us’ – evidently challenges traditional interviewing techniques and styles. It would likely make interviewers who view themselves as ‘objective’ and ‘independent’ uncomfortable, opening up all kinds of concerns about response bias. As McKegg suggested, when we see people as being in relationship with one another (including with evaluators), we begin to ask different questions.
This perspective from indigenous and trauma-informed evaluation and storytelling also connects to a recent blog on realist interviews and their challenges. The teacher–learner cycle in realist interviewing carries a particular risk of leading the interviewee towards answers consistent with one’s theory. Broome et al. (2024) look at the challenges of realist interviews with people with dementia. Realist interviewing is theoretically dense, and Broome et al. note that ‘it is difficult to grasp abstract ideas and make sense of theories relating to interventions/programmes’, especially for people with cognitive impairments. They point to the challenge for ‘the interviewer to re-direct people with dementia away from personal storytelling, toward an exploration of causal mechanisms of a particular phenomenon’. Yet it seems potentially unethical and unhelpful to do so. They offer a series of recommendations for being more flexible and creating a supportive environment for interviews, including the use of vignettes and visual resources.
These arguments resonate with another EES session on removing the Euro-Western lens from storytelling. Shahrazad Qassem recalled an experience in Iraq during a period of intense conflict. She noted that despite speaking Arabic, the Lebanese evaluation team lacked the appropriate cultural competence related to how Iraqis tell stories (usually in groups), and they had not been trained to conduct interviews in a trauma-informed way. This made the proposed use of Most Significant Change (MSC), which typically relies on personal accounts of change, more challenging. She argued that respondents, already fatigued by answering numerous surveys and affected by the recent trauma they faced, resisted giving straight answers to direct and individual (sometimes intimate) questions. The team was met with what appeared to be evasive responses, which were entirely understandable given the circumstances.
All of these contributions highlight the need for flexibility when we ask questions. We often enter an interview with a set of questions linked to our evaluation questions and primary units of analysis and, in the case of realist evaluation, to our (candidate) theories. We have a plan regarding what we seek to find but often leave limited room for discovering different perspectives on the multifaceted phenomena we are studying. Bremner and Lee (2023) highlight that ‘stories illuminate the importance of culture and history, including language, customs, spirituality, and physical locality’. These may not be the primary units of analysis in your study, but they may well be the things communities themselves want to share. Sanjeev Sridharan captured the thrust of this beautifully in a response he received in Hawaii: ‘our story is not a sub-set of your data – your data is a sub-set of our story’. This perspectival switch is a humbling reminder to evaluators. It does not necessarily mean that we should throw away our structured protocols, but it does mean that some degree of flexibility is necessary if we seek authentic and credible responses.
Participation, interpretation and validity
In line with this more flexible stance, Tapella and Menese (2024) wrote an interesting blog for EvalParticipativa (in Spanish – an English version is also available), offering some reflections on rigour in participatory evaluation. Reflecting on their experience over the last decade, they argued that all evaluations should make their ontological, epistemic and methodological positions explicit. Apgar et al. (2024) recently published an article in Evaluation that cited Brown and Dueñas’ (2020) useful framework on axiology, ontology, epistemology, methodology, methods and sources. This also resonates with Bagele Chilisa’s point in the EES keynote conversation mentioned above that the three primary divides in the evaluation community are our ways of conceiving reality [ontology], our ways of conceiving what counts as knowledge [epistemology] and our ways of conceiving what counts as values [axiology]. I find it interesting how the values (re)turn in evaluation, very much consistent with the reflections above on indigenous epistemologies, has allowed evaluators to have a more thoughtful conversation regarding the importance of axiology (Gates et al., 2024; Montrosse-Moorhead and Bitar, 2024). After all, what is evaluation without values?
Tapella and Menese’s (2024) reflection is chiefly epistemic and methodological. Much of the blog recalls Lincoln and Guba’s (1989) seminal work, albeit uncited. They argue that participatory evaluation is essentially an interpretive approach, concerned more with (situated) discovery and collective construction than with proving a theory. Three points are particularly noteworthy. First, they emphasise diverse forms of interpretation and comprehension of the social world rather than a single reality. They urge evaluators to avoid homogeneous interpretation and to take account of differences and tensions. This poses questions for how evaluators make what are typically singular evaluative judgements, which also tend to prioritise consensus over dissensus. Second, rather than defining instruments a priori, they argue that these should be built progressively over the course of an evaluation. Colleagues at the Centre for Development Impact (CDI) have also recently discussed this with regard to ensuring rigour in methodological bricolage. Especially for complex interventions and contexts, until we learn during an evaluation what our evaluation questions and units of analysis should be, we cannot fully know the best methodological mix a priori. Building instruments progressively requires process flexibility rather than fidelity to what may end up being an inappropriate protocol.
Finally, Tapella and Menese (2024) argue that there are no infallible means of knowing an intervention, much less valuing one, and that the best we can aspire to is reflecting from multiple interdependent sources of knowledge. This is their understanding of ‘triangulation’. They argue that credibility is achieved through triangulation, which they divide into triangulation of perspectives, sources, theories and techniques (echoing, though without citing, Denzin, 1970, 2006; Shadish, 1993). In their view, multiple perspectives are what give participatory evaluation trustworthy findings. Source triangulation is about combining information on the same scene, or from different spaces and different moments in time. They note that technique (or method) triangulation is perhaps the most used, weighing up the strengths and weaknesses of different methods. Then there is the least visible element in participatory evaluation, theory (or paradigmatic) triangulation. Tapella and Menese argue that one of the most common difficulties in participatory evaluation is identifying diverse actors and their representativeness, noting weaknesses in key stakeholder mapping and a tendency to focus too much on similar perspectives while avoiding those whose strategies and interests may be contradictory. The potential importance of dissensus should give evaluators some pause about how we synthesise evaluative judgements.
Sustainability commitments amid the ‘polycrisis’
One topic the last two roundups failed to cover is perhaps the greatest issue of our age – climate change. In the last review, published in Evaluation 30.4, I cited Patricia Rogers’ (2023) blog in which she argued that the world is at a crisis point in terms of environmental sustainability (including climate change) and equity and inclusion, with consequent increases in hunger, poverty, fragility and violence. It has been estimated that 3.6 billion people live in contexts highly vulnerable to climate change (Intergovernmental Panel on Climate Change, 2023). Ian Goldman made a similar call in a recent webinar on Decolonising National Evaluation Systems in Africa (Global Evaluation Initiative (GEI), 2024d). Until relatively recently, the scale of this crisis had not been reflected in blogs, podcasts or webinars. However, at EES 2024, there were 14 sessions on climate change – more than on any other theme. A slow trickle of material appears to have gradually moved environmental concerns towards a central place in evaluation discourse, but it was only in the second half of 2024 that I was able to detect a sense of momentum in the discussion.
Blue Marble Evaluation and Footprint Evaluation stand out as the most notable contributions to debates about evaluation and climate change in recent years. Michael Quinn Patton’s 2019 book on Blue Marble Evaluation centred a critique of the Anthropocene. However, I personally struggled with the practical implications of the book. For instance, the principle ‘Think Globally, Act Globally and Evaluate Globally’ seems infeasible unless and until programmes are themselves global rather than national or local (see also Patton, 2024, who argues that all evaluations should include environmental, social and economic sustainability and regeneration criteria). Patton seems right to emphasise, as others have, that the unit of analysis may need to change from projects and programmes to systems. But it is a far bigger ask that we should evaluate globally (with full cost accounting), as he suggested in his 2023 debate with Mike Jackson (EES, 2023).
Footprint Evaluation has taken what might be considered a more pragmatic, mainstreaming approach, while espousing a similarly global ambition to incorporate environmental sustainability into all evaluations. Unlike Blue Marble, Footprint Evaluation is not an approach but rather an effort to co-create knowledge about methods and approaches that can be used to understand the environmental sustainability of interventions (Macfarlan, 2022). Alice Macfarlan (2023) later wrote a thoughtful blog outlining relevant methods and tools. This referenced several blogs from Canada that called attention to the importance of place, and of its social and natural attributes, in how we understand the outcomes and impacts of an intervention. Rowe (2023) defines a place as ‘where the action is, where projects, plans, strategies, policies, or other types of interventions are put in motion’. Places have both social and natural attributes. He argued that
interventions occur at and cause effects to natural and human systems within a place and are affected by the systems at a place . . . [and] if an evaluation only focuses on the human system’s spatial boundaries and timeframes, it will overlook crucial processes and impacts within the natural system.
Yet this is evidently truer for accountability in a sector such as biodiversity than in, for example, education. The coupling of human and natural systems is prevalent in indigenous perspectives, often alongside spirituality (Bremner and Lee, 2023). So there are arguably further dimensions to consider when taking more of a place-based approach.
In 2024, beyond these two pillars (Blue Marble and Footprint Evaluation), we find increasing discussion of environmental and sustainable development opportunities and challenges. Geeta Batra was interviewed by Dugan Fraser on how to mainstream environmental indicators in evaluations and national evaluation systems (GEI, 2024b). In April 2024, the director general of evaluation of the World Bank’s Independent Evaluation Group (IEG), Sabine Bernabè (2024), summarised her remarks at the 4th Conference on Evaluating Environment and Development organised by the Global Environment Facility’s Independent Evaluation Office. She argued that ‘environmental considerations must be implemented across all our evaluation work in a strategically selective way for maximum impact’, in line with the Bank’s revised mission to end poverty on a liveable planet. However, rather than making the environment a default issue, she recommended that IEG ask questions such as where in policy evaluations the potential for negative unintended environmental effects might be highest. Bernabè identified as priorities taking a systems approach to evaluation and using data science to better understand geospatial phenomena such as land use, conservation, pollution and flood resilience. She also argued that ‘we need to expand the scope of our data collection and analysis and better take into account different views and voices from stakeholder groups such as indigenous communities, business owners, farmers, local government officials’. This may be a dilution of the proposals made above (Macfarlan, 2022; Patton, 2024); however, these aims are arguably more realistic for most evaluations to achieve in practice.
We also see a few recent reviews of what IEG and the International Fund for Agricultural Development (IFAD) have learned from a greater focus on environmental sustainability in evaluation (de Nys, 2024; Nanthikesan, 2024). Suppiramaniam Nanthikesan (2024) suggests that there is a paucity of evaluative experience in bringing together evidence-based win–win solutions that promote climatic, environmental and development resilience. By contrast, in his summary of lessons from two recent evaluations on reducing emissions from deforestation and forest degradation (REDD+), Erwin de Nys (2024) was more optimistic. He identified three key lessons: (1) addressing financing and capacity gaps, (2) inclusive engagement and benefit sharing, and (3) enhancing non-carbon benefits and strategic learning. On the last of these, de Nys argued that clearer guidance is needed for identifying and monitoring outcomes related to livelihoods, biodiversity, climate change adaptation, land rights, gender, social inclusion and governance, and he sees a role for amplified knowledge-sharing and communications activities. Overall, de Nys argued that ‘we are on the right path and making real progress’. Even so, there is clearly a lot of work yet to be done.
The promise and potential pitfalls of feminist evaluation
A final trend that merits further attention is how to address gender in evaluation alongside a push towards feminist evaluation. Gender has been debated in the evaluation field since the late 1990s, in the wake of the Beijing Declaration of 1995, but the embrace of feminist evaluation has been more tentative (Bamberger and Podems, 2002; Bustelo, 2003; Espinosa, 2013; Mertens, 2005; Podems, 2010). An abundance of tools has been produced, such as gender guidelines, indices, markers and even gap reports. However, there has been relatively little momentum behind, or discussion of, a coherent agenda, at least in blogs, podcasts and webinars.
Discussions on how to address gender or promote feminist evaluation have emerged as a slow trickle of material in recent years, typically with a few blogs around International Women’s Day (e.g. Kohlweg and Novakova, 2021; Lehrner and Novakova, 2024) or occasional discussions of discrete initiatives (e.g. Commonwealth Scholarships, 2024; Podems, 2023). We find some contributions from the EES’ Gender and Evaluation Thematic Working Group, the American Evaluation Association’s Gender Equity and Inclusion working group and the Graduate Equity and Diversity Initiative (GEDI) week on the AEA 365 blog. The media under review in this article are not the most conducive to deeper discussion, although these contributions sometimes include actionable recommendations, such as making gender analysis compulsory or making gender more visible in terms of reference, or quick tips, such as using round-robin questioning to make focus groups more gender inclusive (Brown, 2024; Kohlweg and Novakova, 2021). In recent months, however, two contributions from the GEI have offered a stocktake of how seriously gender and feminism are taken in the evaluation field.
The first contribution was a podcast in which Dugan Fraser hosted Elena Bardasi from the World Bank’s IEG (GEI, 2024c). The aim of the podcast was to discuss the importance of a gender lens and its links to sustainable development. The podcast sheds light on some of the gaps between rhetoric and action, and also on how powerful institutions revert to their comfort zone when addressing deeply political issues such as power. Bardasi and Fraser agreed that taking a gender lens helps to ground an evaluation in real people, or ‘beneficiaries’, as Bardasi called them. ‘Beneficiaries’ is a contested term in international development because of its association with an assumed paternalism. Nonetheless, Bardasi’s point was that when we look at how interventions affect different groups differently, we are able to see effectiveness in a new light.
An interesting part of the discussion concerned the (political) barriers to adopting a gender lens. Podems’ (2010) article, cited above, calls attention to the fact that gender was (and remains) a more politically acceptable term than feminism. Indeed, in the webinar on why we need more feminist evaluation, discussed below, participants discussed the more activist and more neutral terminology preferred by different evaluation stakeholders (GEI, 2024a). Fraser and Bardasi noted that organisations such as the World Bank do not typically use more politically sensitive words like ‘patriarchy’ or ‘feminism’. This recalls Lant Pritchett’s discussion of the World Development Report 2004, which preferred the term ‘voice’ to ‘politics’ because of sensitivity inside the World Bank to the latter word (Pritchett, 2019). There are only fleeting references to ‘patriarchy’ and ‘feminism’ in the World Bank’s Gender Strategy (2024–2030) (World Bank Group, 2024). It is perhaps unsurprising, then, that institutions like the World Bank have historically been critiqued for instrumentalising feminism (Bessis, 2001 in Podems, 2010).
A relatively easy response is to ensure the availability of gender-disaggregated data. Invisible Women (Criado-Perez, 2019) does a good job of pointing to the gendered silences and biases in our data from research and national statistics. Improving data sources, rather than discussing patriarchal social norms, is a political choice. Questions have been raised, however, as to whether gathering more and better sex-disaggregated data will reveal the complexities of gender dynamics (see the GEI webinar discussed below).
In June, the GEI (2024a) also held a webinar on Evaluation and Gender Equality: Why we need more feminist evaluations. It was held with Global Affairs Canada (GAC), the United Nations Population Fund (UNFPA), UN Women in Latin America and the Caribbean and the Centre for Learning on Evaluation and Results (CLEAR) South Asia. The contributors identified similar ways of thinking and principles, such as the goal of social justice, a transformative paradigm, intersectional lenses, participatory approaches, reflexivity and complexity-informed and mixed methods. There are clear resonances and numerous overlaps here with the framing of indigenous evaluation or inclusive rigour in a previous ‘roundup review’ (see Evaluation 30.4). Each organisation had different anchors: GAC linked the shift to its feminist policy; UNFPA to its leave no one behind strategic plan; UN Women to its tool to assess, measure and integrate a gender perspective in national evaluation systems in Latin America; and CLEAR/J-PAL to its evidence production and use ambitions in South Asia. All contributors agreed that the move towards feminist evaluation was helpful because it sheds light on the multiple realities of different groups, provides more diverse perspectives, encourages greater buy-in from communities, is more ethical and generates richer insights.
One of the most interesting contributions came from Silvia Grandi of GAC, which has pushed the feminist agenda harder than most. Grandi hailed the virtues of feminist evaluation but did not shy away from the practical challenges GAC has faced. Three of these are most instructive. First, GAC had promoted greater ownership by commissioning local evaluators, but procurement had become much more challenging as a result. Second, GAC aimed to co-define progress towards transformative change, but this raised the question of whether it was imposing a particular Northern definition of feminism rather than one that was locally grounded and owned (diverse feminisms). Finally, while promoting participation can have benefits in terms of stakeholder ownership and engagement, GAC found significant trade-offs because, as in other forms of participatory evaluation, participation takes time, and time is not cheap. So, while the contributors may reasonably conclude that we need more feminist evaluation, they helpfully engaged with what taking such an approach looks like in practice. This should allow evaluators to make better-informed choices about how to introduce feminist evaluation in the most practical ways.
Where next?
I anticipate growing attention to the effects of climate change and to environmental sustainability. The urgency of the challenge to decarbonise and to stem biodiversity loss and ecosystem collapse ought to compel the evaluation community to take these issues a great deal more seriously and to adopt a more systemic perspective on the interactions between human and natural systems. Yet evaluators have struggled to articulate how to engage with scope issues in practice. This is related to a traditional focus on intervention effectiveness, not necessarily on wider impacts or on relationships with the places and systems in which interventions are implemented.
This connects to another meta-trend that also emerged at EES 2024 – boundaries – and how and where evaluators draw them. As discussed in this and previous reviews, there is widespread concern with the (artificial) boundaries drawn around interventions or programmes. The push towards systems evaluation, alongside the prioritisation of both environmental sustainability concerns and indigenous epistemologies, suggests that the intervention (evaluation’s historical focus) may be the wrong unit of analysis for confronting the urgent challenges of the day. It seems likely that we will see a drive for evaluations to take a more holistic perspective, to pursue systemic approaches and to focus more on how context and interventions/programmes interact.
I also anticipate that we will see a fuller accounting of the benefits and costs of interventions as well as a more explicit drive towards portfolio evaluation (evaluations of big and small bet-spreading experiments across an organisation or funder) as a means to rise to these challenges. Synthesis studies and meta-reviews have been conducted by institutions such as the European Union (EU) and World Bank for years and there has been a growth of interest in portfolio approaches in some government departments, international non-governmental organisations (INGOs), foundations, United Nations agencies and multilateral development banks (see Pereverza, 2024). Notable examples include Oxfam, Open Society Foundations (OSF), the Centre for Public Impact (CPI), UNDP, GAC and the World Bank. There is considerable untapped potential here, as much of the learning on how to conduct portfolio evaluation effectively remains undocumented.
Related to the unit-of-analysis problem is the widening gap over how to offer parsimonious explanations in a world of ongoing methodological divides. On the one hand, there is a continuing trend towards packaging evidence of cost-effective ‘best practice’ with a narrow definition of value and of what works (see previous ‘Roundup Reviews’). The core pitch remains the same: conduct more randomised controlled trials (RCTs) and systematic reviews and package these in an easy-to-access database that tells policymakers ‘what works’ (Hartford, 2024). The strengths and limitations of this perspective are well understood. In particular, it offers a very limited view of what evidence policymakers should consider and how they should consider evidence to make wiser decisions.
On the other hand, there are those committed to complexity-informed, participatory, feminist and indigenous evaluation. In different but sometimes overlapping guises, these aim to move beyond the ‘what works’ paradigm to one concerned with context, causal pathways and reflexive learning; in the language of realist evaluation, ‘what works, for whom, in what circumstances, and why’. The answer to this question is, of course, ‘it depends’. While likely accurate, because context matters, this is an unsatisfying answer for policymakers seeking silver bullets, even if such bullets most likely do not exist. There is still work to be done to bridge this divide.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
