Abstract
While evaluation is seen as a mechanism for both accountability and learning, it is not self-evident that the evaluation of niche experiments focuses on both accountability and learning at the same time. Tensions exist between the accountability-oriented needs of funders and the learning needs of managers of niche experiments. This article explores the differences in needs and expectations of funders and managers in terms of upwards, downwards and internal accountability. The article shows that as the multi-stakeholder contexts in which niche experiments take place give rise to various requirements, tensions in evaluation are essentially a specific manifestation of tensions between niche experiments and their multiple contexts. Based on our findings, an adjusted accountability framework is proposed, including several strategies that can reconcile a learning approach with accountability needs in niche experiments aiming to change current practices in a more sustainable direction.
Introduction
While today both accountability and learning are considered important motives for programme or project evaluation, the literature shows that it is not self-evident that evaluation focuses on both motives at the same time. Different scholars suggest that tensions and trade-offs exist between accountability and learning as reasons for and results of evaluation (Chouinard, 2013; Feinstein, 2012; Guijt, 2010; Newcomer and Olejniczak, 2013; Van der Meer and Edelenbos, 2006). These apparent tensions between accountability and learning pose challenges to evaluators. Although evaluators are increasingly asked to facilitate and support learning, the call for accountability remains and, despite best efforts, often gains priority – hence the need to find ways to reconcile the two. This article investigates these tensions in more detail in the context of niche experiments that aim to change current practices in a more sustainable direction through a learning process. These experiments seek to develop sustainable solutions in practice, while integrating issues related to people, planet and profit-making. For this reason we define them as having a Triple P purpose (de Wildt-Liesveld et al., 2013). Increasingly, learning-oriented evaluations are employed to evaluate system innovation programmes of which niche experiments are a part, but these evaluations do not always meet the requirements of funders. Evaluators can become caught in a web of tensions between accountability and learning. In the evaluation of niche experiments, expectations often shift gradually: from in-depth reporting on learning experiences, failures and mistakes at the start of the project to discrete results and clear recommendations towards the end.
The article starts with a theoretical exploration of accountability in the context of evaluation, distinguishing three types of accountability: upwards, downwards and internal accountability. Subsequently, we study the needs of both the funders and project managers of niche experiments with a Triple P purpose with regard to the three types of accountability, resulting in an adjusted accountability framework for the evaluation of niche experiments. Moreover, two alignment strategies are derived from our study, which may support the design of evaluations that satisfactorily address both accountability and learning purposes simultaneously. We conclude with a reflection on evaluation as a way to develop the (dynamic) capabilities that are necessary to realise Triple P ambitions in today’s complex and changing society.
Niche experiments
Triple P projects, which aim to give equal priority to people-, planet- and profit-related values, are notoriously hard to bring to fruition because of dominant market mechanisms. In recent years, specific types of innovation programmes, system innovation programmes, have aimed to facilitate Triple P projects by providing a protected space, a ‘niche’, in which selection pressures prevailing in current regimes, such as regulatory requirements and demands for competitiveness and profitability, are reduced during the most vulnerable start-up period (Geels and Schot, 2007; Schot and Geels, 2008; see Figure 1b). Without this protection the projects would risk losing their radical, Triple P nature, and soon turn into business as usual (see Figure 1a). At the same time, these projects find themselves in several institutional contexts that cannot be ignored: all actors involved in the development of the Triple P project have to deal with requirements from their respective institutions; participating entrepreneurs have to weigh the expected return on investment; participating scientists are required to publish in peer-reviewed journals; participating policy-makers need to show output in terms of realised policy objectives; and participating NGOs need to show their supporters to what extent their interests are incorporated. Moreover, system innovation programmes support niche experiments through different means, often including financial support, to protect them from the selection pressure of the mainstream market, which brings in a financial accountability dimension (see Figure 1b). Thus, whilst niche experiments, by their very nature, are protected from surrounding systems, questions of accountability to external parties do arise.

Figure 1a. Unprotected Triple P initiative.

Figure 1b. Protected Triple P initiative.
Accountability
According to the Webster dictionary, accountability means ‘an obligation or willingness to accept responsibility or to account for one’s actions’. The latter part of this definition holds true especially for public officials, as they are accountable to their electorate; for businesses, as they are accountable to their shareholders; but also for individuals, who account for their own actions in all domains of life. In the scientific discussion around the relationship between NGOs and their funders, accountability is commonly defined as ‘the means by which individuals and organisations report to a recognised authority and are held responsible for their actions’ (Edwards and Hulme, 1996: 967). In this context accountability is mostly understood in terms of reporting on activities and the outcomes of these activities; organisations are expected to report to what extent the designated money has been spent on the designated purposes (Najam, 1996). Organisations need to report to funding organisations, such as donors, investors and governments, that set reporting requirements for the initiative or enterprise they fund. This is also true for niche experiments, which are generally funded through large governmental innovation programmes, such as the Dutch interdepartmental programmes ICES/KIS I-III, to strengthen the country’s research and development capacity or through departmental innovation programmes in particular domains (e.g. agriculture or mobility).
Evaluation as a mechanism for accountability
There are several mechanisms for accountability, such as annual project reports and financial audits. Evaluation in particular is a widely used mechanism for accountability (Ebrahim, 2005; Ovens et al., 2012). In practice, evaluation for accountability usually takes the form of assessing the effectiveness and efficiency of a programme or project (Feinstein, 2012). Effectiveness refers to the question whether programmes or projects are generating the expected output (Lehtonen, 2005), while efficiency refers to the question whether the right means for achieving the objectives were employed, often answered on the basis of a cost-benefit analysis (Feinstein, 2012). The role of the associated performance measurement or programme evaluation is generally to gain insight into the outputs and outcomes (Jann and Wegrich, 2007), based on the systematic collection and analysis of information about the activities and results (Van der Knaap, 2001), to provide evidence of the impact and effectiveness of the programme (De Lancer Julnes, 2006). Often logic models (Chen, 1990), logical frameworks (e.g. Baccarini, 1999), programme theories (Patton, 2008) or theories of change (Connell et al., 1995; Fulbright-Anderson et al., 1998) are developed to specify the rational underpinning of a particular programme by explicating ex-ante the expected causal relations between inputs, activities and desired outcomes. Subsequently, in practice, these models or frameworks are often used ex-post as a frame of reference to assess whether, and to what extent, programme goals and objectives have been achieved.
While it is legitimate for a funder to require insight into how their funds have been spent – whether it is an investor in a new business or a donor of a non-profit organisation (Ebrahim, 2005) – the above-described focus of evaluation on accountability in terms of predefined goals and relationships with interventions is also perceived to have considerable unfavourable effects (see e.g. Benjamin, 2008; Ebrahim, 2005; Joshi and Houtzager, 2012; Lehtonen, 2014; O’Dwyer and Unerman, 2007; Perrin, 2002; Richmond et al., 2003). As organisations often rely on positive evaluations for future funding, satisfying the donor’s needs for information becomes not only a time-consuming activity – especially if this is not considered relevant for internal decision making (Ebrahim, 2005) – it also increases the incentive for ‘strategic misrepresentation’ of evaluation findings to ensure funding is continued (Lehtonen, 2014). Another adverse effect of the emphasis in evaluation on accountability to funders is that discrete, measurable and proven approaches using planned project management are favoured over innovative, uncertain and more risky approaches that require emergent governance approaches (Ebrahim, 2005; Lehtonen, 2014). These dominant forms of programme evaluation presuppose a relatively stable programme, whose activities, goals and intended effects can be univocally described (Regeer et al., 2009). However, a growing body of literature shows that many contemporary projects, programmes and initiatives, ranging from those that aim for social change as a response to persistent problems (e.g. sustainable development, poverty alleviation), to so-called megaprojects (e.g. nuclear waste facilities, large infrastructural projects), and Triple P projects, have longer-term change objectives that are more intangible (Ebrahim, 2005) and, moreover, objectives that are redefined over the course of the process (De Wildt-Liesveld et al., 2013) which, in turn, may fundamentally change the nature, scope and rationale of the project (Lehtonen, 2014). As Ebrahim (2005: 61) points out, for an organisation ‘that aims to feed schoolchildren a daily warm breakfast, there may be no problem with regular reporting on the number of children fed. But for an organization that aims to address broader public policies concerning urban poverty, such measures may provide limited useful information on how to tackle long-term systemic change.’
Similarly, in the case of megaprojects, many evaluation approaches ‘erroneously assume that the complexity and irreducible uncertainties inherent in megaprojects can indeed be controlled through careful ex ante planning of appropriate governance measures’ (Lehtonen, 2014: 283, referring to Sanderson, 2012).
Naturally, this has consequences for the role evaluation can play in these processes. Whilst goal-oriented evaluation is an important mechanism for accountability to funders and other external actors, it does not sufficiently take into account the emergent nature of complex projects and their multifaceted environment. In recent years, the approach to policymaking has developed from a more government-centred policy paradigm to polycentric networks of governance (Hajer, 2003) where power is dispersed over numerous involved actors (Fischer, 2006). This new multi-actor and multi-level character of contemporary policymaking has been accompanied by a focus on learning, which becomes more important in such contexts due to increased uncertainty and complexity of potential policy impacts (Van der Meer and Edelenbos, 2006). This shift has gone hand in hand with the adoption of an evaluation paradigm with a focus on learning. Within this evaluation paradigm, rather than focusing on pre-set output and outcomes, evaluation aims to support ‘the process of continuous reflection on visions, strategies, actions and contexts that enable continual readjustments’ (Guijt, 2010: 281). Increasingly, emphasis is placed on incorporating evaluation in the intervention process (see e.g. Friedman, 2001; Patton, 2008) by performing evaluation ex-durante rather than ex-post. Also, since Guba and Lincoln’s (1989) well-known work on fourth generation evaluation, the call for involving not only policy agents, but also the supposed beneficiaries and potential victims of an intervention has gained prominence in the design of evaluation frameworks and the gathering and interpretation of data. Thus, in order for evaluation methodologies to support learning, they should be participatory (e.g. Cousins and Earl, 1992) and responsive (Abma and Stake, 2001) to the learning needs of the evaluation stakeholders.
More recently, there has been a call for a systemic perspective in evaluation and a proliferation of associated approaches aiming to accommodate complex situations, involving non-linear change processes with feedback loops and intertwined influencing factors (Cabrera et al., 2008; Kurtz and Snowden, 2003; Rogers, 2008; Williams and Imam, 2007). Specifically in the context of ambitious system innovation projects and programmes, reflexive evaluation approaches have emerged. These are designed not only to deal with uncertainty and diverse perspectives, but also to challenge the systemic lock-ins that reproduce the unsustainable state of affairs through dominant power relations, existing rules and infrastructures (Arkesteijn et al., 2015; Regeer et al., 2009; Taanman, 2014). An example is Reflexive Monitoring in Action (Van Mierlo et al., 2010a), which facilitates recurrent reflection on long-term goals in the light of the intermediate results and continuous interactive learning in order to contribute to system change. In this approach, the reflexivity of a niche experiment pursuing systemic change is regarded as its emergent systemic property in terms of its ability to interact with as well as affect the institutional context in which it operates (Arkesteijn et al., 2015; de Wildt-Liesveld et al., 2015; Van Mierlo et al., 2010a, 2010b).
Table 1 sets out, at the risk of oversimplifying matters, the main differences between goal-oriented evaluation, which is usually conducted for accountability purposes, and system innovative learning-oriented evaluation.
Table 1. Comparison of evaluation for accountability and evaluation for learning for system innovation.
According to Guijt (2010), two ideas keep accountability and learning apart in evaluation practice: first, the deeply rooted notion that accountability is not learning; second, the idea that intervention theories represent reality rather than provide a simplified map rationalising a small part of reality. Several practising evaluation scholars have contested the prevalent assumption that accountability and learning, and the related mainstream evaluation systems, are methodologically and practically irreconcilable (Guijt, 2010; Van Mierlo et al., 2010b; Zapico-Goñi, 2007). Suggested ways to overcome tensions, and combinations thereof, are: a) redefining accountability (e.g. Ebrahim, 2005; Perrin, 2002) or diversifying its meaning for different policy contexts (e.g. Zapico-Goñi, 2007); b) articulating the complementary values of learning and accountability to be used as design criteria for the evaluation approach (e.g. Van Mierlo et al., 2010b); and c) reformulating accountability evaluation questions from a learning perspective by regarding accountability as one of the learning purposes (Guijt, 2010).
While both accountability and learning can be considered important for the success of Triple P projects that are protected with subsidies in the context of system innovation programmes, in our personal experience it is not self-evident that evaluation focuses on both motives in an integrated manner. While in many cases both motives are attended to, it is usually with separate evaluation approaches, applied in different phases of the experiments and targeting separate audiences. The tensions between accountability and learning require extra effort to reconcile them in one unified evaluation approach for niche experiments. To explore ways to align them, let us return to the concept of accountability and examine potential relations to learning more meticulously.
Upwards, downwards and internal accountability
The common conceptualisation of accountability that we have discussed so far has also been referred to as upwards accountability (Ebrahim, 2005; Edwards and Hulme, 1996), functional accountability (O’Dwyer and Unerman, 2007) or financial accountability (Richmond et al., 2003). It concerns relationships with funders and often reflects a hierarchical relationship (hence ‘upwards’ accountability) between ‘principal’ (in this case the funder) and ‘agent’ (the project), where the principal uses evaluation ‘as a tool to ensure desired results (directing and motivating action)’ (Benjamin, 2008: 326). However, in the corporate world, development projects and the public sector alike, it is recognised that organisations are accountable to multiple actors, notably to those that are affected by the programmes and projects they execute. For instance, in the case of health care, accountability is defined as ‘a multi-layered and multifaceted component of the ethical relationship of health care providers and institutions to patients and consumers’ (Cassel and McParland, 2002: 250). And in the field of development cooperation, NGOs are considered to be accountable to ‘groups to whom NGOs provide services’ including communities or regions indirectly affected by NGO programmes (Najam, 1996: 345, as cited in Ebrahim, 2005: 60). In this field, this type of accountability has been termed ‘downwards’ accountability (Ebrahim, 2005; Edwards and Hulme, 1996); however, it is closely linked to the concept of ‘social accountability’ that has been introduced in the public sector, which concerns the accountability of governments towards society (Vedung, 1997). It has also been formulated as the ‘ongoing and collective effort [of actors in civil society] to hold public officials accountable for the provision of public goods which are existing state obligations’ (Houtzager and Joshi, 2008: 3).
Learning-oriented evaluation approaches open up many possibilities to enhance downwards accountability, particularly participatory approaches that involve multiple stakeholders in the endeavour.
Finally, next to upwards and downwards accountability, organisations, programmes and projects are said to be accountable to themselves and to their own mission, which is referred to as ‘internal accountability’ (Ebrahim, 2005: 60) and relates to commitment to continuous improvement of practice (Perrin, 2002). In this case, evaluation is not employed to assess progress towards externally defined objectives, but towards internal goals and missions. Moreover, in situations of changing external environments and unexpected effects, evaluations have the potential to assist in making timely adjustments, of objectives as well as programme activities, and to facilitate day-to-day decision-making.
Aligning accountability with learning for system innovation through evaluation
Although evaluation for accountability and evaluation for learning in practice are often experienced as oppositional (as in Table 1), a closer examination of the concept of accountability (following Ebrahim, 2005, see also Andreaus and Costa, 2014; Siddiquee and Faroqi, 2009) allows for a more nuanced view, with different forms of evaluation appropriate for different types of accountability (see Table 2). We would like to emphasise that the evaluation approaches we mention may be used to enhance multiple types of accountability; they are not mutually exclusive. However, we do believe that certain evaluation types may present a more suitable choice, depending on the character and aim of the project or programme.
Table 2. Reconciling accountability and learning through evaluation.
The first type of evaluation – evaluation for upwards accountability – clearly serves the purpose of assessing performance, or effectiveness, as well as efficiency of organisations, programmes or projects by gauging resource use in relation to services provided and impacts achieved. Learning may occur by considering the outcomes of the evaluation, in terms of lessons learned, to improve subsequent projects or activities. However, it has been argued that the potential contribution of evaluation for accountability to subsequent decision making about future actions is limited (e.g. Kirkhart, 2000; Patton, 2008), unless outcomes are so poor that future funding is threatened (Ebrahim, 2005). An important reason is that evaluation is commonly performed by external evaluators, whose results are publicly available. As a result, those under evaluation may be more inclined to defend their decisions, rather than be willing to internalise the findings and learn from them (Van der Meer and Edelenbos, 2006). The second type of evaluation corresponds to downwards accountability and starts from a multi-constituency perspective (D’Aunno, 1992), involving relevant stakeholders at different stages of the evaluation process, to assess not only the outcomes or impacts of programmes, but also to ‘reassess the desirability of those very outcomes’ (Ebrahim, 2005: 70). In the past two decades, participatory evaluation practices have proliferated under banners ranging from multi-stakeholder evaluation, participatory action research and 360-degree evaluation to empowerment evaluation. The concept of ‘relevant’ stakeholders has been expanded to include not only the agents or key players of the project or programme, but also potential ‘beneficiaries’ or ‘victims’ (Guba and Lincoln, 1989).
The third type of evaluation emphasises the potential of evaluation to contribute to internal learning processes and decision-making by working together closely with project teams or staff during the process of intervention, and is closely related to the process of organisational learning and change. The aim is to conduct the evaluation in such a way that results can directly be used in practice (utilisation-focused; Patton, 2008), are aligned with the learning needs of the participants (responsive; Stake and Abma, 2005) and trigger adjustments along the way. Adjustments can include practical improvements, strategic adjustments and rethinking the core driving values, corresponding to single, double and triple loop learning respectively (Guijt, 2010).
The above framework (Table 2) brings together accountability and learning through evaluation by expanding the notion of accountability to include downwards and internal accountability. The latter two could well be served by learning-oriented evaluation methodologies. However, the question remains how to align evaluation for upwards accountability with evaluation for learning towards system change. To gain insight into ways to integrate accountability and learning in an evaluation approach for a Triple P project, we examined the views of both funders and project managers of programmes and projects aiming at sustainable development, which we refer to as niche experiments with a Triple P purpose. This article compares the needs and requirements of project managers and funders regarding the evaluation of these multifaceted initiatives. Based on their experiences, we formulate strategies that can be put in place by managers to comply with the requirements of funders in a way that will not jeopardise the flexibility of the evaluation approach required to stimulate learning towards system innovation.
Methodology
This study explores the tensions between accountability and learning towards system innovation, and options to align these motives in evaluation, as experienced in the context of niche experiments with a Triple P purpose (addressing, amongst others, sustainable agriculture, mobility and regional development). This exploration is based on a qualitative data set, comprising 30 semi-structured interviews conducted between 2008 and 2012.
The 30 semi-structured interviews comprised 17 interviews with project managers and 13 interviews with funders. The interviewees referred to in this article as project managers were all involved in the management of niche experiments in the context of the sustainable development of agriculture. Some managers were involved as initiating entrepreneurs (n = 2), and others as consultants (n = 3). The largest number of interviewees (n = 12) were involved as intermediary project managers, appointed by system innovation programmes. All interviewed funders were involved in system innovation programmes, either from the government (national level, n = 8; provincial level, n = 2) or as funders of niche experiments conducted within system innovation programmes (n = 3).
Respondents were interviewed at their workplace and each interview lasted about one hour. All interviews were audiotaped after informed consent and transcribed verbatim. Summaries of the interviews were sent back to the interviewees for a member check. Two researchers independently analysed the interviews for needs and expectations regarding upwards, downwards and internal accountability, resulting in an adjusted accountability framework for niche experiments with a Triple P purpose.
In the next two sections we will examine the diverse needs, expectations and experiences regarding evaluation expressed by both funders and project managers. As we consider evaluation to be an important accountability mechanism, the expressed needs will give insight into the spectrum of ‘forms of accountability’ as experienced by (or required from) the interviewees. Evaluation needs, and their underlying reasons, thus reflect the broader context in which interviewees operate, and the multiple demands they need to comply with in order to reach their purpose. We will see that these complex contexts do not only form constraints, but also provide possibilities to align accountability and learning towards system innovation in an integrated evaluation approach.
Funders’ evaluation needs
When policy makers at national governmental level were asked for their main reason for administering evaluations of niche experiments and system innovation programmes, they generally expressed two interrelated needs: the need to determine the effectiveness and the need to determine the efficiency of the system innovation programme, both related to upwards accountability.
One of the policy makers described ‘effectiveness’ as “the extent to which the programme achieved the things we wanted it to achieve and thus contributed to the predefined policy goals”. Whether a programme is effective in contributing to the overall policy of the Ministry should be evaluated by answering questions such as: What are the results of the programme? What are the effects of the programme on the problem that it was designed to solve? Is the problem solved or have the activities aggravated the problem? Or as one policy maker put it: “Can you make it plausible that the programme contributed to the policy goals?” This question is important to answer as “it doesn’t matter how well you carry out your programme: the Minister has to be able to prove that it is safer or more sustainable after the execution of the programme.”
‘Efficiency’ was seen as another important objective of the evaluation of the system innovation programmes as a policy instrument. According to a policy maker “efficiency considers the question of whether the investment has been well spent.” According to these funders, efficiency thus refers to the spending of resources in relation to results obtained. For the interviewees it was important to be transparent about how the money was spent, as the financial resources are often obtained through taxes. Naturally, in the political context, financial accountability is closely related to political accountability, as politicians are accountable to their electorate, via their representatives in parliament. They require insight into the effectiveness and efficiency of policy programmes so that the Minister can answer questions from the House of Representatives (Kamervragen in Dutch). As one policy maker noted: “A question that could be posed by the House of Representatives is: Are you aware of the fact that you spent over two million Euros on the programme ‘Growing with Future’? Can you inform me on the effects of this invested capital? […] When a parliamentary question regarding our programme is posed, I have to convince the Minister in five minutes that the two million Euro investment was spent well.”
It appeared that most of the interviewed policy makers at national governmental level did not consider system innovation programmes as much different from other policy instruments in terms of the evaluation needs arising from them and they spoke more of evaluation at the programme level than at the level of niche experiments. The needs of funders regarding evaluation for upwards accountability can be found in Table 3. We found that none of the funders mentioned needs regarding downwards accountability. Evaluation needs regarding internal accountability will be discussed below.
Table 3. Evaluation needs of funders in relation to spectrum of accountability.
Although the main reason for evaluating effectiveness and efficiency is to ensure the programme is accountable (to the funder, to parliament), the process of adjusting or even terminating a policy programme also involves the funders learning from the outcomes of the policy evaluation. Internal accountability – from the perspective of programmes or projects – concerns remaining true to their own mission, longer-term vision and ideas about ethical conduct. However, interviews with funders showed that internal accountability also plays a role at the level of the funder (which we refer to as regime level): they too wanted to learn from the conducted projects to assess and possibly redefine objectives to ensure these remain consistent with their mission and longer-term vision.
First, the interviewees from national government said they learned from the system innovation programmes in order “to revise current policies”. As one of the policy makers said: “The evaluation can be used as input to revise the current policy. Some points of the policy document may not be achieved within the programme. The question then is: why they are not achieved, did we use the right resources? The answers to these questions may result in an adjustment of the policy objective.”
Second, outcomes of programme evaluation are also used to guide the content of new policies, providing thoughts and ideas for upcoming policies and programmes. As one of the policy makers mentioned: “On account of the data of an evaluation, you may develop the policy for the upcoming years. For this, you have to determine the learning points or the points that are still underdeveloped. These points you can amplify and implement in new policies.”
Thus, learning from a programme, in particular related to aspects or policy strategies with insufficient success, may form the foundation of new policies. In addition, programmes may also reveal new insights or new questions to be put on the policy agenda.
Although the main focus of policy makers’ needs and expectations regarding evaluation is on upwards accountability and, related to this, internal accountability at regime level, some of the interviewees regarded learning at niche level as an integral part of the evaluation. One of the policy makers, for example, talked about evaluating the process within the programme: “How is the process developing? Are the right actors involved? Does the programme interact with its wider environment?” We would place these types of evaluation questions under internal accountability at niche level; they are questions that help in making timely adjustments during the course of the project, to better reach the intended purpose (or even adjust the purpose).
The policy makers who mentioned this type of internal learning stated that it is important to reflect on the actions and the direction of change during the execution of a project or programme. While most interviewed policy makers did not make a distinction between the evaluation of system innovation programmes and other policy instruments, some of the interviewees suggested that especially system innovation programmes require a continuous reflection process that can be aided by new forms of evaluation.
In line with our findings, Table 3 splits internal accountability into two types: internal accountability at regime level and internal accountability at niche level.
Project managers’ evaluation needs
Project managers of niche experiments supporting Triple P initiatives talked about upwards accountability primarily as a burden, requiring a lot of time without clear benefits. According to one interviewee: The funding really put a mark on the programme, both financially and content wise. The conditions they set were limiting, for instance, in terms of what partners to involve and under what conditions, but moreover a lot of management attention was needed; half of the time we were busy with bureaucracy rather than innovation.
Additionally, the managers also faced a tension between the requirement of showing concrete results on the one hand and the necessary space to learn from the emerging properties of niche experiments on the other. As one of them stated: The government is accountable with regard to the money spent and needs to show results. Accountability and results are part of our contemporary [governance] culture. But for transitions space and direction, […] are far more important than accountability and results. Transitions are to do with trust, with giving space to make things emerge, but that relates badly to money.
This tension gives rise to challenges for project managers in terms of acquiring funds: Even though I do not have a strict plan to be executed, I do have to write one – the funder requires this. They really would not give money if we would say: ‘it depends on the entrepreneurs what we will be doing’. So, you do have to say beforehand, we will have so many meetings about this and that, etc.
These experienced tensions between upwards accountability and experimentation space suggest that new types of performance measures are required for upwards accountability; measures that better allow for the emergent process of niche experiments, and thereby allow for learning processes to occur to support innovation management. See the first row of Table 4 for a summary of project managers’ evaluation needs in relation to upwards accountability.
Table 4. Evaluation needs of project managers in relation to the spectrum of accountability.
Like the funders, the project managers also did not mention evaluation needs in relation to downwards accountability. They did, however, speak frequently about accountability towards parties other than funders and themselves. This is clearly because niche experiments by definition are multi-stakeholder projects, bringing together entrepreneurs with actors from knowledge institutes, governmental and non-governmental organisations and other businesses. In Table 4, we replaced downwards accountability with horizontal accountability, as the multi-stakeholder nature of niche experiments brings in new accountability dimensions: What I see a lot in these types of change processes is that you always need to involve more actors. This makes that you are very dependent on structures that you cannot really influence. Structures that are very robust and steady […] and that bring in many institutional interests.
For instance, scientists who participate may on the one hand contribute by bringing in relevant knowledge and taking up new research questions, but on the other hand they are accountable to their own institutions. “Universities and science need to account for how much they publish. In principle, science can really contribute [to niche experiments], but this is at the expense of fundamental research. So there is a tension there.” Other examples of horizontal accountability are the need to comply with laws and regulations: “For instance, in one of the projects, an air-cleaning system was used in the stable that was based on best practices …. However, it was innovative and not yet accepted by legislation.” Another example concerns an urban planning project with the ambition to develop houses without external sewage system, which similarly was not allowed under current legislation.
Thus, as an extension of downwards accountability – which concerns the responsibility to take into consideration all those affected by the project and involve them in all phases of the project planning, monitoring and evaluation – horizontal accountability concerns the responsibility to take into consideration all relevant parties in the environment of the project. Evaluation can contribute by assessing the diverse institutional contexts and the actors that need to be involved in the niche experiment or otherwise changing practices in order to deal with obstacles. As one of the interviewees remarked about the role of the evaluator: In one of the projects, different parties are involved; entrepreneurs, governments and knowledge institutes. They intend to realise the project together. But what we see is that citizens living in the area, and the nongovernmental organisations, have no access to this network. That is a problem in the network. In this case, the evaluator is a person that indicates that certain parties are missing.
Besides horizontal accountability, project managers also expressed the need for internal accountability at the niche level. They considered that reflection on actions taken, and decisions made, may result in adjustments that improve the effectiveness of the project. Time to reflect, however, seemed to be limited. For example, one manager indicated: “I am extremely busy and I have to make many decisions. Therefore I tend to act rather intuitively.” Another project manager expressed: “Often I am ruled (directed) by the hectic daily business. Therefore, I don’t have or don’t make time available to reflect on current practices.” Managers also said they benefited from the ‘outsider perspective’ effectuated by an evaluator. One of the project managers illustrated this as: “I need feedback from outsiders; I am not capable to pull my own head out of the swamp.” Another articulated: “Sometimes I wonder: ‘are we doing the right things? Should we direct our energy on other things?’ It is helpful to discuss these issues with someone.” Thus, ex-durante evaluation can contribute to internal reflection and making timely adjustments along the way to remain internally accountable.
As we saw above, dealing with the multi-stakeholder nature of niche experiments is a challenge to many project managers. Also, ensuring the project is accountable to its purpose and mission was one of the expressed needs of the project managers. In addition, we found that some project managers expressed a need for ways evaluation may help deal with tensions on higher levels that stand in the way of their innovative endeavours. As one of the interviewees pointed out: There are various institutional, regulatory and financial obstacles in the context of a project. However, we need to put the term ‘obstacle’ into perspective, because it could also be seen as the task or challenge of the project to deal with these hampering factors, and to find solutions within the context of a given institutional environment.
This quote signifies that project managers do not necessarily accept hampering factors at regime level as unchangeable conditions, but take up the challenge by circumventing or confronting these factors. The interviewees indicated that, for example, an open dialogue with the respective stakeholders could support this. This fits with our understanding of internal accountability at regime level, as explained previously, as it implies reflection on the mission of both experiment and policy including their relationship. Evaluation may contribute to this reflection and learning process by bringing together actors from different levels to collectively discuss the encountered obstacles for system innovation ambitions and ways to solve these.
Table 4 summarises the needs of project managers regarding evaluation and shows that these differ in various respects from the needs of funders. At the same time, both project managers and funders operate in a complex context, which requires them to engage with different actors, reflect on their realities and adjust strategies to accommodate the multiple requirements whilst keeping direction. This insight opens up opportunities for the conceptualisation of strategies to align evaluation needs of managers of Triple P experiments and funders, and hence overcome the seeming incompatibility between learning and accountability. Before discussing these, we refine the perspective on accountability on the basis of our results.
An adjusted accountability framework for niche experiments
Following Ebrahim (2005), we argued that the concept of accountability should be broadened from upwards accountability to also include downwards accountability and internal accountability (see Figure 2). Our findings allow for a further conceptualisation of accountability in the context of evaluation of niche experiments as part of system innovation programmes. We propose two adaptations.

Figure 2. The original framework for accountability (based on Table 2).
Whilst downwards accountability reflects the need to take into consideration the fact that niche experiments are accountable to multiple actors, its focus is limited to those actors that are affected by a programme. Following our findings, the first adaptation we propose is to refer to downwards accountability as ‘horizontal accountability’. This also reflects the move in governance from vertical principal–agent relationships to horizontal network governance (Benjamin, 2008; Lehtonen, 2014) and the multiple constituencies involved. In the case of Triple P initiatives, where multiple actors are involved in the governing of projects, horizontal accountability implies that project teams are accountable – besides to funders – to multiple actors including consumers, suppliers, citizens, national and international regulators, civil society and non-governmental organisations. This is reflected in Figure 3 where outward arrows indicate the accountability of niche experiments to multiple actors, including funders and intended beneficiaries. This implies that downwards and upwards accountability are integrated into the concept of horizontal accountability.

Figure 3. An accountability framework for niche experiments.
The second adaptation we propose is to distinguish between internal accountability at niche level and regime level, because work within the niche experiment may also ‘critically scrutinize and even attempt to adapt their structural context and the self-evident assumptions embedded therein’ (Grin, 2010: 274). Internal accountability refers to the fact that any programme or organisation should be held responsible to its own mission, values and ideals. This is particularly true for niche experiments, where keeping the ambition high and not reverting to suboptimal solutions under pressures from the environment is considered the primary challenge (e.g. Hoes et al., 2012; Van Mierlo et al., 2010b). It also implies taking into account the multifaceted environment of projects. This means that boundaries between what is internal and what is external to the experiment become ‘vague, fluid and subject to constant change’ (Lehtonen, 2014: 283, referring to Benjamin and Greene, 2009). Learning and reflection are said to be indispensable aspects of the management of niche experiments (e.g. Loorbach, 2010; Raven et al., 2007), due to the many uncertainties, the unpredictability of results related to their experimental nature and the multiple stakeholders that each bring requirements from their respective institutions. The internal goal, to establish a Triple P outcome, therefore self-evidently includes wider, ‘external’ changes. Internal accountability at regime level could be that a governmental department adjusts its objectives and measures as a result of outcomes of experiment and programme evaluation, that universities adapt their publication requirements for career development or that people and planet values become an integral part of accountability frameworks of companies.
The adapted accountability framework for niche experiments thus includes two shifts: the first one involves the integration of upwards and downwards accountability into horizontal accountability and the second one involves applying internal accountability not only at the level of the niche experiment and system innovation programme, but also its relevant context, the related regime. For both types of accountability, evaluation is essentially needed to create alignment between the experiment and the multiple institutional contexts (including the funders’) that are relevant to the Triple P initiative. Such empowering interaction of the niche experiment with the regime is essential for eventually taking down the niche protection measures (Smith and Raven, 2012).
Given that evaluation needs in the practice of Triple P niche experiments include the multiple facets of horizontal accountability and internal accountability at both niche and regime level, what can we learn from the strategies of project managers to align accountability with learning needs in the evaluation of niche experiments?
The need for dynamic capabilities to create alignment
For learning-oriented evaluation methodologies to contribute to alignment between the experiment and the diverse institutional settings, they would need to support project managers by making visible and stimulating reflection on the uncertainties, barriers and opportunities resulting from a changing environment in relation to project activities and their direct results. As we saw above, dealing with conditions outside the project, and changing them, is considered the task, challenge or goal of the Triple P experiment. It can be argued that the managers and innovators of niche experiments face functional task uncertainty: they know what to do, but not how to do it (Whitley, 1984). As one of the interviewees expressed it: “In principle, everything is already there; the only question is how to organise it. I think that the how-question is the most important question that each project manager faces. How can I do this?” He continued that project managers cannot stick to their normal routines: “It is more than their daily work. They really need to find the pitfalls and obstacles that have hampered the transition since years.”
Evaluation can contribute to this process of learning by the participants in the experiment by introducing routines that help to develop dynamic capabilities. Dynamic capabilities are those capabilities – both of the actors and the project – required to deal with dynamic and unpredictable environments (see e.g. Eisenhardt and Martin, 2000; Teece et al., 1997; Zahra et al., 2006). The following quote regarding the role of evaluation in the context of development organisations clearly resonates with the experiences of project managers of niche experiments: Development organisations that only monitor very selective aspects of their work and environment (such as meeting financial targets) risk deceiving themselves into thinking that their environment is stable. On the other hand, organisations that attempt to anticipate future uncertainties (e.g., by conducting strategic reviews or scenario planning that involve various levels of staff in strategic discussion and that may require alliances with other organisations such as research institutions, funders, and even competitors) may be better positioned to recognise and respond to environmental change. (Ebrahim, 2005: 76)
In recent years, various evaluation methodologies and tools have been developed that can be argued to support the process of developing dynamic capabilities (e.g. De Wildt-Liesveld, 2015; Van Mierlo et al., 2010a). The evaluator contributes by organising reflection within a project team on questions of how to engage multiple actors, how to keep ambitions high and how to keep the process open; in short, how to realise a Triple P initiative in contexts in which selection pressures, such as regulatory requirements and demands for competitiveness and profitability, prevail.
Learning-oriented evaluation can contribute to the development of dynamic capabilities that aid the implementation of alignment strategies. Alignment refers to the ‘bringing in line’ of different social worlds (Star, 2010), or communities of practice (Wenger, 1998). In other words, alignment strategies aim to create overlap between different worlds (see also Hendriks and Grin, 2007), ensuring that the innovation developed within the niche is comprehensible and logical in the eyes of relevant stakeholders (Hoes and Regeer, 2015), including funders, intended beneficiaries and other stakeholders. We have identified two alignment strategies from our data, which we present in the next section: boundary people and boundary objects.
Evaluation strategies for alignment: boundary people and boundary objects
The multiple stakeholders that are indispensably engaged in niche experiments are not only involved to bring in their experiences, views and knowledge to the benefit of developing an innovative Triple P initiative, they also form the bridge to the multiple institutional settings. They can be considered ‘boundary people’ (Edelenbosch et al., 2015) or ‘boundary actors’ (Keshet et al., 2013): people that can bridge or create alignment between different worlds. Evaluation may help to define, assess and encourage collective reflection on the actions – and outcomes thereof – of boundary people in the experiment in the light of the system innovation ambition. This thereby supports horizontal and internal accountability (the latter both at niche and regime level).
Our findings show that even though often a funder is at a distance from the projects and has little insight into what goes on in the experiment, he or she may function as a boundary person. By becoming actively engaged within the project, by for example taking part in the project team and attending meetings, funders gain more inside knowledge of the projects which makes it easier for them to report to their superiors or constituency. In this way, the evaluation requirements they impose on the experiment may become better fitted to the project practice. One of the policy makers stated: Sometimes I drive to Lelystad [city in the Netherlands], where the project is located, to talk with the project participants and to see in real life what the project is about. Being there enables the sharing of content related activities and results.
In this case, the policy maker functioned as a boundary person; his involvement within the project made it easier to answer parliamentary questions because, in his position, he had the insight required to bridge the evaluation needs of the niche experiment and those of his superiors.
In large-scale innovation programmes, relatively independent from governmental departments, it has become more common to work closely together with the experiment. As one of the funders said: “I also participate in project team meetings as one of the funders and bring in the interests of our programme. And I make connections between the project and other partners in our network, like knowledge institutes.” This role of the funder as active participant is also recognised by project managers. One of them describes the first project meeting she attended of a new initiative: One person of the funding agency was there too, but I didn’t realise that. He really was a member of the project team from the start. I never associated him with the funder. From the beginning we were a team and the project really benefitted from this later on.
Such involvement has proven to be beneficial to bridge differences in evaluation needs later on, when superiors request new types of information from the niche experiments.
A second alignment strategy that can be deduced from the interviews is to create ‘boundary objects’: objects that make sense in the eyes of different stakeholders, but that through shared use allow these stakeholders to collaborate without requiring full consensus (Star and Griesemer, 1989). Here, ‘object’ does not necessarily refer to a tacit ‘thing’; rather ‘its materiality derives from action, not from a sense of prefabricated stuff or “thing”-ness.’ (Star, 2010: 603). The Triple P idea itself can be considered a boundary object, as it needs to maintain coherence across different contexts and fields, but other boundary objects can also be created to support the process of realising a Triple P initiative, including with regard to stakeholders in evaluation. An example of a boundary object that contributed to horizontal accountability towards regulatory systems is the ‘status aparte’ (a legal status) that was granted to one of the niche experiments, based on the idea that the government should support innovations that comply with the values on which legislation is based, even if they do not comply with the existing legislation. The ‘status aparte’ is a boundary object, because it has legal status in the context of legislation and it provides the necessary innovation space in the context of the niche experiment. Another example of a boundary object, one that contributed to horizontal accountability towards funders, is the aforementioned conception of a project plan that leaves room for emerging development but at the same time is concrete regarding the number of meetings and required budget. New types of indicators are a further example of boundary objects: indicators that on the one hand do justice to the processes involved, so that the task of evaluating contributes to rather than distracts from the core business, and that on the other hand comply with the requirements of funders.
Evaluation work can not only help to signal the need for boundary objects between stakeholders, it may also be instrumental in the development of such objects by which alignment is created between niche experiments and the regime level.
In sum, appointing and supporting ‘boundary people’ and developing ‘boundary objects’ have been identified as important strategies that interviewees have used to create alignment between the niche experiment and its multiple contexts; they have aided horizontal accountability, supported by internal accountability at both niche and regime level. The importance of strategies that focus on crossing boundaries between funder and project has been stressed in evaluation literature: ‘[I]n addressing complex social problems, ends often are clarified and redefined through practice (Harmon, 1995; March and Olsen, 1985). These scholars suggest that accountability efforts should encourage dialogue and greater understanding between principal and agent (Fry, 1995; Harmon, 1995)’ (Benjamin, 2008: 326).
Concluding remarks
This article deals with ways to reconcile evaluation for accountability and learning in the context of niche experiments that support the realisation of Triple P ambitions. These niche experiments are temporarily funded by government programmes, which brings in an accountability dimension. We often see a gradual change of expectations from needs regarding internal accountability (how can we learn from and improve what we are doing) to needs regarding upwards accountability (how can we justify resources used in relation to outcomes). Project managers and evaluators are thus in need of strategies to bridge the gap, or better: to create alignment between requirements of funders on the one hand and needs of internal and other external actors on the other.
In this article we have redefined the concept of accountability and thereby proposed an adapted accountability framework for evaluating these types of projects, which reconciles evaluation goals of learning for system innovation with accountability. Horizontal accountability is central in this adapted accountability framework; it refers to the fact that a niche experiment with a Triple P purpose is accountable to multiple actors including consumers, suppliers, citizens, national and international regulators, civil society and non-governmental organisations, as well as funders. Tensions in evaluation are thus essentially a reflection, or better, a tangible manifestation, of tensions between niche experiments and their multiple contexts. Through involvement of diverse actors, and by extension their respective institutional settings, learning (as part of internal accountability at regime level) reaches beyond the boundaries of the experiment.
Accountability needs require the evaluation of niche experiments to help create alignment between the experiment and its multiple institutional contexts. Learning-oriented evaluation is essentially needed to contribute to the development of the dynamic capabilities of participants in the experiment that further aid such alignment. Gearing the evaluation to the role of boundary persons and objects in the niche experiment, in an iterative, participatory process of assessing and reflecting on progress in the light of external developments, is key to reconciling learning and accountability evaluation needs regarding Triple P niche experiments.
We believe that the insights and framework developed in this article, as well as the evaluation strategies for alignment, are also applicable to evaluation beyond the context of Triple P niche experiments. Programme evaluation, involving the analysis of (causal) links between project activities and their outcomes, is increasingly considered to be challenging (e.g. Perrin, 2002; Williams and Imam, 2007), particularly when it concerns projects or programmes that deal with persistent problems firmly embedded in society and infrastructure, involve multiple actors and find themselves in unstable environments. They are faced with interconnected ecological, social, political and economic systems, as well as with a plurality of values and perspectives brought in by the different actors involved, and as such are ‘influenced dramatically by a range of ambiguous and uncertain external and internal forces’ (van Marrewijk et al., 2008: 599). Rather than perceiving changing objectives and unintended and unexpected effects as signs of inadequate design, it is increasingly argued that the uncertainties emerging from case-specific contexts should be seen as a ‘source of learning, reflexivity and adaptive governance’ (Lehtonen, 2014: 279, see also Arkesteijn et al., 2015). The adapted accountability framework accommodates exactly these arising needs.
Evaluation is a means to an end, not an end in itself. Rather than treating accountability and learning as two separate notions, we have argued that learning may be considered a concept within the redefined continuum of accountability. Strategies to align evaluation needs, such as purposely searching for boundary objects and people, and collectively reflecting on the results of these strategies, contribute to uniting these demands simultaneously, at niche as well as regime level.
To conclude, in our contemporary society, projects and programmes become increasingly complex, requiring the involvement of multiple stakeholders for change in multiple contexts in order to deal with persistent problems. The local needs to be aligned with the global, environment with profit, human rights with health care. The development of dynamic capabilities is a prerequisite for public services, non-governmental organisations, knowledge institutions and corporations that are willing to embrace the complexities of our contemporary world and anticipate future uncertainties. We hope to have shown that the field of evaluation is in a position to contribute to this important endeavour in a significant and meaningful way.
Acknowledgements
We would first like to thank all the interviewees who were willing to share their experiences and insights with us. We are grateful to the three anonymous reviewers who have provided valuable feedback that helped us improve the paper considerably. We would like to thank Wanda Konijn and Anne Charlotte Hoes for their contribution to this study, by conducting and transcribing several of the interviews used in this study, and Lisa Verwoerd for her valuable feedback and help in the final revisions. Moreover, we also thank the four interns, Jessie, Daisy, Sandra and Marije, who were involved during the data-collection phase.
Funding
This work was supported by the former Dutch Ministry of Agriculture, Nature and Food Quality through the project ‘Monitoring and Evaluating Networks’ (grant number ond/2006/16/01); and TransForum, particularly through the project ‘TransLearning’ (KP-093).
