Abstract
The continued prevalence of different forms of collaborative working within public policy requires adaption in evaluation practices. In recent years evaluation toolkits, audits and guides have migrated online, but with varying success. At their worst, such tools can offer a disengaging user experience, limited coverage of issues or normative bias. This article outlines POETQ, designed to be engaging, comprehensive and methodologically robust. An overview of this approach is set out alongside an analysis of its merits. The article concludes by reflecting on the kinds of evidence that policy makers actually want to generate in relation to the topic of collaboration.
Introduction
Although collaboration has long been recognised as important in the context of public management, the election of the New Labour government in 1997 signalled a new era of ‘partnership’ for the UK. This government diagnosed that what public policy needed was ‘joined up responses’ to ‘joined up problems’ and collaboration became a core task of public organisations (Dickinson, 2014). The UK was following a more general pattern in international public management as illustrated by Kelman (2007) who argues that the topic of collaboration across government agencies and between government, private and non-government organisations is among the ‘most-discussed questions involving the performance of public institutions and achievement of public purposes’ (p. 45). Working across boundaries has become the modus operandi for governing for the 21st century (O’Flynn, 2014). Yet we still lack good evidence concerning the impact that collaborative working has in terms of outcomes and the components that are essential to drive high quality collaboration (Gajda, 2004; Woodland and Hutton, 2012).
The late 1990s and early 2000s saw the idea of collaboration spread not only through the UK but under governments of different political complexions in mainland Europe, North America and Australia. Collaboration can take a number of different forms, some of which require structural changes or legal agreements and others which are less formal and more organic in nature, relating to the ways in which individuals interact with one another. In the UK, government agencies were encouraged to form partnerships not just with other government entities, but also with a range of different public, private and not-for-profit bodies. A greater focus on the relationship that government bodies have with citizens and service users was also called for under this agenda. Such was the persistent appeal of the idea of partnership – as collaboration was typically labelled around this time – that it was applied to a whole range of different areas of public policy from welfare to service delivery to urban regeneration and economic development. Partnership proliferated across public policy to the degree that Sullivan and Skelcher (2002: 1) described this concept as the ‘new language of public governance’. Today, the language of partnership has largely dissipated although the focus on collaboration has not. In this article we use the terms collaboration and partnership interchangeably, with the latter more associated with the New Labour governments. We acknowledge that others use these terms to refer to specific forms of joint working, but in this article we are referring more to the emphasis of collaborative working than the specific mechanism for this.
An emphasis on collaboration occurred alongside a focus on evidence-based policy and practice and there was a call for evidence which would demonstrate ‘what works’ in collaboration. As a consequence many different research programmes emerged aiming to evaluate partnership working ranging from small scale investigations around front-line practice (e.g. Abbott et al., 2005; Abendstern et al., 2006; Brown et al., 2003; Carnwell and Buchanan, 2005) to large scale evaluations of, for example, Health Action Zones, Local Strategic Partnerships, Sure Start and the Children’s Fund (Barnes et al., 2005; Edwards et al., 2006; Melhuish et al., 2008; Office of the Deputy Prime Minister, 2005; Office of the Deputy Prime Minister and Department of Transport, 2006; Sullivan et al., 2006). What this frenetic period of research into partnerships demonstrated possibly more than anything else is the difficulty of evaluating collaboration. Typically evaluations focused more on the processes of collaborative working than on the sort of outcomes that this produced for citizens, service users or broader public value. As Dickinson (2008) argues, this is due, at least in part, to the complexities involved in evaluating the outcomes of collaborative working. Often the types of aspirations for collaborative working are broad (e.g. reduce health inequalities or reduce gaps in educational attainment rates) and it is difficult to know whether changes to these factors take place when the study is over a relatively short period of time or, indeed, whether any changes detected are due to the collaborative efforts and not some exogenous factors. Even when significant amounts of money were invested in complex evaluations of collaborative efforts, the evaluations were unable to capture all the possible complexities and impacts of processes (e.g. Melhuish et al., 2008).
Despite the high profile and resource intensive evaluations of the 2000s, much of what characterises the evaluation of collaboration has largely been based on rapid, diagnostic, often self-administered and structured tools. The proliferation of partnerships in the early 2000s was coupled with the creation of an array of partnership evaluation toolkits (see Jeffares et al., 2012, for discussion). Yet such tools have often been criticised as superficial, implicitly normative, preoccupied with process rather than outcome or as having an overt focus on ‘performance’ and ‘success’ (Dickinson, 2008; Dowling et al., 2004; El Ansari and Weiss, 2006; Provan and Sydow, 2008). In their wake came a second wave of tools, this time online (e.g. Ball et al., 2010; Dickinson, 2006; Jeffares and Bovaird, 2010). This article critically examines the online migration of partnership evaluation tools to highlight the challenge of designing such a tool that is at once engaging, comprehensive and methodologically robust.
The article is structured as follows. Firstly we explore the challenges involved in designing an evaluation tool that can assess collaborative working and is administered online. We then set out the design process for a new tool, POETQ, and how we ensured that it was engaging, comprehensive and inclusive. The article then reflects on the use of POETQ in a study of the experience of joint commissioning within five English localities and analyses the degree to which it achieved its design principles. As this article shows, although we succeeded in producing an online evaluation tool that was engaging to participants and which produced findings that were comprehensive of a range of themes and inclusive of a number of voices, these messages were not necessarily always what practitioners, managers and policy makers wanted to hear. Although we succeeded in providing a range of nuanced views about what local stakeholders believed that their local collaborative arrangement was attempting to achieve, in practice what people wanted to know was simply whether joint working ‘worked’ or not. We conclude the article reflecting on the forms of evidence that are needed to support collaboration.
The challenges of evaluating partnership through an online mechanism
The collaborative turn in public administration has spawned a multiplicity of arrangements that involve two or more organisations formally working in partnership. One of the challenges of the concept of collaboration is that it has many different potential names and meanings and it is likely that in practice organisations are engaged in a complex mix of inter- and intra-organisational collaborations (Gajda, 2004). Whether these be joint venture vehicles, integrated arrangements, partnership boards or other forms of joint working, they have the potential to be disruptive and expensive to set up, but they may also generate value in terms of the services provided, the meaning and identity that these give to the individuals that engage with them and the fact that they often provide a strong political statement (Dickinson, 2014; Williams and Sullivan, 2009). Yet, collaborative working is not without its potential challenges. Joint working might lead to staff turnover, redeployment or staff being line managed by a person from a different profession to their own. Although savings are often a driver for collaboration, establishing collaborative arrangements can be resource intensive and these types of arrangements often cost before they start to pay (Leutz, 1999). In the corporate sector, Hughes and Weiss (2007) find that whilst a significant number of alliances and partnerships are created each year the vast majority do not succeed, and it is likely that these figures are similar in the public sphere. Once established, partnerships can be precarious arrangements, especially following the honeymoon period. Although the collaborative advantage of partnership is widely celebrated (e.g. Huxham and Vangen, 2005), its precise contribution, or value-added, is less clear (Dickinson, 2008).
Understanding the contribution that collaboration makes is important and there are a range of reasons why public sector organisations might wish to evaluate partnership working including: to assess readiness for change; to understand the potential costs involved; to legitimise political decisions; to assess their ongoing viability; or to measure the contribution of these arrangements across a range of dimensions.
While there is a clear imperative to evaluate collaboration, it is well recognised that this is difficult to achieve (e.g. Appleton-Dyer et al., 2012; Dickinson, 2008). Partnerships have diverse memberships who may have different expectations about what that collaborative endeavour aims, or should aim, to achieve (Marek et al., 2014). As partnerships are heterogeneous entities and can incorporate a range of different types of working arrangements they can be difficult to compare with one another (Brown et al., 2012). The kinds of contexts within which partnerships operate can be difficult to codify in a clear way, but may have a significant impact on joint working if there has been a bad experience of collaboration previously, for example, or if there is a series of other changes currently underway in that locality that might confound efforts at joint working (Provan and Milward, 2001).
We now move on to think about the ways in which partnerships have been evaluated previously and the relative merits of these types of tools in dealing with the challenges set out above. In doing so we have reflected primarily on the sorts of evaluation tools developed in the UK, acknowledging that additional evaluation tools have been developed in other jurisdictions (e.g. Marek et al., 2014; Woodland and Hutton, 2012). Although there is an array of different ways in which we might evaluate partnerships, here we identify two main categories of approaches – bespoke and rapid. Bespoke responses typically involve an external team of researchers or evaluators involved in ethnographic engagement, interviews, focus groups, documentary analysis and so on. An advantage of this type of approach is the level of engagement and ability of evaluators to empathise with those involved in the research. The systematicity and comparability of these many possible types of approaches depend on the specifics of the methods adopted (see Dickinson, 2008) and these can often be expensive and time consuming.
The alternative is a rapid approach and there are three recognised types of approaches here: toolkits, audits and guides (Markwell, 2003). Guides include instructions and examples to guide partnership formation; toolkits provide activities to develop and advance existing partnerships; and audits provide a means to assess the effectiveness of partnerships and help monitor progress. Markwell’s (2003) review of some 40 toolkits, audits and guides marked a high point for partnership evaluation. Whilst these approaches take different forms, they characteristically differ from the bespoke approach outlined above: many are designed to be self-administered with an evaluator present as a facilitator and tend to focus on the view of a lone voice in the partnership. Three of the most widely used rapid approaches are The Partnership Assessment Tool (Hardy et al., 2003), The Working Partnership (Health Development Agency, 2003) and the Partnership Readiness Framework (Greig and Poxton, 2001).
Several authors have expressed criticisms of these rapid toolkit approaches. These generally fall into four types of criticism, namely that these approaches are:
Superficial (El Ansari and Weiss, 2006);
Normative in the sense that all collaborations should conform to particular ideals or standards which are often implicit rather than explicit (Dickinson, 2008);
Focused on process to the detriment of outcomes (Dowling et al., 2004); and,
Too narrowly focused on what is considered performance, or success of joint working (Provan and Sydow, 2008).
More recently improved access to high-speed internet connections and all round improvements in information technology have opened up opportunities for evaluators to move their data collection processes online. This shift is mirrored in the increased use of online surveys and polling, aided by low cost online survey applications and targeted online advertising.
The remainder of this section explores three examples of UK partnership evaluation tools designed to be administered online. The first example of a tool is the Partnership Outcomes Evaluation Toolkit (POET) (Dickinson, 2006). Drawing on the established ‘strategic assessment approach’ (Jackson, 1989), the online survey includes a series of questions to explore how effectively partners work together and their assumptions regarding the outcomes the partnership is aiming to achieve. The findings are then fed back in group settings with an aim of promoting dialogue and discussion around the purpose of the partnership. An attempt at synthesis around key assumptions is made, but if synthesis cannot be achieved points of disagreement are noted and implications discussed.
A second example of a tool is that of Ball et al. (2010) who have adapted the widely used Partnership Assessment Tool for online application. The tool was designed to evaluate Community Health partnerships in Scotland and, drawing on Hudson and Hardy’s six partnership principles, the tool presents respondents with a set of processes and outcomes and respondents specify to what extent these are a priority and to what extent they have been achieved. On average each respondent spends 90 minutes responding to the questions. The tool makes it possible to give the partnership an overall score against some 12 process principles and 11 outcome objectives. It offers a means of comparing performance of partnerships in different localities.
In the third example, Jeffares and Bovaird (2010) developed the 360 Partnership Tool to explore relationships within public-private joint ventures. Their tool was based on Q methodology (see Brown, 1980b) which aims to systematically capture the diversity of debate in the partnership (through interviews and observation). When applied online Q methodology asks respondents to rank order fragments of this debate in order of preference, a process known as Q sorting. The method assumes everybody offers a unique Q sort, but factor analysis reveals the topology of the debate in terms of the number of distinct shared viewpoints and the character of these positions (Watts and Stenner, 2005). The results are then interpreted at a series of workshops with members of the partnership.
In comparing the tools, the first observation should be that the notion of a complete shift to something remote and online is overstated. In all three examples the application allows the facilitation to be blended, in the sense that there is both private individual engagement and a public dialogue in workshops or the like. They also stand in contrast in how they are open to the many voices of the ‘many hands’ (Sullivan, 2003) engaged in partnerships. Furthermore, they offer more than a formal objective measure of performance. They seek to explore the outcome expectations or priorities of the partners. They are interested in measuring the degree of consensus present between what are sometimes diversely situated stakeholders. But with this come limitations. In the case of POET, while the emphasis on surfacing outcome expectations is the right one, eliciting this remotely, online, is troublesome (see Dickinson, 2010). This theme of engagement is also relevant when exploring the limitations of Ball et al. (2010). Here the issue is one of stamina: the ability of respondents to engage for an average of 90 minutes and give meaningful responses to 66 questions. Research into response rates for online surveys shows significantly higher response rates where surveys take between 10 and 20 minutes compared with those taking 30 to 60 minutes (30% and 18% respectively; Marcus et al., 2007), and respondents are more likely to be willing to complete a survey of 10 minutes compared with one of 30 minutes (Galesic and Bosnjak, 2009). That is not to say actors are unable to engage in an online task for 90 minutes, rather that careful consideration is required to make this an engaging experience.
Although the Q sorting used in the 360 Partnership Tool (Jeffares and Bovaird, 2010) has been celebrated as offering an engaging and ‘game-like’ user experience (Eden et al., 2005), and as such stimulates respondents to offer lengthy text based responses, there are limits to what can be understood from Q sorting alone. Further, this raises questions around the selection and methodological foundation of the statements used in the Q sort. In contrast to the 360 tool, Ball et al. cover much ground and take a holistic view of what a partnership should be.
In contrast to the programme theory of POET, Ball et al. serve to ratify a set of normative principles for how partnership should be. Their desire here is objectivity, to benchmark and compare rules without surfacing alternative accounts of success. For example, if 10 of the 20 people taking the survey indicate that the partnership is ‘giving a role to the voluntary sector’, what can be inferred? The partnership is ‘performing well’? The partnership is engaging flexible and innovative individuals? Or for that matter, the partnership is performing irresponsibly, engaging essentially private sector partners free of democratic accountability? A criticism here would be that it constrains us into thinking that there is only one way to collaborate. Table 1 summarises this comparison of the three tools. When compared in terms of engagement, comprehensiveness and inclusiveness each tool has strengths and weaknesses.
Table 1. Three online evaluation tools compared.
The purpose of the remainder of this article is to outline the development of a tool for online evaluation that draws on the strengths and improves on the weaknesses of these tools discussed above. It imagines a tool that is engaging, comprehensive and inclusive. Having set out the approach we then reflect on the degree to which the tool proved to fulfil these criteria in use and reflect on the sorts of evidence that policy makers and practitioners desire to support collaborative endeavours.
The POETQ tool
The tool we designed is essentially a synthesis of the POET tool of Dickinson and the 360 tool of Jeffares. The structure and purpose are similar to those of the POET tool but with the added incorporation of Q methodology which is present in the 360 tool. POETQ is designed to be both formative and summative in the sense that it collects insights into the processes in terms of how partners perceive that they are working together and the purpose of the collaborative endeavour. Once the purpose of the collaboration has been established, further data collection can be undertaken to explore the degree to which these aims are being fulfilled. Before moving to the Q sorting process, individuals are asked a number of questions relating to the processes of joint working covering a number of topics such as structures, leadership, culture, process and context. The kinds of topics covered in this section have a high degree of resonance with those outlined in Marek et al.’s (2014) Collaboration Assessment Tool, developed in the United States. Given the sorts of data that the POETQ tool collects, it can be utilised at any point after the formation of the joint working arrangements. As Gajda (2004) notes, collaboration is not a destination but a journey and the sort of data generated through this process should better enable the partners to further refine and improve their processes of working together.
Traditionally the Q sorting administered as part of a Q methodology study is applied face-to-face as part of an interview. It involves presenting a respondent with a set of cards, usually around 35 to 40, each containing a statement about the topic under investigation. Respondents are first asked to pre-sort the statements – to read each statement in turn and sort them into three piles: agree, disagree and a third, neutral pile for statements on which they have no view. They then refine and prioritise the cards by populating a grid that resembles an up-turned pyramid, for example Figure S1 (in the supplemental data files). In this example, respondents review the agreeable statements and select their two most agreeable – these are placed in the +4 column. They then review those pre-sorted as disagreeable and place the two most disagreeable in the −4 column. The remaining agreeable statements are then reviewed once more and the three most agreeable placed at +3 on the grid. This process continues, switching between the gathered agreeable and disagreeable statements, populating the grid from outside in. Once one of the piles is exhausted the neutral statements are brought into play. In some instances neutral statements will be around the 0 column of the distribution, but not always. The measure is relative not absolute. This Q sorting is essentially a modified ranking procedure and the completed grid is interpreted as the respondent’s point of view on the topic.
The use of a standardised grid allows for quantitative comparison of respondents. Briefly, the analysis compares the placement of the statements pair-wise, allowing for the production of a by-persons correlation matrix. This is then subjected to a centroid factor analysis – weighted factor arrays are produced from extracted factors and are interpreted as ideal-type Q sorts.
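To illustrate the first step of this analysis, the following minimal sketch computes a by-persons correlation matrix from a handful of completed sorts. It is not the routine used by POETQ or by any Q methodology package: the function names, the respondent IDs and the five-statement grid are hypothetical, chosen only to keep the example small, and the factor extraction that follows in a real study is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two Q sorts (equal-length lists of grid scores)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def by_person_correlations(sorts):
    """Correlate every respondent with every other respondent.

    `sorts` maps respondent ID -> list of column scores, one per statement,
    with statements in the same order for every respondent.
    """
    ids = list(sorts)
    return {a: {b: pearson(sorts[a], sorts[b]) for b in ids} for a in ids}

# Hypothetical data: four respondents scoring the same five statements on a
# tiny forced grid (two places at -1, one at 0, two at +1).
example_sorts = {
    "r1": [1, 1, 0, -1, -1],
    "r2": [1, 1, 0, -1, -1],   # identical viewpoint to r1
    "r3": [-1, -1, 0, 1, 1],   # mirror image of r1
    "r4": [1, -1, 0, 1, -1],   # unrelated to r1
}
matrix = by_person_correlations(example_sorts)
```

In this toy example r1 and r2 correlate perfectly, r1 and r3 are perfectly opposed and r1 and r4 are uncorrelated. Because every respondent fills the same forced grid, all sorts share the same mean and variance, which is what makes the pair-wise comparison straightforward.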
Our aim was to design a tool that incorporated an online Q sort and that reflected the process of pre-sorting statements and allocating statements to a sorting grid. Although online sorting has been used previously, early versions differed somewhat from the tactile and engaged process of selecting and allocating statements. Tools like FlashQ allowed users to drag and drop statements. These drag and drop tools were a great advance on earlier tools but were hampered by limitations in how much of the statement could be displayed and problems of compatibility with evolving operating systems and internet browsers. The challenge facing online Q sorting is not so much the pre-sort, where each statement is read in turn and allocated to one of three piles, but rather how to guide the respondent through the process of prioritising statements by allocation to a forced-free grid. The grid is typically of nine columns (see Figure S5 in the supplemental data files), labelled from −4 to +4, with place markers restricting how many items (statements) can be placed in each column, for instance two under the +4 and −4 columns, four under +3 and −3, five beneath −2 and +2. The user would then be asked to prioritise their agreeable statements first, starting by placing their top two statements in the two places under +4. The problem here is that the user is required to scroll through statements in three piles and manually drag statements into the grid. When performed face-to-face the researcher will prompt the respondent to focus just on their agree statements, to fan them out in front of them, select just two and help them place them under the +4 column. They would then repeat using the disagree statements and flip between the two in order to populate the grid.
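The place markers described here amount to a fixed capacity for each column. As a hedged sketch of how such a constraint might be checked, the code below validates a completed sort against a nine-column grid. The outer capacities (two, four and five) follow the example in the text; the three middle columns of six places each are an assumption, chosen only so that the grid holds 40 statements, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical nine-column forced grid for a 40-statement Q set.
# Outer capacities (2, 4, 5) follow the example in the text; the middle
# columns (6 each) are an assumption so that the places sum to 40.
GRID_CAPACITY = {-4: 2, -3: 4, -2: 5, -1: 6, 0: 6, 1: 6, 2: 5, 3: 4, 4: 2}

def grid_errors(sort, capacity=GRID_CAPACITY):
    """Return a list of problems with a completed sort (empty if it is valid).

    `sort` maps statement ID -> column score.
    """
    errors = []
    counts = Counter(sort.values())
    # Scores that do not correspond to any column of the grid.
    for column in sorted(set(counts) - set(capacity)):
        errors.append(f"unknown column {column}")
    # Every column must hold exactly as many statements as it has places.
    for column, allowed in capacity.items():
        placed = counts.get(column, 0)
        if placed != allowed:
            errors.append(f"column {column:+d}: {placed} placed, {allowed} required")
    return errors

def example_sort():
    """Build a valid sort by filling columns in turn with statement IDs 1..40."""
    sort, statement_id = {}, 1
    for column, allowed in GRID_CAPACITY.items():
        for _ in range(allowed):
            sort[statement_id] = column
            statement_id += 1
    return sort

valid = example_sort()
broken = dict(valid)
broken[1] = 4  # move a statement from -4 to +4, overfilling one column
```

Here `grid_errors(valid)` returns an empty list, while `grid_errors(broken)` reports both the under-filled −4 column and the over-filled +4 column.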
Our response to the challenges of online Q sorting is depicted in the figures in the supplemental data files. Figure S1 shows the user pre-sorting the statements into three piles according to how far each statement agrees with their current view of the collaborative initiative. Figure S2 shows how the respondent is presented with their agreeable statements and asked to select two. These are then automatically removed and allocated to the response grid under position +4. Figure S3 shows how the respondent is then shown their disagreeable statements and two more are selected and allocated. Figure S4 depicts how the screens flip between agreeable and disagreeable, each presenting the remaining statements and the user choosing the required number of statements to fill each subsequent column of the sorting grid. Once agreeable and disagreeable statements are allocated, those pre-sorted as “neutral” are allocated to the remaining empty spaces towards the centre. Given this is a relative process of ranking rather than rating, what constitutes “neutral” will vary between respondents.
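The alternating allocation described above can be sketched as follows. This is an illustration under stated assumptions rather than the POETQ implementation: each pile is supplied already ordered from strongest to weakest feeling, which stands in for the choices a respondent would make on screen, and the fallback rule used when a pile runs out (neutral statements first, then the weakest-felt statements from the opposite pile) is our own simplification. The function name and the nine-statement grid are hypothetical.

```python
def allocate(agree, disagree, neutral, capacity):
    """Fill a forced grid from the outside in, alternating between piles.

    Each pile is a list of statement IDs ordered from strongest to weakest
    feeling (a stand-in for the respondent's on-screen selections).
    `capacity` maps column score -> number of places.
    Returns a dict of statement ID -> column score.
    """
    piles = {"agree": list(agree), "disagree": list(disagree), "neutral": list(neutral)}
    if sum(len(p) for p in piles.values()) != sum(capacity.values()):
        raise ValueError("number of statements must match number of grid places")

    # Visit columns from the outside in: +4, -4, +3, -3, ..., finishing at 0.
    order = sorted(capacity, key=lambda c: (-abs(c), -c))
    placement = {}
    for column in order:
        if column > 0:
            sources = ["agree", "neutral", "disagree"]
        elif column < 0:
            sources = ["disagree", "neutral", "agree"]
        else:
            sources = ["neutral", "agree", "disagree"]
        for _ in range(capacity[column]):
            # Use the preferred pile; fall back once it is exhausted.
            name = next(n for n in sources if piles[n])
            # Borrowing from the opposite pile takes its weakest-felt statement.
            opposite = (column > 0 and name == "disagree") or (column < 0 and name == "agree")
            statement = piles[name].pop() if opposite else piles[name].pop(0)
            placement[statement] = column
    return placement

# Hypothetical nine-place grid and pre-sorted piles.
small_grid = {-2: 1, -1: 2, 0: 3, 1: 2, 2: 1}
example_placement = allocate(
    agree=["a1", "a2", "a3"],
    disagree=["d1", "d2"],
    neutral=["n1", "n2", "n3", "n4"],
    capacity=small_grid,
)
```

In this example the disagree pile runs out before the −1 column is full, so the neutral statement n1 is placed at −1 rather than at 0, which illustrates why a neutral pre-sort does not guarantee a central position on the grid.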
Previous validation studies have shown online Q sorting to be equivalent to face-to-face administration (Reber et al., 2000; van Excel et al., 2015); however, it was important to build in features to prevent repeated and erroneous sorting. Upon clicking on the invite link, the tool generates a unique ID and places a cookie on the browser of the respondent. The start and submission times are recorded to allow the researcher to exclude any sorts completed well below or above the median sort time. For instance, after piloting, van Excel et al. (2015) set a minimum of 10 minutes for a sort to be considered valid. Mandatory text boxes require the user to enter additional information to contextualise their Q sort, and users failing to complete these fields can also be excluded from the analysis. Given that Q methodology is a measure of inter-subjectivity, that is, shared viewpoints, any erroneous sorts will be naturally excluded, as sorts are correlated and only significantly correlated sorts are likely to be flagged on a factor.
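A screening step of this kind might look like the sketch below. The record layout (the keys `id`, `started`, `submitted` and `comments`), the function name and the upper threshold of three times the median are all assumptions made for illustration; only the 10-minute minimum is taken from the example cited in the text.

```python
from datetime import datetime
from statistics import median

def screen_responses(responses, min_minutes=10, max_ratio=3.0):
    """Split response IDs into (kept, excluded) using simple quality checks.

    Each response is a dict with hypothetical keys 'id', 'started' and
    'submitted' (datetime objects) and 'comments' (list of free-text answers).
    `max_ratio` is an assumed upper bound expressed as a multiple of the
    median sort time.
    """
    durations = {
        r["id"]: (r["submitted"] - r["started"]).total_seconds() / 60
        for r in responses
    }
    typical = median(durations.values())
    kept, excluded = [], []
    for r in responses:
        minutes = durations[r["id"]]
        too_fast = minutes < min_minutes
        too_slow = minutes > max_ratio * typical
        # A mandatory text box left blank (or filled with spaces) fails the check.
        missing_text = any(not c.strip() for c in r["comments"])
        (excluded if too_fast or too_slow or missing_text else kept).append(r["id"])
    return kept, excluded

# Hypothetical responses: r3 is completed too quickly, r4 leaves a text box blank.
start = datetime(2013, 1, 7, 9, 0)
example_responses = [
    {"id": "r1", "started": start, "submitted": datetime(2013, 1, 7, 9, 20), "comments": ["Reason A", "Reason B"]},
    {"id": "r2", "started": start, "submitted": datetime(2013, 1, 7, 9, 25), "comments": ["Reason C", "Reason D"]},
    {"id": "r3", "started": start, "submitted": datetime(2013, 1, 7, 9, 4), "comments": ["ok", "ok"]},
    {"id": "r4", "started": start, "submitted": datetime(2013, 1, 7, 9, 22), "comments": ["Reason E", "  "]},
]
kept, excluded = screen_responses(example_responses)
```

Thresholds of this sort are best set after piloting, since the median sort time depends on the size of the Q set and the number of additional questions.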
A dashboard was developed to allow researchers to monitor responses and export data into the established analysis programme for Q methodology (PQMethod, Schmolck and Atkinson, 2012) and the qualitative data to a spreadsheet. The concourse of debate is established through the statements that participants sort. The aim is to cover the entire range of perspectives on this topic through the concourse. The number of statements must be manageable: with too few (i.e. fewer than 30) the diversity of the debate is under-represented; conversely, with too many (i.e. 60+) respondents struggle to complete the sort. One of the authors has previously found between 36 and 45 to be an optimal size for a Q set administered online (Jeffares and Bovaird, 2010; Jeffares and Skelcher, 2011; Sullivan et al., 2012).
In developing POETQ, the solution to ensuring it is comprehensive came from building in additional questions before the main Q sort. In order to keep the amount of time engaging with the tool to a minimum, we built in some basic demographic questions, questions to verify understanding and a focus on barriers and enablers of collaborative working at the start of the process. In selecting the statements to use in the Q sort we were guided by a set of principles:
Thorough acknowledgement of the diversity and totality of the debate – drawing where appropriate on primary and secondary sources;
Systematic and transparent sampling – using Fisher’s (1971) balanced block and a sampling framework influenced by leading scholars in organisational and public management (see below for more detail on the ‘4P framework’);
Collection of qualitative justification for choices following the Q sort;
Analysis in keeping with Q methodology by-persons factor analysis – following the approach set down by leading Q methodologists (e.g. Brown, 1980a); and,
Thorough and holistic interpretation of the results drawing on characteristic, distinguishing and qualitative reasoning.
In addition to being engaging and comprehensive, a third imperative for POETQ was that it be inclusive of alternative views around the purpose of collaboration. Those engaged in collaborative endeavours are not always clear about what it is they are trying to deliver – beyond some rather broad aims (Woodland and Hutton, 2012). One thing we tried to get a sense of was what people really think that they are trying to deliver in terms of joint working. However, as Dickinson’s previous experience with POET found, this is not always an easy process and individuals often struggle to articulate a clear sense of the outcomes they are aiming to achieve. Rather than individuals agreeing or disagreeing with different statements, Q methodology provides a way of differentiating further beyond these broad positions. Q methodology is inclusive in the sense that it is able to take into consideration a wide array of different perspectives and express these as a nuanced set of positions. Analysing data from the sorts allows us to identify groups of individuals who display similar feelings towards the group of statements as a totality. Qualitative data is provided in relation to those statements that individuals feel most strongly about, allowing the extrapolation of rationales for these decisions, and these were often illuminated by examples from practice. Further, when combined with the background analysis this allows us to consider issues such as profession, level of experience and employing organisation.
Applying the POETQ tool
This section examines an experience of the application of the POETQ tool and the degree to which it illustrated the facets of being engaging, comprehensive and inclusive that were intended in its design. The POETQ tool was used in a research study of joint commissioning arrangements in five English localities. The tool was employed to provide a snapshot of how professionals working in these arrangements viewed these in terms of their processes of working together and what they are aiming to achieve. The tool was used in the sites as the first phase of data collection, with subsequent activities conducted after this initial process particularly with the aim of collecting summative data regarding the impact of these collaborations. Participants were invited to take part in the process by email, with each following a personal link so that they could complete their own survey confidentially. In addition to collecting data via the POETQ tool, interviews and focus groups were conducted with staff at these sites, interrogating the experience of completing the online survey and discussing the findings produced through this process. More details on the research project and its outcomes can be found in Dickinson et al. (2013a, 2013b).
The statements that were used in the study were generated using the ‘4Ps’ outcome framework that we developed in accordance with the design principles for this tool (where the Ps are: people, partnership, productivity, professional), inspired by the influential work of Janet Newman on theorising governance (Newman, 2001). In Q-methodology, the function of the coding framework is to devise a means of systematically mapping the diversity of discussion around the topic in question, that is, its concourse. By using a framework adapted from Newman’s (2001) work we could map statements drawn from a literature review and statements gathered from an earlier pilot Q-study of collaboration in an English town. Although we collected around 300 statements it was possible, using the framework, to identify overlapping themes and duplicates and select 40 short and distinct statements of opinion about, in this case, integrated commissioning in health and social care.
One example of a statement is: “Commissioning jointly is about delivering a seamless service for service users”. This we associated with a focus on people outcomes, about improving “real lives” and in favour of pro-active prevention. There is a democratic underpinning to this domain, where service users have a degree of influence on the way the service is planned and delivered. A second example of a statement is “Properly done Joint Commissioning can deliver a quantum leap in how organisations work together”. In contrast, this was located in a domain around partnership outcomes. Statements mapped into this domain focus on developing new and different ways of working, of aligning systems and sharing information. It includes statements about professional empathy, building relationships, common language and trust. Moreover, such outcomes bring a transformational impact on organisations. A third example of a statement is “Joint Commissioning can feel like a battle of the models: a health approach versus a social care approach”. This was located in a domain around professional outcomes. Statements mapped into this domain focus on professional culture and identity. Statements here consider the organisational influence of different partners, risk management and even promoting insularity. These statements also consider the opportunity for private and voluntary organisations. Finally, a fourth example of a statement is “joint commissioning is about delivering more for less”. This is mapped into a domain about productivity outcomes, and joins statements about reducing duplication, cost-shunting, speeding up referrals, reducing demand on services and the knock-on implications for structures and services.
Figure S3 (in the supplemental data files) shows the core theme of our final 40 statements mapped into the four domains described above. The task was one of striking a balance between covering the diversity of the discussion and creating a set of statements of a size that could realistically be sorted into order of preference by respondents. We also developed the statements mindful that we cannot, and should not seek to, control the meaning of a statement. As a research team we can make assumptions about how particular statements might be interpreted by those undertaking the sort, but in the end the whole reason for doing this kind of research is to explore how respondents interpret operant statements of opinion about a topic.
Using this sampling frame, 40 statements were selected and presented to individuals to sort via the online process. Additional data were collected about role, background, experience, understanding of the joint commissioning arrangements and what barriers to and enablers of joint working exist locally. The data were entered into PQMethod, where each Q-sort was compared pairwise to produce a correlation matrix from which factors could be extracted. The factor analysis typically used in Q is a by-persons centroid factor analysis. Factors are rotated using Varimax, and sorts loading significantly are flagged; their weighted loadings are then used to produce a discrete set of ideal type sorts, one for each factor. This process was undertaken for each of the five sites, and a second-order correlation was used to compare the similarity of factors across sites. We also conducted a single composite analysis. Qualitative data were coded by theme and analysed separately from the Q sort and other background data.
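The analysis steps described above can be illustrated with a minimal sketch in Python. This is not the PQMethod procedure used in the study: it assumes principal-component extraction in place of the centroid method, synthetic Q-sorts in place of the study data, a conventional significance cut-off of 2.58 divided by the square root of the number of statements, and the commonly used loading-based weight f / (1 - f²) for combining flagged sorts. All function and variable names are hypothetical.

```python
import numpy as np

def correlate_sorts(sorts):
    """By-person correlation: sorts is (n_statements, n_participants)."""
    return np.corrcoef(sorts, rowvar=False)

def extract_factors(corr, n_factors):
    """Principal-component loadings (an assumed stand-in for centroid extraction)."""
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]    # keep the largest ones
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation of a (n_participants, n_factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - (gamma / p) * rotated @ np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

def flag_sorts(loadings, n_statements, z=2.58):
    """Flag sorts that load significantly on exactly one factor."""
    flagged = np.abs(loadings) > z / np.sqrt(n_statements)
    flagged &= flagged.sum(axis=1, keepdims=True) == 1
    return flagged

def ideal_sorts(sorts, loadings, flagged):
    """Weighted average of flagged sorts per factor, standardised to z-scores."""
    f = np.clip(loadings, -0.99, 0.99)               # avoid division by zero
    weights = np.where(flagged, f / (1 - f**2), 0.0)
    scores = sorts @ weights                         # (n_statements, n_factors)
    std = scores.std(axis=0)
    std[std == 0] = 1.0                              # factor with no flagged sorts
    return (scores - scores.mean(axis=0)) / std

# Synthetic data: 12 participants ranking 40 statements, drawn from 2 viewpoints.
rng = np.random.default_rng(0)
n_statements, n_people = 40, 12
viewpoints = rng.normal(size=(n_statements, 2))
membership = np.arange(n_people) % 2
raw = viewpoints[:, membership] + 0.5 * rng.normal(size=(n_statements, n_people))
sorts = raw.argsort(axis=0).argsort(axis=0).astype(float)  # full ranking per person

corr = correlate_sorts(sorts)
unrotated = extract_factors(corr, n_factors=2)
loadings = varimax(unrotated)
flagged = flag_sorts(loadings, n_statements)
ideal = ideal_sorts(sorts, loadings, flagged)
```

Each column of `ideal` is one idealised viewpoint: statements with the highest scores are those the flagged participants on that factor ranked most strongly. A real Q-sort uses a forced quasi-normal grid rather than a full ranking, so the ranking step here is only a convenient approximation.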
In total 93 respondents across the five sites completed the POETQ survey (between 10 and 34 responses per site). The mean completion time for the survey was 35 minutes (see Table 2). By aggregating the data across sites, five common viewpoints on joint commissioning were identified, and these were expressed to greater and lesser degrees at the different sites. The findings demonstrated that the intended purpose for joint working had fragmented into five distinct accounts. Importantly, these accounts were visible across different types of collaboration and were not necessarily based on profession or experience. The focus groups and interviews that followed the completion of POETQ predominantly discussed the viewpoints generated by this initial process, but participants were also asked to reflect on their experience of using the tool. Much as the design had intended, we found that participants reported POETQ to be engaging. The different activities that the survey involves were found to be accessible and easy to complete, but they also provoked thought on the part of the individuals completing the survey. As the findings demonstrate (Dickinson et al., 2013a, 2013b), POETQ proved to be comprehensive in the sense that it drew on a broad array of different themes and the viewpoints produced by the survey were derived from a wide range of different voices.
Table 2. Numbers of completed surveys and time spent sorting statements.
The trouble with evidence
Although POETQ seemed to have fulfilled its design principles of being engaging, comprehensive and inclusive, the project still encountered difficulties with the evidence produced. When presented with evidence that was inclusive and provided a number of nuanced accounts of what local stakeholders believed joint commissioning to be, many local managers struggled with this and said that they just ‘wanted to know whether it worked or not’. This inclusive, and to some degree deliberative, evidence was not always perceived as being helpful; their expectation of an online tool such as POETQ was that it would result in a solution or a definitive judgement about the performance of their local collaborative arrangements. In this section we reflect on whether this was due to the application of the design principles that are inherent to the POETQ tool, or whether local respondents were looking for different sorts of evidence.
One of the issues that many of the sites had struggled with was evidencing the impact that joint commissioning had produced locally. If the local sites had been using Woodland and Hutton’s (2012) Team Collaboration Assessment Rubric (TCAR) they would score very low in terms of the evaluation component. Whilst there were many examples of projects and different aspects of joint working, what people struggled with was thinking about how (and if) this was related to joint commissioning or some other kind of agenda. In interviews many stakeholders demonstrated a strong commitment to joint commissioning and firmly believed that ‘it works’. Often interviewees would suggest that the problem in evidencing this was due to difficulties with data information systems. As one respondent explained, ‘I think we’ve not always been as focussed on demonstrating the outcomes that it’s achieved and some of that’s to do with data information systems … not being … robust enough to sort of come up with what we want really’ (Manager, Site A). These challenges were argued to be exacerbated where services had a preventative aim. One site had a project that aimed to reduce hip fractures through preventative means but local managers struggled with how they could prove that these interventions had led to a reduction in hip fractures. Site E also encountered difficulty in measuring impact because they had a range of different interventions that co-existed; ‘we implement new interventions, commission new services at the same time all with similar benefits against them that we want to achieve. It is quite difficult to unpick which variable is impacting on which’ (Manager, Site E).
One of the issues we were interested in exploring is whether these equivocal data responses were a result of the design principles inherent within the POETQ tool. In particular, where approaches take a limited amount of time to complete, is the inevitable result that the data are insufficiently rich? In fact, we generated very rich data from the tool and found that participants wrote significant amounts of free text information, which was particularly useful when reflecting on the findings. The data from POETQ instead suggested that at least part of the difficulty in demonstrating the outcomes of joint commissioning derived from the fact that there were a number of different perspectives on what it should achieve in that locality. POETQ was designed precisely to pick up these differences and we were able to reflect these back to the local teams. Many of those we spoke to reported experiencing difficult conversations in the past about how to measure success as a result of collaborative action because outcomes were rarely agreed upon in any way beyond broad statements about success (e.g. better, cheaper, faster). Many had agreed to be involved in the research as they expected that the team would present them with a series of things that they could measure to demonstrate the impact of joint commissioning. When presented with the array of different viewpoints concerning joint commissioning in their locality, some individuals recognised the full array of different perspectives, whilst others dismissed these other viewpoints, suggesting that they were ‘incorrect’. To many, the generation of evidence that is both inclusive and comprehensive was not necessarily helpful.
Some suggested that they had become engaged in the research project to get a sense of whether joint commissioning worked or not and part way through at least the answer tended to be ‘it depends on whose perspective of joint commissioning you are talking about and even then there doesn’t seem to be a huge clarity about what you are trying to achieve’. Whilst we had designed POETQ to pick up a range of different perspectives and allow us to reflect these back to participants, it appeared that this was not necessarily what they wanted in practice and they struggled in some cases to utilise this evidence.
In some ways these findings echo the observations made in Sullivan’s (2011) article on New Labour and the process of evaluation. As Sullivan reports, even though the New Labour governments had professed a belief in the importance of robust evaluations and clearly understanding ‘what works’ with a level of clarity that had not previously necessarily been sought by national governments, the results of many of the large-scale theory-based research projects that were commissioned were not always welcome. Sullivan concludes, ‘a key development here was scepticism on the part of policy makers about the “value for money” generated by the investment in evaluation when many of the final evaluation reports appeared equivocal – answering the question “what works?” with the answer “it depends”. While this is entirely keeping with some theory-based approaches that seek to establish “what works, for whom, in what circumstances, and in what ways”, the inevitable caveats implied by the findings did not always sit comfortably with policy makers’ demands for findings that could be easily translated into universal policy messages’ (Sullivan, 2011: 506–507). This article goes on to suggest that subsequent evaluations were commissioned where they could provide some form of statistical output and offer more ‘generalisable’ conclusions, and ‘the engagement with multiple actors to generate evidence was no longer required’ (Sullivan, 2011: 507). What policy makers wanted was ‘concrete factual realism’ where evaluators could clearly demonstrate the linear processes associated with particular interventions and that A caused B and C. In the case of POETQ we were unable to provide this level of certainty because it did not exist in the local sites and this, for some, was something of a surprise.
One of the difficulties with focusing on ‘concrete factual realism’ is that it inevitably treats interventions as being relatively similar and having clear instrumental aims. It does not allow for the fact that in practice policies such as joint commissioning may in fact have greater symbolic than instrumental power (Dickinson, 2014). One clear message within the data we collected on joint commissioning was that on the whole most people believed in it and viewed it as a ‘good thing’ that should bring about improved outcomes across a whole array of different domains. Individuals were fundamentally wedded to this idea and saw it as a way of responding to a whole array of the difficult challenges that local health and social organisations currently face. Simply focusing on the instrumental misses what Dickinson (2014) calls the ‘cultural performance’ of these initiatives. This refers to the symbolic value of these kinds of policies and the value that they are able to deliver at a local level in terms of individual and organisational identity and practice. Joint commissioning seemed to be doing something for individuals at a local level that went beyond the instrumental. Some of this was captured through the inclusive approach of POETQ and was further explored in interviews and focus groups after the completion of the survey. It is this component that approaches such as the TCAR are unable to capture. What this experience suggests is that approaches to evaluating collaborative working not only need to be inclusive of a range of different opinions about what collaboration is and should achieve, but should also be able to capture the additional values that go beyond the instrumental. What work does this approach do in terms of capturing and containing individual and organisational anxieties about difficult and complex issues (Hoggett, 2006)?
Without a sense of these kinds of factors it is unlikely that we will be able to capture the full array of impacts that collaboration has; although we should not assume that these observations will necessarily be welcomed by policy makers and practitioners who are looking for concrete answers.
This observation in turn raises a series of questions about the viability of web-based evaluation tools such as POETQ. In a context of increasing fiscal restraint, the attractiveness of web-based rapid evaluation tools will no doubt increase. Local areas will look to these sorts of resources to aid in the evaluation of local services, and online tools have the benefit of being inclusive as well as relatively cheap in comparison to more bespoke approaches. Yet, as this article has demonstrated, the options for evaluating collaborative working seem to fall between those which are relatively simple but may not be able to include a variety of voices, and those (such as POETQ) which are more nuanced and inclusive but whose findings are more complex. Ultimately this is a question of what sorts of data practitioners and policy makers require of collaborative working, but also of the time, space and support that they have to use these data. If collaborations are less mature in their development than they believe themselves to be, they may not get the sorts of data that they expect from this process and may find the data difficult to use in practice. Arguably the current context is so strongly in favour of collaboration that it is difficult to report data which do not support this agenda. The current fiscal climate of UK public services also means that there is insufficient time and space to think about how to use complex data about the relationships that underpin collaboration. Within this context it may be challenging for tools like POETQ to gain traction.
Conclusion
This article has argued that there remains an appetite to evaluate collaborative working, and in ways that go beyond those attempted to date. The migration of rapid toolkit, audit and guide approaches online is to be expected, but attempts so far suffer from a disengaging user experience, limited coverage and normative bias. In response, this article has documented the design of a new online tool that is engaging, comprehensive and methodologically grounded, but also inclusive. The article has described the process of producing a webtool that can hold a respondent's attention for around 30 minutes and that is compatible, secure, user friendly and adaptable. The POETQ tool includes a process of sorting a set of statements related to partnership working, and the article described the process of combining theory and practice to ensure this set of statements was comprehensive and covered the diversity of the issues. By focusing on outcomes and including alternative aspirations for what partnerships seek to achieve, POETQ is inclusive and open minded about what partnerships want to achieve, without imposing normative expectations.
The article has provided an example of the application of POETQ in practice in the context of research into joint commissioning in five English localities. The use of the tool within this research project demonstrated it to fulfil the desired design features in the sense that it proved engaging, comprehensive and inclusive. However, the research team found that the type of evidence produced by this tool was not necessarily welcomed. Local stakeholders were interested more in ‘concrete factual realism’ than the sort of deliberative findings generated by this tool. What this suggests is that significant work needs to be done to manage the expectations of research users about what the evaluation process might deliver, the types of evidence that might be useful in different applications and how data can be used to inform local processes of improvement. Ultimately we conclude that to capture the full array of different performances that collaborative working produces we may need to incorporate a far broader array of issues. Yet we should not assume that these are necessarily always the kinds of messages that policy makers and practitioners will want to hear.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
