Abstract
Scholars have expressed frustration that public policy formulation sometimes proceeds with an optimism that stands in stark contrast to the evaluation literature. We present theoretical insights from the rules of evidence that shed light on why this might be so. We also conduct a novel and sophisticated empirical analysis, based on a natural experiment in New South Wales, Australia, to better measure the efficiency outcomes at the heart of most amalgamation proposals. We conclude with our thoughts on the important normative measures, derived from the rules of evidence, that might be operationalised to avoid injustice and enhance accountability in the future.
Introduction
Local government amalgamation (also known as ‘consolidation’ or ‘merger’) remains a controversial public policy intervention. From a political perspective, people worry about loss of identity, disenfranchisement, and also the oft-neglected failure to secure an electoral mandate (Lee & Seo, 2009; Voda & Svacinova, 2020). From a social perspective, others express concerns about unintended harmful side-effects, such as loss of livelihoods or the effective transfer of public debt from one group of citizens to another (Drew et al., 2017). Furthermore, some people question whether amalgamation can indeed result in improvements to efficiency (see, for example, Aldag et al., 2020; Holcombe & Williams, 2009). This latter uncertainty takes on particular salience given the heavy reliance on efficiency arguments by amalgamation proponents (Drew, Razin, & Andrews, 2018).
Amalgamations are also controversial in scholarly circles because of the ‘unbridled optimism we often see in consultancy and governmental reports’ regarding local government amalgamation, work that stands in stark contrast to the large body of evaluation literature (Tavares, 2018, p. 12; see also, Drew et al., 2015; Galizzi et al., 2023; Gendzwill et al., 2021). Traditionally, this policy evaluation literature has focussed largely on a positivist agenda of comparing goals to actual outcomes (Weiss, 1990). A realist perspective has also been introduced to interrogate ‘what works for whom, in what circumstances and in what respects and how?’ (Pawson & Tilley, 1997, p. 2; see also, Sanderson, 2002). Moreover, scholarly evaluation frameworks that combine both positivist and realist perspectives also exist (Bovens et al., 2001; Howlett, 2012). Despite these various frameworks, it seems that the evaluation literature is failing to have a decisive impact on actual practice.
What we seek to do is a little different. Instead of merely assessing outcomes with respect to goals or stakeholders, we focus more on the nature of evidence itself (as well as the witnesses), with recourse to the established rules developed over centuries and used in courtrooms around the world. We argue that understanding the different kinds of evidence provided by different witnesses at different times can better explain why the apparent disconnect between policy formulation and the corpus of scholarly literature persists. Moreover, we cast new light on how the disconnect might be mitigated and propose remedies to better ensure accountability when it is not. Certainly, we conduct investigations that bear a resemblance to both the positivist and realist agendas; where we most differ is in our assertion that, if policymakers were to follow the rules of evidence required in courtrooms in the first instance, there might be both fewer failures and greater accountability.
This article also makes a methodological contribution to the literature through its use of sophisticated envelopment techniques to assess the efficiency claims that lay at the heart of the particular amalgamation programme that we study. Previous literature has mostly used regressions on unit costs, that is, average expenditure calculated from a single proxy for production (Gendzwill et al., 2021; Tavares, 2018), which may not adequately capture the concept of efficiency that dominates most amalgamation debates (McQuestin et al., 2017). Accordingly, we employ an enhancement to basic envelopment analysis that generates ‘benefit of the doubt’ empirical evaluations, consistent with the jurisprudence theory to which we appeal. 1
Another novel contribution is the use of a natural experiment. The vast majority of the extant literature compares outcomes of amalgamated local governments with trends in the remaining cohort (Gendzwill et al., 2021; Tavares, 2018). However, this almost certainly introduces substantial bias, because local governments not targeted by an amalgamation intervention might well have been considered by public policy architects to be better performers (authors sometimes acknowledge this; see, for example, McQuestin et al., 2017). Instead, we use a comparison group in New South Wales (NSW), the most populous state in Australia, where some local governments, initially identified for amalgamation, escaped their fate at the very last moment. We argue that this is a far superior, though still not infallible, way of understanding what might have been expected in the absence of amalgamation (more on this in Empirical Methodology to Assess Matters of Fact).
As we have already alluded to, contemporary local government amalgamation debates often emphasise the potential for the public policy intervention to improve operating efficiency and hence financial sustainability (Blom-Hansen et al., 2016). Economists define technical efficiency as the conversion of factors of production (generally money and staff) into a range of outputs, such as services to distinct types of properties or the maintenance of distinct types of major infrastructure (Coelli et al., 2005). Key to this contention is the economic concept of economies of scale. In its simplest terms, economies of scale refers to a reduction in average total costs for various goods and services as a result of increased output. Specifically, it is argued that greater size facilitates greater specialisation and concomitantly higher productivity, better utilisation of spare capacity, and improved purchasing power (McQuestin et al., 2021). The hope is that these functionally specific economies of scale might drive relative improvements to technical efficiency across the whole organisation and hence increase the affordability and financial sustainability of local government services.
To better understand how we might remedy the apparent disconnect between the evaluation literature and practice, we first explore the types of evidence used by the respective parties. We also consider the ideal characteristics of witnesses providing the different kinds of evidence. Following this, we provide further details of the context of our natural experiment, the evidence employed by decision-makers, and the type of witnesses engaged. Thereafter we detail the empirical strategy that we used to assess efficiency in a ‘benefit of the doubt’ manner consistent with the rules of evidence. We present our results and then conclude with some examples of the kind of injustices that occur when rules of evidence are neglected, our thoughts on how rules of evidence can improve public policy formation, and measures that might be used to enhance accountability.
Rules of Evidence, Witnesses, and Amalgamation
When assembling and assessing evidence the emphasis is on isolating truth as distinct from the art of advocacy (Brandes, 1961). Evidence can be employed for the purposes of making a decision ex ante or assessing outcomes ex post, and it is clear that the temporal context largely dictates the kind of evidence (and witnesses) employed.
According to Aristotle (2012), there are two types of evidence – ‘matters of opinion’ and ‘matters of fact’: the former is mostly related to experience and plausible hypotheses and the latter to objective measurements. Generally, people with broader and deeper experience in a particular field are less likely to have their judgements subsequently defeated by new information (Simpson & Seldon, 2003). In addition, a proper understanding of the field of plausible hypotheses is also salient to the competent assessment of evidence. For the particular context of efficiency outcomes arising from local government amalgamations, three hypotheses must be considered.
First, it could indeed be the case that the higher levels of output arising from amalgamation may result in the capture of sufficient economies of scale to bring about greater technical efficiency. Second, the increase in output arising from amalgamation may not result in any material change to average total costs, which economists refer to as constant returns to scale. This alternative outcome is also consistent with economic theory and is quite probable, given that scholars generally expect a relatively long domain of constant returns to scale once excess capacity has been exhausted and most jobs have been specialised. Third, increased output arising from amalgamation might result in an increase in average total costs (diseconomies of scale). This third possibility is consistent with neo-classical economics and reflects a reduction in technical efficiency. Diseconomies principally arise from the need for more middle management to co-ordinate greater numbers of staff, as well as from a decrease in organisational transparency. (See Drew, 2022 for a comprehensive discussion of the three possibilities and their causes.) Figure 1 illustrates the field of plausible hypotheses about local government amalgamation. It should be noted that scholars can measure technical efficiency outcomes quite precisely by using envelopment analysis related models (Caldas et al., 2019a, 2019b; see our empirical methodology below).

Figure 1. The field of plausible hypotheses for local government amalgamation: economies of scale and technical efficiency.
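A toy classifier makes the three hypotheses concrete. The sketch below compares average total cost before and after an increase in output and labels the result as economies of scale, constant returns, or diseconomies. It rests on two simplifying assumptions of our own: output is collapsed into a single aggregate quantity (the very simplification the envelopment analysis described later avoids), and a hypothetical 1 per cent tolerance band separates "no material change" from a genuine movement in costs.

```python
def average_total_cost(total_cost: float, output: float) -> float:
    """Average total cost: total cost divided by the quantity of output."""
    if output <= 0:
        raise ValueError("output must be positive")
    return total_cost / output


def classify_scale_effect(
    cost_before: float,
    output_before: float,
    cost_after: float,
    output_after: float,
    tolerance: float = 0.01,
) -> str:
    """Say which of the three hypotheses an increase in output is consistent with.

    `tolerance` is a hypothetical relative band (1 per cent by default) inside
    which a change in average total cost is treated as constant returns.
    """
    if output_after <= output_before:
        raise ValueError("this comparison assumes that output increased")
    before = average_total_cost(cost_before, output_before)
    after = average_total_cost(cost_after, output_after)
    relative_change = (after - before) / before
    if relative_change < -tolerance:
        return "economies of scale"
    if relative_change > tolerance:
        return "diseconomies of scale"
    return "constant returns to scale"


# Doubling output (10 -> 20 hypothetical units) under three cost outcomes:
print(classify_scale_effect(100.0, 10, 180.0, 20))  # average cost falls 10 -> 9
print(classify_scale_effect(100.0, 10, 200.0, 20))  # average cost unchanged at 10
print(classify_scale_effect(100.0, 10, 240.0, 20))  # average cost rises 10 -> 12
```

Only the first outcome would vindicate an efficiency case for amalgamation; the other two are equally consistent with economic theory.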
The extant literature has produced mixed and inconclusive evidence about the presence of economies of scale in local government over many years in many geographical contexts (Bernardelli et al., 2020; Fahey et al., 2016; Gendzwill et al., 2021; Holcombe & Williams, 2009). Moreover, careful empirical work has made it clear that economies of scale are likely to be function specific, which makes sense given that each type of service is likely to have different opportunities for specialisation, varying degrees of capital equipment investment and capacity, and varying input requirements (Fahey et al., 2016; Tavares, 2018). Because of this, the potential for economies of scale would also seem to depend on the remit of local government, which differs from country to country (Blom-Hansen et al., 2016). Here, we assess the evidence relating to local government in NSW, which strongly suggests the need to consult the literature from Australia. The preponderance of evidence there suggests no economies of scale at the local government level; those that do occur are restricted to highly capital-intensive functions such as road maintenance (Drew et al., 2021; Fahey et al., 2016; McQuestin et al., 2017).
As we have noted, scholars often appear frustrated that public policy formulation sometimes proceeds in stark contrast to the voluminous evaluation literature (Drew, 2022; Gendzwill et al., 2021; Tavares, 2018). We believe that part of the explanation for this lies in people not fully appreciating the distinction between the two kinds of evidence and the appropriate witnesses for each.
Matters of opinion are most relevant to questions of a qualitative nature that are open to dispute; this is the kind of evidence routinely employed in the ex ante policy formulation process, conducted principally by consultants. Indeed, it is almost inevitable that policymakers rely heavily on matters of opinion, because before a policy is executed various options exist and few, if any, facts (pertaining to outcomes from the proposed policy intervention) exist to measure. Thus, matters of opinion are particularly relevant for decision-making relating to whether a thing (proposed policy) might be just or unjust, useful or harmful (Aristotle, 2012). Yet, because matters of opinion are subjective in nature and cannot be readily tested, they come with a pronounced risk of ‘corruption’ or unwarranted influence by others (Simpson & Seldon, 2003). That is, witnesses might skew their testimony to preserve existing relationships or seek future rewards. For this reason, Aristotle (2012, p. 71) was adamant that, to be worthy of serious consideration, matters of opinion ought to be sourced only from ‘detached’ and ‘trustworthy’ persons or the ‘ancients.’ Detached witnesses are those without current or proposed fiscal ties; the category excludes those with histories of close association with proponents or opponents of the proposal under consideration. The ‘ancients’ referred to people who had recorded their ‘testimonies’ in the past (Aristotle cites Plato as an example). The idea is that an opinion recorded in the past cannot be influenced by protagonists in the present. Thus, the considered views of a past scholar or expert might be adjudged reliable testimony where it was established that the context was sufficiently similar.
Matters of fact, by way of contrast, are indisputable data that are independent of the pleader and more amenable to measurement (Brandes, 1961); this is the kind of evidence scholars routinely employ in the ex post assessments found in the evaluation literature (Howlett, 2012; Weiss, 1990). Indeed, it is not surprising that scholars conduct assessments of this kind, given that they are usually in possession of both data and analytical techniques after a policy has been executed. Thus, matters of fact are particularly relevant to evaluating whether a thing (goal) has happened or not, or whether a thing exists or not. Because facts can be objectively tested [the kind of work featured in the systematic reviews of Tavares (2018) and Gendzwill et al. (2021)], they carry less risk and tend to be relied on more heavily both in courtrooms and in the decision-making of prudent people. Aristotle’s (2012) only fears appear to have been that a witness might perjure themselves or be less than thorough in the way that they observed and recorded facts. For these reasons, he lauded the evidence provided by witnesses who might share punishment if facts were subsequently found to have been falsified.
Figure 2 summarises the two main kinds of evidence as well as the types of witnesses that classical thinkers considered to be appropriate for each.

Figure 2. Types of evidence and the appropriate kinds of witnesses.
Thus, the respective natures of matters of opinion and matters of fact go some way towards explaining the disconnect between what consultants and policymakers do, on the one hand, and the large corpus of evaluation literature produced by scholars about the outcomes of policymaking, on the other. Because of the nature of the public policy cycle, the two parties are dealing with different kinds of evidence. Moreover, different types of evidence require different attributes in witnesses. To understand this better, in the next section we describe the context of our study and explore whether the matters of opinion tendered ex ante did indeed conform to the relevant rules of evidence.
Context and Matters of Opinion
The context of our study is the amalgamation of 18 local governments in NSW in 2016. Community engagement prior to these amalgamations heavily focussed on efficiency with promises that they would ultimately deliver greater financial sustainability (see NSW Government, 2015).
The amalgamation process started in 2011 as a putatively evidence-based inquiry into the financial sustainability of the jurisdiction’s (then) 152 local governments (Drew, O’Flynn, & Grant, 2018). After a long review process, the state government offered a generous package of incentives (including $AUD258 million in funding) to encourage voluntary amalgamations. However, just four proposals were received from local governments and the voluntary programme was abandoned (Drew, Razin, & Andrews, 2018). In its stead, the state government embarked on a forced amalgamation programme that revolved around ministerial proposals that were subject to community consultations and inquiries under extant legislation.
During the inquiry phase, Councillors at some of the local governments resolved to challenge the procedural fairness of the forced amalgamation programme in the courts (see, for example, Land and Environment Court, 2016). This provided an effective stay of execution for some local governments, whilst the others were amalgamated by Governor’s proclamation in May 2016. As the legal contests continued, public opposition to forced amalgamations gathered pace and became pivotal to a crucial by-election loss by the incumbents in a formerly safe seat. Both the Premier and Deputy-Premier were ultimately obliged to resign, and the Minister for Local Government was shuffled to a different portfolio (Drew, O’Flynn, & Grant, 2018). To further stem political casualties, the incoming Premier announced that all outstanding local government amalgamations would be indefinitely postponed (Glandville & Stuart, 2017). Thus, 26 local governments that had originally been designated for amalgamation escaped their fate (this cohort will henceforth be referred to as the ‘escapees’).
These 26 escapees present an important control group by which to assess the efficiency outcomes arising from a large-scale forced amalgamation programme. Ordinarily, scholars are obliged to make comparisons between amalgamated local governments and their non-amalgamated peers, even though they recognise that this approach is likely to introduce considerable bias (see, for example, McQuestin et al., 2017). 2 Having a comparative sample of local governments originally designated for amalgamation is an improvement on the standard scholarly approach. Of course, this control group does not completely eliminate bias: the local governments still needed to make an endogenous decision to fight matters in the courts. However, because the councils were originally designated for amalgamation, and their ultimate fate was determined by exogenous decisions in courtrooms and state government party rooms, the ‘escapee’ cohort has the potential to provide us with deeper insights into matters of fact than have hitherto been available to most scholars.
As we will demonstrate, the evidence used in ministerial proposals to advance the forced amalgamations can be confidently categorised as matters of opinion. Readers will recall that Aristotle (and modern courts, for that matter) hold some suspicion of evidence of this kind due to its subjectivity and potential vulnerability to unwarranted influence. Moreover, the importance of the totality of experience, and the need to properly appreciate the field of plausible hypotheses, cannot be over-emphasised with respect to the reliability of the witnesses tendered.
The key fiscal evidence put forward in the ministerial proposals was ‘supported by independent analysis and modelling by KPMG’ (NSW Government, 2016, p. 3). In particular, KPMG asserted ‘$2.0b in total financial benefits for councils over 20 years’; a ‘3 year payback period when merger benefits will exceed merger costs’; and ‘stronger balance sheets’ (NSW Government, 2015, p. ii). ‘The financial model drew on a series of assumptions to estimate the potential savings, costs and overall financial impacts’ (emphasis added, KPMG, 2016, p. 2). Specifically, KPMG (2016, p. 3) asserted ‘staffing efficiencies. . .[of] 7.4 percent for metropolitan mergers’ and ‘efficiencies . . . . between 3.7 to 5 percent of a council’s employee salary and wage costs’ for rural local governments. In addition, ‘the assumed value of efficiency savings was up to 3 per cent of a council’s expenditure for materials and contracts,’ but was ‘capped at 2 percent for regional councils’ (KPMG, 2016, p. 2). There were also some smaller assumed savings relating to a reduced net number of elected representatives.
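To see what these assumed rates imply in dollar terms, they can be applied to a hypothetical council. The sketch below is ours, not KPMG's model: it treats the ‘rural’ (staff) and ‘regional’ (materials) categories alike as non-metropolitan, uses the top of each quoted range, assumes constant annual savings against a single one-off merger cost, and ignores the smaller savings from fewer elected representatives. Every dollar figure in it is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Savings rates as quoted from KPMG (2016). For non-metropolitan councils the
# staff rate is the top of the quoted 3.7 to 5 per cent range, and both
# materials-and-contracts rates are the quoted upper bounds ("up to").
METRO_STAFF_RATE = 0.074
NON_METRO_STAFF_RATE = 0.05
METRO_MATERIALS_RATE = 0.03
NON_METRO_MATERIALS_RATE = 0.02


@dataclass
class CouncilCosts:
    """Annual expenditure of a hypothetical merged council, in $AUD."""
    employee_costs: float       # salary and wage costs
    materials_contracts: float  # materials and contracts expenditure


def assumed_annual_savings(costs: CouncilCosts, metropolitan: bool) -> float:
    """Annual saving implied by the quoted rates (councillor savings ignored)."""
    staff_rate = METRO_STAFF_RATE if metropolitan else NON_METRO_STAFF_RATE
    materials_rate = METRO_MATERIALS_RATE if metropolitan else NON_METRO_MATERIALS_RATE
    return costs.employee_costs * staff_rate + costs.materials_contracts * materials_rate


def payback_year(one_off_cost: float, annual_saving: float, horizon: int = 20) -> Optional[int]:
    """First year in which cumulative savings cover the one-off merger cost."""
    cumulative = 0.0
    for year in range(1, horizon + 1):
        cumulative += annual_saving
        if cumulative >= one_off_cost:
            return year
    return None  # not paid back within the horizon


# Hypothetical metropolitan council: $40m staff costs, $30m materials and
# contracts, and a $10m one-off merger cost.
saving = assumed_annual_savings(CouncilCosts(40e6, 30e6), metropolitan=True)
print(round(saving))               # 3860000
print(payback_year(10e6, saving))  # 3
```

The point of the exercise is that a 3-year payback is only reachable if savings of this size materialise almost immediately, which is why they should be visible in technical efficiency scores within the study window.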
In terms of our later testing of the claims of KPMG (2016), it is clear that we cannot yet test the 20-year horizon. An asserted 3-year payback period, however, certainly ought to be evident in improved technical efficiency. Readers will recall that economists generally conceptualise efficiency in terms of the conversion of inputs (staff and money) into the various disaggregated outputs (more on this in the next section). Accordingly, in Table 1, we group the claims made in terms of the technical efficiency implications, although in our later analysis we also test the specific accounting categories articulated in the matters of opinion.
Table 1. Summary of Projected Costs and Savings for 2016 New South Wales Local Council Amalgamations, According to KPMG (2016).
Source. Adapted from KPMG (2016, p. 6).
The KPMG (2016) document was notable for its lack of robust empirical analysis to substantiate the assumptions that it presented (Drew et al., 2021). The evidence presented by KPMG was assumed efficiencies (that is, opinions about what was useful or harmful to a local government’s size-dependent production function) rather than observations of a thing that had happened or existed.
Was KPMG an appropriate witness for this kind of evidence? KPMG, as one of the world’s Big Four accounting firms, is highly experienced in accounting, auditing, and advisory services (the latter accounts for a little less than a third of the Australian arm’s revenues). Yet the names, qualifications, and experience of the people who actually performed the work cannot be assessed because they were not disclosed in the documentation. What we can explore, though, is the breadth and depth of the firm’s knowledge relating to the complex field of local government boundary reform. One way to do so is to examine the relevant scholarly literature cited by KPMG in support of its matters of opinion. However, there was no scholarly citation in the work conducted on behalf of the state government (although the authors did cite previous KPMG opinion). This was somewhat surprising given the copious scholarly literature produced over many decades (e.g. Galizzi et al., 2023; Gendzwill et al., 2021; McQuestin et al., 2017; Tavares, 2018). This literature had confirmed at the international level that the ‘overall reaping of significant economies of scale is unlikely’ (Tavares, 2018, p. 12). Even more important, given the scholarly contention that ‘results [from abroad] cannot be generalised without caution’ (Blom-Hansen et al., 2016, p. 829), was an extant paper by Drew et al. (2021) that detailed matters of fact relating to large increases in staff expenditures and overall operating costs subsequent to a similar forced amalgamation programme in the neighbouring state of Queensland less than a decade earlier. Notably, the KPMG report did not cite this analysis. Such apparent neglect to inform assumptions with reference to past matters of fact (testimonies of the ‘ancients’) appears to cast some doubt on the appropriateness of the KPMG evidence.
It is also likely that the KPMG witness failed to properly appreciate the field of plausible hypotheses that ought to have been considered in relation to the NSW forced amalgamation proposal. The word ‘efficiency’ is used nine times and ‘scale’ specifically mentioned three times in the eight pages of text constituting Outline of Financial Modelling Assumptions for Local Government Merger Proposals (KPMG, 2016). Yet there is not a single mention of other plausible hypotheses, such as constant returns to scale or diseconomies of scale, in the document under reference (KPMG, 2016). Indeed, none of the documents KPMG produced specifically mention the possibility of no change to overall efficiency or of reductions to efficiency.
One must also wonder whether KPMG could be deemed to be a detached witness with respect to its matters of opinion (in an Aristotelian sense). The accounting firm was paid $AUD400,000 for the work and completed the tasks whilst embedded 3 in the NSW Office of Premier and Cabinet (Land and Environment Court, 2016). Moreover, the NSW state government is a major client of KPMG, having provided it with ‘almost $137 million in public sector work . . .since 2018’ (Tadros, 2021). There is certainly room for observers to wonder whether a close and profitable association with the NSW state government may have influenced KPMG’s testimony.
In sum, it is by no means clear that KPMG had sufficient experience (that is, referred to the testimonies of the ancients), recognised the full array of plausible hypotheses, or could be regarded as being suitably detached from the pleader of the amalgamation case. Aristotle (2012) would thus likely have deemed the accounting firm an inappropriate witness for evidence of matters of opinion. Using inappropriate witnesses for matters of opinion might be expected to yield poor outcomes. To see whether this prediction is reasonable, we now need to establish some matters of fact. According to Aristotle’s criteria 4, scholars are ideal witnesses for this kind of evidence. In the next section we outline our method for doing so.
Empirical Methodology to Assess Matters of Fact
In this section we set out our methodology to assess, as a matter of fact, the opinion provided ex ante by KPMG (2016, p. 2) that amalgamation would result in ‘efficiencies [being] generated.’ Our main outcome measure is the efficiency score produced by the envelopment analysis. We first compare the pre- and post-merger scores of the 18 amalgamated local governments. To gain further insights into the significance of these outcomes, we then make trend comparisons with the 26 similar ‘escapee’ local governments over the same period, resulting in a total of 440 observations (44 local governments across 10 years).
To establish matters of fact, we depart from the common scholarly practice of regressing estimates of unit costs to determine whether there is a statistically significant difference between the counterfactual trend and actual post-amalgamation performance (see, for example, Blom-Hansen et al., 2016; Drew et al., 2021; Tavares, 2018). This common practice assumes that ‘cost per unit can serve as a reasonable proxy for efficiency,’ but many scholars note that more precise methods exist for measuring the efficiency claims that form the focus of post-amalgamation assessments (Blom-Hansen et al., 2016, p. 828; Doumpos & Cohen, 2014; Oukil et al., 2022).
This is particularly important in a context such as NSW, Australia, where local governments have a constrained remit, relative to their international peers, focussed on services to property. To be precise, the single largest area of expenditure for Australian local governments is roads (Drew, 2022). Other important areas of responsibility include domestic waste, economic development, and recreation facilities. Furthermore, these services tend to be tailored to the different property categories specified in the enabling legislation: residential, business, and farms. Typically, residential properties have sealed roads, domestic waste services, and sometimes street lighting and footpaths. Business properties generally have all of the aforementioned, as well as trade waste services, street-scaping, public conveniences, and economic development (such as tourist information centres). In contrast, farms are usually serviced only by unsealed (dirt) roads and receive no waste services. In addition, local governments in rural areas are generally charged with providing potable water and sewer services, but these exist only for residential and business assessments. Rural local governments are also typically obliged to provide services on behalf of state and federal governments, and to correct for market failure (often operating child-care, post offices, airstrips, and even grocery stores and medical centres).
The starkly different services provided to various types of properties mean that overall unit costs would be a poor basis for proxying efficiency in Australia. Using expenditure per property as a unit cost effectively implies that the different types of properties receive similar expenditures, which is certainly not the case in NSW. It also assumes that the single largest item of expenditure, roads, is positively correlated with the number of properties. In fact, the correlation is slightly negative (r = −.0358). Both of these assumptions are particularly problematic when comparing disparate local governments, especially if the proportions of various outputs change over time.
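A toy example shows why. The two hypothetical councils below pay exactly the same cost for each property serviced and each kilometre of road, so by construction neither is more efficient than the other; yet their cost per property differs almost eightfold. Combining them into one entity produces a per-property cost that is simply a mix-weighted blend. The unit costs, property counts and road lengths are all invented for illustration.

```python
# Hypothetical unit costs, identical for every council by construction, so any
# difference in cost per property below reflects output mix, not efficiency.
COST_PER_PROPERTY = 1_000.0  # $AUD per property serviced (hypothetical)
COST_PER_ROAD_KM = 20_000.0  # $AUD per km of road maintained (hypothetical)


def total_expenditure(properties: int, road_km: float) -> float:
    """Expenditure of a council paying the same unit costs as every other council."""
    return properties * COST_PER_PROPERTY + road_km * COST_PER_ROAD_KM


def cost_per_property(properties: int, road_km: float) -> float:
    """The single-output unit cost criticised in the text."""
    return total_expenditure(properties, road_km) / properties


urban = cost_per_property(10_000, 200)     # few roads per property
rural = cost_per_property(2_000, 1_000)    # many roads per property
merged = cost_per_property(12_000, 1_200)  # the two councils combined
print(urban, rural, merged)                # 1400.0 11000.0 3000.0
```

Read naively, the combination would appear to cut the rural council's unit cost by more than two thirds and more than double the urban council's, although nothing about efficiency has changed.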
For this reason, envelopment analysis is a superior method of assessing the veracity of ex ante opinions around efficiency. There are three main ways of conceiving of Free Disposal Hull (FDH) analysis, the specific type of envelopment analysis that we use in this work. On a conceptual level it provides an empirical evaluation of the success of local governments in converting combinations of inputs into various outputs (which readers will recall is the economic definition of technical efficiency):
Inputs (Staff + operating expenditure) → Outputs (Number of residential properties + number of business properties + number of farm properties + length of sealed roads + length of unsealed roads)
Inputs and outputs were all logged to mitigate the effect of potential outliers (called log envelopment analysis). Furthermore, an indicator variable was employed to ensure that rural councils were not compared with urban ones, responding to the differences in service remits.
Thus, FDH replaces the single-input (money), single-output (people or, in Australia, property) unit cost used in regressions with multiple inputs that better reflect actual factors of production, as well as multiple output proxies that better reflect the different types of goods and services actually provided at various times.
A second way of conceptualising FDH is graphical. Figure 3 shows the basic idea. First, a step-wise frontier (or ‘envelope’) is created by identifying the local governments with the best conversion of inputs into outputs (local governments A through F). These are all assigned an efficiency score of 1 (perfectly efficient in a relative sense). All local governments in the interior of the envelope (such as G and H) are then assigned scores by projecting them onto the FDH frontier along a path of input minimisation (see the arrows from G and H in Figure 3).

Figure 3. Graphic conceptualisation of the Free Disposal Hull (FDH) model.
The third way of conceptualising FDH is in terms of the mathematics of its linear programming:
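In a standard input-oriented form (the notation here is assumed for illustration), the score $\theta_o^{*}$ of local government $o$, evaluated against $n$ local governments $j = 1, \dots, n$, solves the programme below. Here $x_{ij}$ is input $i$ of local government $j$ (in this study $m = 2$: staff and operating expenditure), $y_{rj}$ is output $r$ ($s = 5$: the three property counts and the two road lengths), and $\lambda_j$ is the weight placed on peer $j$.

$$
\begin{aligned}
\theta_o^{*} = \min_{\theta,\,\lambda}\ & \theta \\
\text{s.t.}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, && r = 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j \in \{0, 1\}, && j = 1, \dots, n.
\end{aligned}
$$

The last line is the Boolean constraint; relaxing it to $\lambda_j \ge 0$ gives the variable-returns-to-scale DEA model. Restricting the peers $j$ to local governments in the same rural or urban category is one way of reflecting the indicator variable described above.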
The main way that FDH differs from better known envelopment techniques [such as Data Envelopment Analysis (DEA)] is through the introduction of a Boolean constraint, which restricts the weight on each peer local government to either 0 or 1. Notably, the FDH programme confers ‘benefit of the doubt’ on envelopment assessments of efficiency. What this means, in simple terms, is that any local government that has optimal conversion in at least one variable will be assigned a position on the frontier (an efficiency score of one; Leleu, 2006). This flexibility of FDH is particularly important for amalgamated local governments that may have altered their production processes (relative combination of inputs) in response to new structures. Indeed, providing a benefit of the doubt through FDH is also consistent with the rules of evidence that form the framework of our novel approach to policy evaluation.
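Because each peer weight can only be 0 or 1, the input-oriented FDH score does not require a linear programming solver: for each local government one can enumerate the peers that produce at least as much of every output and take the smallest proportional input contraction any of them justifies. The sketch below is an illustrative implementation of that enumeration under our own assumptions: inputs are strictly positive, the logging step described earlier is omitted, and the optional `groups` argument (a hypothetical name) is one simple way of keeping rural and urban councils apart.

```python
from typing import List, Optional, Sequence


def fdh_input_efficiency(
    inputs: Sequence[Sequence[float]],
    outputs: Sequence[Sequence[float]],
    groups: Optional[Sequence[str]] = None,
) -> List[float]:
    """Input-oriented Free Disposal Hull scores computed by enumeration.

    inputs[j] and outputs[j] hold the (strictly positive) inputs and the
    outputs of unit j. If groups is given, each unit is compared only with
    units in the same group (e.g. rural with rural).
    """
    n = len(inputs)
    if len(outputs) != n or (groups is not None and len(groups) != n):
        raise ValueError("inputs, outputs and groups must cover the same units")
    scores: List[float] = []
    for o in range(n):
        best = float("inf")
        for j in range(n):
            if groups is not None and groups[j] != groups[o]:
                continue
            # Peer j is admissible only if it produces at least as much of every output.
            if all(yj >= yo for yj, yo in zip(outputs[j], outputs[o])):
                # Smallest theta such that theta * x_o covers every input of peer j.
                theta = max(xj / xo for xj, xo in zip(inputs[j], inputs[o]))
                best = min(best, theta)
        # Unit o is always its own admissible peer (theta = 1), so best <= 1.
        scores.append(best)
    return scores


# Hypothetical single-input, single-output data for four councils.
x = [(2.0,), (4.0,), (3.0,), (5.0,)]
y = [(2.0,), (2.0,), (3.0,), (4.0,)]
print(fdh_input_efficiency(x, y))  # [1.0, 0.5, 1.0, 1.0]
```

The second council uses twice the input of the first to produce the same output, so it scores 0.5: in a relative sense it could have produced its output with half its inputs. The others lie on the step-wise frontier and score 1.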
There is a large literature describing envelopment analysis, and we refer readers to seminal works such as Cooper et al. (2007) and Coelli et al. (2005), as well as Lim et al. (2016) for further information.
The FDH work yields a single efficiency score for each local government for each year under analysis, and we examine the period 5 years before and 5 years after the amalgamation. Notably, 5 years is considerably longer than the ‘3 year payback period when merger benefits will exceed merger costs’ (NSW Government, 2015, p. ii) – a statement that clearly implies significant expected efficiencies early on to mitigate the large costs KPMG acknowledged (2016; see Table 1).
In total, we analyse data on 44 local governments across 10 years (440 observations), comparing pre- and post-amalgamation efficiency scores for the 18 amalgamated local governments and then comparing these with the trends of the 26 escapees over the same period. To allow comparisons involving the merged entities, it was first necessary to aggregate the constituent pre-merger local governments so that they corresponded to the post-merger configurations. For the inputs (staff and operating expenditure), we simply summed the individual elements from the audited financial statements of the respective constituent local governments. For example, the staff expenditure for Cootamundra-Gundagai Council in 2012 was taken as the staff expenditure of Cootamundra in 2012 plus the staff expenditure of Gundagai in 2012. When repeated for each merged entity, this yielded inputs that accurately reflected the money spent each year, prior to amalgamation, in the geographical area subsequently subsumed by the amalgamated entity. For the outputs (the respective numbers of residential, business and farm properties, as well as the lengths of sealed and unsealed roads), we likewise summed the elements from our data sources. For example, the length of sealed roads for Cootamundra-Gundagai Council in 2012 was taken as the length of sealed roads in Cootamundra in 2012 plus the length of sealed roads in Gundagai in 2012. When repeated for each output and each merged entity, this reflected the outputs for the geographical area subsequently subsumed by the amalgamated local government. Notably, our aggregation procedure is consistent with extant envelopment work in the literature (Drew et al., 2015; McQuestin et al., 2017). Moreover, it highlights another advantage of envelopment analysis over regression: no weightings were necessary to combine the overall unit costs of constituent entities.
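The aggregation step described above amounts to a group-by-and-sum over constituent councils. The Python sketch below illustrates it under stated assumptions: the council names, the variable names (`staff_exp`, `sealed_road_km`), the `MERGER_MAP` dictionary and all figures are hypothetical, and the mapping from pre-merger councils to post-merger entities is assumed to be supplied by the analyst.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from each reporting council to its post-merger entity.
# Unmerged councils, and the merged entity itself (for post-merger years),
# map to their own name.
MERGER_MAP = {
    "Council X": "X-Y Council",
    "Council Y": "X-Y Council",
    "X-Y Council": "X-Y Council",
    "Council Z": "Council Z",
}


def aggregate_to_post_merger(
    records: Iterable[dict], merger_map: Dict[str, str]
) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Sum every numeric variable over constituent councils, keyed by
    (post-merger entity, year), so pre-merger years match post-merger boundaries."""
    totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        key = (merger_map[rec["council"]], rec["year"])
        for var, value in rec.items():
            if var not in ("council", "year"):
                totals[key][var] += value  # inputs and outputs are additive
    return {key: dict(values) for key, values in totals.items()}


if __name__ == "__main__":
    # Invented figures for illustration only (not data from this study).
    records = [
        {"council": "Council X", "year": 2012, "staff_exp": 10.0, "sealed_road_km": 120.0},
        {"council": "Council Y", "year": 2012, "staff_exp": 6.0, "sealed_road_km": 80.0},
        {"council": "Council Z", "year": 2012, "staff_exp": 8.0, "sealed_road_km": 95.0},
    ]
    for key, values in aggregate_to_post_merger(records, MERGER_MAP).items():
        print(key, values)
```

Plain summation is appropriate here because every input (dollars) and output (property counts, kilometres of road) is an additive quantity, which is why no weightings are required.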
As the last section detailed, the analysis also compared amalgamation outcomes with trends in the escapee cohort to understand the full significance of our findings. It might be argued that a comparison could also be made with the entire cohort of non-amalgamated councils, by way of assurance. However, such a contention neglects the fact that we know, for the reasons outlined in the earlier section, that the entire cohort has greater and avoidable bias. Comparing a known less biased result with a more biased result has little, if any, value for assurance. Furthermore, extant work on NSW (Drew et al., 2021) has already conducted analysis at the entire cohort level.
In Table 2, we present details of the variables employed in the FDH. Appendix A contains a difference-in-difference regression of the FDH scores as a check and confirmation of our findings discussed below.
Table 2. Definitions and Measures of Central Tendency for Key Variables, 2012 to 2021 Inclusive.
Source. Authors’ creation based on audited local financial statements.
Note. ‘Escapees’ refer to 26 local governments originally designated for forced amalgamation but later ‘escaped’ this fate when the incoming Premier announced that all outstanding local government amalgamations would be indefinitely postponed. ‘Amalgamated’ refers to 18 local governments that were amalgamated in 2016. Means in cells and standard deviations in parentheses.
In sum, our empirical strategy is designed to measure efficiency outcomes that have actually occurred, which are matters of fact according to the rules of evidence (cf. Brandes, 1961). Establishing such matters of fact beyond reasonable doubt is important so that we can more fully understand the consequences of neglecting the rules of evidence in the earlier policymaking phase. This, in turn, is essential to an appropriate appreciation of the importance of the rules of evidence and of accountability.
Matters of Fact Results and Discussion
Matters of fact are indisputable data that are independent of the pleader and amenable to observation and measurement (Brandes, 1961). In this section we contrast our measured facts to the matters of opinion expressed by proponents prior to the amalgamation event, especially the claim that it would result in significant efficiencies (KPMG, 2016).
As we noted earlier, most of the opined efficiencies were assumed to arise from considerable savings to staff expenditure (‘3.7% to 7.4%’), materials and contracts costs (‘3%’), and other expenses (not specified, but said to be ‘small’; KPMG, 2016).
Therefore, a reasonable starting point for our evaluation of these matters of opinion is to construct a summary of the relevant costs. In Table 3 we present data for each of the cost categories, along with the results of our tests for statistical significance (ANOVA, or Kruskal-Wallis where the data did not meet the assumptions required for ANOVA):
Table 3. Comparisons of Nominal Costs (Not Indexed) Pre and Post Amalgamation for Control and Amalgamated Cohorts.
Source. Authors’ creation.
Note. ‘Escapees’ refer to 26 local governments that were originally designated for forced amalgamation but later ‘escaped’ this fate when the incoming Premier announced that all outstanding local government amalgamations would be indefinitely postponed. ‘Amalgamated’ refers to 18 local governments that were amalgamated in 2016. Means in cells and standard deviations in parentheses. For ‘materials and contracts’ as well as ‘other expenses’, the probabilities are from Kruskal-Wallis tests because the data did not meet the assumptions required for ANOVA.
*p < .05. **p < .01.
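To make the choice of test in Table 3 concrete, the pure-Python sketch below computes the one-way ANOVA F statistic and the Kruskal-Wallis H statistic (using average ranks for ties and the usual tie correction) for two or more cohorts. It returns test statistics only; the corresponding p-values would be read from the F distribution with (k − 1, N − k) degrees of freedom and, approximately, the chi-square distribution with k − 1 degrees of freedom, typically via a statistics package. The function names and sample values are hypothetical.

```python
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H from pooled ranks, with average ranks for ties and
    the standard tie correction (undefined if every value is identical)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_sum = 0  # sum of (t^3 - t) over runs of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for _, gi in pooled[i:j + 1]:
            rank_sums[gi] += avg_rank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return h / (1 - tie_sum / (n ** 3 - n))


if __name__ == "__main__":
    # Hypothetical cost figures for two cohorts.
    control, amalgamated = [1, 2, 3], [4, 5, 6]
    print(one_way_anova_f(control, amalgamated))   # prints 13.5
    print(kruskal_wallis_h(control, amalgamated))  # about 3.857
```

ANOVA compares means and assumes roughly normal, equal-variance groups; Kruskal-Wallis works only with ranks, which is why it is the usual fallback when those assumptions fail.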
As Table 3 shows, there was no statistically significant difference between the control and amalgamated staff expenditures prior to amalgamation. A statistically significant difference (at the 5% level) did emerge after amalgamation, however. Moreover, the gap between the respective cohort means for staff costs expanded from 7.22% (pre-amalgamation) to 10.88% (post-amalgamation). This suggests that, on average, staff costs for the amalgamated local governments grew at a faster rate after 2016 than they did for the escapee control group. Given that staff expenditure represents around 49% of operational costs (that is, excluding accruals), this unexpected result is likely to have important implications for relative efficiency.
Materials and contracts costs represent a more modest proportion of typical local government budgets (around 35% of operational expenditure). As detailed in Table 3, statistically significant differences existed between the two cohorts both before and after amalgamation (with larger costs for the amalgamated cohort). Moreover, the gap between the respective cohort means narrowed somewhat, from 19.54% (pre-amalgamation) to 18.46% (post-amalgamation). This suggests a relatively slower rate of expansion in materials and contracts expenditure for the amalgamated cohort after 2016. This result is in the same direction as the KPMG matters of opinion presented earlier, albeit of a smaller magnitude.
Other expenses are an accounting category that includes items such as fire, waste, and environmental levies (imposed by the state government), as well as donations and expenditure on collaborative arrangements. This is a relatively small portion of operational expenditure (less than 17% on average). Moreover, there were no statistically significant differences between the cohorts either before or after the 2016 amalgamations. The gap between cohort means for this area of expenditure narrowed from 2.6% to 2.2%, indicating a relatively slower rate of cost expansion for the amalgamated cohort after 2016. Once more, this result is consistent with the matters of opinion presented by KPMG (2016).
In sum, the facts attest that staff expenditure rose relatively faster, on average, following the amalgamations, contrary to the opinions presented by earlier witnesses. Moreover, our indisputable observations, derived from audited financial statements, show that actual savings in materials and contracts were lower than opined. To determine the overall effect of these opposing changes in expenditure on efficiency, we conducted robust FDH modelling. The results appear in Table 4 and Figure 4. FDH is needed to assess efficiency claims properly because (i) the expenditure data refer to only one side (the inputs) of the economist's definition of efficiency, and (ii) they do not adjust for the significant increases in output proxies that accompany Australia's rapid growth by global standards (Pew Research Centre, 2022).
Table 4. Average Free Disposal Hull (FDH) intertemporal efficiency scores (Amalgamation Year is 2016).
Source. Authors’ creation.
Note. ‘Escapees’ refer to 26 local governments originally designated for forced amalgamation but later ‘escaped’ this fate when the incoming Premier announced that all outstanding local government amalgamations would be indefinitely postponed. ‘Amalgamated’ refers to 18 local governments that were amalgamated in 2016.

Figure 4. Average Free Disposal Hull (FDH) intertemporal scores, 2012 to 2021 inclusive.
From the FDH efficiency scores for the 18 amalgamated local governments (light grey line in Figure 4), it seems clear that typical efficiency declined somewhat after 2016, contrary to the matters of opinion. Indeed, there was a decline in efficiency immediately beforehand, consistent with the literature on the disruptive effects of amalgamation and the accelerated spending that it often induces (Drew, 2022; McQuestin et al., 2017). To grasp the full importance of our observations of efficiency, it is also useful to consider what was happening to the escapees over the same period (dark grey line in Figure 4). This cohort also declined in efficiency over time, but did not suffer as badly in 2016 and has performed better in recent years. When combined with the results of the statistical tests (see Table 3), this suggests that the outcomes for some amalgamated local governments were very poor indeed.
As a robustness check, we also conducted a difference-in-difference (DiD) regression of the efficiency scores, comparing pre- and post-amalgamation efficiency scores for amalgamated local governments with respect to the trend established by the escapees, while controlling for a range of demographic variables known to affect local government performance in Australia (see Appendix A). Essentially, DiD offers a more precise mathematical evaluation of what readers can already deduce from the two earlier tables and Figure 4. A key assumption of DiD is that the treated and control groups followed parallel trends before the treatment in 2016 (as can be seen visually in Figure 4); we also failed to reject the null hypothesis of equal pre-treatment trends (Mora & Reggio, 2012). DiD then tests differences in outcomes relative to an unobserved counterfactual trend derived from the actual performance of the control group (Reingewertz, 2012). The DiD variable was not statistically significant (p = .652; see Table A1), suggesting no change from what might have been expected had the amalgamation not proceeded. Otherwise stated, the DiD confirmed what we observed in Table 4 and Figure 4: there was no enduring improvement in relative efficiency following the 2016 amalgamations, contrary to the consultants' expectations.
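The core DiD logic can be sketched in a few lines of Python. This is a simplified 2 × 2 version that omits the demographic covariates listed in Appendix A, so it is not the authors' regression; the function name `did_estimate`, the `first_post_year` parameter and the scores are hypothetical. In a regression without covariates that includes treatment, post-period and treatment × post-period dummies, this difference of differences in cell means equals the coefficient on the interaction term.

```python
from statistics import mean
from typing import Iterable, Tuple


def did_estimate(panel: Iterable[Tuple[bool, int, float]], first_post_year: int) -> float:
    """2x2 difference-in-differences on cell means:
    (treated post - treated pre) - (control post - control pre).

    panel: (is_treated, year, efficiency_score) observations.
    Years >= first_post_year count as post-treatment.
    """
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for treated, year, score in panel:
        cells[(bool(treated), year >= first_post_year)].append(score)
    # statistics.mean raises StatisticsError if any cell is empty.
    m = {cell: mean(scores) for cell, scores in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])


if __name__ == "__main__":
    # Hypothetical efficiency scores: (treated?, year, score).
    panel = [
        (True, 2014, 0.5), (True, 2015, 0.75), (True, 2017, 0.75), (True, 2018, 0.75),
        (False, 2014, 0.5), (False, 2015, 0.5), (False, 2017, 0.5), (False, 2018, 0.5),
    ]
    print(did_estimate(panel, first_post_year=2017))  # prints 0.125
```

Whether the amalgamation year itself is coded as pre- or post-treatment is left to the `first_post_year` argument; the control group's change stands in for the unobserved counterfactual trend of the treated group.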
It should be borne in mind that, as expected, the benefit-of-the-doubt attributes of FDH worked in favour of the amalgamated cohort, with notably more placements on the efficient frontier for this smaller cohort over the 10-year panel (26 amalgamated compared with 18 escapees). Thus, even under the most generous evaluation, it is clear that the matters of opinion evidence tendered prior to amalgamation is at odds with the observable facts that we have assembled. Moreover, this failure to provide reliable evidence prior to amalgamation has resulted in some injustices to the communities concerned. In the section that follows, we detail some of these, along with our proposed remedies to help improve accountability in this and other areas of public policymaking.
Conclusion and Recommendations Regarding Evidence for Amalgamation
We conclude by responding to three main questions raised by our novel rules of evidence-based evaluation of local government amalgamation: (i) the article’s contribution to the literature, (ii) the implications for accountability, and (iii) the generalisability of our approach.
First, we comment on the importance of a rules of evidence approach to evaluating public policy where recommendations of policy architects often seem to be at odds with the corpus of scholarly literature (not only of amalgamation, but more broadly).
We have already remarked on the general dismay expressed by amalgamation scholars about the imprudent optimism of consultants, in stark contrast to the literature (Drew, 2022; Tavares, 2018). Indeed, it seems clear that the ‘conclusions [of scholars] have not yet been disseminated and absorbed by policymakers’ (Gendzwill et al., 2021, p. 50). An understanding of the rules of evidence helps explain why this disconnect persists. Whereas evidence used for policymaking largely takes the form of matters of opinion (especially problematic if combined with inappropriate witnesses), evidence employed in evaluation studies usually consists of matters of fact.
As we have made clear, for matters of opinion to be considered reliable evidence, they must be provided by witnesses with appropriate experience, an understanding of plausible hypotheses, knowledge of the testimony of the ‘ancients’, and appropriate detachment (Aristotle, 2012; Brandes, 1961; Simpson & Seldon, 2003). It is our contention that, had the rules of evidence around matters of opinion been better understood and observed prior to 2016, things might have ended very differently. For example, had KPMG possessed demonstrable and appropriate experience, understood the potential for diseconomies of scale, referred to the testimony of scholars, and not been embedded in the office of their employer, their recommendations would have been different. Otherwise stated, were the rules of evidence followed, there are good reasons for believing that the ‘unbridled optimism we often see in consultancy and government reports’ (Tavares, 2018, p. 12) would be tempered considerably.
The matters of fact that we have presented make plain the consequences for communities when public policymakers – and their employers – eschew the rules of evidence used in most courtrooms. Indeed, our FDH analysis and other work employing a unique control group, with benefit of the doubt built into our formulation, highlight important evidence that we believe would be required in any potential courtroom deliberations. We were led to this technique – somewhat at odds with the usual methods scholars employ – precisely in response to the rules of evidence around establishing matters of fact (Aristotle, 2012; Brandes, 1961).
It is reasonable to assert, based on our argument here, that failure to observe the rules of evidence ex ante produced unjust outcomes ex post. Indeed, going beyond this analysis, consider the events that have subsequently occurred. In the most recent financial year at the time of writing, six of the eight councils that received permission to increase taxes by amounts greater than the regulatory cap (or peg) of 2.0% were amalgamated entities. These included increases of 36.34% at Canterbury-Bankstown, 53.5% at Cootamundra-Gundagai, and 32.6% at Georges River (IPART, 2021). Moreover, residents of Central Coast saw a 15% increase in their taxes and had their local government placed into administration after it ran out of unrestricted cash (Drew, 2022). These additional matters of fact further underline the importance of faithfully following the rules around matters of opinion. It is therefore not surprising that many residents of amalgamated councils feel aggrieved by events and have campaigned for de-amalgamation (Pezet, 2021).
This leads to the second question our work posed: the conditions under which consultants and their employers should be held accountable. As Drew (2022) has argued elsewhere, principles of justice require that a remedy be available when one party causes damage to another. This is particularly the case where negligence can be proved, defined as:
An action in tort law, the elements of which are: the existence of a duty of care; breach of that duty; and material damage as a consequence of the breach of the duty. A duty of care is a legal obligation to avoid causing avoidable harm, and arises where harm is foreseeable if due care is not taken (Butt, 2004).
Our adoption of the rules of evidence to assess the public policy in the case study here points to one possible avenue of accountability by establishing foreseeability of the harm, breach of duty, and the materiality of the damage. There are thus some good reasons to expect that legal action by aggrieved parties may be forthcoming. Were a case of this kind brought forward successfully, then a high level of accountability would be established.
At the same time, in previous work Drew et al. (2022) proposed an alternative means of accountability that would likely prove effective and would further respond to the scholarly dismay expressed by Tavares (2018) and Gendzwill et al. (2021), among others. To increase accountability, clients (such as the NSW state government) would simply need to write into contracts a requirement for an independent public inquiry to be held after an appropriate time, with financial penalties to apply in the case of demonstrated negligence. The potential loss of both reputation and money would likely encourage commercial consulting firms to follow the rules relating to matters of opinion more faithfully, especially those around considering plausible hypotheses and consulting the testimony of the ancients.
Both of these remedies, suggested by our rules of evidence approach to public policy evaluation, in turn direct us to the third question: the generalisability of the analysis. We are conscious of Blom-Hansen et al.’s (2016, p. 829) astute observation that ‘results [from abroad] cannot be generalised without caution.’ However, the rules of evidence framing that we have adopted in this work allows for some broader generalisations. Any public policy with the potential to bring about foreseeable and material harm would clearly benefit from careful adoption of the rules surrounding matters of opinion and associated approaches to improving accountability. Consultants, government decision makers, and the wider public would all benefit if consideration of future public policy initiatives followed the rules required in most courtrooms around the globe (especially the dictates of experience, considering all potential outcomes, referring to analyses and outcomes of other similar interventions, and ensuring witnesses were appropriately detached). Whether the policy is amalgamation or a pandemic response, one can see how outcomes would benefit considerably from adopting the rules of evidence lens that we have championed in this paper.
In sum, if one wants to mitigate controversy, operationalise the general consensus of the scholarly literature (Fleming & Rhodes, 2018), avoid future expressions of scholarly dismay (Tavares, 2018), and guard against future injustices (Drew et al., 2021), then clearly adopting the rules of evidence would be a very important step forward.
Appendix A
List of Additional Variables Employed in the Confirmatory Difference in Difference Regression, 2012 to 2021.
| Variable | Description | Escapees (control group) | Amalgamated |
|---|---|---|---|
| Population (ln) | Resident population of local government area. | 10.572 (1.134) | 10.696 (1.377) |
| Median Income (ln) | Median employee income. | 10.822 (0.183) | 10.677 (0.159) |
| Under 15 | Percentage of population under 15 years of age. | 17.962 (2.573) | 18.549 (1.765) |
| Aged (ln) | Percentage of population in receipt of aged pension. | 8.078 (1.067) | 8.574 (1.260) |
| ATSI (ln) | Percentage of population identifying as indigenous. | 0.121 (1.368) | 0.925 (0.942) |
| NESB (ln) | Percentage of population that speak a language other than English at home. | 2.285 (1.253) | 1.897 (1.191) |
| Disability (ln) | Percentage of people on a disability pension. | 6.757 (1.159) | 7.325 (1.224) |
| Unemployed (ln) | Percentage of people receiving Newstart allowance. | 6.534 (1.197) | 7.122 (1.352) |
| Carers (ln) | Percentage of people receiving a Carer’s pension. | 5.380 (1.267) | 6.123 (1.368) |
| Population Density (ln) | Population density. | 10.572 (1.134) | 10.696 (1.377) |
| Grants (ln) | Federal grants. | 14.826 (0.821) | 15.788 (0.477) |
Source. Authors’ creation.
Note. Standard deviations in parentheses.
Author’s Note
Joseph Drew is now affiliated with the University of New England, Australia.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data are available upon reasonable request from the authors.
