Effects and side effects of Flemish school inspection

Abstract

Despite the increased importance of school inspection in recent years, the current knowledge base does not provide a clear view on the effects and side effects of being inspected. More evidence is needed in more diverse educational contexts. This article responds to this need with a quantitative study on the effects and side effects of school inspections on Flemish schools. In total, 2624 respondents from 130 primary and secondary schools participated in the study. The article discusses the conceptual, instrumental, symbolic and strategic effects of inspection and its impact on self-efficacy and collective efficacy. Regarding side effects, the emotional impact is discussed alongside disturbing effects, misleading activities by schools and individual teachers, and the impact on staff members’ personal lives. Furthermore, the article distinguishes between the responses of staff members from schools with a positive inspection judgement and those with a less favourable judgement, and between the responses of teachers and of members of the school management teams. The study is embedded in the Flemish educational context, and its results are put in perspective alongside results of studies in other educational contexts. Several particularities of the Flemish educational context (and how they may have affected our results) are discussed.

Keywords

Inspection, effects, side effects, primary schools, secondary schools, quantitative study

Introduction

From an international perspective, the importance of school inspection has intensified during the last decade due to an increased demand for school accountability. In most countries, inspections have been established to control the quality of education provided by schools, but also to contribute to school improvement (OECD, 2013). The question as to what the effects of inspection are, and whether school inspections effectively contribute to school improvement, has been the subject of several studies (Chapman, 2001; Chapman and Earley, 2010; Ehren and Visscher, 2006). Initially, effectiveness research was concentrated in the UK in the years after the establishment of the inspection agency OFSTED in 1992 (Learmonth, 2000). However, until about 2005 the effects of inspections on school improvement had hardly been investigated outside the UK (Ehren and Visscher, 2008).

More recently, and in line with the increased role of school inspection, there has been renewed academic interest in the effects of inspections in different educational systems, e.g. Germany (Dedering and Müller, 2011), Ireland (McNamara and O’Hara, 2006) and the Netherlands (Ehren et al., 2013b). These studies investigated the effects of inspection on schools and whether inspection fulfils its school improvement function, but often also examined the occurrence of unintended effects of inspection. Indeed, the effects of inspections are unarguably accompanied by undesirable side effects, such as ‘intended strategic behaviour’ (e.g. window-dressing activities), ‘unintended strategic behaviour’ (e.g. disturbing effects on normal school life) and ‘other side effects’ (e.g. increased stress) (De Wolf and Janssens, 2007: 383). Notwithstanding the increased research efforts, it remains unclear what the effects and side effects of inspection are for the inspected schools, and whether or not the inspection drives schools to further improvement. Different studies found inconsistent or even contradictory results: whereas some studies concluded that inspections are helpful to most schools (e.g. Cuckle et al., 1998; McCrone et al., 2007), other studies have concluded that the benefit for schools is generally rather small (e.g. Chapman, 2002; Wilcox and Gray, 1996). Moreover, in order to clarify the driving forces for school improvement, more research on the effects and side effects of inspection in diverse educational contexts is warranted (De Wolf and Janssens, 2007; OECD, 2013).

In Flanders (Belgium), too, the research evidence regarding this topic is currently rather limited. In our view, the Flemish educational system provides an interesting context for this kind of research, as inspection is the only accountability measure for schools towards the national authorities. This study aims to fill the current gap in the knowledge base on the (side) effects of the Flemish inspection system and thereby to contribute to the international evidence base on this topic. Moreover, it distinguishes between schools with different inspection judgements, as it has been argued that the inspection judgement determines the extent to which effects on schools occur (McCrone et al., 2007; Ouston and Davies, 1998). We also compare the responses of teachers and of members of the school management team, since earlier studies pointed out that the reported effects and side effects may depend on staff members’ position in the school (Brunsden et al., 2006; Chapman, 2002). It has yet to be investigated whether or not the above-mentioned findings can be confirmed in the Flemish educational context.

We set out on an in-depth investigation of the effects and side effects in Flemish primary and secondary schools, guided by three research questions:

1. What are the effects and side effects of inspections on schools?

2. What is the impact of the inspection judgement on the effects and side effects of the inspection?

3. What are the differences between teachers and the school management teams regarding their perception of the effects and side effects of the inspection?

Research context

In this section we provide a thumbnail sketch of the particularities of the Flemish educational accountability context. Flemish schools enjoy a relatively large degree of autonomy when it comes to setting up processes, in line with their traditions and educational methods, to achieve the ‘attainment targets’ imposed by the government (OECD, 2013; Van Bruggen, 2010). In the absence of central examinations, school inspections are the only accountability measure that exists (Standaert, 2001). The lack of reliable output data in terms of students’ learning performances constrains the inspectors to adopt a process-oriented approach (Van Bruggen, 2010).

Compared to other countries, Flemish schools are inspected infrequently: at least once every ten years. The inspection gives schools a ‘positive’, ‘restricted positive’ or ‘negative’ judgement. A ‘positive’ judgement means that a school is considered to have the competencies and preparedness to continue working in an optimal manner and that no follow-up needs to be scheduled. The judgement ‘restricted positive’ denotes that a second inspection is required three years after the initial inspection to determine whether or not identified shortfalls have been adequately addressed. Schools that show structural deficiencies are given a ‘negative’ judgement, which means that the school is obliged to set up an improvement plan and to have its progress monitored by an external agency. During the school year 2012–2013, the judgements ‘positive’, ‘restricted positive’ and ‘negative’ were given to 60.6%, 36.1% and 3.3%, respectively, of the inspected primary and secondary schools (Onderwijsinspectie, 2013).

As it is highly unlikely that schools will be closed down, or that staff members will lose their jobs as an immediate result of inspection, the Flemish inspection system is generally regarded as a relatively ‘low-stake’ inspection system compared to other educational contexts (Van Bruggen, 2010). Low-stake inspection is considered to foster school improvement and to reduce undesirable side effects (Gärtner et al., 2009; Martin, 2005; Yeung, 2012).

Inspectors in the Flemish educational context do not have the legal right to give advice to schools on how they can improve their current practices. Inspectors have to analyse and report on the school’s strengths and weaknesses, but need to refrain from any kind of recommendation to the school on how they might address identified weaknesses. The legislation makes a strict distinction between inspection (for control) and school counselling (for advice). The Inspectorate’s operating assumption to pursue its development-oriented function is that the objective analysis of the schools’ own strengths and weaknesses provided by the inspectors will serve as an impetus for the schools to secure the strengths and address the identified weaknesses (Vanotterdijk, 2008).

Conceptual framework

This section provides an overview of effects and side effects discussed in the present study. An overview and a definition of each effect and side effect are included in Table 1. Rossi et al. (1999) distinguished between the conceptual, instrumental and symbolic effects of interventions in schools. A fourth effect type, strategic effects, was added by Visscher (2002). Based on the assumptions made by several scholars (e.g. Matthews and Smith, 1995; McCrone et al., 2007), a fifth effect type can be added, namely the effect on the feelings of efficacy within the school, in which we distinguish between the effect on staff members’ self-efficacy and the effect on the school’s collective efficacy. The beliefs staff members hold about their capabilities to face the challenges of today’s education reality strongly influence students’ motivation and learning outcomes (Thoonen et al., 2011; Tschannen-Moran and Hoy, 2001). The collective efficacy is a key aspect of the school culture and may contribute to explain the effect of teachers on student achievement (Goddard and Goddard, 2001).

Table 1.

Conceptual framework.

	Definition	Exemplary overview of findings from earlier research
Conceptual effect	The extent to which the inspection ‘…influences the thinking of decision-makers (and practitioners) and as such may have an impact on their actions’ (Visscher, 2002: 58, text between brackets added by the authors).	The empirical evidence shows that schools most often do not gain new insights into their own functioning as a result of the inspection (Chapman, 2002; McCrone et al., 2007; Wilcox and Gray, 1996), but that indirect conceptual effects may occur, such as an increase in reflection and collegial consultation. Nevertheless, research evidence is still largely inconsistent (Brimblecombe et al., 1996; Chapman, 2002; Dedering and Müller, 2011).
Instrumental effect	The extent to which ‘… the decision-maker (and practitioner) bases decisions and actions’ on the inspection announcement or inspection result (Visscher, 2002: 58, text between brackets added by the authors).	A great deal of the research base is focused on instrumental effects. The inspection may lead schools to take actions upon recommendations (Dimmer and Metiuk, 1998; Ferguson et al., 1999; Lowe, 1998) or to change their policies (Dedering and Müller, 2011; Ehren et al., 2013b), but the effect on teaching practices seems to be rather small (Brimblecombe et al., 1996; Case et al., 2000; Chapman, 2002).
Symbolic effect	The extent to which the inspection ‘… is used to legitimise an opinion already held’ (Visscher, 2002: 58), when the inspection supports viewpoints of certain staff members.	Studies found that the inspection may be used deliberately to add an authority and legitimacy to the personal agendas of the principal (Ehren and Visscher, 2008; Hosker and Robb, 1998; Wilcox and Gray, 1996), but also teachers have adopted the strategy to use the inspection to achieve personal goals (Kelchtermans, 2007).
Strategic effect	The extent to which the inspection is used by the school for accountability purposes, e.g. towards parents or other external stakeholders.	To our knowledge, strategic effects from inspection have not yet been empirically documented.
Effect on self-efficacy	The impact of inspection on self-efficacy, defined for teachers as ‘… the beliefs teachers hold that they can positively influence student learning’ (Klassen et al., 2009: 67) and for other school staff members similarly as ‘… the beliefs school staff members hold that they can positively influence the outcomes of the school’.	Some of the evidence suggests that the effect on the feelings of efficacy depends largely on the inspection judgement: a positive inspection may result in an increase in self-efficacy (McCrone et al., 2007), while a reduced self-efficacy may result from a negative judgement (Perryman, 2009). Case et al. (2000), however, found a decreased feeling of self-efficacy amongst teachers, regardless of their evaluation by the inspectors.
Effect on collective efficacy	The impact of inspection on a school’s ‘… shared belief in its conjoint capabilities to organize and execute the courses of action required to produce given levels of attainments’ (Bandura, 1997: 477).	To our knowledge, the effects from inspection on the collective efficacy of schools have not yet been empirically documented.
Effect on stress	The impact of the inspection on the experience of stress before and during the inspection.	The increase in stress before and during the inspection has been well documented, albeit mostly in the UK context (Brunsden et al., 2006; Kogan and Maden, 1999; Scanlon, 1999).
Effect on anxiety	The impact of inspection on the experience of anxiety before and during the inspection.	An increase in anxiety was observed in the inspection context of the UK (Brunsden et al., 2006; Case et al., 2000), and in Ireland (McNamara and O’Hara, 2006). In German schools a smaller impact on the anxiety experienced by staff was reported (Gärtner et al., 2009), according to the authors, because of the lower stake status of German inspection.
Effect on enthusiasm	The impact of the inspection on the extent to which the teacher enjoys teaching (or for other staff members: the extent to which one enjoys carrying out their daily jobs) (based on Kunter et al., 2011) before and during the inspection.	Some studies investigated the effect on enthusiasm after the inspection, but to our knowledge only one study documented a reduction in professional enthusiasm before the inspection (Chapman, 2002).
Effect on tiredness	The impact of inspection on the extent to which school staff members feel tired before and during the inspection.	An increased tiredness among school staff during the inspection has been documented (Case et al., 2000), but the evidence is still rather limited.
Effect on conflicts	The extent to which conflicts in the school team are affected before and during the inspection.	Two studies described an increased number of conflicts and tensions between staff members due to the inspection (Gray and Gardner, 1999; Nicolaidou and Ainscow, 2005).
Impact on personal life	The impact inspection has on the personal lives of staff members.	Several studies concluded that inspection has a negative impact on the personal life of school staff (Jeffrey and Woods, 1998; Perryman, 2006). Woods and Jeffrey (1998) found an effect so severe they called the inspection the ‘colonization of life’ (Woods and Jeffrey, 1998: 561).
Misleading activities (school)	The extra preparation and intentional adaptations by the school before and during the inspection. These activities are often set up to provide a better image of the school or even to mislead the inspectors; only seldom do they really contribute to school improvement (Ehren et al., 2013a).	‘Window dressing’ or ‘playing the game’ has been documented by several studies in the UK context (Brimblecombe and Ormston, 1995; Perryman, 2009; Plowright, 2007), but also evidence has been collected for the fabrication of documentation and for sudden changes to the physical outlook of classrooms (Fitz-Gibbon and Stephenson-Forster, 1999; Perryman, 2009). In contrast, in the German study by Gärtner et al. (2009), only a limited number of misleading activities were reported.
Misleading activities (teacher)	The extra preparation and intentional adaptations by individual teachers before and during the observed lessons in order to present a more favourable image.	According to some studies, teachers prepare their lessons more carefully and they plan more ‘steering’ activities in order to ensure they keep control of the classroom (Brimblecombe et al., 1996; Case et al., 2000; Perryman, 2009). Other studies have concluded that the effect on teachers’ lessons is fairly limited (Chapman, 2001; Wilcox and Gray, 1996).
Disturbing effects	Distraction from normal school life and normal school development because of the notification of, or conduct of the inspection.	Studies pointed out that schools postpone or even omit their own priorities in favour of preparing for the inspection (Kogan and Maden, 1999; McCrone et al., 2007). Stoll and Fink (1996) reported schools devoted all their time to the inspection and used the term ‘development paralysis’ to describe the situation in the school for many months (Stoll and Fink, 1996: 57).

This study investigates four kinds of side effects: (a) emotional side effects before and during the inspection – i.e. an increase in stress, anxiety, tiredness, conflict, and a decrease in enthusiasm; (b) the impact on staff members’ personal lives; (c) engaging in misleading activities; and (d) disturbing effects on normal school life. The distinction between ‘effects’ and ‘side effects’ is not strict. Several effects (e.g. symbolic effects or the effect on self-efficacy and collective efficacy) may turn out to be negative, and thus be an undesirable side effect, while several side effects could have a positive outcome (e.g. when misleading behaviour leads to actual and lasting improvement).

Table 1 provides an overview of each of these (side) effects alongside evidence from other educational contexts. The present study focuses on the question whether or not these effects are generated in schools in the Flemish educational context.

Next to providing descriptive evidence, this study aims to find out to what extent differences exist between schools that received a different inspection judgement. As mentioned in the Introduction, stronger positive effects are expected in schools with a less favourable inspection judgement (McCrone et al., 2007; Ouston and Davies, 1998). Furthermore, earlier studies have shown that principals and members of the management team have a more positive stance towards the effects of inspection on the school compared to teachers (Chapman, 2002; Matthews and Sammons, 2004; Scanlon, 1999). Other studies suggested that the side effects also depend at least partially on the position of members of staff in the school, for instance that the emotional burden before and during the inspection is larger for teachers compared to non-teaching colleagues (Brimblecombe et al., 1996; Brunsden et al., 2006; Wilcox and Gray, 1996).

Research method and sample

This article reports on an online survey study of schools’ perceptions of the (side) effects of inspection. The study sample included every Flemish primary and secondary school that was inspected during a predefined period in the school year 2012–2013. The survey was intended to reach all the staff members in a teaching or managing position in these schools. Schools received the questionnaire eight weeks after the inspection or, in a few cases, where the inspection report had arrived late, three weeks after acknowledgement of the inspection report.

In total, data from 130 schools (54.3% of the total number of schools inspected during the predefined period) were retained for this study, totalling 2624 respondents. We surveyed both primary and secondary schools: 77.7% of the participating schools and 60.3% of the respondents were from primary education (the difference was due to the larger school size in secondary education). Of the respondents, 82.8% were teachers; the remaining 17.2% were either principals or members of the management team. In total, 78.7% of those taking part in the survey were women and 21.3% were men. These figures indicate a good representation with regard to the target population: 71.3% of the Flemish schools are primary schools and 28.7% of the schools in Flanders provide secondary education. Of all staff members in a management or teaching position, 73.6% are women and 26.4% are men (Vlaamse Overheid, 2013). Overall figures about the positions of staff members in schools are unavailable.

A total of 56.4% of the participants reported their school had received a ‘positive’ judgement, whereas 39.6% received a ‘restricted positive’ judgement and 4.0% of the responses were missing on this item. No schools with a ‘negative’ judgement participated in the study.

The various concepts in the theoretical framework were operationalised using 5-point Likert scales, except for the emotional side effects (discussed later in this section). Principal axis factoring (with Oblimin rotation) revealed that each of the concept scales was a unique factor. Table 2 includes an example item for each scale, in addition to information about the psychometric characteristics of the scale in question. The table indicates that the scales are sufficiently reliable for use in the analyses: Cronbach’s alpha values are ‘moderate’ (0.78 or above) or ‘high’ (0.90 or above) (Murphy and Davidshofer, 1988).
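As an illustration of the reliability measure mentioned above, the following minimal sketch computes Cronbach's alpha with its standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the study; the article does not say which software was used for the analyses.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Transpose the data so that each tuple holds all answers to one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    # Variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to a four-item scale (1-5 Likert).
example = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(example)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha approaches 1 when respondents answer all items of a scale in a consistent way, which is the property the reported values are meant to demonstrate.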

Table 2.

Psychometric characteristics of the scales.

Scale	Nr of items	Cronbach’s alpha	Example item
Conceptual effects	7	0.89	The inspection made me reflect about our school policy.
Instrumental effects	5	0.81	The inspection has led to concrete actions for improvement.
Symbolic effects	4	0.78	The school leadership uses the inspection to impose earlier planned changes.
Strategic effects	3	0.90	The school (plans to) use the inspection report to create a more favourable image of the school.
Impact on self-efficacy	4	0.92	The inspection contributed to my awareness that I am doing well in my job.
Impact on collective efficacy	4	0.90	The inspection contributed to the school team’s awareness that we are doing well as a school.
Impact on personal life	6	0.92	The inspection has put pressure on my relationship with my direct family members.
Misleading activities (school)	8	0.84	In our school documents were drafted to create a more favourable image of the school.
Misleading activities (teacher)	4	0.87	During the lesson which was observed by the inspectors, I gave certain pupils more opportunities to respond.
Disturbing effects	5	0.90	Because of the inspection we paid less attention to the pupils.

Measures for the emotional side effects (stress, anxiety, enthusiasm, tiredness and conflict) were obtained in a different way. To measure the increase or decrease in an emotion due to the inspection, we compared the extent to which that emotion was experienced before, during and after the inspection with the extent to which it is experienced at a regular time without inspection. For anxiety, for example, the respondents were asked to what extent they had experienced it two weeks prior to the inspection (t1), during the inspection (t2) and two weeks after the inspection (t3), using 5-point Likert scales. These estimates were compared to an estimate of anxiety during a regular day in the school (without inspection) (t0) using a t-test. Cohen’s d is used to determine the impact size of any differences, with |d| < 0.20 considered ‘negligible’, 0.20 ≤ |d| < 0.50 ‘small’, 0.50 ≤ |d| < 0.80 ‘medium’ and |d| ≥ 0.80 ‘large’ (Cohen, 1988).
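The impact-size classification described above can be sketched as follows. The article does not state which variant of Cohen's d was computed; this sketch assumes the difference in means divided by the root mean square of the two standard deviations, which reproduces the value reported in Table 4 for stress before the inspection but may differ slightly for other rows because the published means and standard deviations are rounded. The function names are hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from summary statistics (assumed formula, see lead-in)."""
    # Root mean square of the two standard deviations as the standardiser.
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


def impact_size(d):
    """Label |d| with the thresholds quoted in the text (Cohen, 1988)."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# Stress on a regular day (t0) versus two weeks before the inspection (t1),
# using the means and standard deviations listed in Table 4.
d_stress = cohens_d(2.21, 0.93, 3.29, 1.16)
print(f"d = {d_stress:.2f} ({impact_size(d_stress)})")
```

The sign of d only reflects the order of subtraction (t0 minus t1 here); the interpretation in the text relies on the absolute value.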

We also used t-tests to find out whether the reported (side) effects depend on the inspection judgement and on the respondents’ position in the school (research questions 2 and 3).

Results

In this section we first discuss the results of each of the effects and side effects elaborated in the conceptual framework (research question 1). Next we provide figures on the effects in relation to the inspection judgement (research question 2) and on differences between teachers’ responses and the school management teams’ responses (research question 3).

Effects and side effects of school inspection

Table 3 gives an overview of the effects of inspection in our sample (with the exception of the emotional side effects, for which see Table 4).

Table 3.

Inspection effects and side effects.

	N	Mean	Std. Deviation
Conceptual effect	2508	3.36	0.79
Instrumental effect	2366	3.43	0.71
Symbolic effect	2049	3.02	0.90
Strategic effect	2027	3.00	1.16
Impact on self-efficacy	2347	3.36	1.02
Impact on collective efficacy	2533	3.49	0.91
Impact on personal life	2521	2.79	1.23
Misleading activities (school)	2473	2.06	0.75
Misleading activities (teacher)	1357	1.49	0.62
Disturbing effects	2415	2.30	1.03

Note: Answer categories: 1 = entirely disagree; 2 = disagree; 3 = neither agree nor disagree; 4 = agree; 5 = entirely agree.

Table 4.

Inspection effects on staff’s emotions before and during the inspection.

		Mean	Std. Deviation	T	p	Cohen’s d
Effect on stress (n = 2515)
t0	Regular	2.21	0.93
t1	Before inspection	3.29	1.16	–36.25	(***)	–1.03
t2	During inspection	3.39	1.17	–39.78	(***)	–1.13
t3	After inspection	2.31	1.17	–3.33	(***)	–0.09
Effect on anxiety (n = 2524)
t0	Regular	1.21	0.54
t1	Before inspection	1.88	1.07	–28.45	(***)	–0.84
t2	During inspection	2.09	1.20	–33.56	(***)	–1.01
t3	After inspection	1.43	0.84	–11.27	(***)	–0.32
Effect on enthusiasm (n = 2508)
t0	Regular	4.13	0.66
t1	Before inspection	3.54	1.01	24.43	(***)	0.71
t2	During inspection	3.36	1.10	30.08	(***)	0.87
t3	After inspection	3.74	1.06	15.75	(***)	0.46
Effect on conflicts (n = 2516)
t0	Regular	1.73	0.80
t1	Before inspection	1.88	0.96	–5.84	(***)	–0.16
t2	During inspection	1.77	0.97	–1.45	ns
t3	After inspection	1.85	1.02	–4.57	(***)	–0.13
Effect on tiredness (n = 2520)
t0	Regular	2.32	0.92
t1	Before inspection	2.77	1.14	–15.52	(***)	–0.44
t2	During inspection	2.82	1.16	–16.83	(***)	–0.48
t3	After inspection	2.72	1.16	–13.63	(***)	–0.39

Note: Answer categories: 1 = no (stress); 2 = minor (stress); 3 = some (stress); 4 = considerable (stress); 5 = major (stress).

Note: (***) significant at p < 0.001-level.

Intended effects

On average, schools report moderate conceptual effects (M = 3.36 on a scale from 1 to 5) as well as moderate instrumental effects (M = 3.43). Conceptual effects concern an increase in ‘reflection about the qualities of the school’ rather than an increased ‘knowledge about the strengths and weaknesses of the school’. An analysis at the item level reveals that the lowest scoring items (M = 2.96 and 3.28, respectively) are ‘the inspection gave me a better idea about my professional weaknesses’ and ‘the inspection gave me a better idea about the school’s weaknesses’, while the items that obtained the highest scores (both M = 3.51) are ‘the inspection made me reflect about my current practices’ and ‘the inspection fostered the willingness to take a critical stance’.

With regard to the instrumental effects, a larger effect is reported on individual classroom practices than at the school policy level: respondents tend to agree that the team has made changes in terms of its professional approach (M = 3.39), while there is less agreement that the school has made changes to the school policy (M = 3.06). Respondents report that certain things are handled in a different manner (M = 3.58), and that concrete actions for improvement are taken or planned as a response to the inspection (M = 3.85), but they are unsure whether or not pupils will benefit from these actions in the end (M = 3.22).

The mean scores for symbolic and strategic effects (M = 3.02 and 3.00, respectively) show that these are not common effects, but the relatively high standard deviation for strategic effects (SD = 1.16) indicates large differences between respondents.

There is generally a moderate positive impact on the individual self-efficacy of the respondents, and a slightly stronger positive impact on the collective efficacy (M = 3.36 and 3.49, respectively). Also for these effects, the standard deviations indicate considerable differences between respondents.

Side effects

The mean score of 2.79 in combination with the high standard deviation (SD = 1.23) for the scale ‘impact on personal life’ means that for a large group of respondents the inspection has no or only a very limited personal impact, but that for a considerable minority the inspection strongly affected their personal life: 34.5% of the respondents have a scale score between 1 and 2, while 22.0% have a scale score of 4 or higher. Furthermore, the low scores obtained for both scales regarding ‘misleading activities’ clearly indicate that, on average, neither schools in general nor individual teachers engage in activities intended to present a false picture to the inspectors. Finally, disturbing effects are rather uncommon in the Flemish educational context, although here too the standard deviation indicates rather strong differences between respondents (SD = 1.03).

Table 4 shows that the inspection leads to a very strong increase in stress before and during the inspection (|d| = 1.03 and 1.13, respectively). Two weeks after the inspection, however, stress has fallen back almost to its regular level: the remaining difference is negligible (|d| = 0.09) (Cohen, 1988). There is also a strong increase in anxiety before and during the inspection (|d| = 0.84 and 1.01, respectively), while after the inspection some of the increased anxiety still remains (|d| = 0.32). The inspection also leads to a strong decrease in professional enthusiasm two weeks prior to, and during, the inspection, and two weeks after the inspection this decrease is still considerable (|d| = 0.71, 0.87 and 0.46, respectively). The same applies to the increase in tiredness before, during and after the inspection, although each of the impact sizes is smaller than for enthusiasm (|d| = 0.44, 0.48 and 0.39, respectively). On average, no substantial increase or decrease in conflict between staff members was reported.

Impact of the inspection judgement on the effects of inspection

Each of the aforementioned effects is statistically significantly influenced by the inspection judgement (see Table 5). The conceptual effect (the extent to which the inspection influences the thinking of staff members) and the instrumental effect (the extent to which the inspection results in decision-making and actions) are affected by the judgement in opposite directions: a positive judgement leads to more conceptual effects, while a less favourable judgement spurs instrumental effects. However, the impact of the judgement on these effects can be considered ‘negligible’ (Cohen’s |d| = 0.11 and 0.19, respectively). The judgement has a ‘small’ impact (|d| = 0.29) on symbolic effects: in schools with a ‘restricted positive’ judgement, more symbolic effects are reported, albeit still to a small extent (M = 3.18).

Table 5.

Effects for different inspection judgements.

	‘Positive’ judgement			‘Restricted positive’ judgement
	N	Mean	SD	N	Mean	SD	t	p-value	Cohen’s d
Conceptual effect	1395	3.40	0.78	982	3.31	0.79	2.73	(**)	0.11
Instrumental effect	1299	3.38	0.70	923	3.51	0.71	–4.45	(***)	–0.19
Symbolic effect	1053	2.92	0.92	798	3.18	0.86	–6.61	(***)	–0.29
Strategic effect	1106	3.28	1.11	750	2.57	1.10	13.56	(***)	0.64
Impact on self-efficacy	1328	3.61	0.92	896	3.00	1.05	14.50	(***)	0.62
Impact on collective efficacy	1441	3.88	0.71	1008	2.93	0.87	29.61	(***)	1.20
Stress after inspection	1475	2.09	1.05	1062	2.63	1.25	–11.89	(***)	–0.47
Anxiety after inspection	1477	1.33	0.72	1055	1.58	0.97	–7.46	(***)	–0.29
Enthusiasm after inspection	1470	3.98	0.88	1054	3.39	1.19	14.23	(***)	0.57
Conflicts after inspection	1472	1.65	0.88	1055	2.14	1.15	–12.16	(***)	–0.48
Tiredness after inspection	1477	2.64	1.13	1054	2.84	1.20	–4.16	(***)	–0.17

Notes: (**) significant at p < 0.01-level; (***) significant at p < 0.001-level.
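The t-values in Table 5 can be approximated from the published summary statistics alone. The sketch below assumes a Student's t-test for two independent groups with a pooled variance; the article only states that t-tests were used, so this choice, like the function name `pooled_t`, is an assumption. Because the published means and standard deviations are rounded, the result will match the reported value only approximately.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Strategic effect: 'positive' versus 'restricted positive' judgement (Table 5).
t_value, df = pooled_t(1106, 3.28, 1.11, 750, 2.57, 1.10)
print(f"t({df}) = {t_value:.2f}")
```

With groups this large, even small mean differences become statistically significant, which is why the text relies on Cohen's d rather than on the p-values to judge how much the judgement matters.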

In schools with a positive inspection judgement, more strategic effects are reported (impact size ‘medium’, |d| = 0.64). There is also a ‘medium’ impact of the inspection judgement on the feelings of self-efficacy (|d| = 0.62) and even a ‘large’ impact on the collective efficacy (|d| = 1.20). When the inspection judgement is ‘positive’, there is a substantial positive impact on the self-efficacy of individual staff members (M = 3.61) and even a strong positive impact on the collective efficacy (M = 3.88). In schools where the inspection judgement is ‘restricted positive’, these effects are neutral (M = 3.00 and 2.93, respectively).

As the inspection judgement is given at the end of the inspection week, it does not make sense to investigate the impact of the inspection judgement on side effects that occur before or during the inspection (misleading behaviour, impact on personal life, disturbing effect and emotional effects before and during the inspection). Two weeks after the inspection, staff in schools with a less favourable inspection judgement report more stress, anxiety, conflicts and tiredness, and less professional enthusiasm, than staff in schools with a positive judgement. These differences are medium for enthusiasm (|d| = 0.57), small for stress, anxiety and conflicts (|d| = 0.47, 0.29 and 0.48, respectively) and negligible for tiredness (|d| = 0.17).
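For readers who want to check the effect sizes against the summary statistics in Table 5, the following is a minimal sketch of Cohen's d computed with a pooled standard deviation. It assumes that the reported d values were obtained in this way, which the text does not state explicitly. The function names are illustrative, and the 0.80 boundary between ‘medium’ and ‘large’ follows Cohen's (1988) convention rather than a threshold given in this article. Because the table values are rounded, the results can differ from the reported figures in the second decimal.

```python
import math


def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def magnitude(d):
    """Verbal label for |d|. The 0.20 and 0.50 cut-offs are the ones used in the
    text; the 0.80 cut-off for 'large' is an assumption based on Cohen (1988)."""
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


# (N, Mean, SD) for 'positive' schools followed by (N, Mean, SD) for
# 'restricted positive' schools, copied from Table 5.
rows = {
    "Conceptual effect": (1395, 3.40, 0.78, 982, 3.31, 0.79),
    "Strategic effect": (1106, 3.28, 1.11, 750, 2.57, 1.10),
    "Impact on collective efficacy": (1441, 3.88, 0.71, 1008, 2.93, 0.87),
}

for name, stats in rows.items():
    d = cohens_d(*stats)
    print(f"{name}: d = {d:.2f} ({magnitude(d)})")
```

A positive d indicates a higher mean in schools with a ‘positive’ judgement, which matches the sign convention of Table 5.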

Differences between teachers and management teams regarding their perception of (side) effects

Table 6 provides an overview of differences between the responses of teachers and members of the management team (including principals). A number of statistically significant differences arose from the data on the effects of inspection, i.e. differences regarding conceptual effects, instrumental effects and the impact on self-efficacy (estimated higher by members of the management team) and regarding strategic effects (estimated higher by teachers). However, Cohen’s d measure for impact size revealed that these differences were either negligible (|d|<0.20) or small (|d|<0.50) (Cohen, 1988). The same applies to the side effects of the inspection: only the impact of the inspection on the professional enthusiasm prior to and during the inspection reveals appreciable differences between teachers and the school management teams: teachers report slightly lower professional enthusiasm in both periods (|d| = 0.22 and 0.20, respectively). No figures about misleading activities at teacher level are included in Table 6 as this side effect was only surveyed with regard to teachers.

Table 6. Effects for teachers versus management teams.

	Teachers			Management team
	N	Mean	SD	N	Mean	SD	t	p-value	Cohen’s d
Conceptual effect	1996	3.34	0.79	409	3.47	0.77	–3.13	(**)	–0.17
Instrumental effect	1844	3.41	0.71	403	3.56	0.71	–4.01	(***)	–0.22
Symbolic effect	1520	3.02	0.90	348	3.09	0.90	–1.29	ns
Strategic effect	1517	3.03	1.17	361	2.86	1.09	2.50	(*)	0.14
Impact on self-efficacy	1858	3.33	1.03	392	3.51	0.95	–3.24	(***)	–0.19
Impact on collective efficacy	2055	3.47	0.91	423	3.53	0.91	–1.20	ns
Impact on personal life	2015	2.81	1.24	422	2.68	1.22	2.00	(*)	0.11
Misleading activities (school)	1994	2.08	0.75	416	1.99	0.73	–1.50	(*)	0.13
Disturbing effects	1913	2.29	1.01	390	2.33	1.06	0.09	ns
Stress before inspection	2114	3.30	1.16	438	3.22	1.17	1.31	ns
Stress during inspection	2087	3.41	1.16	436	3.28	1.19	2.17	ns
Stress after inspection	2107	2.28	1.15	440	2.47	1.22	–3.11	(*)	0.11
Anxiety before inspection	2112	1.91	1.08	439	1.79	1.04	2.20	(*)	0.11
Anxiety during inspection	2094	2.12	1.21	438	1.94	1.14	2.97	(**)	0.15
Anxiety after inspection	2104	1.43	0.84	438	1.45	0.86	–0.32	ns
Enthusiasm before inspection	2100	3.50	1.02	440	3.71	0.95	–4.05	(***)	–0.22
Enthusiasm during inspection	2079	3.32	1.11	437	3.53	1.07	–3.82	(***)	–0.20
Enthusiasm after inspection	2093	3.73	1.07	441	3.77	1.07	–0.83	ns
Conflicts before inspection	2103	1.86	0.96	434	1.99	0.96	–2.60	(**)	–0.14
Conflicts during inspection	2095	1.76	0.97	429	1.83	0.95	–1.32	ns
Conflicts after inspection	2107	1.83	1.02	426	1.99	1.05	–2.81	(**)	–0.15
Tiredness before inspection	2107	2.78	1.12	436	2.76	1.20	0.21	ns
Tiredness during inspection	2091	2.83	1.15	434	2.77	1.24	0.90	ns
Tiredness after inspection	2105	2.72	1.15	436	2.77	1.25	–0.77	ns

Notes: (*) significant at p < 0.05-level; (**) significant at p < 0.01-level; (***) significant at p < 0.001-level.
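The t statistics and significance markers in Table 6 can be approximated from the reported N, Mean and SD. The sketch below assumes a pooled-variance (Student) t-test, which the text does not confirm, and approximates the two-sided p-value with the standard normal distribution, which is reasonable here only because the degrees of freedom run into the thousands. Function names are illustrative. Since the inputs are rounded, the recomputed t values can deviate slightly from those in the table.

```python
import math


def t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def two_sided_p_normal(t):
    """Two-sided p-value under the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


def stars(p):
    """Significance marker in the notation of the table notes."""
    if p < 0.001:
        return "(***)"
    if p < 0.01:
        return "(**)"
    if p < 0.05:
        return "(*)"
    return "ns"


# (N, Mean, SD) for teachers followed by (N, Mean, SD) for management teams,
# copied from Table 6.
rows = {
    "Conceptual effect": (1996, 3.34, 0.79, 409, 3.47, 0.77),
    "Symbolic effect": (1520, 3.02, 0.90, 348, 3.09, 0.90),
    "Enthusiasm before inspection": (2100, 3.50, 1.02, 440, 3.71, 0.95),
}

for name, stats in rows.items():
    t, df = t_from_summary(*stats)
    p = two_sided_p_normal(t)
    print(f"{name}: t({df}) = {t:.2f}, {stars(p)}")
```

A negative t indicates a lower mean for teachers than for members of the management team.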

Discussion

This study contributes to the existing evidence base on the (side) effects of inspection by providing insights into the occurrence of these effects in the Flemish educational context. The evidence presented above expands the current knowledge base, adds several nuances to it and, in some cases, contradicts earlier findings.

Regarding the first research question, we found that the inspection has, on average, only moderate conceptual and instrumental effects on schools, and small symbolic and strategic effects. There are moderate positive effects on the feelings of self-efficacy and slightly stronger positive effects on collective efficacy within the inspected schools. Moderate conceptual effects (e.g. Chapman, 2002; Ouston et al., 1997; Wilcox and Gray, 1995, 1996) and moderate instrumental effects (e.g. Lowe, 1998; Ouston et al., 1997; Wilcox and Gray, 1996) were also reported by earlier studies, but other studies found a larger instrumental effect of inspections on schools (e.g. Cuckle et al., 1998; McCrone et al., 2007). However, the ambiguity of the results in earlier studies (Chapman and Earley, 2010; Ehren and Visscher, 2006) and the lack of clear conceptualization observed in most studies in the field of the effects of inspections (Ehren and Visscher, 2006) restrict the extent to which our results can be compared to earlier studies.

Even though the Flemish inspection operates at school level and does not make judgements about individual classroom practices, we found that classroom-related instrumental effects at teacher level are stronger than instrumental effects at school policy level. This finding contradicts earlier findings from studies in other contexts (Case et al., 2000; Chapman, 2002). Possibly, the strong process-oriented approach adopted by the Flemish inspectors brings classroom practices under the inspection spotlight, so that they are consequently affected by the inspection. This hypothesis should be tested in further research.

Related to the conceptual effects, the inspection increases reflection about the school’s qualities, but it does not directly result in new insights into the school’s strengths and weaknesses. This is in line with the results of some qualitative studies, which also reported increased reflection and consultation amongst staff (Chapman, 2002; Hosker and Robb, 1998; Kelchtermans, 2007). Our data do not support the operating assumption of policy-makers and the Inspectorate in the Flemish educational context, i.e. that the inspection will lead to (new) information about weaknesses in the school, which in turn will be a major lever for schools to engage in change (Vanotterdijk, 2008). The results endorse the idea of Macbeath (2006), Matthews and Sammons (2004) and Woods and Jeffrey (1998) that inspection needs to go hand in hand with constructive advice to ensure profound conceptual and instrumental effects. Nevertheless, a close eye needs to be kept on the accountability purpose of inspection when inspectors cross the line between the ‘watchdog’ and ‘guidedog’ roles (Macnab, 2004; Martin, 2005; Ouston and Davies, 1998).

We found generally a strong increase in stress and anxiety and, to a smaller extent, in tiredness, as well as a decreased level of professional enthusiasm before and during the inspection. Our results are therefore in line with earlier results with regard to the emotional impact of the inspection prior to, during, and after the inspection (Brimblecombe and Ormston, 1995; Brunsden et al., 2006; Ferguson et al., 1999). However, the present study adds several nuances, such as the extent to which levels of anxiety and tiredness, and particularly professional enthusiasm, remain affected after the inspection. On average, hardly any effect was reported on conflicts in the school teams. Other side effects such as misleading activities were on average rather limited, in contrast with the evidence collected in other educational contexts (Brimblecombe et al., 1996; Case et al., 2000; Perryman, 2009). In addition, earlier evidence regarding disturbing effects (e.g. Kogan and Maden, 1999; McCrone et al., 2007) cannot be confirmed in our study, as we did not find evidence of a substantial impact. The finding of Jeffrey and Woods (1998) and Perryman (2006) that inspection impacts on the personal lives of staff members in schools is confirmed for some staff members, but was not reported by others. The lack of coherence of some of our results with earlier research regarding the side effects of inspection may be explained by the rather low-stakes context of Flemish inspections, compared to the high-stakes UK context in which most studies have been conducted (Van Bruggen, 2010). However, our results do not support several scholars’ assumption that lower-stakes inspection approaches lead to a smaller emotional impact on school staff (Gärtner et al., 2009; Martin, 2005; Yeung, 2012).

Two particularities of the Flemish educational context are worth discussing in this regard. First, the long period between inspections adds to the lack of experience of staff members with external evaluation. This lack of experience not only brings uncertainty about what is to be expected (fear of the unknown is a major source of stress and anxiety; Brimblecombe and Ormston, 1995; Wilcox and Gray, 1996), but also artificially increases (the perception of) the stakes for schools. Second, and in a similar vein, the fact that inspection is the sole accountability measure for Flemish schools may increase the emotional side effects because schools lack reliable benchmarking information with which to judge their own quality. This may raise uncertainty about the school’s output performance in relation to the expected standards, which is unarguably associated with increased stress and anxiety (Brimblecombe and Ormston, 1995; Sandbrook, 1996). A final possible explanation derives from the finding that only limited disturbing effects were reported, which means that the preparation for the inspection is added to the regular workload. The increased workload may also partially explain the increase in stress and the decrease in professional enthusiasm, as earlier studies have shown (Case et al., 2000; Chapman, 2002; Perryman, 2009).

Based on earlier studies, we expected that an unfavourable inspection judgement would lead to stronger effects (McCrone et al., 2007; Ouston and Davies, 1998). However, this hypothesis cannot be generalized to every effect. We therefore stress the importance, in future research, of distinguishing conceptually at least between the conceptual and the instrumental effects of inspections. Although differences were small, we found larger conceptual effects in schools with a positive inspection report, while there is more action on the inspection results (instrumental effect) in schools given a ‘restricted positive’ judgement. The stronger instrumental reaction in those schools is possibly explained by the imposed need to act. The finding that there is more positive impact on self-efficacy and collective efficacy in schools with a positive judgement supports the findings of Perryman (2009) and McCrone et al. (2007), but contradicts the conclusion of Case et al. (2000). The latter study found a decrease in the feeling of self-efficacy, regardless of the inspection judgement. Furthermore, respondents from schools with an unfavourable inspection judgement reported more severe emotional effects after the inspection, but differences with respondents from other schools were, by and large, rather limited.

We found that conceptual effects are smaller, while both the symbolic and instrumental effects are larger in schools that received a ‘restricted positive’ judgement. These findings lead to the assumption that the weaknesses identified by the inspectors strongly reinforce what either the management team or some of the teachers already knew, but in order to address these deficiencies, they needed the inspection’s judgement to convince others in the school to take action. We presume that improvement in schools with an unfavourable inspection judgement is generally not directly influenced by the inspection, but rather indirectly by the legitimacy it adds to positions that had been taken earlier by the principal and/or other staff members in the school.

We found only small differences in the perception of the effects of inspection between teachers and members of the management team. The finding in other educational contexts that teachers report fewer effects of inspection (Chapman, 2002; Matthews and Sammons, 2004; Scanlon, 1999) could therefore not be confirmed in our study. We found that some of the side effects are slightly more severe for teachers than for members of the management team, while earlier findings in other educational contexts indicate larger differences (Brimblecombe et al., 1996; Brunsden et al., 2006; Wilcox and Gray, 1996).

In this study we have refrained from drawing conclusions regarding causal effects. It is methodologically challenging to measure the precise effect of inspection, as inspection quickly becomes entangled with other external and internal impulses on the schools (De Wolf and Janssens, 2007; Matthews and Sammons, 2004). Only seldom is an effect of inspection the result of one single school or inspection feature (Ehren and Visscher, 2008). While our quantitative approach served the goal of describing the effects and side effects, a more qualitative approach may be required to identify the processes in the school that lead to certain outcomes.

The results provided by this study contribute to the discussion on how inspection can be organized in a way that spurs school improvement with a minimal amount of side effects for schools. We found that the operating policy rhetoric in Flemish education that ‘providing schools with an analysis of strengths and weaknesses will lead to school improvement’ is without foundation, and that the Flemish inspection – although called low-stakes – still has strong emotional side effects.

Footnotes

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

Bandura (1997) Self-efficacy: The Exercise of Control. New York: Freeman.

Brimblecombe, Ormston (1995) Teachers’ perceptions of school inspection: A stressful experience. Cambridge Journal of Education 25: 53–61.

Brimblecombe, Shaw, Ormston (1996) Teachers’ intention to change practice as a result of OFSTED school inspections. Educational Management Administration & Leadership 24: 339–354.

Brunsden, Davies, Shevlin (2006) Anxiety and stress in educational professionals in relation to Ofsted. Education Today 56: 24–31.

Case, Catling (2000) Please show you’re working; a critical assessment of the impact of Ofsted inspection on primary teachers. British Journal of Sociology of Education 21: 605–621.

Chapman (2001) Changing classrooms through inspections. School Leadership & Management 21: 59–73.

Chapman (2002) Ofsted and school improvement: Teachers’ perceptions of the inspection process in schools facing challenging circumstances. School Leadership & Management 22: 257–272.

Chapman, Earley (2010) School inspection/external school evaluation. In: Peterson, Baker, McGaw (eds) International Encyclopedia of Education, 3rd ed. Oxford: Elsevier, pp.719–725.

Cohen (1988) Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.

Cuckle, Hodgson, Broadhead (1998) Investigating the relationship between OfSTED inspections and school development planning. School Leadership & Management 18: 271–283.

De Wolf, Janssens FJG (2007) Effects and side effects of inspections and accountability in education: An overview of empirical studies. Oxford Review of Education 33: 379–396.

Dedering, Müller (2011) School improvement through inspections? First empirical insights from Germany. Journal of Educational Change 12: 301–322.

Dimmer, Metiuk (1998) The use and impact of OFSTED in a primary school. In: Earley (ed) School Improvement after Inspection? School and LEA Responses. London: Paul Chapman Publishing, pp.51–61.

Ehren MCM, Altrichter, McNamara, et al. (2013a) Impact of school inspections on improvement of schools—describing assumptions on causal mechanisms in six European countries. Educational Assessment Evaluation and Accountability 25: 3–43.

Ehren MCM, Perryman, Shackleton (2013b) Setting expectations for good education; how Dutch school inspections drive improvement. ORD conference. Brussels, Belgium.

Ehren MCM, Visscher (2006) Towards a theory on the impact of school inspections. British Journal of Educational Studies 54: 51–72.

Ehren MCM, Visscher (2008) The relationship between school inspections, school characteristics and school improvement. British Journal of Educational Studies 56: 205–227.

Ferguson, Earley, Ouston, et al. (1999) New heads, OFSTED inspections and the prospects for school improvement. Educational Research 41: 241–249.

Fitz-Gibbon, Stephenson-Forster (1999) Is Ofsted helpful? An evaluation using social science criteria. In: Cullingford (ed) An Inspector Calls: Ofsted and its Effect on School Standards. London: Kogan Page, pp.97–118.

Gärtner, Hüsemann, Pant (2009) Wirkungen von Schulinspektion aus Sicht betroffener Schulleitungen. Die Brandenburger Schulleiterbefragung. Empirische Pädagogik 23: 1–18.

Goddard (2001) A multilevel analysis of the relationship between teacher and collective efficacy in urban schools. Teaching and Teacher Education 17(7): 807–818.

Gray, Gardner (1999) The impact of school inspections. Oxford Review of Education 25: 455–468.

Hosker, Robb (1998) Raising standards and raising morale: A case study of change. In: Earley (ed) School Improvement after Inspection? School and LEA Responses. London: Paul Chapman Publishing, pp.86–96.

Jeffrey, Woods (1998) Testing Teachers: The Effect of School Inspection on Primary Teachers. London: Falmer Press.

Kelchtermans (2007) Macropolitics caught up in micropolitics: The case of the policy on quality control in Flanders (Belgium). Journal of Education Policy 22: 471–491.

Klassen, Bong, Usher, et al. (2009) Exploring the validity of a teachers’ self-efficacy scale in five countries. Contemporary Educational Psychology 34: 67–76.

Kogan, Maden (1999) An evaluation of evaluators: The Ofsted system of school inspection. In: Cullingford (ed) An Inspector Calls: Ofsted and its Effect on School Standards. London: Kogan Page, pp.9–32.

Kunter, Frenzel, Nagy, et al. (2011) Teacher enthusiasm: Dimensionality and context specificity. Contemporary Educational Psychology 36: 289–301.

Learmonth (2000) Inspection. What’s in it for Schools? London/New York: Routledge Falmer.

Lowe (1998) Inspection and change in the classroom: Rhetoric and reality? In: Earley (ed) School Improvement after Inspection? School and LEA Responses. London: Paul Chapman Publishing, pp.97–109.

Macbeath (2006) School Inspection and Self-Evaluation. Working with the New Relationship. New York/London: Routledge.

Macnab (2004) Hearts, minds and external supervision of schools: Direction and development. Educational Review 56: 53–64.

Martin (2005) Evaluation, inspection and the improvement agenda: Contrasting fortunes in an era of evidence-based policy-making. Evaluation 11: 496–504.

Matthews, Sammons (2004) Improvement through Inspection. An Evaluation of the Impact of Ofsted’s Work. London: Ofsted.

Matthews, Smith (1995) OfSTED: Inspecting schools and improvement through inspection. Cambridge Journal of Education 25: 23–34.

McCrone, Rudd, Blenkinsop, et al. (2007) Evaluation of the Impact of Section 5 Inspections. Slough: National Foundation for Educational Research.

McNamara, O’Hara (2006) Workable compromise or pointless exercise? School-based evaluation in the Irish context. Educational Management Administration & Leadership 34: 564–582.

Murphy, Davidshofer (1988) Psychological Testing: Principles and Applications. Englewood Cliffs, New Jersey: Prentice-Hall.

Nicolaidou, Ainscow (2005) Understanding failing schools: Perspectives from the inside. School Effectiveness and School Improvement 16: 229–248.

OECD (2013) Synergies for better learning. An international perspective on evaluation and assessment. OECD Reviews of Evaluation and Assessment in Education. Paris: OECD.

Onderwijsinspectie (2013) Onderwijsspiegel 2013 [Education Mirror]. Brussel: Onderwijsinspectie / Vlaams Ministerie van Onderwijs en Vorming, 114.

Ouston, Davies (1998) OFSTED and afterwards? Schools’ responses to inspection. In: Earley (ed) School Improvement after Inspection? School and LEA Responses. London: Paul Chapman Publishing, pp.13–24.

Perryman (2006) Panoptic performativity and school inspection regimes: Disciplinary mechanisms and life under special measures. Journal of Education Policy 21: 147–161.

Perryman (2009) Inspection and the fabrication of professional and performative processes. Journal of Education Policy 24: 611–631.

Plowright (2007) Self-evaluation and Ofsted inspection. Developing an integrative model of school improvement. Educational Management Administration & Leadership 35: 373–393.

Rossi, Lipsey, Freeman HE (1999) Evaluation: A Systematic Approach. Thousand Oaks, CA: Sage Publications, Inc.

Sandbrook (1996) Making Sense of Primary Inspection. Buckingham, UK/Bristol, USA: Open University Press.

Scanlon (1999) The Impact of Ofsted Inspections. Slough, UK: National Foundation for Educational Research.

Standaert (2001) Inspectorates of Education in Europe. A Critical Analysis. Leuven, Belgium: Acco.

Stoll, Fink (1996) Changing our Schools. Buckingham, UK: Open University Press.

Thoonen EEJ, Sleegers PJC, Peetsma TTD, Oort (2011) Can teachers motivate students to learn? Educational Studies 37(3): 345–360.

Tschannen-Moran, Hoy (2001) Teacher efficacy: Capturing an elusive construct. Teaching and Teacher Education 17(7): 783–805.

Van Bruggen (2010) Inspectorates of Education in Europe; some Comparative Remarks about their Tasks and Work. Brussels: SICI.

Vanotterdijk (2008) (Gedifferentieerd) doorlichten: leren dansen op een slappe koord [Inspection (differentiated approach): learning to show one’s paces]. Tijdschrift voor Onderwijsrecht en Onderwijsbeleid 2007-08: 291–314.

Visscher (2002) A framework for studying school performance feedback systems. In: Visscher, Coe (eds) School Improvement through Performance Feedback. Lisse, The Netherlands: Swets & Zeitlinger.

Vlaamse Overheid (2013) Vlaams Onderwijs in Cijfers 2012–2013 [Flemish Education in Figures 2012–2013]. Available at: http://www.ond.vlaanderen.be/onderwijsstatistieken

Wilcox, Gray (1995) Reactions to inspection: A study of three variants. In: Gray, Wilcox (eds) Good School, Bad School: Evaluating Performance and Encouraging Improvement. Buckingham, UK/Philadelphia, USA: Open University Press, pp.149–166.

Wilcox, Gray (1996) Inspecting Schools: Holding Schools to Account and Helping Schools to Improve. Buckingham, UK: Open University Press.

Woods, Jeffrey (1998) Choosing positions: Living the contradictions of OFSTED. British Journal of Sociology of Education 19: 547–570.

Yeung SYS (2012) A school evaluation policy with a dual character: Evaluating the school evaluation policy in Hong Kong from the perspective of curriculum leaders. Educational Management Administration & Leadership 40: 37–68.