Abstract
The research explores the views of teachers about how their teaching is evaluated by others. The tensions between evaluations motivated by the drive to improve practice (school self-evaluation) and evaluation related to external accountability (external evaluation – inspection) are considered, linked to findings and ideas reported in the literature. The study was undertaken using interviews (which included reflection on critical incidents during inspection) and incorporated the use of drawings as a research tool. Much of the data gathering and analysis was undertaken by five Third-Year undergraduate Education Studies students working under the direction and tutelage of the author (Hopkins). The findings validated those reported in the literature about the negative experiences of external evaluation (inspection) and pointed towards ways in which these might be reduced. The use of drawings alongside semistructured interviews proved to be a particularly powerful means of eliciting teachers’ thinking and feeling. The involvement of undergraduates as co-researchers provided them with a rich and authentic opportunity to gain insights into the professional world of teachers which they were preparing to join.
What does the literature say about evaluating teachers and teaching?
According to Glatterhorn (2008), teacher evaluation can have two levels, the individual and the organisational, and two purposes, improvement and accountability. Varnava (2006) identifies that teacher evaluation usually takes place within a political context which frequently gives rise to tensions between the various participants as to these levels and purposes.
In relation to accountability, Glatterhorn (2008) points to the link with administrative decisions relating to individual teachers such as tenure, promotion and contract renewal. Administrators see the main purpose of teacher evaluations as one of accountability in which the main function is to control the quality of educational resources, to ensure teacher quality by removing weak or poor teachers from the system and rewarding outstanding practitioners.
In relation to improvement, Danielson and McGreal (2000) see the final goal for teacher evaluation as being the development of the educational process through programmes of professional development. In general, teachers and their representative institutions, for example, teacher unions and professional associations, see the main purpose of teacher evaluations as being part of professional development. School and teacher improvement through retraining is seen as the key focus of evaluation activity, and the purpose of the evaluation is to make decisions about the appropriate training required (Galton, 2000).
Varnava (2006, p. 3) draws attention to the links between teacher evaluation and the wider debate about the way to promote educational change. For example, the ‘professional approach’ emphasises ‘collegiality’, self-evaluation and critical reflection, while the approach adopted by administrators is the more technical one of prescribing curriculum content and teaching methods.
The research reported here explores the views of teachers about how they and their teaching are evaluated (at the individual and organisational levels) and about the impact of the tensions between the accountability and improvement drivers. Insights into ways in which these tensions might be reduced are sought.
Self-evaluation and external evaluation (inspection)
Power (1994) notes that audits, of which external evaluation through inspection and self-evaluation are one part, do not passively monitor performance but shape the standards of this performance in crucial ways and are both therefore potentially powerful tools to drive improvement. MacBeath (2004b) and Stanley and Patrick (1998) cited by Whitby (2010) classify quality assurance systems into ‘self-regulating’, ‘externally regulated’ or ‘a mixture of the two’, according to whether the process is regulated by the school itself, imposed by an external agency or is a combination of the two. In terms of methodologies, Wilcox and Gray (1996) point out that external evaluation through inspection has some of the characteristics of positivist styles of evaluation: use of quantitative methods, the quantification of data, explicit criteria and the like. On the other hand, it also draws on some of the practices and assumptions which reflect the interpretative and naturalistic traditions of self-evaluation while not necessarily acknowledging that this is the case.
Self-evaluation is a priority for most economically advanced countries in the world (MacBeath, 2006a). In England, it is seen by government as being a repeated and continuous process, embedded in school culture, and as a highly effective means for a school to consolidate and secure improvement across a full range of its activities and therefore as central to what outstanding schools do.
External evaluation through inspection is part of the increased accountability culture in English schools (Chitty, 2004; Gleeson & Husband, 2001). There has been a clear shift in accountability in teaching since the 1988 Education Reform Act, from teacher professionalism, with accountability to themselves, their colleagues and their students (self-regulation), to accountability to external agencies including the Office for Standards in Education, Children’s Services and Skills (Ofsted), a non-ministerial department of the UK government.
Provision for the inspection of schools by teams of inspectors, and for direct reports to schools, parents and government, was made in the Education (Schools) Act 1992. While Ofsted inspects all schools, the frequency/time frame is dependent, in the main, on the externally published test/examination results of the school. An inspection is triggered where the results are poor and/or when the trend, over a number of years, is downwards.
Ofsted inspection is a high-stakes process for a school. The outcome of the inspection can have a significant impact on the school’s reputation in the community, the level of external intervention the school is subject to, and further Ofsted scrutiny, including repeat inspection before the usual time frame.
The ERO Review Committee in 2000 (cited by Whitby, 2010) view self-evaluation and external regulation through inspection as being ‘complementary’ activities in quality assurance systems, self-evaluation being essentially formative in nature, while an external inspection can provide both a formative and a summative focus.
Within the Ofsted external evaluation inspection process, self-evaluation has a role to play. The self-evaluation component of the inspection process has had the potential to powerfully influence the behaviour of teachers. In some cases, it has been a dutiful and strategic response to the demand of Ofsted (Plowright, 2007, p. 374). However, MacBeath (2006b) believes that the Self Evaluation Form (SEF) has been instrumental in helping school leaders to think about quality effectiveness and the nature of evidence. In terms of the power relationship between the two ‘complementary’ processes, MacBeath (2006b) draws attention to the fact that: While it may be assumed . . . that the purpose of the new inspection is to validate the school’s own self-evaluation, Ofsted is quick to disabuse people of the notion. While self-evaluation is described as an integral element of the process, inspection will continue to arrive at their own overall assessment of the effectiveness and efficiency of the school . . . there is no pretence that this is an equal relationship. (p. 213)
The tension between self-evaluation and external evaluation through inspection can result in undesirable side effects. For example, there is a documented risk that self-evaluations are written for the inspectors only and no longer serve the goal of improving education (Plowright, 2007), and that this imbalance causes negative perceptions of self-evaluation systems and strategies, particularly among teachers (De Grauwe & Naidoo, 2004).
What has been the impact of external evaluation through inspection by Ofsted?
Impact on school performance
Research into the effects of school inspections presents a mixed picture. Whitby (2010) reports that there is surprisingly little proof of the relationship between inspection and school improvement. Rosenthal (2004) found that there was no gain after an Ofsted visit and that there was a slight decline in student achievement levels in the year of the visit. J. Gray and Wilcox (1995), Earley (1998), and Kogan and Maden (1999) all indicate that inspection generally brings about little improvement in the quality of teaching and learning. Hopkins, Harris, Watling, and Beresford (1999) noted that the occasional character of teacher inspection does not contribute to the improvement of the quality of the education provided. Ouston and Davies (1998) researched 55 schools which had been inspected between 1993 and 1996 and found that the evidence that inspection had led to change was inconclusive. Cullingford and Daniels (1999) modelled changes to 426 schools’ GCSE performances over the 4 years in which they were inspected; they concluded that in the year they were inspected, a school’s GCSE results would improve less than in the years they were not.
More positively, Matthews and Sammons (2004) found that inspection evidence and trends in education standards measured by national tests and examinations showed improved quality, especially across the weakest institutions. More recently, Allen and Burgess (2012) also provide evidence that failing an Ofsted inspection can have a positive impact on subsequent performance and an immediate and real improvement in teaching. Together with a positive impact on pupil performance, their results suggest a quantitatively and statistically significant effect: a 10% improvement in performance 1 year after the inspection, significantly higher 2 years on and remaining at the enhanced level 4 years after the inspection. McCrone, Coghlan, Wade, and Rudd (2009) found that the inspection process was generally perceived by school leaders as a contributing factor to school improvement and an impetus for progress. Inspection was also generally perceived to have achieved a direct positive impact on school improvement in terms of assessment and, to some extent, quality of teaching, and to have contributed to attainment.
Perryman (2010) found that in relation to a school in Special Measures, the Ofsted inspection process was clearly linked to sustained improvement in Teaching and Learning, but only if Ofsted criteria were used to judge the success. Lessons became ‘good’ by following the Ofsted recipe for what is good – that is, the acceptance of the Ofsted discourse. However, Perryman warns that during an inspection, a school can become rehearsed to perform as a ‘good’ school: she refers to schools as being in the ‘gaze’.
Is the purpose of Ofsted external evaluation through inspection improvement or accountability? David Bell, a former Ofsted Chief Inspector, advised caution when suggesting that inspections automatically lead to improvements (MacBeath, 2006a). The Children, Schools and Families Committee Report (2010) makes it clear that while Ofsted has a duty to encourage improvement in schools, it does not have a remit to be an active participant in the improvement process aside from the occasional monitoring visits to verify progress. This perspective is confirmed in the most recent Ofsted documentation (Ofsted, 2015a).
Unintended negative impacts of external evaluation (inspection)
Personal impact: the emotional dimension
Emotions are important in teaching as they are in all professions in which performance plays such an important part (Goffman, 1959). Day and Leitch (2001) in their research into the effects of increasing accountability on teachers’ emotions found reforms imposed by a series of government policy decisions are continuing to challenge teachers’ ability to continue to provide the high levels of emotional consistency so necessary to good teaching.
A number of studies show that inspections can lead to teacher stress (e.g. Gray & Gardner, 1999; Leeuw, 2002). Stress becomes problematic when it leads to negative emotions. The European Commission (2000) cited by Perryman (2007) defines stress as: the emotional, behavioural and physiological reaction to aversive and noxious aspects of work, work environment and work organisations. It is a state characterised by high levels of arousal and distress and often by feelings of not coping. (p. 2)
Earlier, Cole and Walker (1998) found that an important source of stress for teachers is the feeling that they are not in control of the situation in which they have to operate.
Jeffrey and Woods (1996) contend that Ofsted inspections: . . . penetrate to the heart of teachers’ operations and mount a continual surveillance. The teacher’s self is brought under intensive and critical gaze. (p. 326)
Teachers do feel stressed and worried when the inspector sits in the classroom and evaluates them (Varnava, 2006). Perryman (2006) drawing on the work of Ball (2001) gets to the heart of some of the reasons for the stress that inspection, and any form of evaluation, causes: she comments about how performing within a particular discourse may lead to a sense of de-professionalisation as teachers feel they are performing in order to demonstrate their competence.
On a more positive note, Ofsted’s (2007) research into English schools removed from Special Measures indicates that ‘there are some fairly predictable reactions: relief, elation, recognition of success, euphoria, pride and delight at having all their work rewarded’.
Impacts on practice
Vass and Simmonds (2001) report that Ofsted is seen by some as having an ‘extremely negative impact on teachers and the teaching profession’. While many of these impacts are related to the negative emotional impacts referred to above, others are linked to behavioural changes related to classroom practice. MacBeath (2004a) writes of the fact that for a generation of teachers, the prospect of an Ofsted inspection has signalled time to set aside learning and engage in tactical manoeuvres designed simply to impress or disguise. In 2004, the notification of a pending inspection was considerably longer than the current day-before phone call. Previous inspection regimes allowed for up to 3 months or more for senior leaders to obsess about an impending inspection, resulting in an increase in teacher stress levels as they completed additional paperwork perceived as vital to the inspection process. A number of studies show that inspection can lead to ‘window dressing’ and being afraid to innovate because of the fear that this will conflict with the inspection criteria (Gray & Gardner, 1999; Leeuw, 2002). Park (2013) argues that the current system of external evaluation through Ofsted inspections has proved profoundly toxic, damaging trust between staff, pupils, parents and policy makers and leading to adverse outcomes for students.
The focus, design and findings of the research
Focus
The research reported here was designed to elicit evidence of teachers’ views and realities about the following focus areas which were derived from the review of the literature:
The tensions between evaluations (self-evaluation and inspection) for improvement purposes and for accountability purposes and how these might be reduced;
The impact that external evaluation (inspection) has on a school culture of continuous self-improvement and how any negative impacts might be reduced;
The impact that external evaluation has on teachers and their work and how any negative impact might be alleviated.
Design
The research was conducted by the author and five Third-Year undergraduates as part of their final module of a BA Honours Education Studies degree at Bishop Grosseteste University, Lincoln. The undergraduates acted as co-researchers and played a full part in the gathering and interrogation of the research data. Teachers from 25 primary and secondary schools were interviewed using semistructured interviews with key questions and associated probe questions designed to elicit their thoughts and feelings linked to the focus areas. In addition, the teachers were asked to recall a ‘critical incident’ that took place during an inspection they had experienced. They were asked to draw a representation of the incident before or at the same time as talking to the interviewer about their thoughts and feelings about the incident and why it was seen as significant. The researchers used this conversation as a vehicle for deepening the interviewees’ reflections about the focus areas.
An additional dimension of the research methodology was a requirement for the participating teachers to ‘draw an inspector’ and to talk to the interviewer about their drawing.
The inclusion of the ‘critical incident’ reflection and the drawing tasks (of the incident, and of an inspector) was an attempt to gain access to the thoughts and feelings of the teachers at a deep rather than a surface level.
The collection of ‘critical incident’ data through interviewing has its roots in the seminal work of Flanagan (1954) and is considered to be a helpful way ‘to gain an understanding of an incident from the perspective of the individual, taking into account cognitive, affective and behavioural elements’ (Chell, 2004, p. 48).
The use of drawing as a research tool has been explained in detail by Theron, Mitchell, and Smith (2011) and used to great effect by researchers such as Guillemin (2004) and Literat (2013). Guillemin (2004) argues that drawings offer a rich and useful research method to explore how people make sense of their world.
The interviews were electronically recorded and later transcribed. Content analysis (Patton, 1990) was used as a method to analyse the transcripts. Content analysis is a generic term for a variety of means of textual analysis that involve comparing and categorising a corpus of data (Schwandt, 2001).
In essence, the process was as described by Collins (2001): . . . an iterative process of looking back and forth, developing ideas, and testing them against the data, revising ideas, building a framework, seeing it break under the weight of evidence, and re-building it again. That process was repeated over and over, until everything hung together in a coherent framework of concepts. (p. 11)
The content analysis was informed by thematic analysis as described by Strauss (1987) and by ‘multiple lens’ analysis (McCormack, 2000).
Findings
The power of the methodology
The use of the research tools resulted in extensive and deep conversations between the researchers and teachers about their experiences of inspection. The use of drawing, of the chosen ‘critical incident’ and of the inspector, in particular, proved to be a rich and insightful research method to explore how the teachers made sense of their world. The analysis of the drawn images, complemented by the discussion of these drawings in the context of their production, resulted in a more nuanced depiction of the concepts and emotions in an ‘expressive, empowering personally relevant manner’ (Literat, 2013, online) and added significantly to the process of moving from transcript to the interpretative story being told by the teachers collectively. Through the process of drawing and associated conversations with the researchers, the participants provided a greater articulation and understanding of their experiences than the questions forming part of their semistructured interviews provided: for example, while the questioning produced fairly routine responses lacking in any emotional content, the involvement of drawing, though approached hesitantly at first by some, frequently resulted in emotionally charged verbal responses accompanying drawings which were executed with energy and focus.
The analysis of the transcripts was shared across the researchers, the author and the student co-researchers, and the emerging themes and insights were discussed and verified so that a commonality emerged. McCormack (2000) draws attention to the importance of ‘active listening’ when interpreting transcripts so that the researcher can reconnect with the story teller, the story, and his or her reactions to both of these. Discussion of the transcript analysis held between the researcher and the student co-researchers focused on the connectivity between the researchers and the teachers, and the critical importance of researchers thinking about how and where their own assumptions and views might affect the interpretations of the respondents’ words. This is also emphasised by McCormack (2000).
The transcript-based discussions involved the use of ‘multiple lenses’ perspectives, for example, the language used along with the narrative process (stories, description, argumentation, theorising), which are seen by McCormack (2000) as essentially the dimensions people use to give meaning to their lives.
The experience and views of teachers
The analysis of the data provided the following insights in relation to the focus of the research. In the main, the teachers felt that the focus of the inspectors was very much to get the ‘right grade’, that is, to ensure that what was happening within the school and the results it was producing matched Ofsted’s published descriptors linked to the ‘outstanding’, ‘good’, ‘requires improvement’ and ‘inadequate’ grade categories. The inspection team ‘getting it right’ was seen by the teachers as the focus, rather than ‘what insights can we pass onto the school to help them to get better’. The teachers revealed a sense of dependency on what the inspectors would identify as pointers for how the school could/should improve.
The majority of teachers saw self-evaluation as something that had to be done as it was required by the inspection process, the judgement of its quality being a significant indicator of the quality of the school. The current versions of the Ofsted inspection documentation (Ofsted, 2015b) confirm that ‘robust’ self-assessment is a feature of strong leadership and management. Self-evaluations are expected to be ‘part of the school’s business processes’. The documentation expresses no specific expectation regarding the format of reports, as was previously the case, and asserts that ‘self-evaluations should not be generated solely for inspection purposes’. However, teachers revealed their view that self-evaluation with any degree of rigour and with any significant time devoted to it, is an unlikely phenomenon in the absence of an external evaluation process that expects it.
The majority of teachers did not convey any impression that systematic self-evaluation was embedded within the culture of the schools. They did recognise, however, that professionally the schools had a ‘duty’ and responsibility to undertake thorough self-evaluation as part of the continuous quality improvement process but felt that in the current regime, Ofsted does the evaluations and that the Ofsted inspection process, responding to inspection findings or ‘getting ready for an Ofsted inspection’ is the school improvement process in the United Kingdom. This perspective resonates with the concerns expressed by MacBeath (2006b) about the unequal power relationships between Ofsted as the regulator and the schools as the regulated. The frequent changes to the inspection evaluation criteria (‘changing the goal posts’) were seen by the teachers as a means by which the power of the regulator is reinforced to the detriment of teachers, who are left with feelings of disempowerment and low morale.
Teachers were very clear that the inspection process triggered necessary improvement activity where aspects of the schools’ provision/performance were found wanting. There was a general feeling, however, that recognised weaknesses were not rigorously attended to by schools until they were identified by Ofsted during an inspection, or until the need to achieve a ‘good’ grade in a pending Ofsted inspection emerged. The comment ‘They didn’t pick up our issue with girls and maths so that’s one less thing to do’ is concerning yet symptomatic of the school responding to the demands of the Ofsted inspection report rather than to the identified needs of the learners.
Teachers’ stories validated the findings reported in the literature about the negative experiences of external evaluation (inspection). While the teachers acknowledged that some of the processes of self-evaluation could be stressful for some, they felt that generally these were conducted in a supportive way and were linked to professional growth and the improvement of the school. In contrast, the impact of inspection was talked about as ‘damaging emotionally and professionally’. In relation to pedagogy, several teachers commented that the Ofsted inspection regime made them reluctant to innovate and inclined to ‘teach’ in the way they assumed to be the ‘preferred’ pedagogical approach: ‘We are more or less directed by our headteacher to structure our lessons in the way that Ofsted want’. The emotional dimension was the key focus for the commentary from many of the teachers: ‘I don’t think I ate for three days, I didn’t sleep much either’. While some teachers recognised that ‘it gets less stressful the more experienced you are’, in general, teachers felt that the stress was based on the fact that ‘so much depends on it’ and it was ‘hard to keep a sense of perspective’. The findings support Ozga (2003) who warns that if teachers feel under pressure to demonstrate good performance, it may reduce trust, inhibit discussion of difficulties and diminish honest self-evaluation.
As to ways forward to address their concerns, teachers recognised that external evaluation (inspection) had a place, ‘an objective outside look in’, but felt that evaluation reports published on schools should incorporate a ‘multiple lens’ perspective which includes the school’s own self-evaluation findings, presented as complementary to the external evaluation findings. The teachers felt that this would help to equalise the power balance between the schools and the regulator and could ensure that self-evaluation processes were undertaken robustly and that these in turn would become part of the embedded self-improvement culture within schools. These views support Park (2013), who in the DEMOS publication Detoxifying School Accountability proposes an alternative model to the current one of external evaluation through inspection with a component of self-evaluation. Park advocates a model which is built around multi-perspective inspection; such a model would value the opinions of leaders, staff, students, parents and inspectors about a school’s performance, instead of allowing the judgements of one group to prevail against others.
The teachers were clear that the above approach would reduce stress and help them to perform optimally. They felt that reducing the stakes of external evaluation would be professionally empowering and would support honest self-evaluation focused on ‘getting it right for the children’ and not just on getting the right and hopefully ‘good’ outcome for the school.
The student co-researchers reported that they had enjoyed the experience of being involved in a real research project and, in particular, had gained significantly from the process of gathering data through their conversations with the teachers. They reported that from their perspective, as prospective teachers themselves, the insights into the professional world of schools and teachers and into the professional issues around accountability and school improvement were invaluable and unsurpassed by any other opportunities the degree course had provided for them. The opportunity to discuss and debate the emerging insights from the data as the interpretative stories emerged from the transcripts was also commented upon as very positive. The co-researchers felt empowered through the authenticity of the research process, which, they said, would be a lasting memory of their time studying for the degree. They saw themselves as contributing to knowledge which would move beyond the university, suggesting that the power of using undergraduates as co-researchers should not be underestimated.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
