Teacher Voice

Abstract

The purpose of this study was to determine how K-12 public school teachers perceive the use of student performance data in teacher evaluations. The propriety, utility, feasibility, and accuracy standards created by the Joint Committee on Standards for Educational Evaluation served as a framework for the study. An online survey was deployed to a stratified random sample of teachers across the United States. Descriptive statistics and analysis of variance were used to determine the level of teacher agreement on statements.

Keywords

teacher evaluation, leadership, professional growth

Education systems over the past three decades have seen a tremendous shift in the expectations for student achievement. This shift includes more accountability of America’s teachers. One of the primary reasons for greater accountability of teachers is the growing empirical research that links teacher performance with student achievement (Darling-Hammond, 2000; Stronge, Ward, Tucker, & Hindman, 2008). One of the seminal studies that laid the groundwork for this belief was a 1997 study involving thousands of students that reported that the most important factor affecting student learning is the teacher (Wright, Horn, & Sanders, 1997). The researchers concluded that more can be done to improve education by improving the effectiveness of teachers than by any other single factor (Wright et al., 1997). Researchers found that even when compared against a multitude of other variables, the impact of the teacher trumps all other aspects. Tucker and Stronge (2005) described the relationship between a high-performing teacher and student performance saying, “We now know empirically that these effective teachers have a direct influence in enhancing student achievement” (p. 2). With this information, it was only logical to include student performance data as one component of a teacher’s evaluation.

Revamping Teacher Evaluations

Teacher evaluations have experienced an unprecedented level of analysis and scrutiny in recent years. The federal government’s Race to the Top initiative prompted policy makers at the national, state, and local levels to develop stronger evaluation programs that more accurately identify effective teachers and, subsequently, improve student achievement. A wealth of research demonstrates that the single most important factor in a student’s level of academic achievement is the effectiveness of the student’s teacher (Hattie, 2009; Heck, 2009; Rothstein, 2010; Stronge, Ward, & Grant, 2011). Sanders and Rivers (1996) also noted that teacher effects on student academic gains can be seen as both cumulative and residual. These results were confirmed in a study of Dallas public schools that showed teacher effects on longitudinal student achievement. In one particular study, it was determined that student data can efficiently identify teachers whose effect is clearly detrimental or clearly beneficial to students (Jordan, Mendro, & Weerasinghe, 1997). “The core of education is teaching and learning, and the teaching-learning connection works best when we have effective teachers working with every student every day” (Stronge, 2006, p. 1).

There is also research that cites the long-term benefits for students assigned to more effective teachers. Students with teachers who have high value-added scores are more likely to attend college, attend higher ranked colleges, and earn higher salaries (Chetty, Friedman, & Rockoff, 2011). Since the research clearly demonstrates that the quality of teaching matters, it is reasonable to presuppose that a quality teacher evaluation process also matters in order to know if the school system possesses high-quality teachers (Stronge & Tucker, 2003).

With the passage of the Every Student Succeeds Act (ESSA), there are increasing questions as to whether the new evaluation initiatives launched in recent years with a renewed focus on raising student achievement and strengthening instruction will continue. Will education leaders stay the course on teacher-evaluation reform or will leaders use the flexibility found in ESSA to return to more traditional evaluation systems? Before rushing to consider abandoning evaluation systems that make appropriate use of multiple measures, including student performance data, leaders should consider the perspectives of teachers who have participated in these redesigned evaluation systems. Teachers’ perspectives on evaluation systems that use student performance data can inform school leaders on how to responsibly integrate student performance data as a means to improve the accuracy, utility, propriety, and feasibility of evaluation programs. Teachers’ input should be a significant consideration in designing teaching evaluations to inform professional practice and accelerate student learning.

The Debate Over Student Performance Data in Evaluations

There are many outspoken critics of teacher evaluations that include student performance data. Emery and Ohanian (2004) reported that teachers were fearful of what harm or consequences would come to them as a result of test results interpreted incorrectly by principals or district officials. Teachers also complain that a singular focus on a one-time assessment minimizes the other dimensions of a child’s acquisition and demonstration of learning. Researchers have concluded that using standardized tests and value-added modeling is not appropriate as a primary measure for evaluating individual teachers (Braun, 2005; National Research Council Board on Testing and Assessment, 2009) and oftentimes fails to appropriately distinguish effective from ineffective teachers (Springer et al., 2010). Researchers have also cautioned that student assessment data in the form of value-added measures should be used only in low-stakes fashion when they accompany an integrated analysis of teachers’ practices (Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012).

Individuals and groups who are more optimistic toward the inclusion of student performance data in their evaluation point to the opportunities associated with this evaluation format and the limitations of traditional programs. In the 2008 report Rush to Judgment, Toch and Rothman offered a critical review of evaluations as being “superficial, capricious, and often don’t even directly address the quality of instruction, much less measure students’ learning” (p. 1). The researchers in The Widget Effect (Weisberg, Sexton, Mulhern, & Keeling, 2009) similarly criticized current evaluation systems as being “disrespectful to teachers” and indifferent to instructional effectiveness (p. 4). Teachers deserved and warranted a more objective evaluation of the results of their instruction. The availability of student performance data was now perceived as a viable mechanism to provide a more constructive evaluation for teachers. Student achievement data may actually facilitate learning and offer administrators and teachers a neutral and objective source of information that can launch constructive conversations between both parties (Baker et al., 2010). This type of constructive dialogue has the opportunity to significantly assist the teacher’s professional growth.

How Do Teachers Perceive Evaluations With Student Performance Data?

In order for states and school districts to realize the expected goals from adding student performance data to teacher evaluations, it is imperative for instructional leaders to understand how teachers perceive this change. Recognizing and responding to teachers’ perceptions of the use of student performance data in evaluations has the potential to be a powerful conduit of change. Teacher buy-in is a significant factor to consider and understand. Unfortunately, research on the teachers’ perception of the use of student performance data in their evaluations has been absent up to this point. For these reasons, the survey findings, although limited in number, still merit consideration.

There are a number of questions about teachers’ perceptions toward these new evaluations. How do teachers perceive evaluations that include student performance data? Do teachers who have participated in evaluation systems differ from their peers? Is there a fear of the unknown associated with reforming teacher evaluations? How can we use the feedback from teachers who have experience with student performance data to improve teacher evaluations and subsequent professional development?

A Study of Teacher Perceptions of Various Evaluations

To better understand how teachers perceive the use of student performance data in evaluations, stratified random sampling was used to identify a national sample of 5,000 teachers to participate in the study (Hopkins, 2013). Stratified random sampling provides an efficient means of selecting participants from a large accessible population and ensures that the participants equally represent elementary, middle, and high school teachers. The survey focused on the four categories of evaluation standards defined by the Joint Committee on Standards for Educational Evaluation: utility, feasibility, accuracy, and propriety. The study specifically sought to determine how teachers perceived the use of student performance data in teacher evaluations with respect to these four evaluation standards and whether certain demographic characteristics affected those perceptions.
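The sampling step can be illustrated with a minimal sketch. It assumes a hypothetical sampling frame of (teacher ID, school level) pairs and an equal allocation across the three school levels; the frame, the stratum sizes, and the function name are illustrative assumptions and are not taken from the study.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Draw a simple random sample of equal size from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        # random.sample draws without replacement within the stratum
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical frame: 3,000 teachers spread evenly over three school levels.
levels = ["elementary", "middle", "high"]
frame = [(teacher_id, levels[teacher_id % 3]) for teacher_id in range(3000)]

chosen = stratified_sample(frame, stratum_of=lambda t: t[1], per_stratum=100)
counts = {lvl: sum(1 for _, l in chosen if l == lvl) for lvl in levels}
```

Drawing within each stratum separately is what guarantees that every school level contributes the same number of invitations, regardless of its share of the frame.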

The 166 teachers in this study were closely reflective of the national distribution of teachers with respect to a number of important demographic considerations: level of school, years of experience, teaching a tested or nontested course, and employment under a collective bargaining agreement. Fifty-five percent of the teachers identified themselves as elementary teachers, while 45% were secondary teachers who taught Grades 6 to 12. National data reported that 51% of America’s teachers are at the elementary level, while 49% are at the secondary level (U.S. Department of Education, National Center for Education Statistics, 2013). In terms of the participants’ years of experience, again the study sample was very representative of the national average as reported by the U.S. Department of Education in 2013. Twenty percent of the participants had 0 to 4 years of teaching experience (as compared with a national average of 9% with 0-3 years of experience); 26% had between 5 and 10 years of teaching experience (as compared with a national average of 33% with 3-9 years of experience); and 54% had 11 or more years teaching experience (as compared with a national average of 57% having 10 or more years of teaching experience). The study sample had 41% of the participants teaching a grade or course that includes a state standardized assessment under the Elementary and Secondary Education Act to determine progress toward academic standards. The remaining 59% of the surveyed teachers reported that they do not teach a grade or course that is similarly tested. National research indicates that approximately 69% of teachers do not teach a tested subject or grade (Prince et al., 2009). There were 89 teachers (54% of the total sample) who responded that they taught under a collective bargaining agreement. The U.S. Department of Education documented that 50% of school districts in the United States operate under some form of a collective bargaining agreement (Gray, Bitterman, & Goldring, 2013).

It needs to be stated that the limited number of survey responses makes generalizing the findings difficult, and those responses may not be typical of the larger teacher population. Due to the relatively recent emergence of teacher evaluations that use student performance data, there is insufficient research to discover if teachers view student performance data as a constructive component in evaluating effective instruction. Since there is research that documents that educational reform programs with teacher support have greater opportunities for lasting success (Greene & Lee, 2006; Turnbull, 2002) and also evidence that attributes teacher buy-in and support as a factor in an educational reform’s success in meeting its intended outcomes (Gigante & Firestone, 2008), teachers’ perspectives on this important topic are essential.

The results from this study present invaluable feedback that may be influential in the discussion and design of future teacher evaluations now that new federal legislation (ESSA) provides states with greater flexibility. Once leaders learn how teachers who have actually participated in evaluation programs that use student performance data perceive those programs, they will be able to responsibly design meaningful evaluation programs: programs that use student performance data in combination with other measures and performance domains to support teachers.

Instrumentation

A survey was developed for this study based on the research and work conducted by Herman, Winters, and Golan (1989) on teachers’ perceptions of standardized testing and its impact on teachers and learners and Lessing and de Witt (2007) on teachers’ perceptions of the value of professional development. Herman and Golan’s survey instrument was adapted with written permission from the researchers and through the National Center for Research on Evaluation, Standards, and Student Testing by the Regents of the University of California as supported under the Institute of Education Sciences, U.S. Department of Education. Lessing’s survey was adapted with written permission from the author. The survey instrument contains 38 forced-choice items. Each item includes a 4-point Likert-type scale where respondents were asked to identify if they strongly disagreed, disagreed, agreed, or strongly agreed with the statement. The survey concludes with participants responding to two open-ended questions and six demographic questions related to the research questions. Demographic questions pertain to the participant’s years of experience, location of current employment, union membership status, whether they teach a tested or nontested grade or course, and whether they teach at a high school, middle school, or elementary school.

The survey instrument was field tested by a panel of doctoral students and then submitted to a panel of experts in educational research. Twelve doctoral students at the College of William & Mary with varying levels of teaching, administrative, and other education-related experience participated in the first field test. The panel of experts was composed of four highly qualified experts in the educational research field. In both the field test and the expert panel review of the survey, participants reviewed the statements, directions, and format of the survey. Both groups were also testing the survey to ensure that the statements in the survey included content relevant to the study and research questions.

The panel of four research experts was presented a revised survey and a report of feedback from the doctoral students for their consideration and review. For the purposes of this study, an expert is defined as an individual with extraordinary insight into the population and/or subject beyond what a member of the population under study or participant in the phenomenon being investigated might have. The four expert reviewers all have a doctoral degree in Educational Policy and Leadership and possess considerable experience in the design, implementation, and review of scholarly research. This expert review helped determine the credibility, confirmability, and dependability of the survey. Recommendations and changes indicated by the reviewers were incorporated into the final survey. Research in the development of valid and reliable surveys documents that expert reviewers have the ability to improve surveys by providing input on the content of the questionnaire, importance and meaningfulness of question areas to research aims, and wording and terminology of items (Dillman, 2002).

The Findings

One of the most interesting results from this study was learning how teachers who had experience with an evaluation system that included student performance data differed from their peers in all four areas. As compared with their peers who had no actual experience with an evaluation system that included student performance data, these teachers reported student performance data actually improved the quality of their evaluations and improved the subsequent feedback that informs their professional growth. Teachers who had at least 1 year working in a setting that uses student performance data cited how they found that this type of evaluation program truly identified outstanding educators and was conducive to more meaningful conversations and professional development focused on student achievement (Hopkins, 2013). Armed with this new information, school leaders may use the feedback from teachers to responsibly integrate student performance data as one of multiple measures into a more accurate evaluation program. Principals may also be able to use the findings to proactively address concerns from teachers that proved to be unsubstantiated or exaggerated.

Perceptions of Teachers to the Propriety of Evaluations

The propriety standard demonstrates whether the rights of the individuals affected by an evaluation are protected. It specifically determines whether the evaluation system is conducted ethically, legally, and with regard for the personal welfare of the individuals involved in the evaluation (Joint Committee on Standards for Educational Evaluation, 2009). The feasibility standards are intended to increase evaluation effectiveness and efficiency (Yarbrough, Shulha, Hopson, & Caruthers, 2011). Effective evaluation programs in schools, for example, are not disruptive to the learning environment.

Ms. Ramirez is a relatively new fifth grade math teacher with five years of experience in the classroom. Her students take an end-of-year standardized assessment and their results represent a significant component in her evaluation. She reports that student performance data was a “scary proposition” for her and her colleagues when they first learned that test scores would account for 1/3 of their evaluation. “What we discovered, however, was having this data in our evaluations made the process more equitable. We were evaluated on how we aligned instruction to the standards which is one of the most important parts of our job. Since 2/3 of our evaluation examined other variables, it [student performance data] was an appropriate and fair consideration of our performance in the classroom.”

Survey participants were largely homogenous in their responses to questions associated with the propriety standard. The grand mean for the 166 participants for questions related to the propriety standard was 2.95. Since a value of 3 indicated disagreement, the grand mean was extremely close to disagreement. Participants primarily selected disagree (corresponding to a value of 3) followed by agree (corresponding to a value of 2). The low standard deviation indicated that there were very few outliers in the study who responded with strongly agree (a value of 1) or strongly disagree (a value of 4). Cronbach’s alpha for the questions related to the propriety standard was .787, indicating acceptable reliability.
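For readers unfamiliar with the reliability statistic, the following is a minimal sketch of how Cronbach’s alpha and a grand mean are typically computed from item-level responses. The response matrix is hypothetical and exists only to make the sketch runnable; the study’s raw data are not reproduced here, and the original analysis was run in SPSS rather than with this code.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                       # number of items in the scale
    items = list(zip(*rows))               # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses, coded as in the text:
# 1 = strongly agree, 2 = agree, 3 = disagree, 4 = strongly disagree.
responses = [
    [3, 3, 2, 3],
    [2, 2, 2, 3],
    [4, 3, 3, 4],
    [3, 2, 3, 3],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
n_scores = len(responses) * len(responses[0])
grand_mean = sum(sum(r) for r in responses) / n_scores
```

Alpha rises toward 1 as the items vary together, which is why it is read as an index of internal consistency for a set of questions tied to one standard.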

An increase in heterogeneity in the responses occurred when comparing responses of teachers with varying levels of experience with an evaluation program that utilizes student performance data. For example, the responses from teachers with more than 1 year of experience in an evaluation program using student performance data (M = 2.49) indicated they were more inclined to agree that including student performance data improved the propriety standard as compared with the teachers with no experience (M = 3.14). Table 1 provides information about the grand means and standard deviation for propriety standard questions by participant’s experience with an evaluation program that utilizes student performance data.

Table 1.

Mean and Standard Deviation for Propriety Standard Questions by Participant’s Experience With an Evaluation Program That Utilizes Student Performance Data.

Experience with evaluation program	n (total N = 166)	M	SD
No experience with evaluation program	90	3.14	0.184
First year of evaluation program	42	2.91	0.466
More than 1 year of evaluation program	34	2.49	0.233
Total participants	166	2.95	0.569
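As a consistency check on Table 1, the grand mean can be recovered as the sample-size-weighted average of the three group means. The sketch below uses only the n and M values reported in the table; the variable names are illustrative.

```python
# (n, M) pairs copied from Table 1.
groups = {
    "No experience": (90, 3.14),
    "First year": (42, 2.91),
    "More than 1 year": (34, 2.49),
}

total_n = sum(n for n, _ in groups.values())
# Weight each group mean by its share of the sample.
grand_mean = sum(n * m for n, m in groups.values()) / total_n

print(total_n, round(grand_mean, 2))  # prints: 166 2.95
```

The weighted average agrees with the reported grand mean of 2.95, which reflects the fact that more than half of the respondents fall in the no-experience group.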

Data were further analyzed to determine whether a teacher’s experience with an evaluation program that utilizes student performance data accounted for significant differences within groups for the propriety standard. ANOVA tests were run using SPSS with the significance level set at p < .05. Table 2 documents how teachers’ perceptions did significantly differ based on the teacher’s experience with an evaluation program that utilizes student performance data, F(2, 163) = 19.426, p = .001. Furthermore, Cohen’s effect size value (η² = .192) suggests this effect size to be of high practical significance.

Table 2.

Propriety Standard ANOVA by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Evaluation standard	Sum of squares	df	Mean square	F	Sig.
Propriety standard
Between groups	10.293	2	5.146	19.426	.001
Within groups	43.183	163	0.265
Total	53.476	165

Note. ANOVA = analysis of variance; df = degrees of freedom.
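The F ratio and the eta-squared effect size follow directly from the sums of squares and degrees of freedom in Table 2. The sketch below recomputes them from the tabled values; the function name is an illustrative assumption, and the computation is the standard one-way ANOVA arithmetic rather than a reproduction of the SPSS output.

```python
def anova_summary(ss_between, ss_within, df_between, df_within):
    """Return mean squares, the F ratio, and eta squared for a one-way ANOVA."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within
    # Eta squared: share of total variability attributable to group membership.
    eta_squared = ss_between / (ss_between + ss_within)
    return ms_between, ms_within, f_ratio, eta_squared

# Values copied from Table 2 (propriety standard).
ms_b, ms_w, f_ratio, eta_squared = anova_summary(10.293, 43.183, 2, 163)
```

With these inputs the F ratio matches the tabled 19.426 to three decimals and eta squared rounds to .192, above the .14 threshold commonly cited from Cohen (1988) for a large effect.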

Tukey post hoc analysis revealed that this difference was attributable to teachers with more than 1 year of experience with an evaluation program that utilizes student performance data responding significantly more favorably to propriety standard questions than did teachers in their first year of such a program (p = .002) and teachers with no experience (p = .001). The difference between teachers in their first year with such a program and teachers with no experience did not reach significance at the .05 level (p = .053). Table 3 shows the post hoc results.

Table 3.

Tukey Post Hoc Analysis for Propriety Standard by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Exp (I)	Exp (J)	Mean difference (I−J)	SE	Sig.	95% CI lower bound	95% CI upper bound
1 Year	First	−.41807*	.11874	.002	−.6989	−.1372
	None	−.64346*	.10361	.001	−.8885	−.3984
First	1 Year	.41807*	.11874	.002	.1372	.6989
	None	−.22540	.09618	.053	−.4529	.0021
None	1 Year	.64346*	.10361	.001	.3984	.8885
	First	.22540	.09618	.053	−.0021	.4529

Note. SE = standard error.

*The mean difference is significant at the .05 level.
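One way to read Table 3 is that a pairwise comparison is significant at the .05 level exactly when its 95% confidence interval excludes zero. The sketch below checks that reading against the three distinct comparisons in the table; the values are copied from Table 3, and the helper name is an illustrative assumption.

```python
# (group I, group J, mean difference, Sig., CI lower, CI upper) from Table 3.
rows = [
    ("1 Year", "First", -0.41807, 0.002, -0.6989, -0.1372),
    ("1 Year", "None", -0.64346, 0.001, -0.8885, -0.3984),
    ("First", "None", -0.22540, 0.053, -0.4529, 0.0021),
]

def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

significant_pairs = []
for group_i, group_j, diff, p_value, lower, upper in rows:
    # Each interval should be centered on the reported mean difference.
    assert abs((lower + upper) / 2 - diff) < 1e-3
    # The interval and the p-value should agree on significance at .05.
    assert excludes_zero(lower, upper) == (p_value < 0.05)
    if excludes_zero(lower, upper):
        significant_pairs.append((group_i, group_j))
```

Under this check, the two comparisons involving teachers with more than 1 year of experience are flagged, while the first-year versus no-experience interval narrowly includes zero.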

Teachers who had reported experience of working in a school district using an evaluation process that used student performance data reported that the inclusion of student performance data improved their overall evaluation. Teachers noted that using student performance data would benefit evaluations because these data “identifies good teachers” and improves the overall evaluation process by making the program “more objective” and “removing evaluator bias.” The teachers who had not yet participated in a program that uses student performance data feared that student performance data would become the “sole source of evaluation data” which would not be especially fair to teachers in schools that have historically poor academic results or teach students who have historically scored low on assessments. These fears did not prove to be realized by teachers who had student performance data as a component in their evaluation.

Perceptions of Teachers to the Feasibility of Evaluations

The feasibility standards are intended to increase evaluation effectiveness and efficiency (Yarbrough et al., 2011). Effective evaluation programs in schools, for example, are not disruptive to the learning environment.

Ms. Schmidt is a first-year seventh grade math teacher. Her students take an end-of-year standardized assessment and their results represent a significant component in her evaluation. She reports that she learned student performance data was going to be included in her evaluation as a means for “reflection” and not “formal evaluation.” Since she was new to the profession she was not able to compare this process to any other evaluation systems. “The use of student data seemed to make sense because it [student data] was part of ‘what we do in schools.’” She went on to comment that the collection of data was “part of our school culture and it made sense to include it informally in our professional reflections.”

Survey participants were largely homogenous in their responses to questions associated with the feasibility standard. The grand mean for the 166 participants for questions related to the feasibility standard was 2.45. This value indicates that participants were leaning slightly toward agreement with most respondents selecting agree (corresponding to a 2) followed closely by the selection of disagree (corresponding to a value of 3). The low standard deviation indicated that there were very few outliers in the study who responded with strongly agree (a value of 1) or strongly disagree (a value of 4). This level of homogeneity extended when comparing teachers with varying levels of experience using an evaluation program that utilizes student performance data. Table 4 provides information about the grand means and standard deviation for feasibility standard questions and the mean and standard deviation for participants broken down by varying levels of experience with an evaluation program that utilizes student performance data.

Table 4.

Mean and Standard Deviation for Feasibility Standard Questions by Participant’s Experience With an Evaluation Program That Utilizes Student Performance Data.

Experience with evaluation program	n (total N = 166)	M	SD
No experience with evaluation program	90	2.47	0.341
First year of evaluation program	42	2.43	0.302
More than 1 year of evaluation program	34	2.42	0.342
Total participants	166	2.45	0.785

Data were further analyzed to determine whether a teacher’s experience with an evaluation program that utilizes student performance data accounted for significant differences within groups for the feasibility standard. ANOVA tests were run using SPSS with the significance level set at p < .05. Table 5 documents how teachers’ perceptions did not significantly differ based on the teacher’s experience with an evaluation program that utilizes student performance data. The reason for similar responses may have resulted from the fact that only four questions on the survey were associated with the feasibility standard.

Table 5.

Feasibility Standard ANOVA by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Evaluation standard	Sum of squares	df	Mean square	F	Sig.
Feasibility standard
Between groups	.075	2	0.038	.155	.856
Within groups	39.651	163	0.243
Total	39.726	165

Note. ANOVA = analysis of variance; df = degrees of freedom.

Perceptions of Teachers to the Utility of Evaluations

The utility standards are intended to increase the extent to which program stakeholders find evaluation processes and products valuable in meeting their needs (Yarbrough et al., 2011). These standards focus on the need for evaluator credibility, relevant and meaningful information and processes in the evaluation, timely and appropriate communication and reporting of evaluation findings, and concern for the consequences and influence of the evaluation. The goal for the utility standards is to “increase the likelihood that the evaluation will have positive consequences and substantial influence, as needs and opportunities appear over the course of the evaluation” (Yarbrough et al., 2011, p. 8).

Mr. Jollay is an experienced teacher with over two decades of classroom experience. He teaches a middle school English course that includes a state standardized assessment. Therefore, his students’ performance on this test is included in his evaluation. “For the first time in I don’t know how long . . . if ever . . . I had an evaluation that highlighted how I can improve my teaching. The student data made my post-observation conference mean something. My principal and I used the data to develop some specific goals on how I could improve working with the strongest students in my classroom to make sure I truly differentiated instruction.” Mr. Jollay went on to comment that many of his peers felt the same way. “Our principal used the data to FINALLY help us grow collaboratively enhancing our school’s overall professional development. I believe the data also helps our principal become a stronger supervisor. Evaluations with this [data] now make professional development meaningful.”

The study’s results indicated that teachers were more favorable to how student performance data would improve the evaluation process with respect to the utility standard as compared with the other three evaluation standards. The teachers’ perceptions are aligned to research that suggested most evaluation programs did little to improve practice or instruction and can become “little more than a time-consuming charade” (Stronge & Tucker, 2003, p. 6). Teachers and administrators each perform their assigned role in the evaluation process and not surprisingly very few substantial changes in teaching and learning transpired (Weisberg et al., 2009).

Survey participants were largely homogenous in their responses to questions associated with the utility standard. The grand mean for the 166 participants for questions related to the utility standard was 2.43. This value indicates that participants were leaning slightly toward agreement with most respondents selecting agree (corresponding to a 2) followed closely by the selection of disagree (corresponding to a value of 3). The low standard deviation indicated that there were very few outliers in the study who responded with strongly agree (a value of 1) or strongly disagree (a value of 4). An increase in heterogeneity in the responses occurred when comparing responses of teachers with varying levels of experience with an evaluation program that utilizes student performance data. Teachers with more than 1 year of experience in an evaluation program using student performance data reported much stronger agreement (M = 2.09) that this type of evaluation program improved the utility of the evaluation than teachers with no experience (M = 2.61). Table 6 provides information about the grand means and standard deviation for utility standard questions by participant’s experience with an evaluation program that utilizes student performance data.

Table 6.

Mean and Standard Deviation for Utility Standard Questions by Participant’s Experience With an Evaluation Program That Utilizes Student Performance Data.

Experience with evaluation program	n (total N = 166)	M	SD
No experience with evaluation program	90	2.61	0.294
First year of evaluation program	42	2.28	0.209
More than 1 year of evaluation program	34	2.09	0.289
Total participants	166	2.43	0.785

Data were further analyzed to determine whether a teacher’s experience with an evaluation program that utilizes student performance data accounted for significant differences within groups for the utility standard. ANOVA tests were run using SPSS with the significance level set at p < .05. Table 7 documents how teachers’ perceptions did significantly differ based on the teacher’s experience with an evaluation program that utilizes student performance data, F(2, 163) = 14.156, p = .001. Furthermore, Cohen’s (1988) effect size value (η² = .147) suggests this effect size to be of high practical significance.

Table 7.

Utility Standard ANOVA by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Evaluation standard	Sum of squares	df	Mean square	F	Sig.
Utility standard
Between groups	7.680	2	3.840	14.156	.001
Within groups	44.213	163	0.271
Total	51.893	165

Note. ANOVA = analysis of variance; df = degrees of freedom.
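The quantities in Table 7 are linked by simple arithmetic, which the following minimal sketch reproduces to within rounding. It assumes only the sums of squares and degrees of freedom printed in the table; the variable names are illustrative and are not taken from the SPSS output.

```python
# Values copied from Table 7 (utility standard ANOVA).
ss_between = 7.680   # between-groups sum of squares
ss_within = 44.213   # within-groups sum of squares
df_between = 2       # three experience groups minus one
df_within = 163      # 166 participants minus three groups

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F ratio compares between-group to within-group variability.
f_ratio = ms_between / ms_within

# Eta squared is the share of total variability attributable to group membership.
eta_squared = ss_between / (ss_between + ss_within)

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F          = {f_ratio:.3f}")
print(f"eta^2      = {eta_squared:.3f}")
```

Small differences in the third decimal place relative to the reported values are expected because the table entries are themselves rounded.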

Tukey post hoc analysis revealed that this difference was attributable to teachers in their first year and teachers with 1 or more years’ experience with an evaluation program that utilizes student performance data responding more favorably to utility standard questions than did teachers with no experience. The difference between teachers in their first year with such a program and teachers with no experience was significant (p = .003), as was the difference between teachers with 1 or more years’ experience and teachers with no experience (p = .001). Table 8 shows the post hoc results.

Table 8.

Tukey Post Hoc Analysis for Utility Standard by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Exp (I)	Exp (J)	Mean difference (I−J)	SE	Sig.	95% CI lower bound	95% CI upper bound
1 Year	First	−.19034	.12015	.255	−.4745	.0939
	None	−.51732*	.10484	.001	−.7653	−.2693
First	1 Year	.19034	.12015	.255	−.0939	.4745
	None	−.32698*	.09732	.003	−.5572	−.0968
None	1 Year	.51732*	.10484	.001	.2693	.7653
	First	.32698*	.09732	.003	.0968	.5572

Note. SE = standard error.

*The mean difference is significant at the .05 level.
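The standard errors in Table 8 can be approximately recovered from the within-groups mean square in Table 7 and the group sizes in Table 6. The sketch below assumes the SE column follows the usual pooled formula for unequal group sizes, SE = sqrt(MS_within × (1/n_i + 1/n_j)); the group labels and the function name are hypothetical and chosen only for illustration.

```python
import math

# Within-groups mean square from Table 7 (sum of squares / df).
ms_within = 44.213 / 163

# Group sizes from Table 6; the dictionary keys are illustrative labels.
group_sizes = {"none": 90, "first": 42, "more_than_1": 34}


def pairwise_se(group_a: str, group_b: str) -> float:
    """Pooled standard error of the difference between two group means."""
    n_a = group_sizes[group_a]
    n_b = group_sizes[group_b]
    return math.sqrt(ms_within * (1 / n_a + 1 / n_b))


for pair in [("more_than_1", "first"), ("more_than_1", "none"), ("first", "none")]:
    print(pair, round(pairwise_se(*pair), 5))
```

Under this assumption, the three computed values agree with the SE column of Table 8 to about four decimal places.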

Teachers cited more examples of how including student performance data in evaluation programs would benefit the utility of the evaluation than they did for any of the other three standards. Teachers commented that an evaluation program that uses student performance data would “guide lesson planning,” “identify student gaps,” “inform professional development,” and “enhance personal growth and reflection.” The responses also identified some perceived liabilities. The most prominent fear, raised by 16 teachers who had not participated in an evaluation system that included student performance data, was that the inclusion of student performance data would promote “teaching to the test.”

Perceptions of Teachers to the Accuracy of Evaluations

The accuracy standards address the completeness and soundness of the information collected. Accuracy standards are intended to increase the dependability and truthfulness of evaluation findings (Yarbrough et al., 2011). To meet these standards and be meaningful, evaluations must include valid and reliable information, sound designs and analyses, and justified conclusions and decisions.

Mrs. Bevis, a veteran teacher with over 11 years of experience at the high school level, offered a sentiment that resonated with other educators who had experience with an evaluation program that uses student performance data. “For the first time, I had an evaluation that I believe was a valid and accurate story of what I do as a classroom teacher. The inclusion of student performance data gave me and others in my school a more accurate description of what we do and how we can do it better. The data celebrates what I do well and I am glad to have it on my evaluation.”

Survey participants were largely homogeneous in their responses to questions associated with the accuracy standard. The grand mean for the 166 participants for questions related to the accuracy standard was 2.86. This value indicates that participants were leaning slightly toward disagreement, with most respondents selecting disagree (corresponding to a value of 3) followed by agree (corresponding to a value of 2). The low standard deviation indicated that there were very few outliers in the study who responded with strongly agree (a value of 1) or strongly disagree (a value of 4). Responses were more heterogeneous when teachers were compared by how much experience they had with an evaluation program that utilizes student performance data. Teachers were more likely to agree that evaluation programs utilizing student performance data produced more accurate evaluations if they had more than 1 year of experience with such a program (M = 2.40) than if they had no experience with such an evaluation program (M = 3.06). Table 9 provides the means and standard deviations for accuracy standard questions by participants’ experience with an evaluation program that utilizes student performance data.

Table 9.

Mean and Standard Deviation for Accuracy Standard Questions by Participant’s Experience With an Evaluation Program That Utilizes Student Performance Data.

Level of experience	Total teacher sample completing survey, N = 166	M	SD
No experience with evaluation program	90	3.06	0.259
First year of evaluation program	42	2.80	0.233
More than 1 year of evaluation program	34	2.40	0.306
Total participants	166	2.86	0.775

Data were further analyzed to determine whether a teacher’s experience with an evaluation program that utilizes student performance data accounted for significant differences among groups for the accuracy standard. ANOVA tests were run using SPSS with the significance level set at p < .05. Table 10 shows that the effect of teachers’ experience with an evaluation program that utilizes student performance data on their perceptions was significant, F(2, 163) = 20.947, p = .001. Furthermore, Cohen’s effect size value (η² = .204) suggests that this effect is of high practical significance.

Table 10.

Accuracy Standard ANOVA by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Evaluation standard	Sum of squares	df	Mean square	F	Sig.
Accuracy standard
Between groups	10.993	2	5.497	20.947	.001
Within groups	42.771	163	0.263
Total	53.764	165

Note. ANOVA = analysis of variance; df = degrees of freedom.
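As a rough consistency check, the grand mean and the between-groups sum of squares for the accuracy standard can be approximated from the group sizes and means in Table 9. The sketch below is illustrative only: because the published group means are rounded to two decimals, the reconstructed sum of squares is expected to land near, not exactly on, the value in Table 10.

```python
# (label, group size, group mean) copied from Table 9.
groups = [
    ("No experience", 90, 3.06),
    ("First year", 42, 2.80),
    ("More than 1 year", 34, 2.40),
]

total_n = sum(n for _, n, _ in groups)

# The grand mean is the size-weighted average of the group means.
grand_mean = sum(n * mean for _, n, mean in groups) / total_n

# Between-groups sum of squares: size-weighted squared deviations
# of each group mean from the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean in groups)

print(f"N           = {total_n}")
print(f"grand mean  = {grand_mean:.2f}")
print(f"SS between  = {ss_between:.3f}  (Table 10 reports 10.993)")
```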

Tukey post hoc analysis revealed that this difference was attributable to teachers with 1 year or more experience with an evaluation program that utilizes student performance data responding more favorably to accuracy standard questions than did teachers in their first year with such an evaluation program and teachers with no experience. There was also a significant difference between the perceptions of teachers in their first year with an evaluation program that utilizes student performance data and those of teachers with no experience (p = .049). Table 11 shows the post hoc results.

Table 11.

Tukey Post Hoc Analysis for Accuracy Standard by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Exp (I)	Exp (J)	Mean difference (I−J)	SE	Sig.	95% CI lower bound	95% CI upper bound
1 Year	First	−.40208*	.13515	.009	−.7217	−.0824
	None	−.66160*	.11793	.001	−.9405	−.3827
First	1 Year	.40208*	.13515	.009	.0824	.7217
	None	−.25952*	.10947	.049	−.5185	−.0006
None	1 Year	.66160*	.11793	.001	.3827	.9405
	First	.25952*	.10947	.049	.0006	.5185

Note. SE = standard error.

*The mean difference is significant at the .05 level.

Teachers were provided the opportunity to list up to three benefits and limitations associated with including student performance data in their evaluation. Teachers’ feedback on the potential benefits related to the accuracy standard indicated that student performance data would “translate to a more meaningful and effective evaluation for once” and “finally allow for effective teachers (who are not necessarily the ‘favorites’) to be recognized.” Teachers’ feedback on the limitations associated with the accuracy standard noted that current standardized tests are “invalid,” “unreliable,” and “bad indicators of student progress.” Teachers also commented on how students “don’t take the tests seriously” and “punish their teachers by bombing the end of year tests.” The open-ended responses were coded and grouped into similar constructs. The potential benefits and limitations cited by teachers associated with the accuracy standard are listed in Table 12.

Table 12.

Teacher-Cited Benefits and Limitations Associated With the Accuracy Standard.

Benefits	Limitations
Makes evaluation more objective (3)	Inaccurate assessments (5)
Identifies good teachers	Student apathy toward test (10)
	Does not account for student ability groupings (5)
	Teaching students with disabilities or English language learners (3)

Impact of Demographics on Teachers’ Perspectives

The study sought to identify whether teachers’ perceptions of the incorporation of student performance data in their evaluation differed significantly across demographic categories. Teachers reported their years of experience, whether they worked in a union or nonunion state, whether they taught a tested or nontested grade or course, what level they taught (elementary, middle, or high), and whether they had any experience with an evaluation system that used student performance data. The only demographic category that produced significant differences in perceptions was how much experience the teacher had with an evaluation program that utilized student performance data. Teachers in their first year of such an evaluation program and teachers with 1 year or more experience with such a program held more favorable views of how the inclusion of student performance data affected the propriety, utility, and accuracy standards. ANOVA tables presented earlier in this article demonstrate that these differences among groups were statistically significant (p < .01) for all three standards. Regardless of their experience with an evaluation program that utilizes student performance data, teachers were similar in their responses to questions associated with the feasibility standard. Table 13 provides a comprehensive view for all four evaluation standards. A value of 1 indicates strong agreement, 2 indicates agreement, 3 indicates disagreement, and 4 indicates strong disagreement. Figure 1 provides a visual representation of how teachers with 1 or more years of experience with an evaluation program that uses student performance data had higher levels of agreement that this type of evaluation system improved the quality of the evaluation in three of the four standards.

Table 13.

Teacher Mean and Standard Deviation for Evaluation Standards Disaggregated by Teacher Experience With an Evaluation Program That Utilizes Student Performance Data.

Level of experience	Teacher sample, N = 166	Propriety standard M	Propriety standard SD	Utility standard M	Utility standard SD	Feasibility standard M	Feasibility standard SD	Accuracy standard M	Accuracy standard SD
No experience	90	3.14	0.184	2.61	0.294	2.47	0.341	3.06	0.259
First year of program	42	2.91	0.446	2.28	0.209	2.43	0.302	2.80	0.233
1 Year or more	34	2.49	0.233	2.09	0.289	2.42	0.342	2.40	0.306

Figure 1.

Teacher mean for evaluation standards disaggregated by teacher experience with an evaluation program that utilizes student performance data.
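The pattern summarized in Table 13 and Figure 1 can be made concrete by subtracting, for each standard, the mean of teachers with 1 year or more of experience from the mean of teachers with no experience. The following sketch uses only the means printed in Table 13; the dictionary keys are illustrative labels. Because lower scores indicate stronger agreement, a larger positive gap means that experienced teachers agreed more strongly.

```python
# Group means copied from Table 13 (1 = strongly agree, 4 = strongly disagree).
table_13_means = {
    "propriety": {"none": 3.14, "first": 2.91, "one_plus": 2.49},
    "utility": {"none": 2.61, "first": 2.28, "one_plus": 2.09},
    "feasibility": {"none": 2.47, "first": 2.43, "one_plus": 2.42},
    "accuracy": {"none": 3.06, "first": 2.80, "one_plus": 2.40},
}

# Gap between teachers with no experience and teachers with 1 year or more.
gaps = {
    standard: round(means["none"] - means["one_plus"], 2)
    for standard, means in table_13_means.items()
}

# List the standards from the largest gap to the smallest.
for standard, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{standard:<12} gap = {gap:.2f}")
```

The feasibility gap is an order of magnitude smaller than the other three, which matches the observation that responses to the feasibility standard were similar regardless of experience.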

Implications for Principals

All teachers deserve the opportunity to be evaluated utilizing objective data. Maintaining conventional evaluation programs that do not factor in student achievement data for any teacher jeopardizes opportunities for growth for teachers as well as students. Teachers identified many potential benefits from the use of student performance data in their evaluation. The results from this study may provide all parties with relevant information about the opportunities associated with this change. Administrators armed with this information may be able to expand on the merits of including student performance data in a teacher’s evaluation and garner invaluable buy-in from teachers. Learning more about how teachers perceive the evaluation process is important since evaluations have not historically had the power to enhance teaching and learning. This may also benefit teachers in understanding the rationale for including student performance data in their evaluations. School systems can either use the teacher evaluation process as a “catalyst for improving teaching and learning” or as a “meaningless bureaucratic necessity” (Davis, Ellett, & Annunziata, 2002). This study suggests that teachers with experience in such programs perceive that including student performance data does not add more pressure to their lives but can actually be that catalyst to improve teaching and learning.

The revelation that teachers with experience in an evaluation program that uses student performance data largely agreed with the premise that student performance data leads to more accurate and useful evaluations has at least three important implications for principals in improving teacher practice. These implications include (a) transforming evaluation programs into mechanisms for meaningful individual and school-wide professional development, (b) utilizing student performance data to inform responsible decision making in schools to promote academic achievement for all students and close achievement gaps, and (c) using evaluations to recognize exemplary teaching and make more informed personnel decisions and placements.

Teachers evaluated under a system that uses student performance data were more optimistic toward the inclusion of student performance data in their evaluation. They pointed to the opportunities associated with this evaluation format to promote meaningful professional development. In addition to the strong level of agreement noted by teachers to survey questions associated with the utility standards, teachers’ coded open-ended feedback specifically mentioned that student performance data would “inform professional development” and “enhance personal growth and reflection.” Other teachers noted that student performance data offered administrators and teachers a neutral and objective source of information that can launch constructive conversations between both parties. Teacher responses that suggested student performance data in their evaluation would “remove evaluator bias” and “increase collaboration” indicate that student achievement data in the evaluation instrument has the potential to drive meaningful dialogue between teacher and administrator.

The findings from this study may provide school leaders and administrators with vital feedback that allows them to preemptively address teachers’ concerns regarding the use of student performance data in their evaluation program. The findings specifically document where teachers perceive potential liabilities associated with the use of student performance data in their evaluations. The heightened level of disagreement from the 90 teachers without any experience with an evaluation program that uses student performance data suggests their concerns may be a result of inadequate and incomplete information. Teachers specifically commented that “data now will replace everything else I do at the school which can’t be quantified in numbers,” “test scores will trump all in the evaluation,” and “my principal can’t understand scores and I am afraid it will hurt me.” What is noteworthy from this study is that these “fears” came almost exclusively from teachers who had no actual experience with such an evaluation system. It appeared to be, as one teacher cautioned, a “fear of the unknown.” Instructional leaders aware of these concerns are better positioned to proactively educate teachers and other relevant stakeholders as to how these perceived liabilities will be responsibly and appropriately addressed.

Another implication for improving teaching through evaluation programs that use student performance data lies in the program’s ability to recognize exemplary teaching and make more informed personnel decisions and placements. Teachers commented in the additional feedback section that the use of student performance data in the evaluation process would “identify good teachers” and “make teachers more accountable.” Teachers with at least 1 year of experience working with an evaluation program that uses student performance data also largely agreed with survey questions associated with the propriety standard. This level of agreement suggests that these teachers value the ability of this type of evaluation program to fairly differentiate between levels of performance.

The teachers’ perceptions in this study coincide with findings from another study that criticized evaluation systems without student performance data as largely being “disrespectful to teachers” and indifferent to instructional effectiveness (Weisberg et al., 2009, p. 4). This same study noted that teachers have been routinely rated as satisfactory and above for decades. Truly effective teachers deserve to be distinguished from their colleagues and an evaluation program that uses student performance data can do this according to teachers who have participated in such a program. The instructional strategies employed by these outstanding educators can be more readily replicated in other classrooms.

The homogeneity of teachers’ responses should not lead leaders to ignore important differences in how to deliver guidance and information regarding how student performance data will affect teachers. Although teachers from tested and nontested courses responded similarly in this survey, leaders should consider differentiating their message to these two unique audiences since student performance data will presumably be captured differently. It is also important to consider differentiating the message to novice and veteran teachers. Again, this study noted very similar responses among teachers with varying levels of teaching experience. Teachers who have recently graduated from an education program, however, are more likely to have greater exposure to assessment and data literacy courses. This relatively new concentration in undergraduate education preparation programs may help them better understand and appreciate the power of student performance data.

It is also important to note that the weight of student performance data in the teacher’s evaluation will likely affect his or her perspective on this type of evaluation program. The flexibility granted to states under ESSA should help leaders design an effective evaluation program that uses student performance data as one of many measures. The perspectives from teachers in this study suggest that omitting performance data altogether can be counterproductive to moving education forward.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author Biography

Paul Hopkins is an experienced high school principal, director of Exceptional Education, and director of Curriculum. He is a graduate of Peabody College at Vanderbilt University and received his doctorate from the College of William & Mary. His research focuses on examining how school districts can develop innovative and relevant solutions to strengthen and enhance teacher and principal recruitment, selection, evaluation, and professional development.

References

Baker, E. L., Barton, P. E., Darling-Hammond, Haertel, Ladd, H. F., Linn, R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (Briefing Paper No. 278). Washington, DC: Economic Policy Institute.

Braun (2005). Using student progress to evaluate teachers: A primer on value-added models. Princeton, NJ: Educational Testing Service.

Chetty, Friedman, J. N., & Rockoff, J. E. (2011, December). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (NBER Working Paper No. 17699). Retrieved from http://www.nber.org/papers/w17699.pdf

Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Darling-Hammond (2000). Teacher quality and student achievement: A review of state policy evidence. Educational Policy Analysis Archives, 8(1), 1-44. doi:10.14507/epaa.v8n1.2000

Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein (2012). Evaluating teacher evaluation. Phi Delta Kappan, 93(6), 8-15.

Davis, D. R., Ellett, C. D., & Annunziata (2002). Teacher evaluation, leadership and learning organizations. Journal of Personnel Evaluation in Education, 16, 287-302.

Dillman (2002). Presidential address: Navigating the rapids of change: Some observations on survey methodology in the early twenty-first century. Public Opinion Quarterly, 66, 473-494.

Emery & Ohanian (2004). Why is corporate America bashing our public schools? Portsmouth, NH: Heinemann.

Gigante, N. A., & Firestone, W. A. (2008). Administrative support and teacher leadership in schools implementing reform. Journal of Educational Administration, 46, 302-331.

Gray, Bitterman, & Goldring (2013). Characteristics of Public School Districts in the United States: Results from the 2011–12 Schools and Staffing Survey (NCES 2013–311). Retrieved from http://nces.ed.gov/pubs2013/2013311.pdf

Greene, J. C., & Lee, J. H. (2006). Quieting educational reform . . . with educational reform. American Journal of Evaluation, 27, 337-352.

Hattie (2009). Visible learning: A synthesis of over 800 meta-analyses related to student achievement. New York, NY: Routledge.

Heck, R. H. (2009). Teacher effectiveness and student achievement: Investigating a multilevel cross-classified model. Journal of Educational Administration, 47, 227-249.

Herman, Winters, & Golan (1989). CSE Technical Report 298: Reporting for effective decision making. UCLA Center for Research on Evaluation, Standards, and Student Testing. Retrieved from http://www.dse.usla.edu/products/reportsset.html

Hopkins, P. T. (2013). Teacher perceptions of the use of student performance data in teacher evaluations (Doctoral dissertation).

Joint Committee on Standards for Educational Evaluation. (2009). Personnel evaluation standards (2nd ed.). Thousand Oaks, CA: Sage.

Jordan, H. R., Mendro, R. L., & Weerasinghe (1997, July). Teacher effects on longitudinal student achievement: A report on research in progress. Paper presented at the CREATE Annual Meeting, Indianapolis, IN. Retrieved from http://dallasisd.schoolwires.net/cms/lib/TX01001475/Centricity/Shared/evalacct/research/articles/Jordan-Teacher-Effects-on-Longitudinal-Student-Achievement-1997.pdf

Lessing & de Witt (2007). The value of continuous professional development: Teachers’ perceptions. South African Journal of Education, 27, 53-67.

National Research Council Board on Testing and Assessment. (2009). Letter report to the U.S. Department of Education. Washington, DC: Author.

Prince, C. D., Schuermann, P. J., Guthrie, J. W., Witham, P. J., Milanowski, A. T., & Thorn, C. A. (2009). The other 69 percent: Fairly rewarding the performance of teachers of nontested subjects and grades. Washington, DC: Center for Educator Compensation Reform. Retrieved from http://www.cecr.ed.gov/guides/other69Percent.pdf

Rothstein (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125, 175-214.

Sanders, W. L., & Rivers, J. C. (1996). Cumulative and residual effects of teachers on future student academic achievement. Knoxville: University of Tennessee Value-Added Research and Assessment Center.

Springer, Ballou, Hamilton, Lockwood, McCaffrey, Pepper, & Stecher (2010). Teacher pay for performance: Experimental evidence from the Project on Incentives in Teaching. Nashville, TN: National Center of Performance Incentives.

Stronge, J. H. (Ed.). (2006). Evaluating teaching: A guide to current thinking and best practice (2nd ed.). Thousand Oaks, CA: Corwin Press.

Stronge, J. H., & Tucker, P. D. (2003). Handbook on teacher evaluation: Assessing and improving performance. Larchmont, NY: Eye on Education.

Stronge, J. H., Ward, T. J., & Grant, L. W. (2011). What makes good teachers good? A cross-case analysis of the connection between teacher effectiveness and student achievement. Journal of Teacher Education, 62, 339-355.

Stronge, J. H., Ward, T. J., Tucker, P. D., & Hindman, J. L. (2008). What is the relationship between teacher quality and student achievement? An exploratory study. Journal of Personnel Evaluation in Education, 20, 165-184.

Toch & Rothman (2008). Rush to judgment: Teacher evaluation in public education. Washington, DC: Education Sector.

Tucker, P. D., & Stronge, J. H. (2005). Linking teacher evaluation and student learning. Alexandria, VA: Association for Supervision and Curriculum Development.

Turnbull (2002). Teacher participation and buy-in: Implications for school reform initiatives. Learning Environments Research, 5, 235-252.

U.S. Department of Education, National Center for Education Statistics. (2013). Table 209.10: Number and percentage distribution of teachers in public and private elementary and secondary schools, by selected teacher characteristics: Selected years, 1987-88 through 2011-12. Retrieved from http://nces.ed.gov/programs/digest/d13/tables/dt13_209.10.asp#

Weisberg, Sexton, Mulhern, & Keeling (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. Retrieved from http://widgeteffect.org/downloads/TheWidgetEffect.pdf

Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 1, 57-67.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.