Assessment in Music: A Practitioner Introduction to Assessing Students

Abstract

There has been an increased interest in documenting the growth and learning outcomes of students in all subjects in the past 20 years, and music education has not been immune to the accountability movement. Yet, in spite of the increased sociopolitical pressures put on educators, music has remained a difficult discipline to assess, which in turn has created tension between music educators and policymakers. This review of literature examines the basic nature of assessment in music education and discusses possible concepts and methods to improve practitioner understanding of student growth and learning. Topics include (a) What is assessment? (b) Why does assessment matter? (c) How do we assess in music? and (d) What challenges occur in music assessment?

Keywords

assessment, evaluation, feedback, formative assessment, rubric, summative assessment

There has been an increased interest in documenting the growth and learning outcomes of students in all subjects in the past 20 years, and music education has not been immune to the accountability movement (Ciorba & Smith, 2009; Colwell, 1998; Fisher, 2008; McQuarrie & Sherwin, 2013). As such, assessment in the music classroom has become a primary area of concern to music educators (McQuarrie & Sherwin, 2013). Yet, in spite of the increased sociopolitical pressures put on educators, music has remained a difficult discipline to assess, which in turn has created tension between music educators and policymakers (Colwell, 2008).

Therefore, it has become increasingly important that music educators maintain an active awareness of the importance and practice of assessment in the field. Maintaining such awareness, however, may prove challenging for young practitioners, due, in part, to a lack of knowledge about assessment. Additionally, Asmus (1999) stressed the importance of teachers knowing the needs and learning styles of the students in their classes. The variety of class settings often navigated by music teachers has been suggested as heightening the necessity for building assessments around student needs (Asmus, 1999; West, 2012). Classes often vary in size, language spoken, cultures represented, and cognitive development. Accordingly, Chiodo (2001) recommended that music educators seek out information about their students from the students themselves, from professional colleagues, and from data resources to inform decisions about assessment.

The purpose of this article is to examine the following questions: (a) What is assessment? (b) Why does assessment matter? (c) How do we assess in music? and (d) What challenges occur in music assessment? Topics were selected a priori by the author and the included studies were selected through ERIC, JSTOR, and ProQuest database searches.

What Is Assessment?

Idealistically, assessment has been presented as a method for gathering information relevant to both teachers and students about teaching and learning, centered on student knowledge and skills (Parkes, Rohwer, & Davidson, 2015). Goolsby (1999) suggested that four types of assessment are, in his experience, commonly used in instrumental settings: (a) placement (auditions, chair tests, etc.); (b) summative (concert, festivals, etc.); (c) diagnostic (identifying performance skills and knowledge difficulties); and (d) formative (the regular monitoring of students for learning outcomes). Eisner (1998) also underscored the diagnostic, positioning, content evaluative, and reflective roles of assessment in the arts. Conceptual definitions and goals, such as those mentioned above, provide the foundation for the creation and implementation of assessments. It may, therefore, be necessary to have a basic understanding of what assessments accomplish.

Students have been found to benefit from specific and reliable feedback gained from both formative and summative assessments (Colwell, 1998; Salvador, 2011; Sicherl Kafol, Kordeš, & Holcar Brunauer, 2017; Zerull, 1990). Formative assessments have been defined as assessments conducted during learning, and have been one of the primary aspects of designing and differentiating instruction (Salvador, 2011; Saunders & Holahan, 1997; Sicherl Kafol et al., 2017). Music teachers have been said to engage in formative assessment whenever they listen critically to student performance/practice, make judgments, and provide feedback (Saunders & Holahan, 1997). While practitioners may often use formative assessment to ensure that students do not simply skirt the instructor’s acceptance of mistakes (Goolsby, 1999), formative assessment has often been relegated to error detection, identification, and communication (Colwell, 1998). True formative assessment, according to Colwell (2008), only occurs when “some learning action follows from the testing, otherwise it is merely frequent summative assessment” (p. 13).

Summative assessment, or the end measurement of either learning, growth, or achievement (Sicherl Kafol et al., 2017), can also be thought of as answering the following question: How did the student(s) do? (Mastrorilli, Harnett, & Zhu, 2014). In music, individual performances and competitions often serve as summative assessments and have been found to be the source of many student grades and teacher evaluations (Ciorba & Smith, 2009; Latimer, Bergee, & Cohen, 2010). Russell and Austin (2010) found that the vast majority (95%) of the 352 secondary music teachers surveyed worked in schools that used traditional summative grading systems. Comparable systems have been noted as extremely common; however, Harrison, Lebler, Carey, Hitchcock, and O’Bryan (2013) suggested that simple grades lack meaningful feedback. Due to their perceived and real importance, summative experiences often influence instructional decisionmaking, from sequencing to curriculum.

Why Does Assessment Matter?

Researchers have found that students may view assessment, particularly when measured by grades, as important and a powerful motivating factor (Colwell, 1998; McClung, 1996; Reimer, 2009). Boud, Cohen, and Sampson (1999) claimed that “assessment is the single most powerful influence on learning in formal courses and, if not designed well, can easily undermine the positive features of an important strategy in the repertoire of teaching and learning approaches” (p. 414). Lehman (2008) also emphasized the interconnectivity of instruction and assessment. In this view, assessment has been considered an essential component of quality instruction and necessary for improvement in teaching and learning to take place (Eisner, 1998; Lehman, 2008). Formative assessments, in their role of providing data for reflection and adjustment of instruction, may improve specific lessons, general content delivery, sequencing, and/or curricular focus. Additionally, the cumulative nature of summative assessments may be used to evaluate achievement, influence the development of future goals, serve as a motivating factor, and provide meaningful data about teaching and learning in the music classroom. It may be important, however, to note that these positive factors may be predicated on the proper implementation of assessment.

Increases in accountability, spurred by laws such as No Child Left Behind and grant programs such as Race to the Top, have exacerbated the preexisting natural emphasis on assessment (Ciorba & Smith, 2009; Wesolowski, 2014). This increased public attention, in turn, has increasingly affected policy decisions about music assessment (West, 2012; Zerull, 1990). Music teachers have been asked to teach more than one area of music, in many cases, to free up resources to support subjects struggling with high-stakes testing (Lehman, 2008; West, 2012). The implementation of standardized music tests, while attempted several times in the past 40 years, has not been widespread due to the inherent difficulties in uniform measurement of music skills (Parkes et al., 2015; Zerull, 1990). Researchers have also pointed to the breadth of music as a field, the expressive nature of the discipline, and the intrinsic value of all art forms as factors contributing to music’s unsuitability for assessment (Lehman, 2008; Zerull, 1990).

Assessment has been found to be of such importance that evaluating music and music performance is included in the current national standards (Shuler, Norgaard, & Blakeslee, 2014). Music has been recognized at both national and state levels as being a core curricular subject, which has brought more focus on assessment in the music classroom (Fisher, 2008). As such, Asmus (1999) noted that some objections to assessment in music may stem from a lack of specificity as to students’ expected knowledge and skills. Fisher (2008) provided several reasons for the adoption of more regular assessments in music classrooms, including easily understood and consistent accountability, legitimization in the perceptions of those outside music, and protection of music instruction time. Wright, Humphrey, Larrick, Gifford, and Wardlaw (2005), however, countered that there are also viable arguments that music is not suitable for formal standardized testing, including the inconsistent schedule of music instruction and the minuscule impact testing may have on subject credibility. Likewise, the researchers listed additional contributing factors, including the breadth of music as a field, the expressive nature of the discipline, and the intrinsic value of all art forms. Asmus (1999) suggested that music, possibly more than other subjects, operates in more unique circumstances and has longer teacher-to-student interactions. Therefore, formative and summative assessments may affect long-term learning outcomes.

The importance of music assessment extends beyond the need to evaluate for pedagogical purposes. A number of studies have shown that assessments can be used to communicate value and advocate for school music to those outside the discipline (Colwell, 1998; Reimer, 2009; Zerull, 1990). Accordingly, Reimer (2009) remarked that the use of nonmusic criteria for formative assessment, or even worse for summative grades, can lend credence to claims that music is a less serious subject than other academic disciplines. McClung (1996) found that only 18% of administrators surveyed perceived choir as having educational status equal to that of other core subjects. McQuarrie and Sherwin (2013) found that many of the most common assessments used in music classrooms were nonachievement based, which deviated sharply from the assessment literature in the Music Educators Journal. In general, music ensemble instructors have embraced the necessity of assessment in their classes, but often have no training in developing or administering assessment (Colwell, 1998).

How Do We Assess in Music?

Within the differing forms and goals of assessment, scholars have noted a multitude of approaches (McQuarrie & Sherwin, 2013; Rohwer, 1997; Russell & Austin, 2010). Teachers and researchers have used content-specific rating scales, checklists, rubrics, report cards, aptitude testing, observations, and portfolios to assess student learning and growth (Parkes et al., 2015; Rohwer, 1997; Salvador, 2011). Many of the common assessments found in schools, however, have been noted as either nonmusic or containing no actual measurements (Barkley, 2006; McClung, 1996; McQuarrie & Sherwin, 2013; Russell & Austin, 2010; Simanton, 2000). McQuarrie and Sherwin (2013) found that the top five assessments used in elementary general music were based on (a) participation, (b) effort, (c) individual performances using informal observation, (d) group performances, and (e) behavior. These results were echoed by Barkley (2006) who reported that observation was the most commonly used assessment strategy in elementary music classrooms, followed by concert performances and written tests. Russell and Austin (2010) found similar results among secondary music teachers, with the top five assessment criteria in order of use being (a) performance/skill, (b) knowledge, (c) attendance, (d) attitude, and (e) practice. Additionally, Simanton (2000) reported that secondary band directors used (a) participation/attitude, (b) band music performance, (c) attendance, and (d) technique/sight-reading as the primary means of assessing students, and that 56% of assigned grades in band classes were derived from nonperformance criteria. McClung (1996) noted that students perceived individual participation and attitude as the most frequent predictor of grades in choir.

The proliferation of nonmusic assessments, such as attendance or attitude, in classrooms has contrasted with much of the published literature, which has focused on assessments like individual and group performance, standardized testing, assessment software, and others (McQuarrie & Sherwin, 2013). While simple to execute and often supported by administrators (McClung, 1996; Russell & Austin, 2010), nonmusic assessments have not been noted to support music learning and growth to the same extent as content-based assessments built on demonstrations of music knowledge and skills (Reimer, 2009). Russell and Austin (2010) determined that content-based assessment, such as evaluation of music performance or music composition, was the most effective approach to improve teaching and learning.

Assessment Techniques

NAfME (2015) leaders have proposed that assessment in music “should measure student learning across a range of standards representative of quality, balanced music curriculum, including not only responding to music but also creating and performing music.” Wesolowski (2014) remarked that having a deep understanding of music is essential for evaluating both student learning and one’s own instruction. Fortunately, many of our schools of music have been perceived by recent graduates as excelling at instruction in performance, music history, and music theory (Denis, 2017). Consequently, music educators may find that their music preparation facilitates the acquisition of performance assessment skills.

Performance has been found to be one of the most common forms of assessment in music, due to its paramount nature with regard to music making and its power to motivate students (Latimer et al., 2010; Reimer, 2009). While definitively authentic, performance assessment has been cited as a subjective endeavor (Bergee & McWhirter, 2005; Latimer et al., 2010; Reimer, 2009; Ryan & Costa-Giomi, 2004). Researchers have noted that school size classification, time of day, type of event, and level of expenditure per average daily attendance were all strong predictors of festival scores (Bergee & McWhirter, 2005). Bergee and Westfall (2005) found many of the same strong relationships between nonmusic predictors like school size or time of day, as well as the added predictor of geographical district (metropolitan or nonmetropolitan), and festival ratings. Furthermore, Ryan and Costa-Giomi (2004) reported that evaluations by judges were influenced by perceived attractiveness and gender.

Several different approaches have been developed and researched to offset inconsistencies in evaluation, including the use of rubrics. Chiodo (2001) suggested that rubrics can be effective tools to help judges organize and evaluate music performance, when aligned with expressed standards or expectations. In general, rubrics use descriptors of performance domain criteria (e.g., tone, balance, rhythm, etc.) to provide an isolated domain-specific rating or overall total for a holistic performance score (Latimer et al., 2010). Multidimensional rubrics with specific descriptions of performance dimensions have been found to provide both higher interjudge reliability (Norris & Borst, 2007) and a more detailed picture of student achievement (Ciorba & Smith, 2009). Latimer et al. (2010) used the institution of a new statewide rubric to assess both reliability and validity of rubric use in Kansas for bands, choirs, and orchestras across a 2-year period. In addition to finding that the rubric led to high interjudge reliability, they also reported that both adjudicators and ensemble directors found that rubric use led to scores that represented performance quality and provided better feedback than previously used forms. For practitioner purposes, rubrics may offer a way to remain consistent between individuals or groups (Chiodo, 2001) and may increase convergent thinking and focus among students, as well as serving as a source of student motivation (Colwell, 2002). Similarly, rubrics may communicate instructor expectations, highlight essential concepts or material, and guide student work.

Rhythm, however, has been noted to be the least reliable dimension used with rubrics (Latimer et al., 2010; Norris & Borst, 2007). Latimer et al. (2010) suggested that the limited reliability of rhythms may be due to a lack of rubric specificity. As such, practitioners may desire to strive for specific and clear definitions of rhythm and rhythmic accuracy in self-designed rubrics to increase reliability.
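As a practical illustration of the rubric scoring and dimension-level reliability discussed above, the following minimal Python sketch may be useful. It is not drawn from any of the cited studies: the three dimensions, the 1-to-5 scale, the one-point agreement tolerance, and all ratings are hypothetical, and the simple agreement rate shown here is only one of several ways a practitioner might compare two judges.

```python
# Hypothetical rubric: each dimension is rated from 1 (weak) to 5 (strong).
DIMENSIONS = ("tone", "balance", "rhythm")


def total_score(ratings):
    """Sum the domain-specific ratings into one holistic performance score."""
    return sum(ratings[d] for d in DIMENSIONS)


def agreement_by_dimension(judge_a, judge_b, tolerance=1):
    """For each dimension, return the share of performances on which the
    two judges' ratings differ by no more than `tolerance` points."""
    result = {}
    for d in DIMENSIONS:
        close = [abs(a[d] - b[d]) <= tolerance for a, b in zip(judge_a, judge_b)]
        result[d] = sum(close) / len(close)
    return result


# Invented ratings for three performances, scored by two judges.
judge_a = [
    {"tone": 4, "balance": 3, "rhythm": 5},
    {"tone": 2, "balance": 3, "rhythm": 2},
    {"tone": 5, "balance": 4, "rhythm": 3},
]
judge_b = [
    {"tone": 4, "balance": 4, "rhythm": 3},
    {"tone": 3, "balance": 3, "rhythm": 4},
    {"tone": 5, "balance": 4, "rhythm": 3},
]

totals_a = [total_score(r) for r in judge_a]
agreement = agreement_by_dimension(judge_a, judge_b)
```

In this invented data set, the judges agree closely on tone and balance but not on rhythm, which is the kind of pattern that might prompt a practitioner to tighten the rhythm descriptors in a self-designed rubric.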

Several other ways to assess music performance, including checklists, rating scales, and recordings, have been discussed in the literature. Practitioners have described checklists, which essentially function as tally systems to guide instructor focus, as allowing for individual assessment on any number of teacher-determined skills (Chiodo, 2001; Goolsby, 1999). Therefore, checklists may be used with judges to arrive at an overall rating, and have been found to be reliable (Doane, Davidsen, & Hartman, 1990). Wesolowski (2014), however, expressed validity concerns for the implementation of checklists in his own experience, particularly due to the potential disconnect between cognitive processes and observable music behaviors. In music classrooms, checklists may often appear in the form of pass-off charts or objective sheets. Akin to rubrics, the checklist has been set forth by music educators as allowing for communication of expectations and content across a variety of topics (Chiodo, 2001; Goolsby, 1999) and may allow for more specific feedback than rubrics (Colwell, 2002). The feedback innate to the format, on the other hand, may be limited, as the metric relies on a pass/fail check of skill acquisition.

Rating scales have been likened to checklists in practitioner literature; however, they have been suggested to measure performance skills with more gradation in results (Chiodo, 2001; Wesolowski, 2014). These scales have been found to be criteria specific and can be continuous, through the measurement of mastery and a hierarchy of interdependent skills, or additive, through the measurement and summation of independent skills (Azzara, 1993; Saunders & Holahan, 1997). Researchers have found rating scales to lead to high interjudge reliability (Bergee, 2003; Saunders & Holahan, 1997).
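The distinction between continuous and additive rating scales may be easier to see in a small worked example. The Python sketch below reflects one possible reading of that distinction and is not taken from the cited studies: the skill names, the ordered levels, and both scoring functions are hypothetical.

```python
def additive_score(skills_met):
    """Additive scale: the skills are independent of one another, so the
    score is one point for each skill the student demonstrated."""
    return sum(1 for met in skills_met.values() if met)


def continuous_score(levels_met):
    """Continuous scale: the levels are ordered from easiest to hardest and
    each level presumes the ones before it, so the score is the number of
    consecutive levels met before the first miss."""
    score = 0
    for met in levels_met:
        if not met:
            break
        score += 1
    return score


# Hypothetical additive scale: four independent performance skills.
additive = additive_score(
    {
        "posture": True,
        "hand position": False,
        "breath support": True,
        "articulation": True,
    }
)

# Hypothetical continuous scale: four ordered levels of rhythmic mastery.
# The student misses the third level, so the fourth does not count.
continuous = continuous_score([True, True, False, True])
```

Under these assumptions, the same number of observed successes can lead to different scores: the additive scale credits every skill shown, while the continuous scale stops at the first unmet level in the hierarchy.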

Rating scales may offer a quick way to focus on specific music skills or knowledge and still attach the feedback necessary for learning in a large classroom setting. While Chiodo (2001) suggested that, in her experience, rating scales offered an efficient way to assess performance, a balance must be found between having a scale large enough to provide meaningful feedback and one small enough to keep the pacing advantages the format offers. In practical use, criteria-specific rating scales may be used for quick placement, diagnostic, or formative assessments. In small group settings, such as a sectional rehearsal, a performance assessment on phrasing might be taken and feedback given through a rating scale without the use of an excessive amount of instruction time.

Apart from purely performance assessment, portfolios have been implemented with increased frequency over the past 25 years in music settings (Lehman, 2008; Parkes et al., 2015; Zerull, 1990). Often touted in response to the value-added modeling assessments stressed in the Race to the Top grant program, portfolios have been defined as collections of artifacts and student work that serve to document growth and learning over a period of time (Asmus, 1999; Hughes & Keith, 2014; Parkes et al., 2015). Portfolios may contain quantifiable aspects, such as the number of recordings, amount of written work, and scores from competitions; qualitative journal entries; teacher impressions; and/or the performance recordings of music (Zerull, 1990). Additionally, Wesolowski (2014) suggested that practitioners may also use portfolios for both formative and summative assessment, in addition to authentic or alternative assessment, depending on timing and material selection approach. Colwell (2002) noted that portfolios may require many individual tasks for each assessed objective, and may lead to increased work for practitioners.

In 2012, the Memphis City Schools (now Shelby County Schools) and the state of Tennessee pioneered and developed a portfolio assessment that used student work in evaluating learning growth. Eventually called the Portfolio Growth Measure System, the approach used both teacher selection/scoring of materials and blind review by trained content-specific reviewers. In 2015, Parkes et al. piloted a modification of the Portfolio Growth Measure System in Virginia used for evaluating music educators and found that stakeholders believed that documenting teacher effectiveness with portfolios was an acceptable approach to documentation, and that the development of measurement instruments and reviews of the portfolios held reasonable validity and reliability. Making a diverse selection of artifacts that reflect content knowledge has been found to be essential for practitioners to receive the benefits of portfolio assessment (Parkes et al., 2015). This approach could be applied to student portfolios, and therefore used to document both specific learning and growth. It may be helpful to decide on the categories or types of artifacts prior to the beginning of the course, so as to avoid confusion.

In addition to performance assessments, traditional written assignments can also be used in calculating student grades (Russell & Austin, 2010). The most common written assessments in secondary music classes have been reported to be quizzes, worksheets, and exams, all of which may be used to appraise basic content knowledge, such as vocabulary, notation, or music theory, and basic music reading and listening skills (Russell & Austin, 2010; Salvador, 2011). Tests have been noted to be frequently used as an assessment tool; yet writing an appropriate, effective test has been found to be a difficult and time-intensive undertaking, and the necessary skills must be learned and developed (Lehman, 2008). Cross-discipline assignments, such as essays about music experiences, may be viewed favorably by many administrators, but may not be valid assessments of music content knowledge. In general, written assignments may allow for assessment of music knowledge, and careful consideration may alleviate validity concerns due to assignment construction or content. Moreover, attentive planning may free written assignments from the grade-focused summative nature practitioners may initially call to mind when considering traditional pen-and-paper assessments.

Assessment is not limited to teacher-centered activities; however, the development of self-assessment skills requires that students experience appropriate external assessment, and be taught to transfer those concepts to autonomous use (Reimer, 2009). Therefore, the complexities of self-evaluation should be experienced by students after basic skill acquisition and learning have occurred (Colwell, 1998; Goolsby, 1999). Any of the previously mentioned techniques can be used to help facilitate self-assessment; however, the use of recordings becomes essential (Silveira & Gavin, 2016). Contest adjudicators and teachers have been trained in assessment using audio recordings (Bergee, 2007) and when teaching students to evaluate themselves, the use of recordings can be advantageous (Silveira & Gavin, 2016). When asking students to record themselves, Goolsby (1999) suggested that practitioners both communicate and practice the necessary procedures to alleviate challenges with using technology.

What Challenges Occur in Music Assessment?

A number of inherent challenges exist when trying to develop music assessments. From a broad perspective, the lack of agreement on music curricula or the end goals of instruction have created divisions in assessment approaches, which in turn may have erected barriers to assessment (Lehman, 2008; Reimer, 2009). The subjective nature of any music value judgments may have further hindered the development of any consistency in assessment; however, the rise of the standards movement may assuage some of these concerns (Colwell, 2008; Lehman, 2008). The use of standards may also align with McClung (1996), who commented that grades should be linked to specific learning objectives.

Logistical challenges may also influence assessment decisions and impede improvement in assessment practice (Ferm Almqvist, Vinge, Väkevä, & Zandén, 2017; Harrison et al., 2013; Russell & Austin, 2010; Salvador, 2011). In a qualitative study of three purposely selected elementary music teachers, Salvador (2011) noted that participants viewed the number of students taught, time constraints, and the lack of administrator support as being obstacles to improved assessment. Similarly, Harrison et al. (2013), Lehman (2008), and Ferm Almqvist et al. (2017) all echoed the concern over class size issues and their impact on effective assessment. Russell and Austin (2010) also found that administrative guidance or assistance with assessment was rare, yet when administrators provided help, nonmusic assessments were less likely to be present in the classroom. Additionally, Reimer (2009) suggested that the complexity of grading individuals in ensembles may create confusion or disagreement between music educators and administrators, and that administrators may be concerned that such confusion could lead to grade inflation.

A lack of transparency in grading procedures used in ensembles can lead to perceptions of favoritism (Harrison et al., 2013). Providing written grading policies to students and parents may alleviate transparency concerns. On the contrary, Ferm Almqvist et al. (2017) found that assessment often drove instruction and learning goals directly, stating that “assessment procedures and techniques [became] so persistent that they completely dominate[d] the teaching and learning experiences” (p. 6). Ferm Almqvist et al. (2017) further suggested that transparency contributed to the dominant role assessment played in participant classrooms. Overall, student understanding of teacher expectations has remained crucial for learning, and assessments have been suggested as meaningful indicators of expectations, goals, and objectives (Reimer, 2009). Including rationales and reasoning in assessments has also been offered by practitioners as a way for music educators to document rigor (Wesolowski, 2014).

Even when transparency exists, assessing creativity in music has often proven especially challenging for music educators (Hickey, 2001). Scholars have described creativity as novel, appropriate, and valuable (Barbot & Lubart, 2012; Stefanic & Randles, 2015) and have noted its historical importance in music education (Rohwer, 1997). Despite the educational value of creativity, the inherent subjectivity may prove problematic for educator assessments (Hickey, 2001; Rohwer, 1997).

Researcher-designed standardized assessments and product-based assessments have been used to evaluate and measure creativity in research literature. The most prominent researcher-designed standardized measures of creativity in music were developed by Guilford and Torrance during the middle of the 20th century (Rohwer, 1997; Stefanic & Randles, 2015). The Torrance Tests of Creative Thinking (TTCT) have been renormed and may serve as an example of standardized creative measures (Stefanic & Randles, 2015). Individuals or groups respond to activities (five activities on the TTCT-Verbal and three activities on TTCT-Figural) in writing and in drawing, which are then scored (Kim, Crammond, & Bandalos, 2006). These standardized tests have been used as a foundation for creativity research, although concerns about reliability have been raised (Stefanic & Randles, 2015).

In contrast, product-based assessment has been favored by researchers in recent years, with many studies using Amabile’s (1983) Consensual Assessment Technique (CAT) (Barbot & Lubart, 2012; Eisenberg & Thompson, 2003; Hickey, 2001). Counter to standardized tests, CAT measurements involve domain-appropriate judges using a Likert-type scale to rate artistic products on multiple criteria. This approach has been found by researchers to be reliable when used with both improvisation (Eisenberg & Thompson, 2003) and composition (Hickey, 2001; Stefanic & Randles, 2015). Stefanic and Randles (2015) also found that CAT was less reliable when evaluating group compositions or when there was only one judge. The need for multiple expert judges may pose a logistical problem for the implementation of CAT in some settings (Lu & Luh, 2012).
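For readers who wish to see how ratings from several judges might be combined, the following Python sketch averages Likert-type ratings for each product and estimates interjudge consistency. The choice of Cronbach's alpha as the consistency index is an assumption made for illustration, not a procedure reported in the studies cited above, and the 7-point ratings are invented.

```python
from statistics import mean, variance


def mean_ratings(ratings_by_judge):
    """Average the judges' ratings for each product."""
    return [mean(column) for column in zip(*ratings_by_judge)]


def cronbach_alpha(ratings_by_judge):
    """Estimate interjudge consistency, treating each judge as an 'item'.

    ratings_by_judge holds one list per judge, each containing one rating
    per product, with products in the same order for every judge.
    """
    k = len(ratings_by_judge)
    if k < 2:
        # With a single judge, consistency across judges cannot be estimated.
        raise ValueError("alpha needs at least two judges")
    judge_variances = [variance(r) for r in ratings_by_judge]
    product_totals = [sum(column) for column in zip(*ratings_by_judge)]
    return k / (k - 1) * (1 - sum(judge_variances) / variance(product_totals))


# Invented 7-point ratings: three judges, four student compositions.
ratings = [
    [6, 3, 5, 2],  # hypothetical judge 1
    [7, 3, 4, 2],  # hypothetical judge 2
    [6, 4, 5, 1],  # hypothetical judge 3
]

creativity_scores = mean_ratings(ratings)
alpha = cronbach_alpha(ratings)
```

The function refuses to run with a single judge because consistency across judges is undefined in that case, which is one way of seeing why the technique depends on having several raters.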

General tenets from both researcher-designed measurements and general assessment strategies may still be valuable to practitioners in developing appropriate assessments for music creativity. The lack of criteria or standards for evaluating creativity, for instance, may free music educators to focus on growth in novelty or appropriateness of the creative product. To this end, practitioners may use assessment tools such as checklists, rating scales, peer/teacher responses, or portfolios to evaluate student creativity (Rohwer, 1997).

Finally, the common use of nonmusic assessments may point to external factors influencing assessment choices. Russell and Austin (2010) found that practitioner practice and district/campus policies were often at odds, and music teachers were rarely provided assistance in reconciling any discrepancies. Barkley (2006) found that elementary music educators believed nonmusic factors, such as itinerancy, district/campus policies, planning time, training, and resources, all influenced assessment practices. Simanton (2000) suggested that the extensive workload of band directors may contribute to the lack of specific, individual assessments.

Conclusions

In a climate increasingly focused on accountability, how do we implement effective assessment without sacrificing effective music making? This process may begin with the instructor’s deep understanding of music. Colwell (1998) argued that developing a meaningful assessment, particularly in a discipline with a high level of subjectivity, requires a strong foundation of music knowledge. Difficulty in assessment may increase if one attempts to keep the flexibility needed for creativity and artistry in the music classroom alive. The authentic performance assessments that so often constitute the majority of content-specific grades in music ensembles may, if developed and implemented poorly, contribute to the restriction of creative thought by reinforcing the power dynamics of teacher-led decisionmaking. As such, knowing music from a formalist and expressionist perspective might allow practitioners to create learning environments tied to a holistic ideal of music making, instead of simply continuing the deficient rote performance assessments often seen in school.

Assessing specific knowledge may also be important; however, knowing exactly what and how to assess appropriately may remain problematic. Therefore, a significant understanding of assessment fundamentals and strategies may help practitioners gather truly meaningful data about their students (Colwell, 1998). Yet, in spite of this self-evident fact, many teachers may be unaware of validity concerns in either teacher-created or externally provided assessments. Additionally, deficiencies in practitioner assessment knowledge may lead to misuse of assessment strategies, and in turn promote incorrect conclusions and negatively influence instruction. To avoid misuse, researchers have suggested that assessments be linked with learning objectives (McClung, 1996; Parkes et al., 2015) and be implemented in an instructionally meaningful sequence (Wesolowski, 2014). While nonmusic assessments may be common, they may devalue the overall learning goals and public perceptions of the rigor of school music.

Accordingly, practitioners should understand the knowledge and skills related to assessment in order to evaluate and instruct students effectively. Therefore, intentionally varying assessment approaches and material assessed may provide a more holistic picture of student progress and the efficacy of teacher instruction. Additionally, disparate learning objectives may be best assessed by disparate techniques. Consequently, practitioner planning may be an essential part of meaningful assessment.

Planning may also allow practitioners to gain a more holistic view of student learning. For instance, a student composition may be assessed through both a teacher checklist, to provide feedback about specific areas of growth, and a written peer evaluation, to address more general aesthetic attributes (Rohwer, 1997). Planning may also mitigate issues common to assessment. Peer feedback, for example, may require careful introduction and practice of procedures to develop a safe environment for the students and facilitate the generation of relevant and meaningful feedback. Sicherl Kafol et al. (2017) found that proper establishment of peer feedback procedures led to improved student perceptions of assessment.

Practically, assessment can direct student and teacher focus. Ferm Almqvist et al. (2017) found that summative assessments drove student learning, as students were coached to perform well on assessments. In light of the power of assessment to shape content, Duke (2005) advocated for planning instruction with the assessment clearly in mind. For practitioners, this may mean deciding which aspects of the material to assess, and which approaches to use, prior to developing lesson plans. As an illustration, when working with a secondary ensemble, instructors may select the music; decide on the standards and objectives for musicality, expression, accuracy, and so on; develop several approaches for formative assessment (such as an overall rubric for the selected music, checklists for each song, or designated moments of peer feedback); and culminate with the summative assessment of concert performance. Each specific learning objective or standard may need multiple assessments. Such frequency may serve to increase the opportunities to transition content knowledge about music into procedural knowledge of making music while lessening the consequences for failure (Duke, 2005). Consequently, practitioners may want to have frequent, intentional, formative assessments in their classrooms to facilitate teaching and learning.

As discussed previously, several different approaches to assessment may be best suited for differing objectives. Performance remains one of the most common forms of assessment in music; however, practitioners may wish to extend their conceptualization of performance beyond the concert stage. Performances for peers may provide opportunities for reflection, the development of evaluation skills, and practice in the procedural demonstration of music understanding. Furthermore, performances in varying settings may allow for students to exhibit music learning for the teacher, administration, community, and themselves. Practitioners may then use the information gathered to inform future curricula and instruction. For instance, students may be assigned solos and ensembles as part of the class curriculum. During the learning stages, performances for the instructor and other students may offer frequent, low-impact opportunities for feedback and assessment. Building from these performance assessment experiences, students may then participate in community performances of their music. At this stage, students have further opportunities to demonstrate their understanding of the music, and if recordings are made they may again reflect, either individually or with others, on their performance. Finally, performing the solo or ensemble as part of a competition may give a summative assessment of students’ understanding of the piece and individual instrument. By having multiple assessments in various settings, teachers may mitigate some of the noted subjectivity of performance assessments (Bergee & McWhirter, 2005; Latimer et al., 2010; Reimer, 2009; Ryan & Costa-Giomi, 2004).

Similarly, rubrics may help practitioners clearly communicate expectations and increase consistency in certain assessment areas. Practitioners may use rubrics as a way to communicate the desired learning outcomes of a particular unit or topic, and give structure to planned assessment. To illustrate this, practitioners may create a rubric for a piece of music prior to distribution to students that describes the intended learning objectives that will be assessed. This rubric may prime students for engaging with the music once rehearsals begin. Of particular application, the high rate of interjudge reliability found with the use of rubrics (Latimer et al., 2010; Norris & Borst, 2007) may facilitate assessments where more than one practitioner might be involved, such as placement auditions. Correspondingly, rating scales and checklists may also be used to communicate expectations.

Portfolios, while potentially time-consuming for practitioners, may provide a wider picture of student learning by incorporating diverse artifacts related to music. Student music theory work, performance, composition, and writings may all contribute to a holistic picture of understanding and growth. Practitioners who implement portfolios may wish to predetermine a specific approach to artifact inclusion in the portfolio, both to avoid student confusion and to support effective assessment of the learning objectives. Parkes et al. (2015) noted that blind portfolio reviewers occasionally struggled to understand teacher artifact selection. To reduce extraneous time spent evaluating portfolios, practitioners may choose to have a priori categories or descriptors for artifact inclusion and establish overarching goals and procedures for the portfolio. Structured planning may allow practitioners to synthesize artifacts with greater ease. Conversely, overly detailed criteria may limit creativity by once again focusing students and teachers toward the assessment instead of the learning objective.

Assessment and accountability are inherent in music education. As a profession, we strive for improved teaching and learning, leading toward instilling in our students a lifelong love of music. Consequently, the effectiveness of our assessments matters, as motivation (Colwell, 1998; McClung, 1996; Reimer, 2009), external value judgments (Colwell, 1998; Reimer, 2009; Zerull, 1990), and most important, student learning (Boud et al., 1999; Eisner, 1998; Lehman, 2008) all can be linked to assessment.

Footnotes

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

References

Amabile, T. M. (1983). The social psychology of creativity. New York, NY: Springer.

Asmus, E. P. (1999). Music assessment concepts. Music Educators Journal, 86(2), 19–24. doi:10.2307/3399585

Azzara, C. D. (1993). Audiation-based improvisation techniques and elementary instrumental students’ music achievement. Journal of Research in Music Education, 41, 328–342. doi:10.2307/3345508

Barbot & Lubart (2012). Creative thinking in music: Its nature and assessment through musical exploratory behaviors. Psychology of Aesthetics, Creativity, and the Arts, 6, 231–242. doi:10.1037/a0027307

Barkley (2006). Assessment of the National Standards for Music Education: A study of elementary general music teacher attitudes and practices (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses database. (UMI No. 1439697)

Bergee, M. J. (2003). Faculty interjudge reliability of music performance evaluation. Journal of Research in Music Education, 51, 137–150. doi:10.2307/3345847

Bergee, M. J. (2007). Performer, rater, occasion, and sequence as sources of variability in music performance assessment. Journal of Research in Music Education, 55, 344–358. doi:10.1177/0022429408317515

Bergee, M. J., & McWhirter, J. L. (2005). Selected influences on solo and small-ensemble festival ratings: Replication and extension. Journal of Research in Music Education, 53, 177–190. doi:10.1177/002242940505300207

Bergee, M. J., & Westfall, C. R. (2005). Stability of a model explaining selected extramusical influences on solo and small-ensemble festival ratings. Journal of Research in Music Education, 53, 358–374. doi:10.1177/002242940505300407

Boud, Cohen, & Sampson (1999). Peer learning assessment. Assessment & Evaluation in Higher Education, 24, 413–426.

Chiodo (2001). Assessing a cast of thousands. Music Educators Journal, 87(6), 17–23. doi:10.2307/3399687

Ciorba, C. R., & Smith, N. Y. (2009). Measurement of instrumental and vocal undergraduate performance juries using a multidimensional assessment rubric. Journal of Research in Music Education, 57, 5–15. doi:10.1177/0022429409333405

Colwell (1998). Preparing student teachers in assessment. Arts Education Policy Review, 99, 29–36. doi:10.1080/10632919809600780

Colwell (2002). Assessment’s potential in music education. In Colwell & Richardson (Eds.), The new handbook of research on music teaching and learning (pp. 1128–1158). Reston, VA: MENC.

Colwell (2008). Music assessment in an increasingly politicized, accountability-driven educational environment. In T. S. Brophy (Ed.), Assessment in music education: Integrating curriculum, theory, and practice (pp. 3–16). Chicago, IL: GIA.

Denis, J. M. (2017). Novice Texas band directors’ perceptions of the skills and knowledge for successful teaching (Doctoral dissertation). Retrieved from https://digital.library.unt.edu/ark:/67531/metadc1011801/

Doane, Davidson, & Hartman (1990). A validation of music teacher behaviors based on music achievement in elementary general music students. Research Perspectives in Music Education, 44(1), 24–41.

Duke, R. A. (2005). Intelligent music teaching: Essays on the core principles of effective instruction. Austin, TX: Learning and Behavior Resources.

Eisenberg, & Thompson, W. F. (2003). A matter of taste: Evaluating improvised music. Creativity Research Journal, 15, 287–296. doi:10.1080/10400419.2003.9651421

Eisner, E. W. (1998). The enlightened eye: Qualitative inquiry and the enhancement of educational practice. Upper Saddle River, NJ: Merrill.

Ferm Almqvist, Vinge, Väkevä, & Zandén (2017). Assessment as learning in music education: The risk of “criteria compliance” replacing “learning” in the Scandinavian countries. Research Studies in Music Education, 39, 3–18. doi:10.1177/1321103X16676649

Fisher (2008). Debating assessment in music education. Research & Issues in Music Education, 6(1), 4. Retrieved from http://ir.stthomas.edu/rime/vol6/iss1/4

Goolsby, T. W. (1999). Assessment in instrumental music. Music Educators Journal, 86(2), 31–50. doi:10.2307/3399587

Harrison, S. D., Lebler, Carey, Hitchcock, & O’Bryan (2013). Making music or gaining grades? Assessment practices in tertiary music ensembles. British Journal of Music Education, 30, 27–42. doi:10.1017/S0265051712000253

Hickey (2001). An application of Amabile’s consensual assessment technique for rating the creativity of children’s musical compositions. Journal of Research in Music Education, 49, 234–244. doi:10.2307/3345709

Hughes & Keith (2015). Linking assessment practices, unit-level outcomes and discipline-specific capabilities in contemporary music studies. In Lebler, Carey, & Harrison (Eds.), Assessment in music education: From policy to practice (pp. 171–193). New York, NY: Springer. doi:10.1007/978-3-319-10274-0_12

Kim, K. H., Cramond, & Bandalos, D. L. (2006). The latent structure and measurement invariance of scores on the Torrance Tests of Creative Thinking-Figural. Educational and Psychological Measurement, 66, 459–477. doi:10.1177/0013164405282456

Latimer, M. E., Jr., Bergee, M. J., & Cohen, M. L. (2010). Reliability and perceived pedagogical utility of a weighted music performance assessment rubric. Journal of Research in Music Education, 58, 168–183. doi:10.1177/0022429410369836

Lehman, P. R. (2008). Getting down to basics. In T. S. Brophy (Ed.), Assessment in music education: Integrating curriculum, theory, and practice (pp. 17–28). Chicago, IL: GIA.

C. C., & Luh, D. B. (2012). A comparison of assessment methods and raters in product creativity. Creativity Research Journal, 24, 331–337. doi:10.1080/10400419.2012.730327

Mastrorilli, T. M., Harnett, & Zhu (2014). Arts achieve, impacting student success in the arts: Preliminary findings after one year of implementation. Journal for Learning Through the Arts, 10, 1–24. Retrieved from http://escholarship.org/uc/item/6c81239d

McClung, A. C. (1996). A descriptive study of learning assessment and grading practices in the high school choral music performance classroom (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses database. (UMI No. 9700217)

McQuarrie, S. H., & Sherwin, R. G. (2013). Assessment in music education: Relationships between classroom practice and professional publication topics. Research & Issues in Music Education, 11(1), 6. Retrieved from http://ir.stthomas.edu/rime/vol11/iss1/6

National Association for Music Education. (2015). Assessment in music education. Retrieved from http://nafme.org/about/position-statements/assessment-in-music-education-position-statement/assessment-in-music-education/

Norris, C. E., & Borst, J. D. (2007). An examination of the reliabilities of two choral festival adjudication forms. Journal of Research in Music Education, 55, 237–251. doi:10.1177/002242940705500305

Parkes, K. A., Rohwer, & Davison (2015). Measuring student music growth with blind-reviewed portfolios: A pilot study. Bulletin of the Council for Research in Music Education, 203, 23–44. doi:10.5406/bulcouresmusedu.203.0023

Reimer, M. U. (2009). Assessing individual performance in the college band. Research & Issues in Music Education, 7(1), 7. Retrieved from http://ir.stthomas.edu/rime/vol7/iss1/3

Rohwer, D. A. (1997). The challenges of teaching and assessing creative activities. Update: Applications of Research in Music Education, 15(2), 8–12.

Russell, J. A., & Austin, J. R. (2010). Assessment practices of secondary music teachers. Journal of Research in Music Education, 58, 37–54. doi:10.1177/0022429409360062

Ryan & Costa-Giomi (2004). Attractiveness bias in the evaluation of young pianists’ performances. Journal of Research in Music Education, 52, 141–154. doi:10.2307/3345436

Salvador (2011). Individualizing elementary general music instruction: Case studies of assessment and differentiation (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses database. (UMI No. 3482549)

Saunders, T. C., & Holahan, J. M. (1997). Criteria-specific rating scales in the evaluation of high school instrumental performance. Journal of Research in Music Education, 45, 259–272. doi:10.2307/3345585

Sicherl Kafol, Kordeš, & Holcar Brunauer (2017). Assessment for learning in music education in the Slovenian context: From punishment or reward to support. Music Education Research, 19, 17–28. doi:10.1080/14613808.2015.1077800

Simanton, E. G. (2000). Assessment and grading practices among high school band teachers in the United States: A descriptive study (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses database. (UMI No. 304630933)

Shuler, S. C., Norgaard, & Blakeslee, M. J. (2014). The new national standards for music educators. Music Educators Journal, 101(1), 41–49.

Silveira, J. M., & Gavin (2016). The effect of audio recording and playback on self-assessment among middle school instrumental music students. Psychology of Music, 44, 880–892. doi:10.1177/0305735615596375

Stefanic & Randles (2015). Examining the reliability of scores from the consensual assessment technique in the measurement of individual and small group creativity. Music Education Research, 17, 278–295. doi:10.1080/14613808.2014.909398

Wesolowski (2014). Documenting student learning in music performance: A framework. Music Educators Journal, 101(1), 77–85. doi:10.1177/0027432114540475

West (2012). Teaching music in an era of high-stakes testing and budget reductions. Arts Education Policy Review, 113, 75–79. doi:10.1080/10632913.2012.656503

Wright, Humphrey, Larrick, G. H., Gifford, R. M., & Wardlaw (2005). Don’t count on testing. Music Educators Journal, 92(2), 6–7. doi:10.2307/3400175

Zerull, D. S. (1990). Evaluation in arts education: Building and using an effective assessment strategy. Design for Arts in Education, 92(1), 19–24.