Abstract
The purpose of this study was to examine effects of ensemble size and repertoire difficulty on listeners’ perceptions of concert band performances. Undergraduate music majors (N = 210) viewed an audiovisual stimulus consisting of various images of large and small concert bands paired with identical audio performances of either an easy or difficult composition. Participants rated each ensemble’s tone quality/intonation, musicianship/expression, and rhythm/articulation using a 10-point Likert-type scale. Results indicated no main effects for ensemble size or order. There was a significant main effect for repertoire difficulty, with difficult repertoire being evaluated more positively than easier repertoire. We also found a significant Ensemble Size × Repertoire Difficulty × Order interaction, indicating that results were moderated based on order. Within the four orders, the largest mean difference in scores occurred in Order 3 (small/difficult, large/easy, small/easy, large/difficult), with the smallest mean differences occurring in Order 2 (large/difficult, small/easy, large/easy, small/difficult). The small/easy and large/easy videos and the small/difficult and large/difficult videos resulted in positive changes in ratings only when seen first and last, respectively. We recommend blind evaluation and the use of required “test pieces” in concert band festivals as ways to possibly mitigate effects of repertoire difficulty and ensemble size.
Many ensemble directors and students involved with secondary school music programs consider their performances at adjudicated contests and festivals (hereafter, festivals) to be the defining events of their entire school year (Burnsed & Sochinski, 1983; Rohrer, 2002). Because festival ratings also matter to administrators and parents (Batey, 2002; Boyle, 1992), directors often experience anxiety when preparing for these events (Barnes & McCashin, 2005). Given that directors’ professional competence and leadership are often gauged by the ratings garnered by their ensembles (Burnsed, Hinkle, & King, 1985; Conrad, 2003), factors that influence ensemble festival ratings seem worthy of continued study.
Perhaps due to the importance surrounding festival ratings, investigators have explored several factors related to the festival experience and their effects on large ensemble performance ratings. Assessing music performance is a multifaceted process. Several researchers have documented variables that can impact how individuals assess music performances. For example, McPherson and Thompson (1998) delineated several interacting factors that can have an effect on adjudicators’ performance evaluations. These factors included context (i.e., purpose of assessment, type of performance, instrumentation, and performance space), performer/evaluator characteristics, evaluation instruments and/or criteria, and musical and extramusical factors.
Research findings have indicated that the ratings given at festivals can be influenced by the size of the adjudication panel (Bergee, 2003; Brakel, 2006; Fiske, 1983), time of the performance (Bergee & McWhirter, 2005; Bergee & Platt, 2003), and whether the judges had completed previous adjudicator training (Boeckman, 2002; Hunter & Russ, 1996; Winter, 1993). Additional considerations affecting the consistency of the final rating include adjudicator experience (Fiske, 1975; Garman, Boyle, & DeCarbo, 1991) and the type of assessment tool that is used (e.g., Abeles, 1973; Bergee, 1988, 1989a, 1989b; Norris & Borst, 2007; Saunders & Holahan, 1997; Zdzinski & Barnes, 2002). Furthermore, physical characteristics and conductor behavior can influence observers’ perceptions of both the conductor’s and ensemble’s performance and expressivity (Madsen, 2009; Morrison, 1998; Morrison et al., 2009; Price & Chang, 2005; VanWeelden, 2002; VanWeelden & McGee, 2007; Wapnick et al., 1997; Wapnick, Mazza, & Darrow, 1998, 2000; Wöllner & Auhagen, 2008). Although some of these variables may be outside of director control, festival organizers should be aware of these findings when preparing for and organizing these events.
Repertoire difficulty is yet another variable that has been shown to influence large ensemble performance ratings. Hash (2012) concluded that bands performing more difficult repertoire at the South Carolina Band Directors Association state music festivals were rated significantly higher than those performing easier music. Similar results of ratings for Texas choirs were reported by Baker (2004) and Killian (1998, 1999, 2000). Irrespective of the performing medium, it appears possible that adjudicators may be swayed not just by the quality of the performance but also the difficulty of the music selections.
Although auditory information is often thought to be the most important factor in assessing a music performance (Murnighan & Conlon, 1991), visual information can influence evaluators’ judgments. Tsay (2013, 2014) found that professional and novice musicians relied primarily on visual cues when evaluating both individual and ensemble performances. Tsay suggested that regardless of training and knowledge of music, experts may be just as vulnerable as novices in letting visual information affect evaluations of auditory stimuli. Interestingly, Tsay (2013) found that both musical novices and experts successfully identified ensemble competition winners by viewing silent videos but were no better than chance at identifying winners through audio alone or audio with video. And Tsay (2014) concluded, “whether the performance involves one musician, a chamber ensemble, or a symphony orchestra, people appear to overweight visual information in their evaluation of music performances. This vision heuristic rapidly guides people, even domain experts, toward judgment” (p. 30). Knowing that the vision heuristic can influence domain experts might imply that festival adjudicators could unknowingly be biased based on the visual information they are processing rather than making judgements based solely on the aural experience.
Additional elements that have been found to influence the vision heuristic include memorization of music, race, and performer appearance. Research by Williamon (1999) and Kopiez, Wolf, and Platz (2017) indicated that adjudicators provided higher ratings to identical audio performances that appeared as if they were memorized (i.e., no music stand in view) versus those that were not. Researchers have also found that race can bias evaluators’ perceptions of performances. In a study investigating solo flute and trumpet performances, Elliott (1995/1996) observed that white performers received higher evaluations than black performers overall. Additionally, VanWeelden and McGee (2007) found that white choral conductors received higher evaluations when conducting Western art music than when black choral conductors were shown conducting the same excerpt. Conversely, black conductors received higher ratings than white conductors when conducting a spiritual. In both instances, however, the same audio track was used for the black and white conductors. There are also many research findings that document attractiveness bias in performance evaluations (Wapnick et al., 1997; Wapnick et al., 1998, 2000) and the influence of body movement on listeners’ perceptions (Davidson, 1993, 1994; Juchniewicz, 2008; Silveira, 2014). Although the goal of performance evaluations is to provide an accurate and reliable assessment of a musical product, it appears that a number of extramusical factors can influence perceptions of performance quality.
The size of a performing ensemble has also been shown to influence raters’ evaluations of performances. Researchers have found that larger marching bands were rated significantly higher than their smaller counterparts (King & Burnsed, 2009; Rickels, 2009) and that larger choirs received significantly more superior ratings than smaller choirs (Killian, 1998, 1999, 2000). However, we found no empirical evidence of this phenomenon occurring with concert bands. In the aforementioned studies, the aural component (i.e., the performance) was not held constant through experimental manipulation, as has been done in prior research investigating the vision heuristic (e.g., Tsay, 2013, 2014; VanWeelden & McGee, 2007). Given Tsay’s (2013, 2014) findings, it seems plausible that visual information might have an effect on adjudicators’ evaluations of concert bands of various sizes. Exploring the potential relationship between ensemble size and ratings of concert band performances seems important, especially if viewing a larger ensemble might sway adjudicators’ ratings at festivals.
The purpose of this study was to determine the effect of ensemble size on music major listeners’ perceptions of concert band performances. A second purpose of our study involved the addition of another independent variable—repertoire difficulty. As suggested by Hash (2012), perhaps the “difficulty of repertoire and ensemble size also might influence adjudication” (p. 83). As noted, research findings have shown that ensembles that performed more difficult music earned significantly higher ratings than those who performed less demanding repertoire (Baker, 2004; Brakel, 2006; Hash, 2012). We were interested in determining whether ensemble size and repertoire difficulty may interact when considering listeners’ evaluations of large group ensemble performance. Therefore, this study was guided by the following research question: What are the effects of ensemble size (small or large) and repertoire difficulty (easy or difficult) on listeners’ performance ratings?
Method
Definitions
For the purposes of this study, “small” and “large” concert bands were defined as having approximately 30 and 90 members, respectively. These figures were agreed on in consultation with six experts who each had over 10 years of concert band festival adjudication experience. In addition, we defined “easy” (grade 2 of 6) and “difficult” (grade 5 of 6) repertoire based on difficulty classifications found in the Teaching Music Through Performance in Band series (Miles, 2009).
Participants
Participants (N = 210) included a convenience sample of male (n = 112), female (n = 96), and other (n = 2) music majors (mean age = 21.14 years, SD = 9.09) from five different universities located throughout the Midwest, South, and Northeast United States. Participants were recruited by contacting the band directors via email at each of the five institutions. These directors agreed to forward the questionnaire link on to the membership of their respective ensembles. Additional demographic information can be found in Supplemental Table S1 (available in the online version of the article). Given our decision to use a repeated measures ANOVA test, an a priori alpha level of .05, a chosen power level of .80, and a medium effect size of .25 (Cohen, 1988), we conducted a power analysis using the statistical software G*Power (Faul et al., 2007). This program computed a minimum necessary sample size of 192 participants.
Visual and Audio Stimuli
We created an audiovisual stimulus for this study (as opposed to using intact “live” concert band performances) to better isolate the independent variables (ensemble size and repertoire difficulty) and to control for variability that might be introduced by using different audio/video recordings. For participants to evaluate concert bands of two different sizes performing literature of varying difficulty, we paired images of a small or large concert band with either an easy or difficult audio excerpt.
Ninety college undergraduates volunteered as mock high school band personnel to create the visual portion of the stimulus. These instrumentalists were music majors enrolled in the music school at a small liberal arts college in the Northeast of the United States. The instrumentalists were informed of the purpose of the study before they served as ensemble members in one of two mock high school bands of varying sizes (small, 30 and large, 90). To emulate the perspective of concert band adjudicator panels at large group festivals, we photographed each ensemble 70 feet from the stage. During the photo shoot, all 90 volunteers wore concert dress (males in tuxedos and females in all black) and were photographed while seated on the concert hall stage in arced rows and in rest position (e.g., instrument not in playing position; eyes looking toward the camera and smiling). This was used as the large concert band photo. After this photo was taken, 60 of the volunteers exited the stage, and their chairs and stands were removed. The remaining 30 musicians swapped instruments and shuffled their positions on stage to create the illusion of different personnel within each of the small and large bands. The 30 volunteers were photographed in this new setup under the same visual conditions mentioned previously; this configuration was used to capture the small concert band photo. We also photographed individuals and sections within each band. The section photos served to create a panning effect in the stimulus following the full band photo. The use of individual and section photos was suggested by pilot study participants, who found that showing only the full band photo was distracting. While the use of still images to represent a process that is typically completed in a live, dynamic setting may seem to threaten ecological validity, there is precedent in the research literature for such a methodological paradigm. Boltz, Ebendorf, and Field (2009) used a slideshow montage as one condition in their study examining the effects of visual information in film on the perception and memory of music.
We independently compiled literature lists of grade 2 (easy) and grade 5 (difficult) pieces using the following criteria:
The pieces are representative of high-quality repertoire as evidenced by their inclusion in various state-approved festival/music performance assessment lists and the Teaching Music Through Performance in Band series.
There are enough within-piece style changes to warrant a sufficient evaluation of tone quality/intonation, musicianship/expression, and rhythm/articulation (i.e., the performance categories).
The same composer wrote both pieces (to help eliminate composer as a confounding variable).
The pieces contained one or more excerpts representing a slow, lyrical, and tutti section, with a predictable forte climax and subsequent piano resolution.
After several consultations with each other, we chose Sheltering Sky (grade 2, easy) and The Frozen Cathedral (grade 5, difficult) by John Mackey for the present study. We then located two excerpts (one per piece) that were approximately 90 seconds in duration each. We chose mm. 31–50 in Sheltering Sky and mm. 39–64 in The Frozen Cathedral. The selection of these excerpts was consistent with previous studies featuring similar durations (Silvey & Baumgartner, 2016; Silvey & Koerner, 2016). As an additional measure of validity, two university band directors independently confirmed that both excerpts represented the inclusion criteria listed previously.
Next, we conducted a YouTube search to locate high school band performance recordings of the two Mackey pieces. We used recordings of high school bands (as opposed to professional recordings) to more closely resemble the concert band adjudication task for participants. We decided that both recordings should be representative of live, excellent high school concert band performances and should not have any memorable artifacts (e.g., audience members coughing, excessive ambient noise). The second author played these recordings for two band directors with both verifying that these recordings met our chosen criteria. Through discussion, we chose one recording each to represent the easy and difficult performances. The two recordings were edited such that they encompassed the measures mentioned previously. No other edits were made to the recordings.
Once the visual and audio material were compiled, we used iMovie 10.1 to combine the photos with both music excerpts to create four different excerpts. In an attempt to prevent participants from discovering that the same audio recordings (easy/difficult) and photos (small/large) were being used, we included a distractor recording in the third presentation position in the stimulus (i.e., Excerpt 1, Excerpt 2, Distractor, Excerpt 3, Excerpt 4). The distractor was a YouTube recording of a high school band performing a 90-second excerpt from Steven Bryant’s all stars are love. (This excerpt featured the same inclusion criteria as the Mackey pieces.) The music was paired with photos of a high school concert band in rest position and section photos that we found by searching Google Images, which featured a different concert hall and personnel than the other stimulus videos. We also wanted to arrange the stimulus such that the same size/difficulty was not viewed/heard successively. In an attempt to control for possible order effects, we used a 4 × 4 Latin square design.
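The ordering constraints described in this paragraph can be illustrated with a short Python sketch. It rests on two assumptions that the text does not state outright: that "not viewed/heard successively" means adjacent excerpts differ in both size and difficulty, and that the distractor in the third position separates the second and third excerpts so that pair is not treated as adjacent. The condition codes (SE, SD, LE, LD) and all function names are illustrative only; the two hard-coded orders are the ones spelled out in the Abstract.

```python
from itertools import combinations, permutations

# Condition codes: first letter = ensemble size (S/L), second = difficulty (E/D).
CONDITIONS = ("SE", "SD", "LE", "LD")


def differs_on_both(a, b):
    """True when two conditions differ in size AND in difficulty."""
    return a[0] != b[0] and a[1] != b[1]


def valid_order(order):
    """Assumed constraint: excerpts 1-2 and 3-4 are adjacent; the distractor
    separates excerpts 2 and 3, so that pair is not checked."""
    return differs_on_both(order[0], order[1]) and differs_on_both(order[2], order[3])


def is_latin_square(orders):
    """Each condition appears exactly once in every presentation position."""
    return all(
        len({order[pos] for order in orders}) == len(CONDITIONS)
        for pos in range(len(CONDITIONS))
    )


# All presentation orders that satisfy the assumed adjacency constraint.
valid_orders = [p for p in permutations(CONDITIONS) if valid_order(p)]

# The two orders spelled out in the Abstract.
order_2 = ("LD", "SE", "LE", "SD")
order_3 = ("SD", "LE", "SE", "LD")

# Candidate 4 x 4 squares built only from valid orders that include both.
squares = [
    s for s in combinations(valid_orders, 4)
    if is_latin_square(s) and order_2 in s and order_3 in s
]

print(len(valid_orders), "valid orders;", len(squares), "candidate square(s)")
for square in squares:
    for order in square:
        print(" -> ".join(order))
```

The sketch only checks which orderings are compatible with the stated design; it does not claim to reproduce how the four orders were actually assigned.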
Dependent Measures
Participants were provided with a brief vignette describing each performing ensemble on the stimulus recording (including number of musicians). Following precedent in previous research studies (Montemayor & Moss, 2009; Silvey & Montemayor, 2014), we used Bergee’s (1995) three primary factors for concert band adjudication. Immediately following each vignette, an evaluation form appeared with which participants assessed the primary factors of ensemble tone quality/intonation, musicianship/expression, and rhythm/articulation on a 10-point Likert-type scale (ranging from 1 = poor to 10 = excellent).
Procedures
Participants were given an online link to access the questionnaire from their personal computers. The first page of the questionnaire consisted of a description of the study and allowed participants to give their informed consent. We did not, however, mention that the purpose of the study was to test the effects of ensemble size or repertoire difficulty. After consent was obtained, participants viewed the following instructions:
Thank you for your participation in this study. For this study, you will hear 90-second excerpts from five different high school bands at the New York State High School Band Performance Assessment. Each high school band is required to perform one of three “test pieces.” You will be asked to evaluate each band’s performance of the test pieces on the following criteria:
Tone quality/intonation
Musicianship/expression
Rhythm/articulation
After each recording, you will complete your assessment. Please be sure the volume on your computer is turned up, and then click “Play” to adjust the volume on your computer. Once the volume is at a comfortable level, please click “Next” to begin the evaluation process.
The test audio consisted of a 50-second excerpt of David Gillingham’s Be Thou My Vision, which allowed participants to adjust the volume on their computers before beginning the first evaluation task.
For each 90-second excerpt, the fictitious name of the high school and photo of the group was displayed on screen for 15 seconds with music playback beginning immediately. Three additional still photos of sections within the concert band were used in the videos. The following timing was used for each excerpt: full ensemble in rest position photo for 15 seconds, 4-second cross dissolve, section photo 1 for 15 seconds, 4-second cross dissolve, section photo 2 for 15 seconds, 4-second cross dissolve, section photo 3 for 15 seconds, and a 4-second fade to black. Once participants completed their evaluation, they clicked the “Next” button to proceed on to the next ensemble in the sequence. The online evaluation task took approximately 15 minutes.
Results
Raw data consisted of participants’ evaluations of tone quality/intonation, musicianship/expression, and rhythm/articulation in four concert band performances (small/easy, small/difficult, large/easy, large/difficult) as measured by 10-point Likert-type scales. Table 1 contains means and standard deviations of participants’ ratings by ensemble size and repertoire difficulty.
Table 1. Means and Standard Deviations of Raw Assessment Scores by Ensemble Size and Repertoire Difficulty.
Note. Each subscale ranged from 1 (poor) to 10 (excellent), resulting in a composite scale range of 3 to 30.
Because we considered our three dependent variables to be measuring the same construct (i.e., ensemble performance quality), we summed evaluators’ ratings of tone quality/intonation, musicianship/expression, and rhythm/articulation to create an overall composite rating for each of the four concert band performances. (We did not analyze participants’ ratings of the distractor excerpt because this was not germane to our results.) We summed the ratings for three reasons: (a) We followed analysis procedures from studies in which similar procedures were also undertaken (Montemayor & Silvey, 2019; Silvey, Montemayor, & Baumgartner, 2017); (b) the practice of summing scores obtained on a variety of musical elements (e.g., tone quality/intonation, musicianship/expression, rhythm/articulation) to produce one composite score is common practice when evaluating ensembles at festivals (e.g., Bands of America, 2015); and (c) after conducting four separate principal components analyses with varimax rotation on all concert band evaluations, we found that all evaluation criteria loaded positively onto a single factor, with loadings ranging from .724 to .905, explaining 61.3%, 76.1%, 78.2%, and 66.8% of the variance for the small/easy, large/difficult, small/difficult, and large/easy concert band performances, respectively. Therefore, we proceeded with our subsequent analysis by utilizing the four composite scores.
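As a minimal illustration of the composite described here, the following Python sketch sums three 1-to-10 subscale ratings into a single score on the 3-to-30 range. The dictionary keys, the function name, and the example ratings are hypothetical and are not study data.

```python
# Illustrative subscale names; the study's actual variable names are not given.
SUBSCALES = ("tone_intonation", "musicianship_expression", "rhythm_articulation")


def composite_score(ratings):
    """Sum three 1-10 subscale ratings into a 3-30 composite."""
    for name in SUBSCALES:
        value = ratings[name]
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating {value} is outside the 1-10 scale")
    return sum(ratings[name] for name in SUBSCALES)


# Hypothetical ratings from one participant for one excerpt (not study data).
example = {
    "tone_intonation": 6,
    "musicianship_expression": 5,
    "rhythm_articulation": 7,
}
print(composite_score(example))
```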
Prior to conducting the analysis, we screened the data to make sure the assumptions of normality, linearity, and homoscedasticity were met. Significant Kolmogorov-Smirnov and Levene’s test statistics (p < .05) indicated that the dependent variables did not meet the assumptions of normality or homoscedasticity, respectively (see Supplemental Table S2 in the online version of the article). To address normality, we performed a Box-Cox power transformation (optimal transformation resulted in λ = 1.20) on the composite scores to normalize the data (Box & Cox, 1964; Osborne, 2010; Sakia, 1992). Normality was subsequently confirmed via Shapiro-Wilk (p > .05) and quantile-quantile plots. Transformed means and standard deviations are displayed in Table 2. To address the violation of homoscedasticity, we used the Brown-Forsythe F ratio. This F ratio is a robust alternative when homogeneity of variance has been violated (Brown & Forsythe, 1974; Field, 2009). Linearity of the dependent variables was confirmed by inspecting bivariate scatterplots and bivariate correlations (all ps < .01).
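For readers unfamiliar with the transformation, the sketch below applies the standard one-parameter Box-Cox form, (y^λ − 1)/λ for λ ≠ 0 and ln y for λ = 0, using the reported λ = 1.20. The text does not state which software or variant was used, so the exact form is an assumption, and the input scores are illustrative values on the 3-to-30 composite scale rather than study data.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform for a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


LAMBDA = 1.20  # value reported in the text

# Endpoints of the 3-30 composite scale plus two illustrative mid-range scores.
for score in (3, 16, 18, 30):
    print(score, round(box_cox(score, LAMBDA), 2))
```

Because λ is greater than 1, this form stretches the upper end of the scale, so transformed values are not directly comparable to the raw composite scores.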
Table 2. Means and Standard Deviations of Box-Cox Transformed Scores for Ensemble Size and Repertoire Difficulty by Order.
Note. SE = small/easy; LD = large/difficult; SD = small/difficult; LE = large/easy.
A three-way repeated measures ANOVA with two within-subjects factors (ensemble size, repertoire difficulty) and one between-subjects factor (order) was used to determine the effect of ensemble size and repertoire difficulty on participants’ ratings of ensemble performance. Results indicated no significant main effect for the within-subjects factor of ensemble size, F(1, 203) = .14, p > .05, ηp² = .001, or the between-subjects factor, order, F(3, 203) = .90, p > .05, ηp² = .01. However, there was a significant main effect for repertoire difficulty, F(1, 203) = 46.14, p < .001, and a large effect size (ηp² = .19), with the difficult repertoire (M = 17.88, SD = 4.83) receiving higher evaluation scores than the easier repertoire (M = 15.94, SD = 4.02). No significant two-way interactions were found.
However, we did find a significant Ensemble Size × Repertoire Difficulty × Order interaction, F(3, 203) = 3.06, p < .05, ηp² = .04. A visual inspection of a graph of the Box-Cox transformed ratings (see Figure 1) illustrates that participants rated excerpts featuring difficult repertoire, irrespective of ensemble size, higher than easier repertoire (i.e., all means above 24 out of a possible 30 points). However, there was a greater disparity among composite scores in the large/difficult category compared with the other three rating categories; this category evidenced the widest range of composite scores. The largest mean difference occurred in Order 3 between the large/difficult (M = 29.08, SD = 8.52) and large/easy (M = 23.06, SD = 6.97) combinations. The smallest mean difference occurred in Order 2. Within that order, the largest difference was between the small/difficult (M = 24.82, SD = 8.46) and large/easy (M = 23.96, SD = 5.79) combinations. Within each of the other three categories, participants’ ratings for small/easy and large/easy and small/difficult and large/difficult were more closely aligned. Furthermore, the small/easy and large/easy videos resulted in more positive ratings only when seen first and last, respectively (i.e., within Order 1). There was a similar effect when small/difficult and large/difficult were seen first and last, respectively (i.e., Order 3).
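The effect sizes reported in this section can be cross-checked against the F ratios using the standard identity for partial eta squared, (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that identity to the four F tests reported in the Results; the function and dictionary names are illustrative, and the calculation is a consistency check rather than a reanalysis of the data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) as reported in the Results.
reported = {
    "ensemble size": (0.14, 1, 203),
    "order": (0.90, 3, 203),
    "repertoire difficulty": (46.14, 1, 203),
    "size x difficulty x order": (3.06, 3, 203),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    print(effect, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```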

Figure 1. Ensemble size by repertoire difficulty by order interaction. Means reflect the Box-Cox transformation.
Discussion
In this study, we were interested in determining whether ensemble size and repertoire difficulty may interact when considering listeners’ evaluations of large group ensemble performance. Researchers have investigated repertoire difficulty as a factor influencing listeners’ perceptions (Baker, 2004; Brakel, 2006; Hash, 2012), and ensemble size (Killian, 1998, 1999, 2000; King & Burnsed, 2009; Rickels, 2009) has been examined in the contexts of marching band and choirs. However, the effects of repertoire difficulty and ensemble size in combination have not previously been investigated in the concert band setting.
Participants’ summed composite rating for excerpts that featured difficult repertoire was significantly higher than that of easier repertoire. This finding is consonant with previous research indicating that ensembles that performed more difficult repertoire were assigned significantly higher ratings at instrumental (Brakel, 2006; Hash, 2012) and choral (Baker, 2004; Killian, 1998, 1999, 2000) festivals than those ensembles that performed easier repertoire. Ratings assigned by participants to small/easy (M = 15.99) and large/easy (M = 15.95) excerpts were nearly identical and were lower in comparison to small/difficult (M = 17.77) and large/difficult (M = 18.03) excerpts, which were rated significantly higher (Table 1). Although these numeric differences may seem somewhat small, it is conceivable that they could represent the difference between summative ratings received at a festival (e.g., excellent rather than superior). “Real world” differences such as these would appear meaningful because festival ratings are seen as a measure of professional competence (Burnsed et al., 1985; Conrad, 2003).
With regard to the significant three-way interaction, it appears that order of ensembles in combination with ensemble size and difficulty influenced our listeners’ perceptions. Specifically, when the small/easy ensemble was viewed first and the large/easy ensemble was viewed last, there were differences in composite ratings between these two conditions in all four orders. Similarly, the only differences between the ratings of the small/difficult and large/difficult ensembles occurred when the former was viewed first and the latter was viewed last. It is interesting to speculate on what might be the cause of these findings. Perhaps a focusing/anchoring effect (e.g., Tversky & Kahneman, 1974) influenced listeners’ judgements because the small/easy condition was viewed first in Order 1 and also represented the lowest rating overall. Participants could have been overly focused on the size and difficulty of the first presentation (small/easy), which may have influenced their subsequent evaluations (anchoring effect). It might also be that the small/easy condition was rated lowest due to an inherent bias that smaller bands are perceived as less proficient than larger groups and that there is greater willingness for evaluators to “be more forgiving” when bands perform more difficult music because of the greater technical and musical demands. While researchers have investigated the effects of nonperformance variables on performance ratings in solo and small ensembles (e.g., Bergee, 2007; Bergee & Platt, 2003), further research in actual large ensemble festival settings would be helpful in establishing any definitive connections with the present study.
We conjecture that difficult repertoire was rated higher than easier repertoire, irrespective of ensemble size, because participants may have felt there was a higher demand inherent in performing music that sounded more difficult. It is important to note that we did not give our participants any information regarding the difficulty of the music in our written vignettes that appeared on screen before each performance. Unlike most large group adjudicated festivals, they had no idea about the difficulty of the work from information found printed on the score or provided from a music publisher. As verified by our expert conductors, the performances of the easy and difficult repertoire were both considered high quality. Interestingly, that quality appears to have been adjudicated differently by our participants based on how difficult the two excerpts sounded but not with respect to the size of the ensemble.
The large effect size observed with regard to repertoire difficulty suggests practical applications for concert band conductors who participate in adjudicated festivals. Although a cursory interpretation might imply that more difficult repertoire could garner higher evaluation scores, it is important to note that all of the performances were considered to be high quality by our panel of experts. A more nuanced interpretation could be that performing more difficult repertoire could be seen as an indicator of higher performance quality than if an ensemble performed an easier piece of music. Performing a difficult piece of music poorly would likely not garner similar performance evaluation scores as performing an easy piece of music well. Simply performing more difficult repertoire without regard to its appropriateness for a smaller ensemble seems unwise and pedagogically negligent.
However, if a smaller ensemble is capable of performing more difficult music, an inherent problem is that more advanced repertoire often includes expanded instrumentation that requires additional performers. This expanded instrumentation could potentially make the repertoire unplayable by smaller ensembles. However, shrewd directors often rewrite missing parts for other sections within their ensemble. Those smaller ensembles with the technical and musical acumen might benefit from the more sophisticated elements (e.g., technique, harmony, texture) often included in more difficult repertoire—rewritten for the smaller ensemble. Additionally, in some smaller schools, directors often have to supplement their jazz ensembles with alternative instrumentation (e.g., flutes, clarinets, horns, tubas, keyboard bass) to cover missing instruments in the standard jazz instrumentation. Based on our review of existing literature, these questions have not yet been addressed. Given that many smaller schools must make instrumentation adjustments and that these adjustments may influence listeners’ judgements either in isolation or in combination with other factors, it seems prudent to continue to investigate what effects these adjustments might have on how listeners perceive their performances. Above all, literature quality and its developmental appropriateness should be paramount when making programming decisions. It is not our contention that music teachers deliberately program more difficult works simply to garner higher festival evaluations. Rather, we were interested in determining if any unwanted biases may be present in listeners when evaluating ensembles of varied sizes performing literature of varied difficulty.
Because ensemble size interacted with repertoire difficulty and order, perhaps blind adjudication could help mitigate these interaction effects. Blind adjudication might also control for other extramusical influences on listeners’ perceptions such as conductor expressivity and ensemble appearance. With repertoire difficulty having been found to influence listeners’ perceptions, festival organizers might consider requiring a standard test piece for ensembles at each classification level (levels are often established by state music organizations based on size of the school population). Test pieces are often used in large ensemble festivals as a common assessment across all participating ensembles (e.g., MusicFest Canada, 2017; Newsome, 2006).
Although the present study adds to the large existing body of research regarding festival evaluations, it is not without its limitations. We attempted to recreate the festival experience as closely as possible, but there were a number of threats to ecological validity. Our implementation required participants to listen to the stimulus using computer speakers, which can vary widely in fidelity. Researchers extending or replicating this line of research might consider administering the stimulus in a more controlled lab setting where sound quality and fidelity can be standardized across participants. Although the use of still photographs (as opposed to video) controlled for some extraneous variables known to influence listeners’ perceptions (e.g., body movement, conductor expressivity), the still photographs could have unintentionally influenced listeners’ judgments about performance quality given that this presentation is abnormal when listening to ensembles in a festival setting. The lack of live video in and of itself could have biased listeners’ evaluations of the stimulus. While still photos were used as a means to control extraneous variables, such as those introduced by using different videos of different ensembles, there was a sacrifice in ecological validity in the design of the study. We recommend that future researchers consider using video recordings of ensembles miming a performance while a consistent audio track is applied to the video (similar to Juchniewicz, 2008; Silveira, 2014; VanWeelden & McGee, 2007). Furthermore, while we purposely chose to use Bergee’s (1995) three primary factors given their reliability, validity, and precedence in the research literature, most festival judges are given a performance rubric on which to base their evaluations.
Considering the findings and limitations of our study, we suggest a number of future directions for researchers choosing to explore the effects of extramusical factors on the listening experience. To address concerns of ecological validity, having the same concert band record both audio excerpts of “easy” and “difficult” repertoire and pairing this with a coordinated and choreographed video could more authentically represent the festival experience. We also recommend examining a number of state music festival evaluations to determine if there is an existing relationship between evaluation scores and ensemble (or school) size of participating concert bands. While this relationship has been explored in the context of solo and small-ensemble festival ratings (e.g., Bergee, 2006; Bergee & Platt, 2003; Bergee & McWhirter, 2005; Bergee & Westfall, 2005), less research has been conducted involving large instrumental ensembles. In addition, expanding the performance context to include string orchestras, full orchestras, and jazz bands may help to add credence to existing research findings. Might having a visually unbalanced string orchestra (e.g., having only one bass player) or full orchestra (e.g., having incomplete winds/brass sections) have a similar effect on listeners’ judgments?
Selecting music for large-ensemble festival performance remains an important pedagogical concern for secondary school directors. Although many directors, especially younger ones, are often told that “it is better to show mastery of easier selections than to barely get through and have everyone (including the adjudicators) suffer through more difficult pieces” (Dean, 2017), previous research findings (and those of the current study) have shown that ensembles that perform more difficult repertoire well receive higher ratings than those that perform easier repertoire well (Baker, 2004; Brakel, 2006; Hash, 2012). It should be noted, however, that we utilized “excellent high school concert band performances” (see inclusion criteria in Method). Future researchers might also consider the effects of ensemble size and repertoire on poorer quality performances. Given the continued proliferation and prominence of large ensemble festivals at the district, state, and national levels, continued exploration of topics surrounding large ensemble evaluation will provide helpful information for secondary school ensemble directors, contest organizers, adjudicators, and administrators.
Supplemental Material
Supplemental material, Online_Supplement_FINAL for Effects of Ensemble Size and Repertoire Difficulty on Ratings of Concert Band Performances by Jason M. Silveira and Brian A. Silvey in Journal of Research in Music Education
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
