Abstract
Despite the efforts made throughout the 20th century to develop standardized batteries to measure musical ability, there seems to be no consensus on a gold standard for the construct. The Profile of Music Perception Skills (PROMS) was created with the aim of overcoming some of the limitations of previous assessment tests. The Mini-PROMS is a shortened version of the full battery that takes less time to administer. The aim of the present study was to validate the Mini-PROMS with a sample of Spanish musicians and non-musicians in order to provide normative scales for research on musical skills in the Spanish population. The results show that the Spanish Mini-PROMS is a reliable and valid tool for measuring the musical ability of musicians and non-musicians. Furthermore, this study provides information on some of the differences between musicians and non-musicians and offers scales for comparing an individual’s results with those of their reference population.
The term musical ability is used to describe sensitivity to music, the ability to understand music, and/or the ability to produce music. Researchers generally adopt one of two approaches to describing and measuring musical ability. One approach is to categorize individuals into two groups, musicians and nonmusicians, and compare the performance of the two groups on an outcome of interest. When musicians outperform nonmusicians, the result is interpreted as evidence for a link between musical ability and the outcome being studied, such as working memory, spatial ability, or language processing skills. A limitation of this approach is that it cannot capture potentially significant gradations in musical ability among musicians and among nonmusicians. Nonmusicians may be musical, but their talent may have remained undiscovered, or circumstances may have prevented its development (Law & Zentner, 2012). The other approach is to measure individuals’ musical abilities using tests of aural ability (ear tests). An advantage of this approach is that such tests can identify gradations in the musical skills of individuals who have not received musical training. The musical experience of the listener is at the heart of all musical activity (Sloboda, 2000). From this perspective, musical ability can be defined as the ability to perceive musical stimuli such as small changes in pitch, loudness, rhythm, and other sub-domains of music processing (Faßhauer et al., 2015).
One of the first researchers to try to measure musical ability using ear tests was Carl Stumpf, who asked students to carry out a range of tasks requiring them to discriminate small changes in pitch, rhythm, and loudness between stimuli that differed in one sub-domain of music, such as rhythm, melody, or accent (Stumpf, 1890). Stumpf also compared the test results with students’ own reports of their musical ability to find out whether these correlated with his measures. Stumpf’s approach was purely exploratory and did not lead to the creation of a test battery designed to measure musical ability objectively. This was first achieved nearly 30 years later, when Seashore (1919) published The Psychology of Musical Talent, followed by the tests produced by Schoen (1923), Lowery (1926), Dykema (1927), and Seashore (1930). All the tasks in these tests were also based on discrimination between two stimuli that could differ in one musical sub-domain such as rhythm, tone, or pitch. Finally, the subtests of the Drake Test of Musical Talent (Drake, 1933) were found to correlate with those of Seashore (1930) and Kwalwasser and Dykema (1930), and this test was therefore used by early researchers interested in the neurobiological substrates of musical ability (Grede et al., 1978; Martin, 1964). Years later, the first standardized test of musical ability appeared: the Seashore Measures of Musical Talents (Seashore et al., 1956). This test considered not only aural recognition but also other parameters of musicality, including auditory, motor, associative, affective, and emotional responses. The battery was widely used and was revised at the end of the 20th century (Seashore et al., 1992). In fact, for many years it was the only tool available in Spanish to assess the construct of musical ability.
Until this time, musical ability had been considered innate, present only in individuals with special musical skills (Ericsson et al., 1993; Howe et al., 1998). However, Gordon (1965) suggested that while musical skills could be considered innate abilities, they could also be taught or modified through practice over the course of development. He developed various tests to measure them at each stage of development: the Music Aptitude Profile (MAP; Gordon, 1967), Primary Measures of Music Audiation (PMMA; Gordon, 1986b), Intermediate Measures of Music Audiation (IMMA; Gordon, 1986a), and Advanced Measures of Music Audiation (AMMA; Gordon, 1984). Subsequent authors who developed other tests of musical abilities used these tests to establish convergent validity through correlational analyses (McLeish, 1968; Rizzo & Vispoel, 1992; Wing, 1962; Young, 1972, 1973).
More recently, Peretz and Coltheart (2003) designed a model of music processing based on studies of brain-damaged patients. This model consisted of temporal and melodic processors working together to send information to a musical lexicon that formed the musical repertoire. An alteration in any of the connections between the processors could explain the difficulties in musical integration suffered by patients with brain damage. To evaluate the deficits that occur in amusia, they developed the Montreal Battery of Evaluation of Amusia (MBEA; Peretz et al., 2003), which included six subtests evaluating different sub-domains of music processing: melody, intervals, scale types, rhythm, meter, and musical memory. The MBEA is useful for assessing clinical populations such as people with congenital (Pfeifer & Hamann, 2015) or acquired amusia (Sihvonen et al., 2017), but not for assessing the musical skills of professional musicians.
Thus, as of the first quarter of the 21st century, measuring musical ability with ear tests that evaluate different sub-domains of music processing faces several limitations. On the one hand, the classical tests developed during the 20th century (1) presented methodological problems in their construction; (2) lacked the psychometric properties required today; (3) used inaccurate sound samples, since these were recordings of live performances into which musicians may have introduced small errors; and (4) relied on poor-quality recordings owing to the recording and preservation methods of the time. On the other hand, (5) the most recent tests are aimed largely at clinical populations and are therefore not sensitive to individual differences between healthy adults. To overcome these limitations, Law and Zentner (2012) created the Profile of Music Perception Skills (PROMS). This battery measures perceptual musical skills objectively across nine music sub-domains (melody, pitch, tone, tuning, rhythm, embedded rhythm, metric accent, tempo, and loudness); it was developed using modern digital audio samples, and its reliability and validity were assessed in psychometric studies. In each subtest, according to the instructions, the participant listens twice to a standard stimulus, followed by a comparison stimulus, and is asked to judge whether the comparison stimulus is the same as or different from the standard stimulus (https://musemap.org/resources/proms). The test is suitable for listeners with all levels and types of musical training; it is also more comprehensive than previous batteries in terms of the musical sub-domains assessed, and it aims to measure each component skill with maximum specificity.
Moreover, the test items are not tied to the linguistic or cultural features of any particular language; the test is available online with instructions in several languages and can be used free of charge via the website of the Laboratory for Personality, Emotion and Music of the Faculty of Psychology at the University of Innsbruck (https://musemap.org/resources/proms).
The test has adequate psychometric properties for the composite score, with internal consistency and test–retest reliability coefficients above .85. Convergent validity was established through correlations with the melody, rhythm, and tempo dimensions of Gordon’s (1967, 1984) MAP and AMMA, the rhythm dimension of the Musical Ear Test (Wallentin et al., 2010), and the timbre dimension in instrumental sample sounds; test performance was also significantly related to external indicators of musical competence (.38 < r < .62, p < .01). In addition, a divergent validity study showed no correlation between test scores and a non-musical auditory discrimination task (r = −.05, NS) (Law & Zentner, 2012).
Completing the full PROMS takes 60 min, which can make it difficult to administer. For this reason, two short versions have been created: the Mini-PROMS (Zentner & Strauss, 2017) and the Micro-PROMS (Strauss et al., 2023). The aim of the present study was to evaluate the validity of the Spanish Mini-PROMS in a sample of Spanish musicians and non-musicians and to provide normative scales for research on musical skills in the Spanish population.
Method
Participants
Zhang et al. (2020) reviewed 730 papers published between 2011 and 2017 to establish a consensus for the use of the term musician. They concluded that there is a six-year rule such that, to be defined as a musician, an individual must have spent at least six years developing expertise in music through training and practice.
Our sample consisted of 230 participants aged between 18 and 47 years (M = 21.46, SD = 4.11). A total of 121 (52.6%) met the criterion for being defined as musicians, as they were students at a Superior Conservatory of Music (age range 18 to 47 years; M = 22.75, SD = 4.47), and 109 (47.4%) were controls, who were students in disciplines unrelated to music at the Complutense University of Madrid.
The distinction between musicians and controls was based on years and intensity of practice. Students at a Superior Conservatory of Music are likely to have studied music for at least 10 years before starting higher education, and to have practiced daily for between 4 and 8 hr for at least 3 years. All the musicians in our sample were classically trained. Although some participants in the control group played or sang music as a hobby, they did not meet Zhang et al.’s criterion for being defined as musicians.
The socio-demographic characteristics of the two groups are shown in Table 1. The musicians were, on average, 2 years older than the controls, and had had more years of education. They also had a more homogeneous gender distribution.
Table 1. Socio-demographic characteristics of the sample.
Note: CI: confidence interval; SD: standard deviation.
Statistics: Mann–Whitney test for numerical variables and Fisher’s exact test for contingency tables.
Effect size: correlation coefficient (r).
The participants assigned to the group of musicians were students at the Conservatorio Superior de Castilla y León, Conservatorio Superior de Música de Málaga, Conservatorio Superior de Música de A Coruña, Conservatorio Superior de Música de Alicante, and Conservatorio Superior de Música Navarra. They were recruited through information posters and via social networks at each institution. The assessments were conducted individually at each institution between June 1, 2021, and April 27, 2022, in a soundproof booth. The participants assigned to the control group were students taking Bachelor of Psychology and Bachelor of Speech Therapy degrees in the Faculty of Psychology at the Complutense University of Madrid, where information was provided about the research project. The assessment sessions were conducted individually between April 21, 2021, and December 17, 2021, in a soundproof booth at the same faculty. In all cases, appointments were scheduled and managed via a digital platform designed for appointment management.
Instrument
The Spanish Mini-PROMS (Zentner & Strauss, 2017), one of the two short versions of the full PROMS, was administered according to the following instructions:
In each subtest, the participant listens twice to a standard stimulus, followed by a comparison stimulus, and is asked to judge whether the comparison stimulus is the same as or different from the standard stimulus. The Melody subtest assesses the participant’s ability to recognize whether or not two short melodies are the same. The Tuning subtest assesses the ability to recognize if one of the notes of a chord in the comparison stimulus is mistuned. The Accent subtest assesses the ability to recognize whether or not two short series of accentuated rhythmic clicks are the same, and the Tempo subtest assesses the ability to recognize whether a synthetic rhythmic structure or a recorded sample of music is being played for the second time faster or slower. (https://musemap.org/resources/proms)
In the original validation study (Zentner & Strauss, 2017), the test–retest reliability of the Mini-PROMS total score was r = .87, its internal consistency was ω = .87, and its correlation with a composite index of musical background was r = .52 (p < .001).
Procedure
An ad hoc interview was conducted to collect socio-demographic data and information on musical education, followed by administration of the Mini-PROMS. The test was administered in its digitized version, hosted on the LimeSurvey platform, using an iPad Air and professional Yamaha MT7 headphones. The procedure guaranteed anonymity, so that responses could not be linked to individual participants.
In all cases, participants gave their explicit consent to participate before they were enrolled in the study by ticking the box “I am 18 years of age or older and I consent to my voluntary participation in this study.” All information gathered in the study was treated and stored in accordance with Spanish legislation. Anonymity was guaranteed throughout the process by codifying responses to prevent subsequent attribution to individuals.
Statistical analyses
The information collected from the questionnaires was entered into a spreadsheet and analyzed with R version 4.0 (R Core Team, 2021). Statistical significance was set at p < .05. An initial scan of the database was performed to identify missing values and outliers. A preliminary descriptive analysis was then performed, using Fisher’s exact test for contingency tables and Mann–Whitney tests for numerical variables, to determine whether the two groups (musicians vs. controls) differed significantly in socio-demographic characteristics and Mini-PROMS scores. Effect sizes were expressed as the correlation coefficient r. McDonald’s omega (ω) was calculated to assess the internal consistency of the Mini-PROMS and its subscales. Omega is similar to Cronbach’s alpha, but it does not assume that all items load equally on the underlying factor (tau-equivalence) and therefore generally provides a more accurate estimate of reliability (Revelle & Zinbarg, 2009; Zinbarg et al., 2006). Next, the four Mini-PROMS subscales and the total score were analyzed to determine the best way to obtain normative data. Their distributions were examined to check whether they met the assumptions of normality and homogeneity of variances. Because none of them met these assumptions, generalized linear models (GLMs) with a Poisson distribution and logit link function, which do not require normally distributed outcomes, were used to explore the possible effect of group on Mini-PROMS scores. Age, years of education, and gender were included as covariates. Additionally, a binomial logistic regression was performed to examine the relationship between the four Mini-PROMS subscales and musicianship, with group (musicians vs. controls) as the dependent variable and all four subscales as predictors.
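The group comparisons above pair each Mann–Whitney test with an effect size r. A common convention, assumed here rather than stated in the text, is r = |z| / √N, where z is the normal approximation to the U statistic and N is the total sample size. The minimal Python sketch below computes U, z, and r using midranks for ties; for brevity it omits tie and continuity corrections, so its z may differ slightly from the values reported by statistical packages that apply them. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions (midranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks


def mann_whitney_effect_size(group_a: Sequence[float],
                             group_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U, z, r) for group_a vs. group_b using the normal approximation.

    Assumptions: no tie or continuity correction; r = |z| / sqrt(N).
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group_a
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    r = abs(z) / math.sqrt(n1 + n2)
    return u, z, r
```

For example, `mann_whitney_effect_size([5, 6, 7, 8], [1, 2, 3, 4])` gives U = 16 (complete separation of the two groups) and r ≈ 0.82.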
Normative data for the Mini-PROMS were then obtained. Based on the GLM results, it was considered most appropriate to derive normative data for the two groups separately, without correcting for socio-demographic variables. Raw scores were first converted to percentile ranks according to their position in each group’s distribution. The percentile ranks were then converted to scaled scores so that the scaled scores followed a Gaussian distribution. Finally, a polytomous Rasch model was used in an exploratory way to analyze the behavior of the Mini-PROMS items and to examine possible differences between the two groups. The full database can be downloaded from https://doi.org/10.6084/m9.figshare.21917301.v1.
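The text does not specify the percentile-rank formula or the metric of the scaled scores. The sketch below assumes the midpoint percentile rank and a normalized scaled-score metric with mean 10 and SD 3, bounded at 1 and 19, a convention common in neuropsychological norms; these choices and the function names are assumptions. Each percentile rank is mapped to a standard normal quantile and then rescaled, which is what gives the scaled scores their Gaussian shape.

```python
from statistics import NormalDist
from typing import Dict, Sequence, Tuple


def percentile_rank(score: float, norm_sample: Sequence[float]) -> float:
    """Midpoint percentile rank (0-100) of a raw score within a normative sample."""
    below = sum(1 for s in norm_sample if s < score)
    equal = sum(1 for s in norm_sample if s == score)
    return 100 * (below + 0.5 * equal) / len(norm_sample)


def scaled_score(pr: float, mean: float = 10.0, sd: float = 3.0,
                 lowest: int = 1, highest: int = 19) -> int:
    """Convert a percentile rank to a normalized scaled score (assumed M = 10, SD = 3)."""
    p = min(max(pr / 100, 0.001), 0.999)  # keep the normal quantile finite
    z = NormalDist().inv_cdf(p)  # standard normal quantile for this rank
    return int(min(max(round(mean + sd * z), lowest), highest))


def norm_table(norm_sample: Sequence[float]) -> Dict[float, Tuple[float, int]]:
    """Map each observed raw score to (percentile rank, scaled score) for one group."""
    table = {}
    for raw in sorted(set(norm_sample)):
        pr = percentile_rank(raw, norm_sample)
        table[raw] = (pr, scaled_score(pr))
    return table
```

For example, in a normative sample containing the raw scores 1 to 100, a raw score of 50 has a midpoint percentile rank of 49.5 and a scaled score of 10. Calling `norm_table` once on the musicians’ raw scores and once on the controls’ mirrors the group-specific structure of the normative tables, though not necessarily their exact values.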
Results
As shown in Table 2, there were statistically significant differences between the two groups on all Mini-PROMS subscales and the total score, with musicians scoring higher than controls; effect sizes were moderate for accent and tempo and large for melody, tuning, and the total score.
Table 2. Scores on the different subscales of the Mini-PROMS.
Note: CI: confidence interval; SD: standard deviation.
Statistics: Mann–Whitney test for numerical variables and Fisher’s exact test for contingency tables.
Effect size: correlation coefficient (r).
Visual examination of the density functions of all Mini-PROMS subscales, shown in Figure 1, revealed clear differences between the two groups, so normative data were obtained for each group separately.

Figure 1. Density functions of Mini-PROMS scores for musicians and controls.
In terms of internal consistency, McDonald’s omega for the Mini-PROMS total score was ω = .84. The values for the subscales were ω = .75 (melody), ω = .73 (tuning), ω = .52 (accent), and ω = .54 (tempo).
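As a worked illustration of omega: for a single-factor model with standardized loadings λᵢ and uniquenesses ψᵢ = 1 − λᵢ², omega total is (Σλᵢ)² / ((Σλᵢ)² + Σψᵢ). The text does not say which variant of omega was computed, so this single-factor form is an assumption, and the sketch takes the loadings as already estimated (for example by a factor analysis in R). The loadings in the usage note are hypothetical, not Mini-PROMS estimates.

```python
from typing import Sequence


def omega_total(loadings: Sequence[float]) -> float:
    """McDonald's omega total for one factor with standardized loadings.

    Uniquenesses are taken as 1 - loading**2 (standardized solution assumed).
    """
    if not loadings or any(not -1 < lam < 1 for lam in loadings):
        raise ValueError("standardized loadings must lie strictly between -1 and 1")
    common = sum(loadings) ** 2  # variance due to the common factor
    unique = sum(1 - lam * lam for lam in loadings)  # summed item uniquenesses
    return common / (common + unique)
```

With four items loading 0.7 each, ω ≈ .79; eight such items give ω ≈ .88. Because the numerator grows with the square of the summed loadings, shorter subscales or subscales with weaker loadings tend to yield lower ω than a longer total score.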
Five regression models were fitted, one for each subscale and one for the total score, to quantify the effect of group (musicians vs. controls) while controlling for participants’ age, years of education, and gender. As shown in Table 3, the models satisfied the assumptions of Poisson regression: in all cases the data fitted well (G² goodness-of-fit test, p > .05), there was no collinearity between predictors (all variance inflation factors < 5), and the observed scores showed no overdispersion relative to the Poisson distribution (φ < 1).
Table 3. Generalized linear models for the subscales of the Mini-PROMS.
Note: PROMS: Profile of Music Perception Skills; OR: odds ratio; CI: confidence interval.
The estimates from the five models showed that musicians scored significantly higher than controls on the melody, tuning, and accent subscales and on the Mini-PROMS total score (p < .05 in all cases), after controlling for socio-demographic characteristics. Although the difference between the two groups on the tempo subscale was not significant, there was a trend in favor of the musicians (p = .054).
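The Poisson diagnostics reported above can be computed from the observed scores yᵢ and the fitted means μᵢ of any Poisson GLM. The sketch assumes the fitted values are already available (for instance from R’s `glm()` with a Poisson family) and treats φ as the Pearson dispersion statistic, Σ(yᵢ − μᵢ)²/μᵢ divided by the residual degrees of freedom; the text does not state which estimator was used, so this is an assumption, as is the function name. The residual deviance G² = 2Σ[yᵢ ln(yᵢ/μᵢ) − (yᵢ − μᵢ)] is compared with a χ² distribution on the residual degrees of freedom to test goodness of fit; values of φ well above 1 would signal overdispersion.

```python
import math
from typing import Sequence, Tuple


def poisson_diagnostics(observed: Sequence[float], fitted: Sequence[float],
                        n_params: int) -> Tuple[float, float, int]:
    """Return (deviance G2, Pearson dispersion phi, residual df) for a Poisson GLM.

    observed: non-negative counts y_i; fitted: model means mu_i > 0;
    n_params: number of estimated coefficients, including the intercept.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df = len(observed) - n_params
    deviance = 0.0
    pearson = 0.0
    for y, mu in zip(observed, fitted):
        # y * ln(y / mu) is taken as 0 when y == 0
        term = y * math.log(y / mu) if y > 0 else 0.0
        deviance += 2 * (term - (y - mu))
        pearson += (y - mu) ** 2 / mu
    return deviance, pearson / df, df
```

For example, observed counts [1, 3] with fitted means [2, 2] and one estimated parameter give φ = 1.0 and G² ≈ 1.05.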
As can be seen in Table 4, the logistic regression model fitted the data well: it improved significantly on the null model (4 degrees of freedom, p < .001), with a residual deviance of 178 and McFadden R² = .44. Scores on the melody (OR = 2.38, p < .001) and tuning subscales (OR = 2.15, p < .001) were significant predictors of being classified as a musician: each additional point on the melody subscale increased the odds of being classified as a musician by 138%, and each additional point on the tuning subscale increased them by 115%. Scores on the accent (OR = 0.91, p = .446) and tempo subscales (OR = 0.98, p = .899) were not significant predictors of being classified as a musician.
Table 4. Binomial logistic regression to distinguish between musicians and controls.
Note: OR: odds ratio; CI: confidence interval.
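The percentage interpretations of the odds ratios follow from (OR − 1) × 100. Because logistic regression is linear on the log-odds scale, a difference of k points also multiplies the odds by ORᵏ, holding the other subscales constant. The short Python helpers below, whose names are hypothetical, reproduce the melody and tuning figures from the reported ORs.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds per one-unit increase in the predictor."""
    return (odds_ratio - 1) * 100


def odds_multiplier(odds_ratio: float, points: float) -> float:
    """Factor by which the odds change for a difference of `points` units."""
    return odds_ratio ** points


def odds_ratio_from_coefficient(b: float) -> float:
    """Convert a logistic-regression coefficient (log-odds scale) to an odds ratio."""
    return math.exp(b)
```

For example, two extra points on the melody subscale multiply the odds by 2.38² ≈ 5.66, whereas the accent OR of 0.91 corresponds to roughly a 9% decrease in the odds per point, an effect that was not statistically significant.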
None of the socio-demographic variables had a significant effect on Mini-PROMS scores. Percentile ranks and their corresponding scaled scores were therefore calculated for each group without correcting for socio-demographic characteristics. Tables 5 and 6 show the normative data for the two groups.
Table 5. Scaled scores and percentile ranges corresponding to raw scores on the Mini-PROMS subscales and total score (musicians).
Note: PROMS: Profile of Music Perception Skills.
Table 6. Scaled scores and percentile ranges corresponding to raw scores on the Mini-PROMS subscales and total score (controls).
Note: PROMS: Profile of Music Perception Skills.
The ROC curve for the logistic regression model is shown in Figure 2. The AUC was .91, indicating excellent discriminative performance. As the threshold for predicting musicianship becomes more stringent, sensitivity decreases and specificity increases (i.e., the operating point moves toward the lower-left corner of the plot). At the optimal threshold, the model had a sensitivity of .89 and a specificity of .85, meaning that it correctly identified 89% of the musicians and 85% of the controls.

Figure 2. ROC curve for the logistic regression model.
Our logistic regression model thus showed good predictive performance in distinguishing musicians from controls. Of the four Mini-PROMS subscales, only melody and tuning scores were significant predictors, suggesting that these two subscales discriminate most clearly between the groups. By contrast, accent and tempo scores did not distinguish between musicians and controls once the other subscales were taken into account.
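The AUC and the sensitivity/specificity pair reported above can be illustrated with a small sketch. It computes the AUC as the probability that a randomly chosen musician receives a higher predicted probability than a randomly chosen control (ties counted as one half), and it picks the “optimal” threshold by maximizing Youden’s J = sensitivity + specificity − 1. The text does not say which optimality criterion was used, so Youden’s J is an assumption, as are the function names and the example scores. Raising the threshold in this code lowers sensitivity and raises specificity, which is the trade-off described for the ROC curve.

```python
from typing import Sequence, Tuple


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outscores a random negative one.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def youden_threshold(scores_pos: Sequence[float],
                     scores_neg: Sequence[float]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A case is predicted positive (musician) when its score is >= threshold.
    Among tied J values, the lowest threshold is kept.
    """
    best = None
    for t in sorted(set(scores_pos) | set(scores_neg)):
        sens = sum(s >= t for s in scores_pos) / len(scores_pos)
        spec = sum(s < t for s in scores_neg) / len(scores_neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    _, threshold, sensitivity, specificity = best
    return threshold, sensitivity, specificity
```

For instance, with hypothetical predicted probabilities [0.9, 0.8, 0.7, 0.4] for musicians and [0.6, 0.3, 0.2, 0.1] for controls, `roc_auc` returns 0.9375 and `youden_threshold` returns a threshold of 0.4 with sensitivity 1.0 and specificity 0.75.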
Finally, a polytomous Rasch model was used to explore the behavior of the Mini-PROMS items within each group. Figure 3 shows the ability and difficulty parameters in the two groups (musicians vs. controls) using Wright maps. As expected, musicians scored higher on the items. However, the items were also more difficult for controls (range: −1.86 to 1.61) than for musicians (range: −4.06 to 0.71), indicating a continuum of musical ability in the control group and a ceiling effect among musicians.

Figure 3. Wright maps of the Mini-PROMS for the two groups (musicians vs. controls).
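Wright maps place person abilities (θ) and item thresholds (δ) on a single logit scale, which is what makes a ceiling effect visible: when θ lies far above an item’s thresholds, almost every response falls in the top category. The text does not specify which polytomous Rasch model was fitted; the sketch below assumes the partial credit model, in which the probability of scoring in category k of an item with thresholds δ₁, ..., δₘ is proportional to exp(Σⱼ₌₁ᵏ (θ − δⱼ)). The function names and threshold values are hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Probabilities of categories 0..m for one item under the partial credit model."""
    cumulative = [0.0]  # category 0 corresponds to an empty sum
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    top = max(cumulative)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(theta: float, thresholds: Sequence[float]) -> float:
    """Expected category score for a person with ability theta."""
    return sum(k * p for k, p in enumerate(pcm_probabilities(theta, thresholds)))
```

For a dichotomous item with one threshold at 0, a person at θ = 1 responds in the upper category with probability e/(1 + e) ≈ 0.73; for a person far above all of an item’s thresholds, nearly all of the probability falls in the top category.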
Discussion
Despite the efforts made throughout the 20th century to develop standardized test batteries to measure musical ability, there seems to be no consensus on a gold standard test for the construct. The main limitations noted in the tests developed over the past century are that they have focused too much on musical perception (Hallam & Shaw, 2002) and that their subtests have often measured a combination of abilities rather than a specific musical ability (Demorest, 1995; Hallam & Shaw, 2002). The PROMS, and hence the Mini-PROMS, aim to detect individual differences between healthy adults with and without musical training, which makes them very useful tools for research in the general population. In addition, their stimuli have been carefully selected, balanced, and revised, yielding good psychometric indicators of reliability and validity. In this sense, both versions overcome the limitations of previously developed batteries.
The Mini-PROMS has been widely used in recent years, especially when several batteries are administered in one evaluation session (Foncubierta, 2020; Samiotis et al., 2021; Sun, 2022; Talamini et al., 2022). Its utility is also reflected in the number of languages into which it has been translated.
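In the original validation, the test–retest reliability of the Mini-PROMS total score was r = .87 (Zentner & Strauss, 2017). Our contribution is to have evaluated the validity of the Spanish Mini-PROMS: we found significant differences in Mini-PROMS scores between musicians and non-musicians, with large effect sizes for melody, tuning, and the total score, as well as good internal consistency for the total score (ω = .84).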
The results of this study show that Mini-PROMS scores are independent of socio-demographic characteristics, so the battery can be administered to healthy adults regardless of age, years of education, or gender. The socio-demographic differences between the samples reflect the populations from which participants were recruited. According to the latest data (Corrales, 2021), women account for 70.8% of students enrolled in Health Sciences and 61.9% of those in Arts and Humanities, compared with 29.2% and 38.1% for men, respectively. Accordingly, our sample included more women (n = 173) than men (n = 57). Moreover, musicians begin higher education later than other university students because their studies are preceded by 10 years at elementary and professional music schools, which do not always end at the same time as secondary school; it is therefore very common to begin higher education in music at the age of 19 or 20 (Corrales, 2021).
Musicians outperformed controls, which is consistent with the expectation that musicians would have more advanced musical skills as a result of their practice. The non-significant effect of group on tempo (despite a trend in favor of musicians) may reflect the general population’s greater familiarity with this parameter: unlike melody, accent, and tuning, which are required for advanced-level music making, tempo concerns the speed at which an excerpt is played, a common parameter in daily life (e.g., in audio playback), so non-musicians may find variations in it easier to detect.
Figure 1 shows differences between the density functions for the two groups’ scores on each Mini-PROMS subscale; the musicians’ scores were less variable, suggesting that the musicians’ group was more homogeneous than the control group. Although one of the most interesting attributes of this test is that it can be taken by anyone regardless of background, musical training, culture, or language, it may be worth treating musicians and non-musicians as distinct populations when assessing individuals’ musical abilities. This would provide fuller information about an individual’s abilities relative to the appropriate population of musicians or non-musicians.
Figure 3 shows that, for controls, both item difficulty and ability were normally distributed, whereas for musicians only item difficulty was. This could be interpreted in two ways: as a ceiling effect, whereby highly trained musicians who already perform at a high level have little room to improve their musical skills through practice, or as a limitation of the battery, which cannot detect subtle differences between such musicians.
One limitation of the study is that the control participants were mainly students from a single faculty at a single university. A second limitation is that the original PROMS has not yet been validated with a Spanish sample, so we could not compare the results of the two validations; however, because it takes so long to complete, the full PROMS may not be the instrument of choice in many research contexts. A third limitation is that we did not include other instruments that would have enabled us to test the convergent or discriminant validity of the Spanish Mini-PROMS; this would be well worth doing in future research.
Nevertheless, we can conclude that the Spanish Mini-PROMS is a reliable and valid tool for measuring the musical abilities of musicians and non-musicians. Furthermore, this study provides information on the differences between the population of musicians and non-musicians, and provides useful scales for comparing an individual’s results with those of their reference population. Taking advantage of the opportunities offered by the Mini-PROMS in terms of its quick and easy administration, we propose to conduct multidimensional studies comparing the results obtained from the Mini-PROMS with those obtained from other cognitive and emotional assessments with the aim of increasing our knowledge of the factors underlying musical abilities.
