Abstract
An established test instrument for the assessment of motor performance in children between 3 and 16 years of age is the Movement Assessment Battery for Children – Second Edition (M-ABC-2). The Zurich Neuromotor Assessment (ZNA) is also widely used to evaluate children’s motor performance, but it has not yet been compared with the M-ABC-2 in children below five years of age to establish convergent validity. Forty-seven children (26 boys, 21 girls) between three and five years of age were assessed using the M-ABC-2 and the ZNA3-5. Spearman rank correlations were calculated between the component scores and between the summary scores of the two tests. Only low-to-moderate correlations were observed when separate components of the two tests were compared (.31 to .68, p < .05), and correlations involving the associated movements component of the ZNA3-5 were negligible (−.05 to −.13, p > .05). However, the correlation between the summary scores of the two tests was .77 (p < .001) and increased to .84 when associated movements were excluded. These values are comparable in magnitude to the test–retest reliability of the M-ABC-2 and support convergent validity between the two tests. Thus, although the ZNA3-5 and the M-ABC-2 measure different aspects of motor behavior, the two instruments may measure essentially the same construct.
Keywords
Introduction
So far, few motor tests are available for the assessment of motor performance in children below five years of age (Cools, De Martelaer, Samaey, & Andries, 2009; Folio & Fewell, 2000; Henderson et al., 2007). However, this age band is particularly important for clinicians because children from three to four years of age usually enter kindergarten. In fact, many children are evaluated for school readiness by their pediatricians, and one way to assess readiness is to evaluate motor performance (Piek, Hands, & Licari, 2012). Also, typical motor development is generally seen as a prerequisite for active participation in many physical activities in the playground and classroom, enabling undisturbed cognitive, psychological, and physiological development (Gallahue & Ozmun, 2006).
Reporting only the child’s newly attained motor milestones (for example, beginning to walk, run, or skip) is clearly not sufficient to describe motor development. For a more refined description of motor performance, parameters such as accuracy, speed, and distance must be recorded to distinguish under- from over-achievers (Gallahue & Ozmun, 2006). However, the concepts of “as quickly/far/accurately as possible” are not always understood by children below five years of age, and their lack of competitiveness makes this age group challenging to assess (Largo, Rousson, Caflisch, & Jenni, 2007). Currently, no single test covers motor performance in children from 3 to 18 years of age. The Movement Assessment Battery for Children – Second Edition (M-ABC-2) (Henderson et al., 2007) measures motor performance in children from 3 to 16 years of age, the Bruininks-Oseretsky Test of Motor Proficiency – Second Edition (Bruininks & Bruininks, 2005) covers the age span 4 to 21 years, and the Test of Gross Motor Development (Ulrich, 2000) covers 3 to 10 years. It should be stressed that most motor tests for children are designed to detect motor problems (Folio & Fewell, 2000; Henderson et al., 2007; Ulrich, 2000), some are time consuming (Bruininks & Bruininks, 2005), and they are not suitable for describing the variability of motor performance in typically developing children (Piek et al., 2012).
The Zurich Neuromotor Assessment (ZNA) battery was designed to assess the entire range of a child’s motor performance from 5 to 18 years (Largo, Caflisch, Hug, Muggli, Molnar, & Molinari, 2001a; Largo, Caflisch, Hug, Muggli, Molnar, Molinari, Sheehy, et al., 2001b; Largo et al., 2007). The ZNA was previously not applicable to children below five years, but three- and four-year-old children can now be evaluated with an extension of the ZNA, the ZNA3-5 (Kakebeeke et al., 2013). However, the ZNA3-5 results have not yet been compared with another test.
Therefore, the main aim of this study is to compare the ZNA3-5 with another frequently used test for the same age band in order to assess its convergent validity, that is, to investigate whether the summary scores of the two tests measure motor performance in young children consistently despite obvious differences in the tests’ construction. The M-ABC-2 was chosen as the standard test for comparison because it is frequently used in Europe (Cools et al., 2009), is easy to administer, and its full version is quick to complete. As a benchmark, we expected a correlation of around .66 between the summary scores of the two tests, which is what we observed when comparing the two tests in children aged five to seven years with suspected developmental coordination disorder (Kakebeeke et al., 2014). A secondary aim of this study is to investigate the correlations between the subtests, or dimensions of motor performance, of the ZNA3-5 and the M-ABC-2.
Method
Participants
Forty-seven children (26 boys, 21 girls) between three and five years of age (M = 4.0 years, SD = 0.6, median = 4.0, interquartile range = 3.5–4.5) were recruited from six child care centers in the Canton of Zurich through letters distributed to the parents of children of suitable age. Children with behavioral problems, intellectual impairment, or neurological disorders were not eligible for the study.
The study was approved by the institutional review board of the Canton of Zurich and conformed to the Declaration of Helsinki. All the families received a study description and provided informed written consent for participation and use of video.
Instruments
Movement ABC-2
Components and items scored in the M-ABC-2 (age band 3 years 0 months – 6 years 11 months).
Zurich Neuromotor Assessment
The ZNA has been designed specifically to describe motor development in typically developing children from 5 years up to 18 years. It focuses on the variability and evolution of motor performance as a function of age and sex (Largo et al., 2001a, 2001b). Recently, the ZNA has been extended to test children between three and five years of age (Kakebeeke et al., 2013).
Components and items scored in the ZNA3-5 (Kakebeeke et al., 2013).
Procedure
All children were tested in the child care centers. Each session took place in the presence of one tester and one child care center employee to help the children feel at ease. One neurophysiologist (TK) was a trained M-ABC-2 tester, and one human movement scientist (EK) was a trained ZNA tester. All items were administered in the order and according to the protocol described in the manuals of the instruments, and performance was recorded digitally on video. Because the inter-rater reliabilities of both tests are very satisfactory (both above 0.8), having each test administered by a single tester was considered acceptable: all ZNA3-5 tests were performed by EK, and all M-ABC-2 tests by TK. The children were randomly assigned to two groups defining test order, with 22 children starting with the M-ABC-2 and 25 children starting with the ZNA3-5.
Statistical analyses
Convergent validity between the ZNA3-5 and the M-ABC-2 was assessed by calculating Spearman rank correlations between motor performance as measured in the two tests. In the M-ABC-2, the motor performance of a given child is expressed as a percentile. In the ZNA3-5, it is expressed as a standard deviation score (SDS or z-score), i.e., as the number of standard deviations above or below the expected performance of children of the same age and sex. A child with a positive SDS performs better than other children of the same age and sex, whereas a child with a negative SDS performs worse. To facilitate comparison, the percentiles returned by the M-ABC-2 were converted into SDSs assuming a standard normal distribution; for example, the 95th percentile corresponds to an SDS of 1.64. Because this conversion is strictly monotone, it does not affect the rank correlations between the two tests.
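The percentile-to-SDS conversion and its invariance for rank correlations can be illustrated with a minimal Python sketch that uses only the standard library (statistics.NormalDist) and assumes percentiles strictly between 0 and 100. The function names and the seven example children are hypothetical; the point is only that Spearman’s coefficient is identical whether M-ABC-2 results enter as percentiles or as SDSs.

```python
from statistics import NormalDist


def percentile_to_sds(percentile):
    """Convert a percentile (0 < p < 100) into an SDS under a standard normal distribution."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    return NormalDist().inv_cdf(percentile / 100)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical M-ABC-2 percentiles and ZNA3-5 SDSs for seven children.
mabc_percentiles = [5, 16, 25, 50, 63, 75, 95]
zna_sds = [-1.2, -0.3, -0.8, 0.1, 0.4, 1.0, 1.5]
mabc_sds = [percentile_to_sds(p) for p in mabc_percentiles]

print(round(percentile_to_sds(95), 2))  # 1.64
print(spearman(mabc_percentiles, zna_sds) == spearman(mabc_sds, zna_sds))  # True
```

Since a strictly increasing transformation leaves every rank unchanged, the two Spearman computations operate on identical rank vectors and return the same value.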
Since the ZNA test battery was recently extended to children aged three to five years, all its items were newly norm-referenced in order to calculate SDS values for the 47 children included in the analysis. For this purpose, essentially the same modeling approach was used as in Kakebeeke et al. (2013). Motor performance was modeled as a function of age (and of sex when this resulted in a better fit, i.e., when adding sex to the model lowered Schwarz’s Bayesian Information Criterion) in a generalized additive modeling framework (Stasinopoulos & Rigby, 2007), with age treated as a continuous covariate. The dominant and non-dominant sides were considered separately. Data from children older than six years gathered in earlier studies (Kakebeeke et al., 2013; Largo et al., 2001a, 2001b) were also included in the normative sample whenever possible to improve the estimation of the age trend between three and five years. Because some children could not perform every item due to their young age, a timed performance that was missing because of a child’s inability was treated as a right-censored observation, assuming that the true timed performance of a child unable to perform the task lies above some unknown, age-dependent censoring threshold. This threshold was estimated iteratively: first, a logistic regression model was used to estimate the proportion of children able to perform the task at each age; then, the quantile Q of the predictive distribution of timed performances corresponding to that proportion was calculated at each age. The data were then refitted using Q as the new censoring threshold, and the process was repeated until convergence. This approach is similar to that used in Kakebeeke et al. (2013), where missing timed performances were imputed using the Poor Man’s Data Augmentation algorithm (Kakebeeke et al., 2013; Largo et al., 2001a, 2001b). Such a procedure allows data from children unable to perform a task to be incorporated, in order to obtain unbiased estimates of the age trend when data are not missing at random.
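The iterative estimation of the censoring threshold can be sketched for a single age group as follows. This is a simplified illustration, not the authors’ implementation: it assumes (hypothetically) that timed performances at a given age are normally distributed, replaces the age-dependent logistic regression by the observed proportion of children able to perform the task, and replaces the generalized additive model by a right-censored normal fit obtained with the EM algorithm. All function names and data are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()


def fit_right_censored_normal(observed, n_censored, threshold, max_iter=1000, tol=1e-10):
    """EM fit of a normal distribution when n_censored values are only known to exceed threshold."""
    n = len(observed) + n_censored
    mu, sigma = fmean(observed), stdev(observed)
    for _ in range(max_iter):
        a = (threshold - mu) / sigma
        mills = STD_NORMAL.pdf(a) / (1 - STD_NORMAL.cdf(a))  # inverse Mills ratio
        # E-step: first two moments of a normal value truncated below at the threshold.
        e_x = mu + sigma * mills
        e_x2 = sigma ** 2 * (1 + a * mills - mills ** 2) + e_x ** 2
        # M-step: complete-data maximum likelihood estimates of mean and SD.
        new_mu = (sum(observed) + n_censored * e_x) / n
        new_var = (sum(x * x for x in observed) + n_censored * e_x2) / n - new_mu ** 2
        new_sigma = new_var ** 0.5
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def censoring_threshold(observed, n_unable, max_rounds=200, tol=1e-8):
    """Iterate: fit with threshold Q, then reset Q to the quantile matching the share of able children."""
    p_able = len(observed) / (len(observed) + n_unable)
    q = max(observed)  # initial guess for the censoring threshold
    for _ in range(max_rounds):
        mu, sigma = fit_right_censored_normal(observed, n_unable, q)
        q_new = mu + sigma * STD_NORMAL.inv_cdf(p_able)
        if abs(q_new - q) < tol:
            return q_new, mu, sigma
        q = q_new
    raise RuntimeError("censoring threshold did not converge")


# Hypothetical times (in seconds) of ten children who completed a task; three children could not.
times = [20, 22, 23, 25, 26, 28, 30, 31, 33, 35]
q, mu, sigma = censoring_threshold(times, n_unable=3)
print(f"Q = {q:.2f} s, fitted mean = {mu:.2f} s, SD = {sigma:.2f} s")
```

At convergence, Q is the quantile of the fitted distribution that leaves above it the same share of children as were unable to perform the task, and the fitted mean lies above the mean of the completed times because the censored children are assumed to be slower.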
Items in the ZNA3-5 included timed performances (bolts, beads), count data (hopping on one leg), and items rated on a five-point ordinal scale (walking in a straight line, jumping sideways, sequential finger movements), with 0 corresponding to the best possible performance and 4 indicating that the child was unable to perform the task. For these rated items, a cumulative probit regression model (Agresti, 2010) was used to calculate the probabilities associated with the ordered categories. As in the ZNA for children from 5 to 18 years of age, the intensity of associated movements (which can take 24 discrete values between 0 and 5) was treated as an interval-censored outcome, with the lower and upper bounds of the censoring interval corresponding to the values of the adjacent categories. The SDSs of individual items were calculated for the 47 children included in the analysis. In the presence of right-censoring, an unbiased estimate of the percentile
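The two likelihood building blocks mentioned in this paragraph can be sketched in Python as follows. Under a cumulative probit model, the probability that a rated item falls in category k or better is Φ(θ_k − η), where θ_0 < θ_1 < θ_2 < θ_3 are cut-points and η is a linear predictor that would depend on age (and possibly sex); for an interval-censored outcome, the likelihood contribution is the normal probability mass between the bounds of the censoring interval. The cut-points, linear predictor values, and bounds below are hypothetical, not estimates from the ZNA3-5.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def probit_category_probs(eta, cutpoints):
    """Cumulative probit model: P(Y <= k) = PHI(cutpoints[k] - eta); returns P(Y = k) for all k."""
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cut-points must be strictly increasing")
    cumulative = [PHI(theta - eta) for theta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


def interval_censored_loglik(lower, upper, mu, sigma):
    """Log-likelihood of a normal latent score known only to lie in the interval (lower, upper]."""
    dist = NormalDist(mu, sigma)
    return math.log(dist.cdf(upper) - dist.cdf(lower))


# Five-point rating (0 = best, 4 = unable) with four hypothetical cut-points.
cuts = [-1.0, 0.0, 0.5, 1.5]
print([round(p, 3) for p in probit_category_probs(0.0, cuts)])
print([round(p, 3) for p in probit_category_probs(1.0, cuts)])  # mass shifts toward worse ratings
```

With this sign convention, a larger linear predictor moves probability toward the higher (worse) rating categories.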
SDSs for ZNA3-5 components were calculated by summing SDSs from individual items belonging to the same component (Table 2), and then standardizing this sum to arrive at a variance of 1.0 in the normative sample. This standardization was performed using all available data whenever possible, including data from prior studies (Kakebeeke et al., 2013; Largo et al., 2001a, 2001b). However, associated movements are only defined when a child can perform a task. Therefore, only the items which could be performed by virtually all the children were included in the associated movements component of the ZNA3-5 (see Table 2 for the list of items included). A summary score for the ZNA3-5 was calculated using a standardized sum of SDS from the five individual ZNA3-5 components. An additional summary score which excluded associated movements was also calculated for comparative purposes. Rank correlations were calculated for each pair of ZNA3-5 and M-ABC-2 components and for summary scores between the two tests.
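The construction of component and summary scores as standardized sums can be sketched as follows. The sketch assumes that each row holds one child’s SDSs and that the scaling uses the population standard deviation of the sums in the normative sample (the estimator is not specified here); because item SDSs are centered near zero in the normative sample, only rescaling is shown. The data and the five component columns are hypothetical, with the last column standing for associated movements.

```python
from statistics import pstdev


def standardized_sum(rows, normative_rows):
    """Sum each child's SDSs and rescale so the sums have unit variance in the normative sample."""
    scale = pstdev([sum(r) for r in normative_rows])
    return [sum(r) / scale for r in rows]


# Hypothetical component SDSs for four children in the normative sample
# (five component columns; the last column stands for associated movements).
normative = [
    [0.5, 1.0, 0.2, 0.8, -0.3],
    [-0.4, -0.9, 0.1, -0.6, 0.5],
    [1.2, 0.3, 0.9, 0.4, 0.0],
    [-1.3, -0.4, -1.2, -0.6, -0.2],
]
summary = standardized_sum(normative, normative)

# Comparative summary score without associated movements (last column dropped).
without_am = [row[:-1] for row in normative]
summary_without_am = standardized_sum(without_am, without_am)
print([round(s, 2) for s in summary])
```

The same function serves both levels: applied to item SDSs it yields a component score, and applied to component SDSs it yields a summary score.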
Results
Mean and standard deviation of SDS (z-scores) on individual components and summary scores.
The p value refers to a one-sample t test of the null hypothesis of a zero mean. M-ABC-2: Movement Assessment Battery for Children – Second Edition; ZNA3-5: Zurich Neuromotor Assessment from three to five years.
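For reference, the one-sample t statistic behind these p values is the mean SDS divided by its standard error, with n − 1 degrees of freedom. A minimal standard-library sketch follows; the example SDSs are hypothetical, and the two-sided p value would then be read from a t distribution (for instance, scipy.stats.ttest_1samp(x, 0.0) returns both the statistic and the p value).

```python
import math
from statistics import fmean, stdev


def one_sample_t(sds_values, null_mean=0.0):
    """Return (t, df) for testing whether the mean SDS differs from null_mean."""
    n = len(sds_values)
    standard_error = stdev(sds_values) / math.sqrt(n)
    return (fmean(sds_values) - null_mean) / standard_error, n - 1


t, df = one_sample_t([-0.5, 0.2, -1.1, 0.4, -0.3, 0.0, -0.6])
print(f"t({df}) = {t:.2f}")
```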

Scatter plots of SDSs (z-scores) for summary scores of M-ABC-2 and ZNA3-5 with rank correlation (r). Solid gray lines indicate the 50th percentile.
The effect of test order was checked by comparing the distributions of SDSs (in the summary score of either test) between the children who started with the ZNA3-5 (n = 25) and those who started with the M-ABC-2 (n = 22) using a Mann–Whitney test. Neither comparison was significant (ZNA3-5: W = 327, p = .28; M-ABC-2: W = 278, p = .98). Hence, learning or fatigue effects related to the order of the tests (or testers) can reasonably be ruled out.
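A minimal sketch of this order check follows, assuming that W is the Mann–Whitney statistic of the first group (its rank sum minus n1(n1 + 1)/2, the convention used by R’s wilcox.test). The p value here uses a normal approximation without continuity or tie correction, so it can differ slightly from an exact p value; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def w_normal_p(w, n1, n2):
    """Two-sided p value for W under the normal approximation (no continuity or tie correction)."""
    mean_w = n1 * n2 / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return 2 * (1 - NormalDist().cdf(abs(z)))


def mann_whitney(group1, group2):
    """Return (W, approximate p) where W = rank sum of group1 minus n1 * (n1 + 1) / 2."""
    ranks = average_ranks(list(group1) + list(group2))
    n1 = len(group1)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return w, w_normal_p(w, n1, len(group2))


# Approximate p values for the reported statistics (25 and 22 children per order group).
print(round(w_normal_p(327, 25, 22), 2))  # 0.27 under the normal approximation
print(round(w_normal_p(278, 25, 22), 2))
```

Under this approximation, W = 327 with 25 and 22 children gives p ≈ .27, close to the reported .28; a small gap is expected because the sketch uses no exact or corrected computation.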
Rank correlations with 95% confidence interval between SDSs (z-scores) on M-ABC-2 and ZNA3-5 for summary scores and scores on individual components.
M-ABC-2: Movement Assessment Battery for Children – Second Edition; ZNA3-5: Zurich Neuromotor Assessment from three to five years.
Discussion
This study compared two motor tests administered to a sample of 47 children aged three to five years. The rank correlation between the summary scores of the ZNA3-5 and the M-ABC-2 was .77 despite obvious differences in test construction, theoretical background, and normative samples, and it increased to .84 after the associated movements component was excluded from the ZNA3-5. In both cases, the correlation was higher than the correlation of .66 reported by Kakebeeke et al. (2014) when comparing the ZNA and the M-ABC-2 in children with suspected developmental coordination disorder. As another useful reference, the intra-class correlation measuring the test–retest reliability of the M-ABC-2 summary score is .8 (Henderson et al., 2007). Consequently, the correlation observed between the two tests in this study is comparable to the correlation that would be expected between two successive sessions of the M-ABC-2, which is remarkable and supports convergent validity between the two tests. We thus have some evidence that the two instruments measure essentially the same construct in the age range three to five years.
As explained in the previous study by Kakebeeke et al. (2014), a child’s motor competence is determined by motor abilities and motor skills (Logan, Robinson, Rudisill, Wadsworth, & Morera, 2014). Motor abilities are based on trait-like, internal neurological and neuromotor processes (Burton & Miller, 1998); when they are improved by practice, they are defined as motor skills (Schmidt & Lee, 2011). Kakebeeke et al. (2014) showed that the M-ABC-2 primarily assesses movement skills, whereas the ZNA also measures the ability level of the child. One hypothesis is that in three- to five-year-olds, skills and abilities are not yet well differentiated, because a three-year-old child has not had sufficient time to develop skills, whereas abilities are present at birth. This would explain why the correlation between the M-ABC-2 and the ZNA3-5 is higher at this age than in studies of older children.
While the convergent validity between the ZNA3-5 and the M-ABC-2, as measured by the correlation between the summary scores of the two tests, appears reasonably high, especially when compared with the test–retest reliability of the M-ABC-2, the average motor performance measured with the M-ABC-2 was slightly lower (−0.21 SDS) than that measured with the ZNA3-5. This result contrasts with findings in older children with suspected developmental coordination disorder, where ZNA scores were generally lower than M-ABC-2 scores (Kakebeeke et al., 2014). The discrepancy may be explained by the fact that the 47 children in this study were used in the norming process to extend the ZNA to the age range three to five years, whereas the M-ABC-2 was normed on an external sample. Many of the 47 children were relatively new to the child care centers and therefore not yet used to unfamiliar adults giving them instructions, which might have affected their motor performance in both tests. However, whereas the M-ABC-2 norms are given in the test manual, all children in this study were concurrently part of the ZNA3-5 norming process. Their slight underperformance therefore tends to pull the ZNA3-5 norms downward over this age range, shifting the children’s ZNA3-5 SDSs upward relative to their M-ABC-2 SDSs. Consequently, an under-achiever may appear as a typically developing child in the ZNA3-5 while his/her motor performance is correctly labeled as below average in the M-ABC-2. This underlines one limitation of our study: due to difficulties in recruiting young children in the child care centers, the normative sample of the ZNA3-5 is currently of limited size and likely not yet representative of the population of typically developing children. However, we intend to recruit more children in the age range three to five years to improve the quality of our normative sample.
A systematic difference between the average motor performance measured in the two tests may also arise from an inter-rater effect, as one examiner (EK) conducted all the ZNA3-5 testing while another (TK) conducted all the M-ABC-2 testing. However, we do not consider this a limitation, for three reasons. First, this scenario reflects real-life conditions, in which tests are administered by different examiners, each trained in a specific motor test, and our study aimed to estimate the correlation that would be obtained between the two tests under such conditions. Second, estimating an inter-rater effect is difficult when the test also changes. Indeed, even if both tests had been administered by the same examiner, an inter-rater effect could still not be ruled out, since an examiner may adapt his/her rating behavior to the test. Additionally, if all testing were done by one person, the child would already know the examiner at the second test. Third, beyond systematic differences, an inter-rater effect would generally increase the disagreement between measurements from the two tests. In the presence of such an effect, the correlation observed between the summary scores of the two tests would therefore likely underestimate the correlation achievable if the same rater administered both tests. Since the observed correlation of .77 is already comparable in magnitude to the test–retest reliability of the M-ABC-2, we reason that the inter-rater effect is negligible. Moreover, a study of timed performance in the ZNA found the inter-rater reliability to be usually much higher, and thus less critical, than the test–retest reliability (Rousson, Gasser, Caflisch, & Largo, 2008).
A salient feature of the ZNA3-5 is that it not only quantifies children’s motor performance but also assesses the quality of movement through the observation of associated movements (Largo et al., 2001a, 2001b). Associated movements constitute an integral part of the ZNA for older children (Largo et al., 2001a, 2001b). In the present study, associated movements were measured during the pegboard-and-bolt task and during the repetitive movements of the fingers and hand, as virtually all children were able to perform these tasks (the very few exceptions were discarded from the analysis). Associated movements are generally assessed to determine the quality of movement and are considered subtle neurological signs (Largo et al., 2001b). They are expected to decrease with age (Connolly & Stratton, 1968; Piek et al., 2012). We observed a developmental trend in the associated movements component, with the intensity of associated movements decreasing with age, but this trend was statistically significant only in the pegboard task, where it was consistent on both the dominant and non-dominant sides (data not shown). A larger cohort of children below six years of age might, however, provide more information about the development of associated movements in very young children.
Although associated movements provide information on neurological soft signs, their non-linear course, inter-individual variation, and large variability across motor tasks need to be taken into consideration (Largo et al., 2001b). In the ZNA3-5, a measure of how the child performs is thus incorporated into the summary score in addition to the actual timed, counted, or rated performance. Therefore, a child with many associated movements during the pure motor skills and fine motor adaptive tasks will obtain a lower summary score in the ZNA. In contrast, a similarly high number of associated movements in the hands during the manual dexterity component will not affect that child’s score in the M-ABC-2, as only bimanual tasks are measured there. This likely explains why the associated movements component did not correlate with any of the M-ABC-2 components and why, after omitting associated movements from the ZNA3-5, the rank correlation between the summary scores of the two tests increased slightly from .77 to .84.
Limitations
One limitation of this study is the limited number of children (N = 47) who could be recruited in the child care centers, so that the confidence intervals for our correlations were fairly wide. Another limitation is that the children in this study were included in the normative sample used to extend the ZNA norms to the age range three to five years. Consequently, the average SDS of these children in the ZNA3-5 is close to 0 by design, whereas the M-ABC-2 uses an external normative sample. The fact that the ZNA3-5 and the M-ABC-2 use different normative samples can largely explain the systematic difference that we observed between the two tests, with M-ABC-2 scores being generally lower than ZNA3-5 scores for the same child. Regarding the ZNA3-5 itself, one restriction is that some children were not able to perform all the tasks (e.g., alternating movements). While the resulting missing timed performances could be handled appropriately using a model for censored observations, the corresponding associated movements remained undefined. Therefore, the associated movements component of the ZNA3-5 had to be restricted to “easy” tasks that virtually all children could perform, which may limit the interpretation of this component in practice.
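To give a sense of how wide such intervals are with N = 47, the sketch below computes an approximate 95% confidence interval for a correlation using Fisher’s z transformation with standard error 1/√(n − 3). This is an illustrative assumption: the method used for the confidence intervals reported in this study is not stated here, and for Spearman coefficients this standard error is only an approximation.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z transformation."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


for r in (0.31, 0.68, 0.77, 0.84):
    low, high = fisher_ci(r, 47)
    print(f"r = {r:.2f}: approximate 95% CI {low:.2f} to {high:.2f}")
```

For the summary-score correlation of .77, this approximation gives an interval of roughly .62 to .87, and the intervals are considerably wider for the lower component correlations.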
Footnotes
Acknowledgments
The authors thank all children, their parents, and all the child care center directors in the Canton of Zurich for their support.
