Abstract
This paper exemplifies a secondary data analysis of context–specific differences in children's hyperactivity–impulsivity while controlling for informant–specific effects. Participants were boys and girls from the NICHD Study of Early Child Care and Youth Development whose behaviours were measured in 1st, 3rd and 5th grades. Latent factor models were structured using multi–informant reports including mothers, fathers, teachers and observers. Temporal stability within a context was stronger than cross–context consistency, and the magnitude of longitudinal stability was higher in the home context compared to the school context. Controlling for informant–specific effects resulted in a significantly improved model fit and increased within–context stability. Our findings highlight the importance of considering both context and informant effects when studying longitudinal stability and change in personality development. Copyright © 2010 John Wiley & Sons, Ltd.
Introduction
Many developmental theorists agree that temperament represents underlying dispositions, and possibly traits. Individual differences in temperament are evident as behaviours that vary widely across individuals, are readily observed from early in life, are somewhat stable over time and across settings and have a biological foundation (Buss & Plomin, 1984; Goldsmith, Buss, & Lemery, 1997; Rothbart & Bates, 1998; Sanson, Hemphill, & Smart, 2004; Strelau, Zawadzki, & Piotrowska, 2001). Understanding how individual differences in temperament (and related personality factors) unfold over time is greatly enhanced by accurate assessment, which is aided by multiple measures that span at least two contexts and involve observations by multiple informants. In the current quantitative longitudinal study, we used factor analytic models with such data to examine the potential effects of contexts and informants on longitudinal stability and change in two indicators of child behaviour that are related to individual differences in temperament, personality and psychopathology: hyperactivity and impulsivity.
Temperament is one of the bases of personality, the latter being a much broader array of attributes including perceptions and interests (Keogh, 2004; Rothbart, Ahadi, & Evans, 2000). Typically, developmental theorists state that temperament represents underlying dispositions or traits. However, some theorists have focused on individual differences in temperament reflecting mainly characteristic styles of behavioural response (Buss & Plomin, 1975; Thomas & Chess, 1977) whereas other theorists view temperament as including individuals’ affective qualities (Rothbart, Ahadi, Hershey, & Fisher, 2001; Rothbart & Bates, 1998). In the further development of the operationalization of child temperament, Rothbart and her colleagues suggested the importance of individual differences in attentional self–regulation as a basic dimension of temperament (Posner & Rothbart, 1998; Rothbart et al., 2000) and demonstrated evidence for three distinct dimensions of child temperament—surgency/extraversion, negative affectivity and effortful control (Rothbart et al., 2001). Surgency/extraversion captures ‘approach’ tendencies and includes activity level, impulsivity, high intensity pleasure and low shyness. Negative affectivity includes sadness, anger, discomfort and low soothability. Effortful control includes enjoyment of low–intensity stimulation, perceptual sensitivity and self–control of impulses and attention.
There is evidence of context and method/informant effects in child temperament (Sanson et al., 2004). Agreement between different informants on measures of temperament is far less than unity, with correlations ranging from .15 to .51 between maternal and observer ratings, and between mothers’ and fathers’ ratings (Bornstein, Gaughran, & Segui, 1991; Carnicero, Perez–Lopez, Salinas, & Martinez–Fuentes, 2000; Goldsmith, 1996; Goldsmith & Campos, 1990; Hayden, Klein, & Durbin, 2005; Rothbart et al., 2001; Seifer, Sameroff, Barrett, & Krafchuk, 1994). This modest to moderate agreement between informants could arise from informants using different bases of knowledge when evaluating child behaviour (De Los Reyes & Kazdin, 2005), but it also could indicate that they are reliably reporting children's context–specific behaviours (Mangelsdorf, Schoppe, & Buur, 2000; Veenstra, Lindenberg, Oldehinkel, De Winter, Verhulst, & Ormel, 2008). It might also arise from informant–specific ‘rater bias’. For example, depression is associated with systematic bias in maternal ratings of their children's hyperactivity–impulsivity (Boyle & Pickles, 1997).
In early childhood through middle childhood, exploratory and confirmatory factor analyses (CFA) indicate that impulsivity and activity level together provide the highest loadings on the surgency/extraversion factor, forming the foundation for the surgency/extraversion temperament dimension (e.g. Mullineaux, Deater–Deckard, Petrill, Thompson, & DeThorne, 2009; Rothbart et al., 2001). Hyperactivity–impulsivity seems to be one of the most phenotypically stable temperament dimensions, although longitudinal stability of childhood hyperactivity–impulsivity varies widely depending on the assessment method utilized (see Pelham, Fabiano, & Massetti, 2005 for a review). Olson, Schilling and Bates (1999) reported stability in impulsivity between 1st and 3rd grade of r = .44. A study by Rothbart et al. (2001) further demonstrated that the two components of surgency/extraversion (impulsivity and activity level) were two of the most stable facets over time (i.e. two–year stability from 5 to 7 years of age was in the .7 range for mothers and fathers), compared to all other facets of temperament measured by the children's behaviour questionnaire (CBQ). More recently, Majdandzic and van den Boom (2007) used latent factor models to examine consistency across situations and time in early childhood and reported higher temporal stability for parent perceptions of activity level based on mothers’ and fathers’ reports on the CBQ (.83), compared to observed activity level measured by ratings on the laboratory temperament assessment battery (Lab–TAB; .70). In regard to inter–rater agreement, impulsivity and activity level both have shown moderate to substantial (r = .4 to .7 range) mother–father agreement, as shown in multiple samples (e.g. Majdandzic & van den Boom, 2007; Mullineaux et al., 2009; Rothbart et al., 2001).
There is a substantial literature supporting the heritability of temperament (e.g. Auerbach, Faroy, Ebstein, Kahana, & Levine, 2001; Goldsmith, Lemery, & Essex, 2004). In particular, prior behavioural genetic studies have demonstrated that hyperactivity–impulsivity is highly heritable (see Levy, McStephen, & Hay, 2001). However, heritability estimates at the high end (i.e. .8 and above) have been found in investigations in which parents were the informants who provided information on the twins’ zygosity and rated their children's behaviours (Nigg, Hinshaw, & Huang–Pollock, 2006). It is plausible that those estimates were inflated by rater contrast effects (Eaves, Rutter, Silber, Shillady, Maes, & Pickles, 2000).
According to a trait/state view of temperament behaviours like hyperactivity–impulsivity, genetic and environmental factors work together to cause variation in neuronal and biochemical factors that in turn cause variation in an abstract trait, which itself is observed directly as transient (state–like) and stable (trait–like) behaviours (Strelau, 2001). These behaviours also are influenced by immediately contingent situational, physiological and psychological setting conditions. In the current study, we considered variance that is stable across time as well as variance that is time/context/informant–specific, in an effort to test directly the veracity of the disposition concept. By developing a multi–informant measurement model and testing its sensitivity to detecting time and context variants, we attempted to derive accurate estimates of the most likely ‘trait’ and ‘state’ components of variance in some behavioural indicators of dispositional hyperactivity–impulsivity—as a conceptual and statistical model that could be applied to all dimensions of temperament and personality.
There remains a longstanding debate about which methods and informants are most valid and reliable to operationalize temperament. One approach that has the potential to resolve this debate is the use of multi–informant and method constructs with good internal and external validity (Karp, Serbin, Stack, & Schwartzman, 2004). The use of multi–informant, multi–context temperament constructs not only can elucidate systematic method/rater biases if they exist, but can also promote the development of new measurement models and assessment tools with improved psychometric properties, including enhanced predictive validity and replicability (Nunnally & Bernstein, 1994; Rushton, Brainerd, & Pressley, 1983).
One common approach involves computation of multi–informant composites that are calculated as a simple average score across different informants. Such composites constrain the data by obscuring the examination of the variance in behaviour that may or may not overlap across different contexts and informants. Another approach utilizes latent variables to combine information from multiple informants and to isolate informant–specific effects as measurement error that is not examined. In the current study, we offer an alternative that integrates these common approaches without removing or obscuring context and informant effects. We examined longitudinal factor models of temperament that utilize multi–informant scores, in an effort to describe and predict longitudinal stability and change from 4.5 to 11 years of age.
Many studies of child temperament share the same data structure: Parents (and sometimes observers) report on child behaviour in the home context, and teachers (and sometimes different observers) report on child behaviour in the childcare or school context. The development of models that can address potential context and informant effects is important, especially since context and informant effects are always confounded in the longitudinal data in this common design. Our goal was to identify systematic context effects—for home and school settings—on stability and change in behavioural indicators of hyperactivity–impulsivity in the transition to and through middle childhood, while also considering mother, father, teacher and observer informant variance. We chose to examine hyperactivity–impulsivity because it is a temperament trait that is highly heritable, highly stable and easily observable by others. Furthermore, hyperactivity–impulsivity plays a central role in the conceptualization of child psychopathology (Quay, 1993).
Although secondary analysis can increase the informational value of existing data and offers a relatively low–cost way to ask original research questions (Bullock, 2007), we were limited to the available variables in constructing the measurement models of hyperactivity–impulsivity. One consequence was that the items used in the current analysis were narrowly defined behavioural indicators of the temperamental dimension of surgency/extraversion. Although it is beyond the scope of this study, it should be noted that researchers of child temperament and psychopathology generally agree that temperament incorporates the normative range of responding, whereas behaviour problems are symptom clusters that are dysfunctional, excessive, maladaptive or debilitating (Achenbach, 1995). Measurement confounding (i.e. item–content overlap) has been an issue in studies of child temperament and behaviour problems, because most temperament and symptom measures tap behaviours, and similar behaviours may stem from either temperament or behaviour problems (Lemery, Essex, & Smider, 2002). Accumulated empirical evidence, however, suggests that measurement confounding does not account for the association between temperament and behaviour problems (Lemery et al., 2002; Lengua, West, & Sandler, 1998). The items used in this study were adopted from the child behavior checklist (CBCL) and the teacher report form (TRF) and rated hyperactive behaviours (rather than activity) and impulsive behaviours. Thus, these items can be viewed as reflecting the distribution in the range of behaviour problems rather than normal range behaviours. Nevertheless, we believe that these items could capture individual differences in the temperamentally based behaviours that were part of a surgency/extraversion dimension.
The existing research on temperament described above (though almost all of it based on mono–method/informant assessments of constructs) suggests moderate to substantial short–term (1–2 years) stability of child hyperactivity–impulsivity, with the estimates of stability declining over longer time periods (e.g. Green, Loeber, & Lahey, 1991). In light of this literature, we hypothesized that individual differences in the multi–informant construct of hyperactivity–impulsivity would show moderate to high stability within the same context across time (2 years). Based on previous findings of stronger stability within particular contexts compared to consistency across contexts (e.g. Majdandzic & van den Boom, 2007; Ruff, Capozzoli, & Weissberg, 1998), we hypothesized that within–context longitudinal stability would be comparable to or exceed the magnitude of correlations between contexts within a concurrent time point. We also expected that longitudinal stability for the measurement construct would be stronger for the home context than for the school context, in line with previous research that suggests higher consistency for parent reports compared to observational ratings. In addition, it was important for us to consider informant effects in the longitudinal factor model in order to identify and distinguish stable informant effects (i.e. systematic effects of informants due to the unique aspects of the relationship with the child and/or due to informant–specific rater bias) from reliable cross–informant variance when estimating context–specific hyperactivity–impulsivity. Overall, current theory in temperament emphasizes the role of social influences (e.g. demands of the situation such as home or school, or cultural expectations) in the development of individual differences (e.g. Sanson et al., 2004).
The examination of reliable estimates of the ‘trait’ and ‘state’ components of temperamentally–based hyperactive–impulsive behaviours among children has significant implications for understanding of the developmental continuity in child temperament.
Method
Participants
In the current study, we examined the public datasets of the NICHD (National Institute of Child Health and Human Development) Study of Early Child Care and Youth Development (http://www.nichd.nih.gov/research/supported/seccyd/datasets.cfm). Data collection began in 1991 in nine states (Arkansas, California, Kansas, New Hampshire, North Carolina, Pennsylvania, Virginia, Washington and Wisconsin) and included 1364 children (52% male) and their families when the children were 1 month of age. The current sample included 978 children (482 boys and 496 girls) who had mothers’ ratings of hyperactivity–impulsivity measures for at least two time points over the three assessments in 1st, 3rd and 5th grades. Participating children were from five racial categories: White (83%); Black (11%); Asian or Pacific Islander (1%); American Indian, Inuit or Aleutian (1%) and Other (4%). At the first time point (1st grade) these 978 children did not differ from the 102 children who were not included in the current sample because they had data only in 1st grade but not in 3rd or 5th grade (i.e. ‘non–participants’) with respect to gender (49% boys among participants vs. 57% boys among non–participants), race (83% whites among participants vs. 77% whites among non–participants), maternal education (an average of 14.5 years among participants vs. 14.2 years among non–participants) and the presence of a husband or partner for the mothers (82% among participants vs. 83% among non–participants). Mothers of those 978 children, however, had significantly higher family incomes compared to the 102 non–participants (income–to–need ratio; M = 4.06 vs. 3.01, t (980) = 4.29, p < .05). The analysis sample did not differ from the non–participants with respect to mothers’, fathers’ and teachers’ reports on hyperactivity–impulsivity in 1st grade, but showed slightly lower levels of hyperactivity–impulsivity according to observers’ ratings (M = 4.35 vs. 4.14, t (949) = 2.03, p = .04).
Measures
We used mothers’, fathers’, teachers’ and observers’ (trained paid research staff) ratings on items pertaining to several key indicators of hyperactivity and impulsivity. The items were selected based on (1) face validity from a variety of instruments that are described here (see Table 1 for an overview), and (2) availability across the three measurement points (i.e. measured consistently in 1st, 3rd and 5th grade).
Table 1. Measures used to construct hyperactivity–impulsivity, by time point, and N with valid data for mothers’ (M), fathers’ (F), teachers’ (T) and observers’ (O) reports
Parent and teacher ratings of children's hyperactive–impulsive behaviour were gathered at each time point. The CBCL and the closely related TRF (Achenbach, 1991) are scored using a three–point Likert–type scale: 0 = not true, 1 = somewhat or sometimes true, 2 = very true or often true. From the CBCL and the TRF, we included the same two items from each (cannot sit still, restless, hyperactive; impulsive or acts without thinking).
The classroom observation system (COS) was developed by the SECC Steering Committee for the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development (2006). The COS captured discrete child behaviours and interactions with others in the classroom over the course of two 44–minute observation cycles. Each cycle consisted of three 10–minute time sampled periods (30–second observe and 30–second record intervals). In addition to the discrete behavioural recordings completed, a qualitative rating was completed for each of these cycles using a global 7–point scale (1 = uncharacteristic to 7 = extremely characteristic). These three global ratings were then averaged to yield an overall child rating across the observational cycles. For hyperactivity, we used the activity level scale which reflected an overall rating of how physically active, restless and fidgety the child was during the observation period. Inter–rater reliability estimates for the activity level based upon repeated measures ANOVA (i.e. the unbiased estimate of the reliability of the mean of k = 2 measurements after taking into account differences in the raters, k = number of raters, described by Winer, 1971) were .70 for 1st grade ratings, .67 for 3rd grade ratings and .61 for 5th grade ratings.
Statistical analysis
We used CFA models via structural equation modelling (SEM; Bollen, 1989) to estimate measurement models of the behavioural dimensions of hyperactivity–impulsivity. We used two criteria to determine empirical indices of a certain latent psychological construct (Nunnally & Bernstein, 1994). First, we considered content validity by examining the adequacy with which a specified domain of content was sampled. Each item or scale that is said to comprise the latent construct must stand on its own as an adequate representation of that construct. Next, we conducted a series of factor analyses to test internal validity, demonstrated empirically by showing that correlations among the manifest variables can be patterned according to the mathematical expectations of a single latent construct (McArdle, 1996). Only the observed variables that had significant factor loadings on the latent factors were retained, using a cutoff of .3 (accounting for 9% of the variance; Pedhazur & Schmelkin, 1991) as the minimally acceptable factor loading (with one exception of .22 for the COS rating on the school context latent factor in 5th grade).
We focused on testing measurement models for the latent context factors (i.e. home and school) based on the scores reported by multiple informants to study the factorial composition of hyperactivity–impulsivity while differentiating context effects. The latent factors of the home context consisted of the scores reported by mothers and fathers, and the latent factors of the school context consisted of the scores reported by teachers and observers. Items were reverse scored if necessary so that higher scores indicated higher levels of hyperactivity–impulsivity.
In order to examine longitudinal stability and changes in these behavioural indicators of temperament, we estimated longitudinal factor analysis models that specified within–occasion correlations between home and school context factors (i.e. cross–context correlations) as well as between–occasion auto–regressive effects for the same context factor (i.e. stability of contexts). We used the Amos 7.0 program (Arbuckle, 2006), which estimated parameters using full information maximum likelihood (FIML) methods. The FIML methods allowed data from all individuals to be included regardless of their pattern of missing data and are more appropriate than other commonly used methods such as mean substitution. In evaluating the overall goodness of fit of each model, we report the χ2 goodness–of–fit statistic (χ2), degrees of freedom (df) and corresponding p value; the comparative fit index (CFI) and the root mean square error of approximation (RMSEA). CFI values greater than .90 and RMSEA values of .08 or lower are indicative of good–fitting models (Browne & Cudeck, 1993). In order to test the significance of cross–context correlations and informant effects, equality constraints were imposed hierarchically and their adequacy was tested using nested χ2 difference tests (Bollen, 1989). We also examined changes in CFI, because the χ2 difference test for nested model comparisons may be too sensitive with relatively large samples (Cheung & Rensvold, 2002). We tested whether ΔCFI exceeded the recommended value of .01 (Cheung & Rensvold, 2002), which would reflect a meaningful difference in model fit.
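As a rough illustration of the fit criteria above, the following Python sketch computes a point estimate of RMSEA from a model χ2, its df and the sample size, and applies the two cutoffs stated in the text. It assumes the common single–group formula with N − 1 in the denominator; the exact computation performed by Amos under FIML may differ, and the function names and example values are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square, its df and sample size n.

    Assumption: single-group formula sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Misfit in excess of df, floored at zero, scaled by df and (n - 1).
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_cutoffs(cfi: float, rmsea_value: float) -> bool:
    """Apply the cutoffs stated in the text: CFI > .90 and RMSEA <= .08."""
    return cfi > 0.90 and rmsea_value <= 0.08


# Hypothetical illustrative values (not taken from the study's output).
example = rmsea(chi2=300.0, df=150, n=1000)
print(round(example, 3), meets_cutoffs(cfi=0.95, rmsea_value=example))
```

With the values reported in the Results for the informant effect model (χ2 = 505.77, df = 165) and N = 978, this formula gives approximately .046, consistent with the reported RMSEA of .05.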
Results
Table 2 presents descriptive statistics (means and SDs) of hyperactivity–impulsivity scores based on mother, father, teacher and observer reports in 1st, 3rd and 5th grades, for the whole sample and separately for boys and girls. Two indicators for mother, father and teacher ratings were averaged within informant and time. We performed a series of t–tests to examine differences between boys and girls, using a Bonferroni–adjusted α level (i.e. dividing the per comparison α level by the number of outcome variables, α = .004) to control for inflated Type I error rates (Jaccard & Guilamo–Ramos, 2002). Boys were rated as showing higher levels of hyperactive–impulsive behaviours regardless of informants and measurement time points (t = 3.51 to 9.51, p < .001).
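As a worked check of the adjusted threshold, the calculation below assumes that the outcome variables are the four informants' scores at each of the three grades, that is, 12 comparisons. This count is an inference from the design and is not stated explicitly in the text.

```latex
\alpha_{\text{adjusted}} = \frac{\alpha}{m} = \frac{.05}{12} \approx .004
```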
Table 2. Means and standard deviations for mothers’, fathers’, teachers’ and observers’ ratings of hyperactivity–impulsivity
We conducted repeated–measures general linear modelling (GLM) analysis to examine whether there were significant developmental changes in the levels of hyperactive–impulsive behaviours across 1st, 3rd and 5th grades, and whether the changes differed by gender (i.e. time by gender interactions). The results indicated that reports by mothers, fathers and teachers showed significant declines in children's hyperactive–impulsive behaviours regardless of gender, with F (1, 844) = 54.06, p < .05 for mother ratings, F (1, 472) = 61.67, p < .05 for father ratings and F (1, 779) = 7.13, p < .05 for teacher ratings. In contrast, there were no significant changes across 1st, 3rd and 5th grades based on observer ratings, F (1, 779) = .38, p = .53.
We estimated zero–order bivariate Pearson correlations between all of the study variables which included four items for the home context and three items for the school context at each measurement time (available upon request). Inter–item correlations ranged from .20 to .41 for the home context, from .22 to .64 for the school context in 1st grade; from .27 to .49 for the home context and from .21 to .64 for the school context in 3rd grade and from .33 to .52 for the home context and from .16 to .58 for the school context in 5th grade. Correlations between parent ratings ranged from .20 to .41 in 1st grade, from .27 to .49 in 3rd grade and from .33 to .52 in 5th grade. Inter–item correlations for teacher ratings were .64 in 1st grade, .64 in 3rd grade and .58 in 5th grade.
Figure 1 presents the longitudinal factor model of context and informant effects. Different informants represent possible systematic errors that are referred to as method factors (Campbell & Fiske, 1959). Informant effects were measured by the degree of covariation between the unique factors for the variables reported by the same informant at a given measurement time (Kenny & Kashy, 1992). Stabilities within the same informants were estimated through auto–correlations over time (between the adjacent times) for unique factors for the same variable provided by the same informant (see McArdle & Nesselroade, 1994). First, we tested the significance of informant effects in studying temporal stability and cross–context consistency of hyperactivity–impulsivity by comparing two models—one with and one without informant effects. If there are no systematic informant variances, then the fit of the simpler model (the one without estimating informant effects) should not be significantly worse than the one estimating informant effects. Second, we compared the models with and without within–time cross–context correlations in order to appreciate the significance of cross–context consistency. If the home and the school contexts were completely independent, then the model fit should not significantly degrade for the simpler model (the one without estimating cross–context correlations) compared to the more complex model which estimates cross–context correlations.

Figure 1. Longitudinal factor model of context and informant effects for hyperactivity–impulsivity from 1st grade to 5th grade. CBCL = child behavior checklist; TRF = teacher report form; COS = classroom observation system; M = mother; F = father; T = teacher; O = observer; G1 = 1st grade; G3 = 3rd grade; G5 = 5th grade.
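The structure of the longitudinal factor model can be sketched in lavaan–style model syntax (`=~` for loadings, `~` for regressions, `~~` for covariances). The Python helper below only builds the syntax string; it is not the authors' Amos specification. The indicator names are hypothetical placeholders for the two CBCL items per parent, the two TRF items and the single COS rating, and the sketch frees every same–informant residual covariance and every adjacent–occasion residual autocorrelation, which may be more parameters than the authors actually estimated.

```python
# Hypothetical indicator names; the study's actual variable names are not given.
GRADES = ["G1", "G3", "G5"]
HOME = {"M": ["cbcl_hyper_M", "cbcl_impuls_M"], "F": ["cbcl_hyper_F", "cbcl_impuls_F"]}
SCHOOL = {"T": ["trf_hyper_T", "trf_impuls_T"], "O": ["cos_activity_O"]}


def build_model_syntax(informant_effects: bool = True) -> str:
    """Return lavaan-style syntax for a two-context, three-occasion factor model."""
    by_informant = list(HOME.values()) + list(SCHOOL.values())
    lines = []
    for g in GRADES:
        home_items = [f"{v}_{g}" for items in HOME.values() for v in items]
        school_items = [f"{v}_{g}" for items in SCHOOL.values() for v in items]
        # Context factors: parents define home, teacher and observer define school.
        lines.append(f"home_{g} =~ " + " + ".join(home_items))
        lines.append(f"school_{g} =~ " + " + ".join(school_items))
        # Within-occasion cross-context (residual) covariance.
        lines.append(f"home_{g} ~~ school_{g}")
    for prev, nxt in zip(GRADES, GRADES[1:]):
        # Between-occasion auto-regressive effects within the same context.
        lines.append(f"home_{nxt} ~ home_{prev}")
        lines.append(f"school_{nxt} ~ school_{prev}")
    if informant_effects:
        for g in GRADES:
            for items in by_informant:
                if len(items) == 2:
                    # Informant effect: covariance between unique factors of the
                    # two items rated by the same informant at the same occasion.
                    lines.append(f"{items[0]}_{g} ~~ {items[1]}_{g}")
        for prev, nxt in zip(GRADES, GRADES[1:]):
            for items in by_informant:
                for v in items:
                    # Stability of informant effects: autocorrelated unique factors.
                    lines.append(f"{v}_{prev} ~~ {v}_{nxt}")
    return "\n".join(lines)


print(build_model_syntax(informant_effects=True))
```

Calling `build_model_syntax(informant_effects=False)` yields the restricted model in which all informant effects are fixed to zero, which is the kind of nested comparison described in the text.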
We fitted a model that specified both informant effects and cross–context correlations. The correlations between unique factors (within a measurement time) represented informant effects, representing variances due to different sources of informants: Mother, father and teacher in 1st, 3rd and 5th grades. In addition, to consider the longitudinal stability of informant effects, correlations between unique factors of the corresponding variables for the same informant between two adjacent occasions were estimated. The longitudinal factor model with informant effects—‘the informant effect model’—fit the data well, χ2 (165) = 505.77, p = .00, CFI = .94, RMSEA = .05. In a subsequent model, the informant effects were fixed to zero. This model without informant effects provided a mediocre fit, χ2 (182) = 946.35, p = .00, CFI = .87, RMSEA = .07. Compared to the model with informant effects, the model fit for the model without informant effects degraded significantly, Δχ2 = 440.58, Δdf = 17, p < .05 and ΔCFI = .07.
Next, we compared the model with both informant effects and cross–context correlations with two alternative models that fixed cross–context correlations to zero in order to test context specificity, that is, whether the home and the school contexts are independent of each other. In this series of models, if we fixed one of the cross–context correlations to zero, the variance for the home context in 3rd grade became small yet negative (e.g. −.005). In order to avoid such a Heywood case (i.e. a negative variance estimate), the variance of the home context factor in 3rd grade and the covariance between the home and the school context factors in 3rd grade had to be fixed to zero. Consequently, we were not able to test the significance of the cross–context correlations in 3rd grade. We first tested the significance of the zero–order correlation between the home and the school contexts at 1st grade. This model without the cross–context correlation in 1st grade provided a mediocre fit, χ2 (168) = 830.76, p = .00, CFI = .88, RMSEA = .06. Compared to the model that estimates the 1st grade correlation between the home and the school contexts (with the variance of the 3rd grade home context and the 3rd grade cross–context covariance fixed to zero; χ2 (167) = 508.06, p = .00, CFI = .94, RMSEA = .05), the model without the 1st grade cross–context correlation had a significantly worse fit, Δχ2 = 322.70, Δdf = 1, p < .05 and ΔCFI = .06.
We examined the significance of cross–context consistency between the home and the school contexts in 5th grade by fixing the correlation between the two context factors to zero. In doing so, within–context stability was kept in the model in order to control for possible confounding effects of stability when examining cross–context consistency. This model without the cross–context correlation in 5th grade yielded an acceptable fit, χ2 (168) = 516.88, p = .00, CFI = .94, RMSEA = .05. Compared to the model with the cross–context correlation in 5th grade, the model fit did not meaningfully degrade by assuming a correlation of zero between the home and the school contexts in 5th grade: although the χ2 difference was statistically significant, the change in CFI was trivial, Δχ2 = 8.82, Δdf = 1, p < .05 and ΔCFI = .002.
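The three nested comparisons reported above can be checked arithmetically. The Python sketch below recomputes Δχ2, Δdf, the associated p value and ΔCFI from the reported fit statistics, using only the standard library (a closed–form χ2 tail probability for integer df). It assumes that the comparison model for the 5th grade test is the χ2 (167) = 508.06 model, which is inferred from the reported difference of 8.82; the function names are hypothetical. Because the CFIs are entered as reported to two decimals, ΔCFI for the 5th grade comparison comes out as 0 here rather than the .002 reported from unrounded values.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total


def compare_nested(chi2_r, df_r, cfi_r, chi2_f, df_f, cfi_f):
    """Compare a restricted model (r) against the fuller model (f) it is nested in."""
    d_chi2 = chi2_r - chi2_f
    d_df = df_r - df_f
    return {
        "d_chi2": round(d_chi2, 2),
        "d_df": d_df,
        "p": chi2_sf(d_chi2, d_df),
        "d_cfi": round(cfi_f - cfi_r, 3),
    }


# (restricted chi2, df, CFI, fuller chi2, df, CFI), as reported in the text.
comparisons = {
    "no informant effects": (946.35, 182, 0.87, 505.77, 165, 0.94),
    "no G1 cross-context correlation": (830.76, 168, 0.88, 508.06, 167, 0.94),
    "no G5 cross-context correlation": (516.88, 168, 0.94, 508.06, 167, 0.94),
}
for label, args in comparisons.items():
    result = compare_nested(*args)
    verdict = "meaningful" if result["d_cfi"] > 0.01 else "trivial"
    print(f"{label}: {result} -> {verdict} by the delta-CFI criterion")
```

The sketch makes the point of the ΔCFI criterion concrete: all three χ2 differences are statistically significant at N = 978, but only the first two exceed the .01 threshold for a meaningful change in fit.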
As shown in Table 3 and Figure 2, a closer examination of the significant coefficients in the best–fitting model—the informant effect model—indicated that all of the within–context stability coefficients were significant and substantial. There was a strong stability in the home context between 1st grade and 3rd grade (β = .97, p < .05), and between 3rd grade and 5th grade (β = .90, p < .05). Similarly, there was significant stability, with weaker magnitude compared to the home context, in hyperactivity–impulsivity measured in the school context between 1st grade and 3rd grade (β = .95, p < .05), and between 3rd grade and 5th grade (β = .69, p < .05). With respect to cross–context consistency, the concurrent correlation between the home and the school contexts in 1st grade was high and significant (r = .71, p < .05). In contrast, the association between hyperactivity–impulsivity measured in the home context and hyperactivity–impulsivity measured in the school context was significant but low in 5th grade (r = .25, p < .05), after controlling for the context–specific stabilities (i.e. auto–regressive effects of 3rd grade context on 5th grade context). It should be noted that the variances of home and school context factors as well as the covariance between them (r = .55, p = .20) in 3rd grade became non–significant after controlling for informant effects and context–specific stabilities. This may be in part due to extremely high percentages of variances that were explained by the auto–regressive effects of the 1st grade factors (i.e. 94% for the home context and 90% for the school context).
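The percentages of explained variance quoted above follow from squaring the standardized auto–regressive coefficients. This is a worked check that assumes each 3rd grade context factor has the corresponding 1st grade factor as its only predictor, as in the model described:

```latex
R^2_{\text{home, G3}} = \beta^2 = (.97)^2 \approx .94, \qquad
R^2_{\text{school, G3}} = \beta^2 = (.95)^2 \approx .90
```

Under the same assumption, the residual variance left to the 3rd grade factors is roughly 6% (home) and 10% (school), which is consistent with the non–significant factor variances and covariance reported at that occasion.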
Table 3. Unstandardized parameter estimates, standard errors and critical ratios for longitudinal factor models of hyperactivity–impulsivity from 1st grade to 5th grade
Note: CBCL = child behavior checklist; TRF = teacher report form; COS = classroom observation system; M = mother; F = father; T = teacher; O = observer; G1 = 1st grade; G3 = 3rd grade; G5 = 5th grade. The ‘ = ’ symbol means a parameter is fixed.
*p < .05.

Figure 2. Structural equation model showing standardized estimates of stability coefficients and cross–context correlations for hyperactivity–impulsivity.
We observed a general tendency for the context–specific stability to increase when considering informant effects that may have been confounded with context effects (see Table 3). Specifically, when we compared the longitudinal factor models with and without informant effects, stabilities for the home context increased from β = .95 to β = .97 for 1st grade–3rd grade but slightly decreased from β = .92 to β = .90 for 3rd grade–5th grade. Stabilities for the school context increased from β = .73 to β = .95 for 1st grade–3rd grade and from β = .64 to β = .69 for 3rd grade–5th grade.
An interesting trend was found for changes in the cross–context consistency (i.e. correlations between the home context and the school context latent factors) between models with and without informant effects. It appears that the cross–context correlations decrease over time while at the same time informant effects become less responsible for context–specificity as children age. Specifically, the correlation between the home and the school contexts increased from r = .62 to r = .71 in 1st grade and decreased from r = .37 to r = .25 in 5th grade. By estimating informant effects, the overlapping variances between the home and the school contexts notably increased (from r² = .38 to r² = .50) in 1st grade, whereas the common variance between the two contexts decreased in 5th grade (from r² = .14 to r² = .06). Finally, as for longitudinal stability of informant effects, auto–correlations over time for the unique factors ranged from .21 to .28 for mother reports and from .13 to .24 for father reports.
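The shared-variance figures in this paragraph follow from squaring the reported cross–context correlations. A minimal sketch of that arithmetic is given below; the dictionary layout and the function name `shared_variance_pct` are illustrative assumptions, and the inputs are the published two-decimal correlations rather than unrounded estimates.

```python
# Cross-context correlations reported in the text, as
# (model without informant effects, model with informant effects).
CROSS_CONTEXT_R = {
    "grade_1": (0.62, 0.71),
    "grade_5": (0.37, 0.25),
}


def shared_variance_pct(r: float) -> int:
    """Percent of variance shared by two factors correlated at r (r squared)."""
    return round(r ** 2 * 100)


shared = {
    grade: tuple(shared_variance_pct(r) for r in pair)
    for grade, pair in CROSS_CONTEXT_R.items()
}
print(shared)  # {'grade_1': (38, 50), 'grade_5': (14, 6)}
```

Read this way, estimating informant effects raises the overlap between contexts in 1st grade (38% to 50%) and lowers it in 5th grade (14% to 6%).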
Although we did not expect any specific gender differences in the structural relationships among the longitudinal measures of hyperactivity–impulsivity scores measured for the home and the school contexts, we explored possible differences between boys and girls in terms of the magnitude of within–context stabilities and cross–context consistencies using two–group structural equation models. In general, boys and girls did not seem to differ significantly with respect to within–context stabilities and cross–context consistencies. 1
Discussion
We used multi–method multi–informant scores based on a large, representative national sample to model context–specific stability (i.e. home vs. school) and cross–context consistency in hyperactivity–impulsivity while considering informant effects that may confound both context–specific stability and cross–context consistency. Our findings suggested (1) stronger longitudinal stability of hyperactivity–impulsivity within a context compared to cross–context consistency, (2) higher within–context stability in the home context than the school context and (3) systematic variances that were introduced by sharing a common measurement method (e.g. parent vs. teacher rating) and influenced the estimates of within–context stability and cross–context consistency.
Several findings of this study are notable as they add to the current knowledge base regarding context specificity and informant/method effects in child temperament. First, it is noteworthy that the magnitude of the estimates of cross–context consistency of hyperactive–impulsive behaviour was exceeded by the magnitude of the estimates of temporal stability within each context (regardless of whether or not informant effects were considered). Research examining genetic and environmental sources of variance in hyperactivity–impulsivity commonly reports significant genetic influences. High heritability for hyperactivity–impulsivity also has been indicated regardless of measurement instrument, informant and context (Derks, Hudziak, Dolan, van Beijsterveldt, Verhulst, & Boomsma, 2008; Kuntsi & Stevenson, 2001; Silberg et al., 1996). In particular, high heritability has been indicated for hyperactivity–impulsivity across middle childhood (Derks et al., 2008). For example, a heritability estimate of 88% for hyperactivity–impulsivity was reported in a recent study of 8–year–old twins (McLoughlin, Ronald, Kuntsi, Asherson, & Plomin, 2007). These genetic influences are thought to represent, at least in part, a biological basis for temperament traits. However, our findings demonstrated that context–specific variance in behaviour is systematic and not trivial even for the stable, trait–like temperament behaviour of hyperactivity–impulsivity. Just as important, these context effects may be only modestly confounded with systematic effects of informants. The correlations between hyperactive–impulsive behaviour in the home and the school contexts were significant and substantial (average r = .50) yet far from unity.
In comparison to prior analyses (see Kim, Mullineaux, Allen, & Deater–Deckard, in press) that examined other temperament behaviours such as attention regulation (part of a broader effortful control dimension) and frustration/anger (part of a negative affectivity dimension), hyperactivity–impulsivity (part of a surgency/extraversion dimension) seems to have lower levels of consistency across diverse contexts than attention (average r = .68) but higher levels of consistency across contexts than anger (average r = .38). Taken together, the findings lend support for prior research suggesting that the particular pattern of context stability may depend on the particular aspect of temperament or personality in question (e.g. Majdandzic & van den Boom, 2007).
Second, greater temporal stability would be expected for the home context because it is defined using the same informants over time (mothers and fathers) and represents the same stable home environment for the vast majority of children (e.g. Hanson, 1975). Furthermore, parental perceptions are based on a larger sample of daily experiences and their views of the child may be less likely to change over time (e.g. Majdandzic & van den Boom, 2007; Rothbart et al., 2001). In contrast, the school context is defined using different informants in the same ‘role’ (i.e. teacher or observer) across the time points, and the classrooms obviously are different at each time point. From a methodological point of view, lower levels of temporal stability for the school context factor compared to the home context factor may be due in part to the fact that the school context factor included observations of temperament, which have been shown to have lower stability than questionnaires. Our data were consistent with such an expectation, showing that the within–context stability was higher for the home context (average stability coefficient = .94) than for the school context (average stability coefficient = .69) before controlling for informant effects. The same pattern was found after controlling for informant effects (average stability coefficient = .94 for the home context and average stability coefficient = .82 for the school context). More generally, within–context stability estimates were higher than the stability estimates based on simple zero–order correlations, as we used the latent factor SEM approach that increased reliabilities of the variables by controlling for measurement errors as well as method variances due to informants.
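The average stability coefficients above can be reproduced from the four pairs of coefficients given in the Results. The sketch below is illustrative only: it assumes the averages are simple means of the published two-decimal coefficients with half-up rounding, whereas the authors presumably averaged unrounded estimates. The names `STABILITIES` and `average_stability` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Standardized stability coefficients (G1->G3, G3->G5) reported in the Results,
# kept as strings so that Decimal arithmetic is exact.
STABILITIES = {
    ("home", "without informant effects"): ("0.95", "0.92"),
    ("school", "without informant effects"): ("0.73", "0.64"),
    ("home", "with informant effects"): ("0.97", "0.90"),
    ("school", "with informant effects"): ("0.95", "0.69"),
}


def average_stability(coefficients):
    """Mean of the coefficients, rounded half up to two decimals."""
    values = [Decimal(c) for c in coefficients]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {key: average_stability(pair) for key, pair in STABILITIES.items()}
for (context, model), value in averages.items():
    print(f"{context:6s} {model:26s} {value}")
```

Exact decimal arithmetic is used because two of the means (.935 and .685) fall on a rounding boundary, where binary floating point and Python's default round-half-to-even rule could give a different second decimal.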
Therefore, these findings suggest that even after taking into consideration any systematic effects that result from a particular informant, children's expression of hyperactivity–impulsivity was influenced by context–specific effects due to factors related to environmental settings (e.g. Kraemer et al., 2003).
Interestingly, there was evidence for differences in developmental changes in the stability of hyperactivity–impulsivity across the home and the school contexts. Specifically, the variances explained by auto–regressive effects (e.g. regression effects of 1st grade home context factor on 3rd grade home context factor) were high and fairly comparable between the two contexts in 3rd grade (94% for home context and 90% for school context). However, in 5th grade, variances explained by auto–regressive effects were fairly stable and still high for the home context (81%), whereas such variances dropped by almost half in magnitude for the school context (47%). This notable reduction may be due in part to modest decreases in measurement reliability, especially in fifth grade, as seen in inter–rater reliabilities for the observation measures as well as inter–item correlations for the teacher ratings that composed the school context factor. Furthermore, when the variance attributable to potential informant effects was taken into consideration, the within–context stability increased notably for the school context behaviours (the average change in the stability coefficient = .13), but not for the home context (the average change in the stability coefficient = 0). Such findings seem to support the common but rarely tested assumption that the magnitude of informant effects on estimates of stability of behaviour will be stronger as a function of the number of informants being used. In other words, variances in stability estimates that can be attributed to informant bias are expected to be greater when larger numbers of informants are involved.
Third, there were significant informant effects that influenced stability within a context. The overall fit of the longitudinal factor model for hyperactivity–impulsivity significantly improved when the informant effects were estimated. Longitudinal stability within a context increased in general once the informant effects were taken into account, suggesting that children's hyperactive–impulsive behaviours were substantially stable over time. Our results support the recent findings of Cole and colleagues, who demonstrated that failure to include residual correlations that reflect shared method variance can change the meaning of the extracted latent variables and generate potentially misleading results (Cole, Ciesla, & Steiger, 2007). Our finding further implies that if we ignore informant effects in testing temporal stability of a psychological construct, we may obtain biased, attenuated estimates of stability.
Fourth, the cross–context consistency between the home and the school contexts within a measurement time changed when we controlled for confounding informant effects. Specifically, the variances that overlapped between the two contexts increased from 38 to 50% in 1st grade. The common variances decreased from 14 to 6% in 5th grade after controlling for informant effects as well as context–specific auto–regressive effects. These findings suggest that context–specific effects may become over– or under–estimated thus making children's behaviours look less or more consistent across different contexts (e.g. Zeman & Shipman, 1996) when informant effects are not appropriately addressed. Our findings lend support for the argument by prior researchers who emphasized the importance of the psychological assessment of children being a multi–informant and multi–method process (Frazier & Youngstrom, 2006; Kendall & Morris, 1991). By collecting data from multiple informants and modelling explicitly the informant/method variances, researchers can better control the potential bias in estimating context–specific effects arising from confounds with informant effects.
One of the limitations of our secondary data analysis was that we were constrained by the available data—the items that were consistently available across all four occasions. This limited our ability to construct more comprehensive home and school context factors of hyperactivity–impulsivity. As the estimates of stability and consistency are expected to be dependent upon the method of assessment, it is important to note that the CBCL/TRF items are rather narrow descriptions of the child's behaviour, compared to measures of temperament impulsivity and activity level (e.g. CBQ). The limitation with data availability also prevented us from constructing latent factors that represent informant effects, because we had fewer than three indicators for each informant.
This limitation aside, the present study illustrates an approach for deriving multi–method composite scores while also considering context–specific behaviour and controlling for systematic and transient informant effects. Our longitudinal factor model has broad applicability and is appropriate for use in other existing studies that measure child behaviour across parent, teacher/caregiver, tester, observer and self–ratings. The findings demonstrate how ‘perspective’ (i.e. characteristics of the informant) and ‘context’ (i.e. factors related to circumstances that might influence the subject's expression of the trait) interface with each other to influence testing of the stability of the ‘trait’ (Kraemer et al., 2003). It would be informative to apply the approach we used to answer questions about personality stability later in life when self–reports are available. Researchers are encouraged to consider both context and informant effects in studying the stability of personality traits among individuals using self–reports and others’ ratings (such as family member and peer ratings).
Furthermore, the current findings illuminate potentially different patterns of stability and change for key indicators of Rothbart et al.'s (2001) three child temperament dimensions. Specifically, temperament behaviours of a negative affectivity dimension (represented by frustration/anger) seem to be the most specific to the influences and changes in the environmental settings (e.g. Kim et al., in press), followed by temperament behaviours of a surgency/extraversion dimension (represented by hyperactivity–impulsivity behaviours). Temperament behaviours of an effortful control dimension (represented by attention regulation) seem to be the least specific to the influences and changes in the child's environment (e.g. Kim et al., in press). More broadly, our results affirm the significant context effects on individual differences in the development of temperament and personality, by illustrating significantly differential behaviours across different contexts that were not simply attributable to systematic informant bias with respect to hyperactivity–impulsivity—an aspect of temperament that is highly stable and easily observable by others. Our understanding of longitudinal stability in personality and behavioural development will be greatly enhanced by considering factors that contribute to context–specificity and context–consistency as well as informant–specific effects.
Acknowledgements
This work was supported by NICHD HD54481. The Study of Early Child Care and Youth Development was conducted by the NICHD Early Child Care Research Network and was supported by NICHD through a cooperative agreement that calls for scientific collaboration between the grantees and the NICHD staff. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Child Health and Human Development or the National Institutes of Health.
