Abstract
We examined the psychometric properties of an experience–sampling measure of affect (PANAS) using data from self– and peer reports. A multivariate multilevel model was used to assess the reliability of the latent PANAS scales at the within– and between–person level. Findings suggest satisfactory internal consistencies for self– and peer reports of affective experiences at both levels of analysis. Convergent and discriminant validity of the two affect scales were examined by means of a multilevel multitrait–multimethod approach (MLM–MTMM), which yielded distinct findings at the within– and between–person level. These findings provide further insights into the structural relations between the two PANAS scales: Whereas positive and negative affect were unrelated at the between–person level, they were negatively correlated at the within–person level. Copyright © 2010 John Wiley & Sons, Ltd.
Introduction
Measuring affective experiences as they unfold in people's daily lives and natural environments has been a core ambition of psychology since the early days of mood research (Flügel, 1925). After several decades of reduced interest, research into affect has prospered in recent decades (Ilies, Dimotakis, & Watson, 2010; Kaplan, Bradley, Luchman, & Haynes, 2009). This trend can probably be attributed to significant advances in two important areas that have improved and eased the measurement of affective experiences.
The first concerns the development of theoretically founded and psychometrically well–constructed scales to assess affective states and traits (e.g. Larsen, Diener, & Lucas, 2002; Steyer, Schwenkmezger, Notz, & Eid, 1997; Watson, Clark, & Tellegen, 1988). Certainly one of the most popular and widely used affect scales is the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988). This questionnaire measures people's affective experiences along the two broad dimensions of positive affect (PA) and negative affect (NA). According to the PANAS model, these dimensions should represent two unipolar factors that are largely independent of one another (Watson et al., 1988; see also Watson & Tellegen, 1985). Though there is more dissent than consent regarding the dimensional structure of affect in general, and of the PANAS model in particular (Russell & Carroll, 1999; Watson & Tellegen, 1985), the PANAS framework has stimulated a large body of research in both basic and applied psychology (e.g. Kaplan et al., 2009; Schmuckle, Egloff, & Burns, 2002). It thus seems fair to state that the PANAS has significantly contributed to the fact that ‘Psychology has rediscovered affect’ (Watson & Tellegen, 1985, p. 219).
The second area of progress that has notably revitalized scientific interest in affect concerns the intensive repeated assessment of affective experiences as they naturally occur in people's daily lives (e.g. Conner, Feldman Barrett, Tugade, & Tennen, 2007; Nezlek, 2007). Known by several labels, such as diary studies, daily assessment approach, ecological momentary assessment or experience–sampling method (ESM; Hektner, Schmidt, & Csikszentmihalyi, 2007), these data collection techniques refer to a methodology in which participants respond to repeated assessments over the course of time while functioning within their natural settings (Conner et al., 2007; Scollon, Kim–Prieto, & Diener, 2003). Particularly during the last 30 years, the ESM has enjoyed a boom in popularity. This has been closely associated with fast–paced technological and statistical advances that simplify the relatively complex assessment and analysis procedures that come along with this methodology (e.g. Conner et al., 2007).
Combining this innovative measurement technology with a well–constructed affect scale, like the PANAS, researchers today are in the position to capture affective experiences as they unfold in people's daily lives. Though it seems perfectly obvious that this strategy offers many advantages over other assessment strategies, such as retrospective self–report measures or laboratory designs (Conner et al., 2007; Hektner et al., 2007), the ESM should not be considered a panacea for measuring affective experiences. In fact, as with any other assessment strategy, the advantages of this methodology are only available on the condition that the repeated–assessment data are valid for their intended purpose. A sine qua non is that the employed affect measure proves to be sufficiently reliable and valid (Reis & Gable, 2000; Scollon et al., 2003).
The purpose of the present study was to evaluate the structure and the psychometric properties of the PANAS in a multi–rater ESM study on affective experiences in people's daily work lives. In doing so, the present study aimed to contribute to the ongoing debate concerning the underlying structure of the PANAS, that is, the relation between its two scales, PA and NA (e.g. Diener & Emmons, 1984; Ilies et al., 2010; Schmuckle et al., 2002). Beyond that, we aimed to demonstrate how multivariate multilevel modelling techniques (MLM; Hox, 2002) can be used to examine the reliability and construct validity of ESM measures.
Debating The Structure Of Affect
Watson et al. (1988; see also Watson & Tellegen, 1985) themselves define PA and NA as largely independent dimensions of affect, implying non–significant correlations between affective experiences of different valence. Supportive of this independence, the findings of quite a few studies suggest that the two broad affect factors have different etiologies and operate through different biological and behavioural mechanisms (Baker, Cesa, Gatz, & Mellins, 1992; Rafaeli & Revelle, 2006; Watson, 2000; Watson, Wiese, Vaidya, & Tellegen, 1999). Further support comes from studies showing that PA and NA are differentially correlated with a variety of relevant antecedents and outcomes of affect. Ilies et al. (2010), for example, have shown that blood pressure is significantly related to NA, but unrelated to PA.
These findings notwithstanding, the independence assumption has been challenged, not only by different theoretical arguments, but also by empirical findings suggesting that PA and NA might rather represent opposite poles of a single affect dimension. Green, Goldman and Salovey (1993), for instance, have emphasized the necessity to control for measurement error, because random and systematic error could artificially attenuate correlations between PA and NA. That is, in contrast to correlations that are based on raw data, estimating and removing the effects of measurement error in latent variable analysis should produce considerably stronger correlations between PA and NA (see also, Schmuckle et al., 2002).
The debate has been further complicated by the notion that the structure of affect should depend on the specific level and time frame of consideration (e.g. Diener & Emmons, 1984; Scollon, Diener, Oishi, & Biswas–Diener, 2005; Vansteelandt, van Mechelen, & Nezlek, 2005; Zelenski & Larsen, 2000). Reflecting the operation of different processes at distinct levels, different structural relationships between PA and NA should exist at the state level compared to the trait level. More precisely, the findings of several studies suggest that the proposed independence of PA and NA holds only between persons (i.e. at the trait level): Individuals who are generally high on trait PA could also be high (or low) on trait NA. Within persons (i.e. at the state level), on the other hand, PA and NA are negatively related: At any given moment, individuals experience either high levels of PA or high levels of NA, whereas it is unusual to experience high levels of both states at the same time (e.g. Schmuckle et al., 2002; Vansteelandt et al., 2005; Zelenski & Larsen, 2000).
For a quarter century, the structure–of–affect debate has stimulated numerous studies in which a variety of research designs, measurement instruments and analytic strategies have been used to provide evidence for one camp or the other (for reviews, see e.g. Russell & Carroll, 1999; Watson & Tellegen, 1999). Nevertheless, a clear and differentiated answer that acknowledges the complexity of the seemingly simple question on the structure of affect still has to be given. However, there is at least one insight that most researchers of both camps would generally agree on: Capturing people's affective experiences as they actually occur in daily life, the ESM approach can be considered the silver bullet to examine the underlying structure of affect (e.g. Ilies et al., 2010; Schmuckle et al., 2002).
Measuring Affective Experiences In Daily Life
The standard ESM study on affective experiences usually has two levels of analysis: The within–person level involving intraindividual processes (e.g. ‘do people vary in their affective experiences depending on the weather?’; Denissen, Penke, Butalid, & van Aken, 2008) and the between–person level encompassing interindividual differences (e.g. ‘do people differ in their affective reactions to weather?’). Several researchers have pointed out that correlations at the within–person level represent state conceptions of affect, whereas between–person correlations computed from aggregated ESM data refer to affective traits (e.g. Zelenski & Larsen, 2000). Allowing for an examination of intraindividual state processes while taking population–relevant between–person differences into account, ESM designs thus unify the specific merits of idiographic and nomothetic research strategies (Reis & Gable, 2000).
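The distinction between the two levels can be illustrated with a minimal sketch that splits observed scores into person means (between–person level) and person–centred deviations (within–person level). It is not the latent–variable model used in the present study; the function names and the toy data are invented for illustration, and the data are constructed so that the two levels show opposite correlations.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def between_within_correlations(records):
    """records: iterable of (person, pa, na) tuples, one per occasion.

    Returns (between_r, within_r): the correlation of the person means
    and the correlation of the person-centred deviations.
    """
    by_person = {}
    for person, pa, na in records:
        by_person.setdefault(person, []).append((pa, na))

    pa_means, na_means, pa_devs, na_devs = [], [], [], []
    for occasions in by_person.values():
        pa_mean = mean(p for p, _ in occasions)
        na_mean = mean(n for _, n in occasions)
        pa_means.append(pa_mean)
        na_means.append(na_mean)
        for p, n in occasions:
            pa_devs.append(p - pa_mean)
            na_devs.append(n - na_mean)
    return pearson(pa_means, na_means), pearson(pa_devs, na_devs)


# Invented toy data: (person, PA, NA) at two occasions per person.
toy = [
    ("a", 2, 3), ("a", 4, 1),
    ("b", 3, 4), ("b", 5, 2),
    ("c", 4, 5), ("c", 6, 3),
]
between_r, within_r = between_within_correlations(toy)
print(between_r, within_r)
```

In this toy example the person means are positively related while the occasion–level deviations are negatively related, which shows that a correlation at one level places no constraint on the correlation at the other.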
As a further benefit, ESM reduces several methodological problems that plague traditional questionnaire or laboratory studies (Conner et al., 2007; Furr, 2009; Scollon et al., 2003). In particular, because the constructs of interest are recorded in vivo and in situ, retrospection artefacts and several cognitive judgment biases can be diminished, and problems with artificial or unrealistic laboratory settings do not arise. In view of these advantages, it seems fair to conclude that ESM data provide quite a clear view of participants’ affective experiences in everyday life.
However, the ESM, like every other assessment strategy, has its own methodological difficulties and flaws (Bolger, Davis, & Rafaeli, 2003; Scollon et al., 2003). In particular, ESM reports are self–reports and thus still subject to problems such as social desirability or self–deception. Furthermore, the large number of repeated assessments imposes substantial demands on participants. To reduce these burdens, researchers usually prefer short scales and reduced item sets as ESM measures, running the risk of assessing the constructs at issue with rather low reliability and validity (Furr, 2009). As with any other assessment tool, it is thus indispensable to examine the psychometric quality of ESM data appropriately before interpreting the results.
Because the high ecological validity is repeatedly emphasized as the prime justification for ESM designs, studies reporting psychometric data for ESM measures are relatively sparse (Hektner et al., 2007). When psychometric issues are considered at all, the estimated reliability and validity coefficients are usually based on analyses of interindividual variance, referring to the between–person level. Despite the well–known fact that relationships at the between– and within–person levels are mathematically independent (Nezlek, 2007), the within–person variance has often been neglected (for exceptions see e.g. Bleidorn, 2009; Nezlek & Gable, 2001; Wilhelm & Schoebi, 2007).
Reliability Of Experience–Sampling Measures
The traditional procedures to assess reliability have often been modified to adapt as adequately as possible to the specific ESM data structure (Hektner et al., 2007). Several studies have examined modified split–half reliabilities of ESM measures (e.g. Csikszentmihalyi & Larson, 1987; Fleeson, 2001; Larson, Moneta, Richards, & Wilson, 2002). In these studies, usually one set of aggregated responses has been compared with a second set of aggregated responses from the same person (e.g. average ratings from the first vs second halves of the weeks). For instance, Larson et al. (2002) have reported split–week correlations for affect ratings between .55 and .67. Indicating the consistency of the average response pattern of a person over time, these coefficients exclusively refer to the between–person level (i.e. the ability of the ESM measure to differentiate between individuals). However, these coefficients do not tell us anything about the reliability at the within–person level (i.e. its ability to differentiate between measurement occasions).
Other ESM studies have estimated the internal consistency by calculating Cronbach's alpha on the basis of aggregated item means (averaged per person across measurement occasions). In an overview of several ESM studies on affective experiences, Hektner et al. (2007) report α coefficients between .70 and .90. Again, these coefficients solely refer to the between–person level but do not reveal the reliability of the ESM measures at the within–person level.
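The aggregation approach described above can be sketched as follows: item scores are first averaged per person across occasions, and Cronbach's alpha is then computed on these person–level item means. The helper names and the toy data are invented for illustration; the sketch is not taken from any of the cited studies.

```python
from statistics import mean, variance


def person_item_means(records):
    """records: iterable of (person, item_scores) pairs, one per occasion.

    Returns one row per person holding that person's mean on each item.
    """
    by_person = {}
    for person, scores in records:
        by_person.setdefault(person, []).append(scores)
    return [
        [mean(column) for column in zip(*occasions)]
        for occasions in by_person.values()
    ]


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of k item scores (one row per person)."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented toy data: three persons, two occasions each, three items.
toy = [
    ("a", [1, 2, 1]), ("a", [1, 0, 1]),
    ("b", [2, 2, 3]), ("b", [2, 2, 1]),
    ("c", [3, 4, 3]), ("c", [3, 2, 3]),
]
rows = person_item_means(toy)
alpha = cronbach_alpha(rows)
```

In the toy data, the occasion–to–occasion fluctuations cancel out in the person means, so alpha is computed only from differences between persons. This is why such a coefficient says nothing about how reliably the scale distinguishes occasions within a person.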
It is essential to note that an adequate estimation of the reliability at both the between– and the within–person level requires a proper consideration of the nested structure of ESM data as offered by MLM techniques (Hox, 2002; Raudenbush & Bryk, 2002). Specifically, multivariate multilevel models (with items nested within scales, scales nested within measurement occasions and occasions nested within persons) provide a framework to estimate between– and within–person reliabilities of latent ESM scale scores simultaneously (Bleidorn, 2009; Nezlek, 2007; Nezlek & Gable, 2001; Wilhelm & Schoebi, 2007). Since these techniques are comparatively new, ESM studies reporting level–specific reliabilities that have been derived from such models are still sparse. To the best of our knowledge, there is as yet no ESM study that has examined the reliability of the PANAS at both the within– and the between–person level of analysis simultaneously.
Validity Of Experience–Sampling Measures
When one is interested in constructs, like PA and NA, which are hypothetical in nature and cannot directly be observed, it is crucial to evaluate the construct validity of the measure—that is, the degree of convergence of the assessed (i.e. manifest) construct with the true (i.e. latent) construct (Shadish, Cook, & Campbell, 2002). Given the lack of knowledge about the true nature of a construct, convergence with related constructs and across different assessment methods for the same construct is usually considered the best available alternative.
Numerous previous ESM studies have used retrospective one–time questionnaires to validate their aggregated ESM scale scores (averaged per person across measurement occasions, for an overview, see Hektner et al., 2007). For instance, Kraan, Meertens, Hilwig, Volovics, Dijkman–Caes and Portegijs (1992) have found that aggregated ESM ratings of affective experiences correlate moderately but in expectable ways with clinical one–time instruments of depression and anxiety. Three aspects are critical to note with these studies: First, the obtained validity coefficients refer to aggregated ESM measures at the between–person level, but do not tell us anything about the validity at the within–person level. Second, low correlations between aggregated ESM measures and one–time questionnaires of affective experiences may not necessarily indicate a lack of validity, but may also be due to different purposes of the assessment strategies (Hektner et al., 2007). Finally, exclusively focusing on convergent validity, these studies only cover one aspect of construct validity. According to Campbell and Fiske (1959), one should also inspect discriminant validity to prove that the measure does not correlate with other constructs assumed to be unrelated.
The multitrait–multimethod matrix (MTMM) has fundamentally shaped researchers’ awareness of the essential role of multimethod research (Campbell & Fiske, 1959; Shrout & Fiske, 1995). The MTMM procedure involves the assessment of more than one theoretical construct using more than one assessment instrument for each construct, resulting in a matrix of correlations among these instruments. In ESM research, however, multimethod approaches supplementing ESM self–reports with other non–self–report measures are an exception (Scollon et al., 2003). In fact, there is as yet no ESM study that has examined convergent and discriminant validity of the constructs at issue at both levels of analysis using self– and peer reports in conjunction with an MTMM approach. On the one hand, this lack of research is probably due to technical difficulties that come along with a multi–rater ESM design, which impede the provision of a sufficient sample of data points at which both self– and peer raters have simultaneously responded to the ESM items. On the other hand, the paucity of multi–rater studies is typical for research into affective experiences. In fact, due to the subjective quality and internal nature of affective experiences, it is often assumed that it would be difficult to judge other people's affective experiences with a sufficient degree of accuracy. However, using traditional one–time measures, the few available multi–rater studies on affective experiences have revealed significant, although comparatively small, correlations between self– and peer ratings (e.g. Lucas, Diener, & Suh, 1996; Watson & Clark, 1991).
The Present Study
In spite of the large body of research that has been done in the course of the structure–of–affect debate, there still is no ESM study that has employed MLM procedures to assess the psychometric quality and underlying structure of the PANAS at both the within– and between–person level simultaneously, while controlling for measurement error. In order to bridge this particular gap, the present study aimed at addressing two distinct but related issues, namely the measurement and the underlying structure of affect.
More precisely, being interested in the psychometric quality of the ESM for assessing affective experiences, we aimed to examine the reliability and validity of the PANAS as an ESM measure. Employing a multi–rater ESM design in conjunction with multivariate MLM procedures, we were able to do this at both the within– and the between–person level simultaneously. In view of the good reliability and validity of the PANAS as a one–time measure of affective experiences (Watson & Vaidya, 2003), we expected this questionnaire to also show satisfactory psychometric properties as an ESM measure. That is, the two PANAS scales should reliably differentiate not only between persons but also between measurement occasions. In order to evaluate the construct validity of the PANAS at both the within– and the between–person level simultaneously, we capitalized on our multi–rater design and set up a multilevel multitrait–multimethod model (MLM–MTMM). We expected the PANAS to demonstrate construct validity at both levels of analysis. However, referring to current research on the structural links between PA and NA (e.g. Vansteelandt et al., 2005), a differential pattern of correlations between PA and NA was expected at the within– as opposed to the between–person level of analysis.
Reflecting conceptual rather than psychometric considerations, the latter assumption already leads into the major issue of the present study. That is, beyond our psychometric examination of the ESM for assessing affective experiences, we also aimed to contribute to the ongoing debate concerning the structural links between PA and NA. Controlling for measurement error, our MLM–MTMM approach allowed for a critical inspection of the relationships between PA and NA at the state and the trait level simultaneously. Referring to previous studies that have considered the specific level of analysis (e.g. Vansteelandt et al., 2005), we expected PA and NA to be independent at the between–person level, while there should be substantial negative correlations between PA and NA at the within–person level.
Method
Participants
Our multi–rater ESM approach called for participant dyads that spend a considerable amount of time together. On this account, scientific staff members of Bielefeld University who shared an office or worked closely together were asked to participate in the present study. Recruitment was conducted via e–mail distribution lists of the different departments at Bielefeld University. Initially, 52 research assistants (i.e. 26 participant dyads, each consisting of one self– and one peer rater) registered for the study. Since four dyads had to be excluded from analyses due to too many missing values (i.e. more than 15 corresponding affect ratings), the final sample consisted of 44 research assistants (= 22 dyads; 31 females, 13 males) employed at different departments of Bielefeld University (Chemistry, Economics, Law, Linguistics and Literary Studies, Mathematics, Pedagogy, Physics, Psychology, Sociology, Sports Science and Technology). Age of participants ranged between 23 and 43 years (M = 27.84, SD = 3.57). Average acquaintance between self– and peer raters was M = 1.87 years (SD = 1.62).
Procedure
Self– and peer ratings of momentary affect were recorded by means of an online survey tool. Each participant was personally introduced to the course of the study, with great care taken to explain why independent ratings of self– and peer raters were crucial to the quality of the study, and was strictly instructed to complete ratings without prior consultation.
During the ESM period, participants were repeatedly reminded via e–mail to complete the online survey at three pre–scheduled rating intervals per workday (Monday to Friday; 9–11
Measures
Affective states were measured with the German version of the PANAS (Krohne, Egloff, Kohlmann, & Tausch, 1996) consisting of 20 adjectives that pertain to the two broad dimensions of PA (active, alert, attentive, determined, enthusiastic, excited, inspired, interested, proud, strong) and NA (afraid, ashamed, distressed, guilty, hostile, irritable, jittery, nervous, scared, upset). At each measurement occasion, self–raters were asked to indicate how they felt ‘during the previous hour’ by rating the 20 PANAS adjectives on a 5–point unipolar response scale (1 = very slightly or not at all, 2 = a little, 3 = moderately, 4 = quite a bit, 5 = extremely). Using identical scales and instructions, peer raters indicated how their target persons felt ‘during the previous hour’.
At the end of the experience–sampling period, participants were asked to rate the degree to which study participation had influenced their everyday behaviour by means of a 5–point scale (1 = not at all, 2 = a little, 3 = moderately, 4 = quite a bit, 5 = extremely). According to both self–ratings (M = 1.46, SD = 0.51) and peer ratings (M = 1.67, SD = 0.86), participation in the present study did not considerably influence the target person's everyday behaviour.
Data analysis
Given the nested structure of ESM data, we drew on multivariate MLM procedures using the program HLM (Version 6; Raudenbush, Bryk, Cheong, & Congdon, 2000). The multivariate MLM approach is associated with many advantages (e.g. Bleidorn, 2009; Hox, 2002; Raudenbush & Bryk, 2002; Raudenbush, Rowan, & Kang, 1991; Snijders & Bosker, 1999), of which the following are of particular importance for the present study: First, MLM accounts for the fact that the measurement occasions are a random selection from the population of possible occasions. Second, unequal spacing between measurement occasions across participants can be handled appropriately. Third, the covariances between the self– and peer–rated PANAS scales can be decomposed over the within– and between–person levels of analysis. And finally, multivariate MLM allows for a level–specific examination of the psychometric properties of the PANAS.
To address the issues of the current study, an unconditional three–level model was estimated with the four PANAS scales (i.e. self–rated PA and NA and peer–rated PA and NA) as multiple dependent variables. Level 1 represented the variation among the item scores within each measurement occasion, level 2 represented the variation among occasions within persons and level 3 referred to the variation among persons. While level 1 exclusively served as a measurement model, levels 2 and 3 may be viewed as a multivariate two–level model for the latent true scores of self– and peer–rated PA and NA states (see Appendix, for the HLM command file containing the model specification input responses). The model can be appropriately represented in three stages, starting at level 1:
$$Y_{ijk} = \sum_{p=1}^{4} d_{pijk}\,\pi_{pjk} + e_{ijk} \qquad (1)$$
where $Y_{ijk}$ is the score on PANAS item $i$ at measurement occasion $j$ for target person $k$; $d_{pijk}$ is a dummy–coded indicator variable taking on the value of 1 if item $Y_i$ belongs to scale $p$ and 0 otherwise for the two self–rating scales of PA and NA and the two peer–rating scales of PA and NA, respectively (indexed by the subscripts 1 to 4); $\pi_{pjk}$ is the latent true score for target person $k$ at measurement occasion $j$ and $e_{ijk}$ is a measurement error assumed to be normally distributed with a mean of zero and a variance $\sigma^2$.
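The dummy–coded level–1 structure can be made concrete with a short sketch that stacks the item scores of one measurement occasion into long format, attaching to each score an indicator vector with a single 1 for the scale it belongs to. The scale labels, the function names and the record layout are assumptions made for illustration; they do not reproduce the actual HLM input files of the present study.

```python
# Order of the four scales, corresponding to p = 1..4 (labels are hypothetical).
SCALES = ("self_PA", "self_NA", "peer_PA", "peer_NA")


def indicator_row(scale):
    """Dummy codes d_1..d_4: 1 for the scale an item belongs to, else 0."""
    return [1 if scale == s else 0 for s in SCALES]


def stack_occasion(person, occasion, responses):
    """Turn one occasion's item scores into long-format level-1 records.

    responses: dict mapping a scale label to its list of item scores.
    """
    rows = []
    for scale in SCALES:
        for score in responses.get(scale, []):
            rows.append({
                "person": person,
                "occasion": occasion,
                "y": score,
                "d": indicator_row(scale),
            })
    return rows


# Invented example with two items per scale instead of ten.
rows = stack_occasion("k1", 1, {
    "self_PA": [3, 4], "self_NA": [1, 1],
    "peer_PA": [3, 3], "peer_NA": [2, 1],
})
```

Because exactly one indicator equals 1 in every record, each item score is linked to the latent true score of its own scale only, and the model needs no separate intercept.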
The error variance in each scale σ
At level 2, the latent true scores of each of the two self– and peer–rating PANAS scales $\pi_p$ were assumed to vary across occasions within target persons:
$$\pi_{pjk} = \beta_{p0k} + r_{pjk} \qquad (2)$$
where $\beta_{p0k}$ is the true score mean in scale $p$ of target person $k$, and $r_{pjk}$ is a random effect on scale $p$ associated with occasion $j$ in target person $k$. For each measurement occasion, the four random effects were assumed multivariate normal with means of zero and a 4–by–4 variance–covariance matrix $\tau_\pi$.
At level 3, the target person mean scores on the two latent self– and the two latent peer–rating PANAS scales vary around their respective grand means $\gamma_{p00}$:
$$\beta_{p0k} = \gamma_{p00} + u_{p0k} \qquad (3)$$
For each person, the random effects $u_{p0k}$ were assumed multivariate normal with means of zero and a 4–by–4 covariance matrix $\tau_\beta$.
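Substituting the level–3 model into the level–2 model, and the result into the level–1 model, gives a single combined equation. It is written out here only as a reading aid, using the symbols defined in the text; the vector notation $\mathbf{r}_{jk}$ and $\mathbf{u}_{k}$ for the stacked random effects is introduced for this sketch.

```latex
\begin{aligned}
Y_{ijk} &= \sum_{p=1}^{4} d_{pijk}\,\bigl(\gamma_{p00} + u_{p0k} + r_{pjk}\bigr) + e_{ijk},\\
e_{ijk} &\sim N(0,\ \sigma^{2}),\qquad
\mathbf{r}_{jk} \sim N_{4}(\mathbf{0},\ \tau_{\pi}),\qquad
\mathbf{u}_{k} \sim N_{4}(\mathbf{0},\ \tau_{\beta}).
\end{aligned}
```

Read this way, every item score consists of the grand mean of its scale, a person–specific deviation, an occasion–specific deviation and measurement error.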
As can be seen from Equations (1)–(3), the fixed part of this unconditional three–level model contains four regression coefficients for the indicator variables (one per scale $p$), which are the four overall means for the two self–rated and the two peer–rated PANAS scales. The random part contains two variance–covariance matrices, $\tau_\pi$ and $\tau_\beta$, and one level–1 variance $\sigma^2$. This model provided us with level–specific internal consistency coefficients for the latent self– and peer–rated PANAS scales¹. Furthermore, the true–score correlations among the self– and peer ratings at the within– and between–person level (reflecting a MLM–MTMM matrix) were estimated on the basis of the variance–covariance matrices $\tau_\pi$ and $\tau_\beta$.
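How such coefficients follow from the variance components can be sketched as below. The correlation step is the usual rescaling of a covariance matrix. The two reliability functions use one common formulation for a balanced three–level design; whether it matches the exact computation performed by HLM for the present data is an assumption. All numeric values are invented and are not estimates from this study.

```python
def cov_to_corr(tau):
    """Rescale a variance-covariance matrix to a correlation matrix."""
    n = len(tau)
    return [
        [tau[i][j] / (tau[i][i] * tau[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


def within_reliability(tau_pi_pp, sigma2, n_items):
    """Occasion-level reliability: true-score variance over total variance
    of an occasion's scale score (balanced design assumed)."""
    return tau_pi_pp / (tau_pi_pp + sigma2 / n_items)


def between_reliability(tau_beta_pp, tau_pi_pp, sigma2, n_items, n_occasions):
    """Person-level reliability of a mean over occasions (balanced design
    assumed)."""
    error = tau_pi_pp / n_occasions + sigma2 / (n_items * n_occasions)
    return tau_beta_pp / (tau_beta_pp + error)


# Invented variance components for one 10-item scale and 40 occasions.
corr = cov_to_corr([[4.0, -1.0], [-1.0, 1.0]])
alpha_within = within_reliability(0.3, 0.6, n_items=10)
alpha_between = between_reliability(0.3, 0.3, 0.6, n_items=10, n_occasions=40)
```

Under these formulas the between–person coefficient benefits from averaging over many occasions, whereas the within–person coefficient depends only on the number of items per occasion. This is one reason why the two coefficients can differ markedly for the same scale.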
Results
Table 1 presents the estimates for the fixed and random parts of the unconditional three–level model for the PANAS self– and peer ratings. The upper part shows the fixed effects ($\gamma_{p00}$) representing the true score means of PA and NA as rated by the target persons themselves and their peers. As could be expected on the grounds of previous research, both self– and peer ratings suggest that participants on average experienced higher levels of PA than NA at work (e.g. Miner, Glomb, & Hulin, 2005; Vansteelandt et al., 2005).
Table 1. Unconditional multivariate multilevel model for ESM self– and peer ratings on the Positive and Negative Affect Schedule
Note: Ns = 22 self–raters, Np = 22 peer raters, No = 939 measurement occasions. PANAS = Positive and Negative Affect Schedule; standard errors are in parentheses.
Below, the latent variable variance–covariance matrices of self– and peer–rated PA and NA are shown for the within– ($\tau_\pi$) and the between–person level ($\tau_\beta$). The diagonals of these matrices indicate that PA varied more extensively than NA at both levels of analysis and in both rating modes. For both self– and peer ratings, variation in PA was equally distributed over the within– and between–person level. NA, on the other hand, varied more extensively within than between persons.
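The comparison of the diagonal elements can be expressed as the share of latent true–score variance that lies between persons. The following is a minimal sketch; the function name and the numeric values are invented and are not the estimates reported in Table 1.

```python
def between_person_share(tau_beta_pp, tau_pi_pp):
    """Proportion of latent true-score variance located between persons."""
    return tau_beta_pp / (tau_beta_pp + tau_pi_pp)


# Invented values: equal variances give a share of one half, while a larger
# within-person variance pushes the share below one half.
share_equal = between_person_share(0.3, 0.3)
share_mostly_within = between_person_share(0.05, 0.15)
```

A share near one half corresponds to variation that is evenly distributed over both levels, and a share clearly below one half corresponds to a construct that fluctuates mainly within persons.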
The latent variance–covariance matrices provided us with the true–score correlations among self– and peer–rated PA and NA at the within– and between–person level, representing two level–specific MTMM matrices, which are shown in Table 2. The two diagonal lines show the level–specific internal consistencies for each of the four latent self– and peer–rated PANAS scales. At the between–person level (see the lower part of Table 2), internal consistencies ranged between .91 for peer–rated NA and .97 for both self– and peer–rated PA, suggesting that the PANAS employed as an ESM self– and peer–rating measure provides highly reliable estimates of trait PA and NA (for the average person across measurement occasions). At the within–person level (see the lower part of Table 2), coefficient alphas ranged between .50 for peer–rated NA and .88 for self–rated PA. That is, the PANAS can also be considered sufficiently reliable to differentiate between the states of PA and NA (within persons among measurement occasions).
Table 2. Multilevel MTMM analysis: Within–person and between–person correlations between self– and peer–reported positive and negative affect
Note: Ns = 22 self–raters, Np = 22 peer raters, No = 939 measurement occasions. PANAS = Positive and Negative Affect Schedule; coefficient alphas are in italics; convergent validity coefficients are in bold; p < .01.
At both levels of analysis, convergent and discriminant validity were assessed according to the classic criteria by Campbell and Fiske (1959). That is, we first inspected convergent validity and tested whether the correlations between self– and peer ratings of the same constructs (monotrait–heteromethod coefficients, mThM) were statistically significant and large enough to encourage further examination of discriminant validity². At the between–person level, convergent validity coefficients were significantly different from zero with substantial values of .51 for PA and .55 for NA. Reaching values of .45 for PA and .78 for NA, the convergent validity coefficients at the within–person level were also significant and large enough to warrant further inspection of the three criteria recommended to evaluate discriminant validity.
Inspecting the first criterion of discriminant validity, we tested whether convergent validity coefficients exceeded correlations between different constructs assessed with different methods (heterotrait–heteromethod coefficients, hThM). Examination of Table 2 showed that only one comparison failed to meet this criterion at the two levels of analysis. In particular, at the between–person level, the hThM correlation between peer–rated PA and self–rated NA did not significantly differ from the mThM correlation between self– and peer–rated PA (p > .05).
Campbell and Fiske's (1959) second criterion for discriminant validity requires that convergent validity coefficients exceed correlations between different constructs assessed with the same method (heterotrait–monomethod coefficients, hTmM). Again, only one comparison at the between–person level failed to meet this criterion. Specifically, the hTmM correlation between peer–rated PA and NA did not significantly differ from the convergent validity coefficients.
At this point, it should be noted that the failure of the between–person data to completely meet the first two criteria of discriminant validity might reflect distinctive methodological features of our ESM design rather than a lack of discriminant validity. In fact, although the present study exceeds common rules of thumb concerning adequate sample sizes in MLM analyses (e.g. Hox, 2002), our number of self– and peer raters might have been too small to detect differences at the between–person level with sufficient power.
Following Campbell and Fiske's (1959) third criterion of discriminant validity, we finally inspected the patterns of correlations between PA and NA at both the within– and the between–person level. In an ideal pattern, hTmM coefficients would not exceed hThM coefficients, indicating negligible effects of method–specific variance. Whereas hTmM coefficients did not significantly differ from hThM coefficients at the between–person level, they were significantly larger at the within–person level.
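The bookkeeping behind these comparisons can be sketched for a design with two traits and two methods. The sketch sorts the off–diagonal correlations into mThM, hThM and hTmM coefficients and then applies a simplified, purely descriptive version of the first two criteria: the weakest convergent coefficient must exceed the strongest absolute heterotrait coefficient. This is stricter than the pairwise comparisons and involves no significance tests, so it differs from the procedure used in the present study. The correlation matrix and all names are invented.

```python
from itertools import combinations

# (trait, method) for each row/column of the correlation matrix.
MEASURES = [("PA", "self"), ("NA", "self"), ("PA", "peer"), ("NA", "peer")]


def classify(a, b):
    """Label a pair of distinct measures as mThM, hThM or hTmM."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and not same_method:
        return "mThM"  # convergent validity coefficient
    if not same_trait and not same_method:
        return "hThM"
    return "hTmM"  # different traits, same method


def campbell_fiske(corr):
    """Group the correlations and check the simplified criteria 1 and 2."""
    groups = {"mThM": [], "hThM": [], "hTmM": []}
    for i, j in combinations(range(len(MEASURES)), 2):
        groups[classify(MEASURES[i], MEASURES[j])].append(corr[i][j])
    weakest_convergent = min(groups["mThM"])
    return {
        "groups": groups,
        "criterion_1": weakest_convergent > max(abs(r) for r in groups["hThM"]),
        "criterion_2": weakest_convergent > max(abs(r) for r in groups["hTmM"]),
    }


# Invented correlation matrix in the order given by MEASURES.
toy_corr = [
    [1.00, -0.40, 0.50, -0.20],
    [-0.40, 1.00, -0.10, 0.60],
    [0.50, -0.10, 1.00, -0.30],
    [-0.20, 0.60, -0.30, 1.00],
]
result = campbell_fiske(toy_corr)
```

Absolute values are used for the heterotrait coefficients because PA and NA can be negatively related. Treating magnitude rather than sign as the relevant quantity is a design choice of this sketch.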
In sum, the visual inspection of the two separate MLM–MTMM matrices did largely support convergent and discriminant validity of state and trait PA and NA at both levels of analysis. However, there also were some striking differences between the correlational patterns at the within– as opposed to the between–person level pointing to distinct structural relations between PA and NA at the two levels of analysis.
In line with previous studies that have considered the specific level of analysis (e.g. Schmuckle et al., 2002), we found significant within–person correlations between the states of PA and NA, whereas trait PA and NA appeared to be unrelated at the between–person level. When interpreting this finding, it is essential to consider two specifics of our analysis: First, our multivariate MLM analysis allowed us to estimate the true–score correlations between the latent affect variables avoiding an artificial attenuation of correlations due to measurement error. Second, this finding could be replicated with self– and peer ratings and also proved to be robust across these rating modes. In essence, though PA and NA turned out to be independent at the between–person level, they were negatively correlated at the within–person level.
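The distinction between the two levels can be made concrete with a small sketch. It assumes a simple observed–score decomposition: person means carry the between–person information, and deviations from each person's own mean carry the within–person information. This is not the latent multivariate MLM estimated in the present study, so the resulting correlations are not corrected for measurement error. The data and the names `HYPOTHETICAL_RATINGS`, `pearson` and `split_correlations` are invented for illustration.

```python
from math import sqrt

# Hypothetical ratings: person -> list of (PA, NA) pairs, one per occasion.
HYPOTHETICAL_RATINGS = {
    "p1": [(0.0, 2.0), (1.0, 0.5), (2.0, 0.5)],
    "p2": [(0.0, 4.0), (1.0, 2.5), (2.0, 2.5)],
    "p3": [(2.0, 2.0), (3.0, 0.5), (4.0, 0.5)],
    "p4": [(2.0, 4.0), (3.0, 2.5), (4.0, 2.5)],
}


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_correlations(ratings):
    """Return (between-person r, within-person r) for PA and NA."""
    mean_pa, mean_na, dev_pa, dev_na = [], [], [], []
    for occasions in ratings.values():
        pa = [o[0] for o in occasions]
        na = [o[1] for o in occasions]
        m_pa, m_na = sum(pa) / len(pa), sum(na) / len(na)
        # Between-person part: one mean per person.
        mean_pa.append(m_pa)
        mean_na.append(m_na)
        # Within-person part: deviations from the person's own mean.
        dev_pa.extend(v - m_pa for v in pa)
        dev_na.extend(v - m_na for v in na)
    return pearson(mean_pa, mean_na), pearson(dev_pa, dev_na)


between_r, within_r = split_correlations(HYPOTHETICAL_RATINGS)
print(round(between_r, 3), round(within_r, 3))
```

In this toy data set the person means of PA and NA are uncorrelated, whereas the momentary deviations are strongly negatively correlated, so the same ratings yield different structural pictures depending on the level inspected.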
Discussion
The present study aimed at demonstrating how to use a multivariate MLM framework to examine the psychometric quality and the underlying structure of the PANAS employed in a multi–rater ESM design. Pointing to the importance of considering both levels of analysis, the results of our MLM–MTMM analysis show that different constructs with distinct structures are measured at the between– and the within–person level, namely affective traits and states, respectively.
With regard to this level–specific structure, the results of our psychometric examination suggest that the self– as well as the peer rating version of the PANAS differentiates reliably both between and within persons. However, perfectly exemplifying the importance of a level–specific psychometric examination in ESM studies, the alpha coefficients were somewhat smaller at the within–person level than those at the between–person level. In more general terms, these findings nicely point out that reliability is not an intrinsic property of a scale but always a property of a scale in a certain population (of participants, measurement occasions, etc.; e.g. Laenen, Vangeneugden, Geys, & Molenberghs, 2006).
Three further aspects are critical to note with respect to the revealed within–person consistencies. First, the within–person reliabilities refer to single measurement occasions from which the stable between–person variation had already been removed.
Second, in contrast to the between–person reliabilities, which depend on the number of sampled occasions and the degree of intercorrelation among them, the within–person reliabilities depend on the number of items per scale and the degree of intercorrelation among them (Raudenbush et al., 1991). With this in mind, it becomes clear that the comparatively small variance in NA probably impaired the within–person reliability to a greater degree than the between–person reliability. The relatively low variation in NA can certainly be attributed to the specific setting in which the affective states were repeatedly assessed. Previous studies have shown that, compared to other settings, mean levels of NA as well as variation in NA are usually relatively low at work (e.g. Ilies et al., 2010; Miner et al., 2005).
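The dependencies described above can be illustrated with a short sketch. It assumes a three–level variance decomposition (items nested in occasions nested in persons) and the reliability expressions commonly associated with that kind of model; whether these expressions match the exact estimators used in the present analysis is an assumption. All variance components and the function names are hypothetical.

```python
def within_person_reliability(tau_occasion, sigma2_item, n_items):
    """Occasion-level reliability: depends on the number of items per scale."""
    return tau_occasion / (tau_occasion + sigma2_item / n_items)


def between_person_reliability(tau_person, tau_occasion, sigma2_item,
                               n_items, n_occasions):
    """Person-level reliability: also depends on the number of occasions."""
    error = tau_occasion / n_occasions + sigma2_item / (n_items * n_occasions)
    return tau_person / (tau_person + error)


# Hypothetical variance components (invented for illustration only).
SIGMA2_ITEM = 0.50    # item-level residual variance
TAU_PERSON = 0.30     # variance of person means

# A scale with little occasion-to-occasion variation (as described for NA)
# versus one with more variation, both measured with 10 items.
low_var = within_person_reliability(0.10, SIGMA2_ITEM, n_items=10)
high_var = within_person_reliability(0.40, SIGMA2_ITEM, n_items=10)

# Between-person reliability rises as more occasions are sampled.
few_occ = between_person_reliability(TAU_PERSON, 0.10, SIGMA2_ITEM, 10, 5)
many_occ = between_person_reliability(TAU_PERSON, 0.10, SIGMA2_ITEM, 10, 30)

print(round(low_var, 3), round(high_var, 3))
print(round(few_occ, 3), round(many_occ, 3))
```

Under these assumptions, shrinking the occasion–level variance lowers the within–person reliability even when the number of items is unchanged, while sampling more occasions raises only the between–person reliability.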
Finally, calculating the MLM type of alpha coefficients, we decided to report well–established and widely accepted reliability estimators that can be flexibly applied to any kind of composite affect measure (Osburn, 2000). However, coefficient alpha, like any other statistic, is not without problems. One aspect to keep in mind with our results is certainly the fact that alpha must be considered a lower bound to reliability, as it tends to underestimate the true reliability of a scale (e.g. Osburn, 2000; Sijtsma, 2009). 3
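For orientation, the classical single–level form of coefficient alpha can be computed as follows. This is a minimal sketch of the textbook formula, not the MLM type of alpha reported here, which is obtained from level–specific variance components. The item scores and the names `cronbach_alpha` and `HYPOTHETICAL_ITEMS` are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Classical coefficient alpha.

    `items` is a list of item score lists, one list per item, with all
    lists ordered by the same respondents (or occasions).
    """
    k = len(items)
    # Composite score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical scores for three items answered by five respondents.
HYPOTHETICAL_ITEMS = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(HYPOTHETICAL_ITEMS)
print(round(alpha, 3))
```

Because alpha rests on the ratio of summed item variances to composite variance, it reaches 1 only when all items covary perfectly, which is one way to see why it is treated as a lower bound rather than an exact estimate.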
Beyond level–specific reliabilities, we also examined the convergent and discriminant validity of the PANAS as an ESM measure by means of the level–specific MLM–MTMM matrices. In fact, this is the first rigorous analysis to show that the PANAS is a reasonable ESM instrument for assessing the constructs of PA and NA at both the within– and the between–person level with sufficient convergence. Reaching values of about .50, the revealed self–peer correlations clearly exceed the magnitude of convergence coefficients usually reported in multi–rater studies on affective experiences (e.g. Funder, 1989; Lucas et al., 1996; Watson & Clark, 1991). These fairly large convergence coefficients might reflect the aforementioned advantages of ESM designs and result from the reduction of retrospection and other judgment biases that are usually associated with traditional one–time questionnaires.
The seeming failure of our ESM data to completely meet the criteria of discriminant validity can be understood by considering the distinctive features of our multi–rater ESM design on the one hand, and the nature of the examined constructs on the other hand. An important point to keep in mind is that the subjective and internal components of affective experiences are usually covert and not directly accessible to external observers. That is, the peer raters’ ability to discriminate appropriately between the target person's PA and NA is partly impaired by the fact that their ratings are solely based on overt behavioural expressions of these states, which can be assumed to be rather restrained at the workplace (Lucas et al., 1996). Our multi–rater ESM design can thus be considered a rather conservative approach to evaluate the level–specific convergent and discriminant validities of the PANAS.
Furthermore, the classic MTMM criteria refer to a case in which the constructs being examined are assumed to be absolutely independent, so that any correlation would indicate the existence of method variance. However, Campbell and Fiske (1959) themselves emphasized the necessity to examine the particular nature of the characteristics under investigation (and whether they are assumed to be independent) before evaluating the criteria of discriminant validity (see also, Lucas et al., 1996). In the present research, we expected some relation between PA and NA, particularly at the within–person level of analysis. Hence, even if there had been absolutely no method variance at all, we would not have expected within–person correlations between PA and NA of zero.
Considering both the within– and the between–person level simultaneously and controlling for measurement error, our results thus do not negate discriminant validity, but rather confirm the assumption that the structural links between PA and NA depend on the specific level of consideration (Diener & Emmons, 1984; Schmuckle et al., 2002). Getting at the heart of the ongoing debate on the underlying structure of the PANAS, the results of our MLM–MTMM matrices suggest that PA and NA are independent at the between–person level, but negatively interrelated at the within–person level.
Please note that these findings exclusively refer to the average structure of affective traits and states at the between– and the within–person level, respectively (i.e. averaged across measurement occasions and participants). Thus, our results do not negate the possibility that there might be specific situations in which certain people are more likely to experience mixed feelings (i.e. the co–experience of oppositely poled affective states). In fact, recent research has shown that the within–person links between PA and NA can notably differ across time as well as between individuals (Rafaeli, Rogers, & Revelle, 2007).
However, the co–occurrence of PA and NA seems to be a comparatively rare phenomenon (Scollon et al., 2005). Given the situational embedding of our participants’ affect ratings, it should have been even rarer in our particular ESM study, since most situations in daily work life tend not to evoke strong ambivalent feelings but are either mainly positive or negative. Furthermore, people's limited attentional resources should restrict the scope of their affective experiences at any given moment. Presenting participants with pleasant and unpleasant stimuli simultaneously, Schimmack (2001) showed that people are more likely to report mixed feelings at rather low levels of intensity. The co–occurrence of intense mixed feelings seems generally to be rare, because people increasingly concentrate their attention on the particular positive or negative experience as the intensity of affect increases.
This interpretation is in line with the assumption that PA and NA operate through two separate neurological systems which work in a mutually inhibitory way, such that the activation of the one system inhibits the other at a given point in time (e.g. Scollon et al., 2005; Watson et al., 1999). Over longer time periods, however, the inhibitory effects should vanish, so that the experience of PA or NA, respectively, is not likely to suppress the other type of affect at a later point in time. Thus, at the between–person level, the traits of PA and NA can be independent or even positively correlated.
Limitations And Future Directions
Regarding the rather strong restrictions of the multivariate multilevel regression approach chosen to estimate the MLM–MTMM model, it should be noted that multilevel structural equation procedures (e.g. Hox, 2002; Muthén, 1994) can offer more flexible and less restrictive modelling strategies. However, independent of the number of sampled within–part elements (i.e. measurement occasions), very large samples of between–part elements (i.e. participants) are necessary to obtain accurate statistics with these computationally demanding methods (Hox, Maas, & Brinkhuis, 2010). Obtaining large sample sizes is already problematic with ESM studies in general, and with multi–rater ESM designs in particular, because these studies are not only quite difficult, time–consuming and costly on the part of the researcher but also impose heavy demands on the participants. Yet, not only research into affective experiences, but ESM research in general would certainly benefit from ESM studies with large participant samples allowing the use of even more advanced modelling techniques than our multivariate multilevel regression approach.
A further note should be made with respect to our participants. Given that the present findings were obtained from a sample of research assistants providing affect ratings during their workday, future research would indeed benefit from further multi–rater ESM studies on other samples of participants and measurement occasions.
Conclusion
Employing a well–established affect measure in an ESM design and using multivariate MLM techniques on multi–rater data from an adult non–student sample, the present study extends the contributions of previous affect research in at least two domains. First, our multivariate MLM framework demonstrates a flexible strategy to examine the reliability and construct validity of ESM measures at both the within– and the between–person level simultaneously. Second, pointing to the necessity to consider the specific level of analysis, the present study offers valuable clues to the structural links between PA and NA, considered as states at the within–person level and as traits at the between–person level, respectively.
With regard to the PANAS’ popularity in research as well as in applied psychology, a precise and level–specific understanding of its psychometric quality and underlying structure is an essential prerequisite from a theoretical as well as from a practical standpoint. From a practical perspective, for instance, the usefulness of clinical or managerial interventions that aim to diagnose or influence affect depends (1) on a reliable and valid measurement and (2) on a sophisticated knowledge of the structural conception of the constructs being examined. The findings of our MLM–MTMM approach illustrate that the psychometric properties as well as the structure of the PANAS depend on the particular level of consideration: Whereas PA and NA seem to be best conceptualized as two largely independent trait dimensions at the between–person level, they rather represent two poles of a single state dimension at the within–person level. We hope that this paper encourages further ESM studies focusing on affective experiences or other fluctuating characteristics to carefully evaluate the psychometric quality and underlying structure of their repeated assessment data at the specific level at issue.
