Abstract
A large number of studies have identified trajectories of adjustment following acute and aversive life events. In these studies, a stable trajectory of positive health or resilience is almost always the modal outcome (Bonanno, 2004; Bonanno et al., 2011). Infurna and Luthar (2016, this issue) reported that they replicated findings from two early studies in which trajectories of subjective well-being were identified before and after divorce, widowhood, and unemployment (Galatzer-Levy, Bonanno, & Mancini, 2010; Mancini, Bonanno, & Clark, 2011) and then reanalyzed these data in a way that led them to conclude that resilience was less prevalent. In this commentary, we discuss three serious flaws in Infurna and Luthar’s claims. First, they did not actually replicate our original analyses. They used different data, time points, and parameters. Second, the model specifications in their reanalyses were not optimal because they increased variance, reduced variability in response to the stressor, and had lower entropy, indicating that their models more poorly captured unique patterns of response. Third, their reanalyses were theoretically uninformative as they minimized both group differences and overall responses to the stressor event and thus failed to identify widely acknowledged subpopulations, such as those showing chronic stress reactivity.
Roughly a decade ago, Bonanno (2004) challenged two strongly held but untested assumptions about adaptation to potentially traumatic and major life stressors: (a) Resilience is uncommon, and (b) people respond to life stressors in a relatively uniform manner. Traditional statistical procedures that assume homogeneity had seemed to confirm these assumptions because these procedures lump all participants into a single distribution. As a result, distinct subpopulations that suggested other patterns, such as chronic distress, recovery, and resilience, were unobservable. However, as researchers began to explore alternative methods to examine variation over time, the classic assumptions about life stressors became untenable. For example, simply plotting individual trajectories of depression in a sample of bereaved individuals from before to after their loss revealed that their responses were not uniform, but rather followed qualitatively distinct trajectories, with the majority of individuals showing low symptom scores both before and after their loss, a pattern consistent with resilience (Bonanno et al., 2002). These were controversial findings at the time and required both replication in other populations and replication using more sophisticated modeling procedures. Since then, however, such findings have been replicated in dozens of studies, in multiple populations, using multiple outcome measures (sometimes in the same study), and by researchers working independently and employing varied modeling procedures (Bonanno et al., 2011; Bonanno & Diminich, 2013).
Infurna and Luthar (2016, this issue) report that they replicated findings of two early growth mixture modeling (GMM) studies that identified trajectories of subjective well-being before and after divorce, widowhood, and unemployment (Galatzer-Levy, Bonanno, & Mancini, 2010; Mancini, Bonanno, & Clark, 2011). They next reanalyzed these data in a way that led them to conclude a decrease in the prevalence of resilience, casting doubt on our initial findings as well as the larger body of literature. In this commentary, we describe three flaws in Infurna and Luthar’s claims: (a) They did not actually replicate the original analyses, (b) the model specifications in their reanalyses were not optimal, and (c) the results produced by their reanalyses were theoretically uninformative.
Our Original Analyses Were Not Replicated
Infurna and Luthar assert that they first replicated our original findings by conducting analyses “exactly the same as those in earlier publications” (p. 178). This assertion is inaccurate. Infurna and Luthar did analyze data from the German Socio-Economic Panel study, as we had. However, they used a different segment of these data obtained at historically different points in time; had different sample sizes and thus different participants; and used more time points, including more time points prior to the stressor. Although they claim to have replicated our findings, their solutions were substantially distinct from ours; they identified fewer classes and differed on key parameters. If we consider the job loss data (Galatzer-Levy et al., 2010), for example, our analysis indicated good fit for up to four classes, whereas their analysis indicated adequate fit only for a three-class solution and evidenced greater within-class variance.
Infurna and Luthar report that their replication used model specifications that were “identical” to our original analysis. However, their model specifications also differed in several important ways. For example, they state that we had fixed the slope parameter in each of the original analyses. However, in our job loss analysis, the slope parameter had in fact been freely estimated. In addition, they fixed the correlation (covariance) between the slope and the intercept in each analysis. However, in none of our final models was the covariance for these parameters fixed. Given that their key argument is that slight alterations to parameters can radically alter the resulting model, the lack of initial replication is important because the replication is the foundation for testing the effects of model alterations.
The Reanalyses Were Not Optimal
The primary modifications in Infurna and Luthar’s reanalysis were that they freed all slope variances within and between classes (see the Replication: Part 2 section, pp. 181–183) and created additional latent factors to represent pre- and postevent change (see the Replication: Part 3 section, pp. 183–185). Their stated goal in making these parameter modifications was to allow greater variation “among individuals within each of the trajectories” (p. 178). This is not an unreasonable aim. Indeed, any number of parameter modifications may be explored, not a priori as Infurna and Luthar incorrectly state, but as a routine phase of model development (Jung & Wickrama, 2008).
The crucial question, however, is whether parameter modifications improve fit and provide more interpretable solutions (see below). Fit statistics in GMM are relative and can only adjudicate within, but not between, models. However, entropy values are an absolute measure reflective of the degree of certainty in classification, and models with higher entropy should be favored when fit indices are similar (Ram & Grimm, 2009; Ramaswamy, Desarbo, Reibstein, & Robinson, 1993). Crucially, entropy values in Infurna and Luthar’s reanalyses dropped precipitously (.53–.62), indicating increased uncertainty and poorer classification accuracy relative to the original models. For example, entropy was .77 in the original model for job loss, but only .62 in the reanalyses. Thus, the relatively lower entropy values indicate that Infurna and Luthar’s reanalyzed models were less optimal or “fuzzier” than the models created by the original analyses.
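To make the entropy comparison concrete, the following is a minimal sketch of the relative entropy index as it is commonly computed from posterior class-membership probabilities, following the general form attributed to Ramaswamy et al. (1993). The function name and the toy probabilities are hypothetical and are not drawn from either set of analyses; the sketch only illustrates why less decisive class assignments yield a lower value.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy index for a mixture model classification.

    posteriors: list of rows, one per person, each row holding that
    person's posterior probabilities of membership in each of K classes.
    Returns a value between 0 (assignments no better than chance)
    and 1 (every person assigned to a class with certainty).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if n == 0 or k < 2:
        raise ValueError("need at least one person and two classes")

    # Sum of -p * ln(p) over all persons and classes; terms with p == 0
    # contribute nothing (the limit of p * ln(p) as p -> 0 is 0).
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:
                uncertainty -= p * math.log(p)

    # Normalize by the maximum possible uncertainty, n * ln(K).
    return 1.0 - uncertainty / (n * math.log(k))


# Hypothetical posterior probabilities for four people and two classes.
sharp = [[0.95, 0.05], [0.90, 0.10], [0.05, 0.95], [0.10, 0.90]]
fuzzy = [[0.65, 0.35], [0.60, 0.40], [0.35, 0.65], [0.40, 0.60]]

print(relative_entropy(sharp) > relative_entropy(fuzzy))  # prints True
```

In this toy case the "fuzzy" assignments, in which each person has a substantial probability of belonging to either class, produce a markedly lower index than the "sharp" assignments, which is the sense in which lower entropy reflects poorer classification certainty.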
By introducing greater within-class variation, Infurna and Luthar reduced sensitivity to the stressor event. GMMs with random parameters often have convergence problems. We suspect that Infurna and Luthar avoided this issue by including a greater number of pre-event time points. Although this approach may be useful in analyses of broad developmental trends, when applied to acute stressors it appears to dilute their impact. In our original analyses, the trajectories diverged by as much as 5 scale points at the time of the stressor. By contrast, the trajectories in Infurna and Luthar’s reanalysis (see their Figs. 2–4) never deviated in response to the stressor by more than 1 scale point. This is best explained by the dramatic increase in variance around the intercept in their largest class. This class formed a kind of heterogeneous, catchall group with so much intraindividual variation that its trajectory (i.e., mean across time) showed only minimal response to the stressor event.
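The dilution described above can be illustrated with simple arithmetic. The sketch below uses purely hypothetical proportions and change scores (they are not estimates from either set of analyses) to show how the mean trajectory of a catchall class can move by less than 1 scale point even when a minority of its members change by 5 points.

```python
def pooled_mean_change(groups):
    """Mean change at the stressor for a class that pools several subgroups.

    groups: list of (proportion, mean_change) pairs whose proportions
    sum to 1. Returns the proportion-weighted average change.
    """
    total = sum(proportion for proportion, _ in groups)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(proportion * change for proportion, change in groups)


# Hypothetical catchall class: 85% show no change at the stressor,
# 15% drop by 5 scale points. These numbers are illustrative only.
catchall = [(0.85, 0.0), (0.15, -5.0)]

# The pooled mean drops by roughly three quarters of a scale point,
# masking the 5-point decline in the smaller subgroup.
print(round(pooled_mean_change(catchall), 2))
```

Under these assumed values the class mean shifts by well under 1 scale point, so a large response in a minority of members would not be visible in the class trajectory.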
The Reanalyses Were Theoretically Uninformative
Virtually all theorists in this area agree that both model building (e.g., selection of number of time points) and final model selections depend heavily on theoretical justification and interpretability of the results (e.g., Grimm & Ram, 2009; Jung & Wickrama, 2008; Muthén, 2004). The most informative models are those that provide “a reasonable fit to theory, or to prior research” (Feldman, Masyn, & Conger, 2009, p. 670). If nothing were known about how people responded to acutely aversive life events, alternative model outcomes such as Infurna and Luthar’s would have credence as one possible set of results among plausible alternatives. However, decades of research on acute stressors in both animal and human samples have consistently produced evidence of outcome heterogeneity, with prototypic trajectories of response such as resilience, recovery, and chronic distress commonly observed (Bonanno et al., 2011; Galatzer-Levy, Bonanno, Bush, & LeDoux, 2013). Infurna and Luthar’s reanalysis failed to identify these well-replicated trajectory distinctions, including the distinction between resilience and recovery. Indeed, as they acknowledged, their “resilient and recovery trajectories each fell within the 95% CIs of the other across all life events, suggesting, in fact, that they are not necessarily distinct” (p. 189). But if their two-class solutions were not meaningfully distinct, as they concluded, then these models are no more informative than a simple, single group averaged across time. Not only does this result render questions about the prevalence of resilience moot, it also suggests that the reanalyzed models provide no more information than what could be learned from already available methods, such as repeated-measures analyses of variance.
The picture of reality suggested by Infurna and Luthar’s reanalysis is one in which major life stressors do not exert a strong impact on any group. This outcome is both theoretically uninformative and implausible, especially in the case of bereavement, where it appears to deny that any bereaved people suffered lasting emotional damage. Although there has been controversy about the specific criteria for a grief-related diagnosis, there is little dispute that a subset of bereaved individuals, usually in the range of 10% to 15%, will typically experience a dramatic increase in symptoms (or decrease in life satisfaction) after the loss that remains unremitting for many years (Bonanno & Kaltman, 2001; Zisook & Shear, 2009). That this crucial clinical outcome, commonly known as prolonged grief or complicated grief (Horowitz et al., 1997; Prigerson et al., 2009), is not evident in Infurna and Luthar’s bereavement reanalysis only further underscores that their approach lacks face validity.
We welcome debate about the prevalence of resilience. This is the only way psychological science can move forward. There are situations, for example, when resilience is not common, such as when the stressor is prolonged or where individuals who were doing well were excluded (Galatzer-Levy et al., 2013; Hobfoll, Mancini, Hall, Canetti, & Bonanno, 2011). These exceptions drive new theory (e.g., Bonanno & Diminich, 2013; Masten & Narayan, 2012). However, when a well-replicated phenomenon is reduced or erased by procedures that minimize group variability, as in Infurna and Luthar’s reanalysis, then useful information is lost, not gained, and psychological science is impeded rather than advanced.
Footnotes
Acknowledgements
We thank Lawrence DeCarlo, Columbia University, Teachers College, Ben Porter, Naval Health Research Center, and members of the Loss, Trauma, and Emotion Lab, Columbia University, Teachers College, for their insightful comments during the drafting of this commentary.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
