Abstract
The aim of this study was to investigate whether difficulties in forgetting (like difficulties in remembering) are associated with depressive states. First, dysphoric and nondysphoric students learned 40 word pairs, each consisting of a positive or negative adjective and a neutral noun (target). Next, the students practiced responding with some targets and suppressing others, when given the adjective as cue, for a varied number of repetitions. On the final test, they were told to disregard the prior instruction to suppress and to recall the target associated with every cue. Compared with nondysphoric students, dysphoric students recalled similar percentages of targets from sets assigned for response practice but higher percentages from sets assigned for suppression practice. The degree of forgetting showed some mood-congruent tendencies and was significantly correlated with self-report measures of rumination and unwanted thoughts.
Anyone who has experienced heartbreak, remorse, or failure can imagine the benefits of forgetting. Unfortunately, some of the people with the best reasons for forgetting might have the greatest difficulty doing so, and we predicted that depressed people would be among them. This claim might seem counterintuitive, considering that depression is associated with memory impairment (see the meta-analysis by Burt, Zembar, & Niederehe, 1995), but it is counterintuitive only when forgetting is understood as passive memory failure. In the case of intentional (valued) forgetting, our intuitions anticipated trouble. Success in intentional remembering and in intentional forgetting should both rely on procedures of controlled attention, and depression is associated with difficulties in attentional control in various phases of memory experiments (see Hertel, 2000). From a different angle, consider that depressed and dysphoric participants report more intrusions during attempts at thought suppression than do their nondepressed counterparts (see Wenzlaff & Wegner, 2000). They also report spending more time in uncontrolled rumination, which exacerbates their depressive feelings (see Nolen-Hoeksema, 2000). Therefore, the ability to reduce the chance that certain memories will come to mind might be a valuable but elusive cognitive skill in depression, especially when the memories are unhappy ones.
Howell and Conway (1992) selected both unhappy and happy autobiographical memories for thought suppression and then noted intrusions while the participants thought aloud. Unlike their neutral-mood counterparts, both dysphoric and sadness-induced participants experienced more intrusions related to the unhappy memory than to the happy one (also see Roemer & Borkovec, 1994). These findings suggest that depressed people might have exaggerated difficulty in intentionally forgetting negatively toned events, whereas forgetting might be more successful for events whose emotional tone is incongruent with their mood. We call this prediction the mood-incongruent-forgetting hypothesis.
We sought evidence regarding both depressive deficits in intentional forgetting and mood-incongruent forgetting in an experiment modeled on the procedures used by Anderson and Green (2001). Evidence concerning the effectiveness of instructions to forget has been obtained in a variety of paradigms (see Anderson & Neely, 1996). We chose the one used by Anderson and Green because, much like everyday experience, it does not specify strategies or provide external support for suppression (in the form of other things to think about). Depression-related impairments in remembering are usually obtained under conditions of unspecified strategies and poor external support (e.g., Hertel & Rude, 1991; see Hertel, 2000), so we expected the same conditions to reveal impairments in forgetting.
In the experiments reported by Anderson and Green (2001), participants first learned unrelated word pairs to a specified criterion of accuracy in responding with the second word when cued with the first. Next, they practiced responding when given some cues and suppressing when given others. In the latter case, they were instructed not to think about the previously learned response while viewing the cue, which was presented 0, 1, 8, or 16 times. In the baseline (0) condition, the materials were presented only in the learning phase and on the final test. On that test, participants were asked to recall the correct response word for each cue, disregarding the prior suppression instructions, and greater amounts of practice in suppression produced more forgetting. In the present experiment, depressive deficits in forgetting on the final test would be revealed if the depressed group recalled a greater percentage of suppressed targets than the nondepressed group and showed less effect of suppression relative to baseline performance.
Our adaptation of Anderson and Green's (2001) paradigm included a major modification: the use of adjective-noun pairs in place of unrelated nouns. The pairs were constructed by Hertel and Parks (2002) as part of a method for varying the emotional valence of neutral nouns (e.g., gloomy cottage vs. romantic cottage, funeral dress vs. wedding dress). We chose this method of varying emotion instead of selecting nouns that themselves carry emotional meaning. Such nouns tend to be conceptually related to one another in ways that might confound the effects of practice in suppression or remembering, whereas the nouns we used were conceptually more distinct from one another. Hertel and Parks (2002) found that the positive and negative pairings produced self-referential images that were judged more emotional, in the appropriate directions, than the images for neutral pairings. Therefore, these pairs seemed suitable for examining the mood-incongruent-forgetting hypothesis. Evidence supporting this hypothesis would be provided by greater ease in forgetting nouns that had been imbued with the emotional meaning opposite to participants' emotional states. In particular, because of habits of ruminating about negative events (see Hertel, in press), depressed participants might suppress, and therefore forget, positive pairs more easily than negative pairs. Because rumination might play this role, we also evaluated the relation of intentional forgetting in this experimental setting to participants' reports of their more general ruminative habits.
METHOD
Materials
Word pairs
Forty nouns were selected from those used by Hertel and Parks (2002). All nouns were four to seven letters long, with concreteness and imageability ratings greater than 5 (on 7-point scales from Paivio, Yuille, & Madigan, 1968), emotionality ratings less than 4, and goodness ratings between 3 and 5 (on 7-point scales from Rubin & Friendly, 1986). Those characteristics and frequency of occurrence (Kučera & Francis, 1967) were used to distribute the nouns in a balanced fashion into eight sets of five nouns each. Four sets were to be assigned to the practice of suppressing and four to the practice of responding (see Table 1). Depending on condition, each noun was accompanied by one of two adjectives: one that lent positive emotion and one that lent negative emotion to the noun (e.g., exciting vs. depressing book, cozy vs. electric chair, esteemed vs. failing paper; materials are available via e-mail to the first author). Emotion ratings for both positive and negative pairings (on a 9-point scale ranging from extremely positive to extremely negative) were also used in balancing sets. Filler items consisted of 10 additional neutral-adjective/noun pairs from the same pool.
Table 1. Means for characteristics of the target nouns
Note. All analyses of differences between the two groups of sets (1–4 vs. 5–8) and across sets within groups (means not shown) revealed nonsignificant differences, p > .50. The rating scales ranged from 1 to 7, except for the scale for emotional valence of paired items (1–9).
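The article does not describe how the nouns were distributed across the eight sets, only that the resulting sets did not differ. Serpentine ("snake") assignment is one common way to begin such balancing. The sketch below applies it to a single characteristic. The function name and the noun values are hypothetical, and balancing several characteristics at once would require additional steps, such as iterative swapping of nouns between sets.

```python
# Hypothetical sketch: the authors' balancing procedure is not described.
# Serpentine ("snake") assignment on a single characteristic is assumed here.

def serpentine_sets(items, key, n_sets=8):
    """Sort items by `key`, then deal them into n_sets sets, reversing the
    dealing direction on each pass so that high and low values are spread out."""
    ordered = sorted(items, key=key)
    sets = [[] for _ in range(n_sets)]
    for i, item in enumerate(ordered):
        pass_no, pos = divmod(i, n_sets)
        index = pos if pass_no % 2 == 0 else n_sets - 1 - pos
        sets[index].append(item)
    return sets


# Usage with made-up values: 40 nouns whose "frequency" is simply 0..39.
nouns = [(f"noun{i}", i) for i in range(40)]
sets = serpentine_sets(nouns, key=lambda n: n[1])
print([len(s) for s in sets])                 # [5, 5, 5, 5, 5, 5, 5, 5]
print([sum(v for _, v in s) for s in sets])   # [94, 95, 96, 97, 98, 99, 100, 101]
```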
Questionnaires
The Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) was used to assess the participants' level of dysphoria (nondiagnosed negative affect). To measure self-reported rumination, we used the Rumination on Sadness Scale (RSS; M. Conway, Csank, Holm, & Blake, 2000). The RSS consists of 13 statements about ruminative activities (e.g., “I repeatedly analyze and keep thinking about reasons for my sadness”), and respondents are asked to rate how often they do what each statement describes, when they are feeling sad, down, or blue. The White Bear Suppression Inventory (WBSI; Wegner & Zanakos, 1994) is a self-report measure composed of 15 statements intended to assess the experience of unwanted thoughts. The WBSI asks participants to rate their agreement with statements such as, “There are things I prefer not to think about.”
Participants and design
Students were invited to participate on the basis of their BDI scores (administered in introductory psychology classes), but they were unaware of this connection.¹ The final sample consisted of 16 male and 16 female dysphoric students and the same distribution of nondysphoric students. The mean BDI score was 16 in the dysphoric group (indicating a moderate level of dysphoria; SD = 8.48, range: 9–34) and 2 in the nondysphoric group (SD = 2.18, range: 0–6).
In each BDI-gender group, half of the students were randomly assigned to suppress targets associated with positive cues and respond with targets associated with negative cues, and the other half did the opposite. Within each of those subgroups, 1 student was assigned to each of eight counterbalancing conditions, established by the rotation of two groups of word pairs (four sets each) across instruction (respond vs. suppress), and within those groups, four sets of pairs across the number of cue presentations (0, 1, 8, and 16).
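The rotation of sets across instructions and numbers of presentations can be sketched as follows. The division into Sets 1–4 and 5–8 follows the note to Table 1. The Latin-square rotation of presentation counts within each group, and the function name, are our assumptions about how such a rotation might be implemented.

```python
from itertools import product

COUNTS = (0, 1, 8, 16)                     # cue presentations; 0 = baseline
GROUPS = ((1, 2, 3, 4), (5, 6, 7, 8))      # two groups of four sets (cf. Table 1 note)


def counterbalancing_conditions():
    """Return eight conditions, each mapping set number -> (instruction, count).

    Assumed scheme: the two groups of sets swap instructions, and within each
    group the four presentation counts rotate across sets, Latin-square style.
    """
    conditions = []
    for swap, shift in product((False, True), range(len(COUNTS))):
        suppress_group, respond_group = GROUPS[::-1] if swap else GROUPS
        condition = {}
        for instruction, group in (("suppress", suppress_group), ("respond", respond_group)):
            for i, set_no in enumerate(group):
                condition[set_no] = (instruction, COUNTS[(i + shift) % len(COUNTS)])
        conditions.append(condition)
    return conditions


for number, condition in enumerate(counterbalancing_conditions(), start=1):
    print(number, condition)
```

Under this scheme, each set appears once in every combination of instruction and number of presentations across the eight conditions. This is what allows differences between sets to be separated from the experimental factors.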
Procedure
Learning phase
In the first task, each word pair (words separated by approximately 3.8 cm) was presented in black in the center of a light computer screen for 6 s. (All tasks were controlled by Superlab Pro.) Attempting to infuse the trials with some degree of personal emotional meaning, we instructed participants to create a self-referential mental image for each pair (e.g., “Imagine yourself walking along a sandy beach”). With the offset of the pair, the monitor displayed an instruction to rate the meaningfulness of the image on a scale from 1 (not meaningful at all) to 5 (very personally meaningful). The participants responded aloud at their own pace, and the experimenter keyed in the rating and started the 600-ms intertrial interval (ITI). Pairs were ordered in randomized blocks of nine pairs (one pair from each of the eight sets, plus one filler), and that order remained constant across participants. Two additional filler pairs were placed at the beginning of the list, two at the end, and one between the second and third blocks.
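The structure of the study list can be checked with a short sketch: five blocks of nine pairs plus five buffer fillers, for 50 pairs in all. Only the block composition and the buffer positions come from the text. The function name, the seed, and the choice of which pair from each set enters each block are hypothetical.

```python
import random


def learning_list(sets, fillers, seed=0):
    """Assemble the 50-pair study list: 5 blocks of 9 pairs plus 5 buffer fillers.

    sets    -- 8 lists of 5 target pairs each
    fillers -- 10 filler pairs
    Each block holds one pair from every set plus one filler, shuffled. The
    remaining fillers go 2 at the start, 1 between blocks 2 and 3, and 2 at
    the end. A fixed (hypothetical) seed keeps the order constant across
    participants.
    """
    rng = random.Random(seed)
    spare = list(fillers)
    blocks = []
    for b in range(5):
        block = [s[b] for s in sets] + [spare.pop()]  # assumed: b-th pair of each set
        rng.shuffle(block)
        blocks.append(block)
    # five fillers remain in `spare` for the buffer positions
    return (spare[:2] + blocks[0] + blocks[1] + spare[2:3]
            + blocks[2] + blocks[3] + blocks[4] + spare[3:5])


sets = [[f"set{i}_pair{j}" for j in range(5)] for i in range(8)]
fillers = [f"filler{k}" for k in range(10)]
study_list = learning_list(sets, fillers)
print(len(study_list))   # 50
```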
Next, learning was assessed on tests of cued recall. Each cue word was displayed in the center of the screen for 5,200 ms (or less if the participant responded sooner), and the participant was instructed to recall the corresponding target and report it aloud as quickly as possible. After a delay of 200 ms, the correct response was displayed in blue font for 2,000 ms as feedback, followed by an ITI of 300 ms. If fewer than 50% of the responses on an assessment were correct, another assessment was administered (up to a maximum of three cycles; all participants achieved the criterion by this point). For the first assessment, item order was identical to the initial presentation order; items were randomly rearranged within blocks for the second and third cycles. Throughout, no pair was followed or preceded by a particular pair more than once, and no more than two cues of the same valence appeared in succession.
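The text does not say how orders satisfying the valence constraint were generated. One simple possibility, sketched here as an assumption, is rejection sampling: reshuffle within blocks and discard any order that contains a run of more than two same-valence cues. The function names and cue labels are hypothetical. The constraint on particular pairs following one another is not implemented in this sketch.

```python
import random


def max_run(labels):
    """Length of the longest run of identical consecutive labels."""
    longest = current = 1 if labels else 0
    for prev, cur in zip(labels, labels[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def reshuffle_within_blocks(blocks, valence, max_same=2, rng=None, tries=10_000):
    """Shuffle items within each block, keeping block membership fixed, until no
    more than `max_same` cues of the same valence occur in succession anywhere
    in the concatenated sequence (rejection sampling)."""
    rng = rng or random.Random()
    for _ in range(tries):
        sequence = [item for block in blocks for item in rng.sample(block, len(block))]
        if max_run([valence[item] for item in sequence]) <= max_same:
            return sequence
    raise RuntimeError("no order satisfying the constraint was found")


# Usage with hypothetical cues: 5 blocks of 8 cues, sets 0-3 positive, 4-7 negative.
blocks = [[(b, s) for s in range(8)] for b in range(5)]
valence = {cue: ("pos" if cue[1] < 4 else "neg") for block in blocks for cue in block}
order = reshuffle_within_blocks(blocks, valence, rng=random.Random(1))
```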
Suppression-training phase
Participants were then told that, in the next part of the experiment, they would be shown cue words; sometimes we would ask them to respond as they had during the learning assessment, and at other times (suppression) we would ask them to maintain attention to the cue but avoid saying or thinking about the target. The training block consisted of 32 trials of the 10 filler pairs from the learning phase (including repetitions). Only one cue was denoted as a cue for suppression, and it appeared eight times. Otherwise, the procedure was like the one used in the main suppression phase.
Suppression phase
Following training, the monitor displayed a list of 15 cue words corresponding to the targets to be suppressed; they were all positive or all negative, depending on the assigned condition of suppression valence. Following a 2-min familiarization period, participants were required to identify all 15 cues from a list that included 14 additional adjectives of the same valence (a criterion achieved by all after no more than four attempts).
In the main suppression phase, cues for responding 1, 8, or 16 times and cues for suppressing 1, 8, or 16 times were presented for a total of 250 trials. In addition, 127 trials displayed cues for responding with the nine filler targets (excluding the target suppressed during practice), in order to create an overall tendency to respond. The 377 trials were randomly ordered and separated by a 400-ms ITI. At the start of each trial, a series of small crosses appeared in the center of the screen for 200 ms. Next, the cue appeared (centered) for 3 s, or less if the participant responded earlier. On response trials, the participant was instructed to recall the target aloud as quickly as possible. When the participant responded incorrectly on a response trial, the correct target was displayed in blue for 500 ms. Any response to a cue for suppression initiated the display of very large red Xs.
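The trial counts follow from the design. Each instruction has three nonbaseline sets of five cues, presented 1, 8, or 16 times, giving 5 × (1 + 8 + 16) = 125 trials per instruction and 250 in all. Adding the 127 filler trials yields 377. The sketch below builds such a trial list. The cue names are hypothetical, and so is the even spreading of filler trials over the nine filler cues, because the text does not give their distribution.

```python
import random

FILLER_TRIALS = 127  # response trials for the nine filler cues (from the text)


def suppression_phase_trials(respond_sets, suppress_sets, filler_cues, rng):
    """Build and shuffle the trial list for the main suppression phase.

    respond_sets / suppress_sets -- dicts mapping a presentation count (1, 8, 16)
    to the five cues presented that many times. The even spreading of filler
    trials over the filler cues is an assumption.
    """
    trials = []
    for instruction, sets in (("respond", respond_sets), ("suppress", suppress_sets)):
        for count, cues in sets.items():
            trials += [(instruction, cue) for cue in cues for _ in range(count)]
    trials += [("respond", filler_cues[i % len(filler_cues)]) for i in range(FILLER_TRIALS)]
    rng.shuffle(trials)
    return trials


respond = {n: [f"r{n}_{k}" for k in range(5)] for n in (1, 8, 16)}
suppress = {n: [f"s{n}_{k}" for k in range(5)] for n in (1, 8, 16)}
fillers = [f"filler{k}" for k in range(9)]
trials = suppression_phase_trials(respond, suppress, fillers, random.Random(0))
print(len(trials))                                        # 377
print(sum(1 for ins, _ in trials if ins == "suppress"))   # 125
```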
Final test phase
Participants were asked to recall the associated target for each cue, regardless of prior instructions. All 40 cues (plus 4 filler cues at the start) were individually presented in the center of the screen for 4 s, or less if the participant responded sooner. Each cue was preceded by a 200-ms display of crosses and followed by a 400-ms ITI. No feedback was given. The 40 cues were ordered in randomized blocks of 8, with 1 cue from each of the eight sets in each block.
After the final test, we asked the participants to fill out the three questionnaires: BDI, RSS, and WBSI (in that order). The purpose was described as unrelated to the experiment.
RESULTS AND DISCUSSION
The percentages of targets recalled on the final test were submitted to a mixed-design analysis of variance, with between-subjects factors for group (nondysphoric vs. dysphoric) and the valence of the cues for suppression (positive vs. negative). Within-subjects factors included type of instruction during the suppression phase (suppress vs. respond) and the number of times that the cues were presented (0, 1, 8, or 16). (To reduce error variance, we included a between-subjects factor for the eight counterbalancing conditions; those effects are not reported.) The significance level was set at .05. Gender was included as a factor in initial analyses but was removed because it entered into no significant interactions (p > .15). Significant main effects are not reported for factors contributing to significant interactions.
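The error degrees of freedom in the tests reported in this section follow from the design once gender is dropped and the counterbalancing factor is retained. The derivation below assumes, as the Method implies, two participants in each cell formed by group, suppression valence, and counterbalancing condition.

```latex
\begin{aligned}
N &= 2 \times (16 + 16) = 64 \\
C &= \underbrace{2}_{\text{group}} \times \underbrace{2}_{\text{valence}} \times \underbrace{8}_{\text{counterbalancing}} = 32 \\
df_{\text{error}} &= N - C = 32 &&\Rightarrow F(1, 32) \\
df_{\text{presentations}} &= 4 - 1 = 3, \quad df_{\text{error}} = 3 \times 32 = 96 &&\Rightarrow F(3, 96) \\
df_{\text{error, one group}} &= 32 - (2 \times 8) = 16 &&\Rightarrow F(1, 16) \\
df_{\text{error, one group and valence}} &= 16 - 8 = 8 &&\Rightarrow F(1, 8)
\end{aligned}
```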
Dysphoria-related differences
The most important result from the overall analysis was the significant interaction of group and instruction, F(1, 32) = 7.62, MSE = 198.44, p < .01. Figure 1 shows that the groups did not differ in recall of responded targets (M = 92% for both groups). However, dysphoric participants recalled significantly more targets from the four suppression sets than nondysphoric participants did (M = 81% vs. 75%), F(1, 32) = 5.94, MSE = 442.19, p < .03.²

Figure 1. Mean percentage of targets recalled as a function of number of cue presentations for suppressing or responding. Error bars represent one standard error above and below the mean.
The interaction of group with instruction was not significantly qualified by the number of cue presentations, p > .10. However, the groups differed in how recall of suppressed nouns whose cues had been presented 16 times compared with recall of nouns in the baseline (0-presentation) set assigned to suppression, F(1, 32) = 4.19, MSE = 215.62, p < .05. The nondysphoric participants recalled slightly fewer of the suppressed nouns than baseline nouns (M = 73% vs. 76%), whereas the dysphoric participants recalled more (M = 82% vs. 74%).
The interaction of group with instruction was also not significantly qualified by the valence of the suppression cues, p > .30. However, the suppression effect (baseline vs. 16 cue presentations) significantly interacted with group and suppression valence, F(1, 32) = 5.23, MSE = 215.62, p < .03. In the nondysphoric group, the suppression effect depended on the valence of the cues, F(1, 16) = 7.00, MSE = 175.00, p < .02. Figure 2 shows that only positive cues tended to produce below-baseline recall after suppression, F(1, 8) = 4.76, MSE = 212.50, p < .06. In the dysphoric group, the suppression effect did not significantly depend on the valence of the cues, p > .40. Instead, there was a marginally significant trend for targets of both valences to be recalled more often after suppression practice (16 cue presentations) than in the baseline condition, F(1, 16) = 4.12, MSE = 256.25, p < .06.

Figure 2. Mean percentage of targets recalled as a function of number of cue presentations for suppressing or responding and the valence of the cue. When participants suppressed targets associated with positive cues, they responded with targets associated with negative cues (and vice versa). Error bars represent one standard error above and below the mean.
Taken together, these valence-related outcomes do not support the mood-incongruent-forgetting hypothesis. Nondysphoric participants showed some indication of mood-congruent forgetting (below baseline recall), whereas dysphoric participants tended to show evenhanded forgetting failure (above baseline recall, compared with the nondysphoric group).
Replication of Anderson and Green (2001)
The effect of instruction (respond vs. suppress) increased with the number of cue presentations, F(3, 96) = 9.42, MSE = 150.52, p < .001. Approximately 94% of that interaction variance was accounted for by a steeper linear trend across number of presentations for responded targets than for suppressed targets, F(1, 32) = 20.61, MSE = 194.06, p < .001. As previously described, however, the evidence for below-baseline suppression was weak (the trend with positive cues in the nondysphoric group), in contrast to the negative slope typically found by Anderson and Green.
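The 94% figure can be reproduced from the reported statistics, because an effect's sum of squares equals F × MSE × df for the effect. The sketch below assumes that the linear contrast was tested against its own error term, as is usual for repeated-measures contrasts. Small discrepancies would reflect rounding in the reported values.

```python
def effect_ss(f, mse, df_effect):
    """Recover an effect's sum of squares from its F ratio and error mean square."""
    return f * mse * df_effect


ss_interaction = effect_ss(9.42, 150.52, 3)   # instruction x presentations, F(3, 96)
ss_linear = effect_ss(20.61, 194.06, 1)       # linear component, F(1, 32)
share = ss_linear / ss_interaction
print(f"{share:.3f}")                          # 0.940
```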
The scanty evidence of below-baseline suppression likely reflects the use of related cue-target pairs in this experiment, in contrast to the unrelated pairs used by Anderson and Green. When a cue is presented repeatedly, its power to remind should arguably be harder to counteract if it is related to the target than if it is unrelated, especially considering the self-referential nature of initial processing in the learning phase.
Another departure from the pattern obtained by Anderson and Green was the significant simple main effect of instruction within baseline items, F(1, 32) = 7.68, MSE = 146.88, p < .01. The cues for these items were not exposed during practice, yet they shared the valence of cues that led to responding or suppressing. Therefore, valence itself might have served as an implicit cue for suppression on the final test.
Other individual differences
Table 2 reports Pearson correlation coefficients involving scores on the two self-report measures (RSS and WBSI) and two measures of experimental suppression: the number recalled from suppressed sets and the instruction effect (recall difference between responded and suppressed sets). The instruction effect was larger when participants reported lower levels of rumination while sad (RSS score). More suppressed targets were recalled by participants in the dysphoric group, by those reporting more trouble with unwanted thoughts (WBSI), and by those reporting more rumination during sad periods (RSS, although this correlation was only marginally significant, p < .051). The two self-report measures were also significantly correlated with BDI group: Dysphoric students reported having experienced more rumination during sad periods and more trouble with unwanted thoughts. Of course, the dysphoric students might simply have been more currently aware of these thought patterns or more inclined to report them. However, an analysis in which we divided the students into high- versus low-RSS scorers does not support this interpretation. The high- and low-RSS groups consisted of the higher and the lower scorer, respectively, in each cell of the counterbalancing design. Regardless of BDI group, high-RSS scorers forgot fewer targets from suppressed sets than low-RSS scorers did, as shown in Figure 3. A t test revealed no significant difference between the high- and low-RSS groups in mean BDI scores, p > .30. Analysis of variance revealed a significant interaction of instruction with RSS group, F(1, 54) = 4.68, MSE = 37.00, p < .04. (Similar differences were not significant for WBSI categories.)
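The within-cell split keeps the high- and low-RSS groups balanced across the counterbalancing design. Because each cell belongs to one BDI group, it also gives the two RSS groups equal numbers of dysphoric students. A minimal sketch follows. The participant identifiers and scores are hypothetical, as are the rules for handling missing scores and ties.

```python
from collections import defaultdict


def split_rss_within_cells(participants):
    """Label the higher RSS scorer in each cell 'high' and the lower one 'low'.

    participants -- iterable of (participant_id, cell_id, rss_score), where a
    cell is one combination of group, suppression valence, and counterbalancing
    condition. Missing scores (None) are skipped; cells without exactly two
    scorers are left unclassified, and ties go to input order (assumptions).
    """
    cells = defaultdict(list)
    for pid, cell, rss in participants:
        if rss is not None:
            cells[cell].append((rss, pid))
    labels = {}
    for members in cells.values():
        if len(members) != 2:
            continue
        low, high = sorted(members, key=lambda m: m[0])
        labels[low[1]] = "low"
        labels[high[1]] = "high"
    return labels


data = [("p1", "c1", 30), ("p2", "c1", 20), ("p3", "c2", 15),
        ("p4", "c2", 40), ("p5", "c3", None), ("p6", "c3", 22)]
print(split_rss_within_cells(data))
# {'p2': 'low', 'p1': 'high', 'p3': 'low', 'p4': 'high'}
```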

Figure 3. Mean percentage of targets recalled by dysphoric and nondysphoric high and low scorers on the Rumination on Sadness Scale (RSS), according to whether the targets belonged to the suppressed or responded sets.
Table 2. Pearson correlation coefficients between measures of suppression and self-reported ruminative thoughts
Note. The grouping code (1 = nondysphoric, 2 = dysphoric) was used instead of the actual Beck Depression Inventory (BDI) score, because the BDI distribution was forced to be nonnormal. RSS = Rumination on Sadness Scale; WBSI = White Bear Suppression Inventory; suppressed recall = number of suppressed targets recalled on the final test; instruction effect = number recalled from responded sets minus number recalled from suppressed sets. The numbers in parentheses denote the n for each correlation; a few participants failed to complete either the RSS or the WBSI.
∗ p < .05.
∗∗ p < .01.
∗∗∗ p < .001.
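As the note to Table 2 states, BDI group entered the correlations as a 1/2 code. A Pearson r computed with a two-valued variable is the point-biserial correlation, and its value does not depend on which two numbers serve as the code. The sketch below checks both points on made-up scores. The function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(codes, scores):
    """Point-biserial r from group means: (M_hi - M_lo) / SD * sqrt(p * q),
    where SD is the population standard deviation of all scores."""
    hi_code, lo_code = max(codes), min(codes)
    hi = [s for c, s in zip(codes, scores) if c == hi_code]
    lo = [s for c, s in zip(codes, scores) if c == lo_code]
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    p, q = len(hi) / n, len(lo) / n
    return (sum(hi) / len(hi) - sum(lo) / len(lo)) / sd * sqrt(p * q)


group = [1, 1, 1, 2, 2, 2, 2]          # 1 = nondysphoric, 2 = dysphoric
recalled = [3, 5, 4, 7, 6, 9, 8]       # hypothetical suppressed-target counts
print(abs(pearson_r(group, recalled) - point_biserial(group, recalled)) < 1e-9)              # True
print(abs(pearson_r([g - 1 for g in group], recalled) - pearson_r(group, recalled)) < 1e-9)  # True
```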
GENERAL DISCUSSION
Our main contribution to the literatures on intentional forgetting and thought suppression is the finding that dysphoric students (compared with nondysphoric students) forgot fewer targets after having been instructed to suppress them. Most likely, their efforts to suppress were less successful than the efforts of nondysphoric students, a difference documented in experiments on thought intrusions (e.g., Howell & Conway, 1992). Investigators in those experiments measured intrusions during attempts to suppress, whereas we demonstrated consequences for later forgetting—the ultimate goal in some respects.
The implied deficit in suppression might be reinterpreted more simply as a lack of motivation to follow instructions. If that were true, however, the dysphoric students should also have been poorly motivated in the learning and final recall phases. To the contrary, they performed as well as the nondysphoric students on the learning assessments: Both groups recalled an average of 24 targets on the first assessment, and both required on average 1.3 assessments to reach criterion. Dysphoric students rated their images in the learning phase to be at least as meaningful as did the nondysphoric students (M = 2.9 vs. 2.7, respectively). They also reached the criterion for recognizing suppression cues in at least as few attempts (M = 1.4 for dysphoric students vs. 1.8 for nondysphoric students). And, of course, they recalled more suppressed targets on the final test.
Rather than emerging from deficient motivation, suppression difficulties are better understood as emanating from deficient attentional control, as observed with psychophysiological measures of frontal function (see Davidson, 2000) as well as with clinical tests (Channon, 1996). Whether the difficulties we have documented reflect impaired inhibition (a mechanism specific to the target representation) rather than poor attentional focus is uncertain, because we did not include an independent-probe test of the sort used by Anderson and Green (2001). Nevertheless, this limitation does not diminish the importance of knowing that deficient forgetting is related to impaired control of some sort, even the sort that goes by the name of distractibility.
A subsidiary outcome of our study was the set of significant correlations between self-reports of unwanted thoughts and rumination, on the one hand, and experimental evidence of deficient forgetting, on the other. The experimental experience itself might have influenced the self-reports (so future experiments should collect these measures at prescreening), but if it did, participants would have shown remarkable sensitivity to their own difficulty, given that everyone recalled a majority of the suppressed targets. Alternatively, the correlation between self-reports and experimental measures might have been mediated by level of dysphoria, because both the RSS scores and the experimental measures were also correlated with the BDI grouping variable. In this regard, however, it is notable that the high-RSS group, who reported more real-life rumination, had more trouble forgetting items to be suppressed than the low-RSS group did, yet did not have significantly higher BDI scores.
One final matter: Contrary to the mood-incongruent-forgetting hypothesis, the dysphoric students' deficient forgetting was not exaggerated by negative valence. Instead, there was evidence of mood-congruent forgetting by nondysphoric students. Interestingly, the literature on mood-congruent remembering often shows evenhanded recall by dysphoric students (a pattern our dysphoric students also showed) and negatively biased recall by clinically depressed participants (see Matt, Vazquez, & Campbell, 1992). Some evidence of poor forgetting of negative material has been found in a directed-forgetting paradigm (Power, Dalgleish, Claudio, Tata, & Kentish, 2000, Experiment 3). Conclusions about congruence are therefore premature. However, one thing seems certain: Evidence that unwanted memories can be suppressed in paradigms like Anderson and Green's (2001) cannot yet be generalized in any straightforward way to emotional situations and emotionally disordered people (cf. M.A. Conway, 2001).
Footnotes
1. Students with scores of 7 and 8 were not asked to participate in order to minimize group overlap. The data from 15 participants whose scores on the end-of-session BDI fell outside the initial categories (0–6 vs. 9 or higher) were replaced, as were the data from 3 dysphoric students whose BDI scores were just above the cutoff (by using data from students with higher scores who participated in an aborted second rotation).
Melissa Gerstle is now in the Clinical Graduate Program, Psychology Department, University of New Mexico.
Acknowledgements
We acknowledge the assistance of Shelley Ritter, and thank Michael Conway and Tim Dalgleish for comments on an earlier version of the manuscript. Michael Anderson generously provided detailed information on procedures, helpful comments, and a suggested method for RSS analyses.
