Abstract
Exposure to uncontrollable outcomes has been found to trigger learned helplessness, a state in which the agent, because of lack of exploration, fails to take advantage of regained control. Although the implications of this phenomenon have been widely studied, its underlying cause remains undetermined. One can learn not to explore because the environment is uncontrollable, because the average reinforcement for exploring is low, or because rewards for exploring are rare. In the current research, we tested a simple experimental paradigm that contrasts the predictions of these three contributors and offers a unified psychological mechanism that underlies the observed phenomena. Our results demonstrate that learned helplessness is not correlated with either the perceived controllability of one’s environment or the average reward, which suggests that reward prevalence is a better predictor of exploratory behavior than the other two factors. A simple computational model in which exploration decisions were based on small samples of past experiences captured the empirical phenomena while also providing a cognitive basis for feelings of uncontrollability.
Seligman and Maier (1967) established that exposure to uncontrollable events impairs future exploration and learning. This learned-helplessness phenomenon was originally documented in their classic experiment, in which dogs that had no control over the shocks they received stopped exploring their environment, even after control was reinstated (Table 1, left-hand column). These findings were later replicated in studies of human problem solving (Table 1, middle column; for an example of one such study, see Hiroto & Seligman, 1975). Subsequent research shows that the understanding of learned helplessness can shed light on many important behavioral and social problems, including depression (e.g., Alloy et al., 1999; Clark & Beck, 1999), failure in school (Diener & Dweck, 1978), burnout (Maslach & Jackson, 1982), staying with an abusive spouse (Walker, 1977), maladaptive behavior in organizations (Martinko & Gardner, 1982), poverty (Evans, Gonnella, Marcynyszyn, Gentile, & Salpekar, 2005), and many others (for a review, see Evans & Stecker, 2004).
Table 1. Comparison of Three Learned-Helplessness Paradigms
The common explanation for the empirical learned-helplessness effect focuses on perceived uncontrollability. According to this explanation, when experiencing absence of control, the organism perceives independence between actions and outcomes, becomes helpless, and ceases exploration of the environment. This behavior is later overgeneralized to other situations in which exploration is beneficial.
An alternative account for the empirical learned-helplessness effect involves reward-based considerations: Because absence-of-control manipulations inherently reduce the average reward from exploration, agents (regardless of their perceptions of control) might give up too early simply because they have learned that exploration is, on average, not rewarding enough (Alloy & Abramson, 1979; Cohen, Rothbart, & Phillips, 1976; Jardine & Winefield, 1981; Man, 2010).
However, recent studies suggest that people (and other animals) are more sensitive to the frequency of reinforcement than to the average reinforcement. As we demonstrated previously (Teodorescu & Erev, 2014), people do not explore enough when the prevalence of desirable outcomes from exploration is low, even if the average reinforcement they obtain favors exploration. Thus, the critical factor in learned-helplessness experiments might be the most frequent outcome resulting from exploration, rather than either the average outcome or feelings of uncontrollability.
To clarify the differences among these three explanations, consider a bachelor who has dated unsuccessfully and has decided to quit actively searching for true love. According to the perceived-uncontrollability explanation, the bachelor reduced his exploration efforts because he perceived an independence between his efforts to find his true love and the quality of his dates. In contrast, it is possible that the bachelor does in fact perceive a contingency between his actions and their outcomes (e.g., he noticed that investing more effort resulted in tougher rejections) but judges exploration to be too costly. According to the average-reinforcement explanation, insufficient exploration will then be evident if the environment changes so that exploration becomes beneficial, but the bachelor who ceased engaging in exploration will be oblivious to this change. Finally, according to the reward-frequency explanation, because the most frequent outcome drives the bachelor’s behavior, repeated disappointments can lead him to explore insufficiently even if the environment is stable and exploration is reinforcing on average (e.g., because of extremely rewarding but rare dates), and even if he feels in control. Notably, the three accounts also posit different strategies for effective intervention. The bachelor could be driven to resume exploration by convincing him that he has control (e.g., by making evident the direct effects of his actions), by changing his average experience with exploration (e.g., by paying someone to ensure he will experience a highly rewarding date), or by mildly changing his most frequent experience with exploration (e.g., by instructing him how to use the Internet to reduce his frequent cost of searching).
To determine what led the disappointed bachelor to give up, the influence of each of the above factors must be separated. The problem is that in previous studies of learned helplessness, the three explanations (perceived uncontrollability, average reinforcement, and frequent reinforcement) are confounded: An absence-of-control manipulation inherently reduces the average and frequent rewards from exploration and can simultaneously reduce perceptions of control. Thus, all three explanations predict insufficient exploration following an absence-of-control manipulation. Further, in many learned-helplessness experiments, researchers did not measure perceptions of control and therefore were unable to explicitly examine the common perceived-uncontrollability explanation (Costello, 1978). Moreover, these studies measured only the outcome of exploration efforts (whether or not agents solved the task at hand) without providing a direct measure of exploration.
The purpose of the present research was twofold. First, we developed an experimental exploration paradigm that allowed independent manipulation of the average payoff from exploration and the most frequent payoff from exploration. We then replicated the classic learned-helplessness effect while simultaneously measuring perceived controllability, as indexed by a novel indirect measure based on a prediction task. Our results strongly support the notion that exploratory behavior is sensitive to reward prevalence above and beyond both perceived controllability and the average outcome. Second, we demonstrated that a simple sampling mechanism can account for both the exploration findings and the perceived-controllability findings, providing a causal basis for both phenomena.
Method
Participants
One hundred twenty Technion students (52 female, 68 male; average age = 24 years) participated in the experiment in return for a performance-based payment. The experiment lasted about 15 min.
Procedure
Exploration task
All participants first completed the exploration task, in which they were shown a grid of 120 square, light-gray keys arranged in a 12 × 10 pattern and asked to press one (Fig. 1; see also Table 1, right-hand column). After they selected a key, the trial’s payoff was presented. The key’s color then changed from light to dark gray and remained that color until the end of the task. On each subsequent trial (total trials = 100), participants had to select 1 of the 120 keys. This information was provided before the beginning of the task, but no further instructions were given.

Fig. 1. Timeline of the exploration task in the current experiment. At the start of each trial, the subject selected a key by clicking on it with the mouse (screens labeled “A”). The trial’s payoff was presented on the selected key for 1 s (screens labeled “B”). The first time a key was selected, its color irreversibly changed from light to dark gray. The final screen presents an example of how the keys might be shaded at the onset of Trial 11. In this example, eight new keys were selected, which represents an 80% exploration rate.
In each trial, exploration of a new key resulted in a reward of 11 points with probability p, and 0 points otherwise. In addition, there was a cost for exploration: Each time a new key was selected, 1 point was reduced from the trial’s payoff. Thus, when participants explored a new key, the trial’s final payoff was 11 − 1 = 10 if a reward was obtained (with probability p) and 0 − 1 = −1 otherwise (with probability 1 − p). In contrast, selection of dark keys (those that had been chosen at least once in previous trials) always resulted in no reward, and no exploration cost was implemented (the trial’s final payoff in these cases was 0). Participants were not told the reward structure (including the cost of exploration); they were told only their final payoff in each trial. However, they became familiar with the reward structure through their experience with the task.
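The payoff rule described above can be summarized in a short sketch. The function and constant names are illustrative assumptions and are not taken from the experimental software; only the point values (11, 1) and the role of p come from the task description.

```python
import random

REWARD = 11            # points for a rewarded exploration (from the task description)
EXPLORATION_COST = 1   # points deducted whenever a new key is selected


def with_control_payoff(is_new_key: bool, p: float, rng: random.Random) -> int:
    """Return one trial's final payoff in the with-control setting.

    is_new_key: True if the selected key is still light gray (never pressed).
    p: probability that exploring a new key yields the reward.
    rng: random number generator (hypothetical helper for this sketch).
    """
    if not is_new_key:
        # Previously selected (dark) keys: no reward and no exploration cost.
        return 0
    rewarded = rng.random() < p
    return (REWARD if rewarded else 0) - EXPLORATION_COST
```

Under this sketch, a new key pays 10 or −1 and a familiar key always pays 0, which matches the three payoffs listed in the text.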
Experimental conditions
Participants were placed into two control groups: the with-control group, who played the game as outlined above, and the yoked group, who experienced an absence-of-control setting in the first 50 trials but regained control in the last 50 trials. Each participant in the yoked group was matched with a participant from the with-control group. Under the absence-of-control setting, yoked participants received a reward only if the with-control participant they were matched with got the reward on the same trial. Thus, whether yoked participants obtained the reward in the first 50 trials was independent of their actions. Yet even in the absence-of-control setting, the cost of exploration (−1) was still subtracted from each exploratory trial’s payoff (the exploration cost was implemented for both groups in all conditions). For example, if the participant he or she was matched with explored a new key and received a reward, the yoked participant received 10 points if he or she also explored a new key but 11 points if he or she selected a previously selected key. If the participant he or she was matched with did not receive a reward (regardless of his or her action), the yoked participant would receive 0 points if he or she selected a familiar key and −1 if he or she explored a new key.
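The yoked payoff rule in the absence-of-control setting can be sketched as follows. The names are hypothetical; the four resulting payoffs (10, 11, 0, and −1) are the ones given in the example above.

```python
REWARD = 11            # reward delivered when the matched participant was rewarded
EXPLORATION_COST = 1   # still charged to the yoked participant for each new key


def yoked_payoff(is_new_key: bool, partner_rewarded: bool) -> int:
    """Return a yoked participant's payoff on one absence-of-control trial.

    The reward depends only on the matched with-control participant,
    whereas the exploration cost depends only on the yoked participant's
    own choice.
    """
    reward = REWARD if partner_rewarded else 0
    cost = EXPLORATION_COST if is_new_key else 0
    return reward - cost
```

Because the reward term does not depend on `is_new_key`, exploring can only lower the payoff in this setting.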
There were three reward frequencies from exploration: extremely low (p = .1), moderate (p = .2), and extremely high (p = 1). The frequency of reward from exploration was manipulated between participants, such that the probability of reward remained constant for each participant across all 100 trials. This resulted in a 3 (reward frequency) × 2 (control group) between-participants design that yielded six groups of 20 participants each. For these frequency values, in the with-control group, the expected return from exploration of new keys was positive (p × 10 + (1 − p) × (−1) = 11p − 1 > 0 for all ps > .09); thus, exploration of new keys was always the optimal strategy. In contrast, because obtaining a reward under the absence-of-control setting was independent of the choice whereas the cost of exploration was not, the optimal strategy for all reward-frequency conditions was to avoid the exploration cost by selecting familiar keys.
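As a worked check of the expected return from exploration in the with-control setting, the sketch below evaluates it for the three reward frequencies using exact fractions. The function name is an assumption made for illustration; the payoffs of 10 and −1 come from the task description.

```python
from fractions import Fraction


def expected_exploration_return(p: Fraction) -> Fraction:
    """Expected payoff of pressing a new key in the with-control setting.

    A new key pays 10 with probability p and -1 with probability 1 - p,
    which simplifies to 11p - 1.
    """
    return p * 10 + (1 - p) * (-1)


# Break-even reward probability: 11p - 1 = 0, so p = 1/11 (about .09).
BREAK_EVEN = Fraction(1, 11)

for p in (Fraction(1, 10), Fraction(1, 5), Fraction(1)):
    print(p, expected_exploration_return(p))
```

The expected return is 0.1, 1.2, and 10 points for p = .1, .2, and 1, respectively, so each condition lies above the break-even point of 1/11.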
Measures of perceived controllability
The perceived dependency between outcomes and behavior in the exploration task was measured both directly and indirectly. The direct measure was obtained by asking participants, after they completed the experiment (i.e., the exploration and the prediction tasks), to rate the degree of dependence they felt between their actions and the outcomes they received during the first half of the task and during the second half of the task. However, such direct measures have been shown to result in weaker effects than measures based on an indirect prediction task (Presson & Benassi, 1996).
Therefore, to generate the indirect measure, all participants completed 16 prediction trials after completing the exploration task. In the prediction task, they were presented with a screen of another player who was already a randomly determined number of trials (t) into the game. Participants were asked to predict the probabilities of each of four possible outcomes, given a particular choice on trial t + 1. Half of the trials were sampled from Trials 1 through 50, and half from Trials 51 through 100. Perception of control was estimated from the variance between predictions of reward probabilities (the factor over which yoked participants did not have control) over different types of actions. Low variance corresponds to low perceived contingency between actions and rewards. To illustrate the idea behind this measure, imagine a participant who predicted the same probability of rewards for all trials (although different actions were taken). The variance among different trials would then be zero, which indicates the lowest controllability score. In contrast, if the expected frequencies of reward varied across trials, the variance would be positive and would reflect the perceived dependency between different actions and the probability of obtaining a reward. For further details about the prediction task and the indirect measure, see Methodological Details in the Supplemental Material available online.
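The idea behind the indirect measure can be illustrated with a minimal sketch. It assumes that each prediction trial has already been reduced to a single predicted reward probability and that the population variance is used; the exact computation is described in the Supplemental Material and may differ. The function name is hypothetical.

```python
from statistics import pvariance


def perceived_control_score(predicted_reward_probs):
    """Variance of predicted reward probabilities across prediction trials.

    predicted_reward_probs: one predicted probability of reward per
    prediction trial, where the trials involve different types of actions.
    Assumption: population variance; the published measure may use a
    different estimator or grouping.
    """
    return pvariance(predicted_reward_probs)


# Identical predictions regardless of action: lowest controllability score.
print(perceived_control_score([0.2, 0.2, 0.2, 0.2]))
# Predictions that differ across actions: positive score.
print(perceived_control_score([0.0, 1.0]))
```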
Results
The exploration task
The top row of Figure 2 presents exploration rates (the percentage of trials in which participants tried new keys) across 4 blocks of 25 trials each. To further examine exploration rates, we conducted a 4 (block: 1, 2, 3, 4) × 3 (reward frequency: extremely low, moderate, extremely high) × 2 (control group: with-control, yoked) repeated measures analysis of variance (ANOVA), which revealed a significant three-way interaction, F(6, 342) = 3.35, p < .01, ηp² = .05. A post hoc Tukey’s test of exploration rates in the two control groups revealed that when the frequency of rewards from exploration was extremely low (p = .1), exploration rates decreased from about 70% in the first block to approximately 40% in the last block for both the with-control (p < .01) and the yoked (p < .01) groups. However, when reward frequency was moderate (p = .2), with-control participants continued to explore in about 70% of the trials, whereas yoked participants reduced their exploration rates to approximately 40%. This difference between the groups was evident in the second block (p < .01) and remained after yoked participants regained control (p < .01 and p = .09 for the third and fourth blocks, respectively). Finally, when reward frequency was extremely high (p = 1), yoked participants explored less than with-control participants in the second block (p < .01); however, this gap disappeared immediately after yoked participants regained control (ps > .9 for the third and fourth blocks). In summary, the classic learned-helplessness pattern was observed only when the reward frequency was moderate.
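The exploration-rate measure can be computed from a participant's sequence of key presses as in the sketch below. The function name and the representation of keys as integer identifiers are assumptions made for illustration.

```python
def exploration_rates_by_block(choices, block_size=25):
    """Return the percentage of new-key selections in each block of trials.

    choices: sequence of key identifiers, one per trial, in trial order.
    A trial counts as exploration if its key was never selected before.
    Trials left over after the last full block are ignored.
    """
    seen = set()
    rates = []
    new_in_block = 0
    for trial, key in enumerate(choices, start=1):
        if key not in seen:
            new_in_block += 1
            seen.add(key)
        if trial % block_size == 0:
            rates.append(100.0 * new_in_block / block_size)
            new_in_block = 0
    return rates
```

For instance, a participant who selects eight distinct keys in the first 10 trials has an exploration rate of 80% over those trials when `block_size=10`.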

Fig. 2. Results from the analyses on exploration rates (top row), the indirect measure of perceived controllability (middle row), and the direct measure of perceived controllability (bottom row). For exploration rates, the mean percentage of trials in which participants tried new keys is shown as a function of block and group. The dashed line indicates the point at which the yoked group transitioned from an absence-of-control to a control setting. For the indirect measure of perceived controllability, the mean variance of predicted reward probabilities in the first half (Blocks 1 and 2) and the second half (Blocks 3 and 4) of the exploration task is presented as a function of group. For the direct measure of perceived controllability, participants’ mean rating of the dependency they felt between their actions and the outcomes is shown separately for the first half and the second half of the task. For each analysis, results are shown separately for each of the three reward conditions. Error bars represent ±1 SEM.
Indirect measure of perceived controllability
A 2 (first vs. second half of the task) × 3 (reward frequency) × 2 (control group) repeated measures ANOVA with the indirect measure of perceived controllability as the dependent variable revealed that the influence of the absence-of-control manipulation experienced by the yoked group was not equal across different reward frequencies, F(2, 114) = 10.77, p < .0001, ηp² = .16. As can be seen in the middle row of Figure 2, there was no significant difference between the two control groups when reward frequencies were extremely low and when reward frequencies were moderate; this result was confirmed in a post hoc Tukey’s test. However, when the reward frequency was extremely high, perceived controllability was significantly greater in the with-control group than in the yoked group (p = .0001). Additionally, perceived controllability increased between the first and the last half of the task in both groups, F(1, 114) = 6.88, p < .01, ηp² = .06.
Direct measure of perceived controllability
Next, we examined the direct measure of perceived controllability: self-reports (bottom row of Fig. 2). Participants’ ratings of the dependency between actions and outcomes increased from the first half to the second half of the task in all conditions (i.e., there was a main effect of task half), F(1, 114) = 15.63, p = .0001, ηp² = .12; however, all other effects were nonsignificant, which suggests that self-reports were not sensitive enough to account for the observed behavioral effects.
Discussion
Previous research has demonstrated that an absence-of-control manipulation leads to insufficient exploration of one’s options. This learned-helplessness pattern has been traditionally attributed to perceived uncontrollability. However, our results reveal a causal role for the prevalence of rewards from exploration. When exploratory efforts were rarely rewarded, participants did not explore sufficiently regardless of the absence of control and despite advantageous average reinforcement. Conversely, when rewards were highly frequent, participants’ recovery from the absence-of-control manipulation was immediate, and suboptimal behaviors were not observed. Learned-helplessness patterns emerged only when the frequency of rewards from exploration was moderate. This boundary condition can account for previous difficulties in replicating the learned-helplessness pattern: For example, when finding the required response to terminate an adverse stimulus is easy (e.g., when rats run to avoid a shock), the frequency of rewards from exploration is high, and rats in both the with-control and yoked groups explore and learn to escape the shock (e.g., Hunziker & Dos Santos, 2007). Even when the learned-helplessness effect is initially observed, yoked subjects recover from the absence-of-control manipulation (e.g., Maier, 2001). In contrast, if the escape response is very complex (e.g., jumping three times over the barrier), the frequency of rewards from exploration is considerably low, and subjects in both the with-control and yoked groups explore insufficiently and fail to avoid the shock (e.g., Freda & Klein, 1976).
In addition, the observed differences in exploration behavior across conditions were minimally reflected in two perceived-controllability measures. Self-reports showed similar feelings of control across all conditions. A prediction task revealed some sensitivity to the control manipulation, which surprisingly manifested mostly in the high-frequency condition, in which the learned-helplessness effect was not observed, but not in the moderate-frequency condition, in which learned helplessness did emerge. Therefore, it seems that in the current context, learned helplessness is not causally related to perceived controllability or the average reinforcement.
A sampling account of exploration
Previous studies have shown that people tend to rely on small samples of past experiences (e.g., Hertwig, Barron, Weber, & Erev, 2004; Kareev, 2000). In line with the current results, this mechanism implies great sensitivity to the frequency of reinforcement because rare outcomes are underrepresented in a typical small sample. For example, assume that every time the bachelor mentioned in the introduction deliberates about whether or not to go on a first date, he tries to recall outcomes of past first dates but stops after accessing only three random cases. Such a small sample size will most often not include amazing but rare dates. Thus, his ongoing behavior will be guided by the more frequent yet unrewarding outcome, which includes the additional cost (both monetarily and emotionally) of dating someone new.
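The claim that a small sample will most often miss a rare reward follows from simple arithmetic, sketched below under the assumption that the recalled experiences are independent draws, each rewarded with probability p. The function name is hypothetical.

```python
def prob_sample_has_no_reward(p: float, sample_size: int) -> float:
    """Probability that none of the sampled experiences was rewarded.

    Assumes independent draws, each rewarded with probability p.
    """
    return (1.0 - p) ** sample_size


for p in (0.1, 0.2, 1.0):
    print(p, round(prob_sample_has_no_reward(p, 3), 3))
```

With a sample of three experiences, the reward is absent from the sample about 73% of the time when p = .1 and about 51% of the time when p = .2, whereas it is never absent when p = 1.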
To examine whether a simple sampling mechanism is sufficient to account for our results, we used the explorative sampler model from our previous study (Teodorescu & Erev, 2014) to derive numerical predictions for the current setting. The model assumes that after some initial, random data collection that decreases over time, decisions are increasingly based on small samples of previous exploratory experiences (see Model Description in the Supplemental Material for a full description of the model).
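To convey the general logic of such a mechanism, the following is a deliberately simplified illustration of a sampling-based agent in the with-control setting. It is not the explorative sampler model itself, which is specified in the Supplemental Material; the decision rule, the parameter names, and all default values (`sample_size`, `initial_random_rate`, `decay`) are assumptions made only for this sketch.

```python
import random


def simulate_sampling_agent(p, n_trials=100, sample_size=3,
                            initial_random_rate=1.0, decay=0.95, seed=0):
    """Return a list of booleans, True on trials where the agent explored.

    All parameters other than p and n_trials are hypothetical.
    """
    rng = random.Random(seed)
    explore_outcomes = []   # payoffs observed after exploring (10 or -1)
    explored = []
    random_rate = initial_random_rate
    for _ in range(n_trials):
        if not explore_outcomes or rng.random() < random_rate:
            # Initial data collection: choose at random whether to explore.
            explore = rng.random() < 0.5
        else:
            # Recall a small sample (with replacement) of past exploration
            # payoffs and explore only if their mean exceeds 0, the payoff
            # of pressing a familiar key.
            sample = [rng.choice(explore_outcomes) for _ in range(sample_size)]
            explore = sum(sample) / sample_size > 0
        if explore:
            explore_outcomes.append(10 if rng.random() < p else -1)
        explored.append(explore)
        random_rate *= decay   # random choice becomes less likely over time
    return explored


for p in (0.1, 0.2, 1.0):
    run = simulate_sampling_agent(p)
    print(p, sum(run[-25:]) / 25)   # exploration rate in the last block
```

In this simplified rule, the agent explores whenever its small sample contains at least one rewarded experience, so its exploration tracks how often rewards appear in the sample rather than the average payoff. The yoked setting is not modeled here.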
The top row in Figure 3 presents the model predictions using the parameters estimated in our previous study (Teodorescu & Erev, 2014). The model provides sufficient conditions for the main results: (a) insufficient exploration in the two control groups when rewards from exploration are rare, (b) exploration rates above 50% for the with-control group and below 50% for the yoked group when the frequency of rewards from exploration is moderate, and (c) increasing exploration rates within the yoked group after regaining control when rewards from exploration are highly frequent.

Fig. 3. Predictions of our explorative sampler model (Teodorescu & Erev, 2014) for exploration rates (top row) and the indirect measure of perceived controllability (bottom row). For exploration rates, the mean percentage of trials in which participants tried new keys is shown as a function of block and group. The dashed line indicates the point at which the yoked group transitioned from an absence-of-control to a control setting. For the indirect measure of perceived controllability, the mean variance of predicted reward probabilities in the first half (Blocks 1 and 2) and the second half (Blocks 3 and 4) of the exploration task is presented as a function of group. For each analysis, results are shown separately for each of the three reward conditions.
The model is also consistent with the finding that individuals with lower levels of anxiety exhibit less learned helplessness (Coyne, Metalsky, & Lavelle, 1980; Lavelle, Metalsky, & Coyne, 1979). It has been suggested that this effect results from the higher attentional resources available to individuals with lower anxiety. From the point of view of the model, more attention to the task should manifest in larger, more representative samples and lead to improved performance in general and less learned helplessness in particular.
A sampling account of perceived controllability
The model also allowed us to derive predictions for the indirect measure of perceived controllability. This was accomplished by drawing one small sample from experiences preceding each prediction trial and reporting the distribution of outcomes according to this sample. The results (Fig. 3, bottom row) capture the pattern observed in the experiment well: There were similar perceptions of control among the two groups when reward frequency was low or moderate, and these perceptions significantly increased when reward frequency was high for both groups but substantially more for the with-control group. Most important, the perceived-controllability results emerged as a property of the reliance on small samples, and no assumptions regarding the dependency between actions and outcome were used (for usage of such priors in learning models, see Huys & Dayan, 2009). The fact that a cognitive mechanism that is highly sensitive to reward frequency can account for perception of control is in line with early work on contingency judgments: Ratings of perceived controllability were highly correlated with success rates and unrelated to the actual degree of control (Alloy & Abramson, 1979; Jenkins & Ward, 1965).
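The step of reporting an outcome distribution from one small sample can be sketched as follows. This is an illustration only: the function name, the sampling with replacement, and the assumption that the four possible outcomes are the payoffs −1, 0, 10, and 11 are ours and are not taken from the model description in the Supplemental Material.

```python
import random
from collections import Counter

# Assumption: the four possible outcomes are the four payoffs in the task.
POSSIBLE_OUTCOMES = (-1, 0, 10, 11)


def predict_outcome_distribution(past_outcomes, sample_size, rng):
    """Predict outcome probabilities from one small sample of past payoffs.

    past_outcomes: payoffs experienced before the prediction trial.
    sample_size: number of experiences recalled (hypothetical parameter).
    rng: random.Random instance used to draw the sample with replacement.
    """
    sample = [rng.choice(past_outcomes) for _ in range(sample_size)]
    counts = Counter(sample)
    return {outcome: counts[outcome] / sample_size
            for outcome in POSSIBLE_OUTCOMES}


rng = random.Random(1)
print(predict_outcome_distribution([10, -1, -1, -1, -1], 3, rng))
```

No belief about the dependency between actions and outcomes enters this computation; the predicted probabilities reflect only which experiences happen to be sampled.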
Summary and implications
The observation that exposure to uncontrollable outcomes can impair future learning has led to the assumption that feeling in control is a necessary condition for effective learning. This study questions this popular assumption. It shows that the effect of exposure to uncontrollable outcomes depends on reward prevalence. This finding, like many other learning phenomena, can be the result of a tendency to rely on small samples of past experiences.
From a practical point of view, reliance on small samples implies that to enhance exploration, efforts should be made to change the frequent experience with exploration. For example, imagine a professor who tries to encourage creative thinking in his lab. When his students raise new ideas, he can either get very excited only if it is a great idea (a rare occurrence) or get mildly excited by every innovative idea regardless of its quality. Although the former strategy enhances the contingency between actions and their rewards, and could involve higher average rewards, the present study suggests that the latter strategy will be more effective in generating new ideas.
Acknowledgements
We thank Eitan Man for his useful comments.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This research was supported by grants from the Israel Science Foundation.
Open Practices
All data and materials have been made publicly available via Open Science Framework and can be accessed at https://osf.io/ik356/ and https://osf.io/kj3r7/, respectively. The complete Open Practices Disclosure for this article can be found at http://pss.sagepub.com/content/by/supplemental-data. This article has received badges for Open Data and Open Materials. More information about the Open Practices badges can be found at https://osf.io/tvyxz/wiki/view/.
