Abstract
Subjects performed Sternberg-type memory recognition tasks (Sternberg paradigm) in four experiments. Category-instance names were used as learning and testing materials. Sternberg's original experiments demonstrated a linear relation between reaction time (RT) and memory-set size (MSS). A few later studies found no relation, and other studies found a nonlinear relation (logarithmic) between the two variables. These deviations were used as evidence undermining Sternberg's serial scan theory. This study identified two confounding variables in the fixed-set procedure of the paradigm (where multiple probes are presented at test for a learned memory set) that could generate a MSS RT function that was either flat or logarithmic rather than linearly increasing. These two confounding variables were task-switching cost and repetition priming. The former factor worked against smaller memory sets and in favour of larger sets whereas the latter factor worked in the opposite way. Results demonstrated that a null or a logarithmic RT-to-MSS relation could be the artefact of the combined effects of these two variables. The Sternberg paradigm has been used widely in memory research, and a thorough understanding of the subtle methodological pitfalls is crucial. It is suggested that a varied-set procedure (where only one probe is presented at test for a learned memory set) is a more contamination-free procedure for measuring the MSS effects, and that if a fixed-set procedure is used, it is worthwhile examining the RT function of the very first trials across the MSSs, which are presumably relatively free of contamination by the subsequent trials.
The interest in the Sternberg task was first aroused by Sternberg's (1966, 1969a, 1969b) seminal papers on memory scanning. In this task, subjects typically study a list of one to six digits, letters, or words and then take a recognition test in which they decide whether a probe item is among the items they studied. Since then, memory scanning has been an area of extensive research and controversy (Atkinson, Herrmann, & Westcourt, 1974; Atkinson & Juola, 1973; Baddeley & Ecob, 1973; Clifton, 1973; Eriksen, Eriksen, & Hoffman, 1986; Juola, Fischler, Wood, & Atkinson, 1971; McElree & Dosher, 1989; Monsell, 1978; Simpson, 1972; Swanson, 1974; Taylor, 1976; Theios, Smith, Haviland, Traupmann, & Moy, 1973; Theios & Walter, 1974; Treisman & Doctor, 1987; Van Zandt & Townsend, 1993; Wickens, Moody, & Vidulich, 1985). Numerous studies have questioned Sternberg's serial/exhaustive memory scan theory (Atkinson et al., 1974; Atkinson & Juola, 1973; Baddeley & Ecob, 1973; Clifton, 1973; Corballis, Kirby, & Miller, 1972; Eriksen et al., 1986; McElree & Dosher, 1989; Monsell, 1978; Taylor, 1976; Theios et al., 1973; Theios & Walter, 1974; Townsend, 1971, 1990; Van Zandt & Townsend, 1993).
Among the most debated issues are whether the scan process is serial or parallel (performed within a system of limited processing resources; Snodgrass & Townsend, 1980; Taylor, 1976; Townsend, 1971, 1990), and exhaustive or self-terminating (Theios et al., 1973; Van Zandt & Townsend, 1993), and whether the memory-set size (MSS) effect derives from such other mechanisms as serial position (or recency) of the probe in the memory array (McElree & Dosher, 1989; Monsell, 1978), a higher probability of stimulus occurrence (e.g., Baddeley & Ecob, 1973, showed that repeated items were recognized faster; Biederman & Stacy, 1974; Monsell, 1978; Theios, 1973; Theios et al., 1973; Theios & Walter, 1974, demonstrated that stimulus frequency and sequence were determinants of reaction time, RT), or a mixture of a familiarity-based recognition and a serial search (Atkinson et al., 1974; Atkinson & Juola, 1973; Banks & Atkinson, 1974; Juola et al., 1971). Thus, after nearly half a century, the exact nature of the memory process underlying the phenomenon Sternberg demonstrated is still far from certain.
Despite the uncertainty about the underlying processes, this paradigm has been frequently used as a diagnostic tool to study other variables in memory and cognition. For example, using the Sternberg paradigm, Ferraro and Balota (1999) found that older adults displayed increases in both slopes and intercepts in memory scan compared with younger adults, and that people suffering from dementia showed these two increases and a greater tendency to engage in self-terminating (rather than exhaustive) search than did an age-matched nondemented control group. Oberauer (2001) found that elderly people were more hampered than young people in a Sternberg task by the information in an irrelevant memory list when one of two memory lists that subjects had learned was designated as relevant and the other as irrelevant. Ilan and Miller (1998) used the Sternberg task combined with a go/no-go design to demonstrate that response preparation could start in the motor cortex contralateral to the responding hand (indicated by the appearance of an electroencephalographic, EEG, component called the lateralized readiness potential) even before the supposed memory scan was completed, and they concluded that human information processing operated in a continuous process with different stages overlapping with each other rather than in serial discrete stages (as suggested, for example, in Sternberg, 1969b). Also, by measuring the lateralized readiness potential, Miller (1998) showed that subjects could start the response preparation for a high-probability probe even before the probe was presented. Research on memory scanning in its own right has also persisted to fairly recent times (Burle & Bonnet, 2000; Roeber & Kaernbach, 2004; Williams, Cooper, & Hunter, 1990).
Because the Sternberg paradigm as a memory research method has had such an extensive influence over the past four decades, it is important to understand how some seemingly unimportant variations in the testing procedure can give rise to very different results in this memory task.
In a typical memory scan experiment, a random set of no more than 6 single digits or letters is selected as the memory set, or the positive set, for subjects to retain in memory (Sternberg, 1969a, 1969b). The digits or letters not selected constitute the negative set. The memory of the positive set can then be tested by presenting the subjects with a probe. Subjects are asked to respond with either a “yes” or a “no” to indicate whether the probe is in the positive set. One of the important findings has been that the RT increases linearly with the MSS. Based on this fact, Sternberg (1966, 1969a, 1969b, 1975) has argued that a high-speed serial scan is performed in memory in which the probe is compared with each of the memory-set elements. Another phenomenon that Sternberg discovered is the parallelism of the positive and negative RT/MSS functions. On the basis of this finding, he (Sternberg, 1966, 1969a, 1969b, 1975) argued that a high-speed exhaustive memory scan was performed on the list of items in memory, for if the scan were not exhaustive (i.e., terminated when the target was found), the positive RT function would have a slope half the size of that of the negative RT function (since on average, the target would be located half way through the scan of the list, and the search would be terminated).
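The slope argument above can be checked with a small calculation. The following is only an illustrative sketch: the base time of 400 ms and the per-comparison time of 40 ms are hypothetical values chosen for the example, not estimates from any study, and the function names are invented here. It counts the expected number of comparisons under each search rule, assuming that a positive probe is equally likely to match any list position.

```python
from fractions import Fraction


def expected_comparisons(set_size, probe_in_set, exhaustive):
    """Mean number of probe-to-item comparisons in a serial scan."""
    if exhaustive or not probe_in_set:
        # Every item is checked: always for an exhaustive scan, and
        # necessarily for a negative probe, which matches nothing.
        return Fraction(set_size)
    # Self-terminating scan, positive probe: the target is assumed equally
    # likely to sit at positions 1..n, so the mean stopping point is (n + 1) / 2.
    return Fraction(set_size + 1, 2)


def predicted_rt(set_size, probe_in_set, exhaustive, base_ms=400, per_item_ms=40):
    """Predicted RT in ms; base_ms and per_item_ms are illustrative values."""
    n_comparisons = expected_comparisons(set_size, probe_in_set, exhaustive)
    return base_ms + per_item_ms * n_comparisons


def slope(probe_in_set, exhaustive):
    """RT increase per additional memory-set item (the function is linear)."""
    return (predicted_rt(2, probe_in_set, exhaustive)
            - predicted_rt(1, probe_in_set, exhaustive))


for exhaustive in (True, False):
    label = "exhaustive" if exhaustive else "self-terminating"
    print(label, "positive:", slope(True, exhaustive),
          "negative:", slope(False, exhaustive))
```

Under these assumptions the exhaustive rule gives equal positive and negative slopes, whereas the self-terminating rule gives a positive slope half the size of the negative slope, which is the contrast Sternberg's argument relies on.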
There are two procedures to probe the memorized positive set. When a learned memory set is tested by only one probe, and a different new set must be learned for another single-probe test, the procedure is called a varied-set procedure (Glass, 1984; Stadler & Logan, 1989; Sternberg, 1969a, 1975). In a typical varied-set procedure, the positive and negative items are sampled from a small and well-defined stimulus ensemble (e.g., the nine digits), and the assignment of items to a positive or negative set from one trial to the next is inconsistent. When a given memory set is probed more than once, the procedure is called a fixed-set procedure. In a typical fixed-set procedure, all memory-set items are probed, and each memory-set member is tested at least once. However, it is not uncommon to have the positive and negative set items tested repeatedly for several rounds. In the fixed-set procedure, the assigning of the items to positive and negative sets can be either consistent or inconsistent. In a consistent assignment or mapping, items that are assigned to a positive set cannot be reassigned to a negative set, and items assigned to a negative set cannot be reassigned to a positive set. In an inconsistent mapping, the same items can be assigned to a positive set in one trial block (for a particular MSS) and to a negative set in another trial block (for a different MSS).
There have been reports of absence of, or very weak, MSS effects (Diener, 1988, 1990; Stadler & Logan, 1989; Swanson, 1974), or linear, or logarithmic (bilinear, or quadratic, used interchangeably in this article) RT functions in Sternberg-type tasks across different studies or within the same studies depending on the materials or methods used. The suggested potential causes and the theoretical implications of these findings were numerous and varied and were lacking a consensus across the studies (Atkinson & Juola, 1973; Atkinson et al., 1974; Burrows & Okada, 1975; Corballis, Katz, & Schwartz, 1980; Juola, Taylor, & Young, 1974; Okada & Burrows, 1978; Simpson, 1972). The present study identified some methodological variables that reliably produced each of the above three types of MSS effects—that is, the linear, the null (or very weak), and the logarithmic (quadratic) RT functions. Also in the present study, some procedural mechanisms were identified to produce those nonlinearly increasing functions. Sternberg's original test employed a varied-set procedure and produced a linear RT function (Sternberg, 1966, Experiment 1). A literature survey seemed to suggest that many cases of deviations from the linear function were associated with the use of a fixed-set test procedure (Burrows & Okada, 1975; Corballis et al., 1980; Okada & Burrows, 1978; Simpson, 1972; Stadler & Logan, 1989; Swanson, 1974). Thus, one possible source of obtaining the logarithmic/bilinear and the flat functions (instead of a linearly increasing function) may be the fixed-set test procedures that these studies employed. This possibility was investigated in this study.
A Sternberg task involves two subtasks: one in which the positive memory set is learned and another in which the learned memory set is tested. Thus, a switch between two tasks is involved in each learning and testing cycle in the Sternberg paradigm. Research on task-switch cost (TSC) has found that when switching between two tasks is required, the process of re-adaptation to the switched-to task takes from one to four trials to complete depending on whether the switching is predictable or unpredictable (Monsell, 2003; Monsell, Sumner, & Waters, 2003). According to the concept of task-switch cost, each task requires a particular procedural schema or task set, and when a switch is made from one task to another, the activation of the first task set has to be inhibited, and the activation of the switched-to task set must be restarted, a process that can cause the responses to take longer to initiate on a switch trial than on a task-repetition trial (Mayr & Kliegl, 2000; Monsell, 2003; Monsell et al., 2003; Rogers & Monsell, 1995). As Monsell (2003; Monsell et al., 2003) stated, to change tasks, some processes of task-set reconfiguration (a kind of change in mental preparation) must take place before appropriate task-specific processes can proceed. When a fixed-set procedure is used, the task-switch cost can obscure or confound the MSS effect. Unfortunately, to my knowledge, no published study has addressed this issue. In the present study, one possible mechanism of obtaining a null MSS effect in a fixed-set procedure of the Sternberg task was investigated, and an explanation based on the overlooked effect of TSC was provided. In a fixed-set test procedure, a memory set is typically tested with all the positive and negative set items for at least one round.
As a result, a large memory set is tested with more probes or same-task repetitions than a small memory set, and more same-task trials offer more opportunities for the reconfiguration of task set than do fewer same-task trials. For instance, in a fixed-set procedure, a MSS of 1 involves only 2 test trials (one positive and one negative), and a MSS of 2 involves 4 test trials, which under certain circumstances is the number of trials required for the process of recovery from a task switch to complete. As such, a memory set with a size of 1 or 2 is at a disadvantage because its mean RT bears a disproportionate share of the task-switch cost relative to a set of a larger size.
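The dilution argument can be made concrete with a toy calculation. This is a sketch under stated assumptions rather than a model fitted to data: the base RT (700 ms), the true scanning slope (20 ms per item), and the switch cost (200 ms) are hypothetical values, and the cost is assumed to fall on the first test trial only, which is the simplest case of the one-to-four-trial range mentioned above.

```python
def first_trial_rt(set_size, base_ms=700.0, slope_ms=20.0, switch_cost_ms=200.0):
    """RT of the first probe after the switch from learning to testing."""
    return base_ms + slope_ms * set_size + switch_cost_ms


def mean_round_rt(set_size, base_ms=700.0, slope_ms=20.0, switch_cost_ms=200.0):
    """Mean RT over one test round of 2 * set_size probes."""
    n_trials = 2 * set_size  # one positive and one negative probe per item
    scan_rt = base_ms + slope_ms * set_size  # RT without any switch cost
    # Simplifying assumption: only the first trial of the round is slowed.
    total = (scan_rt + switch_cost_ms) + scan_rt * (n_trials - 1)
    return total / n_trials


for mss in (1, 2, 3, 4, 5):
    print(mss, first_trial_rt(mss), round(mean_round_rt(mss), 1))
```

With these illustrative numbers the first-trial RTs rise by 80 ms from a set size of 1 to a set size of 5, yet the round means for those two set sizes are identical, because the single slowed trial makes up half of the trials at set size 1 but only a tenth of them at set size 5.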
The other confounding variable in a fixed-set procedure is the unequal repetition priming that different MSSs receive. When an item is presented a second time, the response to the repeated presentation is faster than that to the first presentation, a phenomenon known as repetition priming (Ochsner, Chiu, & Schacter, 1994; Roediger & Challis, 1992; Tulving & Schacter, 1990). Typically when a fixed-set procedure is used, all the items in the positive and negative memory sets are tested in more than one round. The strength of repetition priming is an inverse function of the length of the interrepetition interval. In the Sternberg paradigm, the interrepetition interval can be measured by the number of other probes that are presented between the two repeated presentations of a given item. In a smaller set, fewer other probes are presented than in a larger set following the first presentation of a certain probe before that probe is presented again. This creates a higher relative frequency (probability) and generates a stronger repetition priming effect for a probe in the smaller sets than in larger sets. Furthermore, to hold the total number of test trials constant across different MSSs, researchers often repeat the presentation cycles of probes in the smaller sets more times than in the larger sets (e.g., Corballis et al., 1980). This procedure creates a higher total frequency of presentation for a probe in the smaller set than in the larger set and can possibly produce an additional priming effect over and above that produced by the higher relative frequency. Due to these two types of unequal frequencies in the presentation of probes, the RTs of smaller sets can drop to a much greater extent than RTs of larger sets over the course of repeated testing, which can generate an accentuated MSS effect.
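How the interrepetition interval grows with set size can be illustrated by a short enumeration. The sketch below rests on an assumption not stated in the text: each test round is treated as an independent random ordering of all 2 × MSS probes, so every pair of positions in two successive rounds is equally likely. The function name is hypothetical.

```python
from fractions import Fraction


def mean_intervening_probes(set_size):
    """Mean number of other probes between a probe's appearance in one
    test round and its reappearance in the next round."""
    n_probes = 2 * set_size  # positive items plus an equal number of negatives
    total = 0
    # Enumerate every position p in the earlier round and q in the later round.
    for p in range(n_probes):
        for q in range(n_probes):
            after_p = n_probes - 1 - p  # probes still to come in the earlier round
            before_q = q                # probes shown first in the later round
            total += after_p + before_q
    return Fraction(total, n_probes * n_probes)


for mss in (1, 2, 4, 10):
    print(mss, mean_intervening_probes(mss))
```

Under this assumption the mean lag is 2 × MSS − 1 intervening probes, so a probe in a one-item set returns after one other probe on average, whereas a probe in a ten-item set returns after 19.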
When an overall average RT, collapsed over multiple rounds of testing, is plotted against MSS, the observed MSS effect reflects the combined effects of memory load (MSS), TSC, and repetition priming. The last two effects originate from the testing procedure rather than from the memory load per se.
Experiment 1
A fixed-set procedure was used in Experiment 1. The main purpose was to test the hypothesis that the null or very weak MSS effect (i.e., the mean RTs of small MSSs being no different from those of large MSSs) obtained in the first round of test in some previous studies (e.g., Stadler & Logan, 1989) is the result of a greater influence of TSC on the mean RTs of smaller MSSs than on those of larger MSSs. After the memory set was learned, the test involved the presentation of all the positive items and an equal number of negative items. The goal was to find out what the slope of the mean RTs of this one round of testing (which were the averaged RTs of all the positive or all the negative test trials in a MSS condition) was. In addition, the RT function of the very first test trials (positive or negative) was examined and was compared with the RT function of the averaged RTs of all positive or negative trials. The first test trials in a fixed-set procedure are similar to the single test trials in a varied-set design in that the response to the first test item in each MSS condition is not yet “contaminated” by the responses to the subsequent test items. Therefore, this experiment also provided some kind of a comparison between a quasi-varied-set design and a fixed-set design concerning the MSS effects that one can observe from these two subsets of data.
Method
Subjects
Thirty-four introductory psychology students at the University of Texas–Pan American (UTPA) participated in the experiment for extra course credit.
Materials, design, and procedure
The positive MSS was varied from 1 to 10. Ten common natural categories were selected from Battig and Montague (1969) category norms: four-footed animal, kitchen utensil, furniture, fruit, tool, sport, clothing, bird, vehicle, and musical instrument.
There are many different types of materials possible for learning and memorization for the purpose of this study. The category words were chosen for the following reasons. First, the 10 digits were ruled out because there are not enough items to make equal-sized positive and negative sets for the sets larger than 5. It was found that when the negative set size is smaller than the positive set size (as is necessarily the case when the positive set exceeds 5), subjects would switch to search the negative set instead of the positive set and as a result produce an inverted U-shaped RT/MSS function (as the positive set exceeds 5, the RT decreases rather than continues to increase; Wingfield & Branca, 1970). Since this study aimed to examine the RT for supraspan MSSs, digits failed to meet this goal. Alphabetic letters are considered to be high-frequency limited-vocabulary items (Wickens et al., 1985) and therefore not very representative of memory materials in general. Words are more representative than digits and letters and the most widely used materials in memory research (Balota et al., 2007). Category words rather than unrelated words were used because category words are semantically more homogeneous and are supposed to provide a lower chance for forming subjective organization out of them based on certain distinctive semantic features of the words (Okada & Burrows, 1978; Swanson, 1974; Tulving, 1962). It is presumably more difficult to form semantic subgroups out of semantically more homogeneous words than semantically heterogeneous words. These were the rationales behind using category words rather than other materials.
Subjects were told that they would learn sets of varying numbers of words and that their memory for these words would be tested by asking them to determine whether a test word was or was not in the set of words they had learned. Before the learning started, the computer displayed “Ready to learn the words? Press Enter to start”. When Enter was pressed, the first word appeared, remained in view for 2.5 s, and then was replaced by a blank screen of 1 s followed by the next word, and so on. Subjects were not shown the negative items at the learning phase. Subjects were asked to recall the items at the end of each round of presentation. They were told that they could proceed to the testing if they could recall all the items once successfully. They were also told at this point that once they started the test, they had to be at least 90% accurate. Otherwise, they would have to relearn a different set and be retested for that set (this rule was not enforced). Thus, subjects used their own judgement to decide when to end the learning and start the test. If they pressed “N”, the positive-set words were displayed for one more round in a new random order. If they pressed “Y”, the computer displayed a test instruction telling subjects that the “z” and “/” keys were the two designated response keys (which one was the “yes” and which one the “no” response key was counterbalanced across subjects) and that they should poise their left and right index fingers on the two response keys and respond as fast and as accurately as they could. When ready, they pressed Enter to display the instruction on the test procedure, and another Enter in response to the prompt “Ready to display the first test word?” to display the first probe word. The probe word remained in view until one of the response keys was pressed. If they made an error, the computer beeped and displayed “ERROR! Press Enter to continue”. The learning and testing processes were performed 10 times (once for each MSS).
The order in which the 10 MSSs were learned and tested was random across subjects.
Results and discussion
The data from the practice trials were not recorded or analysed. RTs longer than two standard deviations above the mean of the cell (formed by crossing response type, i.e., positive versus negative responses, and MSS) made up 0.5% of the data and were excluded from analysis. The response errors made up 2.2% of the total responses. Error data were also excluded from analysis. The mean number of learning cycles was 1.27, and the number of learning cycles taken by subjects was positively correlated with MSS, r = .285. The mean RTs of the first 10 trials (collapsed over the positive and negative responses and the MSSs) are presented in Figure 1a (the lowest of the three RT functions in the figure) as a function of the serial position of the trial in the test run along with those of Experiments 3 and 4. Note that some MSSs did not have 10 test trials, and some MSSs had more than 10 test trials. For instance, MSS 1 had only 2 test trials, MSS 2 had 4 test trials, MSS 3 had 6 test trials, MSS 4 had 8 test trials, MSS 5 had exactly 10 test trials, MSS 6 had 12 test trials, and so on.
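The trimming rule described above (dropping RTs more than two standard deviations above the mean of their response type by MSS cell) might be sketched as follows. This is an illustration, not the analysis script used in the study: the function names and the example RTs are made up, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def trim_cell(rts, n_sd=2.0):
    """Drop RTs more than n_sd standard deviations above the cell mean."""
    if len(rts) < 2:
        return list(rts)  # a spread cannot be estimated from one value
    # Assumption: sample SD (n - 1 denominator); the source does not specify.
    cutoff = mean(rts) + n_sd * stdev(rts)
    # Only the slow tail is trimmed, as in the rule described in the text.
    return [rt for rt in rts if rt <= cutoff]


def trim_all(cells):
    """cells maps (response_type, mss) to a list of correct-trial RTs in ms."""
    return {cell: trim_cell(rts) for cell, rts in cells.items()}


# Made-up RTs for illustration only.
cells = {("positive", 5): [700, 710, 720, 730, 740, 750, 760, 770, 780, 2000]}
trimmed = trim_all(cells)
print(trimmed[("positive", 5)])
```

In this made-up cell the 2000 ms response lies above the cutoff and is removed, while the other nine RTs are kept.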
Figure 1. (a) Mean reaction time (RT) of Test Round 1 trials as a function of experiment and the serial position of the probe in the test. (b) Mean RT of the first trials of Test Round 1 and all trials of Test Round 1 as a function of response type and memory-set size (MSS) of Experiment 1.
A visual inspection suggested that there was a decrease in RT from Trial 1 to Trial 4. An analysis of variance (ANOVA) on the first four trials revealed that the trial serial position effect (mean RT of Trial 1 = 965 ms; that of Trial 2 = 754 ms; that of Trial 3 = 744 ms; that of Trial 4 = 736 ms) was significant, F(3, 99) = 60.47, MSE = 6,859, p < .0001. A post hoc Newman–Keuls test showed that the Trial 1 mean was significantly longer than all the other means, with the other means not significantly different from each other. Thus, the TSC was limited to the first trial. This finding was consistent with Monsell et al.'s (2003) finding that when the switching of task was predictable, the switching cost was limited to the first trial of the switched-to task. A linear regression analysis on the mean RTs of Trial 4 through Trial 10 confirmed the visual observation of no change in RT from Trial 4 to Trial 10, t < 1.
The mean RTs as a function of response type (positive versus negative responses) and trial set (the mean RTs of the first test trials versus the mean RTs of all test trials of that response type) are presented in Figure 1b.
An ANOVA comparing the first-trial RT function (collapsed over the positive and negative responses) with the whole-data function (also collapsed over the two response types) indicated that the mean of the first-trial RTs (936 ms) was significantly longer than the mean of all trials RTs (769 ms), F(1, 33) = 48.38, MSE = 85,849, p < .0001. Once more, the TSC was demonstrated. The MSS effect was significant, F(9, 294) = 2.31, MSE = 35,525, p = .016. Importantly, the interaction between the first-trial function and whole-data function was significant, F(9, 223) = 6.11, MSE = 10,914, p < .0001, confirming that the RT reduction from the first trials to the averaged whole data was greater for larger MSSs than for smaller MSSs. To not complicate the above test, data were collapsed across positive and negative trials. So, a separate test specifically aimed at response type effect was conducted and indicated that the mean RT of the positive responses (758 ms) was marginally significantly shorter than that of the negative responses (778 ms), F(1, 33) = 3.75, MSE = 18,944, p = .061.
A linear regression analysis performed on the RT function of the whole data for the positive responses revealed a slope of −1.75 ms, which was not significant, t(332) = −0.52. However, the slope of the negative RT function of the whole data was 7.40 ms, and it was significant, t(334) = 2.07, SE = 3.57, p = .039. Although this slope was significant, it was a far cry from the typically obtained slopes on the order of 30 to 50 ms (Corballis et al., 1980; Okada & Burrows, 1978; Sternberg, 1975). Neither of the two RT functions had a significant quadratic trend, F < 1, and F(1, 326) = 2.49, MSE = 35,251, p = .115 for the positive and negative response function, respectively. This generally replicated the results from the first round of test in Stadler and Logan's (1989) study. The interesting question is why the RT functions were flat or so close to being flat. The answer lies in the unequal weights that TSC carried in the averaged RTs across the different MSSs.
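For readers who want to see how a slope, its standard error, and the t statistic relate, a minimal ordinary least squares sketch is given here. It is not the software used for the reported analyses, the function name is hypothetical, and the example data are made up. The t value is the slope divided by its standard error, which agrees with the negative-response figures reported above (7.40 / 3.57 ≈ 2.07).

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, standard error of the slope, t, df).
    Assumes at least three points and a fit that is not perfect.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2  # two parameters estimated
    se_slope = math.sqrt(residual_ss / df / sxx)
    return slope, intercept, se_slope, slope / se_slope, df


# Made-up data: x plays the role of MSS, y the role of RT.
slope, intercept, se_slope, t_value, df = linear_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 3), round(intercept, 3), round(se_slope, 3), round(t_value, 3), df)
```

In an analysis like the one reported, x would hold the MSS of each retained trial and y its RT, so the degrees of freedom equal the number of retained trials minus two.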
Although the RT functions of the first trials were quite variable because of the small number of data points the plotted functions were based on, a visual inspection suggested that the positive and negative functions of the first trials displayed a considerable slope. A linear regression confirmed this impression. The positive RT function had a slope of 15 ms, which was significant, t = 2.15, SE = 7.04, p = .033, although its quadratic trend was not significant, F < 1. The negative RT function had a slope of 21 ms, which was also significant, t = 2.64, SE = 7.83, p = .009. Again, its quadratic trend was not significant, F < 1. Although the slopes of these first-trial RT functions were lower than were typically found, they were all significant and much larger than the slopes of the whole-data functions, as was visually apparent in the figure.
As Figure 1a revealed, responses for the first couple of test trials were slower than the responses for subsequent test trials, a manifestation of the TSC. Through the averaging process, this initial linear MSS effect was “washed out”. Specifically, although the RTs of the initial trials of smaller MSSs (e.g., MSSs 1 and 2) were shorter, the subsequent trials included in the averaging carried the TSC, causing the averaged RT to be only slightly lower than the initial first-trial RT for the smaller sets. For the larger MSSs, although the initial RTs were higher, the reduction in RT through the averaging process was also greater due to averaging over many more post task-transitional-stage, faster trials, as can be seen in Figure 1a. Put simply, the whole-data function appeared flat because of a confound (inverse correlation) between the MSS and the weights of the TSC contributing to the mean RTs of MSSs.
Experiment 2
Experiment 2 used a varied-set design in which only one probe was presented for a learned memory set. Although the first test trials in Experiment 1 might be similar to the single test trials in a varied-set design, subjects in a fixed-set and in a varied-set design may have different mental sets or expectations. Subjects know that for each MSS they need to respond to only one test probe in a varied-set experiment, but to many test probes in a fixed-set experiment. Can this difference in expectation make any difference in the RT function? A comparison of the first-trial RT function in Experiment 1 with the RT function obtained in this experiment would answer this question. If they are basically the same, the first trials (and the data from them) in a fixed-set experiment may be used as a quasi-varied-set design within a fixed-set design. Some studies found that the RT function displayed a quadratic or bilinear trend when the MSS exceeded the immediate memory span of about 7, in which the smaller sets section generated a steeper slope than the larger sets section, with the break-point between the two slopes located between 6 and 8 (Burrows & Okada, 1975; Corballis et al., 1980; Okada & Burrows, 1978, among others). This finding has prompted proposals of memory mechanisms alternative to the serial scan process. Experiment 1 used several supraspan-sized memory sets and a fixed-set design but did not obtain an RT function with a quadratic trend. Experiment 2 would use the same MSSs but a varied-set design. If the first trials of a fixed-set procedure are similar in nature to the single trials of a varied-set procedure, then a similar RT function with no qualitative difference from the first-trial RT function in Experiment 1 should be obtained.
At any rate, Experiment 2 was conducted to test the hypothesis that supraspan MSSs by themselves are not sufficient to generate a nonlinear (quadratic) trend in the RT function in a fixed-set procedure with one test round or in a varied-set design.
Method
Subjects
Sixty undergraduate psychology students at UTPA participated in this experiment for extra course credit.
Design, materials, and procedure
The way the categories of words were created for learning and testing was the same as in Experiment 1 except that 40 categories rather than 10 categories were used. The design and procedure were different from those of Experiment 1. Each of the 10 MSSs was tested four times with four different categories (i.e., each time tested with a new category), twice with positive probes, and twice with negative probes, and each category was tested with a single probe. The single probe was randomly selected from the positive or the negative set of the category for each subject. Thus, each subject studied 40 categories and received 40 test trials. The 40 test trials were presented in four blocks of 10 trials each block. Five of the 10 trials in each block were positive trials, and five were negative trials. Over the four blocks, each MSS was tested twice with a positive and twice with a negative probe.
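One way to build a trial schedule that satisfies the constraints just described is sketched below. It is an illustration only: the original randomization program is not described, the function name is hypothetical, and the sketch additionally assumes that every block contains each of the 10 MSSs exactly once, which the text does not state. Set sizes are paired at random, and the two members of a pair receive complementary positive/negative patterns across the four blocks, which guarantees five positive probes per block.

```python
import itertools
import random


def make_schedule(rng, set_sizes=range(1, 11), n_blocks=4):
    """Return n_blocks lists of (mss, is_positive) trials.

    Each MSS gets two positive and two negative probes over four blocks,
    and each block holds five positive and five negative trials.
    Requires an even number of set sizes and n_blocks == 4.
    """
    sizes = list(set_sizes)
    rng.shuffle(sizes)
    # All ways of placing exactly two positive probes in four blocks.
    patterns = [p for p in itertools.product((True, False), repeat=n_blocks)
                if sum(p) == n_blocks // 2]
    polarity = {}
    # Pair the shuffled set sizes; complementary patterns within a pair put
    # exactly one positive probe from each pair into every block.
    for first, second in zip(sizes[0::2], sizes[1::2]):
        pattern = rng.choice(patterns)
        polarity[first] = pattern
        polarity[second] = tuple(not value for value in pattern)
    blocks = []
    for block_index in range(n_blocks):
        trials = [(mss, polarity[mss][block_index]) for mss in sizes]
        rng.shuffle(trials)  # random trial order within the block
        blocks.append(trials)
    return blocks


schedule = make_schedule(random.Random(2024))
for block in schedule:
    print(block)
```

Each (mss, is_positive) pair would then be assigned a fresh category, with the single probe drawn from that category's positive or negative set.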
Results and discussion
Outliers made up 4.0% of the data. The percentage of response errors was 4.33%. Outliers and error responses were excluded from analysis. The mean number of learning cycles taken by subjects was 1.25, and the number of learning cycles was positively correlated with the MSS, r = .226. The mean RTs as a function of response type and MSS are presented in Figure 2.
Figure 2. Mean reaction time (RT) as a function of response type and memory-set size (MSS) of Experiment 2.
The mean RT of the positive responses was 1167 ms, and that of the negative responses was 1161 ms. The difference was not significant, F < 1. The MSS by response type interaction was not significant either, F < 1. A regression analysis showed that the slope of the positive RT function was 23 ms, which was significant, t(596) = 3.97, SE = 5.86, p < .0001, and the slope of the negative RT function was 32 ms, which was also significant, t(595) = 6.10, SE = 5.38, p < .0001. Neither function's quadratic trend was significant, F < 1 for the positive, and F(1, 587) = 1.29, MSE = 143,914, p = .260, for the negative RT function. The results of Experiment 2 allowed for four conclusions. First, the first trials in a fixed-set procedure were basically like the single trials in a varied-set procedure in terms of the shape (linearity) of the function they generated. Second, both the positive and the negative RT functions from the varied-set procedure had larger linear slopes than those of the first-trial RT functions in Experiment 1. Of course, besides different expectations on the part of the subjects, far more categories were learned in Experiment 2 than in Experiment 1, possibly creating a greater long-term memory load and a slower memory retrieval rate. Third, supraspan MSSs were not a sufficient condition to generate a quadratic trend in the RT function either in a one-test-round fixed-set procedure or in a varied-set procedure. Fourth, the results of Experiment 2 reinforced the conclusion that TSC indeed played a role in generating a null or a very weak MSS effect in the RT function derived from averaging over all trials in a fixed-set design in Experiment 1. When the RTs of the first probes in a fixed-set design or the RTs of the only probes in a varied-set design were plotted against the MSS, a linearly increasing function was obtained in both cases with no sign of a quadratic trend.
Experiment 3
Experiment 3 repeated the design and procedure of Experiment 1 with two exceptions. First, instead of the positive and negative probes being presented for only one round in the test, the test was repeated for four rounds. Second, the first trials were evenly divided between positive and negative trials rather than selected completely randomly as in Experiment 1 (this produced approximately but not exactly equal numbers of first positive and negative trials for each MSS). This design change gave a more controlled examination of the first trial results than in Experiment 1. The results of the first trials are compared with those of Experiment 1 and the varied-set results of Experiment 2 to further examine the RT functions before they were contaminated by the averaging process in a fixed-set procedure. There were multiple questions to be answered in Experiment 3. The first question is whether the first-trial results (showing a significant linear trend without a quadratic trend) in Experiment 1 can be replicated. The second question is whether the basically flat slopes of the mean RT functions of the first test round in Experiment 1 (averaged over all the positive or negative responses of each MSS in that test round) can be replicated. The third question is whether repeated rounds of testing in a fixed-set design will generate RT functions that reveal a MSS effect as reported by Stadler and Logan (1989) and that is attributed to repetition priming. The last and the most important question is whether the shape of the RT functions resulting from repeated rounds of testing will be the same as, or different from, that of the first-trial and the varied-set RT functions. In other words, if repeated rounds of testing lead to a MSS effect, will there be a quadratic trend since this experiment used supraspan MSSs? Stadler and Logan (1989) used only subspan MSSs (their largest MSS was 4) and did not address this question. 
Studies that used a fixed-set design typically gave multiple rounds of tests for a learned memory set (e.g., Corballis et al., 1980; Stadler & Logan, 1989, among many others); therefore the results from this experiment will bear on the conclusions reached by many previous studies.
A deviation from a linear RT/MSS function has been considered evidence against Sternberg's serial scan theory, and different theoretical accounts have been proposed to explain the nonlinearity of (bilinear, logarithmic, quadratic) RT functions associated with supraspan MSSs. Among these theoretical accounts are different retrieving mechanisms for immediate memory and long-term memory (associated with supraspan sets), with the long-term memory search rate being faster than the immediate memory search rate (resulting in shallower slopes for larger MSSs than smaller MSSs), an item-based versus a class-based decision strategy with the class-based decision more frequently used for supraspan sets yielding shallower slopes for supraspan sets, a sequential binary decision algorithm used for finding a match between the probe and the memory-set item, and a familiarity-based versus a serial search mechanism for classifying the probe with the familiarity-based process more frequently used in the case of supraspan MSSs producing a shallower slope for the supraspan MSSs (Atkinson et al., 1974; Atkinson & Juola, 1973; Corballis et al., 1980; Juola et al., 1971; Okada & Burrows, 1978; Simpson, 1972; Swanson, 1974). Experiment 3 was conducted to determine whether repeated cycles of testing in a fixed-set procedure can produce a negatively accelerated (quadratic) RT function. If so, the quadratic function may have a simpler explanation (unequal repetition priming across MSSs) than the above proposed alternative memory processes.
Method
Subjects
Sixty-six introductory psychology students at UTPA participated in the experiment for extra course credit.
Design, materials, and procedure
The design, materials, and procedure were the same as those in Experiment 1, with the following exceptions. First, the fixed-set test for each MSS was repeated for four rounds. Second, half of the subjects received positive trials as the first trials for odd-numbered MSSs and negative trials as the first trials for even-numbered MSSs, and the other half of the subjects received the reversed arrangement of the first trials. Subjects were not told that half of the first trials were positive and half were negative across different MSSs. A round of test was composed of all the positive and negative probes presented in a random order except for the controlled first trials. The transition from one round of test to the next was not apparent to the subjects. Subjects could take a short break between completing the test for one MSS and learning the memory set for another MSS. There were a total of 440 test trials in the experiment.
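The reported total of 440 test trials is consistent with a design in which each test round presents one positive and one negative probe per memory-set item, for memory-set sizes 1 through 10. That reading is an assumption here (the set sizes and the equal positive/negative split come from Experiment 1 and are not restated in this section), so the following minimal sketch is only a consistency check on the arithmetic, not a description of the software actually used.

```python
# Consistency check on the trial count, assuming MSS ranged from 1 to 10 and
# each round used MSS positive probes plus MSS negative probes.
MSS_VALUES = range(1, 11)   # assumed set sizes
ROUNDS = 4                  # Experiment 3 repeated the test for four rounds


def trials_per_round(mss: int) -> int:
    """Trials in one test round: one positive and one negative probe per item."""
    return 2 * mss


def total_trials(rounds: int) -> int:
    """Total test trials across all assumed set sizes for a number of rounds."""
    return rounds * sum(trials_per_round(m) for m in MSS_VALUES)


print(total_trials(ROUNDS))  # 440
```

Under the same assumption, a round for the largest set contains 20 trials and a round for the smallest set only 2, which is the imbalance the later analyses turn on.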
Results and discussion
Outliers were defined the same way (a cell in this experiment was defined as response type crossed with test round with MSS). The RT outliers made up 3.88% of the data. The percentage of error responses was 4.23%. Outliers and error responses were excluded from analysis. The mean number of learning cycles taken by subjects was 1.36, and the number of learning cycles was positively correlated with MSS, r = .35. The mean RTs of the first 10 trials of the first test round (collapsed over the positive and negative responses and the MSSs) are presented in Figure 1a. An ANOVA was conducted for the first four of the 10 trials since a visual inspection suggested a downward trend in RT from Trial 1 to Trial 4. The results showed that the effect of trial serial position (mean RT of Trial 1 = 1016 ms; that of Trial 2 = 780 ms; that of Trial 3 = 756 ms; that of Trial 4 = 743 ms) was significant, F(3, 195) = 177.14, MSE = 6,191, p < .0001. A Newman–Keuls test revealed that Trial 1 mean was significantly longer than all the other means, with Trial 2 mean significantly longer than Trial 4 mean, and with Trial 3 mean not significantly different from either Trial 2 or Trial 4 mean. Thus, the TSC in this experiment affected responses over more trials than in Experiment 1 and took three trials rather than one trial to recover from. A regression analysis for the RT means of Trial 4 through Trial 10 did not show a linear trend, t < 1.
The mean RTs of the positive first trials of the first test round and the mean RTs of the positive trials of each of the four test rounds are presented in Figure 3a as a function of trial set and MSS.
Figure 3. (a) Mean reaction time (RT) of positive trials as a function of trial set and memory-set size (MSS) of Experiment 3. (b) Mean RT of negative trials as a function of trial set and MSS of Experiment 3.
Those of the negative trials are presented in Figure 3b.
An ANOVA comparing the mean RTs of the first trials collapsed over the positive and negative responses with the mean RTs of Test Round 1 trials also collapsed over the positive and negative responses showed that the mean RT of the first trials (1005 ms) was significantly longer than that of the first test round mean (790 ms), F(1, 65) = 201.55, MSE = 69,726, p < .0001, indicating the TSC. The MSS main effect was also significant, F(9, 580) = 1.95, MSE = 46,638, p = .043. Importantly, the MSS by trial set (first test trials versus first-round trials) interaction was significant, F(9, 490) = 4.38, MSE = 18,123, p < .0001, indicating that the reduction in RT from the first trials to the average of all the trials of the first test round was greater for the larger MSSs than for the smaller MSSs (due to an unequal contribution of the TSC across the MSSs), as a visual inspection would suggest. An overall ANOVA on the data of the four test rounds (the bottom four RT functions in Figures 3a and 3b) indicated that the mean RT of the negative responses (768 ms) was significantly longer than that of the positive responses (713 ms), F(1, 65) = 126.48, MSE = 30,672, p < .0001, and that response type by test round and response type by MSS interactions were both significant. Therefore, a separate ANOVA was conducted for the RT functions of the four test rounds of the positive and negative response data, respectively.
For the positive responses, an ANOVA on the four RT functions of the four test rounds (the four bottom RT functions in Figure 3a) indicated that the effect of test round was significant, F(3, 195) = 72.62, MSE = 12,508, p < .0001, with the four mean RTs being 769 ms, 703 ms, 694 ms, and 688 ms for Test Rounds 1, 2, 3, and 4, respectively. A Newman–Keuls test showed that Round 1 mean (769 ms) was significantly longer than the rest of the means, with Round 2 mean (703 ms) significantly longer than Round 4 mean (688 ms), and with Round 3 mean (694 ms) not significantly different from either Round 2 mean (703 ms) or Round 4 mean (688 ms). The MSS effect was significant, F(9, 583) = 24.29, MSE = 33,737, p < .0001. Importantly, the test round by MSS interaction was also significant, F(27, 1728) = 12.50, MSE = 9,858, p < .0001, indicating that the reduction in RT from the first round of test to the subsequent rounds of tests was greater for the smaller MSSs than for the larger MSSs. This pattern of interaction was exactly the opposite of that observed in the ANOVA comparing the first trials with the first round of trials.
For the negative response data, the test round effect was significant, F(3, 195) = 42.72, MSE = 18,847, p < .0001, as was the MSS effect, F(9, 584) = 24.47, MSE = 49,445, p < .0001. The mean RTs of the four test rounds were 808 ms, 785 ms, 744 ms, and 733 ms for Test Rounds 1, 2, 3, and 4, respectively. A Newman–Keuls test indicated that Round 1 mean (808 ms) was significantly longer than the rest of the means, with Round 2 mean (785 ms) significantly longer than Round 3 mean (744 ms) and Round 4 mean (733 ms), and with the last two means not significantly different from each other. Again, the test round by MSS interaction was significant, F(27, 1737) = 11.73, MSE = 12,620, p < .0001.
Table 1. Test results on the linear and quadratic trends of RT/MSS functions of the positive and negative trials of Experiment 3
Note: MSS = memory-set size; RT = reaction time; lin = linear; quad = quadratic; the linear trend and slope were evaluated with a linear regression analysis, and the quadratic trend was evaluated with an orthogonal trend test (hence independent of the linear trend).
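The two trend measures named in the note can be illustrated on a vector of cell means. The sketch below is a simplified stand-in, not the analysis reported here: it fits an ordinary least-squares slope and computes a quadratic contrast score from orthonormal polynomial weights, but it works on means only and produces no F or t statistics. The function names and the two example RT functions are hypothetical, and the set sizes 1 to 10 are an assumption.

```python
import numpy as np


def orthogonal_contrasts(levels, degree=2):
    """Orthonormal polynomial contrast weights for ascending levels.

    Column 0 holds the linear weights and column 1 the quadratic weights,
    obtained by QR-decomposing a centred Vandermonde matrix.
    """
    x = np.asarray(levels, dtype=float)
    vander = np.vander(x - x.mean(), degree + 1, increasing=True)
    q, _ = np.linalg.qr(vander)
    weights = q[:, 1:]
    # QR leaves each column's sign arbitrary; fix it so the weight at the
    # largest level is positive (rising for linear, U-shaped for quadratic).
    return weights * np.sign(weights[-1, :])


def trend_summary(levels, mean_rt):
    """Return (linear slope in ms per item, quadratic contrast score)."""
    x = np.asarray(levels, dtype=float)
    y = np.asarray(mean_rt, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    quad_score = float(orthogonal_contrasts(x)[:, 1] @ y)
    return float(slope), quad_score


mss = np.arange(1, 11)                     # assumed set sizes
straight = 700 + 15 * mss                  # hypothetical linear RT function
bowed = 700 + 60 * np.log(mss)             # hypothetical negatively accelerated one
print(trend_summary(mss, straight))
print(trend_summary(mss, bowed))
```

Because the quadratic weights are orthogonal to both the constant and the linear weights, a perfectly straight function yields a quadratic score of essentially zero whatever its slope, while a negatively accelerated function yields a negative score under the sign convention chosen above.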
As shown in the table, except for the first round of test, the linear slopes of the RT functions were all significant or very close to significance. The linear slopes of the first-trial functions (13.74 ms for the positive responses and 10.89 ms for the negative responses) were by and large comparable to those in Experiment 1 (15 ms for the positive and 21 ms for the negative response of the first trials). The two important replications were that the mean memory-set RTs of the first test round showed no linearly increasing trend, and that the first-trial function showed a moderate but significant linear trend. An unexpected result was that the first-round function for the negative responses had a significant quadratic component. A close examination of the function in Figure 3b indicated that the bowing in the function originated from higher mean RTs at the beginning and ending than at the middle of the function. An unusual aspect of this RT/MSS function was the noticeable spike at MSS 1, which was apparently caused by an insufficient reduction of RT from the first trial to the average of the first-round trials (due to a very strong TSC effect on MSS 1). When the data of MSS 1 were removed, the quadratic trend became nonsignificant, and the linear trend remained nonsignificant. A noteworthy finding was that as the test round was repeated, the linear slope increased. An examination of Figures 3a and 3b suggested that the RTs associated with smaller MSSs appeared to continue to decrease with the repeated rounds of testing although the RTs of the larger MSSs apparently stabilized from the second test round onward (actually, the mean RTs for the very large MSSs stabilized at the first round of test). Another important finding was that repeated rounds of testing from the second round onward generated a significant quadratic trend (except for the second round of the negative trials). 
As can be seen in Figures 3a and 3b, the emergence of the bend in the RT functions derived from a continued reduction in RT with repeated rounds of testing for the smaller MSSs but not for the larger MSSs.
From the first trial to the completion of the first round of test, no test item was repeated. On the other hand, the continued test from one round to the next involved repeating the presentation of the same set of test items. Therefore, the speeding up of responses within the first round of test and the speeding up across the rounds of tests were derived from different underlying processes. The within-round speeding up was derived from a process of readaptation to the task—that is, from overcoming the TSC. The larger memory sets benefited more from this process than did the smaller sets because there were more trials of the same task in a larger set than in a small set to allow for a recovery from the task switching. The across-the-rounds speed-up involved repeating the same test items and could be attributed to repetition priming. The interval between two repetitions of the same test item, which can be measured by the number of the intervening other test items, was shorter for a small set than for a large set. Therefore, the small sets benefited more from cross-rounds repetition than did the large sets. In fact, as is evident in Figures 3a and 3b, MSSs above 6 showed little or no cross-rounds speeding up. It may be that when the second presentation of the same probe was separated from its first presentation by 6 or 7 other items, the previous memory activation had completely dissipated by the time the second presentation occurred. Thus, the within-round or task-adaptation-based speeding-up benefits from going through more other test items whereas the cross-rounds or repetition-priming-based speeding-up benefits from going through fewer other items.
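The opposite directions of the two confounds can be made concrete with a toy model. Everything in the sketch below is a labelled assumption rather than an estimate from the data: the uncontaminated function is linear, only the first trial of a round carries the task-switching cost, a round contains 2 × MSS trials, a repeated probe is separated from its previous presentation by 2 × MSS − 1 other trials on average, and the priming benefit decays exponentially with that lag. The parameter values are hypothetical and chosen only to show the direction of each effect.

```python
import math

MSS_VALUES = list(range(1, 11))  # assumed set sizes


def clean_rt(mss, base=700.0, slope=14.0):
    """Hypothetical uncontaminated linear RT function (ms)."""
    return base + slope * mss


def first_round_mean(mss, switch_cost=225.0):
    """Round mean when one switch-cost trial is averaged over 2 * MSS trials."""
    return clean_rt(mss) + switch_cost / (2 * mss)


def repeated_round_mean(mss, max_priming=80.0, decay=6.0):
    """Round mean when priming shrinks exponentially with repetition lag."""
    lag = 2 * mss - 1  # assumed mean number of intervening trials
    return clean_rt(mss) - max_priming * math.exp(-lag / decay)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


clean_slope = ols_slope(MSS_VALUES, [clean_rt(m) for m in MSS_VALUES])
first_slope = ols_slope(MSS_VALUES, [first_round_mean(m) for m in MSS_VALUES])
later_slope = ols_slope(MSS_VALUES, [repeated_round_mean(m) for m in MSS_VALUES])
print(first_slope < clean_slope < later_slope)  # True
```

In this sketch the diluted switch cost inflates the small-set means and so flattens the first-round slope, whereas lag-dependent priming lowers the small-set means and so steepens the slope in repeated rounds, which is the qualitative pattern described above.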
This experiment along with the first two experiments demonstrated that it is possible to obtain a linearly increasing RT/MSS function without a quadratic trend even with supraspan MSSs, or an increasing RT function with both trends, or even a flat function, depending on the specific testing and measuring procedures. In the literature, each of these three types of results has been obtained and reported (Corballis et al., 1980; Diener, 1988, 1990; Okada & Burrows, 1978; Simpson, 1972; Stadler & Logan, 1989; Swanson, 1974). This experiment identified at least some conditions under which each of these three types of RT functions could be obtained and may provide some clues as to why different types of RT functions were obtained in Sternberg tasks. Based on the findings, several conclusions can be drawn. First, the quadratic RT/MSS functions found in many previous studies with a fixed-set procedure and used as evidence of a problem with Sternberg's scan theory and as a basis for proposing other possible underlying memory mechanisms could have been the result of averaging over the repeated rounds of tests, which gives smaller sets disproportionally greater repetition priming than larger sets (causing a positively accelerated RT drop from larger sets to smaller sets). Thus, the bend in the RT function could possibly have been more simply explained by the result of unequal repetition priming effects accrued for different MSSs rather than by other more complicated memory processes. Second, supraspan MSSs are not a sufficient condition for producing a quadratic function when the roles of the two confounding variables, TSC and repetition priming effect, are removed (as when using a varied-set procedure or the first-trial measure). Third, the data showed that as testing was repeated, the slope became steeper, which appeared paradoxical and difficult to understand on the surface. 
For example, some research (Juola et al., 1971; Kristofferson, 1972; Schneider & Shiffrin, 1977) has indicated that as the memory retrieval practice is repeated, the retrieval rate becomes faster, and therefore the RT/MSS slope becomes shallower. Therefore, the pattern of results obtained in this experiment appeared to contradict this principle at first glance. However, when the unequal roles of repetition priming are identified, the results are no longer baffling. The repeated rounds of testing give the smaller sets stronger repetition priming than the larger sets, causing the RTs of the smaller sets to decrease more than those of the larger sets, which in turn bends the curve more for the smaller sets, causing the slope to increase with the repeated rounds of testing. In some studies using a fixed-set design, in order to equalize the total number of test trials across different memory sets, the smaller sets were repeatedly tested for more rounds than the larger sets (e.g., Corballis et al., 1980). This can actually bias the distribution of the repetition priming effect even further in favour of the smaller sets beyond the level obtained in this experiment.
Experiment 4
In Experiments 1, 2, and 3, for each subject, each MSS was associated with only one distinct category, and the assignment of words to the positive/negative sets was consistent within a particular category and MSS. Some researchers (Atkinson & Juola, 1974; Banks & Atkinson, 1974; Simpson, 1972; Wickens, Moody, & Dow, 1981; Wickens et al., 1985) have suggested that recognition or retrieval processes can be different for consistent-mapping and inconsistent-mapping designs. That is, when the items are consistently mapped to positive and negative sets, the RT/MSS function is more likely to be negatively accelerated or logarithmic (Briggs & Johnsen, 1973; Briggs & Swanson, 1970; Corballis et al., 1980; Kristofferson, 1972; Simpson, 1972), but when the items are inconsistently mapped to positive and negative sets, the RT function tends to be linear and its slope steeper due to greater interference among the items and hence a greater need for tagging individual items with specific contextual information (Corballis, 1979; Corballis et al., 1980; Wickens et al., 1981, 1985). Experiments 1, 2, and 3 used a consistent mapping but showed no quadratic trend for the first trials, which was inconsistent with the above idea. However, Experiment 3 showed that retesting the same items in multiple rounds led to bow-shaped RT functions. These findings seemed to suggest that the source of the bowing was the repeated testing rather than the consistent set mapping. Experiment 4 was conducted to test the hypothesis that the bowing in the RT function has nothing to do with the set mapping, but is the result of repeated rounds of testing.
In Experiment 4, for each subject, only one single category was used repeatedly for all the MSSs. Thus, words in the same category were randomly reassigned to positive and negative sets for the learning and testing of different MSSs. The positive items in one MSS condition could become the negative set items in another MSS condition and vice versa. Because of the repeated use of the words from the same category across all MSS conditions in an inconsistent mapping between stimuli and responses, this arrangement is expected to generate a greater amount of proactive interference (Wickens et al., 1981, 1985) in the form of either slower responses or higher error rates or both.
Method
Subjects
One hundred and ninety-one psychology undergraduate students at UTPA participated in this experiment for extra course credit.
Materials, design, and procedure
The design and procedure were the same as those of Experiment 3 with two exceptions. In Experiment 3, one category was used only once for one MSS, and therefore the mapping of the items to sets was consistent for each MSS. In this experiment, for a particular subject, a single category was repeatedly used for all MSSs, and, further, the mapping of the items to positive and negative sets was inconsistent. The second difference was that three rather than four rounds of tests were performed for each MSS. There were 10 lists of 20 category words each in this experiment. For a subject, a single list of category words was randomly selected from among the 10 lists for studying and testing throughout the experiment. The order in which the 10 MSSs were learned and tested was randomly determined for each subject. The tasks in a block (for a MSS) were composed of first learning a set of words, and then taking three rounds of recognition test with the transition from one round to the next unnoticeable to the subjects. For a subject, the memory-set words for each of the 10 MSSs were randomly selected from the same 20 category words with the negative-set words randomly selected from the remaining words.
Results and discussion
The RT outliers made up 3.22%, and errors made up 9.45% of the data (which still met the 90% minimum accuracy criterion). Errors consisted of predominantly false-alarm errors (compared with the first three experiments). Thus, as expected, the inconsistent assignment of words across the MSSs lowered the overall accuracy of the performance. The mean number of learning cycles was 1.35, and the number of learning cycles was positively correlated with MSS, r = .213.
The mean RTs of the first 10 trials of the first round of test collapsed over the positive and negative responses and collapsed over the MSSs are presented in Figure 1a (the top RT function in the figure). As was evident in the figure, there was a sharp decrease in RT from Trial 1 to Trial 2, but a much more moderate decrease from Trial 2 to Trial 4. The very long first-trial mean RT relative to those in the other experiments could be the result of the inconsistent mapping, which presumably would require more activation/inhibition reversals for the same words in memory. Moreover, there seemed to be a slight and gradual increase in RT from Trial 4 to Trial 10. An ANOVA on the data of the first four trials showed that the four RT means (mean RT of Trial 1 = 1590 ms; that of Trial 2 = 955 ms; that of Trial 3 = 929 ms; that of Trial 4 = 903 ms) were significantly different, F(3, 570) = 729.65, MSE = 28,720, p < .0001. A Newman–Keuls test showed that Trial 1 mean (1590 ms) was significantly longer than all the rest of the means, and that Trial 2 mean (955 ms) was significantly longer than Trial 4 mean (903 ms), with Trial 3 mean (929 ms) not significantly different from Trial 2 (955 ms) and Trial 4 means (903 ms). In addition, a regression analysis indicated that there was a significant linear increase (slope = 8.90 ms) in RT from Trial 4 to Trial 10, t(1335) = 2.84, SE = 3.13, p = .005. This increase can be interpreted as a task fatigue effect. However, compared with the TSC effect it should have had a relatively very minor impact on the MSS effect for the larger sets.
The mean RTs as a function of trial set and MSS are presented in Figures 4a and 4b for the positive and negative responses, respectively.
Figure 4. (a) Mean reaction time (RT) of positive trials as a function of trial set and memory-set size (MSS) of Experiment 4. (b) Mean RT of negative trials as a function of trial set and MSS of Experiment 4.
First an ANOVA was conducted to compare the first-trial functions with the first test round functions to confirm there were unequal RT reductions for small and large MSSs (as could be revealed by a trial set by MSS interaction). Again, in this ANOVA, the positive and negative trials were collapsed. The trial set effect (mean RT of first trials of first round = 1566 ms; mean RT of first round = 1011 ms) was significant, F(1, 190) = 918.18, MSE = 285,922, p < .0001, as was the MSS effect, F(9, 1697) = 10.02, MSE = 149,656, p < .0001. The trial set by MSS interaction was significant, F(9, 1343) = 22.66, MSE = 44,657, p < .0001, confirming the unequal RT reduction across MSSs from the first trials to the average of all trials of the first round.
An overall ANOVA on RT showed that the main effect of response type (mean RT of positive responses = 861 ms; that of negative responses = 1002 ms) was significant, F(1, 190) = 396.52, MSE = 142,394, p < .0001. Response type by test round, response type by MSS two-way, and response type by test round by MSS three-way interactions were all significant. Therefore, separate analyses were performed for the positive and negative response data. For the positive responses, the test round main effect (mean RT of Test Round 1 = 956 ms; that of Test Round 2 = 821 ms; that of Test Round 3 = 808 ms) was significant, F(2, 380) = 299.70, MSE = 42,373, p < .0001, with the mean of the first test round significantly longer than those of Test Rounds 2 and 3, and with the last two means not significantly different from each other. The main effect of MSS was significant, F(9, 1701) = 67.61, MSE = 69,840, p < .0001. The test round by MSS interaction was significant, F(18, 3368) = 26.72, MSE = 30,927, p < .0001, again, confirming a larger RT drop for the smaller sets than for the larger sets.
The ANOVA for the negative response data showed that the test round main effect (mean RT of Test Round 1 = 1061 ms; that of Test Round 2 = 985 ms; that of Test Round 3 = 962 ms) was significant, F(2, 380) = 69.83, MSE = 71,876, p < .0001, with the three means all significantly different from one another. The main effect of MSS was significant, F(9, 1702) = 74.04, MSE = 113,589, p < .0001. The test round by MSS interaction was also significant, F(18, 3336) = 37.03, MSE = 53,766, p < .0001.
Table 2. Test results on the linear and quadratic trends of RT/MSS functions of the positive and negative trials of Experiment 4
Note: MSS = memory-set size; RT = reaction time; lin = linear; quad = quadratic; the linear trend and slope were evaluated with a linear regression analysis, and the quadratic trend was evaluated with an orthogonal trend test (hence independent of the linear trend).
First, the linear slopes of the RT functions were indeed markedly increased from Experiment 3 to Experiment 4. Specifically, the linear slopes of the first-trial functions of 13.74 ms for the positive and 10.89 ms for the negative trials in Experiment 3 were increased to 41.40 ms and 26.98 ms in Experiment 4, that is, to roughly 3.0 and 2.5 times their Experiment 3 values. Also very noteworthy was that the linear slopes of the first-round trials of −1.35 ms for the positive and 2.11 ms for the negative trials in Experiment 3 (both nonsignificant) were increased to 6.55 ms and 9.12 ms, respectively, in Experiment 4 (both highly significant). These significant first test round slopes could be the result of the contributions of the much larger slopes of the first-trial functions in this experiment. Thus, the inconsistent mapping of items to sets indeed raised the RT per item value greatly, suggesting that the specific contextual information might indeed take more time to retrieve (Atkinson & Juola, 1973; Corballis, 1979; Wickens et al., 1981, 1985), consistent with the idea that retrieval of specific episodic details is slow and laborious (Yonelinas, 2002). However, the expectation that the quadratic trend would not be obtained was not borne out. Although the quadratic trend did not occur in the first-trial functions, and the Test Round 2 and 3 functions did appear less bent (especially those of the positive responses) than their counterparts in Experiment 3, the quadratic trend was reliably obtained in the second and third rounds of the test. Note that the significant quadratic trend of the first test round of the negative trials (see Table 2) did not show the typical early steeper slope followed by a progressively flattened slope. This quadratic trend might have derived from the downward bowing of the curve due to an unusually high MSS 1 RT mean. 
Indeed, when MSS 1 data were excluded, the quadratic trend completely vanished, F < 1, and only the significant linear trend remained, F(1, 1694) = 52.90, MSE = 102,380, p < .0001. The overall picture in the results seemed to suggest that the quadratic trend (or the logarithmic function) was primarily the product of the repeated rounds of testing rather than the product of an item/set mapping variable. As shown in Tables 1 and 2, the quadratic trend was consistently absent before the test was repeated, but consistently present once the test round was repeated. Thus, the repeated testing in a fixed-set experiment may be a very robust factor in generating a bowed (a deviation from linearity) RT/MSS function.
General Discussion
TSC has been an intensively researched area in the last couple of decades, and it has been repeatedly shown that the first couple of responses after a switch of task are executed more slowly than subsequent trials. Because a fixed-set procedure involves different numbers of test trials for different MSSs, the differential contribution of the TSC becomes a confound for the MSS effect. As demonstrated in three of the four experiments in this study, the confound could cause a null or very weak MSS effect for the first round of test, consistent with Stadler and Logan's (1989) report. When researchers are not aware of this source of confounding, the apparent null MSS effect can be attributed to some memory retrieval processes other than a serial search or scan. For instance, a familiarity-based or direct recognition process that is thought to involve no serial comparison process has been proposed on the basis of the finding of a flat RT function (see Monsell, 1978; Nickerson, 1972, for a review). The present study offers one piece of advice to researchers using a fixed-set procedure of the Sternberg paradigm: Examine the first-trial RT function before concluding that there is no search in memory recognition in the Sternberg paradigm (Stadler & Logan, 1989).
Although Sternberg (1966) used a varied-set procedure in his original memory scan study, the fixed-set procedure has been more widely used than the varied-set procedure in studies adopting the Sternberg paradigm in the last several decades, but few studies have indicated a concern about the potential of obtaining different results from the two different test procedures. A quick, limited informal survey of the literature turned up 14 studies that used the fixed-set procedure of the Sternberg paradigm (Burrows & Okada, 1975; Corballis, Murray, & Connolly, 1989; Eriksen et al., 1986; Ferraro & Balota, 1999; Ilan & Miller, 1998; Juola et al., 1974; Miller, 1998; Oberauer, 2001; Okada & Burrows, 1978; Simpson, 1972; Theios et al., 1973; Treisman & Doctor, 1987, among others). The high frequency of using the fixed-set procedure is understandable since the varied-set procedure is far more costly in terms of time and labour required for collecting the same amount of data. With a fixed-set procedure, many data points can be collected from a single learned memory set, but with a varied-set procedure, only one data point can be collected from a learned memory set. However, as made clear in this study, this data collection economy is obtained at a cost; that is, the picture of the MSS function can be changed by the unequal contributions from TSC and repetition priming. The two confounds work in opposite directions: The TSC favours larger sets, whereas the repetition priming favours the smaller sets. However, because the TSC can sometimes impact the first three or even four postswitch trials, the RT reduction for MSS 1 from the first test round onward could have derived from the combined contributions of recovery from TSC and repetition priming.
Studies using a fixed-set design often obtained RT functions that deviated from linearity, which was often used as evidence of a problem with the serial scan theory and as a basis for proposing alternative memory retrieval mechanisms. The most common deviation from linearity was a bilinear or logarithmic function when supraspan memory sets and a fixed-set procedure were jointly used. In these cases, various theories were proposed to explain the bilinear or logarithmic feature of the RT/MSS functions where the smaller MSS section displayed a steeper slope and the larger MSS section a shallower slope. For example, one theory suggests that there are two different retrieval processes, one for retrieval from short-term memory and one from long-term memory, with the steeper portion of function reflecting a retrieval from the short-term and the shallower portion of the function a retrieval from the long-term memory, and that the slope difference between the smaller sets and the larger sets results from a faster retrieval rate from long-term memory than from the short-term memory (Atkinson & Juola, 1973; Juola et al., 1971; Okada & Burrows, 1978). Another theory postulates that the bilinearity of the RT functions derives from a mixture of two processes in variable proportions, a list-length-dependent serial search process and a list-length-independent, familiarity-based recognition process. According to this idea, the steeper part of the function reflects a greater contribution of serial search than the shallower part, which reflects a predominant contribution of familiarity-based recognition (Atkinson et al., 1974; Atkinson & Juola, 1973; Banks & Atkinson, 1974; Burrows & Okada, 1975; Juola et al., 1971; Juola et al., 1974; Swanson, 1974). 
Still another idea proposed to explain the bilinear or logarithmic functions is that a series of binary matching operations underlies the decision process, with each operation eliminating half of the nontarget items (i.e., reducing the total amount of information by one bit), and therefore the total number of decisions (and hence the amount of time) required for a response is a logarithmic function of the number of memory items (Burrows & Okada, 1975; Simpson, 1972). A fourth idea suggests that there are an individual-item-based and a class-dimension-based recognition process and that the steep part of the function derives from the item-based process and the shallow part from the class-based process (which uses semantic features in long-term memory to organize items into classes and hence generates a relatively list-length-independent decision time; Okada & Burrows, 1978; Swanson, 1974). As can be seen, a basic concept common to the above ideas for explaining the quadratic trend in the RT functions is that memory items are processed at a faster rate in larger sets than in smaller sets. In perception, the exact opposite is the case. It has been known that the rate of perceptually apprehending a small number of items (1 to 4) is so fast that the RT/set-size function is almost flat, a phenomenon called subitizing, which is thought to be a form of parallel information processing. But as the number of stimuli exceeds this range, a slow, serial process called counting takes over, generating a much steeper slope in the RT/set-size functions (Logie & Baddeley, 1987; Mandler & Shebo, 1982; Trick & Pylyshyn, 1994). If there is a functional isomorphism between memory and perceptual processes (Shepard & Chipman, 1970; Shepard & Metzler, 1971), then the idea of a faster processing rate for larger than for smaller memory sets is a little counterintuitive.
When an overall average of RTs collapsed over many repeated rounds of tests is plotted against MSS, the RT curve shows a pattern of a slower processing rate for smaller sets and a faster rate for larger sets (i.e., the averaged negatively accelerated RT curve appears as though the process speeds up as the set size increases). However, when the functions were examined analytically by the first trials and by individual rounds, it became apparent that the quadratic trend derived from a speeding up of responses for small sets in the repeated rounds of tests, not from a speeding up of the processing rate for larger MSSs as the above theories have conceived. As the four experiments consistently showed (at least under the conditions and with the materials employed in this study), there was no deviation from linearity in RT functions in the varied-set experiment and in the first trials of the fixed-set experiments. The deviation from linearity derived from repeated rounds of testing. To my knowledge, no author who used a fixed-set design examined the RT function from the first trials.
The RTs of the supraspan sets almost reached asymptote at the first test round, and more test rounds had no effect on their RTs. The smaller the sets, the more additional reductions in RT occurred with additional rounds of testing. And this phenomenon was shown to be reliable and robust enough to be replicated with an inconsistent mapping design, which, according to some researchers, requires a predominantly serial memory search for the probe recognition and should generate linearity in the RT functions. Therefore, this finding further reinforces the viewpoint that the quadratic function is the product of a confounding repetition priming effect exerted on the RTs of different MSSs during the test, rather than the product of some pretest memory processes. As demonstrated consistently across the four experiments in this study, when the RT measure was taken from the pristine first trials before the subsequent trials had a chance to influence the measures through averaging, the RT functions were all clearly linear, with no indication of a quadratic component, even when supraspan memory lists were used.
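The averaging argument above can be illustrated with a toy calculation. The sketch below is not a model fitted to the present data: the intercept, slope, number of rounds, and the assumption that the priming saving is inversely proportional to MSS are all hypothetical values chosen only to show the direction of the effect. It starts from a strictly linear first-trial function and shows that averaging over rounds yields a function that is steeper for small sets than for large ones.

```python
# Toy illustration only: every numeric value here is hypothetical,
# not an estimate from the experiments reported in this article.
SET_SIZES = [1, 2, 4, 8, 16]
INTERCEPT_MS = 500.0     # assumed first-trial intercept
SLOPE_MS = 30.0          # assumed first-trial slope per memory item
N_ROUNDS = 6             # assumed number of test rounds (round 0 = first trials)
MAX_PRIMING_MS = 120.0   # assumed asymptotic RT saving for MSS 1


def first_trial_rt(mss):
    """Linear first-trial RT, before any repetition priming."""
    return INTERCEPT_MS + SLOPE_MS * mss


def rt_in_round(mss, round_index):
    """RT in a given round; the saving grows over rounds and shrinks with MSS."""
    saving = (MAX_PRIMING_MS / mss) * (1 - 0.5 ** round_index)
    return first_trial_rt(mss) - saving


def averaged_rt(mss):
    """RT averaged over all rounds, as in a conventional fixed-set analysis."""
    return sum(rt_in_round(mss, r) for r in range(N_ROUNDS)) / N_ROUNDS


def local_slopes(rts):
    """Slope (ms per item) between each pair of adjacent set sizes."""
    return [
        (rts[i + 1] - rts[i]) / (SET_SIZES[i + 1] - SET_SIZES[i])
        for i in range(len(rts) - 1)
    ]


first_slopes = local_slopes([first_trial_rt(m) for m in SET_SIZES])
avg_slopes = local_slopes([averaged_rt(m) for m in SET_SIZES])

print("first-trial slopes:", [round(s, 1) for s in first_slopes])
print("averaged slopes:   ", [round(s, 1) for s in avg_slopes])
```

Under these assumptions the first-trial slopes are constant, whereas the averaged slopes decrease as MSS increases, which mimics a bilinear or logarithmic function even though no process speeds up for larger sets.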
The bottom line from the present study for researchers using the Sternberg paradigm is to avoid using the fixed-set procedure as much as possible. The varied-set procedure is a much cleaner procedure than the fixed-set procedure, and its use is crucial when the central issue of a study concerns the specific nature of retrieving information from memory. If a fixed-set procedure is used for the sake of data collection economy, and a flat RT function or a bilinear RT function is obtained, the results must be examined analytically first (examining the data systematically in subsets) to rule out the confounding from TSC and repetition priming before drawing a conclusion.
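One way such a subset analysis might be set up is sketched below. The record layout (MSS, trial position within the test sequence, RT in ms), the function names, and the RT values are all hypothetical and serve only to show the comparison between first trials and the conventional all-trials average.

```python
from collections import defaultdict

# Hypothetical trial-level records: (mss, trial_position, rt_ms).
# The RT values are made up for illustration.
trials = [
    (1, 1, 530), (2, 1, 560), (4, 1, 620), (8, 1, 740),
    (1, 2, 470), (2, 2, 530), (4, 2, 610), (8, 2, 738),
    (1, 3, 440), (2, 3, 515), (4, 3, 605), (8, 3, 737),
]


def mean_rt_by_set_size(records, keep):
    """Mean RT per MSS over the records whose trial position satisfies `keep`."""
    sums = defaultdict(lambda: [0.0, 0])
    for mss, position, rt in records:
        if keep(position):
            sums[mss][0] += rt
            sums[mss][1] += 1
    return {mss: total / count for mss, (total, count) in sorted(sums.items())}


def linear_fit(means):
    """Least-squares line through (MSS, mean RT); returns slope, intercept, RSS."""
    xs = list(means)
    ys = [means[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


# First trials only versus the average over all trials.
first_slope, first_intercept, first_rss = linear_fit(
    mean_rt_by_set_size(trials, lambda pos: pos == 1)
)
all_slope, all_intercept, all_rss = linear_fit(
    mean_rt_by_set_size(trials, lambda pos: True)
)

print("first trials: slope", round(first_slope, 2), "RSS", round(first_rss, 2))
print("all trials:   slope", round(all_slope, 2), "RSS", round(all_rss, 2))
```

With these made-up values, a straight line fits the first-trial means exactly but leaves systematic residuals in the all-trials means. A formal test of the quadratic component would replace the residual comparison in a real analysis.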
Although this article focused on a methodological issue, it has important theoretical bearings. As noted earlier in the article, many proposed alternative memory retrieval processes were based on obtained RT functions that deviated from a pattern of linear increment. As shown in this study, those data could have contained TSC and repetition priming effects, both of which are extraneous to the memory processes of interest in those earlier studies. Therefore, one important theoretical implication of the present study is that the conclusions reached in some previous studies, namely that there is no search or scan or that there are two different retrieval modes, may be worth reexamining under a condition that is completely free of the two contaminating effects. In addition, for researchers who use the Sternberg paradigm as a tool, being aware of the potential contamination from these two extraneous factors is also helpful.
Finally, I conducted a second, larger scaled survey 2
Footnotes
2 I thank one of the anonymous reviewers for suggesting this idea.
Acknowledgements
I thank the anonymous reviewers for their helpful comments and Jim Neely for answering a question regarding repetition priming.
Appendix A
A survey was conducted on 32 experiments (in 30 articles) that I read during the course of conducting this research, with the goal of revealing any potential relationship between the test procedure used (varied set versus fixed set) and the shape of the RT/MSS function. Several caveats on interpreting the results are in order before the results are presented. First, the survey was not exhaustive. It is certain that many studies in the literature using the Sternberg paradigm were not included in the survey. Second, some studies in this survey did not report explicitly whether the test procedure was varied or fixed set. In those cases, I made the best possible judgements on varied versus fixed set based on the information given in the procedure sections of the articles. Third, in some cases, the authors did not unequivocally state whether or not the RT/MSS functions were linearly increasing. Again, I made the best judgements on the shape of the functions based on either the reported results or a visual inspection of the curves in the figures. Where the classifications of the test procedures and RT functions were based on my judgements rather than on explicit reports by the authors, I marked the classifications with a footnote in Table A1. In addition, I did not include the specific experiments in a multiexperiment study in which a manipulated variable produced a decisive effect on the shape of the RT/MSS function but had no bearing on the issue in question. Finally, the shape of the RT functions may also be influenced by factors other than the varied versus fixed-set test procedure. For example, according to Briggs (1974), digits possess features conducive to obtaining a linearly increasing RT/MSS function. 
Because in the set of the surveyed studies, other variables were unavoidably confounded with the test procedure, the conclusion of the survey was based on an association between the shape of the RT function and the test procedure, rather than on a cause–effect relationship. A 2 (varied set versus fixed set) × 2 (linear increase of RT versus deviation from a linear increase) contingency table showing the frequencies of the surveyed studies that could be assigned to the four cells is presented in Table A2.
As can be seen, the majority of the studies using the varied-set test procedure yielded a linearly increasing RT/MSS function, whereas the majority of the studies using the fixed-set test procedure produced a function that deviated from a linear increase, with most of these deviations being bilinear or logarithmic and a small number being nearly flat. A chi-square test for association showed a significant association between the shape of the RT function and the test procedure, χ²(1, N = 32) = 18.70, p < .0001. The authors (and experiments), the materials used, the maximum MSS employed, and the shape of the RT functions observed in the surveyed studies are presented in Table A1.
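For readers who wish to check the reported statistic, the sketch below computes a Pearson chi-square from the four cell frequencies given in Table A2. It assumes no continuity correction, which is the variant that reproduces the reported value of 18.70; the function and variable names are illustrative only.

```python
import math

# Cell frequencies as reported in Table A2.
observed = {
    "varied set": {"linear": 18, "deviation": 0},
    "fixed set": {"linear": 4, "deviation": 10},
}


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for a two-way frequency table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_totals.values())
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2, n


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    # With 1 df, chi-square is a squared standard normal deviate,
    # so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


chi2, n = pearson_chi_square(observed)
p = p_value_df1(chi2)
print(f"chi2(1, N = {n}) = {chi2:.2f}, p < .0001: {p < 0.0001}")
```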
Table A1. The authors, materials, maximum MSS, and fixed- versus varied-set test procedures of the studies included in the survey

| Authors/Experiment | Materials | Max MSS | Test procedure | RT/MSS function linear increase |
| --- | --- | --- | --- | --- |
| Baddeley and Ecob (1973) | Digits | 6 | Fixed set | No |
| Banks and Atkinson (1974) | Words | 6 | Varied set | Yes |
| Biederman and Stacy (1974) | Digits | 4 | Varied set | Yes |
| Burle and Bonnet (2000) | Digits | 4 | Varied set | Yes |
| Burrows and Okada (1975) | Words | 20 | Fixed set | No |
| Clifton (1973) | People's names | 4 | Fixed set | No (MSS 1 deviantly fast) |
| Clifton and Birenbaum (1970) | Digits | 7 | Varied set | Yes |
| Corballis et al. (1980) | Words | 16 | Fixed set | No |
| Corballis et al. (1972) | Letters | 6 | Varied set | Yes |
| Darley, Klatzky, and Atkinson (1972) | Letters | 5 | Varied set | Yes |
| Eriksen et al. (1986) | Letters | 6 | Fixed set | Yes |
| Ferraro and Balota (1999) | Digits | 4 | Varied set | Yes^a |
| Forrin and Morin (1969) | Letters | 3 | Varied set^a | Yes |
| Juola et al. (1971) | Words | 26 | Varied set^a | Yes |
| | | | Fixed set | No |
| Juola et al. (1974) | Words & pictures | 48 | Fixed set | No |
| Klatzky, Juola, and Atkinson (1971) | Letters & pictures | 5 | Varied set | Yes |
| McElree and Dosher (1989) | Words | 6 | Varied set | Yes |
| Monsell (1978) | Letters | 5 | Varied set | Yes |
| Okada and Burrows (1978) | Words | 20 | Fixed set | No |
| Roeber and Kaernbach (2004) | Words | 15 | Fixed set | No |
| Simpson (1972) | Letters | 4 | Fixed set | No |
| Swanson (1974) | Random forms | 4 | Fixed set | No |
| Sternberg (1966)/Exp 1 | Digits | 6 | Varied set | Yes |
| Sternberg (1966)/Exp 2 | Digits | 6 | Fixed set | Yes |
| Sternberg (1967) | Digits | 4 | Varied set | Yes |
| Sternberg (1969a)/Exp 1 | Digits | 6 | Varied set | Yes |
| Sternberg (1969a)/Exp 2 | Digits | 6 | Fixed set | Yes |
| Theios et al. (1973) | Digits | 5 | Varied set^a | Yes |
| Theios and Walter (1974) | Digits | 5 | Varied set^a | Yes |
| Treisman and Doctor (1987) | Letters | 6 | Fixed set | No |
| Williams et al. (1990) | Digits & symbols | 6 | Fixed set | Yes^a |
| Wingfield and Branca (1970)/Exp 2 | Letters | 12 | Varied set | Yes |

Note: MSS = memory-set size; RT = reaction time; max = maximum; exp = experiment.
^a Classifications of test procedures and RT functions based on current author's judgements rather than on explicit reports by the author of cited study.

Table A2. Frequencies of classifications of the surveyed studies by the test procedure and the shape of the RT function

| Test procedure | RT/MSS function: linear increase | RT/MSS function: deviation from linear increase |
| --- | --- | --- |
| Varied set | 18 | 0 |
| Fixed set | 4 | 10 |

Note: MSS = memory-set size; RT = reaction time.
