Abstract
Traditionally, working memory is held to comprise separate subcomponents dedicated to the temporary storage of visuospatial and verbal information. More recently, the addition of an episodic buffer has been proposed, in which information from multiple memory systems is integrated. We report an experiment designed to investigate the effects of providing additional visuospatial information in a verbal working memory task. When to-be-remembered digits were arranged in a horizontal line, performance was no better than when digits were presented in a single location. However, when digits were presented in a keypad array, performance was significantly better. It is argued that this pattern is hard to reconcile with the traditional model of working memory, and that the “spatial bootstrapping” effect provides evidence for models of working memory that incorporate an episodic buffer.
The tripartite model of working memory (WM: Baddeley & Hitch, 1974) has been highly influential in the understanding of temporary memory in humans. This model is characterized by its emphasis on the interaction of a central executive component with two modality-specific slave systems. Whilst the central executive is held to be able to manipulate information in a range of ways, the slave systems are thought to be more passive. The two separate slave systems support verbal WM and visuospatial WM, respectively. Both have been the focus of substantial research efforts. It is argued that the phonological loop, which can store verbal material, comprises separate storage and rehearsal mechanisms, with the latter being used to maintain the contents of the former (Baddeley, 2000; but see Jones, Macken, & Nicholls, 2004). It has also been suggested that visuospatial WM can be separated into a storage and a processing component (Logie, 1995, 2003).
Separation between verbal and visuospatial WM is one of the key features of the tripartite model and is supported by a range of evidence from neuroimaging (e.g., Smith, Jonides, & Koeppe, 1996), neuropsychology (for a review of relevant cases see Logie, 1995), and cognitive psychology (e.g., Quinn & McConnell, 1996). However, experimental results suggest that such separation is not absolute: Baddeley, Lewis, and Vallar (1984) observed that when concurrent articulation was employed to prevent the use of the phonological loop, participants’ recall of visually presented digits was reduced, but not eliminated. Logie, Della Sala, Wynn, and Baddeley (2000) demonstrated visual similarity effects on a verbal serial recall task. The “episodic buffer” (Baddeley, 2000) is a proposed secondary memory store that was developed in part to help understand these patterns. It is proposed to be a “limited capacity temporary storage system that is capable of integrating information from a variety of sources” (Baddeley, 2000, p. 421) that is linked with the central executive via the mechanism of conscious awareness. It represents a space in which binding can occur between information held in separate systems, both in long-term memory (LTM) and in short-term memory (STM), and therefore has a major role in episodic memory.
Pearson (2006) argues that revising the traditional tripartite model may not be necessary in order to understand overlap and interaction between visual and verbal working memory tasks, and that results like those described above could occur if the visuospatial sketch pad assisted performance on verbal tasks, and vice versa. There is evidence that verbal working memory can support performance on visual working memory tasks (e.g., Pearson, Logie, & Gilhooly, 1999, but see Pearson, 2006, for a fuller review).
The current study is designed to focus on whether visuospatial memory can assist performance of a verbal working memory task. Participants were asked to carry out a visually presented verbal serial recall task. There were three different display conditions in which differing amounts of spatial information were presented when the to-be-remembered digits were shown: a single item condition, and linear and keypad multiple item display conditions (see Figure 1). The tripartite model of working memory would predict that spatial facilitation should occur in both the linear and the keypad conditions, because in both cases additional spatial information that could be held within visuospatial working memory is available to assist memory compared to the single item condition.

Figure 1. Diagrams of screen displays presenting the number 6. Clockwise from top left: static single item display, keypad display, linear display.
The revised model provides an additional route (the episodic buffer) by which visuospatial information from LTM could bind with information from verbal STM and thus boost performance on a verbal task. In daily life keypad arrays are a very common way of arranging digits, especially for motor output, such as entering a PIN or a telephone number. There is evidence (Fendrich, 1998) to suggest that motor encoding strategies based on familiar keypad arrangements are beneficial to long-term memory for digits. Consequently, under the revised model, digit memory is likely to be stronger in the keypad than in the linear condition.
There is substantial evidence for the existence of an implicit mental number line. This is demonstrated by the spatial numerical association of response codes (SNARC) effect (Dehaene, Bossini, & Giraux, 1993), whereby participants classify small numbers more quickly using the left hand and large numbers more quickly using the right hand. It is possible that the processes underlying the SNARC effect may form an alternative route by which verbal memory performance is facilitated when extra visuospatial information is available. In that case, however, the linear display would yield better verbal memory performance than either the keypad or the single item condition.
Our basic prediction was that additional visuospatial material that was not explicitly task relevant, presented during a verbal memory task, would facilitate verbal memory. In addition, we anticipated that specific patterns of such “bootstrapping” effects might favour an explanation based on the tripartite working memory model, the revised (episodic buffer) working memory model, or processes similar to those underlying the SNARC effect.
Method
Participants
A total of 60 undergraduate students at the Universities of Kent and Aberdeen took part in this experiment. Mean age was 20 years (range 18–33). A total of 9 participants were male, and 51 were female.
Apparatus
A standard Windows-compatible personal computer connected to a 17-inch monitor was used to present the stimuli in this experiment.
Design
The experiment was of mixed design. Display type was a between-subjects factor with three levels: static single item, linear, and keypad. List length was manipulated within subjects and had four levels: 5, 6, 7, and 8. Each participant was randomly allocated to one of the three levels of display type and then carried out 48 experimental trials. There were 12 trials at each list length, and trials were presented in a random order. The dependent variable was the proportion of trials in which all digits were recalled correctly in serial order.
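The trial structure and the dependent variable described above can be illustrated with a minimal Python sketch. The function names (`build_schedule`, `proportion_correct`) and the seeded generator are assumptions made for illustration; the software actually used to run the experiment is not described in this report.

```python
import random

LIST_LENGTHS = (5, 6, 7, 8)   # within-subjects factor
TRIALS_PER_LENGTH = 12        # 4 lengths x 12 trials = 48 trials


def build_schedule(rng):
    """Return the 48 list lengths for one participant in random order."""
    schedule = [n for n in LIST_LENGTHS for _ in range(TRIALS_PER_LENGTH)]
    rng.shuffle(schedule)
    return schedule


def proportion_correct(trials):
    """Score a set of trials given as (presented, recalled) digit sequences.

    A trial counts as correct only if every digit is recalled in its
    original serial position; partial credit is not given.
    """
    trials = list(trials)
    n_correct = sum(
        1 for presented, recalled in trials
        if list(presented) == list(recalled)
    )
    return n_correct / len(trials)


rng = random.Random(0)        # hypothetical seed, for reproducibility only
schedule = build_schedule(rng)
```

Under this all-or-nothing scoring, a single transposition within a list makes the whole trial incorrect.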
Materials and procedure
The to-be-remembered items were random sequences of the digits 0–9. No digit repeated itself in any single list. Each trial was initiated by the participant pressing a key on the keyboard. A message was then displayed for 1 s informing the participant how many items there would be to remember in the upcoming trial. This was followed by a fixation cross presented for 1 s in the centre of the screen, followed by a display in which the to-be-remembered numbers were shown. Participants were not given any suggestion as to what memory strategy to use. As soon as the presentation display cleared, participants performed verbal serial recall: They were allowed as much time as they wished to complete this. Having completed recall, they began the next trial.
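One simple way to generate lists that satisfy the no-repetition constraint is to sample without replacement from the ten digits. The sketch below is illustrative only: `make_digit_list` is a hypothetical helper, and the sampling method actually used is not stated in the report.

```python
import random


def make_digit_list(length, rng):
    """Return `length` distinct digits drawn at random from 0-9."""
    # random.sample draws without replacement, so no digit can repeat.
    return rng.sample(range(10), length)


rng = random.Random(1)  # hypothetical seed
example_lists = {n: make_digit_list(n, rng) for n in (5, 6, 7, 8)}
```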
Static single item display
Immediately following the disappearance of the fixation cross, an empty rectangular frame (horizontal side = 185 pixels, vertical side = 236 pixels) was presented centrally on the screen. After 500 ms the first digit of the to-be-remembered sequence was presented within a small square box with a green background (side = 39 pixels) in the middle of the screen. The digit was presented in the Arial typeface at a point size of 22.5. This digit was visible for 500 ms and was then cleared. There was a delay of 500 ms during which only the outer frame was visible, and no digit was presented, and then the next digit was presented in a green box in the same position as before. This pattern continued until the last digit was presented. The frame remained visible for 500 ms after the last digit disappeared and then disappeared itself. The disappearance of this frame served as the cue for participants to begin verbal recall.
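The timing just described can be laid out as a short worked example. The sketch below is a reconstruction from the durations given in the text, not the original presentation code; the constant and function names are assumptions.

```python
FRAME_LEAD_MS = 500   # empty frame shown before the first digit
DIGIT_ON_MS = 500     # each digit is visible for 500 ms
GAP_MS = 500          # frame-only interval between successive digits
FRAME_TAIL_MS = 500   # frame remains after the last digit disappears


def presentation_timeline(digits):
    """Return (onset_ms, offset_ms, digit) events and the recall-cue time.

    Times are measured from the appearance of the empty frame. The recall
    cue is the moment at which the frame disappears.
    """
    events = []
    t = FRAME_LEAD_MS
    for i, digit in enumerate(digits):
        events.append((t, t + DIGIT_ON_MS, digit))
        t += DIGIT_ON_MS
        if i < len(digits) - 1:
            t += GAP_MS
    return events, t + FRAME_TAIL_MS


events, recall_cue_ms = presentation_timeline([3, 9, 0, 6, 2])
```

On these figures, a list of n digits occupies 1000n + 500 ms from frame onset to recall cue, that is, 5.5 s for a five-digit list and 8.5 s for an eight-digit list.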
Linear display
Following the disappearance of the fixation cross, a rectangular frame (horizontal side = 557 pixels, vertical side = 113 pixels) was presented centrally on the screen. This frame contained 10 square boxes of side 39 pixels, which were evenly distributed from left to right and aligned with the horizontal midline of the display. The digits 0 to 9 were presented (in order from left to right) in each box in the same way as in the single item condition. After 500 ms the background of the box containing the first digit of the to-be-remembered sequence was illuminated by a green colour for 500 ms before reverting to white. Following a delay of 500 ms, the next digit was highlighted in the same way. This pattern continued until the last digit was presented. The frame and linear display of numbers remained visible for 500 ms after the last digit was illuminated and then disappeared itself. The disappearance of the visible display was the cue for participants to begin verbal recall.
Keypad display
The keypad display condition was the same as the linear display condition, except that the arrangement of the numbered boxes was different: The digits 1 to 9 were arranged in three rows of three (i.e., 123, 456, and 789) forming a 3 × 3 square matrix. The digit 0 appeared directly below the digit 8 at the bottom of the display. This pattern is the same arrangement as that of the standard telephone keypad. The only other difference was the dimensions of the bounding frame, which was adjusted to fit around the keypad display (horizontal side = 185 pixels, vertical side = 236 pixels).
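The two spatial arrangements can be expressed as mappings from each digit to a (row, column) grid cell. This is a sketch in abstract grid units only; the exact pixel spacing between boxes is not reported, so no pixel coordinates are assumed. The names `LINEAR_POS`, `KEYPAD_POS`, and `spatial_path` are hypothetical.

```python
# Linear display: digits 0-9 in one row, ordered left to right.
LINEAR_POS = {d: (0, d) for d in range(10)}

# Keypad display: 1-9 in three rows of three, with 0 directly below 8.
KEYPAD_POS = {d: ((d - 1) // 3, (d - 1) % 3) for d in range(1, 10)}
KEYPAD_POS[0] = (3, 1)


def spatial_path(digits, layout):
    """Return the sequence of grid cells highlighted for a digit list."""
    return [layout[d] for d in digits]


keypad_path = spatial_path([6, 0, 1], KEYPAD_POS)
linear_path = spatial_path([6, 0, 1], LINEAR_POS)
```

The same digit list therefore traces a different spatial path in each display, although the verbal content to be recalled is identical.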
Results and discussion
Figure 2 shows the profile of memory performance. Performance data were analysed using a 3 (display type) × 4 (list length) analysis of variance (ANOVA). This indicated the presence of a main effect of list length, F(3, 171) = 237.86, p < .0005, MSE = 0.02, partial η² = 0.81. The main effect of display type was also significant, F(2, 57) = 4.16, p = .02, MSE = 0.14, partial η² = 0.13. The interaction between list length and display type was not significant, F(6, 171) = 1.43, p = .21, MSE = 0.02, partial η² = 0.07.
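Partial η² can be recovered from an F ratio and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the two main effects as a consistency check. It assumes 2 numerator degrees of freedom for display type, as implied by a three-level between-subjects factor with 60 participants; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of list length: F(3, 171) = 237.86
eta_list_length = partial_eta_squared(237.86, 3, 171)

# Main effect of display type: F = 4.16 on (2, 57) df
# (numerator df of 2 is an assumption based on the three display levels)
eta_display_type = partial_eta_squared(4.16, 2, 57)
```

Rounded to two decimal places, these values are 0.81 and 0.13, which agree with the effect sizes reported for the two main effects.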

Figure 2. Mean proportion of trials answered correctly broken down by list length and display type. Error bars depict standard error of the mean.
The main effect of list length reflected the unremarkable fact that memory performance declined as the number of to-be-remembered items increased, from .88 (.14) at list length 5 to .24 (.22) at list length 8. Every increase in list length was associated with a corresponding significant decrease in memory performance.
The main effect of display type reflected the fact that there were differences in memory performance across the three conditions. Individual comparisons showed that the proportion of trials recalled correctly was significantly higher in the keypad condition than in the single item condition (p = .01) and in the keypad condition than in the linear condition (p = .02). There was no significant difference between the linear and the single item conditions (p = .84). This pattern represents clear evidence that the display pattern affected memory performance; in other words, it is evidence of a bootstrapping effect. Verbal memory performance was boosted if to-be-remembered items were displayed in a fashion (i.e., a keypad pattern) that enabled additional resources to be used to assist verbal memory systems.
In the single item condition, only a single number was visible on the screen at any one time, whilst in the linear and keypad conditions, all of the digits from 0 to 9 were visible during presentation. To an extent this may represent a confound, because additional irrelevant digits were visible in the two spatially distributed conditions. However, the layout of digits in the linear and keypad displays was the same across all trials, and it is hard to see how the irrelevant digits would boost performance. In fact, the additional information might be expected to increase attentional demands and thus decrease performance. Additionally, the observation that spatial information facilitated performance only in the keypad condition cannot be a consequence of the mere presentation of all of the digits from 0–9, because all of the digits were visible in both the keypad and linear conditions.
The data observed do not imply the use of a mental number line. If spatial bootstrapping processes based on the number line had been in operation, then performance should have been best in the linear display condition. Lindemann, Abolafia, Pratt, and Bekkering (2008) recently showed that SNARC-type number line effects were not a consequence of obligatory encoding of numbers in a number line format, arguing instead that SNARC effects were likely to arise from top-down strategic processes. Our data are certainly consistent with the idea that number line effects are not obligatory, and if strategic processes were adopted in this task, using a linear number line was not the strategy that was selected.
The pattern of results is also not easily reconciled with the tripartite model of working memory, which would predict a bootstrapping effect in both the linear and keypad conditions and is incompatible with the observation that no bootstrapping was found in the linear condition. Baddeley's (2000) revised working memory model adds the possibility that relevant long-term knowledge might facilitate encoding and/or retrieval when it is combined with the verbal memory trace in the episodic buffer. The current data are compatible with this revised model. Our interpretation is that the higher performance in the keypad condition was caused by integration of long-term knowledge about the very familiar telephone keypad. However, the precise processes involved are open to further investigation. One possibility is that information from long-term memory might improve encoding, perhaps by enabling more efficient chunking of the spatial keypad arrays.
One way that these results offer support to the revised model of working memory is that there was no evidence of visuospatial bootstrapping in the linear display condition. There are two possible alternative explanations for this pattern. One is that the visual angle between the extremities of the linear display is greater than the equivalent dimension in the keypad display, and hence central fixation would be a more effective strategy in the keypad than in the linear display. However, presentation duration for to-be-remembered digits was 500 ms, with an interdigit interval of 500 ms, so participants had the opportunity for multiple fixations. The other is that eye movements may have interfered with visual memory more in the linear than in the keypad condition: It is known (Pearson & Sahraie, 2003) that eye movements can impair visuospatial working memory, though it is not immediately clear whether more saccades would be expected in the linear or keypad condition. Both of these factors may have influenced performance, but it is unlikely that either could have completely eradicated the benefit of spatial information in the linear condition but left the keypad condition relatively unaffected.
One intriguing aspect of these data is that the LTM representation supporting bootstrapping may be a motoric one. Reisberg, Rappaport, and O'Shaughnessy (1984) reported that subjects were able to boost working memory performance by using a “finger loop”: They were taught a simple coding scheme to allow them to store information in a motor programme for finger activity. Our results are superficially somewhat similar to these data, but they were elicited without any attempt to train participants or to encourage use of any particular rehearsal strategy. This would suggest that the bootstrapping effect results from implicit processes that are active in everyday cognition, rather than being tied to explicit experimental demands.
The purpose of this paper is to make an initial report of this visuospatial bootstrapping effect, by which verbal WM, when under heavy memory load, can gain additional support from other cognitive systems. This pattern of results is most consistent with models of WM that incorporate some kind of mechanism for online binding of information, such as the episodic buffer (Baddeley, 2000), and is rather less compatible with traditional tripartite models.
