Abstract
The analysis of film discourse from a multimodal and cognitive perspective has shown in recent years that such an approach to the study of cinema is a very fruitful one. Among the various cinematic techniques that may be analysed as pieces of multimodal discourse, the flashback seems to be particularly appealing because, while being very rich and versatile, it is also a fixed device and common enough in film to be studied in a systematic way.
Given those characteristics – formal variety alongside stability – a relevant question would be: how do spectators make sense of film retrospections? To address this question, this paper suggests an examination of the multimodal cues offered by flashbacks in three different films – Ordinary People (1980), Big Fish (2003) and The Help (2011) – and analyses the cognitive processes that those cues activate and which make the comprehension of the flashback possible. What lies at the basis of the flashback scenes proposed is a joint-attention triangle formed by the viewer and the camera, who look together, first at the character in the present and then at the events taking place in the past. Ultimately, such scenes can only be understood in terms of blended joint attention, and they also reveal the importance of other cognitive processes at work, namely time compression, viewpoint integration, and identity and analogy connections.
1. Introduction
Living in a mass media society, we are constantly consuming audiovisual narratives, and we have become familiarized with them. Therefore, most people have no problem understanding the most common narrative devices used in film and TV shows, as is the case with flashbacks. This storytelling tool consists of ‘an image or a filmic segment that is understood as representing temporal occurrences anterior to those in the images that preceded it’ (Turim, 1989: 1). In other words, speaking in classical narrative terms, it is a retrospection or analepsis (see Genette, 1972: 82). Considering that this device has been employed in (western) literature since its origins (see De Jong, 2014: 135–137; Genette, 1972: 79–80), and that we also use it regularly in our daily conversations, it seems obvious that we have no difficulties understanding it. That is, unless the filmmaker’s goal is to create confusion in the viewer, and therefore he or she presents an unclear narrative on purpose; or in the case of poorly crafted stories, which are not properly constructed for the viewer to follow them successfully. Another fundamental reason for our ability to seamlessly understand cinematic discourse is that we perceive and experience film by means of the same processes and capacities with which we perceive the real world. It is not that our minds have developed special tools or abilities to understand films, but the other way round: through trial and error, film has evolved to adapt to our minds (this idea is a recurring one in the research tradition of cognitive film theory: see, for example, Anderson, 1996; Bordwell, 2010; Carroll, 1996; Messaris, 1994; Persson, 2003; Shimamura, 2013).
Nevertheless, although these comprehension processes may seem very obvious and simple, they are not so, and a fundamental question should be raised: why is it that a canonical movie flashback (i.e. one clearly signalled by verbal cues, gazes, dissolves and the like, and employed in films with a conventional narrative structure) makes perfect sense to our minds? What are the cognitive mechanisms behind it that allow viewers to understand it? Part of the process may be related to a learning component (a viewer familiarized with film flashbacks may find them easier to understand), but only to a small degree. Exposure to film narratives does have an influence on viewers’ comprehension of them, but learning does not lie at the basis of the process: the greatest weight falls upon the viewer’s natural cognitive and perceptual capacities (Messaris, 1994). In this sense, the flashback device would be indecipherable without human beings’ general capacity for conceptual blending. This ability relies on different mental capacities which human beings share with other mammals, such as attention, memory and perception, but these are not sufficient on their own to explain how we come up with new and complex ideas (Turner, 2014: 57). Ultimately, the viewer can deal with the complexity that he or she encounters in a given flashback because human beings are capable of conceptual integration. More specifically, this paper will show how certain movie flashbacks can be cognitively explained in terms of a particular integration process, that of blended joint attention.
In order to illustrate how those mechanisms work, and to come up with a suitable account of how some film retrospections are comprehended, three different flashback examples from three movies will be analysed: The Help (Taylor, 2011), Big Fish (Burton, 2003) and Ordinary People (Redford, 1980). The comparison of all three will also show that, although the flashback is a considerably fixed device, there is enough room left for creativity as long as a basis schema (a fundamental structure composed of some elemental constituents, as will be seen later) is established as its foundation and point of departure.
2. Visual cues and joint attention in flashbacks
Most flashback scenes offer some kind of cue to the viewer, so that he or she immediately understands that a leap to the past is taking place. The two examples analysed below employ specific images and editing which function as visual cues that, by shaping joint-attentional triangles, enable the spectator to comprehend the scene he or she is watching.
2.1. The Help (Taylor, 2011)
The Help (Taylor, 2011) tells the story of young journalist Skeeter Phelan and a group of African American maids in the 1960s, during the civil rights movement. By writing a book with the maids’ testimonies about the hardships in their daily work, Skeeter condemns publicly the way white families treat them, and helps them fight for their rights.
One of the flashbacks in the film is introduced right after the scene of Skeeter’s argument with her mother (00:23:00–00:25:50). She has just found out that, while she was away in college, her mother dismissed their lifelong maid, Constantine. Visibly upset, Skeeter leaves the house and crosses the backyard, and that is when the sight of an empty bench makes her remember a conversation she had with Constantine right there, a few years earlier. As Figure 1 shows, we first see Skeeter in a medium shot, from the side: she has stopped and she is looking somewhere off-screen (a). The following shot shows an empty bench in the garden (b), and then a third shot goes back to Skeeter again, although this time it is a front-view close-up (c). That is, first we see the protagonist herself, but afterwards we observe the bench with her, adopting her point of view. Finally we see her again but from a different optical perspective. Thus, the combination of shots a, b and c builds an ‘eyeline match’ structure, a classic film technique which contributes to the ‘continuity system’, and which consists of the combination of at least one shot of a character looking at a certain point off-screen and a shot of an object (or another character) towards which the first person looks. This schema is equivalent to the point-of-view (POV) shot as defined by Branigan (1984): a combination of shot A (point/glance), showing the character’s face and gaze direction, and shot B (point/object), which shows the object the character is looking at. Thus, in shot B the viewer adopts the character’s perspective (although he or she does not necessarily take the character’s precise optical vantage point).

Figure 1. Skeeter’s flashback.
The eyeline match, then, is the basis upon which the flashback is built, as the next shot shows that what Skeeter is looking at is already an image of the past, and that it is Skeeter herself who is sitting on the bench, though the scene took place some years earlier (d).
Furthermore, what lies beneath this conventional editing structure, and therefore beneath the construction of the flashback analysed, is a scene of classic joint attention: a situation in which two individuals sharing spatiotemporal coordinates attend simultaneously to a third object, and also communicate about it. That is, they ‘know together’ that they are focused on the same element (Carpenter and Call, 2013: 50) (see section 2.3). Oakley and Tobin (2012: 73) state that ‘all the standard devices of continuity editing are endowed with the logic of joint attention’, and this also seems to be the case of the eyeline match technique, for the viewer follows the character’s gaze, and both subjects end up taking notice of the same object. However, this interpretation poses a problem if we take a closer look at the flashback scene and at the concept of joint attention. It is true that both Skeeter and the viewer pay attention to the same element, but one of the main features of joint attention is not present: the character is not aware of that shared attention, and she does not intend to share anything with the viewer (Tobin, 2008: 24–28). Therefore, the attentional triangle in this case cannot be formed by the viewer, the character and the object. But, since it is clear that the flashback is based on an eyeline match structure, and that a joint-attention logic underlies this kind of construction, the necessary attentional triangle must consist of other components, namely the camera, the viewer and the object observed (Skeeter first, and then the bench). It is the camera eye and the viewer who pay attention to the same thing; the camera guides (and controls significantly) the viewer’s gaze, and thus, by paying attention to a certain object, they are both engaged in a goal-directed activity (and they are conscious of it) (Oakley and Tobin, 2012).
Even though the character (Skeeter) is not one of the subjects of the joint-attentional triangle, her gaze being directed somewhere off-screen also plays an important part in the construction of the flashback. Human beings naturally tend to follow one another’s gaze in order to see what somebody else is looking at (see Tomasello, 1999). Thus, this seemingly universal feature constitutes the perfect basis for the eyeline match/point-of-view structure. This is precisely what Carroll (1996: 128) is pointing at when he affirms that ‘we might think of point-of-view editing as an automatization, via editing, of our own natural perceptual reaction to track a glance to its target’. The viewer, always guided by the camera, follows the character’s gaze and reaches the object being looked at (which is in fact ‘given’ to him or her by the camera, although the viewer may feel that they are completing by themselves a totally natural process). But the film technique goes even further, for, as Carroll states, in point-of-view editing our gaze-following tendency goes beyond its information-gathering purpose and it is used as a communicative device (1996: 129). Although the character in question (Skeeter, in this case) does not take part in a joint-attentional triangle, her gaze is intentionally communicative because the camera (the filmmaker and their crew) has decided to use it for that aim, and it directs the viewer’s attention to an object of importance in the story (the bench), which is in fact the object that will introduce the flashback.
Another factor that underlies the use of an eyeline match in the construction of certain flashbacks could be the relation of the flashback’s content to the character’s memories. By employing an eyeline match schema right before the flashback, the filmmaker clearly shows that the particular object the character looks at is prompting her to recall the past. It is as if Skeeter’s eyes were directly related to memories: they once witnessed certain events, and now they work as ‘projectors’ of those facts (memories), which she keeps in her mind. This idea is somewhat connected to the spatial-temporal metaphor ‘past is in front of ego’ that Coëgnarts and Kravanja (2015: 237) prove to be underneath the construction of several film flashbacks. In short, their claim is that, in flashbacks like the one in The Help where an eyeline match is established, what the character sees corresponds to what she knows, and that reality is located in front of her. As we can only know the past, and not the future, what the character sees in front of her must correlate with the past (2015: 227–228). Thus, ‘character perception is mapped onto the conceptual domain of time’ (2015: 237).
2.2. Big Fish (Burton, 2003)
Looking now at one of the flashbacks from Big Fish (Burton, 2003), we can see that the mechanism used in it is very similar to that from The Help. The film narrates Ed Bloom’s life, which he tells as being full of adventures and uncommon events. But his son, Will, does not really believe all the stories his father has been telling him about his childhood and youth. When Ed gets seriously ill and is close to death, Will seizes his final opportunity to discover the truth about his father’s life.
While flying with his wife from Paris to his hometown in Alabama, Will is very quiet and pensive (00:08:20–00:08:55). From what we already know about the story, we can infer that he is rather concerned about his relationship with his father. He looks at the hand shadows that a child nearby is making, and this moves him to recall his own father projecting hand shadows for him, many years ago, while telling him bedtime stories. As Figure 2 shows, we first have a long shot with Will in the background and, in the foreground, the child he is looking at (a). There is an eyeline match within this shot already. We then see a medium shot of the kid making hand shadows, but the perspective is different: it is as if we were seeing him from Will’s position (b). The following shot is a close-up of Will, his gaze directed off-screen (c); the camera zooms in slightly. Shot d is a big close-up of the kid’s hands, also with a zoom-in, and shot e is another close-up of Will’s face and look (zooming in), thus closing the eyeline match structure. Then, the next shot following this last close-up of Will takes us directly to the past (f): although it also shows a shadow projected with hands, it is a different one and it belongs to a previous moment in the story.

Figure 2. Will’s flashback.
While this scene and the one from The Help share essentially the same structure, there are some meaningful variations in Big Fish that should be considered. First, the eyeline match structure is repeated: instead of using the basic schema of a ‘character shot’ plus an ‘object shot’ (as is the case with The Help), the scene starts with a long shot (a) which already displays an eyeline match, and which functions as the ‘character shot’ for the object in shot b. Then there is another eyeline match between shots c and d, and finally the last one between shots e and f; with this last shot the leap to the past is introduced.
The question is: why does Big Fish repeat the eyeline match structure, if a single pairing of the character and the object is enough to understand the sequence (as The Help shows)? What is it trying to reinforce? A possible explanation emerges if we also take into account that the camera keeps zooming in, both when it shows Will’s face and when it shows the hand shadows (shots c, d and e). Will’s close-ups in shots c and e are progressively narrowed down to his face, thus bringing the viewer closer to the character and his experience. On the other hand, the zoom upon the kid’s hands (d) should be interpreted as being shown from Will’s perspective. Obviously, Will is not getting physically closer to the kid, and he cannot zoom with his eyes, but what this effect (alongside the repetition of the eyeline match) is trying to represent is Will’s mental state: concentrated and focused on a very particular aspect of his surroundings, he is progressively ‘taken’ by his memories, which are prompted by that external object calling his attention. By employing both a repetition of the eyeline match and a zoom-in effect, the scene underlines the character’s experience and state of mind. The subjectivity of the scene is enhanced by using those particular poetic resources.
Also, as is the case with the flashback from The Help, a joint-attention structure lies behind this flashback scene, again with the camera, the viewer and the object observed (first Will, then the hand shadows) setting up the joint-attentional triangle. Will’s gaze directed off-screen is important as well, just as Skeeter’s is, for it guides the viewer’s attention towards the kid’s hands and, ultimately, towards an episode in the past.
2.3. Blended joint attention: the key to retrospect
Nonetheless, the construction and understanding of the flashback scenes in both films involve much more than just joint attention. The meaning of each retrospective scene is clear and obvious to the viewer, but the mental operations behind it are many and complex.
On the surface of the scenes from The Help and Big Fish, we see a leap to the past within the story, and a further shot-by-shot analysis of each sequence reveals the joint-attention schema as the pivotal structure for the flashback. Yet this is only the tip of the iceberg. The whole idea of retrospection does not come up solely by using a joint-attention structure. The leaps to Skeeter’s and to Will’s childhoods can only be understood as such when a variety of ideas are combined in our minds. There are several packets of meaning which must be brought together in order to produce the idea of retrospection, and just one of those ideas is that of jointly attending to something. In what follows, in order to explain the basic functioning of blended joint attention, the main concepts of blending theory will be outlined first. Secondly, it will be shown how that theoretical approach works in the three flashbacks analysed.
The mental process of combining different meanings in order to come up with a new one is also known as conceptual integration. It is a mental activity that human beings carry out unconsciously and is a fundamental part of almost everything they think, say and do. It is, blending theory argues, the way to explain how human beings think (Fauconnier and Turner, 2002). Blending theory proposes a hypothetical explanation of how human beings cognitively deal with a variety of phenomena. Many situations in everyday life call for the activation of several mental spaces, ‘small conceptual packets constructed as we think and talk, for purposes of local understanding and action’ (Fauconnier and Turner, 2002: 40). These packets of meaning are then projected into a single new mental space (a blended space) that integrates ideas from all the input mental spaces of meaning. Those projections are selective, meaning that not all the content from every mental space is transferred to the blend. Also, there are meanings that emerge in the resulting blended space that were not present in any of the input mental spaces. A convenient example is found in counterfactual statements of the ‘If I were…’ kind. Let us consider, for instance, the following sentence, said by a professor to one of their younger colleagues: ‘If I were you, I’d be working on finishing my book’. There are at least two main inputs, one for each of the professors, and they are projected into the blended space. In it, some selected features of the two persons are integrated: there is the situation of the younger professor, who needs to publish a book in order to attain a position, and there is also the attitude and experience of the senior professor (Grady et al., 1999: 119). 
Features of ‘I’ and features of ‘you’ are projected, and it is only inside the new blended space that the sentence can be spoken and understood, for only in it are the two viewpoints blended and do they result in a new perspective that combines both professors. Thus, only in the blend is the meaning of the counterfactual statement made available to our minds: it integrates all the relevant information needed to comprehend the statement, but it also presents it in a way simple enough for our minds to grasp its content.
It must be made clear that these kinds of mental operations are not extraordinary processes which our minds need to use on certain occasions that involve particularly complex meanings. In fact, our thinking is always complex, but since we only see the surface of it (its products and not its working mechanisms) we have the impression of simplicity. The truth is that we are constantly integrating different concepts and different mental spaces of meaning in order to come up with new ideas that shape new spaces. There are many examples of everyday activities which are only possible thanks to this capacity for conceptual integration, and some of those daily activities involve both joint attention and conceptual integration, just like the flashback scenes seen above. Take, for instance, the nowadays common activity of having a conversation through text messages. At least two people are talking to each other, paying attention to the same thing (the written messages shown on the cell phone screen, and their content) and communicating about certain matters. But, as obvious as this classical joint-attention scene may seem, its simplicity is only apparent. These two people are not speaking face-to-face, sharing time and space coordinates. They may even be miles apart, and one may read the other’s message several hours after it was written. Yet they are capable of engaging in an activity that requires jointly attending to an object, even if it is a non-physical one, like their topic of conversation. How does this work, cognitively speaking?
There is, to start with, the idea of a local classic scene of joint attention (taking place here and now), which serves as the ground for this other communicative situation that takes place beyond here and now (Thomas and Turner, 2011: 189–200). Classic joint attention is also distinguished by one of the participants drawing attention towards a near object, or pointing out something about it, and the result is the ‘common and interactive attention’ of all the participants (Thomas and Turner, 2011: 190). There is communication about that specific object of attention (see section 2.1), although the process may be nonverbal (see Tomasello and Farrar, 1986). It is because we have the experience of local and immediate joint attention that we can ‘project’ it to other experiences that are not local, not so familiar. In the same way, our experience of face-to-face conversation serves as the ground for the text-message conversation, which doesn’t occur face-to-face. What mental spaces of meaning are brought together, then, in this common texting activity? There are basic, human-scale packets of meaning: the idea of here and now, the idea of face-to-face conversation, the idea of classic joint attention, and so on. And there are also complex sets of meaning that, although easy for us to understand, surpass the here and now coordinates: two people in two different places and moments in time, two different points of view, the ability to acknowledge and adopt someone else’s viewpoint, and so on. All those mental spaces or packets of meaning are integrated, brought together, and the result is a new packet of meaning, a new mental space, which is anchored in a local scene of joint attention, but which goes beyond that. It is a space of blended joint attention, and only in it and from it does the text-message conversation make sense to us. A blended joint attention space is also the key to our understanding of the flashback scenes in The Help and Big Fish.
Going back to the flashback examples, at first glance it seems that they can be easily explained as simple joint-attention scenes. However, if we look closely at them we find that they don’t meet the conditions of direct joint attention. Classic joint attention takes place ‘here’ and ‘now’, whereas the components of the joint-attention triangles in the flashback scenes (the viewer, the camera and the character or object in the movie) are not present in the same place and time. That is why these film flashbacks should be analysed as cases of blended joint attention.
Oakley and Tobin (2012) identify four defined mental spaces that configure the conceptual integration network that is activated in the mind every time a movie is watched: the presentation space, which involves all those elements related to film production and to the way movies tell stories, including the camera’s gaze guiding the viewer (66); the reference space, which refers to the diegesis, the story world shown on the screen (66); and the virtual space, which issues from the integration of the presentation and the reference spaces (67). Those three spaces make up a conceptual integration network that emerges from and in turn affects the fourth space, the ground, defined as ‘the ontological status of the relevant actors in a film-viewing scenario’ (65). The joint-attention triangles that constitute the basis of the two flashback scenes are composed of elements from these four mental spaces: the viewer belongs to the ground; the camera guiding the viewer by using different techniques (close-ups, zooms, POV shots and so on) is part of the presentation space; and, finally, the characters and the objects towards which the viewer’s attention is directed belong to the reference space. The integration of the story told with the cinematic techniques (e.g. Skeeter stopping and remembering in front of the bench, and this being told using eyeline matches and POV editing) results in the virtual space.
All those spaces contain such an enormous amount of information that it is not possible to process all of it at once. Also, not all that information is necessary for us to understand the flashback scenes in terms of joint attention. Only a selection of relevant inputs from each space needs to be projected into a single, blended space, in which the meaning of the flashback emerges and is made accessible to our minds. Basically, the viewer’s attention and act of watching is projected from the ground; the camera’s gaze, directing the viewer, is added from the presentation space; the fictional world displayed on the screen (with the characters and their actions, emotions, etc.) is projected from the reference space; and finally, and most importantly, a classic joint-attention schema in its simplest, human-scale form is needed to understand this vast mental network. All these inputs are blended into a mental space of blended joint attention. Without this cognitive operation, it would be impossible to merge such diverse mental spaces and to make sense of them, and so the flashback scenes would be unintelligible. Blended joint attention operates in such a way as to create a human-scale joint-attention scene whose components don’t share time and space coordinates, and don’t even have the same ontological nature.
Some of the inputs projected to the blended mental space are products of a blend in themselves. That is the case of the viewer, a figure who brings together all the possible actual viewers a film can have. The common characteristics of all those spectators are projected to the blend (those features relevant in a film-viewing scenario), and as a result a general idea of a viewer emerges, one that can be in turn projected as an input in a new blend. In these terms, it is possible to speak of the viewer’s role in the joint-attention triangle without the need to test how a group of particular viewers watch a flashback.
The camera, as one of the components of the joint-attention triangle, is also the product of a blend. We think of it as a conscious agent that controls what the viewer sees, but actually such an intelligence only exists inside a blend. The idea of the camera as an independent subjectivity is the result of combining a variety of participants and their decisions in the movie-production field regarding camera operations. All those individuals’ decisions are blended into one single abstract agent that we call ‘the camera’ (Oakley and Tobin, 2012: 62). The camera as the product of a blend serves in turn as one of the inputs for the blended joint attention space.
Furthermore, as the viewer, the camera and the story world are projected into the blended space, three different perspectives are projected and merged in it as well. The viewer’s point of view is always forced to align itself with the camera’s perspective (otherwise, if he/she refuses to do so, he/she loses his/her role as spectator), and so these two input elements are integrated into one in the blended space. But this doesn’t mean that in the blend the viewer’s POV disappears, subsumed by the camera’s. Both perspectives are merged in the blend, but at the same time they are always distinguishable, and they need each other. The camera’s reason for being lies in guiding someone’s attention, while the viewer cannot be so without a camera showing him/her the way along the story.
An additional perspective, that of the character, is introduced in the POV shots when the eyeline match structure is closed with the image of an object. In the example from The Help, as Figure 1 shows, the first shot of every eyeline match shows Skeeter from an ‘external’ position (shots a and c), and then the camera and the viewer join her optical perspective in shots b and d. The same technique is used in Big Fish (Figure 2): first we see Will (shots a, c and e), and immediately after each one of those shots the camera and the viewer adopt Will’s point of view, when he looks at the hand shadows (shots b, d and f). These object shots bring together the three aforementioned perspectives: that of the camera, that of the viewer, and that of the character in question.
Moreover, in both The Help and Big Fish the retrospection scene combines two different viewpoints from the main character: the character’s perspective from the present moment, while recalling the past, and his or her past experience of the event being remembered. In The Help, the flashback shows Skeeter’s relationship with Constantine when she was a teenager, and, while Skeeter’s experience of those moments belongs to the past, at the same time the flashback never stops being linked to Skeeter’s perspective from the present. In the same way, the flashback in Big Fish shows Will’s memories of his good relationship with his dad when he was a kid, which we experience with him. But, simultaneously, we are somehow linked to Will’s point of view from the present throughout the flashback, while he looks back at the events in the past.
Again, in order to make this combination of points of view available to our understanding, they need to be projected into the blend and compressed to a human-scale size. There is no other way to make sense of four perspectives that belong to different mental spaces, and that are not rooted in the same time and space coordinates. They are projected alongside the idea of classic joint attention, and also with the idea of putting oneself in someone else’s shoes (which is also the result of a blend, because we need to project the idea of ourselves, of our own mindedness, and the idea of someone else at the same time, and integrate both) (Turner, 2014: 93).
Finally, there is one last idea closely connected to that of viewpoint compression, and particularly to that of having two different points of view from the same character at once. It is the question of the past: where does it come from?
In The Help and Big Fish, the main character’s attention is suddenly absorbed by a specific object in their surroundings. This object is, eventually, the one that serves as a transition to the flashback scene, that is, to the narration of some past events in the story. But then, how does the viewer know that what he/she is watching is, in fact, a flashback? The introduction of a retrospection within the narrative requires a new narrative space to be set up, and that space needs to be activated by a ‘space builder’ (see Fauconnier, 1994, 1997). That is the function of the bench and the hand shadows in The Help and Big Fish, respectively: each of these objects appears first in the narrative space of the present, and then they reappear in a different narrative space, the past space, thus activating it and linking it to the previous one by means of an analogy connection (see Dancygier, 2008, 2012).
This new space is further developed and establishes identity connections between the characters in it and the ones in the narrative present: Skeeter, a young adult in the present, is now seen as a teenager; and Will, already an adult in the present, is just a kid in the past. There is also a relation of change that links these two narrative spaces, for the characters in them have changed as they got older (‘identity’ and ‘change’ are two of the ‘vital relations’ that, according to Fauconnier and Turner (2002: 91–96), conceptually link elements from different input spaces). In addition to this, a process of time compression is necessary to understand the flashback as such. The retrospection scenes consist of a narrative leap from the narrative present to a particular previous moment in the story (i.e. ‘the past’). But neither the character nor the viewer has to pass through all the events and the time elapsed between the present and the past moments in the story (Fauconnier and Turner, 2002: 317). The flashback transition (that is, the space builder, which sets up the narrative past space and gives us access to it) takes us directly back to a particular moment in the story. However, that new narrative space doesn’t come out of nowhere: it only exists in a blended space, and it only emerges as a result of integrating in the blend this new narrative space with the present narrative space and, most importantly, the relation between them, for the past can only be past in relation to the present. For this purpose, the story as a whole needs to be projected as well.
Finally, returning to the blended joint attention schema discussed above, one could ask what the role of the past space in the network is. Considering that the character doesn’t take part in the activity of jointly attending, it can be said that it is again the viewer and the camera who, together, pay attention to the events in the past. The character, however, is added to the triangle by means of parallel attention (see Carpenter and Call, 2013), and so the past becomes the object of attention, either joint or parallel, of the camera, the viewer and the main character.
The analysis of the two scenes from The Help and Big Fish shows that, even though it is easy for us as spectators to understand a basic movie flashback, the device itself is not a simple one in cognitive terms: there are many different mental spaces which need to be activated and blended in our minds in order to understand a flashback. Blending processes are thus essential for the comprehension of this narrative tool. And more specifically, as we have argued, these particular flashbacks need to be understood as products of blended joint attention. But this cognitive operation doesn’t just function as a ‘final touch’ that comes in at the end of the scene and gives meaning to it. Rather, it is a constituent element that lies at the very heart of each flashback, and without it this device could not exist at all.
3. Ordinary People (Redford, 1980): auditory cues in flashbacks
Because cinema is a type of multimodal discourse, that is, one that involves different modalities in its construction, sound also plays an important role in it, and not just that of accompanying or complementing images. Just as a visual cue was the prompt to introduce the flashbacks in The Help and Big Fish, we will now look at an example from Ordinary People (Redford, 1980), where the flashback is introduced by a sound cue.
Ordinary People narrates the disintegration of an upper-middle-class American family after the death of their eldest son in a sailing accident. The youngest son, Conrad, who was with his brother in the boat but survived the accident, is overcome by grief and guilt, and even attempts suicide. Throughout the film, various memories about his brother (Buck) come to his mind, and most dramatically the memories of the accident.
In one particular scene (00:43:20–00:44:55), Conrad arrives home and offers to help his mother, Beth, who is setting the table. He tries to have a meaningful conversation with her, but she avoids it. Then the phone rings, and the mother answers. She laughs at what her friend says, and this laugh makes Conrad remember some happy moments both with his mother and his older brother. As Figure 3 shows, we see Beth in a close-up shot, in the foreground, while Conrad stays in the background of the frame, looking at her (a): again, as was the case with Big Fish, an eyeline match is already established within this shot. The following shot (b) is a medium shot of Conrad looking off-screen, first in his mother’s direction (b1), then somewhere else to his left (b2). The cut from shot a to shot b comes when Beth starts laughing, thus moving the viewer’s attention from her talking to Conrad listening to her. Then, while Conrad turns his gaze somewhere else, the mother’s laugh grows louder and is progressively perceived with an echo effect, and it continues when the scene from the past (shot c) is introduced. Conrad recalls his mother and his brother laughing openly, while he just looks at them (voices are echoed throughout the scene).

Figure 3. Conrad’s flashback.
As was the case with the first two flashbacks analysed, the retrospection in this scene is based on a joint-attention schema and on a series of eyeline matches. Shot a functions as an establishing shot, as it contains the eyeline match upon which the flashback is built: Conrad, as the character that looks, observes his mother, who is the ‘object’ being looked at (and listened to). When she starts laughing, there is a cut to shot b1, a medium shot of Conrad looking in his mother’s direction. He keeps this position very briefly, and then looks somewhere else to his side (b2). Beth’s laugh is heard throughout these two shots, and so the object of Conrad’s attention (now represented only by sound, though this is supported by Conrad’s off-screen look towards his mother) is included within the medium shot of the character. Shot b2 also works as the character shot for the eyeline match established with shot c. Beth’s laugh is still the object of Conrad’s attention, and its progressive echoing serves as a transition to the past scene in shot c; that is, it works as a space builder which activates the past narrative space. The laugh can also be seen as a metonymical representation both of Beth’s former self (a happier one, which Conrad now misses and longs for) and of the happiness the family shared in the past.
It is interesting to note how Conrad’s gaze changes its direction within shot b. Is it just an arbitrary change or a meaningful one? Conrad’s look in b1 is directed towards his mother, and we know that because the eyeline match has already been established within shot a. But, as soon as Beth starts laughing, Conrad turns his eyes away from her and, with a somewhat lifeless gaze, looks to his left (b2). The laugh he is hearing provokes in him a turn in his attention, which goes from his mother laughing in the present time to a recollection of a past event. Conrad turns his gaze away (b2) because the laugh that now catches his attention is no longer the one from the present, but a different one that he recalls from some time ago. This change in the character’s gaze direction is a communicative strategy that tells the viewer that the target object of Conrad’s attention has changed too, and so this visual technique works alongside the echo effect to make the transition to the past clear.
Again, this flashback cannot be understood in terms of simple, classic joint attention. It is true that such structure lies behind the retrospection scene, but it is only one of the many mental spaces which are projected into the blended space. Once more, the joint-attention triangle established in the scene is formed by components which don’t belong to the same ‘here’ and ‘now’: the viewer, the camera and the character (as well as the object he looks at) are not accessible from a single local experience of time and space. On the contrary, each of them belongs to a different mental space, namely the ground, the presentation space and the reference space, respectively. As a consequence, the components of the joint-attention scene can only come together, be merged and thus acquire meaning by being projected into a blended space of joint attention.
Compression of several ideas to human-scale size takes place in that blended space in order to make all the relevant information available to our minds. As mentioned above, the idea of the viewer is in itself the product of a blend, which brings together all the potential viewers of a film. Also, our conception of the camera as a conscious agent is only possible inside a blend, where all the decisions regarding camera operating are projected. This new blended space is in turn projected into the presentation space, and that is why we can say that ‘the camera’ guides the viewer. Furthermore, the presentation and the reference space are fused into the virtual space. Only from this position can we say, for instance, that Conrad is seen in a medium shot, or can we accept the actor as a particular character, with its own identity in the fictional world. And only in the virtual space does the echoed laugh make sense, because Beth’s laugh belongs to the reference space, but the effect that alters it is part of the presentation space.
It is also inside this blended space of joint attention that a variety of perspectives are simultaneously available. The camera’s point of view and the viewer’s perspective are always fused into one, since the latter is always controlled by the camera. This is the case with Conrad’s medium shot (b). But, by the time his mother’s laugh is heard, and, above all, when it is perceived with an echo effect, Conrad’s point of view is added to the scene. It is not a visual point of view (we don’t see through the character’s eyes), but an auditory one: we hear Beth’s laugh with Conrad. This is unmistakably his perspective: the camera redirects our attention towards Conrad and, once we are focused on him, his mother’s laugh is echoed and gains auditory prominence. At this point, the scene merges three different perspectives (viewer, camera and character) coming from three diverse mental spaces, and this is only possible by projecting these inputs into a blended space in which this vast, complex network is compressed to human scale, and thus is accessible to our understanding.
Besides, just as with Skeeter in The Help and with Will in Big Fish, Conrad’s viewpoint is not a simple one, but the result of combining his recalling perspective in the present and his experience of the remembered past. When the echoed laugh introduces the flashback and we see Conrad and his mother and brother laughing, we share the happy experience Conrad had in that given moment of his life, but we never leave the Conrad in the present: it is precisely in the combination of the past experience with Conrad’s look from the present that the whole scene makes sense and we understand Conrad’s emotional struggle.
The introduction of the flashback in this particular scene of Ordinary People works in a very similar way to the first two flashbacks analysed, even though in this one the retrospective device is introduced by a sound cue, as opposed to the visual cues employed in The Help and Big Fish. All three examples are built upon a joint-attention structure, and all three also make the most of human beings’ natural ability for gaze-following. By projecting into the blend the basic idea of classic joint attention, the three flashbacks can be understood as joint-attention scenes, although none of them meet the conditions of direct joint attention. It is only inside the blended space that the viewer can merge with the camera’s gaze and also the character’s point of view, and it is only in the blend that ‘the past’ as a narrative space can emerge and be understood as such.
4. Conclusions
Given the three examples examined above, and the main ideas drawn from their analysis, it can be stated that, in order for these flashbacks to be successfully understood by film viewers, they must be constructed on the basis of viewers’ natural communicative abilities. Movies are made for an audience to see them, and so the mechanisms that film discourse employs rely on human beings’ innate communicative behaviours, such as gaze-following and joint attention.
The eyeline match itself, a fundamental film discourse device that can be found in practically every movie, would be inconceivable without our natural capacity to follow other people’s gazes. It is because we are born with this ability (at least with that potential capacity, which is developed in our first months of life) that the eyeline match makes sense in film language (see, for instance, Tomasello et al., 2007). In fact, gaze-following is essential to it. Thus, this eyeline match serves as the structural foundation for the joint-attention schema that supports the flashback scene. Both gaze-following and the human capability to jointly attend with someone else to a particular object are the grounds upon which the flashback device is built.
Having said that, those natural communicative abilities are not sufficient by themselves in terms of film comprehension. The concept of classic joint attention describes a scene of local experience, one that takes place ‘here’ and ‘now’. But this is not the case with a flashback scene, in which different times, places and perspectives are combined. We need to go beyond ‘here’ and ‘now’ by blending the basic idea of classic joint attention with the complex mental network that is activated while watching a flashback. This web is composed of many diverse mental spaces, some of which are in themselves products of a blend: the presentation, the reference and the virtual spaces; the idea of the camera and that of the viewer; ‘the past’ as a narrative space; to name but a few. The information needed from each mental space is projected into the blend, and thus it is compressed and made available to our understanding. Ultimately, we can talk of a joint-attention triangle formed by the viewer, the camera and the character because we blend the mental spaces that correspond to each one of them with the idea of classic joint attention, and so the result is a scene of blended joint attention from which we can access the meaning of the flashback.
However, blending should not be seen as the last ingredient or the final touch in the recipe, like the icing on the cake. On the contrary, it is one of the constitutive ingredients of the flashback, and without it the retrospection scene cannot even be considered. There would be no flashback at all if blended joint attention were not operating in it from the very beginning. Outside the blend, the flashback would not be such, and so it would not make sense.
That said, the analysis of the three selected examples also shows that, as long as the flashback scene is built upon a basic shared schema, there is enough freedom for artistic creativity without the main meaning of the construction being affected. All three flashbacks use the eyeline match technique as their backbone, which in turn relies on our innate capability for gaze-following and joint attention. Upon this shared basis, many variations can be performed in terms of artistic creativity, but the essential meaning of the flashback (that is, the leap from the present time in the story to a previous time) is not altered, and neither is our capacity to easily understand it. That is why the three examples analysed can be explained in relation to joint attention and gaze-following, even though the scenes are not identical, and one can certainly see many differences between them (The Help and Big Fish use extradiegetic music, while Ordinary People does not; Big Fish employs a zoom-in effect; in Ordinary People the flashback is prompted by a sound cue, instead of by a visual one; and so on). In the end, what makes those variations possible is a shared, unified and solid basis: a classic joint-attention scene, and the fundamental ingredients for joint attention to take place, all of them projected into a mental space of blended joint attention.
Acknowledgements
I am very grateful to Inés Olza and Cristóbal Pagán Cánovas for their endless support and insightful ideas, and for reviewing earlier versions of this article. I would also like to thank Paul Bentham for proofreading my manuscript, as well as the two anonymous reviewers for their feedback and constructive comments on this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
