Abstract
In the kinetic depth effect, the direction of the perceived depth and the direction of apparent rotation of a 3-D structure are linked, and typically ambiguous, whereas depth from motion parallax during both observer- and object-movement is stable and unambiguous. Rogers and Rogers demonstrated that the vertical perspective transformations play an important role in disambiguating the direction of the perceived depth in parallax-defined surfaces, but more recently Nawrot et al. have proposed that pursuit eye movements provide the crucial disambiguating information. Theoretical considerations suggest that pursuit eye movements could not, in principle, provide the necessary information because 3-D objects or surfaces may rotate during observer- or object-movement. The empirical evidence presented here shows that vertical perspective transformations are sufficient for the unambiguous perception of parallax depth whereas pursuit eye movements are not necessary and may not even be sufficient.
Introduction
Motion parallax has long been regarded as an important source of information about the structure and layout of objects and surfaces in the surrounding world (Helmholtz, 1910; Ono & Wade, 2005). The early experiments by Bourdon (1898), Heine (1905), and Graham, Baker, Hecht, & Lloyd (1948) showed that observers are able to detect very small differences in depth between two points or objects when they make side-to-side head movements (observer-produced parallax). In 1959, Gibson, Gibson, Smith, & Flock showed that observers are reliably able to determine the direction and amount of slant in inclined surfaces that translate across the observer’s line-of-sight (object-produced parallax). Gibson also made the important distinction between the motion perspective transformations created by continuous surfaces in the surrounding world and the motion parallax created by isolated points. Additional evidence for the effectiveness of motion parallax as a source of information about the 3-D structure of surfaces came from our own experiments in 1979. We showed that motion parallax transformations could produce vivid and unambiguous impressions of 3-D shape both when the observer moved with respect to the simulated surface and when the simulated surface translated across the observer’s line-of-sight (Rogers & Graham, 1979).
One of the key findings of that study was the lack of ambiguity in the perception of motion parallax surfaces (that is, what was seen as in front and what was seen as behind), and they contrasted this with the ambiguity of the perceived 3-D structure and the perceived direction of rotation in the kinetic depth effect (KDE). They considered this to be surprising given that the pattern of motion displayed on the screen used to simulate the motion parallax was the same as that used to create a KDE of a similar corrugated surface that rotated to-and-fro around a vertical axis through a small angle. They attributed the lack of ambiguity in the motion parallax situation to the translation of the display pattern with respect to the observer (or the translation of the observer with respect to the display pattern). Some years later, Rogers and Rogers (1992) attempted to distinguish between the different sources of both visual and non-visual information that might be responsible for the lack of ambiguity in the motion parallax situation, and the results of those studies are described in detail in the accompanying theoretical paper. They highlighted the role of what Braunstein and Payne (1968) referred to as the “vertical perspective” changes that are produced when any 3-D object or surface rotates around a vertical axis in front of the observer and they commented that for motion parallax “Real perspective shape transformations … were most effective in disambiguating the display” (Rogers & Rogers, 1992).
In contrast, Nawrot et al. have argued that “the unambiguous perception of depth from motion parallax
Empirical Evidence to Date
The results of Rogers and Rogers (1992) suggest that both extraretinal (non-visual) and visual sources of information can be sufficient to disambiguate the direction of depth from motion parallax, although they also concluded that no particular source was necessary. With respect to the extraretinal information, they did not distinguish between signals that might arise from vestibular or proprioceptive sources, rather than the eye movement system, and it is quite possible that any one of these could have been responsible for the observed reduction of ambiguity in the depth order in their non-visual information condition (b). With respect to the visual sources of information that resulted in reduced ambiguity, they identified two: (a) the vertical perspective changes that would normally be produced by object rotation and (b) the foreground optic flow created by observer translation.
Nawrot’s Experiments
Nawrot et al., however, have argued that one particular source of information, that derived from pursuit eye movement signals, plays a “crucial role” (Nawrot & Stoyan, 2009, p. 4709) in disambiguating the direction of depth from motion parallax, a claim repeated in their recent article (George, Johnson, & Nawrot, 2013, p. 639). Their conclusions were based on a series of experiments in which they attempted to isolate the different sources of extraretinal information (Nawrot, 2003; Nawrot & Joyce, 2006). Nawrot and Joyce’s (2006) experiments used a paradigm in which they recorded their observers’ perceptions of the 3-D structure of parallax surfaces in both observer-produced and object-produced parallax situations. In particular, they monitored their observers’ depth impressions while they viewed a shearing stimulus with a sinusoidal motion profile on a stationary monitor during side-to-side head movements (Experiment 1, Condition 2). In this condition, the authors argued that the eye movement signals, generated as a consequence of the head movements, were a combination of vestibularly driven translational vestibulo-ocular response (TVOR) and visually driven optokinetic response (OKR) or pursuit movements. They found that the perceived depth order in the test stimulus was unambiguous. In a second condition, observers viewed the same shearing stimulus on a monitor that translated along a linear path from side-to-side while the observer remained stationary (Experiment 1, Condition 3). In this case, only pursuit eye movement signals were generated and the results showed that the perceived depth order was also unambiguous.
The important novel condition (Condition 6) introduced by Nawrot and Joyce (2006) was one in which both the observer and the monitor displaying the shearing stimulus translated from side-to-side in the same direction and in synchrony. Under these conditions, the authors argued that the resultant pursuit eye movements would have been in the same direction as that of the monitor translation, but in the opposite direction to the vestibularly driven TVOR signals. Because the two signals were in opposite directions, the authors were able to distinguish between the roles of vestibularly driven TVOR and pursuit eye movement signals. The results showed that the perceived depth order in this condition was also unambiguous and, importantly, that it was consistent with the direction of the pursuit eye movement signals rather than the vestibular signals; that is, in the opposite direction to the normal parallax depth. The authors (Nawrot & Joyce, 2006, p. 4719) concluded that: “This result means that the visual system
Let us consider these results and conclusions carefully. First, it cannot be concluded that because one source of disambiguating information wins out over another (e.g., Condition 6 in Nawrot & Joyce, 2006), the other source of information is “not used.” It is quite possible that the human visual system uses either or both vestibular and proprioceptive signals about the direction of head translation to disambiguate the depth order but they may not be as powerful as pursuit eye movement signals and, as a consequence, they may lose out when put in conflict. Second, the authors ignored the possible role of visual information in disambiguating the depth order. In their observer-produced parallax condition (2), not only was there both vestibular and pursuit eye movement information available but there was also visual information from both vertical perspective and foreground optic flow (assuming that the experiments were not carried out in complete darkness). Hence, this condition, by itself, tells us nothing about which sources of information are capable of (i.e., sufficient for) disambiguating observer-produced motion parallax transformations.
Similarly, Nawrot and Joyce’s object-produced parallax condition (3) does not tell which of the extraretinal (pursuit eye movements) or visual sources of information (vertical perspective and foreground optic flow) was responsible for the disambiguation, since all three sources of information were present. Neither does it tell us whether each of the different sources is independently capable of providing a disambiguating signal when presented in isolation—that is, which signals are sufficient. Nawrot and Joyce (2006) regard their Condition 6 as providing the crucial test of their pursuit eye movement theory since it pits the TVOR signal against the pursuit eye movements. However, as already pointed out, it cannot be concluded that a source of information is incapable of disambiguating motion parallax transformations (i.e., not sufficient) just because another source of information wins out when they are put in conflict.
George, Johnson, and Nawrot (2013)
George et al. (2013) reported the results of an experiment in which they attempted to test the idea that vertical perspective is capable of disambiguating the perceived depth from motion parallax in an object-produced parallax situation. To do this, they simulated a single cycle of a sinusoidal-corrugated surface that translated across a stationary display screen in front of the observer. To create pursuit eye movements, the simulated surface (8.9° × 8.9°) moved across the screen from the center to either the left or the right at 12°/s. Observers were asked to report whether the “peak” of the simulated surface was above or below the central fixation spot. As the surface translated across the screen it would, of course, create a vertical perspective transformation, changing in angular shape from a rectangle (at the center of the screen) to a trapezoid at its far left or right location (their Pursuit/Vertical-perspective (PV) condition). In the important test condition Pursuit/No-vertical-perspective (PNV), the simulated translating surface underwent a transformation so that its angular shape remained the same to the observer during the translation, thereby simulating a surface that rotated during translation so that it always faced the observer. The authors reported that the perceived depth was unambiguous when the surface created a vertical perspective transformation consistent with its translation (PV condition), as would be expected. However, they also reported that the perceived depth was unambiguous when there was a “null” vertical perspective transformation (i.e., there was no angular shape change during translation) in their PNV condition. George et al. also included two KDE-type situations in which the same shearing stimulus remained stationary at the center of the screen.
The authors reported that the perceived depth of the simulated corrugated surface was ambiguous not only when there were no vertical perspective transformations (No-pursuit/No-vertical-perspective (NPNV) condition), as would be expected, but also when the display was subject to a vertical perspective transformation that simulated a to-and-fro rotation of the corrugated surface around a vertical axis (No-pursuit/Vertical-perspective (NPV) condition).
At first sight, these results seem to provide strong evidence in favor of the pursuit eye movement hypothesis and against the vertical perspective hypothesis. There are, however, a number of problems with the experiment. First, the display simulated just a single sideways translation of the surface, rather than the multiple side-to-side movements used in previous studies. As a consequence, the results were only able to capture what observers perceived on their initial viewing of the translating stimulus and they did not capture the pattern of reversals that might have occurred during multiple side-to-side movements. Second, the results of the KDE situation in which there was a vertical perspective transformation to signal the direction of rotation of the surface but the shearing pattern did not move across the screen (NPV condition) clearly show that their particular implementation of a vertical perspective transformation was inadequate. Whether the inadequacy of the vertical perspective transformation was due to the characteristics of the particular display used by George et al. or whether it was due to the small size of the display cannot be determined on the basis of their results. To try to answer this question, we also measured the extent of ambiguity in a shearing surface that had the same angular size as that used by George et al. (8.9° × 8.9°) and with the same vertical perspective transformation simulated in their experiment (±20° of rotation). Third, it should be noted that they did not directly test whether the presence of a vertical perspective transformation by itself could disambiguate the perceived depth order of a parallax surface (i.e., whether it is sufficient).
Mitsudo and Ono’s Experiment
Mitsudo and Ono (2007) have argued that the results of their experiment support Nawrot’s claim that pursuit eye movement signals play a role in disambiguating the depth of motion parallax surfaces but only within their own “additive” model in which “… retinal and pursuit signals are summed ….” As the authors point out, this is equivalent to using head-centric rather than retinal velocities as the source of disambiguating information. The crucial condition in their experiment was when the common retinal velocity of the four rows of translating random dots was similar to the amplitude of the pursuit eye movements in the opposite direction. The “additive” model predicts that depth should be ambiguous at this point and that is what they found. However, at the point where the pursuit eye movements in one direction are matched by a common retinal velocity of motion in the opposite direction, the parallax transformation between the rows remained stationary on the screen and also with respect to the observer’s head. At this point, there are also no vertical perspective changes and therefore the same result (i.e., ambiguity) would be predicted if observers were using vertical perspective information to disambiguate the parallax transformation. Mitsudo and Ono argue that this would be unlikely given the low dot density and the absence of a distinct boundary of their dot pattern but the fact remains that the configuration of dots as a whole would have created changes in vertical perspective in conditions where they found a lack of ambiguity and no changes in vertical perspective where they found ambiguity.
The Unanswered Questions
The theoretical considerations described in the accompanying paper show that the only source of information that can reliably disambiguate both the depth order, as well as the amount of depth in motion parallax-defined surfaces, is information about object or surface rotation. This does not mean that the visual system might not also use other sources of information (both extraretinal and visual) to disambiguate parallax transformations even though they may not always provide the correct answer under all conditions.
These considerations suggest that two sorts of experiments need to be done. First, to answer the sufficiency question, we need to investigate whether each potential source of information is capable of disambiguating the depth of a motion parallax surface when presented in isolation from other cues. Not only do such experiments tell us about the sufficiency of a particular source of information but they also tell us about the lack of necessity of other sources of information that are not present. Second, we need to investigate the relative effectiveness of different extraretinal and visual sources of information when they are put into conflict. The results of these experiments give us some idea of the relative strength or effectiveness of the different possible sources of disambiguating information.
Experiment 1: The Effectiveness of Vertical Perspective Transformations in Disambiguating the Depth in the KDE
The purpose of Experiment 1 was to determine the effectiveness of vertical perspective transformations for disambiguating the depth and direction of rotation of 3-D surfaces that rotate to-and-fro around a vertical axis. The stimuli for this and the subsequent experiments were three cycles (0.15 cycles/deg) of a sinusoidal shearing pattern of random texture, similar to those used by Rogers and Graham (1979), and displayed at a distance of 57 cm (Figure 1).
(a) The discrete, outer images of the sinusoidally shearing stimulus shown both with (lower pair) and without (upper pair) a vertical perspective transformation. The continuous transformation of the random texture simulated a sinusoidally corrugated surface that rotated to-and-fro around a vertical axis through a small angle (±6.5°). (b) In Experiment 1, both the observer and the display monitor remained stationary.
The sinusoidal shearing transformation was chosen because it mimics the motion parallax transformation that would be produced by a horizontally oriented, sinusoidally corrugated surface when an observer makes side-to-side head movements. In Experiment 1, the shearing pattern was displayed in front of the observer at an oscillation frequency of 0.33 Hz while he or she remained stationary. Under these conditions, the direction of depth in the 3-D structure is inherently ambiguous and corresponds to a small angle KDE oscillation around a vertical axis under parallel projection.
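The linkage between depth sign and rotation direction in this parallel-projection case can be illustrated with a minimal sketch. It assumes a horizontally oriented sinusoidal corrugation rotating about a vertical axis, so that a point at depth z shifts horizontally by z·sin(θ); the common x·cos(θ) compression is ignored. The spatial frequency, temporal frequency, and rotation amplitude are taken from the text, whereas `DEPTH_AMPLITUDE` and all function names are hypothetical and chosen only for illustration.

```python
import math

SPATIAL_FREQ_CPD = 0.15   # corrugation frequency from the text (cycles/deg)
TEMPORAL_FREQ_HZ = 0.33   # oscillation frequency from the text (Hz)
MAX_ROTATION_DEG = 6.5    # simulated rotation amplitude from the text (deg)
DEPTH_AMPLITUDE = 1.0     # hypothetical peak depth in arbitrary units (not in the source)


def corrugation_depth(y_deg, depth_sign=1.0):
    """Depth of the corrugation at elevation y_deg (arbitrary units)."""
    return depth_sign * DEPTH_AMPLITUDE * math.sin(2 * math.pi * SPATIAL_FREQ_CPD * y_deg)


def rotation_angle(t_s, rotation_sign=1.0):
    """To-and-fro rotation about a vertical axis at time t_s (radians)."""
    amplitude = math.radians(MAX_ROTATION_DEG)
    return rotation_sign * amplitude * math.sin(2 * math.pi * TEMPORAL_FREQ_HZ * t_s)


def shear_displacement(y_deg, t_s, depth_sign=1.0, rotation_sign=1.0):
    """Horizontal shift of a texture row under parallel projection: z * sin(theta)."""
    z = corrugation_depth(y_deg, depth_sign)
    theta = rotation_angle(t_s, rotation_sign)
    return z * math.sin(theta)
```

Under these assumptions, reversing both the depth sign and the rotation direction leaves every row's displacement unchanged, so `shear_displacement(y, t, 1, 1)` and `shear_displacement(y, t, -1, -1)` describe the same image motion. This is the sense in which the shearing pattern alone cannot specify which corrugations are peaks and which are troughs.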
The display size was 28° × 20° and it is important to point out that most traditional KDE displays of this angular subtense do not correspond to any real world 3-D surface rotating to-and-fro around a vertical axis because there would always be a vertical perspective transformation to accompany the horizontal shearing motions (Braunstein, 1976, 1977; Braunstein & Payne, 1968). In other words, it is actually a contradiction to talk about the parallel projection of a rotating 3-D surface or object that has an angular extent greater than a few degrees of visual angle (Rogers, 1993). For present purposes, however, such a display serves as a useful control because the depth of the corrugations is inherently ambiguous. The observer’s task was to press one key when he or she saw the corrugation immediately above the center-line of the display as a peak, and a second key when he or she saw the corrugation immediately above the center-line of the display as a trough. Each trial lasted for 45 s and we subsequently calculated the percentage of the time that the corrugations were seen in each of the two depth orders, as well as recording the number of depth reversals.
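The two measures described here (percentage of time in each depth order and number of reversals) could be derived from a record of key presses along the following lines. This is a sketch only: the event format, the function name `summarize_trial`, and the decision to exclude the time before the first key press are assumptions, not details reported in the text.

```python
def summarize_trial(events, trial_duration_s=45.0):
    """Summarize one trial from time-sorted key presses.

    events: list of (time_s, percept) tuples, percept being "peak" or "trough".
    Each percept is assumed to last until the next key press or the end of the trial.
    Returns (percent_peak, percent_trough, n_reversals).
    """
    durations = {"peak": 0.0, "trough": 0.0}
    reversals = 0
    for i, (time_s, percept) in enumerate(events):
        next_time = events[i + 1][0] if i + 1 < len(events) else trial_duration_s
        durations[percept] += next_time - time_s
        # A reversal is counted whenever the reported percept changes.
        if i > 0 and percept != events[i - 1][1]:
            reversals += 1
    total = durations["peak"] + durations["trough"]
    if total == 0:
        return 0.0, 0.0, 0
    return (100.0 * durations["peak"] / total,
            100.0 * durations["trough"] / total,
            reversals)
```

For a hypothetical trial with presses `[(0.0, "peak"), (15.0, "trough"), (30.0, "peak")]`, the peak percept occupies 30 s of the 45 s trial and the record contains two reversals.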
There is no reason to suppose that the results from the “control” trials would be anything other than chance, that is, the perception of the corrugation above the center line as a peak 50% of the time and as a trough 50% of the time, since the stimuli are inherently ambiguous with respect to the sign of the depth as well as the direction of perceived rotation. To test for the effectiveness of vertical perspective transformations in disambiguating the depth of 3-D surfaces, an additional vertical perspective transformation was added in the experimental conditions. Such a transformation uniquely specifies the direction of rotation of the 3-D surface as it oscillated to-and-fro around a vertical axis. The amount of vertical perspective was varied parametrically and corresponded to to-and-fro rotations of the surface through either: ±1.625°; ±3.25°; or ±6.5° with respect to the frontal plane. The lower pair of images in Figure 1(a) shows the extent of the vertical perspective transformation for a ±6.5° to-and-fro rotation of the simulated surface. If vertical perspective is an effective source of information for disambiguating the depth structure of these oscillating surfaces, we would expect observers to report that one particular depth profile (e.g., a peak above the center line) would be seen for more of the time than the opposite depth profile. In the control trials, the information is inherently ambiguous and there is no “correct” answer on any given trial as to whether it should be seen as a peak or a trough. In the experimental trials, where there was additional vertical perspective information, there were two possible linkages between the shearing motion and the vertical perspective transformation and therefore two different kinds of trials. In one of these, the corrugation above the center line should be seen as a “peak” and in the other as a “trough.”
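To give a sense of how a vertical perspective transformation specifies the direction of rotation, the sketch below computes the angular heights of the left and right edges of a flat rectangle rotated about a vertical axis through its center. It assumes the display dimensions and viewing distance given in the text, a flat surface (the corrugations are ignored), and the sign convention that a positive angle brings the right-hand edge nearer to the observer. The function name and this convention are illustrative assumptions.

```python
import math

VIEWING_DISTANCE_CM = 57.0   # from the text
DISPLAY_WIDTH_DEG = 28.0     # from the text
DISPLAY_HEIGHT_DEG = 20.0    # from the text


def edge_heights_deg(rotation_deg,
                     distance_cm=VIEWING_DISTANCE_CM,
                     width_deg=DISPLAY_WIDTH_DEG,
                     height_deg=DISPLAY_HEIGHT_DEG):
    """Return (left, right) angular heights of the vertical edges of a rotated rectangle.

    Positive rotation_deg moves the right edge toward the observer (assumed convention).
    """
    half_width = distance_cm * math.tan(math.radians(width_deg / 2))
    half_height = distance_cm * math.tan(math.radians(height_deg / 2))
    theta = math.radians(rotation_deg)
    heights = []
    for side in (-1.0, 1.0):  # left edge, then right edge
        x = side * half_width * math.cos(theta)
        z = distance_cm - side * half_width * math.sin(theta)
        edge_distance = math.hypot(x, z)
        heights.append(2 * math.degrees(math.atan(half_height / edge_distance)))
    return tuple(heights)
```

With no rotation the two edges subtend the same height; with a positive rotation the right edge subtends more than the left, and the relationship reverses when the sign of the rotation reverses. It is this asymmetry, absent under parallel projection, that links a given shearing direction to one depth order rather than the other.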
Methods
In both the control and experimental conditions, the same shearing transformation was presented on an LCD monitor at 57 cm viewing distance (visual angle 28° × 20°). Screen resolution was 1,024 × 768 pixels. The transforming surface was a 50% density black and white random texture with an average element size of 0.25°. Subpixel interpolation was used to create smooth and continuous motions of both the shearing and vertical perspective transformations. Viewing was monocular. Observers were asked to press the appropriate key to indicate whether the corrugation immediately above the center-line of the display was seen as a peak or a trough. For each of the seven observers (including the author), the four different conditions were each repeated four times in a randomized order. In the three experimental conditions, the percentage of time corresponds to the average percentage time that responses were consistent with the vertical perspective information in the two different couplings of perspective transformation and shearing direction.
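The display geometry can be checked with a few lines of arithmetic. The sketch below uses only the viewing distance, the display subtense, and the corrugation frequency of 0.15 cycles/deg stated earlier; the function names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # from the text


def visual_angle_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle subtended by a frontal extent centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_cm_for_angle(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Frontal extent (cm) that subtends angle_deg at the given distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def cycles_across(extent_deg, freq_cycles_per_deg):
    """Number of corrugation cycles spanning a given angular extent."""
    return extent_deg * freq_cycles_per_deg
```

At 57 cm, 1 cm subtends very nearly 1°, which is presumably why this viewing distance was chosen. Because the corrugations are horizontally oriented, depth varies along the vertical dimension, and `cycles_across(20.0, 0.15)` gives the three cycles described for the 20° display height.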
Results and Discussion
In the absence of additional vertical perspective information (control condition), the percentage of the time that observers reported the corrugation above the center line to be a “peak” was close to 50% as would be expected (Figure 2, left-hand bar). However, introducing a vertical perspective transformation that corresponded to just ±1.625° of rotation biased the percentage of time significantly compared with the control condition, and the introduction of a transformation corresponding to the maximum ±6.5° of rotation almost completely abolished the ambiguity. This can be seen in the differences between the second, third, and final columns of Figure 2(a).
(a) The percentage of the time that responses were consistent with the vertical perspective information in each of the four conditions. In the absence of vertical perspective (left-hand bar), there is no correct or “consistent” response, since the depth is inherently ambiguous. In this case, the percentage of responses is simply the percentage of time that the corrugation above the center-line was seen as a “peak.” In the experimental conditions, the percentage of responses corresponds to the percentage of the time in which the depth was seen to be consistent with the vertical perspective information, averaged over the two different couplings of perspective transformation and shearing direction. The right-most bar shows that the presence of vertical perspective corresponding to ±6.5° rotation rendered the KDE stimulus unambiguous. Error bars show ±1 standard error of the mean. (b) The percentage of time that responses were consistent with the vertical perspective information plotted using the same conditions (stimulus size and angle of simulated rotation) as George et al. (2013). In the presence of vertical perspective (right-hand bar), 76% of the time observers’ responses were consistent with the information provided by vertical perspective. KDE = kinetic depth effect.
The results of Experiment 1 are not surprising and merely replicate the findings originally reported by Braunstein et al. using continuously rotating objects and those by Rogers and Rogers (1992) in which they simulated the transformations of a sinusoidally corrugated surface that oscillated to-and-fro around a vertical axis. The importance of this particular demonstration of the effectiveness of vertical perspective transformations in disambiguating the depth of KDE stimuli is that (a) the stimuli (random-textured pattern), (b) the depth structure (sinusoidal corrugations), (c) the angular subtense of the display, and (d) the simulated rotation amplitude (up to ±6.5°), are all exactly the same as those used in the subsequent motion parallax experiments (2 and 3), which are described later.
It is important to note that the angular size of the rotating or oscillating surface is crucially important. It has to be true that the vertical perspective transformations created by a rotating or oscillating surface that subtends only a small visual angle would be too small for the visual system to detect. Likewise, the extent of the vertical perspective transformation will depend on the extent of the simulated rotation to-and-fro around a vertical axis, and here it is worth noting that we were very conservative in simulating a maximum of just ±6.5° of rotation with respect to the frontal plane. In other words, the vertical perspective transformations were in no sense exaggerated in order to obtain these results.
Exactly the same considerations apply when considering the equivalent “differential perspective” transformations present in binocular stereopsis. In this case, the discrete “angle of rotation” between the two binocular views of a surface corresponds to the vergence angle of the eyes (the “included angle”; Howard & Rogers, 1995). Bradshaw, Glennerster, and Rogers (1996) reported that the differential perspective information was only effective for depth scaling when the display size subtended 10° in diameter or more. Our ongoing experiments suggest that the minimum size of display for disambiguating depth in an oscillating KDE situation may be 5° or even less.
To compare our results with those of George et al. (2013), an additional condition was investigated. To replicate the conditions of their experiment, the size of the random texture pattern was reduced to just 8.9° × 8.9°, and the extent of vertical perspective transformation was equivalent to a rotation through ±20°. In the absence of vertical perspective, observers’ responses were close to chance (Figure 2(b), left-hand bar). In contrast, the percentage of time that observers’ responses were consistent with the vertical perspective transformation was 76% (Figure 2(b), right-hand bar). This result shows that the presence of a vertical perspective transformation with the same magnitude as that used in the George et al. experiments is, in fact, sufficient to disambiguate the perceived direction of rotation of a simulated KDE surface for most of the time. This suggests that either (a) the simulation of the vertical perspective transformation in their experiments was inadequate for some technical reason or (b) the single sideways movement of the shearing surface was not sufficiently long for the visual system to utilize the vertical perspective information. Either way, the results of this additional condition make the point that their particular display conditions were inadequate as a test of the effectiveness of vertical perspective information or of the relative strength of pursuit eye movements in disambiguating motion parallax transformations.
Experiment 2: The Effectiveness of Pursuit Eye Movements in Disambiguating Depth in Object-Produced Motion Parallax
Rogers and Graham (1979) reported that the perceived depth in a display that simulated the motion parallax transformation created by the surface translating across the observer’s line-of-sight (object-produced motion parallax) was unambiguous. In the object-produced parallax situation investigated by Rogers and Graham, the translating surface effectively underwent a rotation with respect to the observer’s line-of-sight because the display screen translated along a straight line path. As a consequence, it created a vertical perspective transformation. As the surface translated to the left, the angular subtense of the left-hand edge of the surface became smaller than that of the right-hand edge (Figure 3(a)). Note that if the surface translates along a path whose length subtends ±6.5° at the observer’s eye, the effective angle of rotation of the surface (with respect to the observer) travelling along a straight line path is ±6.5°. Experiment 1 demonstrated that the human visual system is capable of using a vertical perspective transformation of the same angular extent and the same amount of simulated rotation to disambiguate an oscillating KDE surface and therefore it is quite possible that vertical perspective could play a similar role in disambiguating object-produced motion parallax surfaces.
(a) In a normal, object-produced parallax experiment (Rogers & Graham, 1979), the display monitor translated along a straight line path in front of the observer, and the table-top surface was visible. This created a vertical perspective transformation corresponding to the change in the obliqueness of the surface as it translated laterally. (b) In the three experimental conditions of Experiment 2, the display monitor was mounted on a platform that rotated about a vertical axis under the observer’s eye so that the display monitor always faced the observer as it moved from side-to-side. This ensured that there were no changes in the vertical perspective of the displayed pattern. The surface of the table-top was not visible.
In the object-produced parallax situation, Nawrot et al. have argued that the only important and necessary source of information used to disambiguate the depth of parallax surfaces is the pursuit eye movements needed to track the translating surface. As a real 3-D surface moves to the left, leftward (anticlockwise) pursuit eye movements are needed to track it. Thus the parts of a real 3-D surface that are “in front” will move at a faster speed than the rest of the surface and the parts of the surface that are “behind” will move at a slower speed than the rest of the surface. From Nawrot’s point of view, the object-produced parallax situation makes a straightforward prediction about the role of pursuit eye movements since they are not reduced in amplitude by TVOR eye movements, as they are in observer-produced parallax (see Experiment 3 later). However, as was pointed out earlier, this linkage between the direction of pursuit eye movement and the direction of the depth from motion parallax only holds if we assume that the surface or object translates along a straight line path in front of the observer.
The problem with previous experiments that have investigated object-produced parallax is that they have all mimicked the situation of a 3-D surface translating along a straight line path in front of the observer. This includes our original study as well as Nawrot’s 2003 and 2006 experiments. It is a problem because when a surface translates along a straight-line frontal path it generates vertical perspective changes of a magnitude that we have already shown to be capable of disambiguating the direction of depth in the KDE situation. The obvious solution is to simulate a surface that translates along a curved path (with the center of curvature of the path below the observer’s eye) so that the surface always faces the observer (Figure 3(b)). This ensures that there are no confounding vertical perspective changes and therefore allows a pure test of the pursuit eye movement hypothesis (Rogers, 2012). If the potential ambiguity of the shearing transformation on the display screen is resolved by pursuit eye movements, as Nawrot claims, the perceived depth should be unambiguous; that is, those parts of the shearing pattern that move in the same direction as the movement of the surface should be seen as “in front” and vice versa. On the other hand, if the ambiguity of a shearing transformation on the display screen is resolved primarily by vertical perspective information, the perceived depth should be ambiguous. As a consequence, this experiment provides a very powerful direct test of the pursuit eye movement hypothesis.
To test for the relative effectiveness of pursuit eye movements and vertical perspective transformations, a control and three different experimental conditions were studied. In the control condition, the display monitor translated along a straight line path in front of the observer (Figure 3(a)). In the first experimental condition, the display monitor translated along a curved path (with the center of curvature of the path at the observer’s eye) so that the surface always faced the observer (Figure 3(b)). The monitor displayed the same shearing pattern with the same sinusoidal motion profile as in Experiment 1 (see Figure 1(a)). As the observer tracked the monitor, the presence of pursuit eye movements should render the perceived depth unambiguous, according to the pursuit eye movement hypothesis. On the other hand, if the vertical perspective changes are an important determinant of perceived depth order, the perceived depth should be ambiguous because there are no changes in vertical perspective.
To determine the relative strength of the different sources of disambiguating information, two further experimental conditions were investigated. In the first case, a vertical perspective transformation, equivalent to a rotation of the surface with respect to the observer’s line-of-sight through 6.5° in a clockwise direction (from above), was added to the shearing pattern on the display screen as it translated to the left, and in a counterclockwise direction for the translation to the right. In other words, this would simulate a real surface that translated along a straight line path in front of the observer, as in the original Rogers and Graham (1979) experiments. In the second case, a vertical perspective transformation equivalent to a rotation of the surface through 6.5° in an anticlockwise direction (from above) was added to the shearing pattern on the display screen as it translated to the left, and vice versa for translation to the right. This would simulate a real surface that translated along a markedly concave path in front of the observer. According to the pursuit eye movement hypothesis, the depth in both these conditions should be unambiguous and in the same direction as the depth in Condition 1—that is, those parts of the shearing pattern that move in the same direction as the movement of the surface should be seen as “in front” and vice versa.
In contrast, the vertical perspective hypothesis predicts that the perceived depth should be ambiguous in the first experimental condition (where there were no vertical perspective changes) and that when vertical perspective is added in the two additional experimental conditions, the depth should be seen as unambiguous but in opposite directions in the two cases.
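The competing predictions for the control and three experimental conditions can be summarized in a short sketch. The numbers are idealized values introduced purely for illustration (100 = responses always consistent with the pursuit eye movement information, 50 = fully ambiguous, 0 = always in the opposite direction); they are not measured data, and the condition labels and variable names are hypothetical.

```python
# Idealized predictions for Experiment 2, expressed as the percentage of time
# that responses would be consistent with the pursuit eye movement information.
# These values are illustrative assumptions, not results.
CONDITIONS = [
    "control (straight line path)",
    "no vertical perspective",
    "consistent vertical perspective",
    "contradictory vertical perspective",
]

PREDICTED_PERCENT = {
    # Pursuit hypothesis: eye movements alone fix the depth order everywhere.
    "pursuit eye movement": [100, 100, 100, 100],
    # Vertical perspective hypothesis: ambiguous without vertical perspective,
    # and depth follows vertical perspective when the two cues conflict.
    "vertical perspective": [100, 50, 100, 0],
}

for i, condition in enumerate(CONDITIONS):
    pursuit = PREDICTED_PERCENT["pursuit eye movement"][i]
    perspective = PREDICTED_PERCENT["vertical perspective"][i]
    print(f"{condition:36s} pursuit: {pursuit:3d}%   perspective: {perspective:3d}%")
```

The two hypotheses diverge only in the second and fourth conditions, which is why those conditions carry the weight of the test.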
Methods
The same LCD monitor and random-textured surface were used to display the object-produced motion parallax transformation as in Experiment 1. In both the control and three experimental conditions, the shearing pattern displayed on the screen was linked to the movement of the display screen at a rate of 0.33 Hz (a side-to-side movement taking 3.0 s). In the control condition, the screen moved along a linear path (Figure 3(a)) while in the three experimental conditions, the display screen was mounted on a platform with its center of rotation under the observer’s eye (Figure 3(b)). In this way, the screen always faced the observer as it moved from side-to-side. In both cases, the observer’s head rested on a fixed chinrest and the observer was asked to track the display as it moved from side-to-side. In the three experimental conditions, the experiment was carried out in complete darkness, apart from the pattern on the display screen, to ensure that there was no additional visual information about the movement of the screen.
In all four conditions, the screen moved through a distance of ±6.5 cm, corresponding to an angular displacement of ±6.5° at a viewing distance of 57 cm. Viewing was monocular and the equivalent disparity of the simulated corrugations was 21 arc min (∼3.3 cm peak-to-trough depth at 57 cm). As in Experiment 1, observers were asked to press the appropriate key to indicate whether the corrugation immediately above the center-line of the display was a peak or a trough. For each of the seven observers (including the author), the control and three experimental conditions were each repeated four times in a randomized order; twice with one coupling of the display movement direction to the shearing pattern and twice with the opposite coupling. In the graph that follows, the “percentage of time” corresponds to the average time that the “peak” or “trough” responses were consistent with the pursuit eye movement information as a function of the two different couplings of display movement and shearing direction.
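As a check on the stated geometry, the following minimal sketch converts the ±6.5 cm screen displacement at 57 cm into an angular displacement, both for a straight line path and for a circular arc centered on the observer. The function names are illustrative, and treating the eye as a single point at the center of the arc is a simplifying assumption.

```python
import math

AMPLITUDE_CM = 6.5          # screen displacement either side of center (from the text)
VIEWING_DISTANCE_CM = 57.0  # viewing distance (from the text)
FREQUENCY_HZ = 0.33         # rate of side-to-side movement (from the text)

def angle_straight_path_deg(offset_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Angular displacement when the screen translates along a frontal straight line."""
    return math.degrees(math.atan2(offset_cm, distance_cm))

def angle_curved_path_deg(arc_length_cm, radius_cm=VIEWING_DISTANCE_CM):
    """Angular displacement when the screen moves along an arc centered on the eye."""
    return math.degrees(arc_length_cm / radius_cm)

straight = angle_straight_path_deg(AMPLITUDE_CM)
curved = angle_curved_path_deg(AMPLITUDE_CM)
period_s = 1.0 / FREQUENCY_HZ

print(f"straight path: {straight:.2f} deg, curved path: {curved:.2f} deg")
print(f"period of one cycle: {period_s:.2f} s")
```

Under these assumptions both paths give an angular displacement within a few hundredths of a degree of 6.5°, so the pursuit eye movements required are essentially the same in the control and experimental conditions.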
Results
The results are shown in Figure 4(a). The percentage of the time during each 45 s trial that observers saw the corrugation above the center-line as a peak rather than a trough was calculated, and the percentage of the time that responses were consistent with the pursuit eye movement information was averaged over the observers and repetitions. In the control condition, the perceived depth of the corrugations should be completely unambiguous (left-hand bar). Observers saw the corrugation above the center-line as a peak 100% of the time when the texture above the center-line moved in the same direction as the display movement and 100% of the time as a trough when the texture above the center-line moved in the opposite direction to the display movement. This is not surprising, since the experimental situation was essentially identical to that used by Rogers and Graham (1979).
Figure 4. (a) The percentage of time that responses were consistent with the pursuit eye movement information for both the control and three experimental conditions. In normal object-produced parallax (left-most bar), depth was completely unambiguous. In this case, both the vertical perspective and pursuit eye movements signaled the direction of the 3-D corrugations. In the absence of vertical perspective (second bar), observer responses were consistent with the pursuit eye movement information for <50% of the time. In the third bar (consistent information), observer responses were consistent with both the pursuit eye movements and the vertical perspective information for ∼90% of the time. For the final bar (contradictory information), the percentage of time that responses were consistent with the pursuit eye movements fell to less than 20%. In other words, the perceived depth was consistent with the vertical perspective rather than the eye movement information. (b) The percentage of time that responses were consistent with the pursuit eye movement information is plotted as a function of the presence or absence of vertical perspective information using the same conditions (stimulus size and angle of simulated rotation) as George et al. (2013). In the absence of vertical perspective (first bar), responses were not significantly different from chance. However, when vertical perspective was added (second and third bars), observers’ responses were consistent with the vertical perspective information even when the pursuit eye movements signaled the opposite depth order in the contradictory condition. Error bars show ±1 standard error of the mean.
The results of the first experimental condition (Figure 4(a), second bar) in which there was no vertical perspective information provide the most powerful test of the pursuit eye movement hypothesis. The perceived depth should be unambiguous according to the pursuit eye movement hypothesis because the same eye movements would be initiated as in the “normal parallax” (i.e., control) condition. Trials in which the shearing motion of the texture above the center line was in the same direction as the display movement should be seen as “in front” and those in which the shearing motion of the texture above the center line was in the opposite direction to the display movement should be seen as “behind.”
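One way the "percentage of time" measure could be computed from a key-press record is sketched below. The log format (a time-ordered list of press times and responses) and the function name are hypothetical, and the sketch assumes that each response holds until the next key press or the end of the 45 s trial.

```python
TRIAL_DURATION_S = 45.0  # trial length stated in the text

def percent_time_consistent(presses, predicted, trial_duration_s=TRIAL_DURATION_S):
    """Percentage of reported time on which the response matched `predicted`.

    presses: time-ordered list of (time_s, response) pairs, where response is
             "peak" or "trough" (a hypothetical log format).
    predicted: the response predicted by the pursuit eye movement information
               for this coupling of display movement and shearing direction.
    """
    consistent_s = 0.0
    total_s = 0.0
    for i, (time_s, response) in enumerate(presses):
        # Each response lasts until the next key press or the end of the trial.
        next_time_s = presses[i + 1][0] if i + 1 < len(presses) else trial_duration_s
        duration_s = next_time_s - time_s
        total_s += duration_s
        if response == predicted:
            consistent_s += duration_s
    if total_s <= 0.0:
        raise ValueError("no responses recorded in this trial")
    return 100.0 * consistent_s / total_s

# Example: the percept reverses twice during one trial.
example_presses = [(0.0, "peak"), (15.0, "trough"), (30.0, "peak")]
example_percent = percent_time_consistent(example_presses, predicted="peak")
print(f"{example_percent:.1f}% of the time consistent with pursuit")
```

Averaging this value over repetitions, couplings, and observers would give one bar of the kind plotted in the figure; a value near 50% indicates ambiguity, and a value below 50% indicates a bias against the pursuit prediction.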
The second bar in Figure 4(a) reveals that this is not the case. For both pairings of the shearing motion to the display movement, the perceived depth was essentially ambiguous. In fact, there was a small bias in the opposite direction to that predicted by the pursuit eye movement hypothesis. Although incompatible with the predictions of the pursuit eye movement hypothesis, this pattern of results is entirely consistent with the absence of vertical perspective and thus its role in disambiguating motion parallax transformations. Further evidence to support this interpretation of the results comes from the two additional conditions. When vertical perspective information was added to the same stimulus in the consistent condition, observer responses were biased in the direction of the vertical perspective information (third bar). In this case, the vertical perspective information uniquely specified the direction of rotation with respect to the observer as “clockwise” when the display translated to the left (thereby simulating a straight line path as in normal motion parallax).
However, when the sign of the vertical perspective information was reversed so that it signaled a rotation of the surface as “counter-clockwise” with respect to the observer when the display translated to the left (simulating a very concave path), the vertical perspective provided contradictory information for disambiguating the depth order. In this case, observer responses were biased in the opposite direction (final bar); that is, consistent with the vertical perspective information and therefore inconsistent with the information provided by the pursuit eye movement signals.
Although it is not surprising that observers’ responses showed a lack of ambiguity in the consistent condition, it is important to note that in the contradictory condition, observers’ responses were overwhelmingly consistent with the vertical perspective information and in the opposite direction to that predicted by the pursuit eye movement hypothesis. There does not appear to be a significant preference on the part of the visual system for straight line over concave paths. It should also be noted that not only were the observers’ depth responses consistent with the vertical perspective information but their perceptions of the direction of rotation of the surface were also consistent with that information. Observers reported seeing the corrugated surface rotating with respect to their line-of-sight as it translated to-and-fro in front of them. Moreover, many observers noted that after several side-to-side movements of the screen, they were barely aware of the fact that the display screen was moving even though they were always tracking the display on the screen.
To compare our results with those of George et al. (2013), an additional condition was investigated. To mimic the conditions of their experiment, the random texture pattern was reduced in size to just 8.9° × 8.9°, instead of the 28° × 20° display used in the experiments reported here. As in the previous conditions of Experiment 2, the display screen translated along a circular path in front of the observer so that the screen always faced the observer (Figure 3(b)). Under these conditions, there was no change in the vertical perspective created by the translating surface, thereby providing a powerful test of Nawrot’s pursuit eye movement hypothesis. It could be argued that the failure of George et al. to find an effect of varying the vertical perspective information in their PNV condition was due to the small size of the display that they used (8.9° × 8.9°). By reproducing the characteristics of their display, we can make two critical predictions. First, if pursuit eye movements are a sufficient source of disambiguating information, the perceived depth of the simulated corrugations should be unambiguous because observers were required to track the surface to-and-fro as it translated in front of them. Second, if vertical perspective transformations are an insufficient source of disambiguating information in displays of this small angular extent, we would expect that the addition of those transformations to the display should have no effect on the ambiguity (or lack of ambiguity) in the perceived depth. The additional vertical perspective transformation added to the shearing display was equivalent to that used by George et al. (i.e., simulating a ±20° rotation).
The results are clear. The perceived depth was ambiguous in the absence of vertical perspective changes, and observers reported frequent reversals during the 45 s trials. For 47% of the time, observers’ responses were consistent with the pursuit eye movement predictions and for 53% of the time they were in the opposite direction. Pursuit eye movements, in isolation, do not seem to be an effective source of disambiguating information. However, when the vertical perspective transformations were added to simulate the rotation of the surface in either a clockwise or counterclockwise direction (during translation to the left and vice versa), the reversals were far fewer and for most of the time observers reported that the peaks of the corrugation were in accordance with the direction of rotation specified by the vertical perspective transformation (Figure 4(b)).
Seen together, the results of Experiment 2 provide two clear answers to our questions. First, pursuit eye movements do not appear to be either a necessary or a sufficient condition to disambiguate the depth in object-produced motion parallax surfaces. Second, vertical perspective transformations that specify the direction of rotation of a simulated parallax surface with respect to the line-of-sight have the capacity to disambiguate the depth order of object-produced motion parallax in a way similar to that in which they were shown to disambiguate the depth order for the KDE (Experiment 1). Moreover, they were effective in this respect even when the display subtended just 8.9° × 8.9°.
Experiment 3: The Effectiveness of Pursuit Eye Movements in Disambiguating Depth in Observer-Produced Motion Parallax
Rogers and Graham (1979) reported that the perceived depth was unambiguous in a display that simulated the motion parallax created by the side-to-side translation of the observer in front of a stationary 3-D surface (observer-produced parallax). This was in spite of the fact that the shearing pattern they used to simulate the parallax was identical to that created by a KDE display portraying a 3-D surface that oscillated to-and-fro around a vertical axis under parallel projection (Rogers & Graham, 1979). While they were correct in their observation of ambiguity in the KDE situation and a lack of ambiguity in the parallax case, they failed to realize that they were comparing a KDE transformation under parallel projection with a parallax transformation under perspective projection. Had they looked at the KDE under perspective projection, they would have seen that this would have been similarly unambiguous, as demonstrated in Experiment 1. In the motion transformations reaching the observer’s eye in the observer-produced motion parallax case, the translating surface effectively undergoes a rotation with respect to the observer’s line-of-sight (Figure 5(a)) and, as a consequence, creates a vertical perspective transformation: as the observer moves to the left, the angular subtense of the right-hand edge of the surface becomes smaller than that of the left-hand edge because the former is farther away. Moreover, if the observer moves through a distance of ±6.5 cm at a viewing distance of 57 cm, the effective angle of rotation of the surface with respect to the observer is ±6.5°. Hence, the vertical perspective transformation created by a moving observer should be capable, in principle, of disambiguating the sign of the depth signaled by the shearing transformation.
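The size of this vertical perspective change can be illustrated with a minimal geometric sketch. It assumes a flat 28° × 20° surface at 57 cm that rotates by 6.5° about a vertical axis through its center, with the eye on the central normal at mid-height; the function names and the sign convention (positive rotation moves the right-hand edge away) are illustrative choices.

```python
import math

VIEWING_DISTANCE_CM = 57.0
DISPLAY_WIDTH_DEG = 28.0   # angular width of the display when frontal
DISPLAY_HEIGHT_DEG = 20.0  # angular height of the display when frontal

def half_extent_cm(full_angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Half the physical extent of a frontal surface subtending full_angle_deg."""
    return distance_cm * math.tan(math.radians(full_angle_deg / 2.0))

def edge_angular_height_deg(rotation_deg, side):
    """Angular height of a vertical edge of the rotated surface.

    side: +1 for the right-hand edge, -1 for the left-hand edge.
    A positive rotation moves the right-hand edge away from the observer.
    """
    half_width = half_extent_cm(DISPLAY_WIDTH_DEG)
    half_height = half_extent_cm(DISPLAY_HEIGHT_DEG)
    theta = math.radians(rotation_deg)
    x = side * half_width * math.cos(theta)                         # lateral position
    z = VIEWING_DISTANCE_CM + side * half_width * math.sin(theta)   # depth
    edge_distance = math.hypot(x, z)  # horizontal distance from eye to the edge
    return 2.0 * math.degrees(math.atan2(half_height, edge_distance))

left_deg = edge_angular_height_deg(6.5, side=-1)
right_deg = edge_angular_height_deg(6.5, side=+1)
ratio = left_deg / right_deg
print(f"left edge: {left_deg:.2f} deg, right edge: {right_deg:.2f} deg, ratio: {ratio:.3f}")
```

Under these assumptions the nearer edge subtends roughly 5% more than the farther edge at the extreme of the movement, and the sign of that difference reverses with the direction of rotation, which is what allows it to specify the depth order.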
Experiment 1 demonstrated that the human visual system is capable of using vertical perspective information of precisely this magnitude to disambiguate oscillating KDE surfaces and so it is quite possible that vertical perspective could play a similar role in disambiguating observer-produced motion parallax surfaces.
Figure 5. (a) In a normal, observer-produced parallax situation (Rogers & Graham, 1979), the observer moved his or her head along a straight line path in front of the display screen, and the table-top surface was visible. (b) In the three experimental conditions, the display monitor was mounted on a platform that rotated about a vertical axis through the center of the screen. The observer’s chinrest was also mounted on the platform so that his or her side-to-side head movement would cause the display to remain facing the observer. The table-top surface was not visible.
In contrast, Nawrot et al. have argued that the “crucial” source of information used to disambiguate the depth of parallax surfaces is the pursuit eye movements that are initiated. Moreover, they argue that pursuit eye movements are not only a sufficient source of disambiguating information but that they are also necessary—a stronger claim. In the case of observer-produced motion parallax, it is easy to see how pursuit eye movements could fulfil this role. As the observer moves to the left so that rightward (clockwise from above) pursuit eye movements are initiated, the parts of the surface that are “in front” will move to the right with respect to the display screen and the parts of the surface that are “behind” will move to the left. However, as was pointed out earlier, this linkage between the direction of pursuit eye movement and the direction of the depth only holds if we assume that the surface remains stationary (i.e., does not rotate) in front of the observer while he or she is moving. There is an additional consideration for the pursuit eye movement hypothesis in this situation that needs to be borne in mind. As Nawrot (2003) pointed out, an observer movement to the left will initiate vestibularly driven TVOR signals to the right (clockwise as seen from above). This means that the amplitude of the additional pursuit eye movement signals needed to maintain fixation on the screen is necessarily smaller and therefore potentially less effective as a disambiguating signal than that produced in the object-produced parallax situation investigated in Experiment 2. Thus one might expect that the disambiguating role of the pursuit eye movement signals would be weakened in the observer-produced parallax situation.
Previous Experiments
The problem with previous experiments that investigated observer-produced parallax is that the predictions of Nawrot’s pursuit eye movement hypothesis and those based on vertical perspective information are confounded: side-to-side movements of the observer create both pursuit eye movements and the vertical perspective changes that we have already shown are capable of disambiguating the direction of depth in a KDE situation. The obvious way of distinguishing between the predictions of the two hypotheses is for the display screen to rotate (around a vertical axis through its center) as the observer moves from side-to-side, so that it always faces the observer (Rogers, 2012). This ensures that there are no confounding vertical perspective changes and therefore provides an important test of the pursuit eye movement hypothesis. However, it should be noted that while this experimental manipulation eliminates vertical perspective changes, the observer’s movements from side-to-side still produce both proprioceptive and vestibular signals (as well as pursuit eye movements) and those signals may also be used to disambiguate the motion parallax information (Rogers & Rogers, 1992). As a consequence, Experiment 3 is not as strong a test of Nawrot’s pursuit eye movement hypothesis, since any bias in the results away from complete ambiguity could be attributed to the presence of proprioceptive and vestibular signals rather than to the pursuit eye movements needed to maintain fixation. It is, however, a powerful test of the relative effectiveness of vertical perspective versus non-visual information when these are put into conflict.
Methods
The same LCD monitor and random-textured surface were used to display the observer-produced motion parallax transformation as in Experiment 1. The shearing pattern of relative motion depicting the sinusoidal corrugations was also identical in both corrugation frequency and amplitude. The equivalent disparity of the simulated corrugations was 21 arc min (∼3.3 cm peak-to-trough depth at 57 cm). In contrast to Experiments 1 and 2, the shearing pattern displayed on the stationary screen was driven by the observer’s side-to-side head movements through ±6.5 cm. These were typically at a rate of between 0.33 and 0.5 Hz (a complete to-and-fro movement taking 2–3 s). In the control condition, the monitor remained fixed in position as the observer moved his or her head from side-to-side (Figure 5(a)). In the three experimental conditions, the display screen rotated so that it always faced the observer. This was achieved by mounting the monitor on a platform that rotated around a point below the center of the screen (Figure 5(b)). The observer’s chinrest was fixed to the other end of a platform that was mounted on free-running castors so that the observer moved the platform and the display monitor as he or she moved from side-to-side. The experiment was carried out in complete darkness, apart from the display screen, to ensure that there was no additional visual information about the observer’s side-to-side head movements. The display was viewed monocularly.
Three different experimental conditions were tested. In the first, the motion transformation on the screen was a simple horizontal shear. Since there was no additional vertical perspective information (in contrast to the control condition and previous observer-produced parallax experiments), there was no visual information to disambiguate the depth of the depicted corrugations. As a consequence, if vertical perspective plays a role in disambiguating the depth in observer-produced motion parallax, the perceived depth should be ambiguous and observers should be equally likely to see the corrugation above the center line as a peak or as a trough. On the other hand, if pursuit eye movements play a disambiguating role, the corrugations should be seen unambiguously, as in the control condition.
However, since there is also proprioceptive and vestibular information available to signal the observer’s movement with respect to the simulated 3-D surface, it is possible that either or both of these signals might be sufficient to disambiguate the depth of the corrugations rather than the pursuit eye movements.
To study the role of vertical perspective transformations, two additional experimental conditions were investigated in Experiment 3. In both cases, an additional vertical perspective transformation was added to the shearing pattern of motion on the display screen. In the first case, the vertical perspective simulated the rotation of the corrugations in the opposite direction to the observer’s movements with respect to the screen (as in the control condition) and in the second, the vertical perspective simulated the rotation of the corrugations in the same direction as the observer’s movements. In the first case (the “consistent condition”), observer movement to the left created proprioceptive, vestibular, and pursuit eye movement signals indicating that shearing motion to the right on the screen should be interpreted as “in front.” In the “consistent condition,” all sources of potential disambiguating information (proprioceptive, vestibular, pursuit eye movements, and vertical perspective) signal the depth as “in front.”
In the second case—“contradictory information”—the vertical perspective information signaled a clockwise rotation of the surface (in the same direction as the observer’s movement) and thus the shearing motion to the right on the screen should be interpreted as “behind.” If the depth in the “contradictory information” condition is seen in the direction specified by the vertical perspective transformations, this would suggest that even if any or all of these non-visual sources is a sufficient source of disambiguation in isolation, they are over-ruled when put in conflict with the visual information from vertical perspective.
Viewing was monocular. As in Experiments 1 and 2, observers were asked to press the appropriate key to indicate whether the corrugation immediately above the center-line of the display was a peak or a trough. For each of the seven observers (including the author), the four different conditions were each repeated four times in a randomized order; twice with one coupling of the head movement direction to the shearing pattern and twice with the opposite coupling. In the graph that follows, the percentage of time corresponds to the average time that observers’ “peak” or “trough” responses were consistent with the pursuit eye movement information provided in the two different couplings of head movement and shearing direction.
Results
In the normal observer-produced parallax condition, the depth of the corrugations was always unambiguous, as reported by Rogers and Graham (1979; Figure 6, left-most bar). No observer ever saw reversals in the depth of the corrugations. In the first experimental condition, where the screen always faced the observer and hence there was no vertical perspective information to signal a rotation of the surface with respect to the observer’s line-of-sight, the perceived depth was significantly above chance (Figure 6, second bar). For around 60% of the time, the responses were in a direction consistent with the observer’s pursuit eye movements (as well as with the use of proprioceptive and vestibular information) but for 40% of the time they were inconsistent. The degree of ambiguity in this condition would not be expected if pursuit eye movements (or vestibular or proprioceptive cues) were a strong source of information for resolving the depth in observer-produced motion parallax.
Figure 6. The percentage of time that responses were consistent with the pursuit eye movement information, plotted as a function of the presence or absence of vertical perspective information in the four conditions. In normal, observer-produced parallax (control condition), depth is completely unambiguous (left-most bar). In the absence of vertical perspective, the depth was largely ambiguous (second bar), despite the presence of pursuit eye movement information. In contrast, the presence of vertical perspective corresponding to ±6.5° rotation rendered the same parallax transformation largely unambiguous in both the consistent and contradictory information conditions (third and final bars). The percentage of time that responses were consistent with the pursuit eye movement information fell to around 15% in the presence of the conflicting vertical perspective information (the contradictory condition, final bar). In other words, the conflicting vertical perspective information wins out. Error bars show ±1 standard error of the mean.
The results in the “consistent condition” (third bar), in which there was additional vertical perspective information indicating a ±6.5° rotation of the surface in the opposite direction to the observer’s movements (i.e., a surface that remains stationary with respect to the world), are not surprising. In this case, close to 90% of the time observers’ responses were consistent with all four sources of disambiguating information—vertical perspective as well as proprioceptive, vestibular, and pursuit eye movements.
The results of the fourth condition (“contradictory information”) are the most important for assessing the relative strength of different sources of disambiguating information. In this case, the non-visual information provided by proprioceptive, vestibular, and pursuit eye movements all signaled that the surface had moved to the right during a head movement to the left (i.e., a stationary surface that effectively rotates in a counterclockwise direction with respect to the observer’s line-of-sight). In contrast, the vertical perspective information specified the rotation of a surface in a clockwise direction with respect to the observer’s line-of-sight during a head movement to the left. That is, the surface should appear to rotate in the same direction as the observer’s movement but at a greater speed and this, indeed, is what observers reported seeing. Note that this is not dissimilar to the apparent rotation of a “Reverspective” structure where the perspective information is inconsistent with the physical structure (Rogers & Gyani, 2010).
In the “no vertical perspective” condition (second bar), there was a small bias toward seeing the depth structure in a direction that is consistent with the non-visual, proprioceptive, vestibular, and pursuit eye movement signals. However, that small bias was reversed in the “contradictory information” condition where, for most of the time, responses were consistent with the vertical perspective information in the opposite direction (fourth bar). While this result does not prove that non-visual sources of information, including pursuit eye movements, are insufficient to disambiguate the depth structure on their own, it suggests that vertical perspective information wins out when the visual and non-visual sources are put in conflict.
As in Experiment 2, it should be noted that not only were the observers’ depth responses consistent with the vertical perspective information but their perceptions of the direction of rotation of the surface were also consistent with that information. Observers reported seeing the corrugated surface as rotating with respect to their line-of-sight as they moved from side-to-side.
General Discussion
The purpose of this article was to empirically examine the hypothesis that pursuit eye movements provide both a necessary and sufficient source of information to disambiguate the depth seen from motion parallax. The theoretical considerations outlined in the accompanying paper show that pursuit eye movements could not, in principle, act as a source of disambiguating information. Moreover, previous experimental work has shown that both (a) the vertical perspective transformations and (b) the ground plane optic flow produced when making side-to-side head movements (observer-produced parallax) or when a 3-D surface translates across our line-of-sight (object-produced parallax) are capable of disambiguating parallax-specified depth (Rogers & Rogers, 1992). Thus the claim that pursuit eye movements are necessary can be rejected.
A Comparison of Observer- and Object-Produced Motion Parallax
According to the pursuit eye movement hypothesis, the disambiguating effect of pursuit eye movements should be greater (i.e., there should be less ambiguity) in the object-produced parallax situation because the eye movements are not reduced in magnitude by the presence of TVOR movements, as they are in the observer-produced parallax situation (Nawrot, 2003). This prediction is not borne out by the experimental findings. In the absence of vertical perspective, the perceived depth was completely ambiguous in the object-produced parallax conditions of Experiment 2 (Figure 4, second bar) when the pursuit eye movements were at full strength. In contrast, the perceived depth was not completely ambiguous in the observer-produced parallax conditions of Experiment 3 (Figure 6, second bar) where the pursuit eye movements were smaller (because of the TVOR signals). In addition, whereas the addition of vertical perspective cues largely dominated and over-ruled other sources of information about depth order in the object-produced parallax situation (with the undiminished pursuit eye movements), the same vertical perspective cues were slightly less effective in determining depth order in the observer-produced parallax situation (with smaller amplitude pursuit eye movements). This suggests that the additional extraretinal information provided by the vestibular and proprioceptive systems in the observer-produced parallax situation may be effective in its own right.
The Coupling of Perceived Depth Order and the Perceived Direction of Rotation
An interesting feature that emerged from these experiments is the close relationship that exists between the direction of the perceived depth from motion parallax transformations and the apparent rotation of the parallax-defined surface with respect to the line-of-sight. This is, of course, an established characteristic in the case of the KDE: when the depth order of the rotating KDE object reverses, the direction of apparent rotation also reverses, so the two are coupled together. What the present experiments show is that there is an analogous coupling effect in the motion parallax situation. In the “no vertical perspective” condition of Experiment 2, for example, the display screen moved (and was seen to move) along a circular arc that always faced the observer. Under these conditions, the parallax-defined corrugations were seen to rotate simultaneously with respect to the (unchanging) frontal view of the display screen. For example, whenever the observer reported that the corrugation above the center-line was a “peak” and the shearing motion was in the same direction as the display movement, the apparent rotation of the surface was in the opposite direction with respect to the display screen (and the line-of-sight). Whenever the observer reported that the corrugation above the center-line was a “trough” and the shearing motion was in the same direction as the display movement, the apparent rotation of the surface was in the same direction with respect to the display screen. In other words, the perceived depth of the corrugations in the absence of vertical perspective was always linked to the perceived direction of rotation with respect to the line-of-sight. It is an interesting question whether (a) the perceived depth order determines the apparent direction of rotation, (b) the direction of apparent rotation determines the perceived depth order, or (c) the two are determined simultaneously.
However, when an additional vertical perspective transformation was added to the pattern of shearing motion on the screen in Experiment 2, it is clear that the direction of rotation specified by the vertical perspective determines the perceived depth order. If the vertical perspective specified that a surface was rotating in a clockwise direction during display movement to the left, observers saw a surface that rotates in a clockwise direction (with respect to the line-of-sight) and the perceived depth order of the 3-D corrugations was determined by the specified direction of rotation.
A similar coupling between perceived depth order and apparent direction of rotation was noted previously when disparity information was put into conflict with motion parallax (Rogers & Collett, 1989). In those experiments, object-produced motion parallax was created on a binocular, random dot display that translated from side-to-side along a frontal path in front of the observer. When only motion parallax cues were present, the perceived direction of depth in the simulated corrugations was found to be unambiguous, such that the parts of the display that moved in the same direction as the display movement were seen as in front and those that moved in the opposite direction to the display movement were seen as behind. However, when binocular disparities were introduced so that the previously specified corrugation peaks were depicted with uncrossed disparities and the previously specified troughs were depicted with crossed disparities, the perceived depth of the corrugations was reversed. The motion parallax was not, however, ignored or suppressed. Instead, the perceptually “reversed” depth corrugations were seen to rotate as the surface translated to-and-fro across the observer’s line-of-sight. In this case, what we perceive corresponds to the only real-world situation that would generate the particular combination of motion parallax and disparity information.
A third example of the coupling between depth order and direction of rotation can be seen in Patrick Hughes’ “Reverspective” artworks (Papathomas, 2002; Rogers & Gyani, 2010). Under monocular viewing, the perspective information drawn on the faces of the truncated pyramids dominates what we see and the pyramids appear as structures receding away from the observer. When the observer moves from side-to-side, thereby generating motion parallax between different parts of the display, the perceived depth order determines the direction of rotation of the entire 3-D structure so that it appears to rotate and follow the observer’s movements.
The Sufficiency of Pursuit Eye Movements
Having addressed the question of the necessity of pursuit eye movements and their relative strength compared with other sources of disambiguating information, the final question we posed concerned the sufficiency of pursuit eye movements as a source of information for disambiguating motion parallax transformations. In Experiment 2, where the simulated 3-D surface moved along a circular path centered at the observer’s eye (in order to eliminate vertical perspective information), the depth of the perceived corrugations was not unambiguous, contrary to what the pursuit eye movement hypothesis predicts. Instead, observers reported that the simulated corrugations were perceptually unstable, with the depth reversing frequently over the 45 s observation period. Given that the observers were tracking the translating display, and thus generating pursuit eye movements, this result suggests that pursuit eye movements are, at best, only a very weak source of disambiguating information.
The Effects of Display Size
The arguments and findings described in this article clearly show that vertical perspective information is not only a theoretically possible source of information for disambiguating the depth of both KDE and motion parallax surfaces but is also effective in practice. In Experiment 1, it was pointed out that vertical perspective changes are always present when a 3-D surface rotates with respect to the observer, whether in a KDE or motion parallax situation. There is no such thing as an absence of vertical perspective in either case. However, the magnitude of those transformations depends on the angular size of the object or display as well as the amount of rotation. For displays that only subtend a few degrees of visual angle or only rotate through a small angle, the transformations may be too small to be detected by the human visual system. Hence, any conclusions we might wish to draw need to bear this point in mind. In Nawrot’s original experiments, the display size was only 6.6° × 6.6° and the vertical perspective changes may have been too small for the observer to detect. The results reported here clearly show that for displays 28° × 20° or larger, vertical perspective transformations are effective and this was true for simulated rotations of just ±1.625° (Experiment 1). The results of the additional conditions in Experiments 1 and 2, where the display size was reduced to 8.9° × 8.9° and the simulated rotation angle was ±20° (as in the George et al. experiments), clearly show that vertical perspective transformations of this magnitude and in displays of this size can also be effective in disambiguating the direction of rotation in parallax displays. Our ongoing studies are investigating the precise lower size and rotation limits for the effective use of vertical perspective information.
Having said this, it is important to bear in mind that in the case of observer-produced motion parallax (Experiment 3), the vertical perspective transformations created by observer movement are usually a “whole-field” characteristic: all objects and surfaces in the scene undergo a similar change in perspective viewing since they are produced by the movement of the observer. The same is true of the binocular, vertical disparity (differential perspective) transformations between the eyes. Hence, the visual system could, in principle, integrate those changes over the whole visual field (Howard & Rogers, 1995). In contrast, the vertical perspective changes that occur when an object rotates in front of us (KDE) or translates across our line-of-sight (motion parallax) represent local changes. That this is the case is clearly revealed in the “straight-line” and “concave” conditions of Experiment 2 where the local corrugated surface is seen to rotate with respect to the surrounding display screen. The consequences that arise from this observation require further investigation.
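The dependence of vertical perspective magnitude on display size and rotation angle, discussed in this section, can be illustrated with a minimal geometric sketch. It is not the stimulus software used in the experiments. It assumes a flat rectangular display that rotates about a vertical axis through its center, a pinhole eye on the normal through that center, a viewing distance normalized to 1, and display sizes given as visual angles subtended when the display is frontal. The function names are hypothetical. The quantity computed is the difference in elevation between the top-left and top-right corners, which is zero for a frontal display and grows as the display is made larger or rotated further.

```python
import math


def corner_elevation_deg(width_deg, height_deg, rotation_deg, side):
    """Elevation (deg) of a top corner of a flat display rotated about a
    vertical axis through its center. side = -1 (left) or +1 (right).
    Assumption: viewing distance is normalized to 1."""
    distance = 1.0
    half_w = distance * math.tan(math.radians(width_deg / 2.0))
    half_h = distance * math.tan(math.radians(height_deg / 2.0))
    theta = math.radians(rotation_deg)
    # Position of the corner after rotation: lateral offset x, depth z.
    x = side * half_w * math.cos(theta)
    z = distance + side * half_w * math.sin(theta)
    # Elevation is the angle of the corner above the horizontal plane
    # through the eye.
    return math.degrees(math.atan2(half_h, math.hypot(x, z)))


def vertical_perspective_deg(width_deg, height_deg, rotation_deg):
    """Left-minus-right difference in corner elevation (deg)."""
    left = corner_elevation_deg(width_deg, height_deg, rotation_deg, -1)
    right = corner_elevation_deg(width_deg, height_deg, rotation_deg, +1)
    return left - right


if __name__ == "__main__":
    # Display sizes and rotation angles mentioned in the text.
    for w, h, rot in [(28.0, 20.0, 1.625), (8.9, 8.9, 20.0), (6.6, 6.6, 1.625)]:
        diff_arcmin = 60.0 * vertical_perspective_deg(w, h, rot)
        print(f"{w} x {h} deg, rotation {rot} deg: {diff_arcmin:.2f} arcmin")
```

Running the script prints the corner elevation difference, in minutes of arc, for the display sizes and rotation angles mentioned above. Under these assumptions it allows the transformation available in a small display to be compared with that in a large one.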
Discrepancies Between the Findings Reported Here and Those of Nawrot’s Experiments
The results of the present experiments on the effectiveness of vertical perspective and pursuit eye movement information for disambiguating motion parallax transformations are clearly different from those reported by Nawrot and his colleagues, and it is therefore important to identify the causes of these differences. First, display size is clearly important. In the Nawrot et al. experiments, the displays were typically small (6.5° × 6.5° or 8.9° × 8.9°) compared with the 28° × 20° displays used in our studies. As with the vertical disparity (differential perspective) transformations between the two stereoscopic images used by Rogers and Bradshaw (1993), there must also be a lower size limit for the dynamic vertical perspective transformations to be effective. The additional condition in Experiment 1 shows that vertical perspective transformations can be effective with displays as small as 8.9° × 8.9° that have a simulated rotation angle of ±20°. In addition, the data from Braunstein’s (1977) experiments show that the vertical perspective transformations that accompany the rotation of simulated spheres were sufficient to disambiguate the direction of rotation in his small (9.6°) displays and that there was no significant difference in the proportion of “correct” responses in the 9.6° and 39.7° displays that he used. This suggests that the simulation of the vertical perspective transformations used in Nawrot’s experiments may have been inadequate.
Second, the experiments of George et al. simulated only a brief, 1.6 s presentation of a unidirectional displacing stimulus which contrasts with the continuous side-to-side head (or display monitor) movements used in our own experiments. As a consequence, their stimulus situation might not have been sufficient for the visual system to exploit the available information. Third, it is possible that because the display screen remained stationary and the random dot field translated across the screen in the George et al. experiment, the frame of the screen could have provided vertical perspective information of a frontal surface during pursuit eye movements that contradicted the superimposed vertical perspective transformation of the translating dot field.
Conclusions
The results of the three experiments described in this article show that vertical perspective transformations can be a sufficient source of information for disambiguating both KDE and motion parallax surfaces. However, this does not imply that they are necessary and, in fact, several experiments indicate that vertical perspective information can be over-ruled by other information. Two situations make this clear. As described earlier, Rogers and Collett (1989) reported that when motion parallax and binocular disparities were put into conflict, the disparity information “won out” and the perceived corrugations were consistent with the disparity information. In this case, the display monitor and the random dot pattern translated in front of the observer and therefore generated vertical perspective cues signaling a particular direction of rotation of the surface with respect to the line-of-sight. Observers, however, reported the opposite direction of rotation. A similar result is found in the case of “Reverspectives.” Even though the observer moves from side-to-side and thereby creates a vertical perspective transformation to signal the counterrotation of the Reverspective with respect to the line-of-sight (stationary with respect to the world), observers see depth in accordance with the linear perspective cues on the faces of the pyramids and perceive an apparent rotation of the whole structure in the same direction as their own movements. Vertical perspective transformations can be ignored or over-ruled.
The most convincing piece of evidence to demonstrate the insufficiency of pursuit eye movements as a source of disambiguating information comes from Condition 2 in Experiment 2 where the transforming shearing pattern was seen on a screen that always faced the observer during its side-to-side movements. Pursuit eye movements were generated while the observer tracked the side-to-side movement of the display screen but, in the absence of detectable vertical perspective changes, the perceived depth of the corrugations was ambiguous. It is hard to see how this ambiguity could be due to the presence of some other source of information that conflicts with that provided by the pursuit eye movement signals.
In conclusion, the results reported in this article provide strong evidence that pursuit eye movements are neither a necessary nor a very effective sufficient source of information for disambiguating motion parallax transformations. In addition, the theoretical considerations described in the accompanying paper reveal that the only source of information that could reliably specify the depth order of motion parallax-defined surfaces is that provided by the vertical perspective transformations of the retinal image. Our empirical results provide further evidence for the effectiveness of these vertical perspective cues in the particular situations we investigated. Finally, our theoretical analysis reveals the similarity between depth from motion parallax and depth from the KDE. Instead of seeing them as two different sources of structure-from-motion information—one due to observer or object translation and the other to object rotation (Rogers, 1993)—it may be more helpful to see both as providing information about the dynamic changes in the relative orientation of surfaces and objects—that is, their rotation—with respect to the observer’s line-of-sight.
Acknowledgments
Preliminary accounts of these findings were presented at the 2010 ECVP meeting in Lausanne, Switzerland (Rogers, 2010) and at the 2012 VSS meeting in Naples, Florida (Rogers, 2012). The author is very grateful to Tim Ledgeway for his help in developing the image presentation software.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
