Abstract
How do observers recognize objects after spatial transformations? Recent neurocomputational models have proposed that object recognition is based on coordinate transformations that align memory and stimulus representations. If the recognition of a misoriented object is achieved by adjusting a coordinate system (or reference frame), then recognition should be facilitated when the object is preceded by a different object in the same orientation. In the two experiments reported here, two objects were presented in brief masked displays that were in close temporal contiguity; the objects were in either congruent or incongruent picture-plane orientations. Results showed that naming accuracy was higher for congruent than for incongruent orientations. The congruency effect was independent of superordinate category membership (Experiment 1) and was found for objects with different main axes of elongation (Experiment 2). The results indicate congruency effects for common familiar objects even when they have dissimilar shapes. These findings are compatible with models in which object recognition is achieved by an adjustment of a perceptual coordinate system.
How do observers recognize objects after they are rotated, shifted, or changed in size? This task is not trivial, because the retinal image changes greatly with spatial transformations. A central and robust finding is that recognition performance is not invariant across spatial transformations, but depends systematically on the amount of transformation (for reviews, see Graf, 2002; Jolicoeur & Humphrey, 1998; Lawson, 1999; Tarr, 2003; Tarr & Bülthoff, 1998). This systematic dependency is difficult to reconcile with models suggesting that objects are represented in an abstract propositional code (e.g., Biederman, 1987; Hummel & Biederman, 1992; Hummel & Stankiewicz, 1998). Several researchers have instead proposed models with image-based representations that predict a systematic dependency on image transformations (for review, see Ullman, 1996). Image-based models rely either on compensation (normalization) processes like mental rotations (e.g., Jolicoeur, 1985, 1990) or on view-based representations that do not require compensation processes (Bülthoff & Edelman, 1992; Edelman, 1998; Perrett, Oram, & Ashbridge, 1998; Riesenhuber & Poggio, 2002). In this article, we present evidence for an alternative account—that object recognition is achieved by performing coordinate transformations that compensate for spatial transformations.
Research in neuroscience provides an important perspective on the issue. Consider the distinction between the two cortical pathways, with the ventral stream involved in object perception and recognition and the dorsal stream involved in visuomotor control (e.g., Milner & Goodale, 1995). Milner and Goodale argued that object recognition and visuomotor control rely on fundamentally different processes. Visuomotor control requires coordinate transformations, because sensory and motor information rely on different coordinate systems. Milner and Goodale assumed that object recognition, in contrast, is achieved by detecting enduring characteristics that are more or less invariant across spatial transformations. However, neurocomputational approaches suggest that visuomotor control and object perception are based on similar processes. Both visuomotor control and object recognition involve coordinate transformations, implemented by neuronal gain modulation (e.g., Deneve & Pouget, 2003; Salinas & Abbott, 1995, 1997; Salinas & Sejnowski, 2001; Zipser & Anderson, 1988).
Considering these neurocomputational approaches, we hypothesized that compensation processes in object recognition may be viewed as coordinate transformations that adjust a perceptual coordinate system (reference frame) defining the correspondence between positions specified in memory and positions in the current visual field (Graf, 2002, 2004; see also Larsen & Bundesen, 1978). Thus, coordinate transformations compensate for spatial transformations by aligning input and memory representations. This model leads to an interesting prediction: If recognition involves the adjustment of a perceptual coordinate system, this coordinate system may be active for some time and facilitate the recognition of subsequently presented objects in the same orientation (an orientation congruency effect)—even when they differ in shape.
We investigated three research questions. First, can an orientation congruency effect be found for common familiar objects? Previous research showed congruency effects for alphanumeric stimuli (Jolicoeur, 1990, 1992) and for novel objects (Gauthier & Tarr, 1997; Tarr & Gauthier, 1998), but not for familiar objects. The demonstration of congruency effects with familiar objects would be a good indicator that a frame adjustment plays an important role in default object recognition.
Second, does the orientation congruency effect rely on a relatively abstract reference frame? Previous studies were contradictory on this point. Two studies showed that the effect was limited to similar objects, suggesting that it is mediated by class-based processes (Gauthier & Tarr, 1997; Tarr & Gauthier, 1998). Other studies demonstrated congruency effects for different alphanumeric stimuli (Jolicoeur, 1990, 1992; Larsen & Bundesen, 1978), which may be regarded as dissimilar objects. Moreover, congruency effects for highly dissimilar stimuli were found in a study that showed facilitated symmetry detection for dot patterns when the orientation of the symmetry axis of the patterns corresponded to the orientation of a previously presented alphanumeric stimulus (Pashler, 1990).
Third, are congruency effects limited to objects with the same main axis of elongation? For example, is there still a congruency effect when an object with a vertical axis of elongation is followed by an object with a horizontal axis (e.g., a saltshaker followed by a car)? Recent studies showed that the main axis of elongation has only little or no influence on naming performance (Large, McMullen, & Hamm, 2003; Lawson, 2004). However, this does not necessarily mean that congruency effects are independent of the axis of elongation. The demonstration of a congruency effect for objects with different main axes of elongation would provide evidence that the congruency effect cannot be reduced to information about the objects' main axes.
EXPERIMENT 1
In Experiment 1, we investigated whether orientation congruency effects can be found for familiar objects, and whether they are limited to similar objects, or to class-based processes. We selected familiar objects from different basic-level categories, in order to get a set of objects with rather low visual similarity. To exclude the possibility that congruency effects observed would be due to class-based processes, we selected the objects from two different superordinate-level categories, which typically have low visual similarity (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). Half of the objects were taken from human-fashioned (artifact) categories, whereas the other half were biological objects (see Table 1). A number of studies have shown substantial differences between biological and artifact categories (for a review, see Humphreys & Forde, 2001). Congruency effects between biological objects and artifacts would strongly indicate that a rather abstract frame of reference is adjusted in object recognition.
Table 1. Objects Employed in the Test Phase of Experiment 1
The lack of congruency effects for dissimilar objects in studies by Gauthier and Tarr (1997; Tarr & Gauthier, 1998) might have been a consequence of long interstimulus intervals (ISIs), which allowed for interference by strategic processes. In order to make strategic effects less likely and tap the fast default system of object recognition, we briefly presented two objects sequentially, followed by a mask. The use of brief presentation times and the absence of an ISI made it unlikely that any effects observed would be due to differential patterns of eye movements or to strategic effects.
We investigated whether naming performance was higher for objects in congruent orientations than for objects in incongruent orientations. Our interpretation of the results is based on the assumption that under time-limited viewing conditions, perceptual processes that take more time will lead to more perceptual errors. By this assumption, conditions with greater error rates reflect processes that did not run to completion as frequently as in conditions with smaller error rates. Hypothesizing that recognition relies on an adjustment of a relatively abstract coordinate system, we predicted a congruency effect for familiar objects from different basic-level and different superordinate-level categories.
Method
Participants
Ten subjects with normal or corrected-to-normal vision participated in the experiment for payment. Three were male, and 7 were female. The age range was from 22 to 50 years (average = 31 years). The subjects were not familiar with the purpose of the study.
Stimuli
Line drawings of familiar objects (Snodgrass & Vanderwart, 1980) that have a canonical upright orientation were used as stimuli (see Fig. 1). Objects were taken from different basic-level categories, and only one object per basic-level category was selected. We conducted a pilot naming study with 5 subjects in order to identify and select objects that can be recognized approximately equally well. On the basis of this study, 24 objects were chosen for the adjustment phase, and 24 objects were chosen for the test phase (12 artifact objects and 12 biological objects; see Table 1). Most objects that were used in the test phase had a predominantly horizontal main axis. Left-facing objects were flipped so that all objects were oriented to the right. Stimuli were presented in either 50° or 140° orientation (clockwise rotation) in the picture plane. We created four masks by taking fragments from images that were not employed in the experiment, rotating these fragments to random orientations, and pasting them to cover the entire area in which stimuli might appear.

Fig. 1. Examples of the stimuli. In both experiments, stimuli were presented in congruent or in incongruent orientations. In Experiment 1, the two objects in a trial were from the same or from different superordinate-level categories. In Experiment 2, the two objects in a trial had the same or different main axes of elongation.
The size of the objects did not exceed 6.5 × 6.5 cm, which corresponded to 3.8° × 3.8° of visual angle (distance to the monitor was about 98 cm). All objects were presented in the center of a 21-in. color monitor (Sony Multiscan G 500) with a resolution of 1024 × 768 pixels and a 144-Hz refresh rate. Objects were presented within a white rectangular window (15.1 × 19.2 cm, 8.8° × 11.2°) on a black screen.
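The reported visual angles can be checked from the stimulus sizes and the viewing distance. The following is a minimal sketch using the standard formula θ = 2·arctan(s / 2d); the function name and the use of exactly 98 cm (the text says "about 98 cm") are assumptions made for illustration.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

VIEWING_DISTANCE_CM = 98.0  # assumption: the text reports "about 98 cm"

# Extents reported in the text: maximum object size and the two window sides.
for label, size_cm in [("object", 6.5), ("window side", 15.1), ("window side", 19.2)]:
    angle = visual_angle_deg(size_cm, VIEWING_DISTANCE_CM)
    print(f"{label}: {size_cm} cm -> {angle:.1f} deg")
```

Under these assumptions the computed angles agree with the reported values to one decimal place.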
Procedure
In each trial, two targets were presented sequentially at the center of the screen. The subject's task was to name the two objects in the order of presentation. A fixation point was presented until the subject initiated the trial by pressing a key. The fixation point was then replaced by a 347-ms blank screen. Then the first target appeared for 104 ms and was immediately replaced by a second target; the presentation time for the second target was adjusted for each subject during the first phase of the experiment and then held constant for the second (test) phase. The second target was immediately replaced by a pattern mask, which was presented for 999 ms. The fixation point reappeared when the subject's response had been recorded by the experimenter.
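The display durations given above are close to whole numbers of frames on a 144-Hz monitor. The sketch below illustrates this relation; that the durations were specified in frames is our assumption here (the text reports only milliseconds), and the helper name is hypothetical.

```python
REFRESH_HZ = 144                 # refresh rate reported for the monitor
FRAME_MS = 1000 / REFRESH_HZ     # duration of one frame, about 6.94 ms

def nearest_frame_count(duration_ms: float) -> int:
    """Number of whole frames closest to a duration given in milliseconds."""
    return round(duration_ms / FRAME_MS)

# Durations reported in the Procedure section.
reported_ms = {"blank screen": 347, "first target": 104, "mask": 999}

for name, ms in reported_ms.items():
    frames = nearest_frame_count(ms)
    print(f"{name}: {ms} ms is about {frames} frames ({frames * FRAME_MS:.1f} ms)")
```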
In the adjustment phase, the duration of the second target display was adjusted for each subject such that the subject was about 70% accurate in reporting both objects. The objects were presented in the same four orientation conditions (see Design) as in the test phase, but different objects were used for the two phases. The adjustment was done with a staircase algorithm. Presentation times started at 153 ms and were decreased or increased in steps of 7 ms, depending on the subject's response. Subjects needed from 43 to 120 trials in the adjustment phase in order to reach criterion. Once set, the exposure duration of the second target was held constant for that subject throughout the test phase. Presentation times for the second target in the test phase ranged from 69 to 104 ms, with a mean exposure duration of 82 ms (SD = 15). The entire experiment lasted about 75 min.
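A staircase of this kind might be sketched as follows. The starting value (153 ms) and the step size (7 ms) are taken from the text; the update rule is not specified there, so the two-down/one-up rule used below (a common choice that converges near 70.7% correct) is an assumption, as are the class name, the lower bound on duration, and the absence of a stopping criterion.

```python
from dataclasses import dataclass

@dataclass
class Staircase:
    """Two-down/one-up staircase (assumed rule, not stated in the source)."""
    duration_ms: int = 153    # starting presentation time reported in the text
    step_ms: int = 7          # step size reported in the text
    min_ms: int = 7           # hypothetical floor of one step
    correct_streak: int = 0   # consecutive correct trials since the last change

    def update(self, correct: bool) -> int:
        """Record one trial outcome and return the duration for the next trial."""
        if correct:
            self.correct_streak += 1
            if self.correct_streak == 2:
                # Two correct reports in a row: shorten the display (harder).
                self.duration_ms = max(self.min_ms, self.duration_ms - self.step_ms)
                self.correct_streak = 0
        else:
            # Any error: lengthen the display (easier) and reset the streak.
            self.duration_ms += self.step_ms
            self.correct_streak = 0
        return self.duration_ms

staircase = Staircase()
for outcome in [True, True, False, True, False]:
    print(outcome, staircase.update(outcome))
```

Here a trial counts as correct only when both objects are reported correctly, matching the accuracy criterion used for the adjustment.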
Design
There were four main types of trials, defined by the orientations of the first and second objects. Each object was rotated clockwise either +50° or +140°, so there were four possible combinations of orientations, two that were congruent (50°-50°, 140°-140°) and two that were incongruent (50°-140°, 140°-50°). All four combinations were used equally often. Twenty-four objects (half artifacts and half biological objects) were used in the test phase. Six objects could appear as the first target (three biological and three artifact objects), and 18 objects could appear as the second target (nine biological objects and nine artifact objects). Because different objects were used as first and second targets, the same object never appeared twice in one trial. All possible combinations of objects and orientation conditions were presented. The resulting 432 trials were divided into six blocks of 72 trials such that the four orientation conditions and the six first targets were equally distributed over blocks. The order of trials was determined randomly for every subject, with the following constraints: The same object could not appear as the second target in two successive trials, and two successive trials could not show the same combination of orientations. There was a self-timed break after every block.
One of the four masks was selected randomly for every trial. Accuracy of report was the dependent measure; trials were counted as correct when both objects were named correctly in the order of presentation.
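The trial counts in this design follow from fully crossing first targets, second targets, and orientation combinations. The sketch below enumerates the trials and applies the scoring rule; the object names are placeholders, the field and function names are hypothetical, and the ordering constraints on successive trials are not implemented.

```python
from itertools import product

ORIENTATIONS = (50, 140)  # clockwise picture-plane rotations, in degrees

# Placeholder identifiers; the actual objects are listed in Table 1.
first_targets = [f"first_{i}" for i in range(6)]
second_targets = [f"second_{i}" for i in range(18)]

# Fully cross first target, second target, and the two orientations.
trials = [
    {
        "first": first,
        "second": second,
        "ori_first": ori_first,
        "ori_second": ori_second,
        "congruent": ori_first == ori_second,
    }
    for first, second, ori_first, ori_second in product(
        first_targets, second_targets, ORIENTATIONS, ORIENTATIONS
    )
]

def is_correct(trial: dict, reported_first: str, reported_second: str) -> bool:
    """A trial is correct only if both objects are named in presentation order."""
    return reported_first == trial["first"] and reported_second == trial["second"]

BLOCK_SIZE = 72
n_blocks = len(trials) // BLOCK_SIZE
n_congruent = sum(trial["congruent"] for trial in trials)
print(len(trials), n_blocks, n_congruent)
```

The enumeration yields 6 × 18 × 4 = 432 trials, half of them congruent, which divide evenly into six blocks of 72.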
Results and Discussion
We analyzed the data with one-way analyses of variance (ANOVAs).¹ In general, objects were named with much higher accuracy in congruent orientations than in incongruent orientations, F(1, 9) = 32.89, p < .001, ηp² = .785. Thus, congruency effects were found for common objects that were mostly dissimilar and were selected from different basic-level categories. The congruency effect was found both when the orientation of the second target was 50°, F(1, 9) = 37.25, p < .001, ηp² = .805, and when it was 140°, F(1, 9) = 15.57, p = .003, ηp² = .634 (see Fig. 2a). The advantage for congruent orientations was rather strong, paralleling a strong congruency effect for letters (Jolicoeur, 1990).
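The reported effect sizes (partial eta squared) can be recovered from the F ratios and their degrees of freedom through the identity ηp² = (F × df_effect) / (F × df_effect + df_error). The following sketch applies this identity to the values reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios reported for the congruency effect in Experiment 1, all with df = (1, 9).
for f_value in (32.89, 37.25, 15.57):
    print(f"F(1, 9) = {f_value}: {partial_eta_squared(f_value, 1, 9):.3f}")
```

The same identity holds for effects with more than one numerator degree of freedom, such as effects of practice block tested with df = (2, 18).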

Fig. 2. Percentage correct as a function of orientation of the second target in Experiment 1 (a) and Experiment 2 (b). The two targets had the same orientation in congruent trials (i.e., 50° and 50°, 140° and 140°) and different orientations in incongruent trials (i.e., 50° and 140°, 140° and 50°). Note that chance (guessing) performance is below 5.6% accuracy. Error bars represent standard errors of the mean.
Interestingly, the presentation of a rotated object led to an inversion of the usual effect of physical orientation on recognition of the subsequent object. When the first target was presented in 140° orientation, subsequently presented objects were recognized better in 140° than in 50° orientation, F(1, 9) = 5.06, p = .051, ηp² = .360. Thus, the usual advantage for objects closer to upright (orientation effect) was reversed simply by presenting the first target at 140°. There was still a significant advantage for more upright objects in congruent trials, F(1, 9) = 35.09, p < .001, ηp² = .796, and thus no full compensation for misorientation.
In order to investigate practice effects, we divided the session into three blocks. Even though the level of performance increased with practice, F(2, 18) = 14.17, p < .001, ηp² = .612, accuracy was about 24, 18, and 17 percentage points higher for congruent than for incongruent orientations in Blocks 1, 2, and 3, respectively. Thus, practice reduced the congruency effect only slightly and did not eliminate it. This result is in contrast to the findings of naming studies with long presentation times (e.g., Jolicoeur, 1985), but corresponds with findings of other studies using short masked presentations (Lawson & Jolicoeur, 1998).
Significant congruency effects were found when both objects were from the same superordinate-level category, F(1, 9) = 15.66, p = .003, ηp² = .635, and also when they were from different superordinate-level categories, F(1, 9) = 57.69, p < .001, ηp² = .865 (see Fig. 3). There was no significant effect of whether the superordinate category was the same or different, F(1, 9) < 1. The results clearly demonstrate orientation congruency effects even for objects from different superordinate-level categories. Overall, the findings are in accordance with the notion that recognition is achieved by adjusting a rather abstract frame of reference. However, it was possible that the congruency effects obtained in Experiment 1 were simply caused by priming the main axis of elongation, and were not due to the adjustment of a relatively abstract coordinate system. Therefore, we conducted a second experiment to investigate whether the congruency effect is restricted to objects with the same axis of elongation.

Fig. 3. Percentage correct in Experiment 1 as a function of whether the two targets were from the same or different superordinate-level categories (biological objects or artifacts). The targets had the same orientation in congruent trials (i.e., 50° and 50°, 140° and 140°) and different orientations in incongruent trials (i.e., 50° and 140°, 140° and 50°). Error bars represent standard errors of the mean.
EXPERIMENT 2
Experiment 2 was designed to achieve two goals: to provide a replication of the congruency effect for familiar objects and to test whether congruency effects can be found also for objects with different main axes of elongation (Fig. 1). By manipulating the main axis of elongation, we were able to investigate whether the congruency effect actually relates to the adjustment of a frame of reference, or whether it is simply due to priming of the main axis of elongation. In addition, as objects with different main axes of elongation also tend to be dissimilar, the experiment provides a further test of whether the congruency effect can be found for dissimilar objects. If recognition is achieved by adjusting a relatively abstract coordinate system, then congruency effects should appear even when the objects differ in their main axis of elongation.
Method
Ten subjects with normal or corrected-to-normal vision participated in the experiment for payment. All subjects were female. The age range was from 17 to 41 years (average = 24 years). The subjects were not familiar with the purpose of the study.
The stimuli were line drawings, selected such that 12 had a predominantly vertical main axis of elongation, whereas the other 12 had a predominantly horizontal main axis of elongation (see Table 2). One third of the stimuli in each group were biological objects; the other two thirds were artifacts. The stimuli were similar to those in Experiment 1 in all other respects.
Table 2. Objects Employed in the Test Phase of Experiment 2
The procedure was the same as in Experiment 1. Presentation times for the second target ranged from 69 to 104 ms, with a mean exposure duration of 78 ms (SD = 11). Subjects needed from 58 to 120 trials in the adjustment phase in order to reach criterion.
The design was the same as in Experiment 1, with one exception: Half of both the first targets and the second targets had a horizontal main axis of elongation, whereas the other half had a vertical main axis of elongation.
Results and Discussion
We analyzed the data with one-way ANOVAs. Again, there was a highly significant orientation congruency effect for objects from different basic-level categories, F(1, 9) = 38.74, p < .001, ηp² = .811. The congruency effect was present both when the second target was shown at 50° orientation, F(1, 9) = 48.27, p < .001, ηp² = .843, and when it was shown at 140° orientation, F(1, 9) = 25.62, p = .001, ηp² = .740 (see Fig. 2b). As before, the presentation of a rotated object led to an inversion of the usual effect of physical orientation on recognition of a subsequent object. When the first target was presented at 140° orientation, subsequently presented objects were recognized better at 140° than at 50° orientation, F(1, 9) = 31.02, p < .001, ηp² = .775. There was still an effect of orientation in congruent trials, F(1, 9) = 15.78, p = .003, ηp² = .637.
Performance improved with practice, and there was a significant effect of block, F(2, 18) = 8.71, p = .002, ηp² = .492. However, the congruency effect was found in all three blocks. Performance was about 24, 27, and 22 percentage points higher for congruent than for incongruent orientations in Blocks 1, 2, and 3, respectively.
Significant congruency effects were present both for objects with the same main axis of elongation, F(1, 9) = 22.58, p = .001, ηp² = .715, and for objects with different main axes of elongation, F(1, 9) = 46.29, p < .001, ηp² = .837 (see Fig. 4). There was no significant effect of main axis, F(1, 9) < 1. These results indicate that the congruency effect is not limited to objects with similar main axes of elongation, and thus does not simply result from priming of the main axis of elongation. Instead, it seems that a relatively abstract coordinate system is adjusted during object recognition. The results are in all respects highly similar to those of Experiment 1, and therefore confirm the robustness of the congruency effect.

Fig. 4. Percentage correct in Experiment 2 as a function of whether or not the two targets had the same main axis of elongation. The targets had the same orientation in congruent trials (i.e., 50° and 50°, 140° and 140°) and different orientations in incongruent trials (i.e., 50° and 140°, 140° and 50°). Error bars represent standard errors of the mean.
The finding that performance was largely independent of the objects' main axes is consistent with studies suggesting that the axis of elongation is not the main cue for alignment (Large et al., 2003; Lawson, 2004). However, the results do not exclude the possibility that the main axis is one of several cues for object orientation. Further possible cues for frame adjustment are the axis of reflectional symmetry, contour orientation, textural orientation, and motion (for review, see Palmer, 1999). Moreover, alignment cues that are more closely related to object shape may be used (Belongie, Malik, & Puzicha, 2002).
GENERAL DISCUSSION
The present findings indicate that there is a substantial orientation congruency effect for line drawings of familiar objects. Objects are recognized more easily when they are presented in the same picture-plane orientation as a previously presented object than when they are in a different orientation. This is not a simple shape priming effect, because the two objects in a trial were always from different categories. The effect is not limited to similar objects or to class-based processes. It was found for objects from different basic-level categories and for objects from different superordinate-level categories (Experiment 1). Moreover, it was even found for objects with different main axes of elongation (Experiment 2).
Our findings are consistent with the proposal that compensation (alignment) processes in object recognition can be regarded as transformations of a perceptual coordinate system (Graf, 2004). The findings do not allow us to distinguish whether analog transformation processes are involved, or whether a one-step adjustment (Hinton, 1981) is performed. The congruency effect is an important extension of orientation dependency and provides further constraints for any model of object recognition. The study of congruency effects may thus lead to a deeper understanding of the processes underlying orientation dependency in object recognition.
In contrast to two previous studies (Gauthier & Tarr, 1997; Tarr & Gauthier, 1998), the present experiments demonstrated congruency effects for dissimilar objects (taken from different basic-level categories, and even from different superordinate-level categories). We propose two possible reasons for this discrepancy. Gauthier and Tarr used long intervals between stimulus presentations, which allowed for the intrusion of strategic effects. We tried to eliminate potential strategic effects by using a paradigm with shorter presentation times and no interstimulus interval. Moreover, Gauthier and Tarr employed novel objects, which may not have a clearly defined canonical orientation (in the same sense as familiar objects).
Current models of object recognition do not account for these findings. First, structural-description models propose that objects are represented by configurations of elementary parts (Hummel & Biederman, 1992; Hummel & Stankiewicz, 1998). As different objects have different parts with different spatial relations, there is no basis in such models for the priming effects found in the present experiments. Second, our results are not compatible with interpolation models, even when they are extended to class-based processes (e.g., Edelman, 1998; Riesenhuber & Poggio, 2002; see also Gauthier & Tarr, 1997; Tarr & Gauthier, 1998). Our results do not support similarity- or class-based accounts for orientation congruency because substantial congruency effects were found even for dissimilar objects.
Third, according to mental rotation models, alignment is achieved by rotating a specific mental image until it is upright (e.g., Jolicoeur, 1985). These models might account for the findings if it is assumed that the rotation process can be primed by a prior rotation that is in the same direction and through the same or a similar angle. However, several studies indicate that mental rotation is not involved in object recognition (e.g., Jolicoeur, Corballis, & Lawson, 1998; for reviews, see Graf, 2002; Lawson, 1999). Moreover, a study that investigated size changes showed that reaction times were fastest when the stimulus was in the cued size format, and increased monotonically with increasing size divergence (Larsen & Bundesen, 1978). This finding can be accounted for only by a prior frame adjustment. Thus, it seems that compensation processes in recognition differ from image transformations in mental rotation (Graf, 2002, 2004).
The reference-frame approach provides the most parsimonious model. The recognition of misoriented objects involves an adjustment of a perceptual coordinate system to the input representation, so that input and memory representations are in alignment. Once the coordinate system is adjusted, the recognition of subsequently presented objects in the same orientation is facilitated. According to this approach, both object recognition and visuomotor control rely on coordinate transformations, and thus they rely on similar computational principles (e.g., Salinas & Abbott, 2001; Salinas & Sejnowski, 2001).
Footnotes
1. We used only one-way ANOVAs, as accuracy data (in contrast to reaction time data) do not fulfill the requirement of additivity, so that interactions are difficult to interpret.
Acknowledgements
This work was supported by the European Commission (IST 2000-29375 COGVIS). We thank Quoc Vuong for helpful comments on an earlier draft of the manuscript.
