Enhancing Visual Search in Large Naturalistic Environments with Augmented Reality Cueing

Abstract

This experiment examined how augmented reality (AR) cues impact visual search in wide field-of-view (FOV) naturalistic scenes. Participants searched for targets located within or beyond the immediate FOV, unaided or aided by cues varying in spatial information, quantity, and reliability. Reliable single cues improved search performance over no cues. Dual cues outperformed single cues, particularly when targets were outside the immediate FOV. In such cases, global cues directed head movements toward the local cue, narrowing the search field. However, performance declined when cues erred, suggesting some level of reliance on the cues. These findings highlight the benefits of AR cueing, specifically dual cueing, where a redundancy gain helps combat the negative effects imposed by the device’s constrained FOV when searching a large scene, and they offer design insights for AR cues used to search wide-FOV naturalistic environments.

Keywords

attention, visual search, augmented reality, human-automation interaction, head-mounted displays

Introduction

Visual search occurs daily. Routine searches, like finding a file on your computer, often occur within a limited field-of-view (FOV) and have minimal consequences if they are long or fail. However, many real-world searches occur in large, complex environments, where targets may exist beyond peripheral vision and require eye scanning or head movements. When these searches are inefficient, they can severely impact safety-critical domains, such as soldiers scanning the battlefield for enemies or search and rescue personnel seeking lost hikers. These scenarios underscore the need for innovative solutions to improve real-world search performance in large, realistic environments where critical information lies beyond the periphery.

Prior work has shown that, without attention guidance, searches can become prolonged and erroneous (Wickens et al., 2022; Wolfe, 2021). While much of the attention guidance research has focused on search performance within small 2D display search scenes (see Wickens et al., 2022; Wolfe, 2021), real-world searches often manifest in environments with much larger FOVs. To address the demands of more complex searches, emerging technologies like augmented-reality head-mounted displays (AR-HMDs) are a viable solution because they overlay virtual information onto the real world (Azuma, 1997), enhancing mental integration between cued information and the real-world context.

Cueing with AR-HMDs has been shown to improve performance for assembly tasks (Kim & Wohn, 2011), navigation (Morrison et al., 2009), and search tasks (Kumaran et al., 2023; Warden et al., 2023; Yeh & Wickens, 2001). However, presenting content with AR presents several challenges, such as display clutter, FOV issues, and increased attentional demands (Binetti et al., 2021). How to design AR cue properties to mitigate these issues—especially in large, naturalistic environments where information is beyond the device’s FOV—remains an unanswered question.

Only a few search studies have examined AR cues for larger search scenes beyond 120° of visual angle (Kumaran et al., 2023; Warden et al., 2023). These studies focus on single cues to guide attention, with fewer studies exploring the impacts of imperfect cues (Yeh & Wickens, 2001; Warden et al., 2023). Little work has explored the effectiveness of different cue properties (such as cue quantity, cue location relative to FOV, and reliability) in wide-FOV environments using AR-HMDs. The present study examines AR cue properties in these environments, where objects can be beyond peripheral vision and constrained by the AR-HMD FOV.

Moreover, little research has examined the use of multiple, simultaneously presented cues, especially when both are imperfect. A recent study found that two simultaneously presented spatial cues enhanced performance more than a single spatial cue in a wide FOV search on a 2D display (Warden, 2024). However, search findings from 2D displays may not fully reveal optimal cue properties for realistic wide FOV environments or whether those cue properties generalize to advanced displays, like an AR-HMD. Only one AR study has highlighted the benefits of switching between AR cues that presented “fine” and “coarse” grained guidance information (Hein et al., 2020). However, their implementation of the “combined cues” involved conditionally displaying a single cue based on head orientation, rather than presenting cues simultaneously. In addition, there was no baseline condition and cues always correctly identified the target. As a result, it remains unclear whether performance benefits were due to cue redundancy, dynamic cueing, or specific cue types.

In contrast, the current work examines how the simultaneous presentation of two cues with global (coarse: general direction) and local (fine: precise location) spatial information impacts performance when objects can be beyond the FOV, and when the cues may err. This work addresses critical gaps in the literature by examining how AR cue properties impact performance when searching a wide-FOV naturalistic scene. Unlike prior work, the current work assesses how the precision of spatial information conveyed by the cue impacts performance, and whether combining cues that convey global and local spatial information provides a redundancy gain such that the combination of the two cues is better than either cue alone. This is particularly important to investigate for AR-HMDs which often have a limited FOV, constraining the view of information that exists beyond peripheral vision or at the edge of the display. Additionally, this experiment examines the consequence of automation errors, where a spatial location with no target is cued, to test how misleading AR cues influence attention and behavior.

The current experiment addresses two primary questions: (1) whether AR cues are effective in naturalistic wide-FOV search environments, and (2) whether AR cue effectiveness is moderated by the number of cues, cue precision, or the reliability of cues. Hypothesis 1A (H_1A) predicted that AR cues would be more effective than no cueing, and Hypothesis 1B (H_1B) predicted that there would be a performance cost for imperfect cues. Hypothesis 2A (H_2A) predicted that dual cues would provide a redundancy gain, such that dual cues would lead to better performance than either cue alone, and Hypothesis 2B (H_2B) predicted that dual cues would improve search performance more than single AR cues for objects located beyond the immediate FOV.

Method

Participants

Thirty-seven participants were recruited from the College of Engineering at the University of Michigan. Participants were monetarily compensated after completing the experiment. All participants had normal or corrected-to-normal vision and were screened for colorblindness. The Institutional Review Board approved the experiment.

Stimuli and Apparatus

Participants completed the experiment using an AR-HMD (Microsoft HoloLens 2). The FOV of the HoloLens 2 (HL2) was 43° laterally and 29° vertically. The experiment was developed in the game engine Unity (version 2020.3.28f1) and used the XR Interaction Toolkit. The search scene consisted of naturalistic flat terrain images with little foliage. These virtual scenes were embedded in the real world approximately 24 inches away from the participant and were 238 inches wide by 120 inches tall. The virtual search scene had a FOV of 157°. Given the dimensions of the virtual scene, some objects inherently fell outside of the device’s FOV, requiring participants to look around for the objects (see Figure 1).
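The reported scene FOV can be checked from the stated scene dimensions and viewing distance. The following is a minimal sketch, assuming a flat scene centered on the viewer; the function and variable names are illustrative and do not come from the study's software.

```python
import math


def visual_angle_deg(extent_in: float, distance_in: float) -> float:
    """Full visual angle (degrees) subtended by a flat extent centered on the viewer."""
    return math.degrees(2 * math.atan((extent_in / 2) / distance_in))


# Dimensions reported in the text (inches).
scene_width_in = 238
scene_height_in = 120
viewing_distance_in = 24

horizontal_deg = visual_angle_deg(scene_width_in, viewing_distance_in)
vertical_deg = visual_angle_deg(scene_height_in, viewing_distance_in)

print(f"horizontal: {horizontal_deg:.1f} deg, vertical: {vertical_deg:.1f} deg")
```

Under these assumptions the horizontal angle rounds to the reported 157°, which is several times wider than the 43° lateral FOV of the HL2.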

Figure 1.

An illustration of the virtual search scene and the AR-HMD. All AR cues were presented with the HMD but are shown on the image to show the dual cue with a global and local spatial cue.

The search scene contained routine objects–plastic bottles, cans, rocks, logs–and an occasional high priority target (17%) resembling an improvised explosive device (IED). A total of 194 unique routine objects and four unique high priority objects were available for the search task. Eighteen objects were uniformly distributed across the search scene for each trial. The objects’ brightness and contrast resembled that of the search scene. The apparent size of the object was estimated based on its location in the scene (foreground, middle ground, or background) to simulate pictorial depth.

There were 12 total cue conditions: 11 visual cues and the no cue condition. Global cues provided global spatial information, such as the global arrow always pointing toward the general direction of the target from the screen’s center. Local cues provided local spatial information, such as the highlight cue indicating the target’s exact location by outlining its shape in the search scene.

To search for objects, participants used single cues (global or local cues) or dual cues (a combination of global and local cues). Figure 2 shows how the cues were displayed in the HL2.
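The count of 12 cue conditions can be reproduced by crossing global and local cues. The sketch below is illustrative only: the five single-cue names are those reported in the Results, but the text explicitly classifies only the global arrow (global) and the highlight (local), so the grouping of the minimap, local arrow, and gaze guidance line is an assumption.

```python
from itertools import product

# Assumed grouping: only "global arrow" and "highlight" are classified in the text.
GLOBAL_CUES = ["minimap", "global arrow"]
LOCAL_CUES = ["local arrow", "highlight", "gaze guidance line"]

single_cues = GLOBAL_CUES + LOCAL_CUES
# A dual cue pairs one global cue with one local cue.
dual_cues = [f"{g} + {l}" for g, l in product(GLOBAL_CUES, LOCAL_CUES)]
conditions = ["no cue"] + single_cues + dual_cues

print(len(single_cues), len(dual_cues), len(conditions))  # prints "5 6 12"
```

With this grouping, five single cues and six dual cues give the 11 visual cue conditions, and adding the no cue condition gives 12.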

Figure 2.

An illustration of each single cue (top, labeled with name and acronym) and dual-cue (bottom). Only routine targets were cued.

The visual cues were either perfect and cued the correct target 100% of the time, or they were imperfect and cued a location where no target was present 17% of the time, representing a miss. In the real world, a soldier might experience this error when the HMD encounters signal interference, thus cueing the soldier to an empty location without a target. On each trial, the correct target object was always present in the scene and could be located in any position relative to the empty location that may have been cued on the imperfect trials. Each cue of the dual cue made the same error: both cues guided attention to the same incorrect location. The automation only cued routine objects, and the high priority target was always uncued.

Design

The experiment was a 12 (cue type) × 2 (cue reliability) within-subjects repeated measures design counterbalanced by reliability. Within each reliability block, there were 12 cue conditions (counterbalanced within the reliability block) with six randomized trials per cue condition. The experiment lasted one hour and consisted of one practice trial per cue with feedback, followed by 72 trials per reliability block, for 144 total test trials.
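The trial counts follow directly from the design. The sketch below builds a hypothetical schedule for one participant to make the arithmetic concrete; the shuffle is only a stand-in for the study's counterbalancing scheme, which is not specified in detail, and all names are illustrative.

```python
import random

CONDITIONS_PER_BLOCK = 12
TRIALS_PER_CONDITION = 6
RELIABILITY_BLOCKS = ["perfect", "imperfect"]


def build_schedule(first_block: str, seed: int = 0) -> list[tuple[str, int, int]]:
    """Return (reliability, condition_index, trial_index) tuples for one participant."""
    rng = random.Random(seed)
    block_order = [first_block] + [b for b in RELIABILITY_BLOCKS if b != first_block]
    schedule = []
    for block in block_order:
        condition_order = list(range(CONDITIONS_PER_BLOCK))
        rng.shuffle(condition_order)  # stand-in for the counterbalanced condition order
        for condition in condition_order:
            for trial in range(TRIALS_PER_CONDITION):
                schedule.append((block, condition, trial))
    return schedule


schedule = build_schedule("perfect")
per_block = sum(1 for block, _, _ in schedule if block == "perfect")
print(per_block, len(schedule))  # prints "72 144"
```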

Task

The experiment consisted of a 157-degree static visual search task completed using an AR-HMD. Participants were instructed to search for a routine target on every trial, which would be either cued or uncued. They were also told to search for a high priority target that appeared occasionally, and that it should take precedence over the routine object. On these trials, participants could select up to two objects: the routine object and the high priority object. Participants were informed that imperfect cues could make errors.

Procedure

Before the experiment, participants read and signed a consent form, completed an electronic colorblindness test, and read the instructions. Participants sat centered on the virtual scene, 24 inches away, to achieve a visual angle of 157°. At the beginning of each trial, participants returned to a forward-facing position, so their immediate FOV was at the center of the scene. Responses were made using a Logitech Bluetooth keyboard. Participants pressed “right shift” to select objects. After selecting the target(s), participants pressed the “left control” button on the keyboard to continue to the next trial.

Results

Before the analysis, two participants were excluded due to incomplete datasets. The remaining participant data (N = 35) were analyzed using RStudio. No outliers were identified. Testing for normality (Shapiro-Wilk normality test, ps < .05) revealed a non-normal distribution; thus, the Greenhouse-Geisser (GG) correction was applied to all ANOVAs.

Cue Effectiveness for Perfect Single Cue Conditions

Perfect single cues–minimap, global arrow, local arrow, highlight, gaze guidance line–and the no cue condition were examined for cue effectiveness. A one-way repeated measures ANOVA was conducted to examine the effect of cueing on performance (response time and percent error). See Table 1 for average performance measures.

Table 1.

Mean Response Time (seconds, s) and Percent Error for the No-Cue and the Single, Perfect Cue Conditions.

Cue condition	Response Time (s)	Percent Error (%)
Gaze guidance line	4.08 (0.60)	2.86 (2.26)
Highlight	5.79 (0.65)	4.29 (2.75)
Local arrow	5.95 (0.65)	6.19 (3.27)
Global arrow	6.07 (0.73)	3.33 (2.43)
Minimap	6.70 (0.75)	4.29 (2.75)
No Cue	9.84 (1.08)	22.86 (5.69)

The 95% CI is reported in the parentheses.

Response Time

The ANOVA revealed that, on average, single cues reduced response time by 4.12 s compared to no-cue, F(3.28, 111.37) = 19.84, p < .001, $η_{p}^{2}$ = 0.37.
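The fractional degrees of freedom reflect the GG correction. As a rough check, the implied correction factor can be recovered from the reported values using the standard relations df1 = ε(k − 1) and df2 = ε(k − 1)(n − 1), where k = 6 conditions (five single cues plus no cue) and n = 35 participants. The helper below is an illustrative sketch, not part of the original analysis.

```python
def gg_epsilon(df1: float, df2: float, n_levels: int, n_participants: int) -> tuple[float, float]:
    """Recover the Greenhouse-Geisser epsilon implied by each corrected df."""
    eps_from_df1 = df1 / (n_levels - 1)
    eps_from_df2 = df2 / ((n_levels - 1) * (n_participants - 1))
    return eps_from_df1, eps_from_df2


# Corrected degrees of freedom reported for response time.
eps1, eps2 = gg_epsilon(3.28, 111.37, n_levels=6, n_participants=35)
print(f"epsilon from df1: {eps1:.3f}, epsilon from df2: {eps2:.3f}")
```

Both estimates land near 0.66, so the two reported degrees of freedom are mutually consistent.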

Percent Error

Like response time, the ANOVA showed that, on average, single cues reduced percent error by 18.67% compared to no-cue, F(2.72, 92.5) = 21.23, p < .001, $η_{p}^{2}$ = 0.38.
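The average reductions reported for response time and percent error can be reproduced from the condition means in Table 1. The sketch below simply averages the five single-cue rows and subtracts the result from the no cue row; variable names are illustrative.

```python
# Condition means copied from Table 1: (response time in s, percent error).
TABLE1 = {
    "gaze guidance line": (4.08, 2.86),
    "highlight": (5.79, 4.29),
    "local arrow": (5.95, 6.19),
    "global arrow": (6.07, 3.33),
    "minimap": (6.70, 4.29),
}
NO_CUE = (9.84, 22.86)

mean_rt = sum(rt for rt, _ in TABLE1.values()) / len(TABLE1)
mean_err = sum(err for _, err in TABLE1.values()) / len(TABLE1)

rt_reduction = NO_CUE[0] - mean_rt
err_reduction = NO_CUE[1] - mean_err

print(f"RT reduction: {rt_reduction:.2f} s, error reduction: {err_reduction:.2f} points")
```

The results match the 4.12 s and 18.67% reductions stated in the text. The error reduction is a difference in percentage points rather than a relative change.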

Effect of Dual Cueing: Redundancy Gain

Cues were categorized based on spatial precision, where they conveyed either global or local spatial information or the combination of both. Categorizing cues this way allows us to examine how the level of spatial precision conveyed impacts performance. We can also examine whether dual cues provide a redundancy gain, where a redundancy gain occurs when the combination of global and local cues improves performance above and beyond either cue alone. Data were collapsed across both levels of cue reliability, and the no-cue condition was excluded from the analysis.

Response Time

Figure 3 shows the mean response time.

Figure 3.

Mean response time as a function of cue precision and cue type. Error bars signify 1 SEM.

The ANOVA showed a significant effect of cue precision, F(1.87, 63.55) = 12.95, p < .001, $η_{p}^{2}$ = 0.28. Pairwise comparisons showed that local cues reduced response time more than global cues, t(34) = 2.87, p = .007, d = 0.23. Importantly, the redundant global-local cue reduced response time more than both the global (t(34) = 4.97, p < .001, d = 0.36) and local (t(34) = 2.15, p = .04, d = 0.14) cues alone.

Percent Error

Figure 4 shows the mean percent error.

Figure 4.

Mean error as a function of cue precision and cue type. Error bars signify 1 SEM.

Like response time, there was a significant effect of cue precision, F(1.68, 56.97) = 5.63, p = .009, $η_{p}^{2}$ = 0.14. There was no significant difference between single global and local cues, t(34) = 1.43, p = .16, d = 0.20. More critically, the dual global and local cue significantly reduced errors more than either the global (t(34) = 3.01, p = .005, d = 0.41) or local (t(34) = 2.40, p = .02, d = 0.33) cues alone.
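The effect sizes for cue precision can be recovered from the reported F statistics with the standard identity for partial eta squared, η_p² = (F · df_effect) / (F · df_effect + df_error). The function below is a small illustrative sketch of that check.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Cue precision effects reported for response time and percent error.
eta_rt = partial_eta_squared(12.95, 1.87, 63.55)
eta_err = partial_eta_squared(5.63, 1.68, 56.97)

print(f"response time: {eta_rt:.2f}, percent error: {eta_err:.2f}")
```

Both values round to the reported 0.28 and 0.14.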

Effect of Dual Cueing and Field-of-View

A 2 (cue type) × 2 (FOV) repeated measures ANOVA was conducted to examine how cue type influenced performance depending on target location. The no cue condition was excluded to analyze the interaction between cue type (single, dual) and spatial location of cued targets. Targets positioned more than 25° of visual angle from the center of the display were categorized as outside the immediate FOV. Only perfect cues were included, as imperfect conditions cued no objects.
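The FOV categorization can be expressed as a simple threshold on angular eccentricity. The sketch below assumes target positions are given in inches from the scene center on a flat plane 24 inches from the viewer and that eccentricity is measured radially; the paper does not specify the coordinate system, so these choices and all names are hypothetical.

```python
import math

VIEWING_DISTANCE_IN = 24.0  # viewing distance reported in the Method
FOV_THRESHOLD_DEG = 25.0    # eccentricity cutoff reported in the text


def eccentricity_deg(x_in: float, y_in: float, distance_in: float = VIEWING_DISTANCE_IN) -> float:
    """Angular distance (degrees) of a point on the scene plane from the scene center."""
    return math.degrees(math.atan(math.hypot(x_in, y_in) / distance_in))


def is_outside_immediate_fov(x_in: float, y_in: float) -> bool:
    """True when the target lies more than the threshold angle from center."""
    return eccentricity_deg(x_in, y_in) > FOV_THRESHOLD_DEG


near_target = is_outside_immediate_fov(5.0, 0.0)   # about 12 degrees from center
far_target = is_outside_immediate_fov(40.0, 0.0)   # about 59 degrees from center
print(near_target, far_target)
```

Under these assumptions, the 25° cutoff corresponds to roughly 11 inches from the scene center, so most of the 238-inch-wide scene falls outside the immediate FOV.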

Response Time

Figure 5 shows the mean response time.

Figure 5.

Mean response time for cue type and target location (filled = in FOV; unfilled = out of FOV). Error bars represent 1 SEM.

The ANOVA showed that dual cues significantly reduced response time by 0.76 s compared to single cues, F(1, 33) = 5.0, p = .03, $η_{p}^{2}$ = 0.13. The effect of FOV was also significant, F(1, 33) = 11.45, p = .002, $η_{p}^{2}$ = 0.25: targets within the FOV were found 0.63 s faster than targets outside the FOV. Although the interaction was not significant, F(1, 33) = 2.98, p = .09, $η_{p}^{2}$ = 0.1, we conducted a planned comparison to test the a priori hypothesis that dual cues would be more beneficial when targets were outside the FOV. A paired t-test showed that dual cues led to faster search times than single cues when targets were outside the FOV, t(34) = 3.76, p < .001, d = 0.22.

Percent Error

Figure 6 shows the mean percent error.

Figure 6.

Mean percent error for cue type and target location (filled = in FOV; unfilled = out of FOV). Error bars represent 1 SEM.

The main effects of cue type and FOV were not significant for error rates (ps > .50). However, the interaction was significant, F(1, 33) = 5.85, p = .02, $η_{p}^{2}$ = 0.15. Dual cues reduced errors by 2.52% compared to single cues when targets were outside the FOV, t(34) = 2.94, p = .006, d = 0.54.

Effect of Dual Cueing and Cue Reliability

To examine the effect of dual cueing and reliability on performance, we conducted a 2 (cue type) × 2 (reliability) repeated-measures ANOVA. To isolate the effect of cue reliability, the no cue condition was excluded. Table 2 presents mean performance as a function of cue reliability and cue type.

Table 2.

Mean Performance as a Function of Cue Reliability (Perfect, Imperfect) and Cue Type (Single, Dual).

Cue reliability	Cue type	Response Time (s)	Percent Error (%)
Perfect	Single	5.72 (0.31)	4.19 (1.21)
Perfect	Dual	4.96 (0.27)	1.67 (0.71)
Imperfect	Single	7.96 (0.43)	10.62 (1.88)
Imperfect	Dual	7.01 (0.38)	8.29 (1.53)

The 95% CI is reported in the parentheses.

Response Time

The ANOVA showed that dual cues (M = 5.98 s) significantly reduced response time by 0.84 s compared to single cues (M = 6.83 s), F(1, 34) = 17.38, p = .0002, $η_{p}^{2}$ = 0.34. Imperfect cues increased search times by 2.14 s compared to perfect cues, F(1, 34) = 36.27, p < .001, $η_{p}^{2}$ = 0.52. The interaction was not statistically significant, F(1, 34) = 0.30, p = .59, $η_{p}^{2}$ < 0.01.

Percent Error

Like response time, dual cues (M = 4.96%) reduced percent error by 2.43% compared to single cues (M = 7.38%), F(1, 34) = 10.64, p = .003, $η_{p}^{2}$ = 0.24. Imperfect cues increased percent error by 6.54% compared to perfect cues, F(1, 34) = 39.54, p < .001, $η_{p}^{2}$ = 0.54. The interaction was not statistically significant, F(1, 34) = 0.031, p = .86, $η_{p}^{2}$ < 0.01.
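The main effects reported for cue type and reliability correspond to marginal means of the four cells in Table 2. The sketch below averages the rounded cell means, so small discrepancies (a few hundredths) from the values in the text are expected; names are illustrative.

```python
# Cell means copied from Table 2: (response time in s, percent error).
TABLE2 = {
    ("perfect", "single"): (5.72, 4.19),
    ("perfect", "dual"): (4.96, 1.67),
    ("imperfect", "single"): (7.96, 10.62),
    ("imperfect", "dual"): (7.01, 8.29),
}

RELIABILITY, CUE_TYPE = 0, 1  # positions of each factor in the dictionary key
RT, ERR = 0, 1                # positions of each measure in the cell tuple


def marginal_mean(factor: int, level: str, measure: int) -> float:
    """Average one measure over the cells where the given factor equals level."""
    values = [cell[measure] for key, cell in TABLE2.items() if key[factor] == level]
    return sum(values) / len(values)


dual_benefit_rt = marginal_mean(CUE_TYPE, "single", RT) - marginal_mean(CUE_TYPE, "dual", RT)
dual_benefit_err = marginal_mean(CUE_TYPE, "single", ERR) - marginal_mean(CUE_TYPE, "dual", ERR)
reliability_cost_rt = marginal_mean(RELIABILITY, "imperfect", RT) - marginal_mean(RELIABILITY, "perfect", RT)
reliability_cost_err = marginal_mean(RELIABILITY, "imperfect", ERR) - marginal_mean(RELIABILITY, "perfect", ERR)

print(f"dual benefit: {dual_benefit_rt:.2f} s, {dual_benefit_err:.2f} points")
print(f"reliability cost: {reliability_cost_rt:.2f} s, {reliability_cost_err:.2f} points")
```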

Discussion

The current work examined how AR attention cues impact visual search performance within a wide FOV naturalistic search task. We specifically assessed how cue effectiveness was impacted by cue number, spatial precision, and reliability. Confirming H_1A, cues improved search performance over an unaided search. Attention cues reduced search time by 4.12 s and improved accuracy by 18.67% compared to an unaided search. This finding replicates prior work on cue effectiveness (Warden et al., 2023; Warden, 2024; Yeh & Wickens, 2001), suggesting AR cues effectively help users allocate attentional resources within a wide search scene.

Confirming H_2A, dual cues improved search performance more than single cues, supporting a redundancy gain: combining global and local spatial information improved performance more than either cue alone. While both cues were visual, their different informational content might engage separate resource pools within the visual modality (Wickens et al., 2022), leading to better integration. Supporting H_2B, dual cues led to better performance than single cues when targets were beyond the initial FOV, where attentional shifts may be more demanding. This effect is likely amplified by the HMD’s limited FOV, which constrains peripheral vision and narrows the attentional spotlight (Eriksen & St. James, 1986). Dual cues provide coarse- and fine-grained spatial information, increasing the user’s ability to spatially distribute attention by first orienting them toward the general direction via the global cue and then helping them precisely locate the target via the local cue. This aligns with theories on attention distribution (LaBerge & Brown, 1989), suggesting cue precision influences whether attention is distributed broadly or narrowly.

While cues improved overall search performance, there were performance costs when cues were erroneous, supporting H_1B and replicating prior work (Warden et al., 2023; Yeh & Wickens, 2001). Participants made more errors when both single and dual cues were erroneous than when the cues were correct. However, the magnitude of this performance cost was similar regardless of cue type, as indicated by the non-significant interaction. The dramatic decline in accuracy with the imperfect cues suggests that people may have exhibited a degree of automation bias, where performance is disrupted by misleading cues. This effect may be driven by an inattentional automation bias (Manzey et al., 2012), in which attention is drawn toward the cued location, narrowing the attentional spotlight at the expense of uncued areas in the scene. Another possibility is that participants overweighted the automated cues in their decision-making.

Another plausible explanation for the cost associated with imperfect cues is that targets may be located outside the immediate FOV of the device, making them harder to locate. However, the interaction between FOV and single cue reliability was not significant (p = .23), reinforcing that the performance costs of imperfect cues persist regardless of whether the target is within or outside the user’s initial view. This finding suggests that the performance cost reflects a general reliability effect rather than being driven by the spatial location of the target alone.

This work extends prior research on cue effectiveness by showing the benefits of AR cueing in wide-FOV, naturalistic search scenes. Much of the existing cueing research has focused on flat 2D displays or employed constrained FOVs, limiting generalizability to AR cueing in large search scenes, which more accurately model the complexity of real-world environments. Furthermore, no studies have examined the efficacy of dual cueing systems, where each cue conveys distinct levels of spatial information, particularly in the context of a wide-FOV search scene where peripheral information is constrained by the FOV of an AR-HMD. Here, we show that dual cues, which combine global and local spatial information, reduce performance costs when targets fall outside the initial FOV of the device, demonstrating an actionable AR cue strategy to improve performance. Finally, this work expands findings regarding the impact of cue reliability on search performance by assessing the effect of errors in an automated guidance system and showing how such misleading cues may disrupt attentional allocation.

Conclusions and Limitations

These findings show important performance benefits for AR attention cues during a wide-FOV search task. Specifically, combining global and local spatial information provides an advantage in situations where the AR-HMD constrains peripheral vision, allowing people to find targets outside of their FOV faster and more accurately. This has important implications for operational settings where personnel are using an AR-HMD in the real world, such as military or search and rescue teams. While cues, particularly dual cues, improve search performance, caution should be taken when the cues err. The present work shows that erroneous cues misguide attention and, therefore, degrade performance. Future work should investigate the driver behind the performance degradation with imperfect cues, particularly to assess the magnitude of the effect and the extent to which this reflects an inattentional automation bias. Future work should also examine larger search scenes and dynamic scenarios where dual cues may benefit performance.

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Amelia C. Warden

References

Azuma, R. T. (1997). A survey of augmented reality. Presence: Teleoperators and Virtual Environments, 6(4), 355–385. https://doi.org/10.1162/pres.1997.6.4.355

Binetti, Chen, Kruijff, Julier, & Brumby, D. P. (2021). Using visual and auditory cues to locate out-of-view objects in head-mounted augmented reality. Displays, 69, 102032. https://doi.org/10.1016/j.displa.2021.102032

Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics, 40(4), 225–240.

Hein, Bernhagen, & Bullinger, A. C. (2020). Two is better than one. Improved attention guiding in AR by combining techniques. IEEE Computer Graphics and Applications, 40(5), 57–66. https://doi.org/10.1109/MCG.2020.3012274

Kim, K. H., & Wohn, K. Y. (2011). Effects on productivity and safety of map and augmented reality navigation paradigms. IEICE Transactions on Information and Systems, E94-D(5), 1051–1061. https://doi.org/10.1587/transinf.E94.D.1051

Kumaran, Kim, Y.-J., Milner, Bullock, Giesbrecht, & Höllerer (2023). The impact of navigation aids on search performance and object recall in wide-area augmented reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–17). Association for Computing Machinery. https://doi.org/10.1145/3544548.3581413

LaBerge & Brown (1989). Theory of attentional operations in shape identification. Psychological Review, 96(1), 101–124.

Manzey, Reichenbach, & Onnasch (2012). Human performance consequences of automated decision aids: The impact of degree of automation and system experience. Journal of Cognitive Engineering and Decision Making, 6(1), 57–87.

Morrison, Oulasvirta, Peltonen, Lemmela, Jacucci, Reitmayr, Näsänen, & Juustila (2009). Like bees around the hive: A comparative study of a mobile augmented reality map. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1889–1898). Association for Computing Machinery. https://doi.org/10.1145/1518701.1518991

Warden, A. C., Wickens, C. D., Rehberg, Ortega, F. R., & Clegg, B. A. (2023). Fast, accurate, but sometimes too-compelling support: The impact of imperfectly automated cues in an AR-HMD on visual search performance. IEEE Transactions on Human-Machine Systems, 53(6), 1061–1072. https://doi.org/10.1109/THMS.2023.3302152

Warden, A. C. (2024). Strategic attention guidance: The impact of dual and single cues in wide field of view searches. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 68(1), 725–731. https://doi.org/10.1177/10711813241275512

Wickens, C. D., McCarley, J. S., & Gutzwiller, R. S. (2022). Applied attention theory (2nd ed.). CRC Press. https://doi.org/10.1201/9781003081579

Wolfe, J. M. (2021). Guided Search 6.0: An updated model of visual search. Psychonomic Bulletin & Review, 28(4), 1060–1092. https://doi.org/10.3758/s13423-020-01859-9

Yeh & Wickens, C. D. (2001). Display signaling in augmented reality: Effects of cue reliability and image realism on attention allocation and trust calibration. Human Factors, 43(3), 355–365. https://doi.org/10.1518/001872001775898269