Abstract
Previous research has revealed that initiators preferentially re-orient their attention towards responders with whom they have established joint attention (JA). However, it remains unclear whether this precedence of social re-orienting is inherent to initiators or applies equally to responders, and whether this social re-orienting is modulated by the social contexts in which JA is achieved. To address these issues, the present study adopted a modified virtual-reality paradigm to manipulate social roles (initiator vs. responder), social behaviours (JA vs. Non-JA), and social contexts (intentional vs. incidental). Results indicated that people, whether as initiators or responders, exhibited a similar prioritisation pattern of social re-orienting, and this was independent of the social contexts in which JA was achieved, revealing that the prioritisation of social re-orienting is an inherent social attentional mechanism in humans. It should be noted, however, that the distinct social cognitive systems engaged when individuals switched roles between initiator and responder were only driven during intentional (Experiment 1) rather than incidental (Experiment 2) JA. These findings provide potential insights for understanding the shared attention system and the integrated framework of attentional and mentalising processes.
Introduction
Social interaction is ubiquitous and vital in human life. Navigating it successfully relies on a range of cognitive capabilities, including the ability to engage in joint attention (JA) with others. JA is defined as the ability of two or more individuals to coordinate their attention with each other so that they are attending to the same object or event of interest (Bruner, 1974; Mundy & Newell, 2007). In the prototypical JA episode, the initiators deliberately direct their gaze towards an object to guide the responder’s attention (initiating a joint attention bid, IJA), while the responders recognise the intentional aspect of this behaviour and shift their attention to the indicated location (responding to a joint attention bid, RJA) (Bruinsma et al., 2004; Emery, 2000). Research has shown that initiators spontaneously shift their attention to the responders with whom they have previously established JA (Bayliss et al., 2013; Edwards et al., 2015; Willemse & Wykowska, 2019). However, it remains unclear whether the initiators engaged in JA also elicit a similar attentional capture effect in the responders, and whether this attentional capture is modulated by the social contexts in which JA is achieved. Answering these questions will contribute to a comprehensive understanding of the shared attention system (SAS) and the integrated framework of attentional and mentalising processes that underpin human abilities to undertake and achieve complex social behaviours and cognition (Capozzi & Ristic, 2020; Stephenson et al., 2021). In addition, it may provide a novel approach for diagnosing and intervening in individuals with autism spectrum disorder.
Studies on IJA have revealed that initiators spontaneously make faster saccades back to social partners who are engaged in JA relative to those who are not (i.e., the precedence of social re-orienting) (Bayliss et al., 2013; Willemse et al., 2018; Willemse & Wykowska, 2019). Moreover, Edwards and colleagues investigated the attentional mechanism underlying this propensity, finding that social partners who responded to JA captured the initiator’s covert attention (Edwards et al., 2015, 2022). This attentional capture effect is also modulated by the social partners’ disposition to respond to JA (Willemse et al., 2018; Willemse & Wykowska, 2019). Similarly, neuroscientific research evaluating self-IJA has reported that congruent gaze responses elicit larger N170 peaks, smaller P350 peaks, and greater alpha-band suppression than incongruent gaze responses, indicating that greater attention is focused on the social partners engaged in JA (Caruana, de Lissa, & McArthur, 2015, 2017; Caruana & McArthur, 2019; Phillips et al., 2023; Rayson et al., 2019; Stephenson et al., 2020).
Responders, similar to initiators, may also show social re-orienting towards their partners. Indeed, at around 12 months of age, infants begin to exhibit the capability to “check back” towards the partners whose gaze they have followed (Scaife & Bruner, 1975). This capability develops as infants grow, with older infants more frequently shifting their attention to their partners after following gaze, perhaps to verify whether their partners are indeed focusing on the same object to which they have directed their gaze (Perapoch Amado et al., 2023). Furthermore, prior work has demonstrated that re-encountering individuals who have previously engaged in JA enhances the subsequent gaze-cueing effect and gaze direction identification (Dalmaso et al., 2016; Edwards & Bayliss, 2019), particularly when the observed individuals take on a leader role in gaze-based social interactions (Capozzi et al., 2016). These studies indicate that the achievement of JA facilitates the social attention of the responders. Although the Parallel Distributed Processing Model (PDPM) of JA suggests that IJA and RJA are distinct cognitive processes both functionally and developmentally (Billeci et al., 2016; Eggebrecht et al., 2017; Mundy, 2018; Mundy & Newell, 2007), available research has indicated the existence of shared mechanisms (Koike et al., 2016; Koike, Tanabe, et al., 2019; Stephenson et al., 2021). For example, Koike and co-workers conducted a hyperscanning fMRI study in which two participants performed JA tasks and found synchronised neural activity in the inferior frontal gyrus (IFG) during both IJA and RJA (Koike et al., 2016).
Similarly, the IJA-related activation of the right anterior insular cortex (AIC) in the initiator was positively correlated with RJA-related activation of homologous regions in the responder in a pair-specific manner, confirming that both initiator and responder monitor and evaluate the contingency of self- and other-attention (Koike, Sumiya, et al., 2019; Koike, Tanabe, et al., 2019). Accordingly, it is natural to hypothesise that responders will preferentially re-orient their attention to those with whom they have achieved JA.
It is possible for initiators and responders to demonstrate varying levels of social re-orienting towards their respective partners. While both initiator and responder share overlapping mental representations through JA, the neurocognitive mechanisms engaged by these two agents are different (Caruana, Brock, & Woolgar, 2015; Mundy, 2018; Oberwelland et al., 2016; Redcay et al., 2012; Schilbach et al., 2010; Stephenson et al., 2021). Specifically, IJA involves recruitment of anterior regions responsible for top-down intentional control of attention, and these regions are associated with processes such as attentional control, goal-directed behaviour, and generating attention-directing behaviours, whereas RJA recruits posterior-parietal regions responsible for bottom-up reflexive control of attention, and these areas are involved in processing sensory stimuli and coordinating attention based on external cues (Caruana, Brock, & Woolgar, 2015; Koike, Sumiya, et al., 2019; Mundy, 2018; Mundy & Newell, 2007; Oberwelland et al., 2016; Stephenson et al., 2021). These distinctions in neurocognitive processes may indeed reflect the operation of distinct social orienting mechanisms. Given the inter-individual causal relationship between IJA and RJA, IJA-prominent activity is probably associated with the prediction in which the partner will shift attention to follow our own attention (forward model), whereas RJA-prominent activity reflects the control of our behaviours to follow other people’s attention (inverse model). As a result, it is reasonable to hypothesise that the social re-orienting response may differ between initiators and responders.
The social contexts in which JA is achieved also merit attention. In a sense, JA is a goal-oriented behaviour because its primary purpose is to share experiences (Mundy et al., 2009). This means that for JA to be truly joint, agents must not only be aware that the other is attending to the same thing they are attending to, but must also have an intention or motivation to engage in some sort of exchange (e.g., gaze-leading or gaze-following) to share experiences (Carpenter et al., 1998; Moll, 2023; Tomasello & Carpenter, 2007). Moreover, researchers emphasise that the study of JA should take place in a context of shared intentionality, in which the initiator’s attempt to establish JA is perceived as intentional by the responder (Caruana, Brock, & Woolgar, 2015; Caruana, de Lissa, & McArthur, 2015; Tomasello, 1995). However, only a few studies have motivated participants to deliberately establish JA by setting specific tasks or contexts (Caruana, Brock, & Woolgar, 2015; Caruana, de Lissa, & McArthur, 2015; Caruana & McArthur, 2019; Caruana, McArthur, et al., 2017; Redcay et al., 2012). For instance, Caruana and co-workers developed a virtual-reality paradigm in which participants cooperated with an avatar to catch a burglar hiding in one of six houses (Caruana, Brock, & Woolgar, 2015). This paradigm created a social context that (1) elicited intentional, goal-driven JA, and (2) required participants to monitor the attention of their social partner throughout the collaboration to correctly interpret gaze cues and then achieve joint goals (Caruana, Brock, & Woolgar, 2015; Caruana, McArthur, et al., 2017).
Indeed, without intention, JA can also occur by accident or just through coincidence when members of a dyad happen to look at the same thing at the same time (Koike, Tanabe, et al., 2019; Oberwelland et al., 2016; Pfeiffer et al., 2014; Rayson et al., 2019; Schilbach et al., 2010). Most importantly, people can still benefit from monitoring this JA, for example through facilitated gaze perception and social cognition (Bayliss et al., 2013; Edwards et al., 2019; Stephenson et al., 2020; Willemse et al., 2018; Willemse & Wykowska, 2019). This dovetails with the SAS, which argues that JA necessarily leads to an exchange of information about the environment and the mental states of the parties involved, no matter whether this is achieved intentionally or incidentally (Stephenson et al., 2021). Furthermore, fMRI studies indicate that both IJA and RJA activate partially overlapping brain regions, such as the ventral striatum, right AIC, IFG, dMPFC, and pSTS, regardless of whether JA is established intentionally or incidentally (Caruana, Brock, & Woolgar, 2015; Koike, Tanabe, et al., 2019; Redcay et al., 2012; Schilbach et al., 2010). However, ERP studies have shown that JA elicits a greater N170 than non-JA in incidental rather than intentional social contexts (Caruana, de Lissa, & McArthur, 2017; Caruana & McArthur, 2019; Stephenson et al., 2020). Given these findings, it is logical to cautiously infer that the social context in which JA is achieved (intentional vs. incidental) may play a role in the effect of social role on social re-orienting.
To verify the hypotheses mentioned above, the current study utilised a modified virtual-reality paradigm (Caruana, Brock, & Woolgar, 2015) to manipulate social roles (initiator vs. responder), social behaviour (JA vs. Non-JA), and social contexts (intentional vs. incidental). In Experiment 1, participants cooperated with a virtual partner to complete a task known as “Catch the Gunman and Rescue the Hostage,” creating a social context of shared intentionality. According to our prediction, responders, similar to initiators, would preferentially re-orient their attention to the partners with whom they have established JA. The measurement indices of attentional re-orienting included faster saccade latencies and larger changes in pupil size (showing pupil constriction) in the JA condition than in the Non-JA condition (Bayliss et al., 2013; Blini & Zorzi, 2022; Liao et al., 2021). In Experiment 2, participants performed either a “Catch the Gunman” or a “Rescue the Hostage” task, while the virtual partner’s task was the opposite of the participants’ task, creating a social context in which JA was incidentally established. If the effect of social roles (if observed in Experiment 1) is independent of the social contexts in which JA occurs, it would be expected to remain consistent across incidental social context.
Experiment 1
Method
Participants
A total of 32 undergraduate participants (22 females) with an average age of 19.8 ± 1.5 years took part in the experiment. They were unaware of the research purpose and had normal or corrected-to-normal visual acuity. To detect an effect size for the two-way interaction (Social Role × Social Behaviour) in a within-subject design comparable to that reported by Caruana, Brock, and Woolgar (2015) (η2p = 0.52), we employed MorePower 6.0.4 (Campbell & Thompson, 2012) to estimate the sample size. The results suggested that a minimum of fourteen participants would yield 95% power at a significance level of .05. Considering potential data loss of approximately 15%–20%, as observed in prior studies (Willemse et al., 2018; Willemse & Wykowska, 2019), we deliberately recruited 32 participants to maximise lab availability. All participants provided written informed consent.
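The arithmetic behind these figures can be illustrated with a minimal sketch. It does not reproduce the MorePower calculation itself; it only converts the benchmark partial eta squared into Cohen's f (the standard conversion) and shows how many recruits would be needed to retain fourteen participants under the stated attrition range. The function names are hypothetical helpers introduced for illustration.

```python
import math


def cohens_f(partial_eta_sq: float) -> float:
    """Convert partial eta squared to Cohen's f: f = sqrt(eta / (1 - eta))."""
    if not 0.0 <= partial_eta_sq < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(partial_eta_sq / (1.0 - partial_eta_sq))


def recruits_needed(min_n: int, expected_loss: float) -> int:
    """Smallest sample that still leaves min_n participants after data loss."""
    return math.ceil(min_n / (1.0 - expected_loss))


f_benchmark = cohens_f(0.52)          # benchmark effect from the text
n_low = recruits_needed(14, 0.15)     # 15% expected loss
n_high = recruits_needed(14, 0.20)    # 20% expected loss

print(f"Cohen's f = {f_benchmark:.3f}")
print(f"Recruits needed: {n_low} to {n_high}")
```

Under these assumptions, the recruited sample of 32 comfortably exceeds the attrition-adjusted minimum.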
Stimuli and apparatus
The stimuli consisted of an anthropomorphic virtual character generated using FaceGen Modeller 3.4.1 (Copyright © 2009, Singular Inversions Inc.) and subtending 4° × 6° of visual angle in the centre of the screen. The character’s gaze included five directions: looking straight, left, right, up and down. Four houses, each with a visual angle of 4° × 4°, were placed to the left, right, above and below the character, with a visual angle of 2° from the character’s eyes (see Figure 1). There were seven conditions for the house: empty, concealed gunman holding hostage, captured gunman, rescued hostage, flashing warning light, round handle door, and diamond handle door (see Figures 1 and 2).

Figure 1. Gaze areas of interest overlaid on participants’ view of stimuli, represented as blue rectangles.

Figure 2. Changes in the house that participants focused on at each phase.
Eye tracking and pupillometry data were recorded using an EyeLink 1000 Plus System (version 5.15; SR Research, Inc.) at a sampling rate of 1,000 Hz. Stimuli were presented on a 24-inch LCD monitor (1920 × 1080 pixels; 144 Hz) using Experiment Builder presentation software (SR Research), with a viewing distance of approximately 62 cm. The stimulus screen with grey background was divided into five gaze-related areas of interest (GAOI), one for each of the four houses, and the virtual character (see Figure 1). These GAOIs were used to monitor participants’ gaze online so that the character’s behaviour could be adapted accordingly by our gaze-contingent algorithm. A chin and forehead rest was used to maintain head stability.
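The online gaze monitoring described above can be sketched as a simple hit test against rectangular GAOIs. This is a minimal illustration, not the study's gaze-contingent algorithm: the physical screen width is an assumed nominal value for a 24-inch 16:9 panel, the house rectangles are placeholder coordinates rather than the actual layout, and all function and variable names are hypothetical.

```python
import math

VIEW_DIST_CM = 62.0   # approximate viewing distance reported in the text
SCREEN_W_PX = 1920    # horizontal resolution reported in the text
SCREEN_H_PX = 1080    # vertical resolution reported in the text
SCREEN_W_CM = 53.1    # ASSUMPTION: nominal width of a 24-inch 16:9 panel
PX_PER_CM = SCREEN_W_PX / SCREEN_W_CM  # assumes square pixels


def ecc_to_px(angle_deg: float) -> float:
    """Signed distance in pixels from screen centre for an eccentricity in degrees."""
    return VIEW_DIST_CM * math.tan(math.radians(angle_deg)) * PX_PER_CM


def rect_px(left_deg, top_deg, right_deg, bottom_deg):
    """Rectangle (left, top, right, bottom) in pixels; edges given in degrees
    from screen centre, with negative values meaning left of or above centre."""
    cx, cy = SCREEN_W_PX / 2, SCREEN_H_PX / 2
    return (cx + ecc_to_px(left_deg), cy + ecc_to_px(top_deg),
            cx + ecc_to_px(right_deg), cy + ecc_to_px(bottom_deg))


# The character rectangle follows the 4 x 6 degree size given in the text.
# The house rectangles are PLACEHOLDERS, not the study's actual coordinates.
GAOIS = {
    "character": rect_px(-2, -3, 2, 3),
    "left": rect_px(-8, -2, -4, 2),
    "right": rect_px(4, -2, 8, 2),
    "up": rect_px(-2, -9, 2, -5),
    "down": rect_px(-2, 5, 2, 9),
}


def classify_gaze(x: float, y: float):
    """Return the name of the GAOI containing the gaze sample, or None."""
    for name, (left, top, right, bottom) in GAOIS.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None


print(classify_gaze(960, 540))  # a sample at screen centre falls on the character
```

In a gaze-contingent loop, a function like `classify_gaze` would be called on each incoming sample so that the character's behaviour could be updated when the participant's gaze enters or leaves a region.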
Social task and procedure
A 2 (Social Role: initiator vs. responder) × 2 (Social Behaviour: JA vs. Non-JA) within-subject repeated measures factorial design was used, adopting a modified virtual-reality paradigm in which participants’ social role was naturally informed without overt instruction (Caruana, Brock, & Woolgar, 2015). Participants were engaged in a task called “Catch the Gunman and Rescue the Hostage” in which they were told that a gunman had taken a hostage and was hiding in one of the houses, and that they had to cooperate with an on-screen virtual character to locate and catch the gunman so that the hostage could be rescued. A deception induction was used to enhance the ecological validity of the experiment. Participants were told that a real person named Dylan in the adjacent eye-tracking lab was controlling the virtual character’s gaze behaviour via live infrared eye tracking (it was actually controlled by a gaze-contingent algorithm). They were also informed that their own gaze behaviour would be displayed on a virtual character that Dylan could see on his screen. To reinforce the deception, participants were given a tour of the two adjacent eye-tracking laboratories before performing the experiment.
As shown in Figure 3, each trial in the search phase began with a central fixation cross subtending 1.5° of visual angle. Participants were asked to fixate the cross until it was replaced by four houses and a character (800 ms). At this point, both participants and the character were instructed to freely search their own allocated houses (distinguished by the door handles: round for participants vs. diamond for character; participants could not see inside the character’s house and vice versa, see Figures 1 and 3). The door opened when participants looked at the house, which either concealed a hostage-taking gunman or nothing at all, and closed when they looked elsewhere. Correspondingly, the character’s face would be updated to depict ongoing search behaviour, with its gaze averted between 0 and 1 additional times.

Figure 3. Schematic representation of trial sequence by condition. Panel (a): Examples of IJA and Non-IJA trials in the initiating condition were depicted. Panel (b): Examples of RJA and Non-RJA trials in the responding condition were depicted. Eye symbol represented the location of the participant’s gaze and was not visible to the participant.
When participants found the hostage-taking gunman (the initiation condition), they were instructed to look at the character and wait for him to look back (500 ms). Once the character looked back, participants were required to execute an initiating saccade from the character to the house where the hostage-taking gunman was located (catching the gunman while signalling the location of the hostage). The door of that house was opened again to reveal the gunman behind prison bars (indicating capture, 500 ms) (see Figure 3, top row). The character responded by either following participants’ gaze towards the same house (rescuing the hostage, IJA) or shifting its gaze towards one of the other three houses (eliminating the danger, Non-IJA). In contrast, if participants found all their houses empty (the response condition), they were instructed to look at the character and wait for him to look back. The character would shift its gaze to the participants (500 ms) and then to the house where the hostage-taking gunman was located (catching the gunman while signalling the location of the hostage). At this point, participants were required to execute a responding saccade from the character to the same house (its door opened, showing that the hostage had been rescued, RJA) or to one of the other three houses where a flashing warning light appeared (its onset time was less than 150 ms, as the average minimum latency for planning and initiating a saccade is 150–175 ms) (Rayner et al., 1983) to remove the existing danger (Non-RJA, see Figure 3, bottom row). When participants executed a responding or initiating saccade to the appropriate house and fixated it for 1,000 or 1,500 ms, the door of that house closed again, indicating that participants had successfully completed the task and signalling them to return their gaze to the central character as quickly and accurately as possible. After participants had fixated the character for 500 ms, the next trial began.
In both initiating and responding conditions, negative feedback was provided for trials in which participants (1) failed to fixate the house where the warning light or the hostage-taking gunman appeared, (2) fixated away from the house before its door was closed, or (3) took longer than 3,000 ms to fixate back on the character after the door was closed. In these cases, the warning light or the hostage-taking gunman would be shown in yellow for 1,000 ms.
To familiarise themselves with the experimental task, participants completed a training session comprising two blocks, each corresponding to a specific search location: either horizontal left-right or vertical up-down. Each block included 3 trials per condition, resulting in a total of 24 trials across both blocks. Subsequently, participants proceeded to the experimental session, which comprised six blocks. Each block encompassed 48 trials. The search locations alternated between blocks, with the order of the blocks being counterbalanced across participants. Within each block, every condition (RJA, Non-RJA, IJA, Non-IJA) was matched based on the (1) gunman location, (2) location of search houses, (3) number of houses to be searched at the beginning of each trial, and (4) the number of eye movements of the character before looking back at participants.
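The session structure described above can be laid out as a short sketch. Two points are assumptions rather than statements from the text: that the four conditions were equally frequent within each experimental block, and that counterbalancing was done by alternating the starting search axis across participants. The names are hypothetical.

```python
from itertools import cycle, islice

CONDITIONS = ("IJA", "Non-IJA", "RJA", "Non-RJA")
N_BLOCKS = 6                  # experimental blocks, from the text
TRIALS_PER_BLOCK = 48         # from the text
TRAINING_BLOCKS = 2           # from the text
TRAINING_TRIALS_PER_CONDITION = 3  # per training block, from the text


def block_axes(participant_index: int) -> list:
    """Alternating search axes for one participant.

    ASSUMPTION: the starting axis alternates with participant parity.
    """
    if participant_index % 2 == 0:
        order = ("horizontal", "vertical")
    else:
        order = ("vertical", "horizontal")
    return list(islice(cycle(order), N_BLOCKS))


# ASSUMPTION: conditions are equally frequent within a block.
per_condition_per_block = TRIALS_PER_BLOCK // len(CONDITIONS)
per_condition_total = per_condition_per_block * N_BLOCKS
total_trials = TRIALS_PER_BLOCK * N_BLOCKS
training_trials = TRAINING_BLOCKS * TRAINING_TRIALS_PER_CONDITION * len(CONDITIONS)

print(block_axes(0))
print(per_condition_per_block, per_condition_total, total_trials, training_trials)
```

The training total of 24 trials matches the figure given in the text; the per-condition counts hold only under the equal-frequency assumption.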
Each block began with a standard 9-point camera calibration and validation process. If there were any instances of eye drift during the experiment, recalibration was performed to maintain accurate tracking. At the end of each block, participants were asked to estimate the proportion of trials in which they successfully caught the gunman. This estimation served as a measure of their engagement in the task. In addition, there were unrestrained rests between blocks. On average, the entire experimental procedure lasted approximately 60 min.
Data processing
EyeLink Data Viewer (v.4.1.1), R environment (v.4.3.1), and SPSS (v.21) were used for raw data pre-processing, statistics and visualisation. For each trial, the onset latency of return-to-character saccade and pupil size were measured. Saccadic latency was defined as the time taken for participants to make a saccade from the closed-door house (i.e., the door of the fixated house was closed again as a sign of the successful completion of the task) to the character’s face after initiating or responding to JA. Changes in pupil size were analysed using subtractive baseline correction, following the methodology described in previous literature (Mathot et al., 2018; Wang et al., 2015). For each trial, the baseline pupil value was determined by measuring the original pupil values from 0 to 200 ms after returning to fixate the character, and then this baseline pupil value was subtracted from the original pupil values measured from 200 to 500 ms. Evidence from the literature indicates that pupillary changes are a reliable index of mental effort and the intensity of attentional processes (Aminihajibashi et al., 2020; Liao et al., 2021; van der Wel & van Steenbergen, 2018). For example, pupil constriction is associated with attentional capture (Blini & Zorzi, 2022; Liao et al., 2021). Certain criteria were applied to clean and filter the data. Saccade direction errors and saccades with blinks, correct saccades with latencies smaller than 60 ms (i.e., anticipatory saccades) (Huestegge & Kreutzfeldt, 2012; Wenban-Smith & Findlay, 1991) or greater than 1,800 ms (i.e., slower saccades) (Bayliss et al., 2013; Willemse et al., 2018; Willemse & Wykowska, 2019) were excluded from further analysis. Trials with mean saccadic latencies or pupil sizes falling outside the range of three standard deviations were also excluded (i.e., outliers). With these criteria, participants with fewer than 50% of valid trials were discarded from the final analyses.
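The baseline correction and trial-filtering rules above can be sketched in a few lines. This is an illustrative sketch rather than the authors' pipeline (which used Data Viewer, R, and SPSS). Several details are assumptions: the half-open window boundaries, averaging the corrected samples into one value per trial, and applying the three-standard-deviation rule within a single list of values. All function names are hypothetical.

```python
from statistics import mean, stdev


def pupil_change(samples):
    """Subtractive baseline correction for one trial.

    samples: sequence of (t_ms, pupil) pairs, with t_ms measured from the
    moment the participant returned to fixate the character.
    ASSUMPTION: baseline window is [0, 200) ms, analysis window [200, 500) ms,
    and the corrected samples are averaged into a single value.
    """
    samples = list(samples)
    baseline_vals = [p for t, p in samples if 0 <= t < 200]
    window_vals = [p for t, p in samples if 200 <= t < 500]
    if not baseline_vals or not window_vals:
        return None  # not enough data to correct this trial
    baseline = mean(baseline_vals)
    return mean(p - baseline for p in window_vals)


def valid_latency(latency_ms: float) -> bool:
    """Keep saccades that are neither anticipatory (< 60 ms) nor slow (> 1,800 ms)."""
    return 60 <= latency_ms <= 1800


def within_3sd(values, k: float = 3.0):
    """Drop values more than k standard deviations from the mean.

    ASSUMPTION: the rule is applied within whatever set of values is passed in.
    """
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]


def keep_participant(n_valid: int, n_total: int) -> bool:
    """Participants with fewer than 50% valid trials are discarded."""
    return n_valid / n_total >= 0.5


# Toy trial with made-up pupil values: a constant drop after the baseline window.
toy = [(t, 100.0) for t in range(0, 200, 50)] + [(t, 90.0) for t in range(200, 500, 50)]
print(pupil_change(toy))  # negative values indicate constriction
```

With this sign convention, a more negative change corresponds to greater pupil constriction, which is how the results below are reported.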
Results
Based on the above-mentioned criteria, three participants with more than 50% missing data were excluded. For the remaining participants, the data trimming method resulted in 20.9% (saccade errors & blinks: 19%, others: 1.9%) of saccade data and 6.6% (blinks: 2.6%, others: 4%) of pupillometry data being discarded.
Mean saccadic latencies were submitted to a repeated measures analysis of variance (ANOVA) with two within-subject factors: Social Role (initiator vs. responder) and Social Behaviour (JA vs. Non-JA). This showed a significant main effect of Social Behaviour, F(1, 28) = 31.996, p < .001, η2p = .533, with shorter onset latencies of return-to-face saccades on JA (M = 356 ms, SE = 11.76) than on Non-JA trials (M = 392 ms, SE = 12.58), confirming that participants preferentially re-oriented their attention to the interaction partners who engaged in JA compared with those who did not (i.e., the precedence of social re-orienting). There was also a significant main effect of Social Role, F(1, 28) = 34.622, p < .001, η2p = .553, with longer saccade latencies in the responding (M = 397 ms, SE = 14.33) than in the initiating condition (M = 351 ms, SE = 10.10), revealing that when responding to JA, participants were slower to re-orient attention to their partner compared with when initiating JA. However, the interaction between Social Role and Social Behaviour was not significant, F(1, 28) = 0.808, p = .377, η2p = .028, indicating that the precedence of social re-orienting when responding to JA was similar to that when initiating JA (see Figure 4a).
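As a consistency check on the statistics above, partial eta squared can be recovered from each F value and its degrees of freedom using the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three effects reported for saccadic latencies; the function name is a hypothetical helper.

```python
def partial_eta_sq(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and reported effect sizes taken from the paragraph above; df = (1, 28).
reported = {
    "Social Behaviour": (31.996, 0.533),
    "Social Role": (34.622, 0.553),
    "Role x Behaviour": (0.808, 0.028),
}

recovered = {
    name: round(partial_eta_sq(f_value, 1, 28), 3)
    for name, (f_value, _) in reported.items()
}

for name, (_, eta_reported) in reported.items():
    print(f"{name}: recovered {recovered[name]:.3f}, reported {eta_reported:.3f}")
```

The recovered values agree with the reported effect sizes to three decimal places.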

Figure 4. Results of two experiments. Left column: mean onset latencies for the return-to-face saccades for all experimental conditions of Experiments 1 (a) and 2 (c). Right column: changes in pupil size for the return-to-face fixation for all experimental conditions of Experiments 1 (b) and 2 (d). Error bars: ± 1 SEM.
Similarly, a 2 (Social Role: initiator vs. responder) × 2 (Social Behaviour: JA vs. Non-JA) repeated measures ANOVA on changes in pupil size revealed a significant main effect of Social Behaviour, F(1, 28) = 9.131, p = .005, η2p = .246, with larger changes in pupil size (showing pupil constriction) in the JA condition (M = −25.83, SE = 6.63) than in the Non-JA condition (M = −18.70, SE = 6.43), meaning that participants allocated more attention to the partners who engaged in JA with them than to those who did not. A significant main effect of Social Role was also found, F(1, 28) = 18.032, p < .001, η2p = .392, with smaller changes in pupil size in the responding trials (M = −14.80, SE = 5.42) than in the initiating trials (M = −29.73, SE = 7.70), indicating that when responding to JA, participants allocated less attention to their partner compared with when initiating JA. The interaction between Social Role and Social Behaviour was also significant, F(1, 28) = 7.691, p = .010, η2p = .215. Post hoc pairwise comparisons showed that larger changes in pupil size following JA than Non-JA were found only when responding to JA, t(28) = −3.219, p = .003, Cohen’s d = 0.470, 95% confidence interval (CI) = [−24.88, −5.53], but not when initiating JA, t(28) = 0.394, p = .697, Cohen’s d = 0.022, 95% CI = [−3.98, 5.87] (see Figure 4b), suggesting that participants balanced the proportion of attentional resources allocated to their partner when taking on the role of initiator compared with the role of responder.
Discussion
Consistent with previous reports (Bayliss et al., 2013; Edwards et al., 2015; Willemse et al., 2018; Willemse & Wykowska, 2019), Experiment 1 confirmed that people, whether as initiators or responders, preferentially shifted their attention to the interaction partners with whom they had established JA. This finding also validated the reliability of the JA task used in this experiment. Notably, individuals exhibited faster social re-orienting responses and larger pupil constriction when initiating JA than when responding to JA, suggesting that the cognitive mechanisms involved in initiating JA may be different from those involved in responding to JA (e.g., attentional abilities to interpret and respond to social cues vs. social functions to monitor others’ actions and intentions) (Mundy, 2018; Siposova & Carpenter, 2019). This speculation is supported by the available evidence, which suggests that IJA involves a top-down, volitional cognitive process in which individuals intentionally direct and monitor their partner’s attention towards a shared target, whereas RJA involves a bottom-up, spontaneous cognitive process in which individuals interpret and shift their gaze to their partner’s attentional focus to share attention (Koike, Tanabe, et al., 2019; Mundy et al., 2009; Mundy & Newell, 2007). Most importantly, although agents failed to balance the proportion of attentional deployment when playing the role of responder compared with the role of initiator, these two roles showed a similar prioritisation pattern of social re-orienting. These findings suggest that the enhanced effect of JA on social re-orienting is consistent across initiators and responders, even though the cognitive mechanisms or motivations behind their attentional deployment may differ.
Experiment 2
JA can also occur incidentally when members of a dyad happen to be looking at the same thing (Koike, Tanabe, et al., 2019; Oberwelland et al., 2016; Pfeiffer et al., 2014; Rayson et al., 2019; Schilbach et al., 2010). Crucially, members can still benefit from perceiving, interpreting, and evaluating this JA, for example, through facilitated gaze perception and social cognition (Dalmaso et al., 2016; Edwards et al., 2019; Stephenson et al., 2020; Willemse et al., 2018; Willemse & Wykowska, 2019). Furthermore, recent work has shown that social orienting is facilitated even when occluding barriers are introduced to prevent the partner from “seeing” what agents see (Kuhn et al., 2018; Morillo-Mendez et al., 2023), revealing that social orienting involves both attentional (i.e., responding to gaze direction) and mentalising (i.e., attribution of intentions/interests) processes (Capozzi & Ristic, 2018, 2020). Given these findings, it is reasonable to ask whether the effects of social roles on social re-orienting observed in Experiment 1 would also be present in incidental JA. To address this question, Experiment 2 created an incidental JA context in which participants performed either a “Catch the Gunman” or a “Rescue the Hostage” task, while the virtual partner’s task was the opposite of the participants’ task. If the effect of social roles is independent of the social context in which JA is achieved, it would persist in the context of incidental JA.
Method
Participants
Thirty-one new naïve undergraduates participated (19 females, mean age = 18.6 ± 1.1 years). All participants were unaware of the purpose of the experiment and reported normal or corrected-to-normal vision. Each participant provided written informed consent prior to the experiment and received monetary compensation. Two participants were excluded from the analysis due to excessive blinking, which resulted in the loss of substantial data.
Stimuli and apparatus
Everything was similar to Experiment 1, except that the manipulation of the house was changed, i.e., the hostage-taking gunman was replaced by the gunman, and the flashing warning light was removed (see Figures 2 and 5).

Figure 5. Schematic representation of trial sequence by condition. Panel (a): Examples of IJA and Non-IJA trials in the initiating condition were depicted. Panel (b): Examples of RJA and Non-RJA trials in the responding condition were depicted. Eye symbol represented the location of the participant’s gaze and was not visible to the participant.
Social task and procedure
All other aspects were identical to Experiment 1 except for the change in task scenario. Participants and an on-screen virtual character were instructed to complete either a “Catch the Gunman” or a “Rescue the Hostage” task individually (see Figure 5). Participants were informed that a hostage-taking gunman was hiding in one of four houses, but the exact location of the hostage was unknown until the gunman was captured. If participants found the gunman, they had to look back at the character and wait for him to look back. After that, participants had to execute the catch by looking back at the location of the gunman (i.e., “Initiating” JA bid). At this point, the character was given the location of the hostage and performed the rescue by looking at one of four houses. If, instead, participants did not find the gunman, they had to look back at the character, and then wait for him to catch the gunman and reveal the location of the hostage. At this point, participants had to perform the rescue by shifting their gaze towards the location where the hostage was presented (i.e., “Responding” to JA bid). The hostage could appear in the house where the gunman was (RJA), or in any of the other three houses (Non-RJA).
Results
Using a similar data trimming and filtering procedure as in Experiment 1, 21% (saccade errors & blinks: 19%, others: 2%) of the saccade data and 4.6% (blinks: 2.2%, others: 2.4%) of the pupillometry data were discarded.
A 2 (Social Role: initiator vs. responder) × 2 (Social Behaviour: JA vs. Non-JA) repeated measures ANOVA on mean saccadic latencies revealed a significant main effect of Social Behaviour, F(1, 28) = 32.379, p < .001, η2p = .536. The onset latencies of return-to-face saccades were shorter for JA (M = 325 ms, SE = 12.73) than for Non-JA trials (M = 344 ms, SE = 14.63), indicating that participants re-oriented their gaze more rapidly towards interaction partners who engaged in JA than towards those who did not. The main effect of Social Role, F(1, 28) = 3.725, p = .064, η2p = .117, and the interaction between Social Behaviour and Social Role, F(1, 28) = 3.340, p = .078, η2p = .107, were not significant, suggesting that the prioritisation pattern of social re-orienting when responding to JA was similar to that when initiating JA (see Figure 4c).
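As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The following minimal Python sketch applies this identity to the three F values reported for saccadic latencies. The function name `partial_eta_squared` and the dictionary layout are illustrative assumptions and are not part of the original analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in the text for saccadic latencies.
# The dictionary itself is an illustrative container, not the authors' data format.
reported = {
    "Social Behaviour": (32.379, 1, 28),
    "Social Role": (3.725, 1, 28),
    "Social Behaviour x Social Role": (3.340, 1, 28),
}

effect_sizes = {
    name: round(partial_eta_squared(f, df1, df2), 3)
    for name, (f, df1, df2) in reported.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: partial eta squared = {value}")
```

Under these assumptions the recomputed values agree with the reported η2p of .536, .117, and .107 to three decimal places.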
Likewise, a 2 (Social Role: initiator vs. responder) × 2 (Social Behaviour: JA vs. Non-JA) repeated measures ANOVA on changes in pupil size yielded a significant main effect of Social Role, F(1, 28) = 69.791, p < .001, η2p = .714. Greater pupil constriction was observed in the initiating condition (M = −53.40, SE = 9.00) than in the responding condition (M = −20.75, SE = 7.08), indicating that participants allocated more attention to their partner when initiating JA than when responding to JA. The main effect of Social Behaviour, F(1, 28) = 1.223, p = .278, η2p = .042, and the interaction between Social Behaviour and Social Role, F(1, 28) = 3.573, p = .069, η2p = .113, were not significant, suggesting that, similar to initiating JA, participants balanced the proportion of attentional resources allocated to their partner when responding to JA (see Figure 4d).
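For readers who wish to reproduce this type of analysis, note that in a fully within-participant 2 × 2 design every effect has one degree of freedom, so each F test is equivalent to a squared one-sample t test on a per-participant contrast score. The sketch below illustrates this equivalence using only the Python standard library. It is not the authors' analysis pipeline: the function names, the cell ordering, and the five-participant data set are hypothetical and were introduced purely for illustration.

```python
from statistics import mean, variance


def within_effect_f(contrast_scores):
    """Return F and (df1, df2) for a 1-df within-participant effect.

    F equals the squared one-sample t statistic of the contrast scores.
    """
    n = len(contrast_scores)
    f_value = mean(contrast_scores) ** 2 / (variance(contrast_scores) / n)
    return f_value, (1, n - 1)


def anova_2x2_within(cells):
    """cells: one tuple per participant, ordered as
    (initiator_JA, initiator_NonJA, responder_JA, responder_NonJA)."""
    behaviour = [(a + c) / 2 - (b + d) / 2 for a, b, c, d in cells]
    role = [(a + b) / 2 - (c + d) / 2 for a, b, c, d in cells]
    interaction = [(a - b) - (c - d) for a, b, c, d in cells]
    return {
        "Social Behaviour": within_effect_f(behaviour),
        "Social Role": within_effect_f(role),
        "Social Behaviour x Social Role": within_effect_f(interaction),
    }


# Hypothetical saccadic latencies in ms for five participants; NOT data from the study.
toy_latencies = [
    (310, 335, 322, 340),
    (298, 320, 315, 328),
    (341, 352, 350, 371),
    (325, 349, 330, 346),
    (305, 318, 324, 339),
]

results = anova_2x2_within(toy_latencies)
for effect, (f_value, dfs) in results.items():
    print(f"{effect}: F{dfs} = {f_value:.3f}")
```

The contrast formulation is used here because it makes the error term for each effect explicit (the variance of that effect's contrast across participants), which is the error term a standard repeated measures ANOVA uses for one-degree-of-freedom effects.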
Importantly, to provide statistical evidence that the effect of social role on the precedence of social re-orienting is independent of the social context in which JA is established (intentional vs. incidental), we conducted a 2 (Experiment: exp.1 vs. exp.2) × 2 (Social Role: initiator vs. responder) × 2 (Social Behaviour: JA vs. Non-JA) mixed ANOVA on mean saccadic latencies and on changes in pupil size. For saccadic latencies, the analysis revealed significant main effects of all three factors: participants were faster to shift their gaze towards partners who engaged in JA than towards those who did not, F(1, 56) = 59.053, p < .001, η2p = .513; they did so earlier in the initiating condition than in the responding condition, F(1, 56) = 23.867, p < .001, η2p = .299; and they did so earlier in the incidental JA context than in the intentional JA context, F(1, 56) = 4.864, p = .032, η2p = .08. A significant interaction between Social Behaviour and Experiment was also observed, F(1, 56) = 5.113, p = .028, η2p = .084. Follow-up pairwise comparisons revealed that gaze shifts were consistently faster following JA than Non-JA in each JA context (ps < .001). No other interactions reached statistical significance (Fs ⩽ 3.495, ps ⩾ .067). For changes in pupil size, we found a significant main effect of Social Role, F(1, 56) = 81.916, p < .001, η2p = .594; an interaction between Social Role and Experiment, F(1, 56) = 11.362, p = .001, η2p = .169; and an interaction between Social Behaviour and Experiment, F(1, 56) = 8.417, p = .005, η2p = .131. Post hoc contrasts showed that pupil constriction was consistently greater when initiating JA than when responding to JA in each JA context (ps < .001). Meanwhile, larger pupil constriction following JA than Non-JA was observed only in the intentional context (p = .004), but not in the incidental context (p = .268).
A three-way interaction between Social Behaviour, Social Role, and Experiment was also found, F(1, 56) = 10.846, p = .002, η2p = .162, indicating that the effects of Social Behaviour and Social Role on changes in pupil size were modulated by whether JA was established in an intentional or an incidental context. No other main or interaction effects reached statistical significance (Fs ⩽ 3.261, ps ⩾ .076).
Discussion
Dovetailing with Experiment 1, Experiment 2 showed that the prioritisation of social re-orienting when responding to JA was similar to that when initiating JA. These consistent findings provide evidence that the precedence of social re-orienting is an inherent social attention mechanism in humans, regardless of the social roles (initiators vs. responders) and social contexts (intentional vs. incidental) in which JA is established. However, the effects of social roles on the speed and attentional allocation of social re-orienting observed in Experiment 1 were absent in Experiment 2. These inconsistent findings imply that the distinct socio-cognitive mechanisms driven by initiating and responding to JA may be successfully expressed in the intentional context, but not in the incidental context, leading to a reduction or disappearance of the resulting differences in attentional allocation. Furthermore, the effect of social behaviour on saccadic latencies was stronger in the intentional context (M = 35.79, SE = 6.33) than in the incidental context (M = 19.52, SE = 3.43), F(1, 56) = 5.113, p = .028, η2p = .084. One possible interpretation is that the intentional context stimulates individuals’ willingness to engage in JA with their social partners, leading them to invest more social cognitive resources, which in turn increases perceptual sensitivity to the JA of their social partners.
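The Social Behaviour × Experiment interaction reported above can be approximately recovered from the summary statistics alone. In a mixed design, a one-degree-of-freedom interaction between a within-participant effect and a two-level group factor is equivalent to a squared independent-samples t test on the per-participant effect scores. The sketch below rests on two assumptions that are not stated in the text: that the reported M and SE describe per-participant Non-JA minus JA latency differences within each experiment, and that the two experiments contributed equal numbers of participants (inferred from the error degrees of freedom of 56 and the 29 participants retained in Experiment 2). The function name is hypothetical.

```python
from math import sqrt


def interaction_f_from_summaries(m1, se1, m2, se2):
    """Approximate F(1, N - 2) for a group difference in a within-participant effect.

    Assumes equal group sizes, in which case the pooled standard error of the
    difference between the two group means reduces to sqrt(se1**2 + se2**2).
    """
    t_value = (m1 - m2) / sqrt(se1 ** 2 + se2 ** 2)
    return t_value ** 2


# Means and standard errors as reported in the text for the two JA contexts.
f_approx = interaction_f_from_summaries(35.79, 6.33, 19.52, 3.43)
print(f"Approximate interaction F = {f_approx:.2f}")
```

Under these assumptions the recomputed value is approximately 5.11, close to the reported F(1, 56) = 5.113; the small discrepancy is consistent with rounding of the reported means and standard errors.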
General discussion
Studies have shown that the precedence of social re-orienting towards the responder engaged in JA is an orienting mechanism in the initiator (Bayliss et al., 2013; Edwards et al., 2015). However, little is known about whether this orienting mechanism is unique to initiators or applies equally to responders, and whether it is modulated by the social contexts in which JA is established. To address these questions, the current study employed a modified virtual-reality paradigm (Caruana, Brock, & Woolgar, 2015) to manipulate social roles (initiators vs. responders), social behaviours (JA vs. Non-JA), and social contexts (intentional vs. incidental). Results from two experiments revealed that individuals, whether as initiators or responders, showed a similar prioritisation pattern of social re-orienting, and this was independent of the social contexts in which JA was achieved, confirming that the precedence of social re-orienting is an inherent social attentional mechanism in humans. Furthermore, the distinct social cognitive systems engaged when individuals switched roles between initiator and responder were driven only during intentional JA (Experiment 1), but not during incidental JA (Experiment 2). These findings provide potential insights for understanding the SAS and the integrated framework of attentional and mentalising processes (Capozzi & Ristic, 2020; Stephenson et al., 2021).
Consistent with previous work on gaze leading (Bayliss et al., 2013; Edwards et al., 2015, 2022; Willemse et al., 2018; Willemse & Wykowska, 2019), the present results revealed that participants preferentially directed their gaze towards individuals with whom they had established JA. While there could be alternative explanations, we speculate that these consistent findings may be attributed to early attentional orienting towards individuals engaged in JA, as evidenced by greater pupil constriction in the JA condition than in the Non-JA condition. This speculation aligns with recent reports highlighting the significant role of pupil constriction in attentional bias and attractiveness (Blini & Zorzi, 2022; Liao et al., 2021). In other words, the attentional priority observed in this study may be due to attentional capture. Consistent with this interpretation, previous electrophysiological studies have indicated that larger N170 peaks, smaller P350 peaks, and greater suppression of the alpha band are elicited by JA compared with Non-JA, implying that people allocate more attentional resources to the person with whom they are engaged in JA (Caruana, de Lissa, & McArthur, 2015, 2017; Caruana & McArthur, 2019; Phillips et al., 2023; Rayson et al., 2019; Stephenson et al., 2020). Alternatively, the enhanced or prioritised processing of peripherally presented joint gaze is likely to be driven by higher-level (social-) cognitive processes, such as affective evaluation and intentional stance (Bayliss et al., 2013; Caruana & McArthur, 2019; Caruana, Spirou, & Brock, 2017; Willemse et al., 2018; Willemse & Wykowska, 2019). People may evaluate the social alignment experienced during JA as a sign of affiliation and acceptance, which elicits more positive affective responses than avoiding JA does.
An interesting yet surprising finding is that the precedence of social re-orienting is an inherent social attentional mechanism in humans, regardless of the social roles (initiators vs. responders) and social contexts (intentional vs. incidental) in which JA is achieved. One simple potential explanation for why responders exhibit a prioritisation pattern of social re-orienting similar to that of initiators is that these two roles have a shared or collective representation, which drives the partnership to become a joint agent, leading the two roles to engage in mutualistic social re-orienting. This explanation is supported by evidence of overlap in brain mechanisms (Hadders-Algra, 2022; Mundy, 2018; Stephenson et al., 2021). For example, recent hyperscanning fMRI studies have revealed that when monitoring and evaluating self- and other-attention representations, the activation of the right anterior insular cortex (AIC) in responders shows a positive correlation with activation in homologous regions in the initiators in a pair-specific manner (Koike, Sumiya, et al., 2019; Koike, Tanabe, et al., 2019). This implies that the analogous prioritisation pattern of social orienting observed in both roles may result from an inter-individual equivalence between the IJA-related attentional representations of initiators and the RJA-related attentional representations of responders in a pair-specific manner.
However, both the speed and attentional allocation of social re-orienting were influenced by social roles and social contexts, as faster re-orienting responses and larger pupil constriction were observed when initiating JA than when responding to JA in intentional rather than incidental contexts. One possible explanation for these inconsistent findings is that achieving intentional JA activates distinct functions within the attentional monitoring systems of initiators and responders. JA depends on the dynamic and synergistic unfolding of multiple processes, including attentional abilities to interpret and respond to social cues, and social functions to monitor others’ actions and intentions (Mundy, 2018; Siposova & Carpenter, 2019). When initiators intentionally direct their partner’s attention to a specific target, they may monitor both their own and their partner’s attentional focus, to interpret their partner’s response to their behavioural guidance and to coordinate each other’s attention towards the shared target. Thus, initiators may devote more attentional resources to their partner than to the shared target. Correspondingly, when responders recognise and respond to their partner’s behaviours, they may monitor and interpret the intentions behind the behaviours more closely to share experiences. Thus, responders may allocate more attentional resources to the behavioural target than to their partner to better understand intentions. As a result, responders allocated fewer attentional resources to their partners than initiators did, resulting in an inability to balance the proportion of attentional allocation required to optimally monitor each partner.
Alternatively, these differences in social re-orienting may be explained by the fact that initiators and responders use distinct socio-cognitive mechanisms, such as social motivation and reward value, to evaluate the achievement of intentional JA. This speculation is supported by the available evidence that different social cognitive processes are involved in initiating and responding to JA (Mundy, 2018; Mundy & Newell, 2007; Siposova & Carpenter, 2019; Stephenson et al., 2021). Specifically, IJA involves a top-down volitional cognitive process in which individuals intentionally guide their partner’s attention towards a shared target, while RJA involves a bottom-up spontaneous cognitive process in which individuals observe and shift their gaze towards their partner’s attentional focus to share attention (Koike, Tanabe, et al., 2019; Mundy et al., 2009; Mundy & Newell, 2007). Similarly, neural data show differences in the neural-cognitive systems activated when initiating and responding to JA bids (Mundy, 2018; Stephenson et al., 2021). In particular, striatal activation, which is strongly related to the social motivation and reward value of shifting attention, differs between IJA and RJA (Caruana, Brock, & Woolgar, 2015; Eggebrecht et al., 2017; Oberwelland et al., 2016; Redcay et al., 2012; Schilbach et al., 2010). Accordingly, it is possible that when initiators and responders evaluate the establishment of intentional JA, they create a proximal reward mechanism that motivates them to perceive a state of intentional JA as rewarding. From the initiators’ perspective, evaluating the partner’s response to the JA bid may be perceived as having a higher reward value, so that more attention is focused on the partner than on the shared target. From the responders’ perspective, evaluating the intentions behind the partner’s attentional behaviour may be perceived as having a higher reward value, so that less attention is focused on the partner than on the shared target.
It should be noted, however, that the effect of social roles on social re-orienting observed in the intentional context (Experiment 1) was absent in the incidental context (Experiment 2). This pattern of results may be explained by the reduced magnitude of intentions to share attention. When JA is established incidentally, initiators and responders lack a specific social motivation and therefore do not consciously control the allocation of their attention, resulting in the attenuation or disappearance of differences in the allocation of attentional resources. Another, higher-level cognitive explanation is that the shared representations formed during incidental JA may reduce the cognitive demands of understanding the intentions behind a partner’s behaviour, resulting in a lack of differences in the social cognitive strategies employed by initiators and responders. In summary, these findings provide preliminary evidence that distinct social cognitive systems are engaged when individuals switch roles between initiator and responder during intentional JA, but not during incidental JA.
Conclusion
The present study examined whether and how social roles and social contexts influence the precedence of social re-orienting towards individuals engaged in JA. Results from two experiments revealed that people, whether as initiators or responders, showed a similar prioritisation pattern of social re-orienting, and this was independent of the social contexts (intentional vs. incidental) in which JA was achieved, demonstrating that the prioritisation of social re-orienting is an inherent social attentional mechanism in humans. It should be noted, however, that the distinct social cognitive systems engaged when individuals switched roles between initiator and responder were only driven during intentional rather than incidental JA. These findings provide potential insights for understanding the SAS and the integrated framework of attentional and mentalising processes.
Footnotes
Author contributions
Conceptualisation: Y.T., J.Z., Y.W. Methodology: Y.T., J.Z., Y.W. Software: Y.T., M.H. Formal analysis: Y.T., T.Z. Investigation: Y.T., M.H., M.Y. Data curation: Y.T., T.Z. Visualisation: Y.T., M.H., M.Y. Writing (original draft): Y.T. Writing (review and editing) and supervision: J.Z., Y.W.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Humanity and Social Science Youth Foundation of the Ministry of Education of China (22YJC190030). The funding organisations played no role in the development of the study design or the collection, analysis, and interpretation of the data.
Ethical approval
This study was performed in line with the Principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the School of Psychology, Shaanxi Normal University (Approval No. HR 2021-05-006).
Informed consent
Informed consent was obtained from all individual participants included in the study.
Data and code availability
The data and code are available from the corresponding authors on reasonable request.
