Multisensory Contributions in Joint Actions: A Scoping Review

Abstract

Solving tasks with others is fundamental in our daily life and requires coordinating actions with other agents in time and space. To manage such real-time interactions, humans must deal with uncertainty caused by noise and delays in sensory and motor signals. One mechanism the sensorimotor system may employ to reduce uncertainty is exploiting information from multiple sensory systems. Here, we review empirical studies examining how visual, auditory, and haptic information contribute to joint actions. A systematic search following PRISMA guidelines yielded 24 eligible studies, which we classified according to the taxonomies by Knoblich et al. (2011) – emergent vs planned coordination – and Jarrassé et al. (2012) – co-activity, cooperation, and collaboration. Across emergent and planned coordination, access to multiple sensory channels generally enhanced interpersonal coordination. The review provides indications that the weighting of sensory signals depends on their reliability and task relevance. However, studies directly testing integration principles are rare, and learning in the context of multisensory integration in joint action remains unexplored. We argue that experimentally testing multisensory integration mechanisms in joint actions and investigating training-related changes offer valuable avenues for further research, advancing theoretical understanding and practical applications across domains such as sports, rehabilitation, and human–robot interaction.

Keywords

joint action, multisensory integration, interpersonal coordination, sport, uncertainty

Introduction

Our daily lives are filled with interactions with other people, such as navigating a crowded street, dancing Argentine tango with a partner, or passing a baton to a teammate in a relay race. These and many other sensorimotor interactions are joint actions, described as ‘any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment’ (Sebanz et al., 2006, p. 70). As illustrated by these examples, there are many different types of joint tasks with different characteristics.

One distinction that is regularly made in the study of joint actions (e.g., Felsberg & Rhea, 2021) is whether the coordination is instructed (or intentional) (e.g., Noy et al., 2017) or spontaneous (or unintentional) (e.g., Richardson et al., 2005). Coordination is called instructed when the agents involved in the joint task are explicitly advised to coordinate with each other. This is, for example, the case during handover tasks, when one participant is supposed to pass an object to another person (e.g., Brand et al., 2022). In contrast, coordination is called spontaneous when agents are required to perform a task together without being explicitly instructed to coordinate, and coordination patterns emerge unintentionally. This emergent phenomenon has often been studied in walking, in particular with a focus on whether two people walking next to each other fall into a pattern of synchronised steps as a function of various manipulated factors (for a review, see Felsberg & Rhea, 2021).

Another – albeit closely related – classification of joint tasks was proposed by Knoblich et al. (2011), who distinguished between emergent and planned coordination (Figure 1). While emergent coordination occurs when several individuals produce coordinated movements due to perception–action couplings without sharing the goal to do so, planned coordination occurs when multiple individuals act according to a joint goal and internally represent their own role in achieving this goal. This classification is permeable insofar as emergent and planned coordination may occur during the same joint task. For example, when participants are instructed to pass an object to somebody else, this is planned coordination insofar as the agents know they must pass or receive the object, respectively. However, for a successful handover action, the passer and receiver must have their hands in the right place at the right time; hence, they need to estimate the unfolding situation in real time and adapt their behaviour accordingly, which might be solely achieved through emergent coordination. Therefore, these two types of coordination, although distinct, should be considered complementary because they typically occur in all joint tasks (Knoblich et al., 2011). However, since planned and emergent coordination generally contribute to varying degrees to the achievement of a task, Knoblich et al.’s (2011) distinction can be used to categorise joint actions according to which type of coordination dominates.

Figure 1.

Taxonomies for joint actions based on the type of coordination (adapted from Knoblich et al., 2011) and the type of joint behaviour (adapted from Jarrassé et al., 2012), with the categories used in this review highlighted in black

Finally, Jarrassé et al. (2012) proposed a more fine-grained classification of joint tasks from a computational perspective (Figure 1). In their taxonomy, a task can first be characterised as either divisible or interactive. In a divisible task, each agent involved can complete its subtask independently of their partner(s) and thus requires no further information about them. For example, when two people are painting a wall together, they do not need to coordinate their movements or constantly know where the other person is currently working to complete their part of the painting. In this case, the agents’ behaviour is described as co-activity. Conversely, a joint task is called interactive when each agent depends on another agent to accomplish the task. For example, moving a sofa to a desired new location naturally takes at least two people who must continuously coordinate their movements. Interactive tasks can appear either as competition, namely, when agents act antagonistically, or, in the case of agonistic behaviour, as collaboration or cooperation, depending on whether the agents play the same or different roles in achieving the task goal. While the case of several people moving a sofa is a good example of collaboration, cooperation takes the form of assistance when another person helps to move objects out of the way or education if another person advises on how best to lift a heavy object, such as a sofa.

In summary, the taxonomy introduced by Jarrassé et al. (2012) offers an alternative way of classifying joint actions, this time with a focus on the type of joint behaviour, which can be classified as co-activity, cooperation, or collaboration. In conjunction with the distinction between different coordination types proposed by Knoblich et al. (2011), namely, emergent vs planned coordination, the categorisation schemes outlined above help to structure existing empirical research on joint action.

Beyond the categorisation issue, it should be considered that different types of joint action tasks place different sensorimotor demands on the agents to achieve successful interpersonal coordination. In a recent review on embodiment research, Maselli et al. (2025, p. 10) highlight that addressing the sensorimotor mechanisms allowing interpersonal coordination in joint actions is one of four key challenges in the field, stating that ‘in psychology and neuroscience, there is a long tradition of studying joint action and dyadic interaction in situated settings … However, few studies focus on sensorimotor aspects of joint decision dynamics’. Accordingly, the question arises of how, exactly, these sensorimotor aspects can and should be taken into account in the context of research on joint actions.

From a sensorimotor perspective, a fundamental challenge of joint actions is that actors must coordinate their movements with co-actors under inherent uncertainty (Leibfried et al., 2015; Pezzulo et al., 2013; Russo et al., 2025) arising from, amongst other factors, noise and delays in sensory and motor signals (Faisal et al., 2008; van Beers et al., 2002). In this context, Beck et al. (2023) outline five sensorimotor mechanisms to handle uncertainty in complex situations: multisensory integration (Ernst & Banks, 2002), prior knowledge integration (Körding & Wolpert, 2004), risk optimisation (Trommershäuser et al., 2003), redundancy exploitation (Scholz & Schöner, 1999; Todorov & Jordan, 2002), and impedance control (Burdet et al., 2001). Both multisensory integration and prior knowledge integration refer to the optimal merging of different sources of information based on Bayesian principles to obtain a more robust estimate of the current situation (Körding & Wolpert, 2006). Under such circumstances, all available sensory information, as well as prior knowledge, is combined and integrated according to its reliability. The most likely estimate of our environment obtained in this way includes the state of our own body and, particularly relevant in the present context, the people present with whom we need to accomplish the joint task. Following the same principle, multisensory integration entails combining information from multiple sensory modalities (e.g., visual, auditory, haptic), each of which may provide complementary or redundant cues about task-relevant properties of the environment (Ernst & Banks, 2002; Ernst & Bülthoff, 2004; Stein & Meredith, 1993). Integrating information across modalities is thus a key mechanism for reducing uncertainty in sensorimotor control (Franklin & Wolpert, 2011). Therefore, it appears to be of great importance for successful interpersonal coordination.
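To make the reliability-based weighting described above concrete, the following minimal Python sketch implements the standard maximum-likelihood rule for combining independent Gaussian cues (cf. Ernst & Banks, 2002), in which each cue is weighted by its inverse variance. The function name `integrate_cues` and all numerical values are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from typing import Sequence, Tuple


def integrate_cues(estimates: Sequence[float],
                   variances: Sequence[float]) -> Tuple[float, float]:
    """Combine independent Gaussian cues by inverse-variance weighting.

    Returns the combined estimate and its variance. Illustrative helper,
    not taken from the reviewed studies.
    """
    if len(estimates) != len(variances) or len(estimates) == 0:
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    reliabilities = [1.0 / v for v in variances]  # reliability = 1 / variance
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]  # weights sum to 1
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_variance = 1.0 / total  # at most the smallest input variance
    return combined, combined_variance


# Hypothetical example: a visual cue (variance 1.0) and an auditory cue
# (variance 4.0) about a partner's position, in arbitrary units.
estimate, variance = integrate_cues([10.0, 14.0], [1.0, 4.0])
print(round(estimate, 3), round(variance, 3))
```

In this hypothetical case, the combined estimate lies closer to the more reliable visual cue, and its variance is lower than that of either cue alone, which is the sense in which integration reduces uncertainty.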

However, while there is a rich body of research on multisensory integration for perception, on the one hand (e.g., Alais et al., 2010; Calvert et al., 2004; Ernst & Bülthoff, 2004; Stein et al., 2020), and on joint action, on the other hand (e.g., Pezzulo et al., 2026; Schmidt & Richardson, 2008; Sebanz et al., 2006; Sebanz & Knoblich, 2021), there is, to the best of our knowledge, no review focusing on how humans use information from multiple sensory systems in joint action tasks. Thus, this review aims to address this gap and respond to the call by Maselli et al. (2025) to establish a basis for future research on joint actions with a focus on the underlying sensorimotor mechanisms that deal with uncertainty in complex joint action tasks. We thus provide an overview of all peer-reviewed studies that have investigated the role of multiple sensory systems (visual, auditory, haptic) in joint action. To structure the findings, we group the studies by different types of coordination (Knoblich et al., 2011) and types of joint behaviours (Jarrassé et al., 2012), enabling us to explore how the use of multisensory information may vary across different joint action tasks and to derive promising directions for future research. Doing so, we extend and complement previous reviews in the field, which focused on sensorimotor synchronisation (Repp & Su, 2013) or on spontaneous interpersonal synchronisation of gait (Felsberg & Rhea, 2021), by broadening the scope of joint action tasks while focusing on studies that explicitly examine the roles of multiple sensory systems.

Methods

This review was conducted according to the PRISMA Extension for Scoping Reviews guideline (Tricco et al., 2018). The checklist can be found in the supplementary material (Appendix A).

Inclusion and Exclusion Criteria

To be included, the articles had to report empirical data on a task performed by two or more participants together. At least one participant had to be a human, while their partner could be another human, an avatar, or a robot. All the tasks had to include body movements, from finger movements in laboratory settings to full-body movements in more complex conditions, such as engaging in sport or playing music. As one of the main foci of this review is the effect of multisensory integration on joint behaviour, the included articles had to examine at least two sensory systems of particular interest for sensorimotor control, namely, the visual, auditory, or haptic systems (cf. Leib et al., 2023). This effect had to be investigated as an independent variable, meaning that the studies had to encompass at least two experimental conditions, for example, with vs without vision. As we focused on unrestricted human behaviour, participants had to be humans (i.e., no animal studies) and, more specifically, healthy (i.e., free from health restrictions) adults (i.e., between 18 and 65 years old). The reported results had to be quantitative and behavioural, meaning that qualitative studies or those exclusively focusing on neurophysiological data were not considered. Finally, all the included papers needed to contain original data, be written in English, and be published in peer-reviewed journals by the date of the last conducted search.

Identification and Screening

The last search was conducted on 5 March 2025. We searched six databases: PsycINFO, PubMed, ScienceDirect, Scopus, SPORTDiscus, and Web of Science. The following search strategy was used: ((joint AND action) OR (interpersonal AND coordination) OR entrainment OR synchron*) AND (multisensory OR visual OR auditory OR haptic OR tactile OR touch) AND (movement OR sport OR dance). The terms ‘sport’ and ‘dance’ were included to also capture studies involving gross-motor behaviours, which are often represented in sports and performing arts literature rather than experimental movement research. The search strategy specified above had to be slightly adapted for ScienceDirect because of the limited use of Boolean operators. Details of the specific search strategies in each of the six databases can be found in the supplementary material (Appendix B).
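The structure of the search string above (terms joined by OR within a concept group, groups joined by AND) can be sketched in a few lines of Python. This is only an illustration of how the reported string is composed; the names `TERM_GROUPS` and `build_query` are hypothetical, and the database-specific adaptations described in Appendix B are not reproduced here.

```python
# Term groups as reported in the search strategy: within a group, terms are
# joined by OR; the groups themselves are joined by AND.
TERM_GROUPS = [
    ["(joint AND action)", "(interpersonal AND coordination)",
     "entrainment", "synchron*"],
    ["multisensory", "visual", "auditory", "haptic", "tactile", "touch"],
    ["movement", "sport", "dance"],
]


def build_query(groups):
    """Return a Boolean search string from a list of term groups."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


query = build_query(TERM_GROUPS)
print(query)
```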

As shown in the PRISMA flow diagram (Figure 2), 10,068 records were initially identified and imported into Zotero. After 3,750 duplicates were removed, 6,318 papers remained for title screening. The titles were independently screened by two researchers, one being the first author. A paper advanced to abstract screening if at least one researcher found no reason for exclusion at title level on the basis of the prespecified criteria. The two reviewers then independently screened the remaining 659 abstracts and, in case of disagreement, discussed which to keep for full-text screening. The full-text screening of 181 texts was performed by the first author only and led to the exclusion of a further 158 items. Finally, one more article was found as a reference in another thematically related review (Kopnarski et al., 2023) and thus included. At the end of the screening process, a total of 24 papers remained in this review.

Figure 2.

PRISMA flow diagram for the literature search
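The record counts reported for the screening process can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
identified = 10_068         # records identified across the six databases
duplicates = 3_750          # duplicates removed
title_screened = identified - duplicates

abstracts_screened = 659    # papers remaining after title screening
full_texts_screened = 181   # papers remaining after abstract screening
full_texts_excluded = 158   # excluded at the full-text stage
added_from_references = 1   # found via Kopnarski et al. (2023)

included = full_texts_screened - full_texts_excluded + added_from_references
print(title_screened, included)
```

Both results agree with the reported values of 6,318 papers entering title screening and 24 papers in the final sample.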

Results

General Overview

The 24 included studies are listed in Table 1. For all the studies, we extracted the type of coordination (Knoblich et al., 2011) and the type of joint behaviour (Jarrassé et al., 2012) as well as the sensory systems studied and manipulated as independent variables. Table 1 also provides descriptions of the experimental task, the study design, and the main findings.

Table 1.

Studies of Multisensory Integration in Joint Actions, Sorted by Type of Coordination, Type of Joint Behaviour, and Name of (First) Author and Characterised by Investigated Sensory Systems, Experimental Task, Study Design, and Main Findings

	Type of coordination (Knoblich et al., 2011)	Type of joint behaviour (Jarrassé et al., 2012)	Authors	Sensory systems	Task	Study design	Main findings
1	Emergent (spontaneous)	Co-activity	Bigand et al. (2024)	Visual, auditory	Dyads dancing freely to music	N = 80 young adults, in pairs, experimental lab study, within-subject, human–human coordination	Visual contact synchronised full-body lateral movements and hand gestures. Music synchronised anteroposterior movements.
2	Emergent (spontaneous)	Co-activity	Demos et al. (2012)	Visual, auditory	Pairs rocking chairs side by side, without instruction about whether to coordinate or not	N = 48 undergraduates, in pairs, experimental lab study, within-subject, human–human coordination	Seeing and hearing the partner rock elicited coordination. The coupling with the partner was stronger than with the music.
3	Emergent (spontaneous)	Co-activity	Dotov et al. (2021)	Visual, auditory	Group sitting and listening to music, without instruction on whether to move or not to move	N = 33 adults, in one group, experimental lab study, within-subject, human–human coordination	Better synchronisation between the participants with vision than without. No effect of stimulus tempo on interpersonal coordination.
4	Emergent (spontaneous)	Co-activity	Harrison and Richardson (2009)	Visual, haptic	Pairs freely walking or jogging, one behind the other, on a 35-m-long path	N = 16 male undergraduates, in pairs, experimental lab study, within-subject, human–human coordination	Stronger synchronisation between the front and back participants when both visually and mechanically coupled than with either visual or haptic information alone.
5	Emergent (spontaneous)	Co-activity	Miyata et al. (2021)	Visual, auditory	Pairs flexing and extending their knees in synchronisation with a metronome while standing	N = 32 adults, in pairs, experimental lab study, within-subject, human–human coordination	Better coordination with vision than without. In the no-vision condition, better with vocal interaction than without.
6	Emergent (spontaneous)	Co-activity	Nessler and Gilliland (2009)	Visual, auditory, haptic	Pairs walking side by side on treadmills without further instructions	N = 40 adults, in pairs, experimental lab study, within-subject, human–human coordination	Altering access to vision and audition had only minor effects on spontaneous synchronisation. Additional haptic information increased coordination.
7	Emergent (spontaneous)	Co-activity	Nowicki et al. (2013)	Visual, auditory	Pairs performing a finger-tapping task without further instructions	N = 26 adults, in pairs, experimental lab study, within-subject, human–human coordination	Auditory feedback of one’s own tapping increased coordination and that of the partner’s tapping decreased it. No effect of the presence or absence of vision.
8	Emergent (spontaneous)	Co-activity	Reynolds and Osler (2014)	Visual, haptic	Pairs freely standing barefoot with feet together	N = 16 adults, in pairs, experimental lab study, within-subject, human–human coordination	Performance was better with vision than without. Shoulder grasp reduced sway more than a light touch, which, in turn, was better than no contact at all.
9	Emergent (spontaneous)	Co-activity	Richardson et al. (2005)	Visual, auditory	Pairs sitting side by side and freely swinging handheld pendulums while completing a puzzle	N = 36 + 18 undergraduates, in pairs, experimental lab study, within-subject, human–human coordination	Better coordination with vision than without. Vision, but not verbal interactions, elicited unintentional coordination of the pendulum movements.
10	Emergent (spontaneous)	Co-activity	Sofianidis et al. (2015)	Auditory, haptic	Pairs swaying rhythmically side by side in the anteroposterior direction, paced by a metronome	N = 12 + 12 adults (dancers/non-dancers), in pairs, experimental lab study, within-subject, human–human coordination	In non-dancers but not in dancers, an auditory stimulus with a higher frequency reduced the interpersonal coordination obtained by touch.
11	Emergent (spontaneous)	Co-activity	Zivotofsky et al. (2012)	Visual, auditory, haptic	Pairs freely walking 70 m side by side	N = 28 adults, in pairs, experimental lab study, within-subject, human–human coordination	Auditory and haptic information elicited spontaneous synchronisation more than visual information.
12	Planned (instructed)	Cooperation	Bishop and Goebl (2015)	Visual, auditory	Pianists playing music with a video of either a pianist or a violinist	N = 31 highly skilled pianists, with video, experimental lab study, within- and between-subject, human–human coordination	Removing auditory information reduced synchronisation, while removing vision did not. When audio was absent, visual cues facilitated synchronisation.
13	Planned (instructed)	Cooperation	Chauvigné et al. (2019)	Visual, auditory, haptic	Group dancing to music while holding hands in a circle and seeing their fellow dancers	N = 14 expert folk dancers, in one group, experimental field study, within-subject, human–human coordination	Eliminating one type of sensory information as well as increasing the distance between partners impaired coordination.
14	Planned (instructed)	Cooperation	Colomer et al. (2022)	Visual, auditory	Pairs moving a slider repetitively between two targets, paced by a metronome	N = 28 + 12 adults, in pairs, experimental lab study, between-subject, human–human coordination	At frequencies higher than 2.15 Hz, the partner responsible for synchronisation dominated the interaction, unlike the partner responsible for hitting the target.
15	Planned (instructed)	Cooperation	Döhring et al. (2020)	Visual, haptic	Handover action from one person to another	N = 22 adults, in pairs, experimental lab study, within-subject, human–human coordination	Grip forces were higher when haptic information was reduced. The onset delay was larger when the passer had no visual or reduced haptic information.
16	Planned (instructed)	Cooperation	Hansen et al. (2017)	Visual, haptic	Handover action from one person to another	N = 10 adults, in pairs, experimental lab study, within-subject, human–human coordination	Distance between the partners and the mass of the object affected handover duration, but only the distance affected the handover height.
17	Planned (instructed)	Cooperation	Heinen et al. (2014)	Visual, auditory	Synchronising one’s own leaps with the leaps of a model gymnast in trampolining	N = 20 female gymnasts, in pairs, experimental field study, within-subject, human–human coordination	Quicker synchronisation with only visual than with only auditory information, but even faster when audiovisual information was available.
18	Planned (instructed)	Cooperation	Khan et al. (2020)	Visual, auditory	Stepping in time with an avatar displayed in a VR headset	N = 8 + 12 male adults, in pairs with an avatar, experimental lab study, between- and within-subject, human–avatar coordination	Better synchronisation with vision and audition than with vision only. Better coordination with the slower avatar than with the faster one.
19	Planned (instructed)	Cooperation	Noy et al. (2017)	Visual, auditory	Walking side by side and synchronising with a point-light walker or a virtual avatar	N = 8 + 4 + 3 adults, in pairs, experimental lab study, within-subject, human–avatar coordination	Faster and more precise synchronisation with auditory and audiovisual cues than with visual cues alone. When auditory and visual cues were incongruent, participants relied more on audition.
20	Planned (instructed)	Cooperation	Werner and Gorman (2023)	Visual, auditory	Driving a remote-controlled car with one person steering and the other providing visual or auditory cues	N = 48 adults, in pairs, experimental lab study, between- and within-subject, human–human coordination	Visual and audiovisual couplings resulted in faster trial times than auditory coupling alone.
21	Planned (instructed)	Collaboration	Hessels et al. (2023)	Visual, auditory	Pairs collaboratively copying a duplo model as quickly and as accurately as possible	N = 46 adults, in pairs, experimental lab study, within-subject, human–human coordination	Faster completion times for models with all the pieces visible than those with hidden pieces. Talking or not talking did not affect completion times.
22	Planned (instructed)	Collaboration	Liebermann-Jordanidis et al. (2021)	Visual, auditory	Pairs producing a musical melody together with a pair of music boxes	N = 32 adults, in pairs, experimental lab study, within-subject, human–human coordination	Interpersonal coordination was best when the sounds were different in pitch, but movement frequency between partners was congruent.
23	Planned (instructed)	Collaboration	Masumoto and Inui (2014)	Visual, auditory	Pairs performing periodic isometric pressing movements with their index fingers	N = 20 male adults, in pairs, experimental lab study, within-subject, human–human coordination	Verbal interaction impaired the coordination of force production when the partners received visual force feedback, but not in the no-feedback condition.
24	Planned (instructed)	Collaboration	Mojtahedi et al. (2017)	Visual, haptic	Pairs lifting an object and putting it back down together with one hand each	N = 72 adults, in pairs, experimental lab study, within-subject, human–human coordination	Combinations of dominant and non-dominant hands did not affect task performance, but better performance side by side than face to face.

Regarding the type of coordination, Table 1 first shows that the categories of emergent vs planned coordination proposed by Knoblich et al. (2011) correspond perfectly with the more task-related distinction of spontaneous vs instructed coordination, which was also introduced above. Furthermore, the classifications developed by Knoblich et al. (2011) and Jarrassé et al. (2012) are highly congruent: all articles classified as emergent (or spontaneous) coordination were also classified as co-activity (n = 11, 46%). In contrast, all articles falling into the category of planned (or instructed) behaviour qualified as encompassing either cooperation (n = 9, 37.5%) or collaboration (n = 4, 16.7%), representing 13 studies in total (54%).

In terms of the sensory systems studied, Table 1 shows that the overall largest research focus has been on investigating how participants use visual and auditory information (n = 15, 62.5%). Fewer studies have researched the contributions of the visual and haptic systems (n = 5, 21%), the auditory and haptic systems (n = 1, 4%), or the three sensory systems together (n = 3, 12.5%). The number of studies addressing different types of coordination, types of joint behaviour, and sensory systems is illustrated in Figure 3.

Figure 3.

Number of studies by type of coordination (instructed/planned vs spontaneous/emergent), type of joint behaviour (co-activity vs cooperation vs collaboration), and the sensory systems investigated (vision vs audition vs haptics)
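The shares of studies per combination of sensory systems follow directly from the counts in Table 1. The Python sketch below recomputes them; the dictionary keys and variable names are illustrative labels, and the text reports some of these values rounded to whole percentages.

```python
TOTAL_STUDIES = 24

# Number of studies per combination of sensory systems, as counted in Table 1.
counts = {
    "visual + auditory": 15,
    "visual + haptic": 5,
    "auditory + haptic": 1,
    "visual + auditory + haptic": 3,
}

# Every included study falls into exactly one combination.
assert sum(counts.values()) == TOTAL_STUDIES

shares = {combo: 100 * n / TOTAL_STUDIES for combo, n in counts.items()}
for combo, share in shares.items():
    print(f"{combo}: n = {counts[combo]} ({share:.1f}%)")
```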

In the following sections, we present the studies grouped by the type of joint behaviour addressed, namely, co-activity, cooperation, or collaboration, the first behaviour reflecting emergent, spontaneous coordination and the latter two reflecting planned, instructed coordination. For each of these categories, we first summarise the primary characteristics of the conducted studies before reporting key findings, particularly in regard to the use of different sensory modalities for successfully performing joint actions.

Emergent, Spontaneous Coordination and Co-Activity

Studies classified as emergent, spontaneous coordination and co-activity (n = 11, 46%) cover various tasks from gross-motor skill performance, such as dancing (Bigand et al., 2024; Dotov et al., 2021), walking (Harrison & Richardson, 2009; Nessler & Gilliland, 2009; Zivotofsky et al., 2012), rocking chairs (Demos et al., 2012), and balancing (Miyata et al., 2021; Reynolds & Osler, 2014; Sofianidis et al., 2015), to fine-motor skill performance, such as hand (Richardson et al., 2005) or finger movements (Nowicki et al., 2013). All but one of these tasks were completed in pairs with another human, with the remaining study investigating dancing in a group (Dotov et al., 2021).

The majority of articles (n = 6) focused on the effect of visual and auditory information, and the sensory systems were mostly manipulated in an ‘on/off’ manner. For example, in the experiment by Harrison and Richardson (2009), two participants had to walk one behind the other while the use of visual and haptic information was manipulated. In one condition, the participants could see each other and were linked by a mechanical coupling, whilst in further conditions, they could either use visual information without additional haptic contact or vision was restricted, but the mechanical coupling was maintained.

In terms of the empirical findings obtained, visual information seems to be particularly important to elicit spontaneous coordination, as more than half of the included studies of emergent coordination agree on that point (Demos et al., 2012; Dotov et al., 2021; Harrison & Richardson, 2009; Miyata et al., 2021; Reynolds & Osler, 2014; Richardson et al., 2005). Consequently, if vision is removed, people lose the coordination they have acquired under full-vision conditions (Miyata et al., 2021). However, two studies found no effect of vision on synchronisation (Nessler & Gilliland, 2009; Nowicki et al., 2013), and Zivotofsky et al. (2012) reported that, for a side-by-side walking task, auditory and haptic information were more important for spontaneous gait synchronisation than vision.

Besides vision, auditory information proved to be important for emergent coordination (Demos et al., 2012; Miyata et al., 2021), but primarily in a specific form. For example, Nowicki et al. (2013) found that having access to one’s own auditory feedback improved synchronisation during a finger-tapping task, while hearing the auditory feedback coming from one’s partner decreased performance. Conversely, in a rocking chair task, receiving auditory information produced by the partner’s movements elicited stronger synchronisation in co-active behaviour, while music competed with the partner’s influence and reduced coordination (Demos et al., 2012). The effect of verbal interaction – which can be understood as a special kind of auditory information – is unclear, as one study found no influence (Richardson et al., 2005), while another found that it facilitated coordination, but only when the partners could not see each other (Miyata et al., 2021).

Further, three studies showed that haptic information also contributes to emergent coordination (Harrison & Richardson, 2009; Nessler & Gilliland, 2009; Reynolds & Osler, 2014); for example, Nessler and Gilliland (2009) reported that pairs of participants in a side-by-side walking task were more coordinated when they were holding hands than when they were not.

Generally, combining multiple sources of sensory information enhanced coordination compared to enabling the use of only one sensory channel by restricting the others (Harrison & Richardson, 2009; Nessler & Gilliland, 2009). Notably, different sensory modalities seem to induce different, complementary aspects of coordination. For example, Bigand et al. (2024) investigated humans freely dancing in a ‘silent disco’ setting, manipulating both auditory (here: musical) and visual input. Both the auditory and visual information promoted synchrony. However, the music primarily elicited synchronisation in the anteroposterior direction, whereas vision of the partner mainly induced synchronisation in the lateral direction. In contrast, if visual and auditory information are redundant (e.g., both providing information about a step), the pattern differs: Nessler and Gilliland (2009) showed that when one sensory cue (i.e., vision) was absent, the participants were able to use a redundant source of information (i.e., sound) to generate synchrony that did not differ from the condition in which vision and sound were normal. Furthermore, studies show that multiple sources of sensory information can modulate the effect of each source. For example, haptic information elicited spontaneous coordination in a balance task, but this spontaneous coordination was impaired when there were frequency-related changes in the auditory information (Sofianidis et al., 2015). In the same vein, the effect of visual information was stronger or weaker depending on the ‘groove’ and tempo of musical-auditory information in a dance task (Dotov et al., 2021), and haptic information increased coordination when vision was restricted for one or both partners (Reynolds & Osler, 2014).

Planned, Instructed Coordination and Cooperation

Nine of the 24 articles (37.5%) included in the present review were classified as planned, instructed coordination and cooperation. They cover various tasks, such as playing music (Bishop & Goebl, 2015), dancing (Chauvigné et al., 2019), trampolining (Heinen et al., 2014), and walking (Khan et al., 2020; Noy et al., 2017), but primarily tasks that require hand movements (Colomer et al., 2022; Döhring et al., 2020; Hansen et al., 2017; Werner & Gorman, 2023). In one study, the participants were in a group to perform a dance task (Chauvigné et al., 2019); otherwise, the participants were investigated in pairs. In two walking studies, the participants had to synchronise their steps with an avatar (Khan et al., 2020; Noy et al., 2017), while in all the other publications, the participants had to synchronise with another human.

As in the co-activity studies, the sensory systems were primarily manipulated in an ‘on/off’ manner. However, the manipulation was more subtle in four studies, namely, in handover tasks, by adding thick gloves to reduce haptic information (Döhring et al., 2020) or varying the distance between passer and receiver as well as the weight of the object to be passed (Hansen et al., 2017), and in joint walking tasks, by manipulating the velocity of the leading avatar (Khan et al., 2020) or the congruency between the available visual and auditory information (Noy et al., 2017).

With regard to the use of vision, coordination was found to be better when participants could see their partners than when they could not (Chauvigné et al., 2019; Döhring et al., 2020). However, coordination decreased with increasing uncertainty in the available visual information, for instance, induced by an increased velocity of the leading avatar in joint walking (Khan et al., 2020) or by an increased distance between the participants in a dancing task (Chauvigné et al., 2019).

Several studies showed that planned coordination is better when auditory information is available than when it is not, be it music in a dance task (Chauvigné et al., 2019) or the sound of the partner’s footsteps in joint walking (Khan et al., 2020; Noy et al., 2017). For example, Khan et al. (2020) and Noy et al. (2017), who researched how participants synchronise their steps with those of a virtual avatar, showed that auditory information leads to better performance than visual-only cues. However, not all tasks reveal an auditory advantage. In a remote driving task, where one participant (the driver) operated a car without seeing it, while another (the spotter) provided visual (hand gestures) or auditory (verbal instruction) cues, Werner and Gorman (2023) showed that the participants perform better with visual cues alone than with auditory cues alone. Studying piano duets, Bishop and Goebl (2015) showed that the effect of visual information (seeing the partner) depends on the access to auditory information (hearing the partner). When audio was available, removing vision had no effect on synchronisation. In contrast, when audio was absent, visual cues facilitated synchronisation, showing that visual information became important only when the reliable auditory cue was missing. In the same vein, Colomer et al. (2022) used a task in which pairs had to move a mobile slider with one hand each, based on either visual or auditory information, and found that the hearing partner takes the lead when the auditory information has a frequency of 2.15 Hz or more, but not when the frequency is lower. This outcome can be understood as an effect of varying degrees of uncertainty regarding available information.

Three studies investigated the role of haptic information in cooperative tasks (Chauvigné et al., 2019; Döhring et al., 2020; Hansen et al., 2017). Using a handover task, Döhring et al. (2020) showed that the quality of haptic information affects coordination because the passer’s grip forces increased, while those of the receiver decreased, when the passer wore thick gloves. Furthermore, Hansen et al. (2017) reported that the duration of the handover increased with the weight of the object to be passed. Finally, coordination in a dance task was better when partners were allowed to hold hands than when they could not (Chauvigné et al., 2019).

Surprisingly, few studies directly addressed the integration mechanism used to combine information from multiple senses. A notable exception is the study by Noy et al. (2017), who tested the maximum likelihood estimation model (e.g., Ernst & Banks, 2002). In their joint walking task, the researchers experimentally created a conflict between visual and auditory information and showed that the participants relied more on auditory signals. As a result of the incongruency between the sources of information, coordination between the participant and the avatar decreased. However, in studies where sensory information from different channels was congruent, coordination quality was generally better when both visual and auditory information about the partner was available (Heinen et al., 2014; Khan et al., 2020).

Planned, Instructed Coordination and Collaboration

Planned, instructed collaboration – which, as a reminder, differs from cooperation in that two or more agents play the same role rather than different roles in achieving a task goal (Figure 1) – is the least represented category in this review, as only four articles address it. Two of the researched tasks involve hand movements (Hessels et al., 2023; Mojtahedi et al., 2017), one finger movements (Masumoto & Inui, 2014), and one playing music (Liebermann-Jordanidis et al., 2021). All these tasks were carried out in pairs with another human.

Notably, the manipulation of the available sensory information was more diverse than in the other categories, since, beyond a mere ‘on/off’ manipulation, studies experimentally varied the congruency of the information (Liebermann-Jordanidis et al., 2021) or the type of information available in the same sensory systems by comparing conditions in which the participants were seated either face to face or side by side (Mojtahedi et al., 2017).

Regarding visual information in planned, instructed collaboration, Hessels et al. (2023) reported that pairs were faster to complete a puzzle when all pieces were visible than when some were hidden. Moreover, in a concurrently assigned joint force-production task, the participants were closer to the target force with access to visual feedback than without it (Masumoto & Inui, 2014). Finally, two people were better coordinated in jointly lifting an object when they were positioned side by side than face to face (Mojtahedi et al., 2017).

With respect to auditory information, Hessels et al. (2023) found no effect of verbal interactions on coordination in the puzzle task, while Liebermann-Jordanidis et al. (2021) reported that the coordination between partners in joint music playing was better when they were required to produce different sounds rather than the same sound.

Only one study investigated the role of haptic information in coordination during collaborative tasks, namely, Mojtahedi et al. (2017), who found that various combinations of dominant and non-dominant hands of both partners did not affect the quality of their coordination in joint object lifting.

As only four articles about collaboration were included in this review, insights on multisensory integration remain limited within this category. Nevertheless, Masumoto and Inui (2014) showed that coordination in an isometric force-production task decreases under verbal dual-task conditions if visual feedback is available, but not if visual feedback is withdrawn. However, Hessels et al. (2023) found that talking with each other does not influence joint performance on the puzzle task, whether visual information is fully accessible or not.

Discussion

This review examined how humans utilise multiple sensory modalities in joint action tasks. In total, we identified 24 empirical studies addressing this question. To structure the findings, we used the taxonomies proposed by Knoblich et al. (2011) and Jarrassé et al. (2012), and the combination of these two classifications proved to be a useful framework to relate empirical findings to each other.

First, the review shows that the classifications by Knoblich et al. (2011) and Jarrassé et al. (2012) are highly congruent. All the articles that addressed emergent (or spontaneous) coordination were also classified as encompassing co-activity as a joint behaviour type (n = 11, 46%). All the studies addressing planned (or instructed) coordination fell into the category of either cooperation (n = 9, 37.5%) or collaboration (n = 4, 16.7%). In terms of sensory systems, research has principally investigated how participants use visual and auditory information (n = 15, 62.5%), while fewer studies have researched the contributions of the visual and haptic systems (n = 5, 21%), the auditory and haptic systems (n = 1, 4%), or the three sensory systems together (n = 3, 12.5%).

Generally, we found that the availability of multiple sensory cues enhances both emergent and planned coordination across joint action behaviour types (co-activity, collaboration, cooperation). For spontaneous coordination phenomena investigated in co-activity tasks, it was found that different sensory cues can elicit different aspects of coordination (Bigand et al., 2024; Miyata et al., 2021). For example, Bigand et al. (2024) reported that music (i.e., auditory information) primarily elicited synchronisation in the anteroposterior direction, while vision of the partner mainly induced synchronisation in the lateral direction, suggesting a complementary function of different sensory cues to synchronise to different aspects of the task.

Aligning with this idea, an overview of the studies suggests that, for both emergent and planned coordination, the relative importance of the sensory modality (visual, auditory, haptic) seems to depend on the demands of the task. In tasks with high timing demands (Bishop & Goebl, 2015; Khan et al., 2020; Nowicki et al., 2013; Noy et al., 2017; Zivotofsky et al., 2012), such as synchronising steps to those of a partner, coordination depends more on audition than on vision, a so-called auditory dominance effect that is well documented in the literature (Burr et al., 2009; Repp & Penel, 2002) and can be explained by the higher temporal resolution of the auditory system (Kandel et al., 2013). In contrast, in tasks with high demands for spatial localisation, such as navigating a car through obstacles (Werner & Gorman, 2023), visual information is more important than auditory information to coordinate joint actions. However, while this task-dependent weighting of sensory information provides a principled hypothesis, joint-action experiments directly testing this idea remain lacking.

For planned coordination tasks, a range of studies shows that the effect of sensory cues is modulated by the availability or reliability of another sensory source (Bishop & Goebl, 2015; Colomer et al., 2022; Döhring et al., 2020; Masumoto & Inui, 2014; Noy et al., 2017; Werner & Gorman, 2023). For example, in a piano duet, Bishop and Goebl (2015) showed that the effect of visual cues (seeing the partner) depends on the availability of the auditory cue (hearing the primo performer). When the primo audio was available, removing visual contact did not impair synchronisation. In contrast, when the primo audio was absent, visual cues facilitated synchronisation at critical moments, such as following long pauses, demonstrating that visual information became important only when the reliable auditory cue was missing. These findings are in line with the Bayesian idea that the sensorimotor system combines information from multiple sensory modalities and weights them according to their relative reliability (Ernst & Banks, 2002). However, only one of the reviewed studies, Noy et al. (2017), directly investigated the integration mechanism, namely, maximum likelihood estimation.
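The reliability-weighting idea can be stated compactly. The following is a sketch of the standard maximum likelihood formulation (Ernst & Banks, 2002), assuming two cues with independent Gaussian noise. The labels A (auditory) and V (visual) and the symbols for the single-cue estimates, their standard deviations, and the weights are notation introduced here for illustration; they are not taken from the reviewed studies.

```latex
\hat{s}_{AV} = w_A\,\hat{s}_A + w_V\,\hat{s}_V,
\qquad
w_A = \frac{1/\sigma_A^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}},
\qquad
w_V = 1 - w_A,
\qquad
\sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2} + \sigma_V^{2}}
\;\le\; \min\!\left(\sigma_A^{2}, \sigma_V^{2}\right)
```

Under these assumptions, the less noisy cue receives the larger weight, and the combined estimate is at least as reliable as the better single cue.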

By examining a broader range of coordination types (emergent, planned) and joint action behaviours (co-activity, cooperation, collaboration), the current review complements a previous review on sensorimotor synchronisation by Repp and Su (2013). While their review also included synchronisation tasks beyond interpersonal coordination (e.g., coordination with a metronome), the authors only covered synchronisation tasks. Another related review by Felsberg and Rhea (2021) more narrowly focused on spontaneous interpersonal coordination during gait and pointed out that the role of each sensory system in spontaneous coordination remains unclear. By including a broader range of joint action tasks while more narrowly focusing on multisensory integration, our review identifies patterns in the findings and suggests hypotheses about general principles of multisensory integration in joint actions that can be tested in future studies. In this regard, based on the findings of our review as well as the current literature, we see three promising avenues for future research: (i) investigating multisensory integration in joint action tasks in other classification combinations, (ii) experimentally testing mechanisms of multisensory integration, and (iii) studying the learning processes by comparing interventions to improve multisensory integration in joint action tasks.

(i) Our results surprisingly show that emergent, spontaneous coordination has been exclusively studied in co-activity tasks, and planned, instructed coordination has been exclusively studied in either cooperation or collaboration. However, since only studies of joint tasks with a motor component and with an experimental manipulation of at least two sensory systems were considered in this review, this finding does not rule out the possibility of investigating multisensory integration in joint actions with further classification combinations. For example, Richardson et al. (2007) asked their participants to pick up wooden planks of different sizes either alone or with someone else, which addresses emergent, spontaneous coordination, but as collaboration instead of co-activity. The question of how participants use multiple sensory inputs to solve this joint task could easily be incorporated into this study design. Therefore, further research on multisensory integration in joint action with alternative classification combinations is desirable.

(ii) By selectively suppressing one sensory channel, current studies provide indications of the importance of particular sensory modalities to perform joint action tasks. Yet only one study included in our review (Noy et al., 2017) aimed to test the principles under which information from multiple sensory modalities is integrated. To test those integration mechanisms experimentally, an elegant technique – primarily used in perceptual studies to date – is to create incongruencies between sensory inputs and observe participants’ bias towards one or the other cue as a function of different conditions (Trommershäuser et al., 2011). This logic has been widely applied in size estimation (e.g., Ernst & Banks, 2002), localisation tasks (e.g., Alais & Burr, 2004), and more recently, sensorimotor synchronisation (e.g., Elliott et al., 2010), and is thus readily applicable to scenarios of interpersonal synchronisation as well. Using this approach, testing two hypotheses would particularly advance our functional understanding of multisensory integration in joint actions. First, to test the Bayesian principle that multiple signals are weighted according to their relative reliability (Ernst & Banks, 2002), researchers could experimentally create conflicts between two or more sensory cues and manipulate their respective reliability. If participants integrate the signals in a reliability-based manner, their estimate should shift towards the more reliable source. Second, to test the hypothesis that the sensorimotor system weights signals according to their task relevance, a promising avenue would be to design joint action tasks in which temporal and spatial demands can be manipulated independently of each other. In this case, a task-relevance integration mechanism would predict that auditory information should be weighted more heavily if the temporal demands of the task increase, while visual information should be weighted more heavily if the spatial demands increase.
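The prediction of the first hypothesis can be illustrated numerically. The following Python sketch assumes the standard two-cue maximum likelihood model with independent Gaussian noise, in which each cue is weighted by its inverse variance. The function name `mle_combine`, the 100 ms cue conflict, and the noise levels are hypothetical values chosen for illustration, not data from any reviewed study.

```python
import math


def mle_combine(est_a, sigma_a, est_v, sigma_v):
    """Combine two cue estimates by inverse-variance (reliability) weighting.

    Returns the combined estimate, the weight given to cue A, and the
    standard deviation of the combined estimate.
    """
    rel_a = 1.0 / sigma_a ** 2  # reliability = inverse variance
    rel_v = 1.0 / sigma_v ** 2
    w_a = rel_a / (rel_a + rel_v)
    combined = w_a * est_a + (1.0 - w_a) * est_v
    sigma_combined = math.sqrt(1.0 / (rel_a + rel_v))
    return combined, w_a, sigma_combined


# Hypothetical cue conflict: the auditory cue signals a step at 0 ms,
# the visual cue at 100 ms. All numbers are illustrative, not data.
conflict_ms = 100.0
for sigma_a, sigma_v in [(20.0, 40.0), (40.0, 20.0)]:
    estimate, w_a, sigma_c = mle_combine(0.0, sigma_a, conflict_ms, sigma_v)
    print(f"sigma_a={sigma_a}, sigma_v={sigma_v}: "
          f"w_a={w_a:.2f}, estimate={estimate:.1f} ms, sigma={sigma_c:.1f} ms")
```

With these illustrative values, the predicted estimate lies closer to the auditory cue when audition is the less noisy cue and closer to the visual cue when the noise levels are swapped. Observed biases that follow this pattern would be consistent with reliability-based integration; biases that stay fixed regardless of the manipulated noise would speak against it.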

(iii) Strikingly, none of the articles included in this review focused on performance changes and thus learning in joint actions. While learning processes have been addressed in the fields of joint action (e.g., Knoblich et al., 2011) and multisensory integration (e.g., O’Brien et al., 2023) in isolation, the question of how multisensory integration in joint action tasks changes through practice remains unexplored. Thus, addressing learning appears a highly promising avenue for future research both to improve our theoretical understanding of joint actions and to substantiate applications in relevant practical fields, such as sports, dance, rehabilitation, and human–robot interaction (e.g., Chen et al., 2015).

In summary, this review integrates and organises the growing body of research on how humans exploit multiple sensory modalities in joint actions. We propose that combining the classifications from Knoblich et al. (2011) and Jarrassé et al. (2012) creates a useful framework to structure previous and future research in this field. Current research covers emergent and planned coordination phenomena in a wide range of tasks. While the body of research provides indications that the reliability and task relevance of sensory information might drive its use in joint action tasks, studies that systematically test integration principles are needed. We suggest that testing these integration mechanisms and studying how multisensory integration is learned during joint actions are highly relevant avenues for further research, both in terms of fundamental understanding and applied consideration across fields.

Supplemental Material

Supplemental Material - Multisensory Contributions in Joint Actions: A Scoping Review

Supplemental Material for Multisensory Contributions in Joint Actions: A Scoping Review by Mathilde Truffer and Stephan Zahno in Perceptual and Motor Skills.

Footnotes

Acknowledgements

The authors thank Ellen Straalman for her help with the screening process, Damian Beck for his advice on conducting a systematic review, and Ernst-Joachim Hossner for his valuable contributions to the design and presentation of this research.

ORCID iDs

Mathilde Truffer

Stephan Zahno

Ethical Considerations

No approval of research ethics committees was required as this is a review of existing literature.

Author Contributions

Conceptualisation: MT & SZ; Protocol development: MT & SZ; Systematic literature search: MT; Data extraction and synthesis: MT; Writing: MT & SZ.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The authors confirm that all data are available within the article or the supplementary materials. Data extraction materials are available upon request from the corresponding author. No preregistration was undertaken for this review.

Supplemental Material

Supplemental material for this article is available online.

Author Biographies

Mathilde Truffer is currently pursuing a PhD at the University of Bern’s Institute of Sport Science. Her doctoral research focuses on the role of multisensory integration in joint actions.

Stephan Zahno is a postdoctoral researcher in Movement Science at the University of Bern’s Institute of Sport Science. His research focuses on functional mechanisms of motor control and learning in complex sensorimotor tasks.

References

Alais & Burr (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262. https://doi.org/10.1016/j.cub.2004.01.029

Alais, Newell, F. N., & Mamassian (2010). Multisensory processing in review: From physiology to behaviour. Seeing and Perceiving, 23(1), 3–38. https://doi.org/10.1163/187847510x488603

Beck, Hossner, E.-J., & Zahno (2023). Mechanisms for handling uncertainty in sensorimotor control in sports: A scoping review. International Review of Sport and Exercise Psychology, 18(2), 1–35. https://doi.org/10.1080/1750984X.2023.2280899

Bigand, Bianco, Abalde, S. F., & Novembre (2024). The geometry of interpersonal synchrony in human dance. Current Biology, 34(13), 3011–3019. https://doi.org/10.1016/j.cub.2024.05.055

Bishop & Goebl (2015). When they listen and when they watch: Pianists’ use of nonverbal audio and visual cues during duet performance. Musicae Scientiae, 19(1), 84–110. https://doi.org/10.1177/1029864915570355

Brand, T. K., Maurer, L. K., Müller, Döhring, F. R., & Joch (2022). Predictability shapes movement kinematics and grip force regulation in human object handovers. Human Movement Science, 85, 102976. https://doi.org/10.1016/j.humov.2022.102976

Burdet, Osu, Franklin, D. W., Milner, T. E., & Kawato (2001). The central nervous system stabilizes unstable dynamics by learning optimal impedance. Nature, 414(6862), 446–449. https://doi.org/10.1038/35106566

Burr, Banks, M. S., & Morrone, M. C. (2009). Auditory dominance over vision in the perception of interval duration. Experimental Brain Research, 198(1), 49–57. https://doi.org/10.1007/s00221-009-1933-z

Calvert, Spence, & Stein, B. E. (2004). The handbook of multisensory processes. MIT Press.

Chauvigné, L. A. S., Walton, Richardson, M. J., & Brown (2019). Multi-person and multisensory synchronization during group dancing. Human Movement Science, 63, 199–208. https://doi.org/10.1016/j.humov.2018.12.005

Chen, T. L., Bhattacharjee, McKay, J. L., Borinski, J. E., Hackney, M. E., Ting, L. H., & Kemp, C. C. (2015). Evaluation by expert dancers of a robot that performs partnered stepping via haptic interaction. PLoS One, 10(5), e0125179. https://doi.org/10.1371/journal.pone.0125179

Colomer, Dhamala, Ganesh, & Lagarde (2022). Interacting humans use forces in specific frequencies to exchange information by touch. Scientific Reports, 12(1), 15752. https://doi.org/10.1038/s41598-022-19500-1

Demos, A. P., Chaffin, Begosh, K. T., Daniels, J. R., & Marsh, K. L. (2012). Rocking to the beat: Effects of music and partner’s movements on spontaneous interpersonal coordination. Journal of Experimental Psychology: General, 141(1), 49–53. https://doi.org/10.1037/a0023843

Döhring, F. R., Müller, & Joch (2020). Grip-force modulation in human-to-human object handovers: Effects of sensory and kinematic manipulations. Scientific Reports, 10(1), 22381. https://doi.org/10.1038/s41598-020-79129-w

Dotov, Bosnyak, & Trainor, L. J. (2021). Collective music listening: Movement energy is enhanced by groove and visual social cues. Quarterly Journal of Experimental Psychology, 74(6), 1037–1053. https://doi.org/10.1177/1747021821991793

Elliott, M. T., Wing, A. M., & Welchman, A. E. (2010). Multisensory cues improve sensorimotor synchronisation. European Journal of Neuroscience, 31(10), 1828–1835. https://doi.org/10.1111/j.1460-9568.2010.07205.x

Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433. https://doi.org/10.1038/415429a

Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8(4), 162–169. https://doi.org/10.1016/j.tics.2004.02.002

Faisal, A. A., Selen, L. P. J., & Wolpert, D. M. (2008). Noise in the nervous system. Nature Reviews Neuroscience, 9(4), 292–303. https://doi.org/10.1038/nrn2258

Felsberg, D. T., & Rhea, C. K. (2021). Spontaneous interpersonal synchronization of gait: A systematic review. Archives of Rehabilitation Research and Clinical Translation, 3(1), 100097. https://doi.org/10.1016/j.arrct.2020.100097

Franklin, D. W., & Wolpert, D. M. (2011). Computational mechanisms of sensorimotor control. Neuron, 72(3), 425–442. https://doi.org/10.1016/j.neuron.2011.10.006

Hansen, Arambel, Ben Mansour, Perdereau, & Marin (2017). Human–human handover tasks and how distance and object mass matter. Perceptual and Motor Skills, 124(1), 182–199. https://doi.org/10.1177/0031512516682668

Harrison, S. J., & Richardson, M. J. (2009). Horsing around: Spontaneous four-legged coordination. Journal of Motor Behavior, 41(6), 519–524. https://doi.org/10.3200/35-08-014

Heinen, Koschnick, Schmidt-Maaß, & Vinken, P. M. (2014). Gymnasts utilize visual and auditory information for behavioural synchronization in trampolining. Biology of Sport, 31(3), 223–226. https://doi.org/10.5604/20831862.1111850

Hessels, R. S., Teunisse, M. K., Niehorster, D. C., Nystrom, Benjamins, J. S., Senju, & Hooge, I. T. C. (2023). Task-related gaze behaviour in face-to-face dyadic collaboration: Toward an interactive theory? Visual Cognition, 31(4), 291–313. https://doi.org/10.1080/13506285.2023.2250507

Jarrassé, Charalambous, & Burdet (2012). A framework to describe, analyze and generate interactive motor behaviors. PLoS One, 7(11), e49945. https://doi.org/10.1371/journal.pone.0049945

Kandel, E. R., Schwartz, J. H., Jessell, Siegelbaum, S. A., & Hudspeth, A. J. (Eds.). (2013). Principles of neural science (5th ed.). McGraw-Hill Medical.

Khan, Ahmed, Cottingham, Rahhal, Arvanitis, T. N., & Elliott, M. T. (2020). Timing and correction of stepping movements with a virtual reality avatar. PLoS One, 15(2), e0229641. https://doi.org/10.1371/journal.pone.0229641

Knoblich, Butterfill, & Sebanz (2011). Chapter three - Psychological research on joint action: Theory and data. In Ross, B. H. (Ed.), Psychology of learning and motivation (Vol. 54, pp. 59–101). Academic Press. https://doi.org/10.1016/B978-0-12-385527-5.00003-6

Kopnarski, Rudisch, & Voelcker-Rehage (2023). A systematic review of handover actions in human dyads. Frontiers in Psychology, 14, 1147296. https://doi.org/10.3389/fpsyg.2023.1147296

Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427(6971), 244–247. https://doi.org/10.1038/nature02169

Körding, K. P., & Wolpert, D. M. (2006). Bayesian decision theory in sensorimotor control. Trends in Cognitive Sciences, 10(7), 319–326. https://doi.org/10.1016/j.tics.2006.05.003

Leib, Howard, I. S., Millard, & Franklin, D. W. (2023). Behavioral motor performance. Comprehensive Physiology, 14(1), 5179–5224. https://doi.org/10.1002/cphy.c220032

Leibfried, Grau-Moya, & Braun, D. A. (2015). Signaling equilibria in sensorimotor interactions. Cognition, 141, 73–86. https://doi.org/10.1016/j.cognition.2015.03.008

Liebermann-Jordanidis, Novembre, Koch, & Keller, P. E. (2021). Simultaneous self-other integration and segregation support real-time interpersonal coordination in a musical joint action task. Acta Psychologica, 218, 103348. https://doi.org/10.1016/j.actpsy.2021.103348

Maselli, Iodice, Cisek, & Pezzulo (2025). Embodied decision making in athletes and other animals. Psychology of Sport and Exercise, 80, 102915. https://doi.org/10.1016/j.psychsport.2025.102915

Masumoto & Inui (2014). Effects of speech on both complementary and synchronous strategies in joint action. Experimental Brain Research, 232(7), 2421–2429. https://doi.org/10.1007/s00221-014-3941-x

Miyata, Varlet, Miura, Kudo, & Keller, P. E. (2021). Vocal interaction during rhythmic joint action stabilizes interpersonal coordination and individual movement timing. Journal of Experimental Psychology: General, 150(2), 385–394. https://doi.org/10.1037/xge0000835

Mojtahedi, Fu, & Santello (2017). On the role of physical interaction on performance of object manipulation by dyads. Frontiers in Human Neuroscience, 11, 533. https://doi.org/10.3389/fnhum.2017.00533

Nessler, J. A., & Gilliland, S. J. (2009). Interpersonal synchronization during side by side treadmill walking is influenced by leg length differential and altered sensory feedback. Human Movement Science, 28(6), 772–785. https://doi.org/10.1016/j.humov.2009.04.007

Nowicki, Prinz, Grosjean, Repp, B. H., & Keller, P. E. (2013). Mutual adaptive timing in interpersonal action coordination. Psychomusicology: Music, Mind, and Brain, 23(1), 6–20. https://doi.org/10.1037/a0032039

Noy, Mouta, Lamas, Basso, Silva, & Santos, J. A. (2017). Audiovisual integration increases the intentional step synchronization of side-by-side walkers. Human Movement Science, 56, 71–87. https://doi.org/10.1016/j.humov.2017.10.007

O’Brien, Mason, Chan, & Setti (2023). Can we train multisensory integration in adults? A systematic review. Multisensory Research, 36(2), 111–180. https://doi.org/10.1163/22134808-bja10090

Pezzulo, Donnarumma, & Dindo (2013). Human sensorimotor communication: A theory of signaling in online social interactions. PLoS One, 8(11), e79876. https://doi.org/10.1371/journal.pone.0079876

Pezzulo, Knoblich, Maisto, Donnarumma, Pacherie, & Hasson (2026). A predictive processing framework for joint action and communication. Physics of Life Reviews, 57, 61–78. https://doi.org/10.1016/j.plrev.2026.03.001

Repp, B. H., & Penel (2002). Auditory dominance in temporal processing: New evidence from synchronization with simultaneous visual and auditory sequences. Journal of Experimental Psychology: Human Perception and Performance, 28(5), 1085–1099. https://doi.org/10.1037/0096-1523.28.5.1085

Repp, B. H., & Su, Y.-H. (2013). Sensorimotor synchronization: A review of recent research (2006–2012). Psychonomic Bulletin & Review, 20(3), 403–452. https://doi.org/10.3758/s13423-012-0371-2

48.

Reynolds

R. F.

Osler

C. J.

(2014). Mechanisms of interpersonal sway synchrony and stability. Journal of the Royal Society, Interface, 11(101), 20140751. https://doi.org/10.1098/rsif.2014.0751

49.

Richardson

M. J.

Marsh

K. L.

Baron

R. M.

(2007). Judging and actualizing intrapersonal and interpersonal affordances. Journal of Experimental Psychology: Human Perception and Performance, 33(4), 845–859. https://doi.org/10.1037/0096-1523.33.4.845

50.

Richardson

M. J.

Marsh

K. L.

Schmidt

R. C.

(2005). Effects of visual and verbal interaction on unintentional interpersonal coordination. Journal of Experimental Psychology: Human Perception and Performance, 31(1), 62–79. https://doi.org/10.1037/0096-1523.31.1.62

51.

Russo

Maselli

Sternad

Pezzulo

(2025). Predictive strategies for the control of complex motor skills: Recent insights into individual and joint actions. Current Opinion in Behavioral Sciences, 63, 101519. https://doi.org/10.1016/j.cobeha.2025.101519

52.

Schmidt

R. C.

Richardson

M. J.

(2008). Dynamics of interpersonal coordination. In Fuchs

Jirsa

V. K.

(Eds.), Coordination: Neural, behavioral and social dynamics (pp. 281–308). Springer. https://doi.org/10.1007/978-3-540-74479-5_14

53.

Scholz

J. P.

Schöner

(1999). The uncontrolled manifold concept: Identifying control variables for a functional task. Experimental Brain Research, 126(3), 289–306. https://doi.org/10.1007/s002210050738

54.

Sebanz

Bekkering

Knoblich

(2006). Joint action: Bodies and minds moving together. Trends in Cognitive Sciences, 10(2), 70–76. https://doi.org/10.1016/j.tics.2005.12.009

55. Sebanz, Knoblich (2021). Progress in joint-action research. Current Directions in Psychological Science, 30(2), 138–143. https://doi.org/10.1177/0963721420984425

56. Sofianidis, Elliott, M. T., Wing, A. M., Hatzitaki (2015). Interaction between interpersonal and postural coordination during frequency scaled rhythmic sway: The role of dance expertise. Gait & Posture, 41(1), 209–216. https://doi.org/10.1016/j.gaitpost.2014.10.007

57. Stein, B. E., Meredith, M. A. (1993). The merging of the senses. The MIT Press.

58. Stein, B. E., Stanford, T. R., Rowland, B. A. (2020). Multisensory integration and the society for neuroscience: Then and now. Journal of Neuroscience, 40(1), 3–11. https://doi.org/10.1523/JNEUROSCI.0737-19.2019

59. Todorov, Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5(11), 1226–1235. https://doi.org/10.1038/nn963

60. Tricco, A. C., Lillie, Zarin, O’Brien, K. K., Colquhoun, Levac, Moher, Peters, M. D. J., Horsley, Weeks, Hempel, Akl, E. A., Chang, McGowan, Stewart, Hartling, Aldcroft, Wilson, M. G., Garritty, Straus, S. E. (2018). PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Annals of Internal Medicine, 169(7), 467–473. https://doi.org/10.7326/M18-0850

61. Trommershäuser, Körding, Landy, M. S. (2011). Sensory cue integration. Oxford University Press.

62. Trommershäuser, Maloney, L. T., Landy, M. S. (2003). Statistical decision theory and the selection of rapid, goal-directed movements. JOSA A, 20(7), 1419–1433. https://doi.org/10.1364/JOSAA.20.001419

63. van Beers, R. J., Baraduc, Wolpert, D. M. (2002). Role of uncertainty in sensorimotor control. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 357(1424), 1137–1145. https://doi.org/10.1098/rstb.2002.1101

64. Werner, A. F., Gorman, J. C. (2023). The role of visual and auditory communication in the performance of a joint team task. Human Factors, 65(4), 682–694. https://doi.org/10.1177/00187208211031048

65. Zivotofsky, A. Z., Gruendlinger, Hausdorff, J. M. (2012). Modality-specific communication enabling gait synchronization during over-ground side-by-side walking. Human Movement Science, 31(5), 1268–1285. https://doi.org/10.1016/j.humov.2012.01.003

Supplementary Material

Please find the following supplemental material available below.
