Deconstructing language development: A commentary on Karadöller, Sümer and Özyürek

Abstract

Karadöller and colleagues propose an interesting analysis of multimodality in spoken and signed language acquisition. In this commentary, we aim to extend the authors’ approach and abandon speech-centred and brainbound perspectives. By considering multimodality as a collage of multiple skills, in which abilities are acquired and exploited for new purposes, we avoid simply integrating gestures and signs into pre-existing speech-centred models. This will enable us to move confidently into a future in which embodied dynamic interactions between skills and contexts are analysed for their ability to broaden the child’s world beyond the here and now and mould the language surge.

Keywords

Multimodality, communication, child development, embodiment, extended cognition

Karadöller and colleagues (2025) provide a purposeful review of studies highlighting the relevance of pointing and iconicity in children’s first language acquisition. Their endeavour is praiseworthy, as it allows us to better understand the central and diverse roles of multimodality in spoken and signed languages. In particular, this work shows how spoken and signed languages may share some common ground in their reliance on pointing and iconic components, while maintaining relevant differences in the ways in which these components are exploited (Boyes Braem & Volterra, 2023; Pizzuto et al., 2007). The target article also outlines a viable way in which multimodality may be effectively analysed in spoken and signed language acquisition. On this point, however, we are compelled to move beyond the theoretical perspective proposed by the authors and seek a more radical one. In particular, we argue here that to fully embrace a multimodal approach to language in childhood, we need to abandon speech-centred approaches and what have also been termed ‘brainbound’ approaches (Clark, 2010). This will involve considering other skills, beyond the visual-perceptual domain, while attempting to capture their dynamic relations and allowing a more embodied and embedded perspective on language and its unfolding in time and space (Capirci et al., 2022; Pouw et al., 2014).

First of all, we need to abandon implicit references to speech-centred theoretical approaches. A viable way to do so is to avoid describing the multimodal components of communicative acts only in relation to speech-based structures. For example, in spoken languages, pointing is more than an ‘indicator’, ‘anticipator’ or ‘predictor’ of first words or a ‘scaffold’ for narrative recall (Karadöller et al., 2025). Pointing is a deictic gesture with its own developmental trajectory, requiring specific patterns of gaze coordination and motor planning (Masataka, 2003; Sparaci, 2013). Furthermore, while linked to patterns of joint attention and to the presence of a referent in the immediate surroundings, pointing can prove extremely flexible, even when used in the absence of speech (e.g., think of the multiple uses of pointing for oneself). Along the same lines, iconic gestures are more than just tools to ‘compensate’ for a lack of verbal knowledge, lexical terminology or missing information; they should not be considered as mere acts ‘supporting’ communication of spatial relations and conceptually challenging domains or ‘introducing’ character perspectives before speech (Karadöller et al., 2025). On the contrary, iconic gestures are rooted in motor experiences with objects and actions in the real world; they are a reflection of children’s lived stories, carrying contents and coordinating symbolic skills that may be linked, but are never limited, to speech-based structures (Sparaci & Gallagher, 2024; Sparaci & Volterra, 2017). Finally, iconicity in signed languages should not be considered as leading to an advantage in vocabulary development for signing children, but rather analysed in terms of semantic similarities in first words and signs between speaking and signing children (Rinaldi et al., 2014, 2018; Volterra & Iverson, 1995).
Overall, we need to abandon the tendency to consider multimodal components as the support band to speech, which ends up being the headliner, firmly established at centre stage in language development. To do so, we suggest considering the role of multimodal components in themselves or in relation to skills other than speech, which may even be more closely related to them (e.g., consider the relation between iconic gestures and grasping) (Sparaci & Volterra, 2017). Therefore, rather than calling, as the target article does, for a ‘revision’ of current theories on language and cognitive development, we propose a bolder approach, aiming to de-structure current speech-centred theories.

In dethroning speech, we can start by considering the multiple and different dimensions that characterize multimodal communication, starting from early actions through gestures (Volterra et al., 2018). This implies moving beyond the purely visuo-perceptual domain, as rightly underscored by Karadöller and colleagues (2025). In fact, vision is not nearly enough, and multiple components (e.g., auditory, motor, proprioceptive, postural) need to be considered, as well as the relations between them (Schroer & Yu, 2023). It is important to recognize that these components may influence language to various degrees. For example, gross motor skills may have an indirect effect: as children acquire independent sitting or walking, they can free their hands and extend their peri-personal space, which will in turn affect both gestures and vocabulary (Clearfield, 2011; Iverson, 2022; Karasik et al., 2011; Schroer & Yu, 2023; Slone et al., 2019; West et al., 2019). In other cases, the relation is more direct and embodied. For example, during communicative acts, we directly experience the physical relation, or entanglement, between body motions, respiration, and vocal activities (Pouw & Fuchs, 2022). Both indirect and direct influences play a role in the here and now of the communication flow, which requires the dynamic interrelation of multiple skills.
In a way, by de-structuring language we gain new tools, partially borrowed from dynamic systems approaches: a major emphasis on time (i.e., language unfolds in the here and now, and each communicative step is influenced by the ones that preceded it and moulds the ones that follow); the concept of language as multiply determined and softly assembled from the non-linear coordination of multiple skills (e.g., auditory, visual, motor and proprioceptive feedback requires constant flexible adjustments by participants in a communicative exchange); and an emphasis on cognition as essentially embodied and embedded (i.e., no communication can effectively be carried out in a void, but everything occurs in a situated context) (Spencer et al., 2006).

Mentioning contexts leads to our second and final argument: in order to overcome speech-centred theorizing, we should also start abandoning an exclusively brainbound perspective on language, in favour of more extended and situated models (Clark, 2010). Language acquisition is always influenced by extraorganismic environments (i.e., the different social, cultural and material contexts in which it is embedded): in fact, it is an activity-in-the-world (Bonsignori & Sparaci, 2022; Capirci et al., 2022). These contexts are not mere backstages or scaffolds but actively contribute towards shaping multimodal communication (Clark, 2011). For example, during dyadic interactions with toddlers, adults have been observed to modify their tone and their use of emotional expressions to signal the presence of pretend scenarios (Lillard & Witherington, 2004; Nishida & Lillard, 2007). Mothers may change pitch or smile more frequently when pretending to eat than when really eating (Lillard & Witherington, 2004). This is an example of how a specific communicative context may require modulating multimodal cues (e.g., accompanying a specific gesture or action with a smile or a higher tone of voice) to enhance communicative intent. To fully capture how contexts may mould communicative structures, we therefore need to accompany our focus on the communicative actors in the here and now with broader considerations of the social, cultural and material world of people and things surrounding the child. This will also allow a better, non-WEIRD understanding of cross-linguistic differences (Henrich, Heine & Norenzayan, 2010; Sparaci & Gallagher, 2023).

Once we move beyond speech-centred and brainbound approaches, we can start grasping the functional relations among multiple skills that structure a communicative act. While a full exploration of this topic is beyond the scope of the present commentary, we can attempt to provide an example of how this may be done. Consider the case in which a child describes the spatial relation of objects using only speech or, alternatively, using speech and gestures, as mentioned in Karadöller and colleagues (2025). These two forms of communication are both valid and effective, but their use, according to what we stated above, will largely depend on communicative contexts. Ruth Millikan once argued that certain performative acts are only directive (representing what is to be done), while others are both directive and descriptive (also describing what is the case) (Millikan, 1995). But the choice among them is moulded by the specific contexts or circumstances in which they are used: a bid may be made by explicitly stating “I bid” in some contexts or by simply raising a finger in others (Millikan, 1995, p. 195). So, an important question, if we are to avoid a brainbound model, is which contexts may lead a child to use speech-gesture combinations rather than relying exclusively on speech. Then we can move on to analyse not only the spatial and temporal characteristics of the gestures and speech used, but also other multimodal components (e.g., motor planning, object affordances, proprioceptive space, posture), how they interact dynamically in time and how they are soft-programmed to allow embodied communicative acts which adapt to constant changes in the communication flow.

In conclusion, Karadöller and colleagues’ (2025) work has the great merit of setting our compass towards taking seriously the inherently multimodal nature of child communication. But to fully capture the importance of multimodality in language acquisition, we need to wander further away from the beaten path, possibly abandoning speech-centred and brainbound models, which still characterize the literature on multimodality in language acquisition.

Footnotes

Author contributions

Sparaci, Laura: Conceptualization; Writing – original draft; Writing – review & editing.

Volterra, Virginia: Conceptualization; Writing – review & editing.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

References

Boyes Braem, P., & Volterra, V. (2023). Through the sign language glass: Changing and converging views in spoken and signed language research. In T. Janzen & B. Shaffer (Eds.), Signed language and gesture research in cognitive linguistics (pp. 23–47). De Gruyter Mouton.

Bonsignori, C., & Sparaci, L. (2022). Playing for the sake of playing: An enactive account of pretend play in childhood. Rivista di psicolinguistica applicata, 22(2), 31–43.

Capirci, O., Caselli, M. C., & Volterra, V. (2022). Interaction among modalities and within development. In A. Morgenstern & S. Goldin-Meadow (Eds.), Gesture in Language: Development Across the Lifespan (pp. 103–133). De Gruyter Mouton; American Psychological Association.

Clark, A. (2011). Supersizing the mind: Embodiment, action, and cognitive extension. Oxford University Press.

Clearfield, M. W. (2011). Learning to walk changes infants’ social interactions. Infant Behavior and Development, 34(1), 15–25.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. http://doi.org/10.1017/S0140525X0999152X

Iverson, J. M. (2022). Developing language in a developing body, revisited: The cascading effects of motor development on the acquisition of language. Wiley Interdisciplinary Reviews: Cognitive Science, 13(6), e1626.

Karadöller, D. Z., Sümer, B., & Özyürek, A. (2025). First language acquisition in a multimodal language framework: Insights from speech, gesture and sign. First Language, 0(0). https://doi.org/10.1177/01427237241290678

Karasik, L. B., Tamis-LeMonda, C. S., & Adolph, K. E. (2011). Transition from crawling to walking and infants’ actions with objects and people. Child Development, 82(4), 1199–1209.

Lillard, A. S., & Witherington, D. C. (2004). Mothers’ behavior modifications during pretense and their possible signal value for toddlers. Developmental Psychology, 40(1), 95.

Millikan, R. G. (1995). Pushmi-pullyu representations. Philosophical Perspectives, 9, 185–200.

Masataka, N. (2003). From index-finger extension to index-finger pointing: Ontogenesis of pointing in preverbal infants. In S. Kita (Ed.), Pointing: Where Language, Culture, and Cognition Meet (pp. 69–109). Lawrence Erlbaum Associates Publishers.

Nishida, T. K., & Lillard, A. S. (2007). The informative value of emotional expressions: ‘Social referencing’ in mother–child pretense. Developmental Science, 10(2), 205–212.

Pizzuto, E., Pietrandrea, P., & Simone, R. (Eds.). (2007). Verbal and Signed Languages: Comparing structures, constructs and methodologies. Mouton de Gruyter.

Pouw, W., & Fuchs, S. (2022). Origins of vocal-entangled gesture. Neuroscience & Biobehavioral Reviews, 141, 104836.

Pouw, W. T., De Nooijer, J. A., Van Gog, T., Zwaan, R. A., & Paas, F. (2014). Toward a more embedded/extended perspective on the cognitive function of gestures. Frontiers in Psychology, 5, 359.

Rinaldi, P., Caselli, M. C., Di Renzo, A., Gulli, T., & Volterra, V. (2014). Sign vocabulary in deaf toddlers exposed to sign language since birth. Journal of Deaf Studies and Deaf Education, 19(3), 303–318.

Rinaldi, P., Caselli, M. C., Lucioli, T., Lamano, L., & Volterra, V. (2018). Sign language skills assessed through a sentence reproduction task. The Journal of Deaf Studies and Deaf Education, 23(4), 408–421.

Schroer, S. E., & Yu, C. (2023). Looking is not enough: Multimodal attention supports the real-time learning of new words. Developmental Science, 26(2), e13290.

Sparaci, L. (2013). Beyond the point: A basic guide to literature on pointing abilities in children with autism. Humana.Mente Journal of Philosophical Studies, 6(24), 177–202.

Sparaci, L., & Gallagher, S. (2023). A kaleidoscope of play: A new approach to play analysis in childhood. Philosophical Psychology, 38(2), 718–747.

Sparaci, L., & Gallagher, S. (2024). Continuity through change: How gestures inform current debates on the ontogeny. In G. Maddalena, F. Ferrucci, M. Bella, & M. Santarelli (Eds.), Gestures: Approaches, Uses, and Developments (pp. 251–271). De Gruyter.

Sparaci, L., & Volterra, V. (2017). Hands shaping communication: From gestures to signs. In M. Bertolaso & N. Di Stefano (Eds.), The Hand and Human Identity: Perception, Cognition, Action (pp. 29–54). Springer.

Spencer, J. P., Clearfield, M., Corbetta, D., Ulrich, B., Buchanan, P., & Schöner, G. (2006). Moving toward a grand theory of development: In memory of Esther Thelen. Child Development, 77(6), 1521–1538.

Slone, L. K., Smith, L. B., & Yu, C. (2019). Self-generated variability in object images predicts vocabulary growth. Developmental Science, 22(6), e12816.

Volterra, V., Capirci, O., Rinaldi, P., & Sparaci, L. (2018). From action to spoken and signed language through gesture: Some basic developmental issues for a discussion on the evolution of the human language-ready brain. Interaction Studies, 19(1–2), 216–238.

Volterra, V., & Iverson, J. M. (1995). When do modality factors affect the course of language acquisition? In K. Emmorey & J. S. Reilly (Eds.), Language, Gesture, and Space (pp. 371–390). Erlbaum.

West, K. L., Leezenbaum, N. B., Northrup, J. B., & Iverson, J. M. (2019). The relation between walking and language in infant siblings of children with autism spectrum disorder. Child Development, 90(3), e356–e372.