Abstract
While numerous studies have examined robot-assisted language learning (RALL) for the language skill development of English-as-a-foreign-language (EFL) learners, a comprehensive, theoretically driven meta-analysis of its effects is still lacking. To fill this gap, drawing on Activity Theory (AT), this study reports a meta-analysis of 47 independent samples from 29 studies, involving 1,791 EFL learners, on RALL for language skill development published during 2004–2023. The results indicated an overall effect size of g = .69, 95% CI [.49, .90], suggesting that RALL outperforms non-RALL conditions. In addition, educational levels and intervention durations were found to be significant moderators. Based on these results, implications for practice are discussed.
Keywords
Introduction
English-as-a-foreign-language (EFL) learners’ language skill development is defined as progress in English language ability in listening, speaking, reading, and writing, which is crucial to their communication and future careers in an increasingly globalized society (El Shazly, 2021). However, EFL learners often perceive language learning to be difficult because of insufficient exposure to authentic language contexts (Tai & Chen, 2022), a lack of motivation for EFL learning (Lee et al., 2011; Meng & Li, 2023; Tsai, 2019), and a mismatch between teaching strategies and learning styles (Huang, 2005). To address these problems, researchers (e.g., Alemi et al., 2015; Banaeian & Gilanlioglu, 2021; Engwall & Lopes, 2022; Jeon, 2023) have adopted robot-assisted language learning (RALL) tools to facilitate language learning and teaching, given that social robots can imitate natural human-to-human conversations (El Shazly, 2021), play various social roles (Hsu & Liang, 2021), adapt teaching to learners’ aptitude (Belpaeme et al., 2013), and provide personalized feedback and authentic interactions (Li et al., 2021; Wang et al., 2013).
By definition, RALL refers to “the use of social robots to teach people language expression or comprehension skills” (Randall, 2020, p. 2). The pedagogical affordances of RALL have been well documented, including the development of domain-specific language skills, e.g., vocabulary (Banaeian & Gilanlioglu, 2021), speaking (El Shazly, 2021; Iio et al., 2019), listening (Dizon, 2020; Hsiao et al., 2015), and reading (Kim, 2018; Tai & Chen, 2022), along with domain-general EFL development (Hsu & Liang, 2021; Lee et al., 2011; Wu et al., 2015). While previous studies (e.g., Lee et al., 2011; Park et al., 2011) have empirically explored the use of RALL for EFL skill development, and most of them reported positive effects of RALL on language acquisition (Kory Westlund et al., 2017; Mubin et al., 2013; Vogt et al., 2019), there is still a lack of research that synthesizes comprehensive evidence of its overall effects on EFL learning, let alone from a theoretical perspective. This gap matters because a holistic, theoretically driven understanding of RALL’s overall effects can not only offer new insights into its effectiveness but also reveal how potential moderators shape that effectiveness, which in turn informs pedagogy. To narrow the gap, this study aims to (a) meta-analyze the results of an exhaustive retrieval of previous RALL research, and (b) gain a deeper understanding of how potential moderators affect RALL under the framework of Activity Theory (AT; Engeström, 2001), since AT-related factors such as subjects, objects, community, rules, division of labor, and tools are crucial in RALL (Banaeian & Gilanlioglu, 2021; Tai & Chen, 2022).
Related Studies on Robot-Assisted Language Learning for English-as-a-Foreign-Language Skill Development
Currently, given the important pedagogical affordances of RALL, researchers have adopted quasi-experimental designs to explore the effects of RALL on language skill development along two strands of inquiry: its pedagogical effects compared with traditional methods (e.g., ordinary paper-and-pencil instruction) and compared with other educational technologies (e.g., noncomputer-based media or web-based instruction).
One strand of related studies has examined the effects of RALL on EFL learners’ language performance compared with traditional learning methods (Dizon, 2020; Hsiao et al., 2015; Hsieh et al., 2023), but no consensus has been reached thus far. For instance, Hong et al. (2016) conducted a quasi-experiment to investigate EFL learners’ language performance in a RALL setting and found that learners in the experimental group, who used social robots, had lower anxiety, better self-esteem, and stronger learning motivation than their counterparts in the control group. Similarly, Hsieh et al. (2023) explored how social robots might promote EFL learners’ speaking skills and posited that social robots provide more contextualized and personalized feedback than traditional repeat-after-me pronunciation instruction. Despite these promising results, other studies have been less favorable (e.g., Banaeian & Gilanlioglu, 2021; Kanda et al., 2004; Kanero et al., 2022). For instance, Banaeian and Gilanlioglu (2021) compared RALL with traditional learning methods for college EFL learners’ vocabulary learning and found that learners using traditional methods outperformed their counterparts who used RALL. They attributed this discrepancy to such moderators as participants’ educational levels, the role of the robot, and intervention duration. Kanero et al. (2022) examined the effect of RALL on learners’ language performance in relation to individual difference variables (viz. attitude, anxiety, and personality). To do so, they randomly assigned EFL learners to a NAO robot tutor group or a human tutor group, and found that learners with more negative attitudes and higher anxiety towards robots learned fewer words than those with more positive attitudes and lower anxiety.
The other strand has compared the effects of RALL on language skill development with those of other technologies, e.g., web-based instruction (Han et al., 2005, 2008) and tablet PCs (Hsiao et al., 2015; Konijn et al., 2021). For instance, Han et al. (2008) compared the effects of web-based instruction and RALL on EFL learners’ language skills and found more positive effects for social robots than for computers, as social robots can also offer various forms of expression, such as motions, gestures, and facial expressions. Liang and Hwang (2023) compared web-based instruction and RALL on language learning outcomes and found that RALL outperformed web-based instruction because it could provide EFL learners with rich multimodal materials, more contextual interactions, and embodied learning experiences. Hsiao et al. (2015) conducted a quasi-experiment comparing the effectiveness of a tablet PC and RALL on EFL learners’ language skills and likewise reported that RALL was a more effective learning companion than the tablet PC, since learning materials installed on the highly interactive RALL tool could significantly promote learner motivation. In another quasi-experiment, Konijn et al. (2021) observed that children trained with a social robot learned more new words than those trained with a tablet, because they perceived the robot as more humanlike than the tablet and were thus more engaged in the learning task.
Related Reviews of Robot-Assisted Language Learning for English-as-a-Foreign-Language Skill Development
Aside from these empirical attempts, researchers (Uslu et al., 2022; Randall, 2020; van den Berghe et al., 2019) have adopted an evidence-based applied linguistics (EBAL) account holding that “pedagogical intervention should be supported with sound evidence available” (Li, 2023a, p. 36), and have begun to synthesize current trends of RALL for learners’ language skill development. For instance, van den Berghe et al. (2019) reviewed 33 RALL studies and discussed the possibilities and limitations of using social robots to improve language skills (e.g., vocabulary, reading, and speaking). Their findings were mixed: social robots could offer interaction possibilities in a real-life environment, but several issues remained to be addressed. That is, more research is needed to determine the most effective type of robot role (e.g., tutor or teaching assistant), the age groups for which social robots are most beneficial (e.g., preschool children or college students), and the optimal intervention durations. Likewise, Randall (2020) conducted a qualitative review of RALL studies published between 2004 and 2017 and claimed that social robots can facilitate foreign language acquisition. Nevertheless, this review seems to raise more questions than it answers, because a qualitative review cannot systematically analyze how moderators (e.g., different EFL language skills, control conditions, and types of robot form) affect the effectiveness of RALL.
Although the aforementioned qualitative reviews contribute to understanding the current status of RALL and the potential moderators of its effectiveness, such as age groups, robot roles, interaction types, and intervention durations, they do not provide a comprehensive quantitative analysis of RALL’s pedagogical effects, and it remains unclear how those moderators affect RALL’s effectiveness. To our knowledge, only Lee and Lee (2022) have meta-analyzed previous experimental studies on RALL, with a limited sample (k = 12). While that meta-analysis sheds some light on RALL’s effectiveness, we argue that the present paper goes beyond the aforementioned reviews and meta-analysis in the following aspects. First, Lee and Lee’s (2022) meta-analysis included only 12 independent samples. A scrutiny of those 12 studies shows that they were sporadically distributed across years and that the latest were published in 2019 (Tsai, 2019; Vogt et al., 2019), which may limit generalizability. In contrast, given the rapid development of social robots empowered by generative artificial intelligence (AI), such as ChatGPT, in recent years (2020–2023), our paper provides an updated and timely synthesis of RALL research published from 2004 to 2023 that can inform scholarly activity on the state of the art of social robots in language education. Second, some potential moderators, such as intervention duration and specific language domains, had to be overlooked in that meta-analysis because of its small sample. Third and most importantly, while Lee and Lee (2022) included a limited set of moderators (e.g., roles of robots, interaction type, and control conditions), their meta-analysis was not grounded in a solid theoretical framework.
Existing studies (Banaeian & Gilanlioglu, 2021; Hsu & Liang, 2021; Tai, 2022; Tai & Chen, 2022) have demonstrated that such factors as subjects, objects, community, rules, division of labor, and tools play a crucial role in RALL, necessitating an Activity Theory (AT)-driven meta-analysis that examines these dimensions with a larger sample.
Current Study
Based on the existing gaps, this meta-analysis that includes more studies aims to examine the overall effects of RALL on EFL learning while addressing some of the complexities regarding the outcomes of RALL. Moreover, compared with previous relevant studies that were not supported by a well-grounded theoretical framework, this study adopted AT as a framework to provide a more systematic analysis of some key moderators of RALL.
In its original form, AT described how desired outcomes are mediated by psychological tools and involved subjects, objects, and mediating artifacts (Vygotsky, 1978). Later, to account for the interrelationships between subjects and their community, community was added (Leont’ev, 1981). Furthermore, to describe the interactivity between two or more activity systems, Engeström (1987) proposed that the interaction process includes not only subjects, objects, and tools, but also rules, community, and division of labor. Rules refer to the norms, guidelines, and social relations within a community, whereas division of labor involves the distribution of tasks among community members (Engeström, 2001).
Because AT bridges the gap between subjects and their community through mediating activity, it serves as a working framework for both general human-technology interaction (Lin et al., 2019) and specific human-robot interaction (Tlili et al., 2020). For example, Tlili et al. (2020) adopted AT to conduct a content analysis of robot-assisted special education, which sheds some theoretical light on this study. In this study (as shown in Figure 1), subjects refer to participants with different educational levels in RALL research (Li, 2022a). Tools include the type of robot form, given its important impact on the effectiveness of RALL (Randall, 2020). Objects consist of specific language skills, e.g., listening, speaking, reading, and writing (Li, 2022b, 2024). Labor division refers to the distribution of duties between teachers and robots, covering the types of teacher role and robot role, because the various social roles of teachers or robots might influence the process of RALL (Alemi et al., 2015). Community is defined as the interactions between students and social robots, of either the in-group or the one-on-one type (Uslu et al., 2022). Rules refer to the principles for implementing RALL interventions, including intervention durations and control conditions.
The proposed framework of Activity Theory for RALL.
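For readers who code studies against this framework, the mapping of AT components to moderators can be written down as a simple lookup table. The Python sketch below is illustrative only: the dictionary name, the helper function, and the moderator labels are our own paraphrases, not identifiers from the authors' coding materials.

```python
# Hypothetical encoding of the AT-based coding frame; labels paraphrase the text.
AT_FRAMEWORK = {
    "subjects": ["educational level"],
    "tools": ["type of robot form"],
    "objects": ["language skill"],
    "division of labor": ["type of teacher role", "type of robot role"],
    "community": ["interaction type"],
    "rules": ["intervention duration", "control condition"],
}


def component_of(moderator):
    """Return the AT component under which a moderator is coded."""
    for component, moderators in AT_FRAMEWORK.items():
        if moderator in moderators:
            return component
    raise KeyError(f"moderator not in the framework: {moderator}")


if __name__ == "__main__":
    # List each component with its moderators, then look one up
    for component, moderators in AT_FRAMEWORK.items():
        print(f"{component}: {', '.join(moderators)}")
    print(component_of("control condition"))
```

Keeping each moderator under exactly one component makes it easy to check that all eight moderators are covered and that none is double-coded.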
To reiterate, this study aims to comprehensively meta-analyze the effects of RALL for EFL learners’ language skill development. Specifically, two research purposes are to be achieved. First, it aggregates the overall effect sizes of RALL for EFL learning performance. Second, drawing on the theoretical underpinnings of AT, it reports the moderating effects of educational levels, types of robot form, types of robot role, language skills, types of teacher role, interaction types, intervention durations, and control conditions on the overall effect. Consequently, two questions are to be addressed as follows:
What is the overall effect size of RALL for EFL skill development?
How do AT-related moderators, such as educational levels, types of robot form, types of robot role, language skills, types of teacher role, interaction types, intervention durations, and control conditions, affect the aggregated effect size?
Methodology
Data Collection
Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009), the following retrieval procedures were strictly observed. First, drawing on the insights of recently published reviews (Uslu et al., 2022; Randall, 2020; Tlili et al., 2020; van den Berghe et al., 2019), the following robot-related and language-learning-related keywords were combined with Boolean operators: (robot-assisted language learning OR RALL OR robot applications OR education robots OR social robots OR chatbot) AND (EFL OR English OR language OR language performance OR language achievement OR reading OR writing OR listening OR speaking OR vocabulary OR education OR language skill OR language ability). Second, a systematic literature search was conducted via several online databases (e.g., Web of Science, ScienceDirect, Springer, ProQuest, Wiley, ERIC, Scopus), search engines (Google Scholar and Baidu Scholar), and relevant journals on educational technology (e.g., Journal of Educational Computing Research; Journal of Computer Assisted Learning; Educational Technology & Society; Interactive Learning Environments; Computers & Education; British Journal of Educational Technology; Computer Assisted Language Learning; ReCALL; Language Learning & Technology; System; Computational Linguistics; Learning, Media and Technology; Education and Information Technologies; Journal of Computing in Higher Education; Technology, Pedagogy and Education; Mobile Media & Communication; International Journal of Social Robotics; International Journal of Humanoid Robotics; IEEE Transactions on Learning Technologies) and language education (e.g., Language Teaching; Language and Education; Linguistics and Education; Journal of Language, Identity & Education; Foreign Language Annals; Reading and Writing; Journal of Research in English).
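To illustrate how the two keyword groups combine into a single search string, the sketch below assembles the Boolean query from lists. Quoting multi-word phrases is an assumption on our part; each database has its own phrase and field syntax, so the output would need adapting per database. The function names are hypothetical.

```python
# Keyword groups as listed in the search procedure above.
ROBOT_TERMS = [
    "robot-assisted language learning", "RALL", "robot applications",
    "education robots", "social robots", "chatbot",
]
LANGUAGE_TERMS = [
    "EFL", "English", "language", "language performance", "language achievement",
    "reading", "writing", "listening", "speaking", "vocabulary", "education",
    "language skill", "language ability",
]


def or_group(terms):
    """Join terms with OR; multi-word terms are quoted (assumed phrase syntax)."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"


def build_query(*groups):
    """Combine OR-groups with AND, giving one Boolean search string."""
    return " AND ".join(or_group(g) for g in groups)


if __name__ == "__main__":
    print(build_query(ROBOT_TERMS, LANGUAGE_TERMS))
```

Generating the string from lists keeps the query identical across databases and makes it easy to report the exact terms used.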
Third, to avoid missing a significant portion of the relevant literature, backward and forward citation searches based on related reviews (e.g., Uslu et al., 2022; Mubin et al., 2013; van den Berghe et al., 2019) were conducted. The inclusion and exclusion criteria are diagrammed in Figure 2 and were applied as follows: (1) Thirty-four studies were excluded because full texts were unavailable for 30 of them and four were conference abstracts. (2) Included studies had to be (quasi-)experimental studies that examined the effectiveness of RALL on EFL skill development; studies (k = 16) that used social robots in other disciplines (e.g., mathematics, technology, native language) were excluded. For instance, a study that aimed to evaluate an artificial robot language (Mubin et al., 2012) was excluded. (3) Included studies had to report experimental results on language skills measured by standardized examinations or researcher-designed tests; publications (k = 24) that investigated learners’ perceptions or offered pedagogical or theoretical recommendations were excluded. For instance, a qualitative study (Jeon, 2022) that explored young learners’ perspectives on AI chatbot affordances in the EFL classroom was excluded. (4) Included studies had to report sufficient data to calculate effect sizes, such as means, standard deviations (SDs), sample sizes, t values, or F values; studies (k = 10) without sufficient data were excluded. For instance, Lin et al. (2022) examined the impact of educational robots on EFL vocabulary learning but reported only total test scores rather than means or SDs, so this study was excluded.
Flow diagram for the search and inclusion of studies.

Coding Scheme
The Descriptive Information of Coding Scheme.
Effect Size Calculation
Hedges’ g was used to calculate effect sizes because several included studies had small samples (Lipsey & Wilson, 2001). The equation for calculation is formulated as follows:
$$g = \left(1 - \frac{3}{4(n_T + n_C) - 9}\right)\frac{Mean_T - Mean_C}{\sqrt{\dfrac{(n_T - 1)SD_T^2 + (n_C - 1)SD_C^2}{n_T + n_C - 2}}}$$
where $Mean_T$, $n_T$, and $SD_T$ represent the mean, sample size, and standard deviation of the treated group, respectively, and $Mean_C$, $n_C$, and $SD_C$ represent the mean, sample size, and standard deviation of the control group, respectively (Hedges & Olkin, 1985).
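The computation can be sketched in Python as below, assuming the pooled-SD form with the small-sample correction factor J = 1 − 3/(4(n_T + n_C) − 9) (Lipsey & Wilson, 2001). The variance formula and the helper for studies that report only an independent-samples t value (which the inclusion criteria allow) are standard large-sample approximations added for illustration; the function names are hypothetical and the input numbers are made up.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for a treatment-vs-control comparison."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    # Large-sample variance of d, rescaled by J^2
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


def hedges_g_from_t(t, n_t, n_c):
    """Hypothetical helper: g from an independent-samples t when means/SDs are missing."""
    d = t * math.sqrt((n_t + n_c) / (n_t * n_c))
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    return j * d


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any included study
    g, var_g = hedges_g(80, 10, 20, 75, 10, 20)
    print(round(g, 3), round(var_g, 4))
    print(round(hedges_g_from_t(1.5811, 20, 20), 3))
```

Both routes give the same g for the same underlying data, which is a quick consistency check when a study reports t instead of descriptive statistics.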
Outlier Diagnosis
Potential outliers that yielded extremely large effect sizes should be excluded from calculation (Higgins et al., 2019). According to Lipsey and Wilson (2001), potential outliers refer to extreme effect sizes that are more than three SDs from the mean of all the effect sizes. In this case, one study (g = 3.99, Hsu et al., 2021) was excluded, resulting in a total of 29 valid studies for the forthcoming analysis.
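The three-SD rule described above can be sketched as follows. The effect sizes in the usage example are made up for illustration, with one extreme value appended; the function name is hypothetical.

```python
import statistics


def flag_outliers(effect_sizes, k_sd=3.0):
    """Indices of effect sizes lying more than k_sd SDs from the mean of all effect sizes."""
    mean = statistics.mean(effect_sizes)
    sd = statistics.stdev(effect_sizes)  # sample SD across all effect sizes
    return [i for i, g in enumerate(effect_sizes) if abs(g - mean) > k_sd * sd]


if __name__ == "__main__":
    # Made-up effect sizes with one extreme value appended at the end
    gs = [0.5, 0.6, 0.7, 0.4, 0.8, 0.6, 0.5, 0.7, 0.6, 0.9, 0.3, 0.6, 0.5, 0.7, 3.99]
    print(flag_outliers(gs))
```

Because the extreme value itself inflates the SD, this rule is conservative: with ten or fewer effect sizes, no single value can lie more than three sample SDs from the mean.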
Publication Bias Analysis
Studies with positive findings are more likely to be published, which may result in publication bias (Rosenthal, 1991). To check for it, both visual (i.e., a funnel plot; see Figure 3) and statistical (i.e., Egger’s test) checks were employed. Egger’s test is a linear regression method used to test funnel-plot symmetry. “It uses the standardized estimate with size effect as a dependent variable and the inverse of the standard error as an independent variable. If the intercept is significantly different from zero, the estimate of the effect is considered biased” (Shi et al., 2017, p. 15). In this study, Egger’s test was significant (p = .001), indicating funnel-plot asymmetry and thus possible publication bias. The trim-and-fill method was therefore used to assess and adjust for publication bias (Møller & Jennions, 2001). Compared with the original estimate of g = .692 (95% CI [.486, .897]), the adjusted mean effect size was unchanged (g = .692, 95% CI [.486, .897]) when zero missing studies were imputed, and was g = .816 (95% CI [.604, 1.029]) when five missing studies were imputed (both ps < .05). The conclusion therefore appears robust even if five studies are assumed to be missing owing to publication bias.
Funnel plot of the selected studies.
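A minimal standard-library sketch of Egger's regression is given below: it fits an ordinary least-squares line of g/SE on 1/SE and returns the intercept with its standard error and t statistic. A two-sided p value would then come from a t distribution with n − 2 degrees of freedom (for example via `scipy.stats.t.sf`, if SciPy is available). The inputs are hypothetical and the function name is ours.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test: OLS of g/SE on 1/SE.

    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    An intercept far from zero signals funnel-plot asymmetry.
    """
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effects
    x = [1 / se for se in std_errors]                    # precision
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = n - 2
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / df * (1 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, df


if __name__ == "__main__":
    # Hypothetical study-level inputs in which smaller studies show larger effects
    gs = [2.1, 1.45, 1.4, 1.2]
    ses = [1.0, 0.5, 1 / 3, 0.25]
    b0, se_b0, t, df = egger_test(gs, ses)
    print(round(b0, 2), round(t, 2), df)
```

The positive intercept in this toy example reflects the built-in small-study pattern; with real data, the test has low power when k is small.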
Results
Overall Analysis Results
Overall Effect Size Results of the Included Studies.
Note. k = number of effect sizes; N = number of participants; g = Hedges’ g; LL = lower limit, UL = upper limit; 95% CI = 95% confidence interval; ***p < .001.
Homogeneity Analysis Results
A between-study Q test was executed to examine whether there was substantial variability in the outcomes of the primary studies and thus a need for moderator analyses (Borenstein et al., 2005, 2009). In this study (see Table 2), the Q value was 325.71 (p < .001), indicating significant heterogeneity and the need for moderator analysis.
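The Q statistic can be computed from study-level Hedges' g values and their variances. The sketch below also reports I² and a DerSimonian-Laird random-effects estimate; whether the authors used this particular estimator is not stated in this section, so treat it as an assumption. The chi-square tail probability uses a standard-library recursion for integer degrees of freedom, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df (stdlib only)."""
    y = x / 2
    p, k = (math.exp(-y), 2) if df % 2 == 0 else (math.erfc(math.sqrt(y)), 1)
    while k < df:
        p += y ** (k / 2) * math.exp(-y) / math.gamma(k / 2 + 1)
        k += 2
    return p


def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (%), and a DerSimonian-Laird random-effects pooled g with 95% CI."""
    w = [1 / v for v in variances]                     # inverse-variance weights
    g_fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - g_fixed) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    g_re = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "Q": q, "df": df, "p": chi2_sf(q, df), "I2": i2, "tau2": tau2,
        "g": g_re, "ci": (g_re - 1.96 * se_re, g_re + 1.96 * se_re),
    }


if __name__ == "__main__":
    # Two hypothetical effect sizes with equal variances
    print(heterogeneity([0.2, 0.8], [0.04, 0.04]))
```

A large Q relative to its degrees of freedom, as reported for this sample, is what motivates the subgroup (moderator) analyses that follow.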
Moderator Analysis Results
The moderator analysis examined the moderating effects of eight moderators: educational levels, language skills, intervention durations, control conditions, interaction types, types of teacher role, types of robot role, and types of robot form. As shown in Table 2, two moderators, educational levels and intervention durations, had a significant moderating effect on the overall effect size, whereas the other moderators did not.
Subjects
Educational Levels
There were three educational levels: primary (k = 26), secondary (k = 6), and tertiary (k = 15). A significant between-group difference was found across educational levels, Q between = 6.56, p = .038. According to Table 2, RALL was significantly effective for learners at the secondary (g = 1.098, 95% CI [.580, 1.617]) and primary (g = .791, 95% CI [.535, 1.047]) levels, but not for those at the tertiary level (g = .329, 95% CI [−.043, .701]).
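To show how a between-group Q of this kind can be obtained, the sketch below recomputes it from the three subgroup estimates and 95% CIs reported for educational levels, back-calculating each SE as (UL − LL)/(2 × 1.96) and testing the spread of the inverse-variance-weighted subgroup means against a chi-square distribution. This assumes the usual mixed-effects subgroup comparison and independent subgroups; because the SEs are recovered from rounded CIs, the result is approximate. The function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.959964):
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def q_between(subgroups):
    """Between-group Q from (g, lower, upper) subgroup summaries; returns (Q, df, p)."""
    w = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in subgroups]  # inverse-variance weights
    g = [est for est, _, _ in subgroups]
    g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    q = sum(wi * (gi - g_bar) ** 2 for wi, gi in zip(w, g))
    df = len(subgroups) - 1
    # Upper-tail chi-square probability for integer df (regularized-gamma recursion)
    y = q / 2
    p, k = (math.exp(-y), 2) if df % 2 == 0 else (math.erfc(math.sqrt(y)), 1)
    while k < df:
        p += y ** (k / 2) * math.exp(-y) / math.gamma(k / 2 + 1)
        k += 2
    return q, df, p


if __name__ == "__main__":
    # Subgroup estimates as reported for educational levels: (g, 95% CI lower, upper)
    levels = {
        "secondary": (1.098, 0.580, 1.617),
        "primary": (0.791, 0.535, 1.047),
        "tertiary": (0.329, -0.043, 0.701),
    }
    q, df, p = q_between(list(levels.values()))
    print(f"Q_between = {q:.2f}, df = {df}, p = {p:.3f}")
```

The same function can be applied to the other moderators (e.g., the three intervention-duration subgroups) as a quick consistency check on reported Q and p values.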
Objects
Language Skills
There were six language skills involved in previous RALL research: listening (k = 5), reading (k = 2), speaking (k = 11), writing (k = 1), vocabulary (k = 18), and language in general (k = 10). As shown in Table 2, RALL was effective in developing learners’ listening (g = .772, 95% CI [.307, 1.236]), speaking (g = .709, 95% CI [.245, 1.172]), vocabulary (g = .637, 95% CI [.244, 1.103]), and language in general (g = .797, 95% CI [.393, 1.201]), but not the other two skills, viz. reading (g = .591, 95% CI [−.279, 1.461]) and writing (g = .398, 95% CI [−.143, .939]). However, there was no significant difference among these language skills, Q between = 1.61, p = .900.
Rules
Intervention Durations
Three categories of intervention durations were involved: short (k = 19), intermediate (k = 14), and long (k = 14). As shown in Table 2, both intermediate (g = .933, 95% CI [.487, 1.379]), and long intervention duration (g = .871, 95% CI [.466, 1.277]) achieved large effect sizes, much higher than that of short intervention duration (g = .377, 95% CI [.162, .592]) with a significant between-group difference, Q between = 7.73, p = .021.
Control Conditions
The moderator effect of control conditions did not reach a significant level, Q between = .32, p = .572, with moderate effect sizes for both the traditional methods (g = .713, 95% CI [.480, .946]) and other technology (g = .577, 95% CI [.164, .989]), suggesting that the effects of RALL studies are unlikely to be biased by the varying configurations of the control groups.
Community
Interaction Types
Interaction types can be roughly divided into two categories: in groups (k = 16) and one-on-one (k = 31). The effect size for in groups (g = .944, 95% CI [.560, 1.329]) was large, and that for one-on-one (g = .568, 95% CI [.333, .803]) was medium. However, the between-group difference did not reach significance, Q between = 2.67, p = .102.
Labor Division
Types of Teacher Role
Teacher roles consisted of three categories: no teacher (k = 34), tutor (k = 11), and assistant (k = 2). While the effect for the assistant role (g = 1.023, 95% CI [−.216, 2.263]) was not significant, the tutor role (g = .790, 95% CI [.469, 1.110]) achieved a large effect size, and the no-teacher condition (g = .643, 95% CI [.384, .903]) had a medium effect. No between-group difference was found, Q between = .74, p = .692.
Types of Robot Role
Social robots can play three roles in RALL: assistant (k = 14), peer (k = 7), and tutor (k = 26). While the effect for the peer role (g = .604, 95% CI [−.035, 1.243]) was non-significant, significant effects were obtained for the assistant role (g = .946, 95% CI [.642, 1.251]) and the tutor role (g = .579, 95% CI [.300, .859]). No between-group difference was observed, Q between = 3.20, p = .202.
Tool
Types of Robot Form
According to social robots’ appearances, robot forms can be divided into three categories, anthropomorphic (k = 31), mechanomorphic (k = 14), and zoomorphic (k = 2). Table 2 showed that mechanomorphic (g = .976, 95% CI [.519, 1.432]) and anthropomorphic social robots (g = .554, 95% CI [.335, .772]) were found to be effective, while the effectiveness of zoomorphic social robots (g = .867, 95% CI [−.547, 2.282]) was not achieved. Moreover, there was no between-group difference, Q between = 2.77, p = .250.
Discussion
Moderator Analysis Results.
Note. k is the number of independent studies available for the certain variable; Hedges’ g is effect size; CI is short for confidence interval; p-value indicates significance
With regard to the first research question, the overall effect size of RALL for EFL skill development compared with non-RALL conditions (g = .69) is in line with most previous RALL research (e.g., Aidinlou, 2014; Kanero et al., 2018; Kory Westlund et al., 2017; Mubin et al., 2013), suggesting that social robots can serve as effective EFL tools. The beneficial effects of RALL might be explained by several factors. First, RALL affords EFL learners rich, authentic language contexts and human-robot interactions. Whereas traditional methods offer limited exposure to authentic language contexts, social robots can play various social roles (Randall, 2020) and provide natural communication for EFL learners by imitating human-to-human conversations (Han et al., 2008; Lin et al., 2022; Tai & Chen, 2022). Second, RALL promotes EFL learners’ positive foreign language (FL) emotions and motivation. That is, social robots have positive effects on EFL learners’ affect, such as motivation (Donnermann et al., 2020; Wu et al., 2015), interest (Han et al., 2005, 2008; Lee et al., 2011), and confidence (Lee et al., 2011; Hong et al., 2016; Tsai, 2019), which have been found to be positively related to learning achievement (Chen et al., 2020). Social robots can support interactive learning activities and allow learners to practice repeatedly in a relaxed atmosphere (Lee et al., 2011), which might help increase learner motivation, build confidence, and lower anxiety (Alemi & Bahramipour, 2019; Vogt et al., 2019). Third, Dual Coding Theory (DCT) assumes that the interconnection of a verbal channel and a nonverbal channel can decrease the cognitive load on working memory (Paivio, 1991). Social robots can relieve this load by combining verbal behaviors (e.g., offering feedback, saying a learner’s name) (Tlili et al., 2020; van den Berghe et al., 2019) with nonverbal behaviors (e.g., gestures, body movements) (Banaeian & Gilanlioglu, 2021; Donnermann et al., 2020), which can in turn facilitate language development (Li, 2021). Fourth, social robots can support personalized learning, since they can update learning content in a timely manner and provide feedback that meets individual needs and preferences (Belpaeme et al., 2013).
The second research question sought a deeper understanding of how potential AT-related moderators affect the effectiveness of RALL. The significant moderators, educational levels and intervention durations, are discussed first. On the one hand, the moderator analysis of educational levels showed that RALL was effective for learners at the primary and secondary levels but not at the tertiary level, resonating with previous studies suggesting that RALL might be more beneficial for young learners than for older learners (Kanda et al., 2004; Zhang et al., 2021). To elucidate, while adults are cognitively mature (Li, 2021), children, who are cognitively less mature, are easily attracted by RALL’s entertainment features, such as game-based activities and various presentation types (Eimler et al., 2010; Park et al., 2011). Thus, compared with younger learners, adult learners are likely to regard social robots as childish toys rather than effective learning tools (Kanda et al., 2004). On the other hand, the moderator analysis of intervention durations indicates that longer durations are favored: both intermediate and long durations achieved large effect sizes, while the short duration had a small effect size, echoing Sung et al. (2015), presumably because learners need time to become acquainted with educational technologies (Li, 2022a). Furthermore, the slightly larger effect of intermediate than of long durations might be accounted for by the novelty effect (Li, 2023b). In other words, after learners become acquainted with RALL tools over a short period, they may show great interest in social robots during the intermediate period, but this interest may wear off as they grow familiar with the robots over a long period, warranting efforts to overcome RALL’s novelty effect in future research.
Apart from those significant moderators, other non-significant moderators of language skills, control conditions, interaction types, types of teacher role, types of robot role and types of robot form, are valuable to be discussed. The moderator analysis of language skills showed RALL was effective to develop learners’ listening, speaking, vocabulary, and language in general, rather than other skills, viz. reading and writing. The limited effects of RALL on reading and writing might rest on RALL’s technological limitations. For instance, Hong and colleagues (2016) investigated the effect of RALL on EFL learners’ overall (listening, speaking, reading and writing) performance using a quasi-experiment design, and claimed that RALL could only enhance their listening and speaking, as opposed to reading and writing skills, which is partly due to technological unavailability of reading and writing tasks. Likewise, Lin and Chang (2020) adopted a mixed method to explore the effect of RALL on post-secondary writers’ writing skills. Results demonstrated that writing social robots still bear some complex technological realities, such as difficulties in understanding abstract writing topics and learners’ natural language. Another possible reason might be due to the small number of selected studies (kreading = 2 vs. kwriting = 1), which might limit the generalizability of moderator analysis results drawn for these two language skills, warranting further empirical attempts. For control conditions, while both traditional methods and other technologies have reached moderate effect sizes, there was no statistical difference between these two conditions, suggesting that RALL for EFL development is robustly effective, regardless of the difference in control conditions. 
In other words, although researchers adopted different methods (e.g., ordinary paper-and-pencil, noncomputer-based media, or web-based instruction) as control conditions, all those included studies were conducted under rigorous quasi-experimental designs, lending support to the accuracy and reliability of the quasi-experiment results. For interaction types, “in groups” type obtained a larger effect size than the “one-on-one” type, because language learners in groups can communicate and cooperate with their group peers, and collaborative learning could sustain and increase their motivation, which leads to improved language performance (Chen et al., 2020). When it comes to types of teacher role and robot role, both results indicated the only involvement of robot (viz. no availability of teachers and RALL serving as tutors) generates the smallest effect sizes as compared other types. One possible reason is that social robots are unlikely to fully replace human teachers due to technology limitations (Belpaeme et al., 2013), and other internet-related problems (Li, 2022c; 2023c). For instance, learners who lack information literacy skills would be puzzled about how to use social robots without teachers’ assistance (MacIntyre & Vincze, 2017). For types of robot form, besides the ineffective of zoomorphic social robots, the beneficiary effects of anthropomorphic and mechanomorphic social robots were obtained, and mechanomorphic social robots had a higher effect size that of anthropomorphic social robots. 
Unlike anthropomorphic social robots, which have a human-like torso or facial features, zoomorphic robots have animal-like features. Their ineffectiveness might be explained by their low resemblance to the real-life world (Yang & Li, 2023). Social robots that bear a higher resemblance to the real-life world may provide young learners with more immersive and embodied learning experiences (Banaeian & Gilanlioglu, 2021; Tsai, 2019), along with enhanced learning motivation and performance (Randall, 2020). Moreover, the largest effect, found for mechanomorphic social robots, might be explained by the fact that most mechanomorphic social robots are chatbots, which can customize courses for language learners (Hsu et al., 2021).
Implications
Implications for Teachers
First, to effectively integrate RALL into teaching activities, teachers should understand its pedagogical advantages. For instance, teachers can use social robots to attract learners’ attention (Aidinlou, 2014; Eimler et al., 2010). They can also provide personalized feedback by recording and saving learners’ language performance (Randall, 2020), which might enhance learners’ second language (L2) engagement (Chang et al., 2010). Second, teachers should be aware of the challenges of employing social robots to facilitate language learning. These include the limited effects of short interventions (Engwall & Lopes, 2022; Uslu et al., 2022), difficulties in using social robots as tutors (van den Berghe et al., 2018), and adult learners’ lack of interest in RALL (van den Berghe et al., 2019). Third, teachers should guard against the novelty effect of RALL and try to make the most of it. One possible solution is to optimize curriculum design by integrating language learning content with RALL’s functionalities, so as to sustain learners’ motivation, interest, and engagement over longer durations (Li, 2023b).
Implications for Designers
First, RALL designers should pay more attention to providing personalized learning materials based on users’ educational levels. They should also integrate interactive and collaborative features into RALL (Kim et al., 2022), since collaborative learning with interaction features can support social interactivity and enable learners to communicate with peers, thus enhancing their language skills (Alemi & Haeri, 2020; de Wit et al., 2018). In addition, designers should develop more functions for assistant robots, because social robots serving as teacher assistants have the potential to facilitate language learning. Second, designers should be aware of how different robot forms influence learning outcomes, depending on learners’ individual differences (e.g., age, gender, and preference). For instance, anthropomorphic social robots, which have a human-like torso or facial features and bear a high resemblance to the real-life world, may be psychologically more acceptable to young learners, which may in turn improve their L2 learning outcomes. Third, RALL tools for reading and writing are scarce and have limited effects. The growing demand for RALL in L2 development therefore calls for designers to pay particular attention to meeting learners’ personalized needs in reading and writing skill development.
Implications for Researchers
First, to further explore the potential of RALL, researchers should recruit larger samples to ensure adequate statistical power. Most related studies have small samples, and large-scale field studies on this topic are lacking (van den Berghe et al., 2019). Second, few studies have focused on EFL writing and reading skills, so researchers should pay more attention to these skills in future research. Third, most relevant experiments were conducted in kindergartens or primary schools, and few focused on secondary education, so further investigation at this level is warranted.
Conclusion
This study meta-analyzed the effects of RALL on EFL learners’ language skill development based on the theoretical framework of AT. The results revealed a moderate and statistically significant overall effect size (g = .69), suggesting that RALL has more positive effects on EFL development than non-RALL conditions. Moderator analysis showed that educational levels and intervention durations were significant moderators.
Notwithstanding these meaningful findings, this study has limitations. The main limitation lies in the limited number of included studies. Although we followed the PRISMA guidelines to conduct the systematic literature retrieval, some studies may inevitably have been missed owing to the limited number of search pathways and search techniques. Future meta-analyses should include more studies.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Scientific Research Funds from Hunan Provincial Ministry of Education (grant number 23A0040).
Data Availability Statement
Data are available from the corresponding author upon reasonable request.