Abstract
Aims and Objectives:
This study examined whether bilingual language control is shared across comprehension and production or supported by separate mechanisms in a dissimilar language pair. Specifically, we investigated whether Chinese learners of English as a foreign language (EFL) show asymmetric switch costs in comprehension-to-production (C-to-P) language switching.
Design:
Thirty-three Chinese EFL learners completed a joint language switch paradigm combining picture naming (production) and semantic categorization (comprehension). Language switching in picture naming was triggered by interleaved categorization in Chinese or English.
Data and Analysis:
Reaction times were analyzed using linear mixed-effects models.
Findings:
Participants showed a switch benefit in the L1 naming block but no switch effect in the L2 block. This asymmetry is inconsistent with both shared- and separate-control accounts and instead suggests a major contribution of modality control, particularly in the L1 block.
Originality:
The study provides initial evidence for asymmetric switch benefits in C-to-P language switching and highlights language similarity as a key factor in modality and cross-modality language control.
Implications:
We propose the Multimodal Inhibitory Control (MIC) model as a framework for predicting control demands across tasks, language pairs, and dominance profiles.
Keywords
Introduction
In English as a Foreign Language (EFL) classrooms, Chinese learners may switch between their L1 and L2. Although such switching is common, it is often experienced as effortful, raising questions about the control processes that support fluent switching.
For bilinguals, using one language can coactivate representations from the other, creating cross-language competition and potential interference (Moon & Jiang, 2012; van Heuven et al., 1998). The mechanisms that enable bilinguals to regulate or resolve interference from the nontarget language are typically referred to as language control (Declerck & Philipp, 2015). Influential accounts, such as the Inhibitory Control model (IC model; Green, 1998), propose that language control is achieved primarily through inhibition of the nontarget language, a view that is consistent with the switch costs reported in language switch tasks (Costa & Santesteban, 2004; Grainger & Beauvillain, 1987).
Despite extensive discussions of language switching and language control (for reviews, see Declerck & Koch, 2023; Declerck & Philipp, 2015), most studies have examined comprehension and production separately, neglecting the multimodal nature of real-life language use. Whether the two processes rely on shared or distinct language control mechanisms remains debated, motivating further investigation.
Production- and Comprehension-Based Language Switching
Language control is commonly examined using language switch paradigms (Declerck & Philipp, 2015). In traditional language switch tasks, bilinguals perform a task trial by trial in a mixed-language block. Trials following the other language are switch trials, whereas trials following the same language are nonswitch trials. The central measure is the language switch cost, defined as slower and/or less accurate responses on switch trials than on nonswitch trials. Such costs are commonly interpreted as the additional effort required to overcome residual inhibition from the previous language (Green, 1998), although inhibition-based accounts differ in the assumed locus of inhibition (Declerck & Philipp, 2015; Jiang, 2023).
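The trial classification and cost measure described above can be sketched as follows. This is a minimal illustration rather than the procedure of any cited study: the `Trial` record, the function names, and the toy latencies are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    language: str        # e.g. "L1" or "L2"
    rt_ms: float         # response latency in milliseconds
    correct: bool = True


def label_trials(trials):
    """Label every trial after the first as 'switch' or 'nonswitch',
    depending on whether its language differs from the previous trial's."""
    return [
        "switch" if cur.language != prev.language else "nonswitch"
        for prev, cur in zip(trials, trials[1:])
    ]


def switch_cost(trials):
    """Mean RT on correct switch trials minus mean RT on correct nonswitch
    trials. Positive values indicate a cost, negative values a benefit."""
    by_type = {"switch": [], "nonswitch": []}
    for trial, label in zip(trials[1:], label_trials(trials)):
        if trial.correct:
            by_type[label].append(trial.rt_ms)
    return mean(by_type["switch"]) - mean(by_type["nonswitch"])


# Toy sequence with made-up latencies, for illustration only.
toy = [
    Trial("L1", 700), Trial("L1", 680), Trial("L2", 760),
    Trial("L2", 720), Trial("L1", 790),
]
print(label_trials(toy))  # ['nonswitch', 'switch', 'nonswitch', 'switch']
print(switch_cost(toy))   # 75
```

The first trial of a block has no predecessor and is therefore left unlabeled in this sketch.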
Across studies, language switch paradigms vary in dimensions such as cueing method, input/output modality, and task demands, all of which can modulate the presence and magnitude of switch costs (Declerck & Philipp, 2015). At a broad level of modality, tasks are either production-based (e.g., picture/digit naming) or comprehension-based (e.g., language/lexical decision, semantic categorization).
Production-based language switch tasks are typically implemented with visual cues indicating the target language on each trial. Such cued production paradigms have reliably yielded language switch costs (Costa & Santesteban, 2004; Philipp et al., 2007). For example, in Meuter and Allport (1999), participants named digits presented in either blue or yellow, with each color corresponding to one of the tested languages. The results showed asymmetric switch costs, with a larger cost when switching into L1 than into L2, in line with the IC model (Green, 1998).
The IC model (Green, 1998) was developed primarily to explain control in production-based switching. It posits that language tasks are regulated by competing task schemas (e.g., L1 production vs. L2 production) and that selecting the target schema involves inhibiting nontarget schemas. On switch trials, the newly relevant schema must overcome residual inhibition from the prior schema, yielding switch costs. Because inhibition is reactive, the more active dominant-language (typically L1) schema receives stronger inhibition and thus shows larger costs when re-engaged in unbalanced bilinguals (Meuter & Allport, 1999), whereas costs tend to be more symmetric in balanced bilinguals (Costa & Santesteban, 2004).
In comprehension-based language switch tasks, the stimuli are typically written or spoken words, and explicit language cues are often unnecessary because the stimulus language is inherently specified (Olson, 2017; Thomas & Allport, 2000). For instance, in Thomas and Allport’s (2000) Experiment 1, participants performed a mixed-language lexical decision task, responding Yes to words written either in English or French and No to nonwords, without any language cue. For such tasks, the BIA model (Grainger & Dijkstra, 1992; van Heuven et al., 1998) predicts switch costs due to lexical-level inhibition. It assumes an integrated bilingual lexicon in which word activation engages lexical competition and activates a language node. This node can exert top-down influence by activating words in the target language and inhibiting words in the nontarget language, leaving residual inhibition when the language changes.
In contrast to this prediction and the typically robust costs in cued production-based switching, comprehension-based findings are more variable: Some studies report switch costs (Coumel et al., 2024; Thomas & Allport, 2000), whereas others find null effects (Declerck et al., 2019; Jackson et al., 2004). Declerck et al. (2019) argued that whether switch costs emerge depends on the degree of parallel language activation. Production tends to coactivate translation equivalents consistently, whereas comprehension sometimes lacks sufficient overlap to engage strong control. This aligns with the IC model’s assumption (Green, 1998) that the more active L1 schema receives stronger inhibition, and highlights additional factors beyond dominance, such as language similarity (Chen et al., 2020), that may shape control demands.
Together, both comprehension and production are hypothesized to rely on inhibition for language control; however, they appear to differ in the proposed locus of control and in the robustness of observed switch costs, raising the question of whether they are governed by shared or distinct mechanisms.
Language Switching: Combining Comprehension and Production
Language switch costs have been replicated in both production (Costa & Santesteban, 2004; Meuter & Allport, 1999) and comprehension (Coumel et al., 2024; Thomas & Allport, 2000). These findings suggest that both modalities can recruit inhibition-based control, in line with the IC model (Green, 1998) and the BIA model (Grainger & Dijkstra, 1992; van Heuven et al., 1998). It therefore seems plausible that language control may be shared across modalities, such that activating one language during comprehension could induce inhibition of production in the other language.
Evidence for such a shared-control possibility comes from joint language switch tasks, where both language and modality may be switched across successive responses (Gambi & Hartsuiker, 2016; C. Li & Gollan, 2022; Peeters et al., 2014). For instance, Peeters et al. (2014) examined how comprehension influences production in French–English bilinguals. Within each trial, participants completed a comprehension task (language decision in Experiment 1; semantic categorization in Experiment 2) followed by picture naming (production). Production remained in a single language within each block, whereas the interleaved comprehension task appeared in either language. This interleaved design led to natural language switching without cues. Results showed asymmetric switch costs: significant costs emerged when switching from L2 comprehension to L1 production, but not from L1 comprehension to L2 production. Using a listening–speaking joint paradigm, Gambi and Hartsuiker (2016) likewise reported comprehension-to-production (C-to-P) switch costs in Dutch–English bilinguals’ L1 when two participants took turns naming pictures.
C. Li and Gollan (2022) revisited this issue with two joint switch experiments. Experiment 1 involved switching between word naming (comprehension) and picture naming (production), which minimized task switch demands. Experiment 2 involved switching between word categorization (comprehension) and picture naming (production), replicating Peeters et al.’s (2014) design. Participants were Spanish–English bilinguals with varying dominance profiles. Experiment 2 revealed a switch cost for the dominant language and a switch benefit for the nondominant language, particularly among less balanced bilinguals, while Experiment 1 reported no effects. These findings highlight the role of task similarity in shaping C-to-P switch effects, plausibly by modulating the degree of parallel language activation (Declerck et al., 2019).
These C-to-P switch effects are often explained within the BIA framework (Grainger & Dijkstra, 1992; van Heuven et al., 1998), especially later formulations emphasizing that language nodes can be activated both bottom-up and top-down (Grainger et al., 2010). Under this view, comprehension-driven (bottom-up) activation of one language node can inhibit the competing node, so switching to production in the other language requires overcoming residual inhibition. Thus, switch costs in joint paradigms are expected.
However, other findings challenge this prediction and have been taken to support a separate-control account. For instance, also concerning how comprehension influences production, Liu et al. (2021) employed a listening–speaking joint switch task in which Chinese–English bilingual participants randomly alternated between auditory semantic categorization (comprehension) and picture naming (production). In both cued and voluntary conditions, language switch costs emerged only on within-person (modality nonswitch) trials, but not on cross-person (modality switch) trials, suggesting that comprehension in one language does not interfere with production in another (see also Liu et al., 2023).
Additional evidence consistent with separate mechanisms comes from studies comparing independent production- and comprehension-based switching (Ahn et al., 2020; Blanco-Elorrieta & Pylkkänen, 2016). For instance, among Mandarin–English bilinguals, de Bruin and Xu (2023) found that response–stimulus interval (RSI) modulated switch costs in picture naming (production) but not in animacy judgment (comprehension), supporting separate mechanisms.
Taken together, evidence from independent tasks mostly supports separate control, whereas joint paradigms yield mixed findings, sometimes consistent with separate control and sometimes with shared control. These discrepancies underscore the role of task design in shaping observed switch effects.
The Present Study
The conflicting findings across studies can be understood from two perspectives. The first concerns task designs. Within the Adaptive Control Hypothesis (ACH; Green & Abutalebi, 2013; Green & Wei, 2014), different interactional contexts are assumed to shape language control strategies. In this sense, single-modality designs typically involve sustained comprehension or sustained production and may therefore be better suited to detecting modality-specific mechanisms, whereas joint designs more closely approximate interactional situations in which comprehension and production are coordinated, and therefore mostly target the mechanisms that enable the two modalities to work cooperatively.
However, task-design differences alone do not provide a unified account of the divergent findings across joint switching studies. For example, in production-based language switching, auditory stimuli have been argued to trigger larger switch costs than visual stimuli (Wong & Maurer, 2021), yet C-to-P switch costs were observed in Gambi and Hartsuiker (2016) but not in Liu et al. (2021, 2023), despite both involving listening–speaking switching.
A second perspective concerns language similarity, which may determine how strongly the nontarget language is coactivated and thus how much interference must be regulated. Language similarity can be operationalized as cross-language overlap at typological levels including script/orthography, phonology, and vocabulary (see Chai & Bao, 2023; Schepens et al., 2013). Greater similarity has been linked to stronger parallel activation and cross-language competition (Momenian et al., 2024), increasing control demands in both comprehension (Chen et al., 2020) and production (Mosca, 2019). This pattern aligns with the view that language-control demands scale with parallel activation (Declerck et al., 2019). Extending this logic to cross-modality switching, more similar pairs should be more likely to engage C-to-P language control, whereas dissimilar pairs may rely more on modality-specific control.
In reading–speaking joint designs, language similarity can be defined as the degree of cross-language overlap between resources used in word comprehension and those used in spoken production, especially at orthographic, phonological, and lexical levels. Studies supporting shared control (Gambi & Hartsuiker, 2016; C. Li & Gollan, 2022; Peeters et al., 2014) have tested pairs (Dutch–English, Spanish–English, French–English) that share the Roman alphabet and many cognates or interlingual homographs/homophones (Schepens et al., 2013), which may foster comprehension-driven phonological coactivation and thus C-to-P language control. By comparison, Chinese is logographic and has a less systematic orthography-to-phonology mapping than alphabetic languages (Perfetti et al., 1992), offering little orthography-based overlap (e.g., cognates) with English. In addition, Chinese and English have less phonological overlap and few interlingual homophones (Yang et al., 2017). This reduced overlap may weaken comprehension-driven coactivation of production and make cross-modality switch costs less likely (cf. Liu et al., 2021, 2023).
In sum, whether language control mechanisms are shared between comprehension and production can be tested most directly in joint paradigms that examine whether comprehension in one language induces inhibition of production in the other language. Nevertheless, findings from joint switch studies remain mixed, and language similarity may offer an account of this variability beyond task design alone. To test the language-similarity account while holding task design constant, the present study examined whether Chinese EFL learners, a Chinese–English bilingual sample (cf. Liu et al., 2021, 2023), show C-to-P switch effects in Peeters et al.’s (2014) reading–speaking joint design (see also C. Li & Gollan, 2022).
The research questions were as follows:
Do Chinese EFL learners exhibit a language switch cost from L2 comprehension to L1 production?
Do they show a language switch cost from L1 comprehension to L2 production?
Are the effects asymmetric across directions?
Accordingly, two competing predictions were derived. If cross-modality shared language control in reading–speaking switching is modulated by language similarity, then Chinese–English bilinguals should show no switch costs, consistent with Liu et al. (2021, 2023). In contrast, if such control is robust across language pairs, then switch costs should emerge, especially when switching into L1 production due to the language dominance effect (Green, 1998), consistent with C. Li and Gollan (2022) and Peeters et al. (2014).
Methods
Participants
A total of 37 Chinese EFL learners were recruited. All participants completed the Language History Questionnaire 3.0 (LHQ3; P. Li et al., 2020) and the LexTALE proficiency test (Lemhöfer & Broersma, 2012).
The final sample consisted of 33 participants (31 females; age: M = 19.88, SD = 1.75; see Data analysis for participant exclusion criteria). L1 Chinese was dominant over L2 English in both age of acquisition and proficiency (ps < .05). See Table 1 for detailed language background measures.
Table 1. Participants’ Language Backgrounds of the Final Sample (N = 33).
Note. Self-ratings of proficiency were made on a 7-point scale. For each language, LHQ3 (P. Li et al., 2020) can compute (a) a proficiency score, defined as the average self-rating across four skills (listening, speaking, reading, and writing), and (b) a dominance score, which integrates self-rated proficiency and self-reported daily language use (hours per day). The L2-to-L1 dominance ratio is calculated by dividing the L2 dominance score by the L1 dominance score. Values below 1 indicate stronger L1 dominance, values around 1 represent a more balanced bilingual profile, and values above 1 indicate relatively stronger L2 dominance. An asterisk (*) indicates significant L1–L2 differences (p < .05).
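The dominance ratio described in the note is a simple quotient, sketched below. The function names, the tolerance band used to operationalize "around 1", and the example scores are hypothetical choices made for illustration; they are not part of LHQ3.

```python
def dominance_ratio(l2_dominance: float, l1_dominance: float) -> float:
    """L2-to-L1 dominance ratio: L2 dominance score divided by L1 dominance score."""
    if l1_dominance <= 0:
        raise ValueError("L1 dominance score must be positive")
    return l2_dominance / l1_dominance


def describe_profile(ratio: float, tolerance: float = 0.1) -> str:
    """Map a ratio onto the three verbal categories in the table note.
    The tolerance band around 1 is an assumption, not an LHQ3 convention."""
    if abs(ratio - 1.0) <= tolerance:
        return "balanced"
    return "L1-dominant" if ratio < 1.0 else "L2-dominant"


# Hypothetical dominance scores, not taken from the study.
ratio = dominance_ratio(l2_dominance=2.0, l1_dominance=4.0)
print(ratio, describe_profile(ratio))  # 0.5 L1-dominant
```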
Materials
For the production task (i.e., picture naming), we selected 55 black-and-white line drawings from standardized picture sets (Severens et al., 2005; Snodgrass & Vanderwart, 1980), excluding animal items and Chinese–English interlingual homophones. Animal pictures were excluded to avoid potential confusion between the target naming task and the semantic categorization required in the comprehension component (“animal” vs. “non-animal”). All items were pretested for name agreement, familiarity, complexity, and frequency following the approaches of Snodgrass and Vanderwart (1980) and Zhang and Yang (2003). Pictures not reaching 80% name agreement were replaced, resulting in a final set of 50 pictures with high familiarity, low complexity, and high word frequency in both languages. The dominant Chinese names consisted of one to three characters, while the dominant English names contained one to three syllables.
For the comprehension task (i.e., semantic categorization), 50 Chinese nouns (20 animals, 30 objects) and their English translations were selected, none of which overlapped with the picture set or were interlingual homophones. All Chinese words consisted of two characters, while the English translations contained three to six letters. Pretests confirmed high familiarity and frequencies in both languages.
No significant cross-language differences were found in characteristics for either pictures or words, except for greater naming diversity in Chinese pictures. Word lists and detailed descriptive statistics appear in Supplementary Material.
Procedure
The procedure was adapted from the reading–speaking joint paradigm of Peeters et al. (2014).
The experiment was conducted using E-Prime 3.0. Participants were tested in a quiet language laboratory, and responses were recorded with a Chronos box. Participants completed two experimental blocks, one naming pictures in L1 and one in L2, with block order counterbalanced. Each block began with picture familiarization and practice. Each trial consisted of a fixation (200 ms), a blank (100 ms), a word (1,500 ms), another blank (500 ms), and a picture (3,000 ms), followed by a short blank (100 ms). When a word appeared, participants judged whether it referred to an animal or a non-animal object using the left or right button on the Chronos box, as in Peeters et al. (2014, Experiment 2). When a picture appeared, participants named it aloud in the target language, responding as quickly as possible and withholding responses if uncertain. Stimuli disappeared upon response detection. The trial structure is illustrated in Figure 1.

Figure 1. Trial structure of the L1 naming block.
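For readers who want the timeline in a form that can be checked, the event durations reported in the Procedure can be laid out as data. This is only a sketch: the `Event` class and event names are hypothetical, and because stimuli disappeared upon response detection, the durations are upper bounds rather than fixed values. The block-level figure uses the 100 trials per block reported in this section and ignores the break, familiarization, and practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    max_duration_ms: int  # upper bound; word and picture end on response


TRIAL_EVENTS = (
    Event("fixation", 200),
    Event("blank_1", 100),
    Event("word (semantic categorization)", 1500),
    Event("blank_2", 500),
    Event("picture (naming)", 3000),
    Event("blank_3", 100),
)
TRIALS_PER_BLOCK = 100

# Longest possible trial and block, assuming no early responses.
max_trial_ms = sum(event.max_duration_ms for event in TRIAL_EVENTS)
max_block_min = max_trial_ms * TRIALS_PER_BLOCK / 60_000

print(max_trial_ms)   # 5400
print(max_block_min)  # 9.0
```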
Each block contained 100 pseudo-randomized trials, with a break halfway. The 50 pictures were presented twice per block, spaced by at least 25 trials. The 50 Chinese words and their English equivalents were also pseudo-randomized, with restrictions to prevent immediate repetition and long runs of the same categorization language.
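One way to satisfy ordering constraints of this kind is sketched below. It is not the authors' randomization procedure. Two choices are assumptions made purely for illustration: every picture is shown once in each half of the block, which makes the minimum spacing hold by construction, and the maximum run of the same categorization language is set to four. All function names are hypothetical.

```python
import random


def picture_order(pictures, rng):
    """Return a sequence in which each picture occurs twice, once per half.
    Pictures from the first quarter of the block recur in the third quarter,
    and those from the second quarter recur in the fourth, so repetitions
    are more than len(pictures) // 2 positions apart."""
    first = list(pictures)
    rng.shuffle(first)
    half = len(first) // 2
    early, late = first[:half], first[half:]
    rng.shuffle(early)  # reshuffle copies for the second presentation
    rng.shuffle(late)
    return first + early + late


def max_run(seq):
    """Length of the longest run of identical consecutive elements."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def language_order(n_per_language, max_run_length, rng, max_attempts=100_000):
    """Rejection-sample a categorization-language sequence with equal numbers
    of L1 and L2 trials and no run longer than max_run_length."""
    seq = ["L1"] * n_per_language + ["L2"] * n_per_language
    for _ in range(max_attempts):
        rng.shuffle(seq)
        if max_run(seq) <= max_run_length:
            return list(seq)
    raise RuntimeError("no valid sequence found; relax max_run_length")


rng = random.Random(2024)  # fixed seed so the sequence is reproducible
pics = [f"pic{i:02d}" for i in range(50)]
order = picture_order(pics, rng)
langs = language_order(n_per_language=50, max_run_length=4, rng=rng)
print(len(order), len(langs), max_run(langs))
```

In a block with a fixed naming language, the entries of `langs` that match the naming language correspond to nonswitch trials and the remaining entries to switch trials.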
Data Analysis
Four participants were excluded prior to analysis: one for failing to follow task instructions, two for not completing the full experiment, and one for an error rate above 50% in the L2 semantic categorization task (Gambi & Hartsuiker, 2016). Table 2 presents the mean reaction times (RTs) and accuracy rates by condition.
Table 2. Mean Reaction Times and Accuracy Rates for Picture Naming Per Condition.
Note. The 95% confidence intervals are presented in brackets. RT = reaction time. ACC rate = accuracy rate.
Accuracy in the production task was uniformly high (>95%) and is therefore reported descriptively only. RT data were preprocessed following standard procedures: trials with incorrect responses were excluded, and trials with RTs more than 3 SDs from a participant’s mean were then removed.
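The two preprocessing steps can be sketched as follows. Several details are assumptions, since they are not specified above: the mean and SD are computed per participant across all of that participant's correct trials (not per condition), trimming is applied to raw rather than log-transformed RTs, and the dictionary keys are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(trials, n_sd=3.0):
    """Drop incorrect trials, then drop trials whose RT lies more than
    n_sd standard deviations from that participant's mean RT.
    `trials` is a list of dicts with keys 'participant', 'rt', 'correct'."""
    correct = [t for t in trials if t["correct"]]

    rts_by_participant = defaultdict(list)
    for t in correct:
        rts_by_participant[t["participant"]].append(t["rt"])

    # Per-participant (lower, upper) bounds; needs at least two trials for an SD.
    bounds = {}
    for pid, rts in rts_by_participant.items():
        if len(rts) >= 2:
            m, sd = mean(rts), stdev(rts)
            bounds[pid] = (m - n_sd * sd, m + n_sd * sd)

    kept = []
    for t in correct:
        low, high = bounds.get(t["participant"], (float("-inf"), float("inf")))
        if low <= t["rt"] <= high:
            kept.append(t)
    return kept


# Made-up data: 20 ordinary trials, one extreme RT, one incorrect response.
demo = [{"participant": "p01", "rt": rt, "correct": True}
        for rt in [790, 810] * 10 + [3000]]
demo.append({"participant": "p01", "rt": 800, "correct": False})
print(len(demo), len(trim_rts(demo)))  # 22 20
```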
Analyses on RTs were conducted in R (version 4.3.1) using the lme4 package (Bates et al., 2015) with lmerTest (Kuznetsova et al., 2017). RTs were log-transformed to improve normality. The linear mixed-effects (LME) model included trial type (switch = 0.5, nonswitch = −0.5), language (L1 = 0.5, L2 = −0.5), and their interaction as fixed effects.
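Written out, the fixed-effects structure described above takes the following form. The random-effects terms are an assumption added for illustration, since the random structure is not specified here: by-participant intercepts $u_i$ and by-item intercepts $w_j$ are a common minimal choice.

```latex
\log(\mathrm{RT}_{ij}) = \beta_0
  + \beta_1\,\mathrm{TrialType}_{ij}
  + \beta_2\,\mathrm{Language}_{ij}
  + \beta_3\,(\mathrm{TrialType}_{ij} \times \mathrm{Language}_{ij})
  + u_i + w_j + \varepsilon_{ij},
\qquad
\mathrm{TrialType}_{ij} \in \{+0.5\ (\text{switch}),\ -0.5\ (\text{nonswitch})\},\quad
\mathrm{Language}_{ij} \in \{+0.5\ (\text{L1}),\ -0.5\ (\text{L2})\}
```

With this ±0.5 coding, $\beta_1$ is the switch-minus-nonswitch difference in log RT averaged over the two naming languages, so a negative $\beta_1$ indicates a switch benefit. The switch effect within each block follows from the same equation: $\beta_1 + 0.5\,\beta_3$ in the L1 block and $\beta_1 - 0.5\,\beta_3$ in the L2 block.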
Results
Table 3 summarizes the results of the final LME model. A significant main effect of trial type was observed (β = −0.02, p < .01), with shorter RTs on switch trials (M = 790 ms, SD = 227 ms) than on nonswitch trials (M = 802 ms, SD = 236 ms); no significant main effect was found for naming language (p > .05). The interaction effect between trial type and naming language was not significant (p > .05).
Table 3. Linear Mixed-Effects Model of Picture Naming Latencies.
To address the research questions, simple effect analyses were conducted. In the L1 naming block, a significant switch benefit was observed (β = 0.03, p < .05), with shorter RTs on switch trials (M = 784 ms, SD = 228 ms) than on nonswitch trials (M = 801 ms, SD = 245 ms). In the L2 naming block, no significant difference was found between switch and nonswitch trials (β = 0.02, p > .05). Thus, the main effect of trial type was driven primarily by the L1 naming block.
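Because the model was fitted to log-transformed RTs, it can help to see the reported condition means on the same scale. The sketch below converts the descriptive means given above into log ratios and converts a log-scale coefficient into an approximate percentage change. It is a rough sanity check only: the log of a ratio of raw means is not the same quantity as a model estimate, which is based on trial-level log RTs and adjusts for random effects. The function names are hypothetical.

```python
import math


def log_ratio(switch_ms: float, nonswitch_ms: float) -> float:
    """Switch-minus-nonswitch difference on the log scale."""
    return math.log(switch_ms / nonswitch_ms)


def percent_change(beta_log: float) -> float:
    """Approximate percentage change in RT implied by a log-scale effect."""
    return (math.exp(beta_log) - 1.0) * 100.0


overall = log_ratio(790, 802)   # overall means reported in the Results
l1_block = log_ratio(784, 801)  # L1 naming block means

print(round(overall, 3))                # -0.015
print(round(l1_block, 3))               # -0.021
print(round(percent_change(-0.02), 2))  # -1.98
```

On this reading, a log-scale effect of about 0.02 corresponds to a difference of roughly 2% in naming latency, in line with the 12 to 17 ms differences between the reported means.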
Furthermore, linear regression models were fitted to predict L1 and L2 switching effects from L2 proficiency scores, LexTALE scores, and L2-to-L1 dominance ratios. Results revealed marginal associations between the dominance ratios and L1 switching effects (β = 0.10, t = 1.87, p = .072), and between L2 proficiency scores, often treated as a proxy for the degree of language dominance (Costa & Santesteban, 2004), and L2 switching effects (β = −0.17, t = −1.82, p = .079). These trends suggest that as participants’ profiles became more balanced, L1 switch benefits became smaller, whereas L2 switch benefits became larger. No other associations approached significance (ps > .10).
Discussion
In the present study, Chinese EFL learners showed a language switch benefit in the L1 naming block and no switch effect in the L2 naming block. This pattern diverged from both of our predictions and from previous joint switch studies reporting costs in at least one block (C. Li & Gollan, 2022; Peeters et al., 2014) or null effects (Liu et al., 2021, 2023). Together, these findings highlight the variability of C-to-P language switch effects across tasks and populations.
Within the dominant shared-versus-separate control framework, such variability is difficult to reconcile. The shared account predicts switch costs, while the separate account predicts null effects. However, neither account explains the observed switch benefit in L1 production. Interestingly, C. Li and Gollan’s (2022) Experiment 2 also reported a benefit in nondominant language naming, suggesting that switch benefits may reflect a systematic phenomenon rather than an artifact. Crucially, recent evidence indicates that joint designs embed modality-control demands when the language does not change: Liu et al. (2023) showed that, even when language did not switch, switching between modalities (e.g., from comprehension to production) incurred greater cognitive demands than remaining in the same modality, reflecting a modality switch cost. Thus, the language nonswitch condition is not a pure naming baseline, as it already involves C-to-P modality control.
Accordingly, we propose that switch effects in joint paradigms reflect the contrast in control demands between switch conditions: switch trials may require C-to-P language control to resolve cross-language interference across modalities, while nonswitch trials may demand C-to-P modality control to regulate competition between comprehension and production on language nonswitch trials. Extending Declerck et al.’s (2019) proposal, we suggest that the magnitude of C-to-P language-control demands and modality-control demands also scales with the degree of parallel activation. In the following sections, we discuss parallel activation in relation to three candidate moderators identified in the prior literature: task similarity (C. Li & Gollan, 2022), language similarity (Chen et al., 2020), and language dominance (Costa & Santesteban, 2004).
The Language Switch Benefit in the L1 Naming Block
In the L1 naming block, participants responded faster on language switch than nonswitch trials, indicating a switch benefit rather than a cost. We interpret this benefit as reflecting greater C-to-P modality-control demands on nonswitch trials than C-to-P language-control demands on switch trials.
The critical source of modality control lies in the dual role of written words, which afford both silent reading and reading aloud. Importantly, phonological information is often coactivated during silent reading in both alphabetic and non-alphabetic scripts (Perrone-Bertolotti et al., 2012; Ziegler et al., 2000). In the nonswitch condition, L1 semantic categorization was triggered by L1 written words. Although the task required comprehension, these words could also trigger a tendency to read aloud. For highly proficient L1 speakers in a dense L1-speaking context, this may coactivate the L1 naming schema. To complete categorization, the L1 naming schema must be inhibited, leaving residual inhibition that slows subsequent picture naming. Thus, modality control was heavily engaged in the nonswitch condition. By contrast, in the switch condition, L2 word categorization preceded L1 picture naming. Given minimal cross-language overlap between Chinese and English, L2 English written words are less likely to coactivate the L1 naming schema unless such coactivation is deliberately induced by experimental manipulation (Zhou et al., 2010). Consequently, a switch benefit would be expected in L1 naming for Chinese EFL learners.
Exploratory analyses suggested that the L1 switch benefit, interpreted here largely as a marker of modality-control demands, became smaller as participants’ L2-to-L1 dominance ratio increased, consistent with evidence that greater L2 exposure is associated with reduced L1 switch effects (Bonfieni et al., 2019). A dominance-sensitive interpretation is that increasing balance reduces the dominance of L1, and thus the inhibition it attracts, while strengthening the dominance of L2, potentially yielding smaller L1 effects but larger (more L1-like) L2 effects (cf. Green, 1998). Alternatively, increasing balance may reflect a mechanism shift: Costa and Santesteban (2004) reported largely symmetric switching not only between two dominant languages but also between L1 and a much weaker L3, suggesting reduced reliance on dominance-sensitive inhibitory control and more reliance on language-specific selection. Under this dominance-insensitive view, switch effects of both languages would be expected to diminish with increasing balance regardless of language dominance.
Taken together, the L1 switch benefit provides indirect evidence for the involvement of modality control in joint C-to-P paradigms. For dissimilar pairs such as Chinese–English, reduced cross-language overlap may limit comprehension-driven coactivation of production in the other language, making switch costs in L1 production less likely and leaving room for benefits or null effects depending on the degree of language balance or the broader task configuration (see also Liu et al., 2021, 2023).
The Null Effect in the L2 Naming Block
Unlike the L1 naming block, the L2 block showed no significant switch effect, consistent with Peeters et al. (2014) but differing from C. Li and Gollan’s (2022, Experiment 2) report of a switch benefit in nondominant-language naming. We interpret this null effect as reflecting a relative balance between C-to-P language-control demands on switch trials and C-to-P modality-control demands on nonswitch trials.
In the L2 naming block, C-to-P language-control demands on switch trials may be modest, given limited coactivation and learners’ overall L2 non-dominance. Modality-control demands on nonswitch trials may likewise be limited: even in a mixed-language context, increased L2 activation does not necessarily trigger an L2 naming schema during silent reading due to differences in orthography-to-phonology mapping regularity between Chinese and English (Perfetti et al., 1992). This idea is consistent with reports of weaker phonological involvement and greater reliance on orthography among Chinese learners than learners with an alphabetic L1 (Wang et al., 2003). In our sample, this reduced naming automaticity is further suggested by the proficiency profile and task performance: L2 speaking proficiency was lower than L2 reading and L1 speaking (ps < .001), and L2 semantic categorization was slower and less accurate than L1 (β = 1.34, p < .001; β = −0.13, p < .001). With both demands modest and roughly comparable, no reliable switch effect is expected.
Exploratory analyses further suggested that L2 switch effects varied with language profile: higher L2 proficiency tended to be associated with larger switch benefits, which we also interpret as a marker of greater modality-control demands. One interpretation is that more proficient EFL learners may engage more phonology in English visual word recognition (X. Li & Chen, 2024), thereby increasing coactivation of the L2 naming schema during comprehension, as expected under a dominance-sensitive account (Green, 1998). Under this scenario, stronger coactivation of the L2 naming schema could increase competition between comprehension and production and thus the modality-control demands that must be overcome before L2 naming on nonswitch trials.
Overall, the L2 null effect highlights how language (dis)similarity can shape switch outcomes in joint language switching. In the present study, L2 modality-control demands may be attenuated because Chinese and English differ markedly in orthography-to-phonology mapping regularities, which may lead Chinese EFL learners to rely relatively more on orthography and less on phonology during English visual word recognition (Wang et al., 2003). With C-to-P language-control demands on switch trials also modest, a null effect in L2 naming is most likely, though a benefit is also possible as L2 dominance increases.
Asymmetric C-to-P Language Switch Benefits
The present findings revealed a qualitatively asymmetric pattern: a language switch benefit in the L1 naming block but no reliable switch effect in the L2 block. This benefit–null asymmetry differs from the cost asymmetries reported in reading–speaking joint switching studies (C. Li & Gollan, 2022; Peeters et al., 2014). We argue that the mechanisms underlying these patterns are not the same.
For Chinese EFL learners with a dissimilar pair, we propose that the benefit–null asymmetry primarily reflects how language dissimilarity constrains cross-language coactivation during comprehension and cross-modality coactivation in L2, thereby reshaping the relative control demands across blocks. Due to language dissimilarity, C-to-P language-control demands may remain modest in both blocks. The key difference likely lies in modality-control demands embedded in the language nonswitch baseline: in the L1 block, L1 written-word comprehension can coactivate naming-related phonology (Ziegler et al., 2000), increasing the need to inhibit the naming schema during categorization and slowing subsequent naming; in the L2 block, silent comprehension of English is less likely to engage an L2 naming schema in Chinese EFL learners due to weaker phonological involvement in L2 visual word processing (Wang et al., 2003), keeping modality-control demands relatively low. Together, this makes modality-driven benefits more likely in L1 but null effects more likely in L2, especially among unbalanced bilinguals.
In alphabetic pairs with considerable cross-language overlaps, we assume high control demands on both switch and nonswitch trials. In addition, because intralingual phonology coactivation during reading is robust in both alphabetic languages (cf. Wang et al., 2003), modality-control demands may be broadly comparable across languages, whereas C-to-P language-control demands may be more dominance-sensitive. Consistent with this, C. Li and Gollan (2022) found dominant-language costs but nondominant-language benefits in less balanced bilinguals, a pattern that can be captured if dominant-language C-to-P language control exceeds modality control, while nondominant-language C-to-P language control falls below it (Figure 2), in line with dominance-sensitive control assumptions (cf. Green, 1998).

Figure 2. Schematic illustration of how language balance may modulate control demands in joint reading–speaking language switching for alphabetic language pairs.
Differences between C. Li and Gollan (2022) and Peeters et al. (2014) may further reflect the language balance effect. C. Li and Gollan (2022) showed that both dominant-language costs and nondominant-language benefits diminish as bilinguals become more balanced. We therefore propose that, in typologically similar pairs, both modality-control and cross-modality language-control demands are high in less balanced bilinguals and jointly decrease with increasing balance (Costa & Santesteban, 2004), yielding a shift from cost–benefit toward null effects. Under this account, the cost–null pattern in Peeters et al. (2014) among “advanced students of English” may reflect a more balanced sample in which nondominant-language demands have already converged, whereas the dominant language may still show a residual gap (Figure 2).
Taken together, these comparisons suggest that asymmetry patterns in joint reading–speaking switching are jointly shaped by language similarity and language dominance. For dissimilar pairs such as Chinese–English, reduced phonological engagement in L2 comprehension can weaken L2 modality control, yielding a benefit–null asymmetry as observed here. For alphabetic pairs, by contrast, robust phonological engagement in both languages may give rise to a balance-driven continuum from cost–null to null–null.
The Multimodal Inhibitory Control (MIC) Model
The benefit–null asymmetry is better explained by the contrast in control demands than by the shared-versus-separate dichotomy. The BIA model (Grainger & Dijkstra, 1992; van Heuven et al., 1998) can support cross-modality language control via language nodes (Grainger et al., 2010), but the distinction between bottom-up and top-down routes cannot explain competition between comprehension and production. The IC model’s (Green, 1998) schema-based inhibition extends to modality-level competition and carryover, yet it does not specify what determines the magnitude of these control demands beyond language dominance.
To address this issue, we propose the Multimodal Inhibitory Control (MIC) model, integrating the IC model’s (Green, 1998) schema-based inhibition with the parallel language activation assumption (Declerck et al., 2019). For clarity, the model assumes four partially overlapping schemas (i.e., L1 comprehension, L1 production, L2 comprehension, and L2 production) that compete for activation. Three forms of control can be distinguished: language control, modality control, and cross-modality language control.
Following Declerck et al. (2019), the amount of inhibition required to control a competing schema depends on the degree of coactivation. In reading–speaking joint designs, three moderators are especially relevant for the comparatively understudied modality control and cross-modality language control. First, task similarity. C. Li and Gollan (2022) showed that reducing task-switch demands can eliminate C-to-P switch effects, consistent with lower parallel activation when tasks are more similar. Second, language similarity. Our comparison between Chinese–English (present study) and alphabetic pairs (C. Li & Gollan, 2022; Peeters et al., 2014) suggests that similarity is crucial not only for cross-modality language control as predicted but also for L2 modality control in reading–speaking contexts. Third, language dominance. Both categorical dominance asymmetries in unbalanced bilinguals (cf. Meuter & Allport, 1999) and continuous balance (cf. Costa & Santesteban, 2004) can modulate control demands.
Synthesizing these factors yields different predictions for dissimilar versus similar pairs. For dissimilar pairs with distinct orthography-to-phonology mappings (e.g., Chinese–English), L2 modality control may be minimal in less balanced bilinguals; with increasing balance, it may increase (X. Li & Chen, 2024) and later decrease (Costa & Santesteban, 2004). L1 modality control is expected to be substantial for less balanced bilinguals and to decrease with increasing balance. For alphabetic pairs, both modality control and cross-modality language control should be engaged in both languages but diminish as balance increases. Furthermore, dominance is assumed to modulate cross-modality language control more than modality control, given robust phonological coactivation in both languages (Wang et al., 2003).
Although the MIC framework fits the present study, a further refinement is needed to accommodate the dominant-language switch costs consistently observed in alphabetic pairs (C. Li & Gollan, 2022; Peeters et al., 2014). In such pairs, modality-control demands in the dominant language nonswitch baseline should be substantial because alphabetic orthography readily activates phonology of the same language. Yet dominant-language switch costs indicate that cross-modality language-control demands exceed modality-control demands.
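To explain these findings, we further hypothesize three possible pathways through which cross-modality language control can arise (see Figure 3): a direct pathway, mediation via modality control, and possible spillover from comprehension control.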

Figure 3. Three pathways through which cross-modality language control can arise on language switch trials in the L1 naming block.
This refinement helps explain why dominant-language switch costs often arise in alphabetic reading–speaking joint designs (C. Li & Gollan, 2022; Peeters et al., 2014): direct effects, modality-control mediation, and possible comprehension-control spillover can make switch-trial demands exceed the nonswitch baseline. Importantly, this does not contradict our assumption that cross-modality language control is modest in Chinese–English. Although high L1 modality-control demands could indirectly influence L2 naming, this indirect route is likely attenuated and thus insufficient to produce a reliable switch cost.
In sum, the MIC model attributes switch effects in reading–speaking joint designs to the combined aftereffects of modality control and cross-modality language control, shaped by factors including task similarity, language similarity, and dominance. It further distinguishes a decisive direct pathway from attenuated indirect pathways, helping explain dominant-language costs in alphabetic pairs. Importantly, the model cautions against treating C-to-P switch costs as a direct marker of shared language control. Future work should better isolate the nonswitch baseline (Liu et al., 2021, 2023) while retaining the ecological validity of joint paradigms (Gambi & Hartsuiker, 2016; Peeters et al., 2014), and test how proposed factors modulate modality and cross-modality language control.
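The model's core comparison can be illustrated with a minimal sketch. It is not part of the reported analyses. It assumes that each naming block can be summarized by two hypothetical demand values in arbitrary units, and that the predicted switch effect is simply the switch-trial demand minus the nonswitch-trial demand. The class name, function names, tolerance band, and all numeric values are illustrative assumptions rather than estimates from the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlDemands:
    """Hypothetical control demands for one naming block (arbitrary units)."""
    cross_modality_language: float  # aftereffect on language switch trials
    modality: float                 # aftereffect on language nonswitch trials


def switch_effect(d: ControlDemands) -> float:
    """Switch-trial demand minus nonswitch-trial demand (positive = cost)."""
    return d.cross_modality_language - d.modality


def classify(effect: float, tolerance: float = 0.5) -> str:
    """Label an effect as cost, benefit, or null; the tolerance is arbitrary."""
    if effect > tolerance:
        return "cost"
    if effect < -tolerance:
        return "benefit"
    return "null"


# Illustrative values only; chosen to mirror the qualitative patterns
# discussed in the text, not fitted to any data set.
scenarios = {
    "Chinese-English, L1 naming": ControlDemands(1.0, 3.0),
    "Chinese-English, L2 naming": ControlDemands(1.0, 1.0),
    "Alphabetic pair, dominant language": ControlDemands(5.0, 3.0),
    "Alphabetic pair, nondominant language": ControlDemands(2.0, 3.0),
}

for label, demands in scenarios.items():
    print(f"{label}: {classify(switch_effect(demands))}")
```

Under these assumed values, the sketch reproduces the benefit–null pattern described for Chinese–English and the cost–benefit pattern described for less balanced bilinguals with alphabetic pairs. Changing the demand values shows how the same subtraction can yield costs, benefits, or null effects.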
Conclusion
This study tested whether language similarity modulates C-to-P switching. Among Chinese EFL learners, we found an asymmetric pattern: a switch benefit when switching from L2 comprehension to L1 production, but no switch effect when switching from L1 comprehension to L2 production. This pattern contrasts with the cost asymmetries reported for alphabetic pairs (C. Li & Gollan, 2022; Peeters et al., 2014).
To explain these findings, we proposed the MIC model, which distinguishes language control, modality control, and cross-modality language control, and operationalizes their demands in terms of parallel coactivation. In joint C-to-P paradigms, observed switch effects are argued to reflect the magnitude difference between modality-control demands embedded in language nonswitch trials and cross-modality language-control demands on language switch trials. In line with our prediction, the present results suggest that language dissimilarity attenuates cross-modality language control. They also point to an additional role of language similarity in shaping modality control, especially in L2, plausibly by restricting phonological engagement during visual word comprehension (Wang et al., 2003).
Several limitations should be noted. First, our individual-differences analyses were exploratory and yielded only marginal associations; larger samples and finer-grained language-profile measures are needed to test how dominance shapes cross-modality switching. Second, our sample was gender-imbalanced, calling for more balanced recruitment. Third, the present experiments used relatively familiar and frequent words; whether the MIC model generalizes to less familiar or lower-frequency stimuli remains to be tested.
Supplemental Material
Supplemental material, sj-docx-1-ijb-10.1177_13670069261456515 for Asymmetric Language Switch Benefits From Comprehension to Production in Chinese Learners of English as a Foreign Language (EFL): Rethinking Cross-Modality Language Control by Dongni Wang, Shifa Chen, Yue Qin, Shaoxin Wang and Renhui Hou in International Journal of Bilingualism
Footnotes
Acknowledgements
We would like to thank Chunyan Xiao for recruiting participants and collecting data. We would also like to thank all the participants of the study.
Ethical Considerations
Ethics approval was granted by the Ethics Review Committee of the College of Foreign Languages, Ocean University of China.
Consent to Participate
All participants provided written consent prior to data collection.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Shandong Social Science Planning Project (key project) (21BYYJ02).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.