Abstract
Previous studies have shown that the native language backgrounds of both talkers and listeners affect speech intelligibility. This study investigated the interlanguage speech intelligibility benefit (ISIB), the advantage in understanding second language (L2) speech that non-native listeners have over native listeners when both groups listen to speakers with the same first language (L1). More specifically, it looked into the ISIB in relation to Arabic spoken by both native Arabic (NA) and native English (NE) speakers. To this end, 15 NA and 15 NE subjects listened to Arabic produced by two groups of talkers (5 NA talkers and 5 NE talkers) and were asked to identify the words they heard. Results showed evidence for the ISIB for listeners (i.e. NE listeners were more accurate than NA listeners at identifying English-accented Arabic speech). However, no evidence for the ISIB for talkers was found. That is, NE listeners did not find English-accented speech more intelligible than NA speech. By examining L2 learners’ recognition of L2 consonant contrasts, the study contributes to the body of knowledge on L2 sound acquisition as well as the ISIB literature. It also provides some insight into the problem of adult L2 learners’ ability to learn novel L2 consonants.
Keywords
I Introduction
Non-native speakers (NNSs) typically speak a second language with a foreign accent, defined as ‘a pronunciation deviating from what a native speaker (NS) expects another NS to sound like’ (Major, 2018, p. 1). Such deviation results from speech features that differ from those used by native speakers of the language, including variations in intonation, stress patterns, and rhythm.
To better understand speech intelligibility and the factors that affect it, researchers have begun to examine the intelligibility of native and non-native speech for both native and non-native listeners (NNLs). In this regard, prior research has distinguished two types of factors. The first type is talker-related and includes speech rate (Derwing & Munro, 2001), stress, pausing and intonation (e.g. Tajima et al., 1997; Trofimovich & Baker, 2006), neighborhood density (Imai et al., 2005), and word frequency (Bradlow & Pisoni, 1999). The second type is listener-related and includes language proficiency (Stibbard & Lee, 2006), context (Field, 2004), and speech familiarity (Hsieh & Tsao, 2022; Saito et al., 2019).
Researchers have also discussed the presence of an interlanguage linguistic system that contains speech features which differ from those of both learners’ first language and their second language. The intelligibility of second language (L2) learners’ speech is therefore related to the interlanguage speech intelligibility benefit (ISIB), which concerns the extent to which listeners can understand speech produced by non-native speakers (Bent & Bradlow, 2003; Hayes-Harb et al., 2008; Gosselin et al., 2022). This is a crucial aspect of second language acquisition (SLA) because it can greatly affect the success of communication in multilingual settings. As the world becomes increasingly globalized and multilingual, studying the ISIB has become increasingly important. Accordingly, several studies have reported on the ISIB in relation to languages such as English (e.g. Bent & Bradlow, 2003; Imai et al., 2005; Major et al., 2002; Munro et al., 2006; Stibbard & Lee, 2006; Xie & Fowler, 2013) and Dutch (e.g. van Wijngaarden, 2001; van Wijngaarden et al., 2002). However, to our knowledge, there are no studies on Semitic languages, which have been less studied in previous SLA research. Hence, this article aims to build on recent work in this area and explore the ISIB in the context of the Arabic language. More particularly, it investigates how speech produced by L2 learners of Arabic is perceived by both native Arabic (NA) listeners and other NNLs. By doing so, it contributes to an understanding of L2 speech intelligibility and adds to the literature on L2 sound acquisition.
II Literature review
1 Foreign accent and accented speech
When learning an L2, one of the biggest challenges that learners encounter is acquiring the new language’s phonemic system, which includes the set of sounds used to differentiate between words in that language (Albrechtsen et al., 1980; Selinker, 1972). That is, each language has its own set of phonemes, or distinct sounds, which can vary widely from one language to another. As a result, L2 learners often transfer the phonemic system of their native language to their L2, leading to the production of non-native sounds, stress patterns, and intonation patterns. For example, the English phoneme /ð/ (as in ‘the’ or ‘this’) does not exist in the French language, so French learners of English may substitute it with the closest French phoneme, which is /z/. This substitution can result in non-native-like pronunciation, as the two sounds are not acoustically identical, which can make it difficult for native English (NE) speakers to understand, especially if they are not familiar with the phonemic inventory of French (Laeufer, 1996).
Similarly, Japanese speakers experience difficulties in distinguishing the English /l/ and /r/ sounds, which are two separate sounds in English but not in Japanese, and this may lead to errors in comprehension and to miscommunication (Guion et al., 2000; Hirata, 2004). Likewise, Jenkins (2000) found that Japanese, Taiwanese, and Korean speakers frequently produced phonetic substitutions of English sounds that led to unintelligible speech for listeners who did not share the speaker’s native language. Moreover, voiceless stop consonants in Spanish have a much shorter voice onset time (VOT) than in English; that is, the delay between the release of the stop and the onset of voicing is much shorter in Spanish than in English. As a result, native Spanish speakers tend to produce English voiceless stop consonants with shorter VOT values than NE speakers do (Sole, 2018). Taken together, these findings indicate that speakers have difficulty producing and perceiving sounds that are not present in their first language, which may pose a challenge in cross-linguistic communication.
It is also well documented that foreign-accented speech is less intelligible for native listeners who are not familiar with the foreign accent. Because such an accent involves variations in pronunciation, stress patterns, and intonation, it differs from what listeners are accustomed to hearing (Bradlow & Bent, 2002; Munro, 1998; Munro & Derwing, 1995; Sereno et al., 2016; Thomson, 2018). Due to these variations, it becomes difficult for listeners to recognize individual words and phrases, or to comprehend the overall meaning of a sentence. Foreign accents also cause confusion around sounds that are similar in the listener’s first language but distinct in the speaker’s language (Major et al., 2002). Furthermore, researchers have become interested in understanding how listeners’ language backgrounds can affect speech intelligibility. In this respect, Smith (1992) has pointed out that ‘native speakers were not found to be the most easily understood, nor were they, as subjects, found to be the best able to understand the different varieties of English’ (p. 88).
Prior research has indicated that speech intelligibility is affected not only by the talker’s pronunciation, but also by listener variables such as vocabulary knowledge (Politzer, 1976), familiarity with the topic being discussed (Gass & Varonis, 1984), and the listener’s ability to process speech sounds that are not part of their native language (Jenkins, 2000, 2002). In other words, a listener who has a limited vocabulary in the language being spoken may struggle to understand a non-native speaker who uses complex or unfamiliar words. Similarly, a listener who is not familiar with the topic being discussed may find it harder to follow the speaker’s train of thought. As a result, researchers have increasingly focused on understanding how listener variables interact with talker variables to affect speech intelligibility. This includes studying factors such as the listener’s cognitive abilities, their prior exposure to the non-native language, and their level of experience in listening to non-native speech. By gaining a better understanding of the role of listener variables, researchers aim to develop more effective techniques for enhancing the intelligibility of non-native speech (Lee et al., 2022; van Wijngaarden et al., 2002).
2 ISIB in SLA research
Research in SLA has shown that non-native listeners may have an advantage over native listeners in understanding non-native speech produced by speakers of their own native language (Dokovova et al., 2022; Fishero et al., 2023; Hayes-Harb et al., 2008; Korpal & Sobkowiak, 2020; Munro et al., 2006). This advantage arises because non-native listeners are more attuned to the phonetic and prosodic features of their native language, including the patterns of stress, intonation, and rhythm. As a result, they may be better able to recognize these features in non-native speech. Bent and Bradlow (2003) termed this phenomenon the ISIB, which they defined as ‘the benefit afforded by a shared interlanguage between a non-native talker and listener’ (p. 1602). Their study explored how the native language backgrounds of non-native listeners affected their ability to comprehend non-native speech. The authors recruited sixty-three participants whose native languages were English, Chinese, and Korean, in addition to a mixed group of learners with other native language backgrounds. All participants listened to English speech produced by NE, Chinese and Korean speakers and were asked to transcribe the sentences they heard. The results showed that native Mandarin and Korean listeners found non-native accented English produced by high-proficiency speakers who shared their native language as intelligible as speech produced by NE talkers. The researchers identified this phenomenon as ‘matched ISIB’. The study also found evidence for mismatched ISIB, as non-native listeners from the mixed native language group found the speech produced by the high-proficiency Korean talker, with whom they did not share a native language, equally or more intelligible than the speech produced by the NE talker.
A handful of other studies have also explored the ISIB advantage. For example, van Wijngaarden et al. (2002) investigated the role of language proficiency in ISIB effects. In their study, native Dutch listeners listened to Dutch, English, and German sentences produced by speakers of these languages. Findings revealed that native Dutch listeners were better able to perceive German talkers than English talkers, which was explained in light of listeners’ low proficiency in German and high proficiency in English. This implies that the ISIB can be detected when listeners have a limited level of proficiency in the target language. In addition, Imai et al. (2005) used eighty familiar English words produced by one NE talker and one native Spanish talker in a word recognition task in which NE and Spanish listeners participated. Listeners in the two groups were asked to write down each word they heard. The results indicated that native Spanish listeners were more accurate than NE listeners at identifying words produced by native Spanish talkers.
Similarly, Munro et al. (2006) conducted a study in which they asked participants from four different first language (L1) backgrounds (Cantonese, Japanese, Spanish, and English) to transcribe and rate English sentences produced by L2 learners of English from four different native language backgrounds (Cantonese, Japanese, Polish, and Spanish). The study reported no advantage for native Cantonese or Spanish listeners in understanding Cantonese-accented English and Spanish-accented English, respectively, compared to NE listeners. However, native Japanese listeners found the speech of native Japanese talkers more intelligible than did NE listeners. There was also a moderate to high correlation between participants’ intelligibility scores and their ratings of both comprehensibility and accentedness, suggesting that native and non-native listeners shared a similar response to speech, regardless of their different linguistic backgrounds. The results, therefore, suggested that listeners’ ability to understand L2 speech is affected by the resemblance between the listener’s native language and the accent of the speaker, and that it is related to their perception of comprehensibility and accentedness. Thus, it is important to consider both the listener’s native language and the speaker’s accent when studying second language speech perception.
Furthermore, Hayes-Harb et al. (2008) identified two types of ISIB in previous research: ISIB for listeners (ISIB-L) and ISIB for talkers (ISIB-T). The former refers to the phenomenon where non-native listeners find non-native speech more intelligible than native listeners do, whereas the latter refers to the phenomenon where non-native listeners find the speech of non-native talkers more intelligible than the speech of native speakers. These two types of ISIB are distinct and independent phenomena, and they have been observed in various studies using different languages and populations (e.g. Imai et al., 2005; Munro et al., 2006; Stibbard & Lee, 2006). To explore these two types of ISIB, Hayes-Harb and colleagues asked NE listeners and native Mandarin listeners to identify individual English words, produced by six native speakers of English and six native speakers of Mandarin, in a forced-choice word identification task. The results provided evidence for the ISIB-L, as native Mandarin listeners understood Mandarin-accented English speech better than NE listeners did. However, no evidence for the ISIB-T was found, as the speech of native Mandarin talkers was not more intelligible to native Mandarin listeners than the speech of NE talkers. The findings revealed that the intelligibility of L2 speech can be influenced by the match between the listener’s mother tongue and the speaker’s accent. Specifically, non-native listeners may have an advantage over native listeners in understanding non-native speech, but they do not necessarily find non-native speech more intelligible than native speech.
In the same vein, Fishero et al. (2023) examined how native and non-native listeners at various competency levels perceived L1 and L2 speech. To investigate the effects of the ISIB on both accuracy and response time, two groups of listeners (i.e. 36 NE listeners and 36 native Mandarin listeners) took part in a lexical judgment task in which they listened to NE and Mandarin-accented English. Findings revealed evidence for both ISIB-L and ISIB-T effects for native Mandarin listeners. The authors concluded that the advantage non-native listeners had in comprehending non-native speech relies on multiple factors, including listener proficiency, speaker proficiency, phoneme attributes, and the acoustic properties of individual speech tokens.
In contrast, other studies reported no evidence for the two types of ISIB. For instance, Major et al. (2002) investigated the effect of shared native language between the speaker and listener on their listening comprehension performance in English. The study used a special version of the listening section of the TOEFL called the listening comprehension trial test. The participants were four groups of 100 listeners from four different L1 backgrounds: Chinese, English, Japanese, and Spanish. The participants listened to lectures delivered in English by speakers from the same native language as the listeners and then answered questions based on the lectures. The results showed that both native and non-native listeners had lower scores on the listening test when they listened to non-native speakers of English. However, native Spanish listeners had a small advantage in intelligibility when hearing Spanish-accented English speech compared to other non-native English listeners. Hence, the study revealed that having the same native language as the speaker can provide a slight advantage in understanding English speech. The total listening comprehension ratings did not, however, significantly change as a result of this advantage. The results also revealed no evidence for the ISIB-L, as NE listeners were more accurate at recognizing words produced by native Chinese and Japanese talkers than both native Chinese and Japanese listeners.
In addition, Stibbard and Lee (2006) used a word recognition test in which fifty participants in four groups (10 native Korean, 10 NA, 10 NE, and 20 with mixed L1 backgrounds) listened to sentences produced by five talkers from different L1 backgrounds (2 native Korean, 2 NA and 1 NE). The results showed no evidence for the ISIB-T for low-proficiency non-native speakers and no strong evidence for the ISIB-T for high-proficiency non-native speakers. Instead, NE talkers were more intelligible than non-native talkers to listeners in all four groups.
Likewise, Rasmussen (2007) explored the ISIB in relation to the production of the English /p/ and /b/ by both NA talkers (7 males) and NE talkers (5 males, 6 females). The study also examined how their productions of the two target phonemes were perceived by both NE and NA listeners. Stimuli were minimal pairs, such as bat and pat, presented in a forced-choice word identification task. The results demonstrated no evidence for the ISIB-L, as NA listeners were less accurate than NE listeners at identifying Arabic-accented English. In contrast, NA listeners found Arabic-accented English words as intelligible as NE words, which provides support for the ISIB-T. Moreover, the study’s examination of acoustic measurements including stop closure duration and VOT further supports this conclusion, as it indicates that NA listeners had difficulty consistently differentiating between the English /p/ and /b/ sounds.
Additionally, Xie and Fowler (2013) reported an ISIB-L for two groups of Mandarin learners of English, who identified Mandarin-accented English better than NE listeners did. However, their findings revealed no significant difference in the performance of Mandarin listeners in the US, regardless of ‘whether the speaker was Mandarin-accented or not’ (p. 11). In other words, the study provided no supporting evidence for the ISIB-T. More recently, Kang et al. (2018) found no evidence for either the ISIB-L or the ISIB-T, as Chinese and Mexican listeners scored lower when listening to speech produced by talkers with whom they shared a native language.
In a related study, Lee et al. (2022) investigated the effects of talker type (native vs. non-native) and experience with the target dialect, North American English (NAE) vs. Standard Southern British English (SSBE), on L2 listeners’ perception. To this end, Korean listeners were tested on how well they could distinguish 12 English vowels spoken by native and non-native (L1-Korean) talkers of NAE and SSBE. The tests were given to two groups of native Korean listeners: L1-Korean ESL listeners in the USA and L1-Korean ESL listeners in the UK. The findings showed a significant influence of L2 listeners’ experience with the target dialect on the accuracy of target vowel recognition. No ISIB-T effects were observed for the L1-Korean listener groups despite differing degrees of exposure to the two English dialects.
To sum up, previous research has reported inconsistent results with respect to the ISIB. Whereas some studies provided evidence in support of the ISIB-L, the ISIB-T, or both, others provided evidence against either benefit. Except for a few studies that focused on Dutch speech (e.g. van Wijngaarden, 2001; van Wijngaarden et al., 2002), previous research has primarily examined the ISIB in relation to English speech produced by learners with diverse native languages, including Arabic, Chinese, Japanese, Korean and Spanish. Given the varying results reported in previous studies and the limited range of languages investigated, more research is needed to understand the ISIB phenomenon more fully. The present study was designed to address this issue.
3 Arabic perception research
Arabic is a Semitic language that is widely spoken by 422 million people in the Middle East (Ryding, 2013). Due to its significance, it is one of the six official languages of the United Nations (Younus, 1977). It is characterized by one main variety, Modern Standard Arabic (MSA), and a number of spoken dialects that vary from one Arab state to another (Ferguson, 1959). Whereas MSA is acquired through formal education and is only used in formal contexts including formal speeches and news bulletins, the spoken colloquial varieties are the medium of communication in daily life conversations in the Arab world. Although the last two decades have witnessed a rapid increase in the number of language programs teaching Arabic at different universities in North America (Abdalla, 2006), Arabic is still classified as a less studied language (Shehata, 2015, 2023; Taha, 2007) compared to other languages such as English (Gordon & Darcy, 2016), German (O’Brien, 2014), and Spanish and French (Imai et al., 2005; Mroz, 2018). As a result, very little research has been published on the acquisition of Arabic by non-native speakers. While some studies have examined grammar (see Ryding, 2013) and receptive (e.g. Zouhir, 2013) and productive skills (e.g. Shehata, 2021), fewer have addressed Arabic perception (e.g. Al Mahmoud, 2013; Hayes-Harb & Durham, 2016; Lababidi & Park, 2017; Shehata, 2018). For instance, Al Mahmoud (2013) explored the perception of Arabic consonant contrasts including /x–ɣ/, /t–d/, and /x–ħ/ by English speakers. The results revealed that learners were better able to distinguish contrasts that have English equivalents, such as /t–d/ and /θ–ð/, than the emphatic-plain contrasts that do not have English equivalents, such as /x–ɣ/, /ħ–h/, and /x–ħ/.
In addition, Hayes-Harb and Durham (2016) explored NE speakers’ perception of Arabic emphatic-plain contrasts using vowel identification and discrimination tasks. The findings indicated that when distinguishing between Arabic emphatic and plain consonants, English speakers tended to depend more on the following vowels than on the consonants themselves. Their discrimination accuracy was highest when the following vowel was /æ/, followed by /u/ and /i/. Lababidi and Park (2017) examined how factors like prosodic location, vowel duration and vowel quality affect the perception of Arabic consonants by NE speakers without prior exposure to Arabic. To this end, 14 Arabic consonants /t, d, ð, s, tˤ, dˤ, ðˤ, sˤ, q, x, ɣ, ħ, ʕ, ʔ/, presented in CV and VCV syllable structures, were used in two main tasks: an identification task and a goodness-of-fit rating task. It was found that both vowel duration and quality significantly influenced category selection for some of the target consonants, but there was no discernible impact of prosodic location on the CV and VCV structures with two syllables.
Shehata (2018) reported similar findings, showing that some Arabic consonant contrasts such as /t–tˤ/, /h–ħ/, and /s–sˤ/ were more difficult for NE learners of Arabic to perceive than others (i.e. /ħ–ʕ/). More recently, Aldamen and Al-Deaibes (2023) investigated L2 learners’ perception of Arabic emphatic consonants using mono- and disyllabic words that contrasted the target consonants in initial position. The study consisted of two main experiments: a perception experiment and a production experiment that analysed acoustic aspects of learners’ production. The results indicated that learners performed well on the perception tasks in both the pre- and post-tests, suggesting that they had a good grasp of the phonetic differences between plain and emphatic consonants, as they were able to accurately perceive both types.
To address the gap in previous research and arrive at a clearer understanding of the phenomenon in question, the current study investigated the ISIB in relation to Arabic spoken by both native and non-native Arabic speakers.
III Current study
The main purpose of the present study is to investigate the inconsistently attested ISIB phenomenon in relation to the Arabic language. To that end, the study addresses the following research question.
• Is there evidence for the ISIB-L and/or the ISIB-T for native Arabic and native English participants listening to English-accented and native Arabic speech?
To address this question, the study examines the intelligibility of three Arabic consonant contrasts (i.e. /h–ħ/, /s–sˤ/ and /t–tˤ/) in English-accented Arabic speech to both NA listeners and L2 Arabic listeners. This study contributes to the existing body of knowledge on L2 sound acquisition as well as the ISIB literature by exploring the ISIB phenomenon in relation to the target Arabic contrastive consonants, which have not previously been looked at in this context.
IV Methods
1 Participants
a Talkers
The spoken materials used in this study were elicited from two groups of talkers. The first group included 10 NE participants (6 females, 4 males) who were enrolled in intermediate and advanced Arabic classes at a Midwestern American university. They had studied Arabic for more than one year, and their proficiency level ranged from intermediate-low to advanced. They ranged in age from 20 to 27 years (M = 22.9 years). The second group of talkers included 10 NA participants (5 females, 5 males) who were graduate students on the same university campus, and they ranged in age from 26 to 36 years (M = 30.7 years). Whereas NE participants were given Arabic language course credit for their participation in the study, NA participants took part voluntarily.
b Listeners
Two groups of listeners were recruited from the Arabic courses at the same campus. The first group included 15 NE participants (10 females, 5 males) who reported studying Arabic for more than one year and were given Arabic language course credit for their participation in this study. Listeners ranged in age from 19 to 29 years (M = 24.6) and their language proficiency level ranged from intermediate-low to advanced. The second group included 15 NA listeners (9 females and 6 males) who ranged in age from 18 to 29 years (M = 26.22) and participated voluntarily in the present study. No individual served both as a talker and a listener.
Learners of Arabic in this study were recruited from two Arabic language classes at a Midwestern American university, which used two different parts of the Al Kitaab fii ta’allum al-’arabiyya textbook (Brustad et al., 2013a, 2013b) as the primary teaching material. These classes met four times a week for 50-minute sessions and focused on teaching all language skills, including listening, speaking, writing, and reading.
2 Stimuli
Six non-word minimal pairs of the form C1VC2 were used, contrasting /h–ħ/, /s–sˤ/ and /t–tˤ/ in C1 (word-initial) position (i.e. حاث-هاث /haːθ/–/ħaːθ/; حوج-هوج /huːʒ/–/ħuːʒ/; صاث-ساث /saːθ/–/sˤaːθ/; صوج-سوج /suːʒ/–/sˤuːʒ/; طاث-تاث /taːθ/–/tˤaːθ/; طوج-توج /tuːʒ/–/tˤuːʒ/). Each talker produced each of the target words in the context of the sentence, ‘I want to write the word haːθ and after that I will write ħaːθ.’ Each word was read two times in each sentence position (medial and final), resulting in a total of forty-eight individual non-word productions per talker. Two tokens of each of these words were randomly extracted from the productions of each speaker using Praat software (Boersma, 2001). This resulted in a total of 480 tokens (12 words × 2 tokens × 20 talkers) that were used in the forced-choice word identification task described below.
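The token counts above follow directly from the design, and a minimal Python sketch can make the arithmetic explicit. The variable names and the romanized labels are illustrative assumptions, not part of the original materials; only the counts (12 words, 2 positions, 2 readings, 2 extracted tokens, 20 talkers) come from the text.

```python
from itertools import product

# Romanized labels for the six non-word minimal pairs (illustrative only).
MINIMAL_PAIRS = [
    ("haːθ", "ħaːθ"), ("huːʒ", "ħuːʒ"),
    ("saːθ", "sˤaːθ"), ("suːʒ", "sˤuːʒ"),
    ("taːθ", "tˤaːθ"), ("tuːʒ", "tˤuːʒ"),
]
WORDS = [word for pair in MINIMAL_PAIRS for word in pair]  # 12 target words

POSITIONS = ("medial", "final")  # sentence positions
READINGS = 2                     # each word read twice per position
TOKENS_PER_WORD = 2              # tokens extracted per word per talker
N_TALKERS = 20                   # 10 NE + 10 NA talkers

# Every (word, position, reading) combination is one recorded production.
productions_per_talker = len(list(product(WORDS, POSITIONS, range(READINGS))))

# Only two tokens per word per talker enter the identification task.
total_tokens = len(WORDS) * TOKENS_PER_WORD * N_TALKERS

print(productions_per_talker, total_tokens)
```

Under these assumptions, the sketch reproduces the forty-eight productions per talker and the 480 tokens used in the task.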
3 Word identification task
The listeners performed a forced-choice word identification task. In each trial, the listeners heard an auditory word and simultaneously saw two words written in Arabic on a computer screen, as presented in Figure 1. They were asked to identify which of the two words matched the auditory word they heard by pressing the right or left shift key on a computer keyboard. Crucially, each item involved an auditory word (e.g. huːʒ) and the written presentation of that word along with its minimal pair correspondent (e.g. huːʒ and ħuːʒ). In this way, it is possible to investigate the confusability of pairs of words differing only in the segments of interest (i.e. /h–ħ/, /s–sˤ/, and /t–tˤ/).

Figure 1. An example of what the screen looked like for the listeners for /huːʒ/–/ħuːʒ/.
DMDX experiment presentation software was used to present the visual and auditory stimuli and to collect responses (Forster & Forster, 2003). Each of the 480 word tokens was presented in random order once in each of two blocks of trials, for a total of 960 trials in the task (480 tokens per block × 2 blocks). Listeners had no time limit to respond to the auditory stimuli in the listening task and were given participant-controlled breaks between blocks. The task took approximately 45 minutes to complete. After completing the word identification task, listeners were asked to fill out a written questionnaire that included questions about their age, gender, and language experience.
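The block structure described above can be illustrated with a short Python sketch. This is not a DMDX script and does not reflect how DMDX randomizes items internally; the function name `build_blocks`, the placeholder token IDs, and the fixed seed are all assumptions made for the example.

```python
import random

def build_blocks(token_ids, n_blocks=2, seed=0):
    """Return n_blocks independently shuffled copies of the token list."""
    rng = random.Random(seed)  # fixed seed is an arbitrary choice for reproducibility
    blocks = []
    for _ in range(n_blocks):
        block = list(token_ids)  # every token appears once per block
        rng.shuffle(block)
        blocks.append(block)
    return blocks

# Placeholder IDs standing in for the 480 extracted word tokens.
tokens = [f"token_{i:03d}" for i in range(480)]
blocks = build_blocks(tokens)
n_trials = sum(len(block) for block in blocks)
print(len(blocks), n_trials)
```

Each block contains every token exactly once in a different random order, which gives the 960 trials reported for the task.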
V Data coding
Listeners’ responses on the word identification task were counted as correct only when they matched the words intended by the talkers. For example, if participants listened to /haːθ/, saw /haːθ/–/ħaːθ/ on the screen and then pressed the left shift key, their answer was counted as correct. Conversely, if they pressed the right shift key, their answer was considered wrong, as it did not match the talker’s intended word. The word tokens in each of the two blocks of trials were balanced, and the presentation positions of the two response options on the screen were balanced across trials (e.g. ‘haːθ’ appeared on the right side of the screen in one trial and ‘ħaːθ’ on the left side in another trial). Only participants’ correct responses were coded and analysed in the present study.
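The coding rule can be sketched in Python as follows. The `Trial` record and its field names are hypothetical and serve only to illustrate the rule stated above: a response is correct when the option on the side of the pressed shift key is the word the talker intended.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trial:
    intended: str      # word the talker intended to produce
    left_option: str   # word shown on the left of the screen
    right_option: str  # word shown on the right of the screen
    key_pressed: str   # "left" or "right" shift key

def is_correct(trial):
    """A response is correct when the chosen option is the intended word."""
    if trial.key_pressed not in ("left", "right"):
        raise ValueError(f"unexpected key: {trial.key_pressed!r}")
    chosen = trial.left_option if trial.key_pressed == "left" else trial.right_option
    return chosen == trial.intended

def proportion_correct(trials):
    """Share of correct responses in a non-empty collection of trials."""
    trials = list(trials)
    return sum(is_correct(t) for t in trials) / len(trials)

# The example from the text: /haːθ/ heard, /haːθ/ on the left, /ħaːθ/ on the right.
hit = Trial("haːθ", "haːθ", "ħaːθ", "left")
miss = Trial("haːθ", "haːθ", "ħaːθ", "right")
print(is_correct(hit), is_correct(miss), proportion_correct([hit, miss]))
```

Because the rule compares the chosen word with the intended word rather than with a fixed key, it remains valid when the screen positions of the two options are counterbalanced across trials.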
VI Results
Listeners’ responses were coded as correct if they matched the words intended by the talker, and proportion correct scores were calculated for each combination of listener and talker group. These data were analysed using a mixed-design analysis of variance (ANOVA) with listener group (two levels: NA and NE listeners) as a between-participants factor and talker group (two levels: NA and NE talkers) as a within-participants factor. This analysis showed a significant main effect of listener group (F(1, 28) = 88.866, p < .001, partial η² = .76), with performance by NE listeners (.92) more accurate than that of NA listeners (.67). There was also a significant main effect of talker group (F(1, 28) = 79.646, p < .001, partial η² = .74), with NA talkers (.91) identified more accurately than NE talkers (.67). These main effects were qualified by a significant interaction between talker and listener group (F(1, 28) = 39.685, p < .001, partial η² = .58).
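As a rough consistency check on the effect sizes above, partial η² can be recovered from an F statistic and its degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three omnibus tests, reading the F values as 88.866, 79.646, and 39.685; the function name is an assumption, and the check is illustrative rather than a description of how the original analysis was run.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Omnibus tests from the mixed-design ANOVA, all with df = (1, 28).
omnibus_f = {
    "listener group": 88.866,
    "talker group": 79.646,
    "talker x listener": 39.685,
}

effect_sizes = {
    name: partial_eta_squared(f, df_effect=1, df_error=28)
    for name, f in omnibus_f.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: {value:.3f}")
```

The recovered values fall within about .01 of the reported effect sizes for all three tests.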
Follow-up pairwise analyses were conducted to explore the interaction of talker and listener group. For English-accented speech, NE listeners were significantly more accurate (.79) than NA listeners (.63) at identifying words spoken by NE talkers (F(1, 28) = 161.398, p < .001, partial η² = .53). This finding provides support for the ISIB-L. For NA speech, NA listeners’ performance (.96) was more accurate than NE listeners’ performance (.86) (F(1, 28) = 46.800, p < .001, partial η² = .51) (see Figure 2).

Figure 2. Word identification accuracy, organized by listener group and talker group.
On the other hand, NA talkers were found to be more intelligible than NE talkers by both listener groups. Specifically, NA listeners were more accurate at identifying words produced by NA talkers (.96) than by NE talkers (.63; F(1, 14) = 1056.321, p < .001, partial η2 = .98). Similarly, NE listeners were significantly more accurate at identifying words produced by NA talkers (.86) than by NE talkers (.79; F(1, 14) = 51.432, p < .001, partial η2 = .79). These findings did not provide support for the ISIB-T.
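The two benefits can be read directly off the four cell means reported above. In the Python sketch below, the dictionary layout and variable names are illustrative assumptions. The ISIB-L is expressed as the difference between NE and NA listeners on NE talkers, and the ISIB-T as the difference between NE and NA talkers for NE listeners; a positive value would indicate a benefit.

```python
# Mean proportion correct, keyed by (listener group, talker group), as reported in the text.
cell_means = {
    ("NA", "NA"): 0.96,
    ("NA", "NE"): 0.63,
    ("NE", "NA"): 0.86,
    ("NE", "NE"): 0.79,
}

# ISIB-L: do non-native (NE) listeners outperform native (NA) listeners on NE talkers?
isib_l = cell_means[("NE", "NE")] - cell_means[("NA", "NE")]

# ISIB-T: do NE listeners find NE talkers more intelligible than NA talkers?
isib_t = cell_means[("NE", "NE")] - cell_means[("NE", "NA")]

print(f"ISIB-L contrast: {isib_l:+.2f}")
print(f"ISIB-T contrast: {isib_t:+.2f}")
```

The first contrast is positive (+.16) and the second is negative (−.07), which mirrors the pattern of support for the ISIB-L but not for the ISIB-T. The sketch only restates the descriptive means; the inferential tests are those reported in the text.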
VII Discussion
The current study aimed to explore both the ISIB-L and the ISIB-T in relation to the perception of three Arabic consonant contrasts (i.e. /h–ħ/, /s–sˤ/ and /t–tˤ/). To this end, two groups of listeners (native English and native Arabic) listened to individual Arabic non-words produced by both NE and NA talkers. The major finding of the present study was that NE listeners were significantly more accurate than NA listeners at identifying words produced by NE talkers, providing evidence for an interlanguage speech intelligibility benefit for listeners (the ISIB-L). One possible explanation for this finding is the phonetic and phonological knowledge shared by non-native talkers and non-native listeners who have the same native language background. This suggests that listeners who share the talker’s native language background may be better equipped to understand interlanguage speech, and that this shared knowledge can help to overcome the challenges posed by accent and pronunciation differences (Fishero et al., 2023; Gosselin et al., 2022; Hayes-Harb et al., 2008; Lee et al., 2022). By contrast, NA listeners did not share this phonetic and phonological knowledge with NE talkers; therefore, they identified words produced by NE talkers less accurately than words produced by NA talkers. This finding is consistent with that of Hayes-Harb et al. (2008), who demonstrated that native Mandarin listeners were significantly more accurate than NE listeners at identifying English words produced by native Mandarin talkers. It is interesting that both studies found evidence for the ISIB-L despite using different target languages (English vs. Arabic) and stimuli (real English minimal pairs vs. non-word minimal pairs). This suggests that the ISIB-L may be a robust phenomenon that is not limited to specific languages or types of stimuli.
Taken together, the two studies provide converging evidence for the ISIB phenomenon and reinforce the claim that factors related to both talkers and listeners should be taken into consideration with respect to the phenomenon of non-native speech intelligibility.
Another interesting finding of the present study is that NE listeners were more accurate at identifying words spoken by NA talkers than those produced by NE talkers. This finding is in line with previous research such as Hayes-Harb et al. (2008), who found that native Mandarin listeners were more accurate at identifying English words produced by NE talkers than by native Mandarin talkers. Neither result supports the interlanguage speech intelligibility benefit for talkers (the ISIB-T). In the present study, the analysis of the data collected from both native and non-native Arabic speakers revealed that NE listeners were significantly better at identifying words produced by NA talkers than by NE talkers. Other previous studies have likewise found no support for the ISIB-T. Imai et al. (2005), for example, found that high-proficiency native Spanish listeners were better at identifying words produced by native talkers than words produced by Spanish-accented talkers. Furthermore, Munro et al. (2006) found that Cantonese listeners did not find speech produced by Cantonese talkers to be more intelligible than that of any of the other talker groups, but they found the Japanese talkers to be more intelligible than other talkers, which provided evidence counter to the ISIB-T. Likewise, Stibbard and Lee (2006) found no strong evidence for the ISIB-T for high-proficiency non-native speakers. However, their results provided some support for the matched ISIB, with Korean listeners giving the Saudi high-proficiency talker a lower rating than the NE, Korean high-proficiency, and Korean low-proficiency talkers. These findings are intriguing and highlight the complexity of non-native speech intelligibility. It is worth noting that the lack of support for the ISIB-T in this study and in previous studies does not necessarily imply that non-native speakers cannot benefit from a shared language background.
It may be that the benefit is less consistent or less pronounced than the ISIB-L. Another possibility is that the listeners were more familiar with the accent of the NA talkers than with that of the NE talkers. It is also possible that the NA talkers adapted their speech to the expectations of the NE listeners, making their speech more intelligible.
On the other hand, the lack of evidence for the ISIB-T is inconsistent with the findings of other previous studies, such as Bent and Bradlow (2003), van Wijngaarden (2001) and van Wijngaarden et al. (2002). One possible explanation for this difference lies in the different tasks used in these studies. While the present study used a forced-choice word identification task in which participants listened to isolated words, Bent and Bradlow (2003), for instance, used a listening task in which the target words were presented in sentences. The sentence context may have provided additional information (i.e. morphological, semantic, syntactic and prosodic information) that helped participants identify the target words. By contrast, presenting words in isolation in the present study did not require lexical access, which might otherwise have contributed to participants’ perception of accented speech.
These findings suggest that the relationship between the native language of the listener and the accent of the speaker can influence speech intelligibility. Specifically, non-native listeners may have an advantage in understanding non-native speech because they may be more familiar with the phonetic and prosodic patterns of the speaker’s native language, which can help them compensate for the differences between the speaker’s accent and the target language. However, this advantage may not extend to the speech of native speakers, possibly because native speakers are more variable in their pronunciation and less predictable in their use of prosodic patterns. Studying the ISIB may also shed light on a situation found in some L2 classrooms, where students communicate easily with one another in a shared foreign language that the teacher may not easily understand.
The current study is limited in that it included only a small number of participants with mixed proficiency levels and one specific type of accent (i.e. English-accented Arabic). Thus, caution should be taken when generalizing the results to other contexts or accents. Moreover, the use of non-words may not fully capture the complexities of natural language production; therefore, the results may not generalize to other types of speech, such as connected or spontaneous speech. The ISIB-L and the ISIB-T are elusive phenomena that require further investigation before firm conclusions can be drawn about them, about the factors that contribute to them, and about how these factors can be used to improve language learning and teaching. Future research could explore the ISIB for listeners in more complex linguistic contexts using different perception and production tasks. Another possible direction is to explore the effects of task type and of listeners’ and talkers’ phonological proficiency on the listening accuracy of native and non-native speakers of Arabic at different proficiency levels, as measured by a recognized language proficiency test such as the ACTFL Oral Proficiency Interview (OPI). Doing so would provide further insight into how interlanguage phonetic systems develop, particularly those featuring novel phonemic contrasts, and into the role of listener and talker proficiency, which have been described as ‘critical variables in understanding this shared language effect’ (Fishero et al., 2023, p. 15). Future research could also investigate the ISIB among learners of Arabic from different L1 backgrounds.
VIII Conclusions
This study provides valuable insights into the ISIB and raises important issues related to the intelligibility of non-native speech. First, the NE listeners who participated in this study showed an advantage over NA listeners at identifying words in English-accented Arabic speech. This finding may be attributed to the shared phonetic and phonological knowledge between non-native talkers and non-native listeners. Second, NE listeners did not find the speech produced by other NE speakers more intelligible than the speech of NA talkers. In fact, NA speech proved to be more intelligible than non-native speech to both native and non-native listeners. Overall, a better understanding of the ISIB-L and the ISIB-T can help to improve language learning and teaching strategies and facilitate communication and understanding between speakers of different languages.
