Abstract
Mastering lexical stress is a persistent challenge for second-language (L2) learners, particularly when their first-language (L1) prosodic system differs markedly from that of the target language. This study investigates how Arabic phonological patterns influence English stress assignment and evaluates the effectiveness of two explicit instructional approaches, contrastive phonological analysis (CPA) and auditory discrimination training (ADT), for Arab learners of English as a foreign language (EFL). A total of 180 Jordanian 11th-grade learners completed perception and production tasks involving 50 disyllabic and polysyllabic English words representing five stress types. Productions were analyzed acoustically in Praat (pitch, duration, and intensity) and rated for stress accuracy by native speakers. Results showed persistent L1 transfer, particularly a default to penultimate stress and reduced accuracy on morphologically complex forms. Both interventions led to significant improvement in perception and production, with CPA yielding greater gains on stress-ambiguous forms. A strong perception–production correlation (r = .72, p < .001) was consistent with a transfer of perceptual gains to production, and delayed post-tests showed moderate retention. The findings highlight the role of prosodic transfer and support contrastive, perception-based instruction for improving stress competence among Arab learners.
Keywords
I Introduction
English lexical stress presents persistent challenges for Arab English as a foreign language (EFL) learners because of fundamental differences between English and Arabic prosodic systems. English stress assignment relies on complex interactions between syllable structure, morphology, and lexical category (Roach, 2009; Yavaş, 2020). In contrast, Jordanian Arabic employs a predictable, weight-sensitive system that favors stress on the rightmost heavy syllable within a three-syllable window, typically yielding penultimate or final stress (Al-Ani, 1970; Watson, 2011). These typological contrasts frequently result in first-language (L1) transfer, leading to systematic misassignment of English word stress (Archibald, 2023).
While the influence of Arabic segmental features on second-language (L2) English pronunciation has been well documented (Al-Khresheh, 2024), fewer studies have investigated the acquisition of suprasegmental features such as stress. Descriptive accounts have noted stress placement errors (Alzi’abi, 2022; Maghrabi, 2021), but few have empirically evaluated instructional interventions tailored to prosodic transfer in Arab learners.
The present study addresses this gap by evaluating two explicit instructional approaches: contrastive phonological analysis (CPA) and auditory discrimination training (ADT). CPA targets learners’ metalinguistic awareness of L1–L2 differences, whereas ADT aims to enhance perception of stress cues such as pitch, duration, and intensity. Using a mixed-methods design with pre-, post-, and delayed post-tests, the study measures learners’ perception and production of five English stress patterns, complemented by acoustic analysis and native-speaker judgments.
By examining both instructional effects and the relationship between perception and production, the study contributes to research on interlanguage phonology. It also informs pronunciation pedagogy by clarifying how explicit, contrastive training can mitigate L1-induced stress errors in English.
II Literature review
The influence of L1 stress patterns on L2 stress perception and production has been widely discussed in phonological transfer research (Archibald, 2023; Alzi’abi, 2023; Fabra, 2016). English stress assignment is shaped by syllable weight, morphology, and lexical category (Cutler, 2005; Roach, 2009; Yavaş, 2020), whereas Arabic stress systems, particularly Jordanian Arabic, are more predictable and quantity-sensitive (Shaw et al., 2011; Watson, 2011), often favoring penultimate or final stress placement. These typological contrasts can lead to persistent transfer errors among Arab EFL learners.
1 Theoretical framework: L1 transfer in the acquisition of English lexical stress
L1 transfer remains central to understanding L2 phonological development (Odlin, 1989, 2003). Flege’s (1995) speech learning model posits that learners assimilate L2 phonetic categories into existing L1 structures or struggle to form new ones. For stress assignment, this means Arab learners may default to Arabic prosodic patterns, especially syllable weight sensitivity, when learning English stress. Archibald (1998) highlights this issue, noting how prosodic parameters such as metrical structure and stress position are frequently transferred.
The prosodic transfer hypothesis (de Garavito, 2019; Özçelik, 2019) further supports the view that L1 prosodic frameworks persist in L2 acquisition. Empirical studies support these models: for example, Altmann (2006) showed Finnish learners struggled with English stress because of fixed L1 prosody, whereas Kijak (2009) observed that Arabic speakers imposed their L1 syllable weight rules on English stress placement. These examples illustrate how deeply entrenched L1 prosodic habits can interfere with L2 stress learning.
Segmental correlates of stress, such as vowel reduction, which is absent in Arabic, pose additional challenges. The perceptual assimilation model (Best, 1995) explains how learners filter L2 contrasts through L1 categories, reducing sensitivity to non-native cues. This makes perceptual training especially important for learners whose prosodic systems diverge from that of English.
2 Arabic stress assignment versus English stress assignment
Jordanian Arabic stress assignment is quantity-sensitive, typically placing stress on the rightmost heavy syllable within the final three syllables (Al-Ani, 1970; Al-Shalabi, 2021). This leads to frequent stress on penultimate or final syllables. English stress involves more variable rules, influenced by syllable structure, affixation, and lexical category (Cutler, 2005; Roach, 2009; Yavaş, 2020).
These differences result in recurrent transfer errors. Arab learners often misplace stress on the final syllable (Almbark et al., 2014; Zuraiq & Sereno, 2007), as in photograph /ˈfəʊtəɡrɑːf/, which carries initial stress in English, or avoid antepenultimate stress, which is rare in Arabic. This study focuses on five stress types, including derived stress shift (DSS; e.g., áccident → accidéntal), to capture these divergences and address RQ1 explicitly.
3 L1 influence on perception and production
Studies consistently show that Arabic phonology shapes both perception and production of English stress. Almbark et al. (2014) and Al-Tamimi and Khattab (2015) found that Arab learners frequently misused pitch, duration, and intensity because of limited L1 exposure to these cues. Zuraiq and Sereno (2007) reported low perceptual sensitivity to stress contrasts, including reduced vowels.
These challenges are not unique to Arab learners. Trofimovich and Baker (2006) and Tremblay (2008) found that L2 learners often show perception–production mismatches, suggesting that perceptual deficits may underlie production errors. Contrastive studies (Derwing & Munro, 2005; Field, 2005) show that segmental instruction alone is insufficient for stress acquisition.
Despite some research on Arab learners’ acoustic patterns (e.g., Almbark et al., 2014), few studies test targeted interventions or measure retention. This gap underscores the need for pedagogical studies evaluating long-term gains.
4 Instructional interventions for L2 stress
Two primary instructional approaches have emerged: contrastive analysis and auditory discrimination.
a Contrastive analysis
Contrastive instruction draws learners’ attention to L1–L2 differences (Yavaş, 2020). Studies (Guion et al., 2003; Sadat-Tehrani, 2017) show that explicit feedback and rule-based instruction can reduce prosodic errors. Meta-analyses (Islam, 2024; Lyster & Saito, 2010; Ren & Wang, 2023) support its use in L2 pronunciation, though few studies isolate its effect on stress among Arab learners.
b Auditory discrimination
Auditory discrimination training enhances bottom-up perception of stress cues such as pitch and duration (Birdsong, 1987; Grabowski, 2024; Lewis & Deterding, 2018). It can be supported by visual tools (e.g., Praat), which improve learners’ acoustic awareness (Almbark et al., 2014). Yet its effect on transfer-induced errors in Arabic speakers, particularly in comparison with rule-based methods, remains underexplored.
5 Research gap and rationale
While L1 transfer in suprasegmentals is well documented, few studies compare instructional strategies for Arab learners, particularly with attention to word-level complexity and retention. The need for contrastive–acoustic integration has been noted (Schwab & Dellwo, 2022). This study addresses that gap by comparing CPA and ADT, assessing perceptual and productive accuracy as well as retention, and thereby providing empirical grounding for future suprasegmental pedagogy.
Although many studies document these issues (Islam, 2024; Ren & Wang, 2023; Sa’di et al., 2022), few have evaluated structured interventions. Most focus on segmental phonology (Al-Khresheh, 2024), or describe errors without targeting prosodic instruction (Maghrabi, 2021). In addition, cross-linguistic studies (e.g., Tremblay, 2008; Trofimovich & Baker, 2006) rarely include Arabic-speaking learners, leaving a gap in understanding prosodic acquisition in this group.
III The study
This quasi-experimental study employed a convergent mixed-methods design with a pretest, an immediate post-test, and a delayed post-test to examine English stress perception and production among intermediate Arab EFL learners. It analyzed error patterns, assessed the effectiveness of the CPA and ADT interventions, and explored the perception–production link. The design included a 10-week intervention and a control group, and quantitative data were triangulated with qualitative interview data.
1 Research questions
This study is guided by four research questions.
RQ1. How do different stress pattern types (initial, penultimate, antepenultimate, DSS, and final) affect Arab EFL learners’ accuracy in perceiving and producing English stress?
RQ2. What is the frequency and distribution of stress placement errors made by Arab learners in the perception and production of English words across different stress pattern types?
RQ3. How effective are CPA and ADT in improving Arab EFL learners’ accuracy and retention in perceiving and producing English lexical stress across different stress pattern types?
RQ4. Does instructional improvement in stress perception lead to gains in production accuracy among Arab EFL learners?
2 Participants
A total of 180 grade-11 students (98 females and 82 males) aged 16–17 years, drawn from Jordanian secondary schools, participated in this study. Gender differences were not analyzed in the main study, as previous research on lexical stress processing has not identified consistent sex-related effects or significant advantages for either gender in L2 phonological acquisition (e.g., Rød & Calafato, 2023; Schmidt-Kassow et al., 2024). Gender was recorded for transparency, and future studies may explore potential gender-related trends, particularly within this L1 background.
All participants were intermediate-level EFL learners, as indicated by their placement in the Ministry of Education’s National Action Pack 11 curriculum (typically aligned with CEFR A2–B1 for Grade 11 students). Their end-of-year English exam scores (70–85%) further supported this classification. Although a standardized international test (e.g., TOEFL or IELTS) was not administered because of logistical constraints, these combined measures provided a reasonable indicator of proficiency for the purposes of this study.
All participants were native speakers of Jordanian Arabic, a variety whose prosodic system, marked by syllable-timed rhythm and regular stress placement on heavy syllables, was hypothesized to influence English stress perception and production.
These 180 participants constituted the sole cohort for the entire study, contributing data to the diagnostic analysis of stress errors (RQ1 and RQ2) and the subsequent intervention (RQ3). To ensure equivalent starting points in the intervention phase, participants were stratified by their baseline English stress perception scores before random assignment to one of three groups (n = 60 each): (1) CPA intervention, (2) ADT intervention, and (3) control. Perception scores were prioritized for stratification because of their central role in the study design and the theoretical premise that perceptual accuracy precedes and facilitates production in L2 phonological acquisition.
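The stratified assignment can be illustrated with a short Python sketch. It assumes one common way of stratifying, blocked randomization: participants are ranked by baseline perception score, taken in consecutive blocks of three, and the three conditions are shuffled within each block. The study does not report its exact procedure, so the function name `stratified_assign`, the blocking rule, and the seed are hypothetical.

```python
import random

GROUPS = ("CPA", "ADT", "control")

def stratified_assign(scores, seed=None):
    """Assign participants to GROUPS by blocked randomization on baseline score.

    scores: dict mapping participant ID -> baseline perception score.
    Returns a dict mapping participant ID -> group label.
    (Hypothetical illustration; not the study's documented procedure.)
    """
    rng = random.Random(seed)
    # Rank participants from highest to lowest baseline score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    # Consecutive blocks contain participants with similar baseline scores.
    for start in range(0, len(ranked), len(GROUPS)):
        block = ranked[start:start + len(GROUPS)]
        labels = list(GROUPS)
        rng.shuffle(labels)  # random order of conditions within the block
        for pid, label in zip(block, labels):
            assignment[pid] = label
    return assignment
```

With 180 participants every block of three is complete, so each condition receives exactly 60 participants whose baseline score distributions closely match.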
An a priori power analysis was conducted to determine the required sample size for detecting a medium effect size (f = .25) in the intervention study. With 80% power and an alpha level of .05 in a repeated-measures analysis of variance (ANOVA; three groups across three measurement points), the analysis recommended a minimum of 159 participants. The final sample size of 180 ensured adequate statistical power to detect the expected treatment effects.
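As a reference point for the medium effect size used in the power analysis, Cohen’s f can be converted to the proportion of variance explained, η², with the standard identity below; the worked value is arithmetic only and is not a result reported by the study.

```latex
f = \sqrt{\frac{\eta^2}{1-\eta^2}}
\quad\Longleftrightarrow\quad
\eta^2 = \frac{f^2}{1+f^2} = \frac{0.25^2}{1+0.25^2} = \frac{0.0625}{1.0625} \approx 0.059
```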
3 Materials
This section describes the instruments and materials used in the study, including the assessment tools and the instructional resources designed for the intervention.
a Instruments
Two researcher-designed instruments, a stress perception test and a stress production task, were developed to assess participants’ command of English lexical stress. Both instruments used the same set of 50 English target words.
These 50 words were balanced across five distinct lexical stress categories, with 10 items per category. These categories were specifically chosen to highlight stress patterns that typically pose challenges for Arab EFL learners because of L1 transfer, as well as those that follow more predictable rules. The five stress categories and illustrative examples are as follows:
initial stress (underived disyllabic words), e.g., table /ˈteɪbl/, doctor /ˈdɒktə/;
penultimate stress (common in polysyllabic words), e.g., banana /bəˈnænə/, computer /kəmˈpjuːtə/;
antepenultimate stress, e.g., cinema /ˈsɪnəmə/, family /ˈfæmɪli/;
DSS (derivational morphology), e.g., photograph /ˈfəʊtəɡrɑːf/ → photography /fəˈtɒɡrəfi/;
final syllable stress (suffix-induced), e.g., employee /ɪm.plɔɪˈiː/, engineer /ˌen.dʒɪˈnɪə/.
Items ranged from disyllabic to quadrisyllabic and included both underived and morphologically complex forms, reflecting common stress assignment rules and morphophonological processes. All words were drawn from Nation’s (2004) word list and Coxhead’s (2000) Academic Word List and cross-referenced with the Action Pack 11 textbook to ensure relevance and familiarity. The final list of 50 items was selected from an initial pool of 75 following pilot testing, which helped ensure clarity, appropriate difficulty, and balanced representation across the five stress categories.
In the perception test, phonologically similar distractors were excluded to ensure that responses reflected stress perception rather than segmental recognition or lexical guessing.
(i) Stress perception and stress production tests
Both tests used the 50-item list described above.
For the perception test, all target words were recorded by a single female native speaker of English with a Received Pronunciation (RP) accent, chosen on the basis of availability. Recordings were made in a sound-attenuated booth to ensure high recording quality. The stimuli were articulated clearly and at a moderate pace, with consistent pronunciation across items. Participants identified the stressed syllable by selecting from visually presented options that displayed each word with standard syllable divisions, following the conventions of the Longman Dictionary of Contemporary English. The test showed high internal consistency (Cronbach’s α = .84).
For the production test, participants read aloud the same 50 target words presented in isolation. Presenting words in isolation was intended to elicit citation-form stress without influence from surrounding sentence prosody, supporting the subsequent acoustic analysis of stress realization.
b Instructional materials
The instructional design was informed by two theoretical models: the speech learning model for the CPA group and the perceptual assimilation model for the ADT group. Materials were systematically developed to target both stress perception and production.
(i) Contrastive analysis material
The CPA materials, informed by the speech learning model, comprised the following components.
Explicit instruction on syllable structure and stress rules in both English and Arabic, with particular emphasis on syllable weight distinctions (e.g., light versus heavy syllables). Arabic stress tendencies, such as assigning stress to the ultimate or penultimate heavy syllable, were contrasted with English stress assignment patterns.
Illustrative activities targeting English stress rules, especially those governed by suffixation and derivational morphology (e.g., –ic, –ity, –ion, –ee, –eer, –ese), which often trigger stress shifts.
Analytical tasks requiring learners to examine English words for morphological complexity, apply stress placement rules and compare these with Arabic stress patterns (e.g., photograph versus photography; academy versus academic), fostering contrastive awareness.
Applied exercises involving syllable segmentation, stress annotation, and categorization of the 50 target words into predefined stress pattern categories, aimed at fostering learners’ metalinguistic awareness of prosodic structure.
The development of these materials was informed by established research in contrastive phonology and L2 pronunciation pedagogy.
(ii) Auditory discrimination materials
The ADT group received auditory training materials, grounded in the perceptual assimilation model, designed to improve sensitivity to the suprasegmental cues of stress (pitch, duration, and intensity). The materials included:
listening exercises featuring multiple native English accents (British and American) to expose learners to cross-dialectal stress variability and promote perceptual generalization;
minimal pair discrimination tasks that targeted stress shifts (e.g., import [ˈɪmpɔːt] versus import [ɪmˈpɔːt]) to help learners identify stress as a contrastive feature;
stressed syllable identification tasks at both word and sentence levels, with immediate feedback to reinforce the auditory salience of stressed units.
In contrast to the single-accent format used in the perception test, these training materials incorporated accent variability to foster more robust and transferable perceptual categories, consistent with the goals of high-variability phonetic training (HVPT).
c Postintervention interview
In addition to the quantitative measures, semi-structured interviews were conducted with a purposive subsample of 18 participants (nine per intervention group), selected through stratified sampling based on performance and gender to ensure balanced representation. The interview guide included open-ended questions on participants’ perceptions of the instructional interventions, awareness of English stress patterns, perceived improvements, instructional clarity, and challenges. The protocol was piloted with five students to refine question clarity and thematic relevance. Each interview lasted 20–30 minutes, was conducted bilingually (in English and Arabic), and was audio-recorded and transcribed.
4 Procedure
a Data collection
Preintervention tests (RQ1 and RQ2)
To address RQ1 and RQ2, stress perception and production data were collected from 180 Arab EFL learners. All participants completed the perception and production tests under controlled conditions. The perception test assessed learners’ ability to identify stressed syllables in spoken English words, whereas the production test evaluated their capacity to produce stress accurately in oral reading. The test scores also served as the baseline (Time 1) for the intervention study (RQ3) in all three groups (CPA, ADT, and control).
(i) Stress perception
Participants completed a paper-based perception test of 50 items representing five stress pattern types, drawn from monomorphemic and polymorphemic words. Stimuli were played twice over loudspeakers in quiet classrooms; presenting each item twice was intended to measure learners’ perceptual ability rather than momentary lapses in attention or processing speed. Response sheets displayed the test words with syllable divisions (e.g., pho-to-graph and pho-tog-ra-phy), and participants were instructed to circle the syllable they perceived as stressed. Response sheets were scored manually, and the test showed high internal consistency (Cronbach’s α = .84).
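For readers unfamiliar with the reliability coefficient, a minimal Python sketch of Cronbach’s α for a participants-by-items matrix of 0/1 scores follows; for dichotomous items it is equivalent to KR-20. The function name `cronbach_alpha` is illustrative, and the reported α = .84 came from the study’s own data, which are not reproduced here.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a participants x items matrix of item scores.

    responses: list of rows, one per participant, each a list of item scores
    (here 1 = correct, 0 = incorrect). For 0/1 items this equals KR-20.
    """
    k = len(responses[0])  # number of items
    # Variance of each item across participants.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of participants' total scores.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Population variances are used for both terms; using sample variances for both would give the same ratio.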
(ii) Stress production
Immediately afterward, participants read aloud the same stimuli in randomized order from a printed list. Recordings were made individually in quiet rooms using high-quality recording equipment and external condenser microphones, providing audio suitable for detailed phonetic analysis. All procedures were standardized across participants.
Before testing, all tasks were explained in participants’ L1 using sample items to ensure clarity and minimize procedural error.
(iii) Intervention and data collection for RQ3
Data collection for RQ3 involved three time points: pretest, immediate post-test, and delayed post-test. Participants took part in a 10-week intervention and were stratified into three matched groups (n = 60 each) based on their baseline perception scores: (1) CPA, (2) ADT, and (3) control. At each time point, all participants completed both the perception and the production measures using the same 50-word test set.
Repeated use of test items: Rationale
To minimize memory and practice effects, no feedback was provided during testing, and word order was randomized at each phase. An 8-week interval separated the immediate and delayed post-tests. In addition, the inclusion of morphologically complex and low-frequency lexical items reduced the likelihood of rote recall. This consistent-word approach enhanced internal validity by facilitating within-subject comparisons and eliminating lexical variability as a confound. It also increased the sensitivity of the instruments to instructional effects in both perception and production, while the measures above were intended to limit item-specific learning.
The CPA group received explicit instruction using contrastive phonological materials focused on prosody, syllable weight, and affixation. For example, participants analyzed lexical pairs such as photograph /ˈfəʊtəɡrɑːf/ and photography /fəˈtɒɡrəfi/, segmenting each word into syllables, identifying stress locations, and explaining the stress shifts that result from derivational morphology. Such alternations exemplify DSS, which was addressed explicitly to raise learners’ metalinguistic awareness. Learners then compared these English stress patterns with Arabic stress rules to foster cross-linguistic phonological awareness. Each CPA session began with a 10-minute minilecture on contrastive stress rules, followed by pair-based analysis worksheets and guided group correction of practice items. Throughout, tasks required learners to identify and interpret DSS patterns and to reflect on stress alternations.
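Randomizing word order independently at each phase, as described above, can be sketched in Python as follows. The function `phase_orders`, the phase labels, and the seeding are illustrative assumptions; the study does not describe the tool it used.

```python
import random

PHASES = ("pretest", "immediate post-test", "delayed post-test")

def phase_orders(items, phases=PHASES, seed=None):
    """Return an independently shuffled copy of the same item list for each phase."""
    rng = random.Random(seed)
    orders = {}
    for phase in phases:
        order = list(items)  # copy, so the master list keeps its original order
        rng.shuffle(order)
        orders[phase] = order
    return orders
```

Because every phase receives a permutation of the same list, the item set stays constant across phases while serial positions change.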
The ADT group completed computer-assisted ADT using high-variability input informed by the perceptual assimilation model. The training involved discriminating stress-based minimal pairs such as import [ˈɪmpɔːt] (noun) versus import [ɪmˈpɔːt] (verb), with immediate feedback incorporating visual aids such as pitch contour graphs and waveform overlays to reinforce stress perception (e.g., pitch rise and vowel duration). This targeted both segmental and suprasegmental discrimination in real time.
The control group (n = 60) followed their regular Ministry of Education Action Pack 11 English curriculum for twelve 50-minute sessions. This national curriculum is designed for general English proficiency and includes vocabulary building, grammar drills, reading comprehension, and segmental phoneme awareness, alongside basic sentence intonation and general listening tasks. However, it does not explicitly target English lexical stress or provide focused instruction in suprasegmental features.
The researcher delivered instruction for both intervention groups (CPA and ADT) in English via Zoom. Each group received twelve 50-minute sessions over the intervention period, totaling 24 instructional sessions. While the researcher was aware of the study hypotheses and group assignments, rigorous measures were taken to ensure pedagogical consistency and to minimize bias. These included strict adherence to structured, manualized curricula with predefined lesson plans and teaching scripts and regular review of session recordings for instructional fidelity. Crucially, external raters and primary coders for all outcome measures were blinded to participant group assignments, ensuring objective assessment.
During analytical tasks, participants received immediate, structured feedback to enhance their awareness and skills. The CPA group received oral feedback on stress patterns and phonological generalizations. For the ADT group, feedback involved corrective prompts supported by audio and visual aids for acoustic cues, facilitated by Praat. This specialized software allowed for precise audio control, repeated listening, and the integration of visual feedback (e.g., F0 contours, duration, and intensity overlays), reinforcing accurate perception and production of stress patterns.
Testing timeline
Participants were randomly assigned to one of three groups: two intervention groups (CPA and ADT) and a control group. Pretest scores from RQ1 were used to ensure comparability across groups. From weeks 1 to 10, the CPA and ADT groups received targeted, stress-focused instruction, while the control group continued with regular instruction. In week 10, all participants completed the same perception and production tasks as immediate post-tests. To assess retention, the same instruments were administered again in week 18 as delayed post-tests, following 8 weeks without formal instruction.
To minimize confounding exposure, all participants followed the same academic program and curriculum, with no phonological instruction beyond the assigned interventions. While students were advised to avoid external stress-related materials, incidental exposure to English (e.g., through media or informal input) could not be fully controlled.
In addition, semi-structured interviews were conducted postintervention with a stratified subset of 18 participants (nine per intervention group) to explore learner experiences and perceptions. To capture a representative range of instructional responses, each group included three high-, three mid-, and three low-performing participants, selected based on their post-test performance.
b Data analysis
A mixed-methods design enabled comprehensive triangulation by combining quantitative performance measures with qualitative insights into learner attitudes and instructional effects. Quantitative analyses of perception and production accuracy were conducted using SPSS Version 27 with a significance level of p < .05. Assumptions for parametric tests were verified and corrections (e.g., Greenhouse–Geisser) were applied where necessary.
(i) Quantitative analysis
Participants and testing phases
Quantitative data were collected from 180 participants who completed baseline perception and production tests addressing RQ1 and RQ2. Participants were then randomly assigned to three groups (CPA, ADT, or control; n = 60 each) to investigate instructional effects for RQ3. All groups completed identical perception and production tests at three time points: pretest (week 1), immediate post-test (week 10), and delayed post-test (week 18).
RQ1: Accuracy across stress patterns
Perception and production tasks comprised 50 words equally distributed across five stress patterns: initial, penultimate, antepenultimate, DSS, and final. Each participant’s responses were scored dichotomously (1 = correct, 0 = incorrect), yielding 9,000 data points per task (180 participants × 50 items). Although lexical stress is inherently gradient, dichotomous scoring was adopted to ensure consistent coding, maximize inter-rater reliability, and allow clear statistical comparison across groups and time points. Future research may explore gradient or weighted scoring to better capture prosodic nuances.
Perception accuracy: Participants identified the stressed syllable in auditory stimuli. Accuracy was determined by comparison to native-speaker norms.
Production accuracy: Learners’ oral responses were acoustically analyzed using Praat, based on pitch (F0 in hertz), vowel duration (ms), and intensity (dB) on the target syllable.
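Purely as an illustration of how the three cues could be combined, the Python sketch below z-scores F0, vowel duration, and intensity across a word’s syllables, sums them with equal weights, and scores a token as correct if the most prominent syllable is the target. This is a hypothetical heuristic, not the study’s procedure: stress placement in the study was judged by native-speaker raters with Praat inspection. The equal weighting, the field names `f0_hz`, `dur_ms`, and `int_db`, and the example values are assumptions.

```python
from statistics import mean, pstdev

CUES = ("f0_hz", "dur_ms", "int_db")  # hypothetical field names

def most_prominent_syllable(syllables):
    """Index of the syllable with the highest summed z-scored cue values.

    syllables: list of dicts, one per syllable nucleus, holding F0 (Hz),
    vowel duration (ms), and intensity (dB). Equal cue weights are assumed.
    """
    scores = [0.0] * len(syllables)
    for cue in CUES:
        values = [s[cue] for s in syllables]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a cue that does not vary within the word cannot signal stress
        for i, v in enumerate(values):
            scores[i] += (v - mu) / sd
    return max(range(len(scores)), key=scores.__getitem__)

def production_correct(syllables, target_index):
    """Dichotomous score: 1 if the most prominent syllable is the target, else 0."""
    return int(most_prominent_syllable(syllables) == target_index)
```

For photograph, with the target at index 0, a token whose first syllable is highest on all three cues would be scored 1.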
Mean accuracy scores were computed per participant and stress type. Repeated-measures ANOVAs (time × pattern type) assessed changes in accuracy over time, with Bonferroni-adjusted post hoc comparisons and partial eta squared (η²ₚ) reported as effect sizes.
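For reference, partial eta squared, the effect size reported for these ANOVAs, is conventionally defined as the sum of squares for an effect relative to the sum of that effect and its error term:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

Unlike η², the denominator excludes variance attributable to other effects in the design, so partial values across effects need not sum to 1.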
RQ2: Error frequency and distribution
Errors were defined as incorrect stress placement in perception or production.
Production errors: Three native English-speaking raters independently judged stress placement based on auditory perception and Praat spectrogram inspection. Inter-rater agreement was high (Cohen’s κ = .96), with disagreements resolved by consensus.
Perception errors: Errors were recorded when participants misidentified stressed syllables. Frequency counts by stress type were descriptively analyzed to identify patterns.
ANOVAs examined whether error rates differed significantly across stress types and between perception and production modalities.
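The error frequencies and rates compared above could be tabulated with a tally like the Python sketch below, which counts errors and error proportions for each combination of modality and stress type. The record format `(modality, stress_type, correct)` and the function names are illustrative assumptions.

```python
from collections import Counter

def error_counts(responses):
    """Tally stress-placement errors by (modality, stress type).

    responses: iterable of (modality, stress_type, correct) tuples, where
    correct is 1 for accurate stress placement and 0 for an error.
    """
    counts = Counter()
    for modality, stress_type, correct in responses:
        if correct == 0:
            counts[(modality, stress_type)] += 1
    return counts

def error_rates(responses):
    """Proportion of errors in each (modality, stress type) cell."""
    totals, errors = Counter(), Counter()
    for modality, stress_type, correct in responses:
        totals[(modality, stress_type)] += 1
        errors[(modality, stress_type)] += 1 - correct
    return {cell: errors[cell] / totals[cell] for cell in totals}
```

Error rates rather than raw counts are the natural input for comparisons across cells, since every cell holds the same number of items only when no responses are excluded.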
RQ3: Instructional effects and retention
A 3 (group: CPA, ADT, and control) × 3 (time: pretest, post-test, and delayed post-test) repeated-measures ANOVA was conducted on perception and production accuracy scores. In total, 54,000 responses were coded dichotomously.
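The response totals follow from the design; the breakdown below assumes, consistent with the figures reported, that the 54,000 total covers both tasks at all three time points:

```latex
\begin{aligned}
N_{\text{task, time}} &= 180 \text{ participants} \times 50 \text{ items} = 9{,}000\\
N_{\text{total}} &= 9{,}000 \times 3 \text{ time points} \times 2 \text{ tasks} = 54{,}000
\end{aligned}
```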
To validate production data, 20% of tokens (all 50 words from 20% of participants) underwent detailed acoustic and perceptual analysis by expert raters, serving as benchmarks for automated batch processing of remaining data via a custom Praat script developed for this study.
Acoustic and perceptual validation
Trained phoneticians annotated and segmented responses in Praat TextGrids. Acoustic parameters (F0, vowel duration, and intensity) were manually extracted from stressed syllable nuclei. Approximately 2.05% of responses were excluded because of ambiguous cues or poor audio quality; exclusions were evenly distributed across groups and time points. Intraclass correlation coefficients (ICCs) for acoustic measures exceeded .90.
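The paper does not state which ICC form was used. Assuming the common two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss’s notation, a Python sketch follows; the function name is illustrative, and other forms (e.g., consistency or average-measure ICCs) would give different values for the same data.

```python
from statistics import mean

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per target (e.g., a measured vowel),
    each row holding one measurement per rater.
    """
    n = len(ratings)     # number of targets
    k = len(ratings[0])  # number of raters
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Partition the total sum of squares into targets, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

On the six-target, four-judge example in Shrout and Fleiss (1979), this function returns approximately .29, the published ICC(2,1) value.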
Two native English-speaking linguists independently rated stress placement, blinded to group and time, achieving excellent inter-rater agreement (Cohen’s κ = .97). Discrepancies were resolved via joint spectrogram review.
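As an illustration of the agreement statistic, Cohen’s κ for two raters compares observed agreement with the agreement expected from each rater’s marginal proportions. In the Python sketch below, each judgment is coded as the number of the syllable the rater heard as stressed; that coding and the function name are assumptions for illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgments of the same items."""
    rater_a, rater_b = list(rater_a), list(rater_b)
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0  # both raters used one identical category throughout (convention)
    return (observed - expected) / (1 - expected)
```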
Group equivalence at pretest
A one-way ANOVA on pretest perception and production scores showed no statistically significant differences between groups (p > .05), supporting the assumption of baseline equivalence. This strengthens the case that post-test differences reflect instructional effects rather than pre-existing group disparities.
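The baseline equivalence check amounts to a one-way between-groups ANOVA. A minimal Python sketch of the F statistic is given below; the function name is illustrative, and the corresponding p value would come from the F distribution (for example, SciPy’s `scipy.stats.f.sf(F, df_between, df_within)`), which is omitted to keep the sketch dependency-free.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Between-groups SS: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Within-groups SS: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```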
Supplementary analysis
A Pearson correlation was computed to examine the relationship between post-test perception and production scores.
(ii) Qualitative analysis
Semi-structured interviews (n = 18) were analyzed thematically using Braun and Clarke’s (2006) six-phase model, with a hybrid inductive–deductive approach aligned with the research questions. Thematic saturation was reached by the 14th interview. An independent rater coded 75% of the transcripts, yielding strong inter-rater reliability (Cohen’s κ = .87), and coding discrepancies were resolved collaboratively. Participant validation, in which participants reviewed anonymized transcripts, enhanced credibility.
Quantitative and qualitative findings were integrated by comparing thematic patterns with quantitative results to identify points of convergence and divergence.
5 Validity and reliability of instruments
All instruments and instructional materials were evaluated for validity and reliability.
Stress perception test: high internal consistency (Cronbach’s α = .84). Three applied linguists established content validity via expert review.
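Cronbach's α compares the sum of item variances with the variance of total scores: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_total). For dichotomously scored items, as in the perception test, this is equivalent to KR-20. A minimal sketch with invented scores:

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[p][i] is participant p's score on item i."""
    k = len(scores[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of participants' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores: 5 participants x 4 perception items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(responses), 3))
```

The same formula applies to the full 50-item test; only the width of each row changes.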
Stress production task: high inter-rater reliability for acoustic measurements (ICC > .90) and perceptual scoring (Cohen’s κ = .96). Content validity supported by expert evaluation of the 50-word list.
Instructional materials: Developed based on established theoretical models (speech learning model and perceptual assimilation model). Pilot-tested with 15 students to ensure clarity, appropriateness and instructional efficacy. Feedback informed subsequent revisions.
6 Ethical considerations
All procedures adhered to internationally recognized ethical standards for human subjects research. Written informed consent was obtained from all participants, outlining the study’s aims, procedures, voluntary nature, withdrawal rights, and confidentiality protections. For participants under the age of 18, parental or guardian consent was secured in accordance with national regulations governing research with minors. Both the Ministry of Education in Jordan and the Institutional Review Board of Isra University granted ethical approval. All data, including recordings and test scores, were securely stored and accessed only by the research team under strict data protection protocols. Participants were fully debriefed upon completion of data collection.
7 Results
To address the research questions, repeated-measures ANOVAs were conducted, as stress pattern, task, and time point were within-subjects factors. Mauchly’s test was used to assess sphericity, and Greenhouse–Geisser corrections were applied where needed. Effect sizes are reported as partial eta squared (η²ₚ).
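Partial eta squared is SS_effect / (SS_effect + SS_error). When only an F ratio and its degrees of freedom are available, it can be recovered as η²ₚ = F·df_effect / (F·df_effect + df_error). The minimal sketch below (function names are our own) reproduces several η²ₚ values reported in the results from their F statistics. Because the Greenhouse–Geisser correction multiplies both degrees of freedom by the same ε, using corrected or uncorrected df gives the same η²ₚ.

```python
def partial_eta_squared_ss(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared from sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F ratio and its df.

    Since F = (SS_effect / df_effect) / (SS_error / df_error),
    SS_effect / SS_error = F * df_effect / df_error.
    """
    return f * df_effect / (f * df_effect + df_error)


# F statistics as reported in the results.
print(round(partial_eta_squared(56.43, 2.71, 159.29), 2))  # stress pattern: 0.49
print(round(partial_eta_squared(78.65, 1, 59), 2))         # task type: 0.57
print(round(partial_eta_squared(4.96, 3.18, 187.63), 2))   # interaction: 0.08
```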
RQ1: Accuracy of stress perception and production across stress pattern types
The selected stress patterns differ in their congruence with Arabic prosodic norms. Initial, penultimate and final patterns partially align with Arabic stress rules, whereas antepenultimate and DSS patterns diverge morphophonologically and lack direct L1 equivalents. It was therefore hypothesized that learners would show higher accuracy on L1-congruent patterns and lower accuracy on DSS and antepenultimate forms because of their structural unfamiliarity.
Table 1 presents the descriptive results. In perception, accuracy was highest for initial-stress words (52.3%), followed by penultimate (39.4%) and antepenultimate (38.7%) items; final-stress (34.2%) and DSS items (31.2%) were identified least accurately. Production accuracy was lower overall but showed a similar trend: initial-stress items were produced most accurately (40.2%), whereas DSS patterns had the lowest scores (22.2%).
Table 1. Accuracy scores (%) by stress pattern type.
A two-way repeated-measures ANOVA was conducted with task type (perception versus production) and stress pattern (five levels) as within-subjects factors. Mauchly’s test indicated a violation of sphericity for stress pattern, χ²(9) = 26.89, p < .001; therefore, the Greenhouse–Geisser correction was applied. A significant main effect of stress pattern emerged, F(2.71, 159.29) = 56.43, p < .001, η²ₚ = .49. Bonferroni-adjusted pairwise comparisons showed that accuracy on initial-stress words was significantly higher than on all other categories: penultimate (p = .003, d = .51), final (p < .001, d = .66), antepenultimate (p < .001, d = .69), and DSS (p < .001, d = .85). Accuracy on penultimate-stress words was higher than on DSS items (p = .012, d = .41), but the difference from antepenultimate items was not statistically significant (p = .112). Accuracy on final-stress items was significantly higher than on both antepenultimate (p = .045, d = .33) and DSS items (p = .018, d = .39). No significant difference was found between antepenultimate and DSS patterns (p = .289). Overall, these comparisons indicate an approximate accuracy hierarchy of initial > penultimate/final > antepenultimate/DSS, although the penultimate–antepenultimate contrast did not reach significance.
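The Greenhouse–Geisser correction multiplies both degrees of freedom by an estimate ε of how far the data depart from sphericity, giving df = ε(k − 1) and ε(k − 1)(n − 1). The sketch below estimates ε for a single within-subjects factor, such as the five stress patterns (for instance, each participant's accuracy averaged over task), from the double-centred covariance matrix D: ε = tr(D)² / ((k − 1)·tr(D²)). The function names and toy scores are illustrative; this is not the software routine used in the study.

```python
def covariance_matrix(data: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix of the k repeated measures (columns)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]


def gg_epsilon(cov: list[list[float]]) -> float:
    """Greenhouse–Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    row_means = [sum(r) / k for r in cov]
    grand = sum(row_means) / k
    # Double-centring removes the grand-mean direction (D = H S H).
    d = [[cov[a][b] - row_means[a] - row_means[b] + grand for b in range(k)]
         for a in range(k)]
    trace = sum(d[i][i] for i in range(k))
    trace_sq = sum(x * x for row in d for x in row)  # tr(D @ D) for symmetric D
    return trace ** 2 / ((k - 1) * trace_sq)


def corrected_df(eps: float, k: int, n: int) -> tuple[float, float]:
    """Greenhouse–Geisser-adjusted numerator and denominator df."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)


# Toy accuracy (%) for 6 participants x 5 stress patterns (not study data).
toy = [
    [60, 45, 40, 38, 30],
    [55, 42, 35, 40, 28],
    [70, 50, 45, 36, 35],
    [48, 40, 30, 35, 25],
    [65, 38, 42, 30, 33],
    [58, 47, 37, 41, 29],
]
eps = gg_epsilon(covariance_matrix(toy))
print(round(eps, 3), corrected_df(eps, k=5, n=len(toy)))
```

Epsilon always lies between 1/(k − 1) and 1; a value of 1 means sphericity holds and the uncorrected df are retained.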
A significant main effect of task type also emerged, F(1, 59) = 78.65, p < .001, η²ₚ = .57, indicating higher accuracy in perception than in production overall. There was also a significant task × stress pattern interaction, F(3.18, 187.63) = 4.96, p = .003, η²ₚ = .08. Declines from perception to production were marked for DSS and antepenultimate patterns, each of approximately 9.2 percentage points (p < .01). Notably, whereas penultimate-stress items were perceived more accurately than final-stress items, this trend reversed in production, where final-stress words were produced more accurately than penultimate ones. This divergence suggests that perceptual awareness and articulatory control may be governed by partly distinct cognitive or phonological processes, depending on the stress pattern involved.
RQ2: Frequency and distribution of stress placement errors
To address RQ2, an error analysis was conducted across the five stress pattern types. Errors were classified as either misplaced stress (incorrect syllable stressed) or nonidentification (failure to detect or produce stress).
Overall error frequencies
Each participant completed 50 items per task, yielding 18,000 responses (9,000 per task). Table 2 presents the distribution of errors by stress pattern.
Table 2. Frequency and percentage of stress errors by pattern type.
As Table 2 shows, DSS and antepenultimate patterns consistently elicited the highest error rates in both perception and production, whereas initial-stress patterns produced the fewest errors. Across all categories, production yielded more errors than perception, consistent with the greater articulatory and cognitive demands of realizing stress in speech.
Distribution by error type
Table 3 reports the frequency of each error type (misplaced versus nonidentification) across tasks and patterns.
Table 3. Error types in perception and production by pattern type.
Misplaced stress was the dominant error type across all patterns, particularly for DSS and antepenultimate items, which lack direct equivalents in Arabic. A chi-square test of independence assessed whether the distribution of error types varied by stress pattern or task, χ²(12) = 8.07, p = .780. The nonsignificant result suggests that, although overall error frequency differed across patterns, the proportional split between misplaced and nonidentification errors remained relatively stable. This uniformity may reflect a general learner tendency to attempt stress placement, even incorrectly, rather than to omit stress altogether, regardless of the pattern’s predictability.
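The chi-square test of independence compares each observed count with the count expected if error type and condition were unrelated (row total × column total / grand total). The sketch below computes the statistic, its degrees of freedom, and an upper-tail p value via the series expansion of the regularized incomplete gamma function. The table layout (rows = error types, columns = pattern or pattern × task cells) and the counts are hypothetical; as a check, `chi2_sf(8.07, 12)` returns approximately .780, matching the p value reported above.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail p value of the chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function
    P(df/2, stat/2); adequate for moderate statistics such as those here.
    """
    if stat <= 0:
        return 1.0
    s, x = df / 2.0, stat / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= x / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s) + math.log(total))
    return max(0.0, 1.0 - lower)


def chi_square_independence(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: misplaced vs nonidentification errors in five patterns.
errors = [[410, 520, 560, 640, 700], [120, 150, 170, 190, 210]]
print(chi_square_independence(errors))
print(round(chi2_sf(8.07, 12), 3))  # 0.78
```

Expected counts should not be very small; otherwise an exact or simulation-based test is preferable.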
To examine error frequency statistically, a repeated-measures ANOVA was performed on error rates. Because the raw data were counts, they were first converted into proportions per participant and condition to meet the assumptions of normality and homogeneity of variance (cf. Larson-Hall & Herrington, 2010). The analysis revealed significant main effects of stress pattern, F(2.94, 173.56) = 41.79, p < .001, η²ₚ = .42, and task type, F(1, 59) = 53.12, p < .001, η²ₚ = .47, as well as a significant interaction, F(3.18, 187.73) = 3.67, p = .012, η²ₚ = .06. Post hoc Bonferroni-adjusted comparisons confirmed significantly higher error rates for DSS items and significantly lower error rates for initial-stress words (both p < .001). Errors were also significantly more frequent in production than in perception across all patterns (p < .01), with the discrepancy particularly marked for DSS and final-stress items.
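The two preprocessing and post hoc steps described above can be sketched in a few lines: raw error counts become proportions of items attempted, and each unadjusted p value from the 10 pairwise comparisons among five patterns is multiplied by the number of comparisons and capped at 1 (the Bonferroni adjustment). Function names and p values are hypothetical, not study results.

```python
from itertools import combinations

PATTERNS = ["initial", "penultimate", "final", "antepenultimate", "DSS"]


def error_proportion(errors: int, items: int) -> float:
    """Convert a raw error count into a proportion of items attempted."""
    return errors / items


def bonferroni(p_values: dict) -> dict:
    """Bonferroni adjustment: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}


pairs = list(combinations(PATTERNS, 2))  # 10 pairwise comparisons
# Hypothetical unadjusted p values, one per pair (not study results).
raw = {pair: 0.004 for pair in pairs}
raw[("antepenultimate", "DSS")] = 0.2
adjusted = bonferroni(raw)
print(len(pairs), adjusted[("initial", "DSS")], adjusted[("antepenultimate", "DSS")])
```

Equivalently, the unadjusted p values can be compared against α/m (here .05/10 = .005).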
In summary, stress errors were most common in less predictable patterns (DSS, antepenultimate) and least frequent in the predictable initial pattern. Misplaced stress predominated across patterns, and the higher error rates in production suggest greater difficulty in articulatory encoding than in perceptual processing.
RQ3: Effectiveness of contrastive and auditory interventions
The third research question examined the efficacy of CPA and ADT in improving Arab EFL learners’ perception and production of English stress. Separate repeated-measures ANOVAs were conducted for each task, with time (pretest, post-test, and delayed post-test) and stress pattern (five types) as within-subjects factors and group (CPA, ADT, and control) as a between-subjects factor. Mean scores are presented in Table 4.
Table 4. Mean accuracy scores across groups and time points by stress pattern.
In the perception task, there were significant main effects of time, F(2, 234) = 150.76, p < .001, η²ₚ = .56, and group, F(2, 117) = 84.55, p < .001, η²ₚ = .59, alongside a significant time × group interaction, F(4, 234) = 44.11, p < .001, η²ₚ = .43. These results indicate substantial perceptual improvement over time, with the magnitude of change differing across groups. As Table 4 shows, the CPA and ADT groups improved significantly from pre- to post-test (p < .001) and sustained these gains at the delayed post-test, whereas the control group showed no significant progress (p > .05).
Similarly, in the production task, significant main effects were found for time, F(2, 234) = 158.34, p < .001, η²ₚ = .58, group, F(2, 117) = 62.79, p < .001, η²ₚ = .52, and stress pattern, F(4, 468) = 44.23, p < .001, η²ₚ = .28. The time × group interaction was also significant, F(4, 234) = 41.27, p < .001, η²ₚ = .41. Both intervention groups showed marked improvements across stress patterns, and CPA outperformed ADT on DSS and antepenultimate patterns at post-test (p = .04), suggesting an advantage for CPA in remediating complex, L1-induced stress assignment errors. The control group’s performance remained largely unchanged.
To complement the descriptive statistics in Table 4, Figure 1 presents mean accuracy trends across stress pattern types and testing phases, allowing a visual comparison of intervention effects by group and task modality.

Figure 1. Group mean accuracy scores (%) on perception and production tasks across five English stress pattern types at pretest, post-test, and delayed post-test.
To further assess the magnitude of improvement resulting from each instructional approach, Cohen’s d effect sizes were calculated from pre- to post-test for each group and stress pattern. These effect sizes are summarized in Table 5.
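The standardizer used for the within-group d values is not stated. Two common variants for pre–post designs are sketched below, under that uncertainty: d_av divides the mean gain by the root-mean-square of the pre- and post-test SDs, and d_z divides it by the SD of individual gain scores. With similar SDs at both time points, d_z exceeds d_av whenever the pre–post correlation is above .5, so the choice matters when comparing values across studies. Function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d_av(pre: list[float], post: list[float]) -> float:
    """Mean gain divided by the root-mean-square of the pre- and post-test SDs."""
    sd_av = math.sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return (mean(post) - mean(pre)) / sd_av


def cohens_d_z(pre: list[float], post: list[float]) -> float:
    """Mean gain divided by the SD of the individual gain scores."""
    gains = [b - a for a, b in zip(pre, post)]
    return mean(gains) / stdev(gains)


# Hypothetical pretest and post-test accuracy (%) for three learners.
pre_scores = [40.0, 50.0, 60.0]
post_scores = [52.0, 58.0, 73.0]
print(round(cohens_d_av(pre_scores, post_scores), 2),
      round(cohens_d_z(pre_scores, post_scores), 2))
```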
Table 5. Within-group pre- to post-test effect sizes (Cohen’s d) for stress perception accuracy by pattern.
The CPA group showed the largest effect sizes overall, especially on stress patterns most susceptible to L1 transfer, including DSS (d = 1.87) and antepenultimate (d = 1.73). The ADT group also showed notable gains, though smaller in magnitude, whereas the control group’s effect sizes were negligible. These results reinforce CPA’s effectiveness in remediating L1-induced stress misassignment.
Complementary evidence emerged from acoustic analyses of participants’ stress production, which assessed pitch (F0), duration, and intensity across time points. Repeated-measures ANOVAs revealed significant time × group interactions for all three acoustic correlates: F0, F(4, 234) = 19.43, p < .001; duration, F(4, 234) = 22.01, p < .001; and intensity, F(4, 234) = 20.37, p < .001. Post hoc comparisons indicated that the CPA group exhibited significantly larger increases in these prosodic cues than the ADT and control groups, particularly at post-test (p < .05). These findings provide instrumental corroboration of the observed improvements in stress perception and production, especially in complex stress contexts.
Acoustic correlates of stress production
To provide objective evidence of changes in stress production, acoustic analyses were conducted on the fundamental frequency (F0) peak, vowel duration, and syllable intensity for correctly stressed syllables. Table 6 summarizes mean values and standard deviations for these measures by group and time point.
Table 6. Mean F0, duration, and intensity for correctly stressed syllables by group and testing phase.
Repeated-measures ANOVAs on the acoustic parameters revealed significant time effects and time × group interactions consistent with the accuracy gains. For F0 peak, there was a main effect of time, F(2, 234) = 115.67, p < .001, η²ₚ = .50, and a significant time × group interaction, F(4, 234) = 38.92, p < .001, η²ₚ = .40. Both CPA and ADT groups showed significant increases in F0 from pre- to post-test, which were maintained at the delayed post-test (p < .01), whereas the control group showed no significant changes (p > .05). The CPA group’s post-test F0 peaks were also significantly higher than those of the ADT group (p = .04).
Similarly, vowel duration showed a significant main effect of time, F(2, 234) = 108.30, p < .001, η²ₚ = .48, and a time × group interaction, F(4, 234) = 35.15, p < .001, η²ₚ = .38. Post hoc comparisons confirmed that both intervention groups significantly lengthened stressed vowels relative to baseline and to the control group (p < .01), with the largest increases in the CPA group.
Syllable intensity showed a significant time × group interaction, F(4, 234) = 28.76, p < .001, η²ₚ = .33, reflecting significantly higher intensity for stressed syllables postintervention in CPA and ADT groups (p < .01), with no significant changes in the control group.
The data robustly support the efficacy of both CPA and ADT in improving Arab EFL learners’ English stress perception and production. These improvements were consistent across multiple stress patterns and maintained at the delayed post-test. Acoustic analyses corroborate these gains, revealing enhanced F0 peaks, vowel duration and intensity for stressed syllables after the intervention, particularly for CPA participants.
RQ4: Perception–production correlation
To examine whether improvement in stress perception was associated with gains in production accuracy, Pearson correlations were computed between post-test perception and production scores. A strong, statistically significant positive correlation was found across the full sample (r = .72, p < .001), suggesting that learners with higher perceptual accuracy also produced stress more accurately.
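Pearson's r is the covariance of the two scores divided by the product of their standard deviations; its significance is tested with t = r·√(n − 2) / √(1 − r²) on n − 2 degrees of freedom. A minimal sketch with invented post-test scores (function names are our own); the p value would then be read from the t distribution.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r: float, n: int) -> tuple[float, int]:
    """t statistic and df for testing H0: rho = 0."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df


# Hypothetical post-test perception and production accuracy (%).
perception = [55.0, 62.0, 70.0, 48.0, 81.0, 66.0]
production = [41.0, 50.0, 58.0, 39.0, 69.0, 49.0]
r = pearson_r(perception, production)
print(round(r, 3), r_to_t(r, len(perception)))
```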
Group-level analyses revealed similar effects: the CPA group showed a strong correlation (r = .75, p < .001), as did the ADT group (r = .72, p < .001). In contrast, the control group’s correlation was weak and not statistically significant (r = .16, p > .05), indicating no reliable relationship between perception and production in the absence of targeted instruction.
8 Discussion
This study investigated how Arabic L1 phonological patterns influence the perception and production of English lexical stress among Arab EFL learners. It specifically examined the effect of stress pattern type, the role of L1 transfer in complex lexical items, and the effectiveness of explicit instructional interventions. The findings provide quantitative and qualitative insights into L2 suprasegmental acquisition, particularly regarding prosodic transfer and the instructional remediation of English stress errors.
RQ1: Influence of stress pattern types
The findings confirm that stress pattern type significantly influences Arab learners’ accuracy in both perception and production. Performance was highest for initial stress, followed by penultimate and final patterns. These patterns partially align with the quantity-sensitive stress system of Jordanian Arabic, which emphasizes syllable weight and vowel quality. In contrast, accuracy dropped significantly for antepenultimate and DSS patterns, which lack clear counterparts in Arabic. These findings suggest that prosodic congruence with the L1 strongly facilitates correct stress assignment.
This pattern supports the prosodic transfer hypothesis (Archibald, 1998) and the perceptual assimilation model (Best & Tyler, 2007), both of which posit that learners perceive and process L2 input through L1 prosodic filters. The consistent use of syllable weight and vowel quality, evident in both task performance and interview data, suggests the strategic application of L1-based cues during stress identification. Similar trends were observed by Alzi’abi (2025), Baagbah and Jaganathan (2024), and Al Thalab (2021), who found that Arabic speakers performed better when English stress patterns aligned with Arabic norms.
The reduced accuracy for antepenultimate and DSS items suggests learners had difficulty assimilating these patterns into their L1 prosodic system. This is consistent with findings from Garcia and Guzzo (2022) and Sa’dí et al. (2022), who reported that typologically unfamiliar or morphologically complex stress patterns lead to greater cognitive load and error rates. Learners in this study frequently defaulted to L1-based strategies in such cases, consistent with transfer theory (Odlin, 1989).
Finally, the difficulty with DSS items may stem from the absence of clear acoustic or morphological cues, which increases processing demands. This aligns with Altmann’s (2006) and Pater’s (2007) argument that cue salience is crucial for successful phonological learning. When cues are subtle or unpredictable, learners rely on familiar but inapplicable L1 strategies. These findings reinforce Flege’s (1995) speech learning model, which states that L1 categories constrain the development of new L2 phonological representations, especially under conditions of low perceptual distinctiveness.
Together, these findings point to the importance of explicit instruction and practice targeting less-predictable stress patterns, a theme further examined in RQ3.
RQ2: Frequency and distribution of stress placement errors
The results of RQ2 shed light on the types and frequency of stress placement errors made by Arab learners across different word patterns. Consistent with the accuracy data, error rates were highest for DSS and antepenultimate items and lowest for words with initial stress. These results indicate that greater typological distance from L1 stress norms is associated with higher error frequency.
Across all patterns, misplaced stress was far more frequent than nonidentification. This suggests that learners generally attempted to apply stress, even when doing so incorrectly. The persistence of misplaced stress supports the idea that learners transfer familiar L1 prosodic strategies to L2 input, even when inappropriate. This tendency is consistent with Jarvis and Pavlenko (2008), who described cross-linguistic influence as the application of entrenched L1 schemas during L2 processing. Interview responses also reflected this tendency, with many learners reporting reliance on syllable counting or defaulting to the penultimate syllable, strategies rooted in Arabic.
The nonsignificant chi-square result indicated that the distribution of error types (misplaced versus nonidentification) was stable across stress patterns. This consistency suggests a systematic learner preference to place stress, even inaccurately, rather than to omit it altogether. Such behavior echoes findings from Pater (2007): in the absence of salient cues, learners applied default strategies rather than suppressing stress assignment.
Production errors significantly outnumbered perception errors across all stress types. This discrepancy reflects greater articulatory and cognitive load during production, particularly for DSS items. Learners struggled more when they had to realize unfamiliar stress patterns through speech, a challenge compounded by morphophonological complexity. This aligns with Flege’s (1995) speech learning model, which predicts that L1 phonological categories inhibit accurate L2 production when novel contrasts are difficult to perceive or produce.
Finally, these patterns align with Garcia and Guzzo (2022) and Sa’dí et al. (2022), who found that morphologically complex and prosodically unpredictable items led to higher error rates. While RQ1 focused on accuracy by pattern type, RQ2 clarifies how and where learners made errors, reinforcing the influence of L1-based strategies and perceptual salience on both perception and production performance.
RQ3: Effectiveness of instructional interventions
Building on the findings from RQ1 and RQ2, RQ3 investigated the effect of instructional interventions on Arab EFL learners’ perception and production of English stress. The results provide strong evidence that both CPA and ADT significantly improved learners’ performance across a range of stress patterns.
Consistent with prior studies on the effectiveness of phonological instruction in L2 prosody (Chen et al., 2025; Lacabex & Gallardo del Puerto, 2014; Pennington & Ellis, 2000), both intervention groups showed significant gains from pre- to post-test, sustained at delayed post-test. These improvements were supported by significant time × group interactions for perception (F(4, 234) = 44.11, p < .001, η²ₚ = .43) and production (F(4, 234) = 41.27, p < .001, η²ₚ = .41), indicating that the observed gains were directly linked to the instructional treatments.
CPA outperformed ADT across most stress types, particularly in complex patterns such as antepenultimate and DSS. Effect sizes (Table 5) highlight these differences: CPA showed larger gains (e.g., d = 1.87 for DSS) compared with ADT (d = 1.61), while the control group showed negligible improvement. Acoustic data further supported these findings, with significant increases in F0, vowel duration, and syllable intensity in both intervention groups, especially for CPA learners (see Table 6). These enhancements reflect more nativelike stress production and suggest successful prosodic restructuring.
Learner interviews complemented these findings, with many participants reporting increased awareness of English stress patterns and greater sensitivity to acoustic cues, echoing Amer and Amer (2011). Notably, stress patterns with partial overlap with Arabic, such as penultimate stress, showed the most improvement. This aligns with RQ1 and supports claims that instruction can leverage existing L1 prosodic templates (White & Genesee, 1996).
However, gains were more modest for morphologically complex patterns such as DSS. The lack of stress-shifting affixation in Arabic makes such structures less perceptually salient and more cognitively demanding. Despite CPA’s explicit instruction and ADT’s perceptual input, neither method sufficiently addressed the morphophonological complexity of DSS. This finding supports Pater’s (2007) argument that L2 morphophonological restructuring is particularly challenging.
CPA’s focus on contrastive rules may not have adequately emphasized morphological structure, whereas ADT lacked explicit parsing of derivational boundaries. These instructional gaps likely contributed to residual difficulties in DSS production. As a result, the findings highlight the need for integrated instruction that targets the interaction between morphology and stress assignment.
In summary, both CPA and ADT significantly improved English stress perception and production, with CPA yielding larger and more consistent gains. The results underscore the value of phonological and perceptual training while pointing to the need for morphology-sensitive approaches when addressing complex L2 stress phenomena.
RQ4: Relationship between perception and production accuracy
The RQ4 results suggest a strong relationship between improved stress perception and production accuracy, particularly in the two instructional groups. Learners who achieved greater perceptual gains through CPA or ADT also showed more accurate stress production at post-test. The absence of a significant correlation in the control group reinforces the interpretation that this perception–production link was driven by the interventions rather than by general exposure.
While correlation does not establish causality, the consistent and strong relationships observed within both experimental groups provide compelling support for a perception-to-production transfer effect. This aligns with interactionist models of L2 phonological development (Flege et al., 1995; van Leussen & Escudero, 2015), which argue that accurate perception forms the foundation for phonological restructuring and accurate L2 production.
These findings further underscore the pedagogical value of perception-focused instruction in suprasegmental training and reinforce the idea that enhancing perceptual acuity can facilitate more targetlike prosodic output in L2 learners.
Limitations and future directions
This study has several limitations. First, its focus on a single Arabic dialect limits generalizability to other Arabic-speaking populations. Second, although the 8-week follow-up provides insight into initial retention, it does not assess long-term durability. Future research should include extended follow-up periods to evaluate sustained learning.
While learners were stratified by proficiency, other factors, such as motivation, working memory, or auditory processing ability, were not controlled and may have influenced outcomes. Future studies should account for these individual differences to better understand intervention efficacy.
In addition, although CPA and ADT were treated as distinct instructional approaches, both required sustained attention to stress cues. CPA’s relative advantage may reflect shared metalinguistic demands rather than uniquely contrastive benefits. A factorial design could help isolate the contributions of individual instructional components.
Finally, although acoustic analyses of F0, duration, and intensity were conducted, production accuracy was primarily assessed by trained raters. Future studies should rely more heavily on instrumental measures and assess how these phonetic changes affect perceived prosodic naturalness and intelligibility.
IV Conclusion
This study examined how Arabic phonological structures influence English stress assignment and evaluated two instructional interventions: CPA and ADT. Findings confirmed that L1-induced errors persist unless explicitly addressed. Learners performed better on stress patterns aligned with Arabic, confirming transfer effects. Both interventions significantly improved perception and production, with CPA slightly more effective in addressing complex, morphologically driven patterns.
The strong correlation between perceptual and productive gains reinforces the role of perceptual accuracy in phonological restructuring. These findings contribute to interlanguage phonology by showing that targeted instruction can overcome entrenched L1 influence and promote more nativelike prosody. The study offers a replicable instructional model that can inform suprasegmental teaching in multilingual contexts.
V Pedagogical implications
The findings support a targeted, multimodal approach to teaching English word stress to Arabic-speaking learners and carry several pedagogical implications.
First, instructors should explicitly contrast Arabic and English stress rules to raise metalinguistic awareness. Techniques such as contrastive stress mapping can help learners recognize and adjust L1-based strategies (Gussenhoven, 2004; Rasier & Hiligsmann, 2007).
Second, auditory discrimination training should be central, given the strong perception–production link found in this study (cf. Flege, 1995). Techniques such as minimal pairs, controlled listening, and feedback on pitch and duration are essential. The Praat-based visualizations used in this study can be adapted for classroom use to enhance awareness of prosodic cues (Field, 2005).
Third, instruction should progress from predictable to less predictable stress types. Learners improved most on patterns aligned with Arabic (e.g., penultimate stress in heavy syllables). Scaffolding toward more complex forms supports gradual “proceduralization” (DeKeyser, 2007) and aligns with usage-based learning (Ellis, 2006).
Finally, stress instruction should be embedded in broader communicative tasks. Integrating stress into reading, speaking, and listening activities promotes automatization and transfer to spontaneous speech (Derwing & Munro, 2005; Lecumberri, 2008).
In sum, developing prosodic competence requires explicit, contrastive, and perceptually rich instruction tailored to learners’ L1 backgrounds.
