Abstract
This study examines (a) the prosodic cues to stress and focus produced by Ukrainian speakers in their first language (L1 Ukrainian) and in a second language (L2 German), as compared to German speakers, and (b) the influence of F0 during online word recognition in L2 German. Analyses of productions of trisyllabic target words differing in stress (Experiment 1) show that stressed vowels are consistently longer and produced with slightly higher intensity than unstressed vowels, with little additional modulation by focus (focused, unfocused) and hardly any group effects (L1 Ukrainian, L2 German, L1 German). The groups differed strongly in F0, however. While L1 Ukrainian speakers produced the initial syllable with high pitch and stressed syllables with a falling/low F0 contour (suggesting head-edge prominence marking), L1 German speakers produced focus with a rising pitch accent. There were two subgroups of learners: a small group that produced the conditions as in L1 Ukrainian (Type 1) and a larger group that was similar to L1 German (Type 2). In perception (Experiment 2), target words were fixated more when the first syllable was high-pitched (in line with L1 Ukrainian productions). Similar to Germans, Type 2 learners temporarily treated high-pitched syllables as stressed, leading to increased competitor fixations when the pitch peak preceded the stressed syllable (early-peak accent, H+L*, compared to medial-peak accents, L+H*). These findings demonstrate individual differences in the L2 production of stress and focus, which are closely linked to cue use during L2 word recognition.
1 Introduction
In stress-accent languages, such as English, German and Ukrainian, prominence can arise from both lexical stress – a property encoded in the word’s metrical representation – and post-lexical accentuation, mostly used for information-structural purposes (Ladd, 2008). Understanding how these two levels interact has been central to theories of prosody and speech processing in the context of both native and non-native speech.
Lexical stress is an abstract property and there is no single phonological or phonetic cue to stress. Phonological cues to stress include increased syllable complexity and the ability for accentuation: Only stressed syllables can bear pitch accents (Ladd, 2008). Regarding phonetics, the following cues to stress have been discussed in the literature: increased vowel duration, higher vowel intensity, higher fundamental frequency (F0), greater vocal effort, more peripheral vowel quality, and a stronger resistance to coarticulation (e.g. van Heuven, 2019). Of these, the role of F0 has been a matter of considerable debate (Fry, 1958; Gordon & Roettger, 2017; Zahner et al., 2019) because F0 is influenced by the pitch accent associated with the stressed syllable (e.g., in L+H* accents, which are often used to signal new information in German, the stressed syllable is high and the preceding one is low, while in H+L* accents, often used to signal given information, the stressed syllable is low and the preceding one is high; Baumann & Grice, 2006; Kohler, 1991; Pierrehumbert & Hirschberg, 1990).
Much of the phonetic research on stress and accent has focused on Germanic languages. This paper investigates the role of duration, intensity, and F0 in the production of stressed and unstressed syllables in accented and unaccented words in Ukrainian and in Ukrainian learners of German, and compares this to German natives (Experiment 1). Experiment 2 then investigates the Ukrainian learners’ sensitivity to F0 as a stress cue in online word recognition.
The functional load of stress in Ukrainian is similar to German; there is a small number of monomorphemic minimal pairs that only differ in stress: for instance, in Ukrainian, PRYklad means ‘example’ (stressed syllable in capitals), pryKLAD means ‘gunstock’; in German, MOdern means ‘to rot’, moDERN means ‘fashionable’. In both languages, morphology may affect stress, leading to polymorphemic minimal pairs (e.g., German particle verbs such as UMfahren ‘to run over’ vs. umFAHren ‘to drive past’, Ukrainian SEStry ‘sister.Nom.Pl’, sesTRY ‘sister.Gen.Sg’). There are more of these morphologically conditioned stress minimal pairs in Ukrainian than in German.
Ukrainian vowels in unstressed syllables may be phonetically reduced, leading to a more central vowel quality (Pompino-Marschall et al., 2017). In German, unstressed vowels are also more reduced (Mooshammer & Geng, 2008). Even though Ukrainian has not been studied in great detail yet, there are indications that the role of F0 is different in Ukrainian than in German (Łukaszewicz & Mołczanow, 2018c).
The role of F0 as a (direct) correlate of lexical stress remains debated due to the common co-occurrence of word-level prominence and pitch accents (Gordon & Roettger, 2017). However, the function of F0 is likely to differ cross-linguistically: While Sluijter (1995) claims that F0 is a correlate of phrase-level accent rather than word-level stress in West Germanic languages, Gordon and Roettger (2017) argue that a higher F0 could be indicative of lexical stress in some languages (e.g., Finnish, Greek, Hungarian), even in unaccented positions, and point out that also lower values of F0 can correlate with lexical stress (e.g., in Urdu). Still, whereas F0 is not necessarily a direct acoustic stress correlate and depends on the pitch accent type in intonation languages, phonological accentuation (i.e., the presence of a pitch accent) is an unambiguous cue to the position of lexical stress, irrespective of the peak-stress alignment, that is, whether the accented stressed syllable is aligned with an F0 peak or an F0 valley. Research on native German listeners has shown that the identification of the lexically stressed syllable is affected by the alignment of the F0 peak relative to the stressed syllable (Zahner et al., 2019). There were fewer errors when the F0 peak was aligned with the stressed syllable. In online word recognition, this led to the temporary activation of cohort words with a similar sequence of sounds but a different stress pattern. Such findings raise an important question for L2 prosody acquisition: Do non-native listeners of German rely on similar cues? Specifically, do L2 learners use high F0 as a cue to stress in a manner comparable to native listeners?
The paper first gives some background on the realization of stress and focus in Ukrainian and German and on the processing of stress cues, including information on the role of prosodic transfer, which allows us to phrase the research questions more concretely. Section 3 presents the production experiment, Section 4 the perception experiment, and Section 5 discusses the results of both studies.
2 Background
2.1 Realization of Stress and Focus in German and Ukrainian
In Ukrainian, lexical stress is free, contrastive, and can fall on any syllable (e.g., Karpiński et al., 2020). Both traditional (Brovčenko, 1969; Toc’ka, 1969) and more recent (Łukaszewicz & Mołczanow, 2018a, 2018b, 2018c, 2024) studies on Ukrainian identify increased vowel duration as the most consistent acoustic correlate of stress, while other stress correlates appear to be less reliable (vowel intensity) or unreliable (F0). However, the target words in Łukaszewicz and Mołczanow (2018c) appeared in an accented position, thus making it impossible to tease apart the effects of stress and focus.
The intonational system of Ukrainian has only recently begun to receive description within the autosegmental-metrical framework (Pierrehumbert, 1980). In broad-focus utterances, accent placement typically falls on the utterance-final word, while in narrow focus, the focused constituent is accented and followed by post-focal compression. Both early work by Féry et al. (2007) and a more recent analysis by Bokova (2021, p. 78) identified H*+L (an early-falling accent) as the default pitch accent used for the prosodic marking of narrow focus. In broad-focus contexts, the prenuclear accent is rising (L+H* in Féry et al., 2007), followed by a nuclear accent with an early peak (H+L* in Féry et al., 2007).
Similar to Ukrainian, German has free stress, phonetically marked by increased vowel duration (e.g., Jessen et al., 1995). Other acoustic correlates of German stress include more vocal effort (Mooshammer, 2010) and more peripheral vowel quality (Mooshammer & Geng, 2008). In terms of intonation, German makes use of pitch accent types that differ with respect to the alignment of the tonal targets (high or low) with the stressed syllable: Whereas in so-called ‘medial-peak’ accents (L+H*), the F0 peak is aligned with the stressed syllable, ‘early-peak’ (H+L*) and ‘late-peak’ (L*+H) accents do not align the F0 peak but the F0 valley (L) with the stressed syllable, making high F0 an unreliable phonetic cue to the detection of metrical strength (Kohler, 1991). These different pitch accent types are used mostly for information-structural and attitudinal purposes. For example, medial-peak accents (L+H*) are commonly used to introduce new referents to the discourse (Kohler, 1991; Pierrehumbert & Hirschberg, 1990), while early-peak accents (H+L*) typically signal semi-active and therefore inferable information (Baumann & Grice, 2006; Kohler, 1991).
In many regards, the representations and processes of the native language (L1) influence the second language (L2), a phenomenon known as cross-linguistic influence (CLI) or transfer (Flege & Davidian, 1984; Odlin, 1989). The extent of CLI in stress production naturally depends on the phonology and phonetics of stress in the L1 and L2 (Altmann, 2006; Eckman, 2013; Kijak, 2009; Major, 2008); the same is true for the L2 production of phrase-level prominence (focus) and intonation (Baek, 2024; Trouvain & Braun, 2020). Turco et al. (2015) further argued that three levels of transfer should be considered: the relevance of linguistic categories (in this context the presence or absence of contrastive stress in the L1 and L2, which is similar in Ukrainian and German), the linguistic markers used (e.g., intonation or morphosyntax, which is similar in Ukrainian and German), and the phonetic implementation of phonological categories (e.g., duration, intensity, F0, where duration and intensity seem to be used in a similar way in Ukrainian and German, but F0 is not). Similar factors have been argued to guide the acquisition of intonation in an L2 (Mennen, 2015). In her L2 Intonation Learning theory (LILt), Mennen (2015) further included a semantic dimension (the meaning contribution of intonational events in L1 and L2) and a frequency dimension (how frequent intonational events are in a language), both of which differ between Ukrainian and German. Furthermore, markedness has been argued to play a role, such that more marked structures are harder to acquire (Rasier & Hiligsmann, 2007; Zerbian, 2015). Markedness is less relevant for our study because the two languages we investigate are very similar regarding the marking of stress and focus (both are free-stress intonation languages).
2.2 Processing of Stress
In stress languages, prosodically marked lexical stress plays an important role in speech comprehension as it affects how we segment, recognize, and interpret spoken language (for an overview see Cutler, 2012). Stress processing has been studied using stress identification (Kutscheid et al., 2022), the lexical identity of minimal pairs (Fry, 1958; Kohler, 2012; Severijnen et al., 2021; Tremblay et al., 2021), the encoding and retrieval of different stress patterns as number sequences (Dupoux et al., 2001), lexical decision tasks with or without priming (Cooper et al., 2002; Donselaar et al., 2005; Protopapas et al., 2016), neurophysiological studies (Domahs et al., 2008; Friedrich et al., 2004), and the visual-world eye-tracking paradigm (Lee, 2014; Reinisch et al., 2010). The literature in this area is too vast to do justice to individual studies. We first summarize recent studies that investigate the relative weighting of different prosodic cues in stress perception, in particular with regard to the role of F0 in German and other Germanic languages, and then review studies on cross-linguistic influence.
F0 was a relevant cue to stress for Dutch learners of English to disambiguate a minimal stress pair in English. Tremblay et al. (2021) manipulated words of a stress minimal pair (DEsert with stress on the first syllable vs. deSSERT with stress on the second syllable) with regard to three dimensions: (1) vowel quality versus F0, (2) vowel quality versus duration, and (3) duration versus F0. English native speakers and Dutch speakers of English indicated which of the two words they heard. The results showed that vowel quality had a stronger impact than pitch for native English speakers, but a reversed cue weighting was observed for Dutch learners of English. Proficiency affected the learners’ sensitivity to vowel quality but not their reliance on pitch. Vowel quality generally had a stronger impact than duration, but less so for Dutch learners of English.
With an exposure-test paradigm, Severijnen et al. (2021) investigated idiosyncratic cues to stress in Dutch. They used stimuli that only varied in one stress cue during exposure (F0 or intensity), while the other cue and duration were held constant. In the test phase, they presented ‘mixed items’ with conflicting stress cues from the speakers heard during exposure. Their results showed that listeners processed stress cues in relation to the speaker (higher weighting of F0 for the speaker who used F0 to distinguish stress, higher weighting of intensity for the speaker who varied intensity during exposure). The differences were small but significant and suggest speaker-specific adaptation to phonetic cues to stress, even to F0.
As introduced earlier, high-pitched syllables are interpreted as metrically stressed also in German. Zahner et al. (2019) showed that German native listeners made more mistakes in identifying the stressed syllable in German trisyllabic words when the pitch peak preceded or followed the stressed syllable. There was a clear and significant response bias toward the syllable aligned with the F0 peak. In a visual-world eye-tracking study, which we replicate here, listeners were shown four printed words, two of which were trisyllabic cohort competitors differing in the position of the stressed syllable (e.g., target LiBElle [li.ˈbɛ.lə] with penultimate stress and competitor LIbero [ˈli.bə.ʁo] with initial stress). Fixations to the competitor with an SWW (strong-weak-weak) pattern increased when the target word was produced with an early-peak accent (H+L*) compared to a medial-peak accent (L+H*), despite the fact that early-peak accents are appropriate in experimental lists with repeated carrier phrases (‘Please click on < target >’). This indicates that high-pitched unstressed syllables are temporarily perceived as stressed, directly influencing lexical activation. Momentarily, F0 was hence a more relevant cue to stress than duration and intensity. When participants were familiarized with speech with only early-peak and late-peak accents, this competitor activation disappeared, suggesting that the frequency of occurrence of accent types affects relative cue weighting of F0.
Research into stress processing in a non-native language suggests that L2 listeners often have difficulty interpreting prosodic prominence in a native-like way, especially when stress realization differs between the L1 and L2. Negative transfer in processing can lead listeners to rely on different cues than native speakers of the target language. This was shown in the cue-weighting experiment reported above, in which Dutch listeners weighted vowel quality less than native English speakers (and the F0 cue more), in line with the cues in their L1 (Tremblay et al., 2021). Canadian French listeners of English could only use stress information in lexical activation when they could produce the stress pattern correctly, leading to a positive correlation between accuracy in a vocabulary task and accuracy in a lexical decision task. Gibson and Bernales (2020) furthermore reported that even advanced Chilean learners of English (L1 Spanish) showed reduced sensitivity to prosodic prominence compared to native English listeners, suggesting that the mapping between acoustic cues and linguistic function develops gradually and may remain incomplete (cf. also Ortega-Llebaria et al., 2013).
There are indications that increased proficiency leads to more target-like use of cues (Tremblay, 2008; Tremblay et al., 2021). There are also indications of a production-perception link: Cue use in L2 production is mirrored in perception (see Schertz & Clare, 2020, for a review), although studies typically focus on segmental contrasts and not on suprasegmental contrasts such as stress and focus.
2.3 Aims and Research Questions
In this study, we address both the production and perception of lexical stress in Ukrainian learners of German.
In the production experiment (Experiment 1), Ukrainian learners of German read question-answer pairs to elicit different focus structures in both Ukrainian (L1) and German (L2). While acoustic correlates of stress in Ukrainian have been examined in several studies, none of them disentangled word-level and phrase-level prominence. In addition, we are not aware of any studies that have addressed stress realization in Ukrainian learners of German. We investigate the effects of stress in trisyllabic words (on the first or second syllable) and measure vowel duration, vowel intensity, and F0 of the utterance. The German part of the task is performed by Ukrainian learners as well as by a small group of L1 German speakers. This allows for a comparison between Ukrainian and L2 German (to study prosodic transfer from the L1) as well as between L1 German and L2 German (to study L2 attainment).
Experiment 2 is a replication of the visual-world eye-tracking experiment described in Section 2.2 (cf. Experiment 2 in Zahner et al., 2019), which investigates online processing of stress – and hence the uptake of segmental and prosodic cues to stress – by Ukrainian learners of German.
Together, the experiments address the following three research questions:
RQ1. To what extent do vowel duration, intensity, and F0 signal stress and focus in Ukrainian?
RQ2. To what extent does the prosodic marking of stress and focus differ between Ukrainian learners of L2 German and L1 German speakers?
RQ3. Does F0 peak-stress alignment influence stress perception in Ukrainian learners of German, and does sensitivity to F0 in perception align with production patterns?
This paper adds to the literature on Ukrainian prosody and provides data on cross-linguistic influence for a lesser-studied language pair (East Slavic language Ukrainian vs. West Germanic German) regarding production and perception of metrical stress and focus. It is expected that the cues duration and intensity are similar across languages, but that the pitch accent types are likely to differ. We hypothesize that this may affect the weight of high F0 as a phonetic cue to metrical strength in perception. We will formulate more specific hypotheses for perception in Section 4, once we have information on the L1 and L2 productions of stress and focus.
3 Experiment 1: Production of Stress and Focus
Experiment 1 examines the role of vowel duration, intensity, and F0 in marking stress and focus in L1 Ukrainian and in Ukrainian learners of German. Furthermore, it compares the productions of Ukrainian learners of German with those of a small group of German native speakers. Participants produced trisyllabic words with one of the two metrical patterns (SWW or WSW) as a response in a question-answer pair. The question elicited narrow focus on the target word (focused condition) or on the verb (unfocused condition). SWW and WSW were chosen to avoid effects of pre-boundary lengthening (Shattuck-Hufnagel & Turk, 1998).
If vowel duration is a correlate of lexical stress in Ukrainian, we expect increased vowel duration for the second vowel of the WSW words in both focused and unfocused conditions (compared to the second vowel of SWW words). Following Łukaszewicz and Mołczanow (2018c), we assume that duration constitutes a more reliable indicator of stress than vowel intensity and therefore predict larger differences in duration between stressed and unstressed vowels, and smaller differences for intensity, if any.
In addition, we analyze focus marking and the F0 contours over the course of the utterance. For Ukrainian narrow focus, we expect that the F0 peak is realized in the stressed syllable, following Bokova (2021) and Féry et al. (2007), who argue that H*+L is used in narrow focus (compared to prenuclear rising accents, L+H*, and a default nuclear H+L* accent with an early peak). However, Łukaszewicz and Mołczanow (2018c) did not observe higher F0 values correlating with stressed vowels in focused words. Therefore, it is equally possible that high F0 does not co-occur with sentence-level prominence.
Regarding L2 production, we expect different outcomes for duration, intensity, and F0 in marking stress and focus. Specifically, since duration and intensity are also correlates of stress in German (Jessen et al., 1995; Mooshammer, 2010), we expect little or no differences between Ukrainian learners of L2 German and German native speakers.
In German, narrow focus is typically realized with rising accents (Mücke & Grice, 2014; Roessig et al., 2022; Roessig, 2024). If Ukrainian learners acquire native-like marking of narrow focus in Standard German, we predict a rising tone on the stressed syllable in the focused condition and hence no differences between Ukrainian-accented L2 German and L1 German; in the case of negative transfer (cf. Beinrucker et al., 2016), we expect no differences between Ukrainian and L2 German.
3.1 Methods
3.1.1 Participants
Sixteen native speakers of Ukrainian (12 female, 4 male, mean age = 30.6 years, SD = 12.2 years) participated in the data collection. All participants spoke L2 German at a CEFR B2 level of self-reported proficiency or higher. The participants had normal or corrected-to-normal vision and no language or learning disabilities. All participants were born in Ukraine and originated from different regions of the country. 1 At the moment of testing, all participants had been living in Germany for at least 2 years and resided in Konstanz (Germany).
The monolingual German comparison group consisted of 5 native speakers of Standard German (4 female, 1 male). All speakers were born in Germany and were students of the University of Konstanz at the moment of testing. All L1 German participants were proficient speakers of L2 English (at least CEFR B2).
Participation was voluntary, and participants were reimbursed for their time.
3.1.2 Materials
The experiment employed a blocked design with the following factors: (A) Language (Ukrainian, German), (B) Stress (SWW, WSW), and (C) Focus (unfocused, focused).
A total of 40 trisyllabic word pairs (20 per language) were selected, one member with an SWW and one with a WSW metrical pattern (see Table 1). For the Ukrainian stimuli, the vowel in the second syllable corresponded to one of the graphemes <а> (<я> if the preceding consonant is palatalized), <е>, <і>, <о>, or <у> (representing [a], [e], [i], [ɔ], and [u], respectively), four times each (see Appendix 7.1 for the complete list of stimuli). For German, we chose the same distribution of graphemes in the second syllable; these graphemes typically map to the sounds [a:], [a], [ɛ], [i], [o], and [u]. In both languages, reduction of vowel quality (centralization) is possible in unstressed syllables. The words in the two stress conditions were matched for lexical frequency in both languages: in Ukrainian, SWW words had an average occurrence of 23.3 per million (SD = 48.8) compared to 32.6 (SD = 76.0) for WSW words,
Example Quadruplet of Target Words for the Two Languages and Stress Conditions; Target Vowel Is Located in the Second Syllable.
Each target word was presented in the answer of question-answer-pairs (QAPs) as an object in an SVO (subject-verb-object) sentence in two conditions:
Focused condition (narrow focus on the target word):
(1) (Що ти написала?) – Я написала [‘
    (Ščo ty napysala?) – Ja napysala [‘
    (What you wrote.FEM?) – I wrote.FEM [‘
    ‘(What have you written?) – I have written [“
(2) (Was hast du geschrieben?) – Ich habe [‘
    (What have you written?) – I have [‘
    ‘(What have you written?) – I have written [“
Unfocused condition (narrow focus on the verb):
(3) (Ти написала ‘дерево’?) – Ні, я [прочитала]F ‘
    (Ty napysala ‘derevo’?) – Ni, ja [pročytala]F ‘
    (You wrote.FEM ‘tree’?) – No, I [read.FEM]F ‘
    ‘(Did you write ‘tree’?) – No, I [read]F “
(4) (Hast du ‘Libelle’ gesagt?) – Nein, ich habe ‘
    (Have you ‘dragonfly’ said?) – No, I have ‘
    ‘(Have you said “dragonfly”?) – No, I have [read]F
In Ukrainian, the answer in a QAP contained one of the following verbs: написати (napysaty) ‘to write down’, прочитати (pročytaty) ‘to read’, сказати (skazaty) ‘to say’ in the past tense, equally distributed. In total, 50% of the verbs were used with the masculine, 50% with the feminine gender marking. The German counterparts of these verbs, namely schreiben ‘to write (down)’, lesen ‘to read’, sagen ‘to say’, were used in the past participle.
Each block also included five fillers: imperatives or short statements with a conformist response. Fillers were used to reduce strategic responses and prevent prosodic bias from repeated contexts (cf. Schütze & Sprouse, 2013). Targets and fillers were pseudo-randomized so that no more than three consecutive targets appeared, and no more than two successive targets shared the same metrical pattern or target vowel.
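The ordering constraints described above can be expressed as a validity check on a candidate trial order. The following is a minimal sketch and not the authors' randomization script: the `Trial` type, the function names, and the reading that a filler interrupts a run of targets are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Trial(NamedTuple):
    """Hypothetical trial record; field names are illustrative only."""
    kind: str                      # "target" or "filler"
    stress: Optional[str] = None   # "SWW" or "WSW" (targets only)
    vowel: Optional[str] = None    # vowel of the second syllable (targets only)


def longest_run(values: Sequence[Optional[str]]) -> int:
    """Length of the longest run of identical adjacent non-None values."""
    best = run = 0
    prev = None
    for value in values:
        if value is not None and value == prev:
            run += 1
        else:
            run = 1 if value is not None else 0
        best = max(best, run)
        prev = value
    return best


def is_valid_order(trials: Sequence[Trial]) -> bool:
    """Check the three constraints stated in the text.

    Assumption: fillers carry no stress or vowel value, so a filler
    breaks any run of targets, metrical patterns, or target vowels.
    """
    is_target = [t.kind == "target" for t in trials]
    kinds = ["target" if flag else None for flag in is_target]
    stresses = [t.stress if flag else None for t, flag in zip(trials, is_target)]
    vowels = [t.vowel if flag else None for t, flag in zip(trials, is_target)]
    return (
        longest_run(kinds) <= 3        # at most three consecutive targets
        and longest_run(stresses) <= 2  # at most two targets in a row with the same pattern
        and longest_run(vowels) <= 2    # at most two targets in a row with the same vowel
    )
```

A randomization routine could shuffle the trials of a block repeatedly and keep the first order for which `is_valid_order` returns `True`.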
Stress and focus conditions were manipulated within-subjects; stress was manipulated between-items, focus within-items. To this end, four experimental lists were created, each consisting of four blocks of 20 items. Each block contained 10 QAPs with SWW words and 10 QAPs with WSW words. Within one block, all QAPs were presented in the same focus condition. A short optional break (10 s) was inserted in the middle of each block. Half of the lists started with the focused, the other half with the unfocused condition. All lists started with the Ukrainian stimuli to make participants feel more at ease. In total, each Ukrainian participant recorded 80 utterances (40 per language).
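The block structure and the resulting utterance counts can be checked with a short sketch. It assumes, for illustration only, that the focus order chosen for a list is the same in the Ukrainian and the German part; the function and field names are hypothetical and do not come from the study materials.

```python
from collections import Counter

LANGUAGE_ORDER = ("Ukrainian", "German")  # fixed order, as stated in the text


def build_session(first_focus):
    """Return the four blocks of one list as dictionaries.

    Assumption: the focus order (focused first vs. unfocused first)
    is repeated in both language parts.
    """
    if first_focus not in ("focused", "unfocused"):
        raise ValueError("first_focus must be 'focused' or 'unfocused'")
    second_focus = "unfocused" if first_focus == "focused" else "focused"
    blocks = []
    for language in LANGUAGE_ORDER:
        for focus in (first_focus, second_focus):
            # Each block holds 10 SWW and 10 WSW question-answer pairs.
            blocks.append({"language": language, "focus": focus, "SWW": 10, "WSW": 10})
    return blocks


def utterances_per_language(blocks):
    """Sum the target utterances per language across blocks."""
    totals = Counter()
    for block in blocks:
        totals[block["language"]] += block["SWW"] + block["WSW"]
    return dict(totals)
```

Calling `utterances_per_language(build_session("focused"))` gives 40 utterances for each language, that is, 80 per Ukrainian participant, matching the totals reported above.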
3.2 Procedure
The experiment took place at the PhonLab at the University of Konstanz (Germany). Prior to the experiment, the participants read and signed the Data Protection Regulation Form and a consent form. Then, the participants were randomly assigned an experimental list and were asked to proceed to the sound-attenuated booth.
The materials were prepared as a Keynote presentation that was played as a slideshow on a screen in the sound-attenuated booth. The participants were asked to read the entire QAP from the presentation as naturally as possible. Each slide contained exactly one QAP (typeface: Arial, font style: Regular, font size: 48 pt, black). To be explicit, the stressed vowels were marked in red font. This was done to facilitate reading, which could be especially relevant for the L2 section of the task and for words with lower lexical frequency in both languages, because they require more processing effort than high-frequency vocabulary (Graves et al., 2010). 2
A blank slide was inserted after each QAP. The blocks were separated by a short break (10 s). For the Ukrainian participants, after two blocks in Ukrainian, the same instructions appeared on the screen in German for the German part of the experiment. This order was fixed for all participants.
The participants were recorded in Praat (Boersma, 2019) using AKG headset microphones; the signal was digitized directly (48 kHz, 16 bit). The entire procedure was completed in a single session lasting approximately 15 min.
3.3 Data Treatment
The productions were segmented and manually labeled in Praat on different tiers. Figure 1 shows a sample token from the Ukrainian part of the experiment, along with the annotation. The first tier indicates the target word (stressed syllable capitalized). The second tier represents the critical segments for the analysis of F0: the interval preceding the target word (e.g., Ja pročytala, ‘I read’) and the three syllables of the target word (grouped according to the weight as W or S on Tier 3). A sentence-final interval with the past participle was added in the German part of the experiment (not shown in Figure 1). The vowel was segmented on Tier 4, followed by a label for focus condition. All segmentation was done using the broadband spectrogram and standard segmentation criteria (Turk et al., 2006). Some utterances contained a pause before the target word. In this case, an additional segment labeled with <p> was added.

Annotation of a sample token cerešn’a (‘sweet cherry’) in the focused condition.
Productions with segmental mispronunciations, incorrect placement of lexical stress or sentence accent, or technical errors were excluded from the analysis. Error rates were then computed by participant. The data of three participants were excluded: one speaker who showed the highest error rate (22.5%) in the German part of the task, one speaker who reported Ukrainian to be their L2, and one participant who needed hearing aids. Two participants’ data were incomplete due to a technical error, but 50% of their data could be analyzed. After the exclusion of these three speakers, the overall error rate was 5% (SD = 6%). In the Ukrainian part of the task, there were fewer incorrect responses (M = 2%, SD = 3%) than in the German part (M = 7%, SD = 7%). The monolingual German speakers exhibited error rates of 5% (SD = 3%).
3.4 Analysis and Results
3.4.1 Duration and Intensity
Vowel durations (in ms) and mean intensity (in dB) were automatically extracted. Vowel duration was log-transformed using the log10() function in R because of its right-skewed distribution and analyzed using a linear mixed-effects model implemented in the lme4 package in R (Bates et al., 2015). The initial models included Language (L1 Ukrainian, L2 German), Focus, and Stress as fixed effects, together with their interactions, as well as random effects for Participant and nested random effects for Vowel/Item. Items were specified as nested within Vowel because each item contained a single vowel category, such that vowel was not crossed with items. The dependent variable was log-transformed Duration.
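Because duration is modeled on a log10 scale, differences between conditions correspond to ratios on the original millisecond scale. The sketch below illustrates this back-transformation in Python rather than R; the function names are hypothetical, and any durations passed to it are invented example values, not data from the study.

```python
import math


def log10_durations(durations_ms):
    """Log-transform vowel durations (ms), mirroring a log10() transformation."""
    return [math.log10(d) for d in durations_ms]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def duration_ratio(stressed_ms, unstressed_ms):
    """Back-transform a difference of mean log10 durations into a ratio.

    A difference d on the log10 scale corresponds to a multiplicative
    factor of 10 ** d, i.e. the ratio of the geometric means.
    """
    delta = mean(log10_durations(stressed_ms)) - mean(log10_durations(unstressed_ms))
    return 10 ** delta
```

For instance, with invented values of 120 ms for a stressed and 80 ms for an unstressed vowel, `duration_ratio([120.0], [80.0])` is approximately 1.5, meaning the stressed vowel is about 50% longer.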
Interaction terms and main effects were removed if they were not significant (and did not appear in higher-order interactions), and the models were refitted. Model comparisons were conducted using maximum likelihood (ML) estimation (
The effect of Language was not significant
Type-III ANOVA for Duration Model with Spelled Out Variable Names.
The interaction between Stress × Focus is shown in Figure 2. We see that stressed vowels were produced with longer duration

Predicted values of vowel duration, split by stress and focus condition.
For the comparison of Ukrainian learners of German with German L1 speakers, the L2 German data and L1 German data were combined and the model was refitted with Group as an additional factor. We were especially interested in interactions of Group × Stress, Group × Focus, as well as Group × Stress × Focus. Neither the main effect of Group nor any of the interactions with Group were significant (all
Intensity was analyzed in the same way as duration. The final model for the Ukrainian participants’ productions included the following predictors: Stress, Focus, Language, Stress × Focus, and Language × Focus. Random intercepts were included for Vowel/Item and Participant, and random slopes were included for Stress by Participant. Residuals were roughly normally distributed. Table 3 shows the Type III ANOVA.
Type III ANOVA for Intensity Model with Spelled Out Variable Names.
As shown in Figure 3, stressed vowels were produced with higher intensity than unstressed vowels

Predicted values of vowel intensity, split by stress and focus condition.
Estimated marginal means revealed that stressed vowels had 1.061 times higher intensity than unstressed vowels in Ukrainian (52.4 dB vs. 49.3 dB) and 1.056 times in German (56.4 dB vs. 53.4 dB;
The data of Ukrainian learners were compared with German native speakers as described above (for duration). Group was not significant as a main effect, nor in two- or three-way interactions with Stress and Focus (all
Overall, the data show that duration is an important property of stress in Ukrainian (cf. Brovčenko, 1969; Łukaszewicz & Mołczanow, 2018a, 2018b, 2018c, 2024; Toc’ka, 1969); duration is used equally in L2 German and in L1 German. We also observe that stressed vowels are produced with greater intensity than unstressed vowels. Both duration and intensity undergo an additional increase under narrow focus. Our results do not reveal any significant differences between Ukrainian learners of German and German native speakers in their use of duration and intensity for the production of stress and focus.
3.4.2 F0
F0 can be analyzed categorically, using labels for different pitch accent types, such as H + L* and L + H* in autosegmental-metrical phonology (Ladd, 2008), or phonetically. We decided to start with a phonetic analysis, which is often used for intonation contours in learner varieties (Lee, 2014; Takahashi et al., 2018; Zahner-Ritter et al., 2022). The phonetic analysis allows us to see phonetic differences across conditions more directly, including more subtle differences in pitch accent realization. However, during the annotation of the L2 German productions, we observed considerable individual variation in the prosodic F0 realization of stress and focus in L2 German. A small number of speakers (N = 4, henceforth Type 1) consistently realized the focused target word with a high tone on the word-initial syllable, independent of metrical pattern (i.e., both in SWW and WSW targets). Therefore, the stressed syllable was always produced with a falling pitch accent. Descriptively, this F0 contour resembles the pattern observed for Ukrainian (cf. Figure 4, upper panel). In contrast, the majority of learners (N = 9, henceforth Type 2) produced the stressed syllable predominantly with a rising pitch accent, thus resembling the productions of German native speakers. 3

Averaged and time-aligned F0 contours in Ukrainian (upper panel), L2 German (middle panels), and L1 German (lower panel) per condition across normalized time. The first time interval (0–50) represents the subject and the verb; the latter intervals represent the three syllables of the target word (50–200). The last interval is only present for the German data (last three panels) and shows the past participle verb.
It does not make sense to average F0 values over such categorical differences. 4 Therefore, we split the productions of the Ukrainian learners of German by type. For all data, 50 F0 points were extracted per filled interval on Tier 2 (see Figure 1), using ProsodyPro (Xu, 2013).
This resulted in 200 F0 values per utterance in Ukrainian and 250 in German. Figure 4 presents average F0 contours in L1 Ukrainian, Ukrainian learners of German, split into two types and in L1 German, each grouped by metrical pattern and focus condition. Given word order differences across languages, the target word is utterance-final in Ukrainian and prefinal in German. To ease comparison across languages despite these differences, 50 points without values were added at the end for the Ukrainian data. Hence, in all figures, the three syllables of the target word are time-aligned horizontally; the target word is always located between normalized time 50–200.
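The alignment step described above can be illustrated with a minimal Python sketch. It is not the authors' script (the extraction itself was done with ProsodyPro); the function names and the use of `None` for missing values are assumptions made for illustration only. It pads a 200-point Ukrainian contour to the 250-point German length so that the target word occupies the same normalized-time span in both languages.

```python
from typing import List, Optional

POINTS_PER_INTERVAL = 50  # F0 samples per annotated interval (from the text)
N_INTERVALS_GERMAN = 5    # preverbal part, three target syllables, participle


def pad_to_common_length(
    f0: List[float], n_intervals: int = N_INTERVALS_GERMAN
) -> List[Optional[float]]:
    """Append missing values (None) so that every contour has the same length.

    Hypothetical helper: a 200-point Ukrainian contour receives 50 trailing
    missing values, a 250-point German contour is returned unchanged.
    """
    target_len = POINTS_PER_INTERVAL * n_intervals
    if len(f0) > target_len:
        raise ValueError("contour is longer than the common length")
    return list(f0) + [None] * (target_len - len(f0))


def interval_index(sample_index: int) -> int:
    """Map a sample position to its interval (0 = preverbal part, 1-3 = syllables)."""
    return sample_index // POINTS_PER_INTERVAL


# Example: a flat dummy contour standing in for one Ukrainian utterance.
ukrainian = [0.0] * 200
aligned = pad_to_common_length(ukrainian)
```

With this layout, samples 50 to 199 belong to the three syllables of the target word in both languages, and only the German data have values in the final interval.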
In L1 Ukrainian, a visual inspection of the F0 contours suggests greater F0 compression in unfocused (postfocus) than in focused contexts (Figure 4, upper panel) already at the target word. In the unfocused condition, the pitch accent is on the verb preceding the target word. In the focused condition, the pretonic syllable exhibits a higher F0 than the stressed syllable, which is realized with a falling contour.
The production of the learners classified as Type 1 differs from L1 Ukrainian in the unfocused condition, because the accent needs to be realized on the non-finite verb, which is in sentence-final position in the German data (geschrieben ‘written’ in Figure 4) but before the target word in Ukrainian (napysala). L1 Ukrainian and Type 1 learners cannot be compared phonetically. Phonologically, all Ukrainian speakers produced a falling accent on the verb, but the fall seems more pronounced in Type 1 learners (and in L1 Ukrainian).
In the focused condition, the pitch peak was on the first syllable of the target word, regardless of whether this syllable was stressed (SWW) or unstressed (WSW), resembling a language with head-edge prominence marking. The Type 1 learners resemble L1 Ukrainian in the use of a falling accent to mark focus, while the Type 2 learners resemble the German native speakers, who produced the stressed syllable with a rising pitch accent, L+H*.
Figure 5 plots the German productions of focused words for the two types of Ukrainian learners and the German control group together.

Averaged F0 contours in German per group (Type 1, Type 2, L1 German) across normalized time. Only focused target words are included.
To substantiate these qualitative differences, further statistical analysis was conducted to assess whether the differences between the two learner types, as well as between each learner type and the L1 German speakers, are statistically significant. For this analysis, only focused target words were considered. Since F0 contours do not develop linearly over time, time-insensitive measures such as minimum, mean, or maximum F0 values may obscure relevant prosodic patterns (Chuang et al., 2021). Therefore, generalized additive mixed models (GAMMs; Hastie & Tibshirani, 1986; Wood, 2017) were used to analyze the differences between F0 contours in different conditions. By modeling non-linear dependencies between the outcome variable (F0) and predictors (Stress, Learner’s Type) over time through smooth functions, GAMMs enable direct comparison of contour shapes using a pre-specified number of basis functions.
The analysis was conducted in R using the mgcv package (Wood, 2011, 2017) to fit the models and itsadug (van Rij et al., 2022) to visualize the results. The outcome variable (F0) was approximately normally distributed. As adjacent F0 measurements are not statistically independent, autocorrelation was modeled using an autocorrelation parameter (rho) determined with the acf_resid() function from itsadug. Several models of varying complexity were fitted using ML estimation, which allows model comparison. Model fits were inspected using gam.check(), and the number of basis functions (k) was adjusted where necessary. To account for heavy-tailed residuals, models were re-run with the scaled t-distribution (family = ‘scat’; Chuang et al., 2021, p. 18).
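To make the autocorrelation step more concrete, the following Python sketch computes the lag-1 autocorrelation of a single residual series, which is the quantity commonly used as the starting value for rho in this kind of model. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, and the actual analysis used acf_resid() in R, which additionally respects the boundaries between individual time series.

```python
from typing import Sequence


def lag1_autocorrelation(residuals: Sequence[float]) -> float:
    """Return the lag-1 autocorrelation of one residual series.

    Uses the standard ACF estimator: the sum of products of neighboring
    deviations from the mean, divided by the sum of squared deviations.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    deviations = [r - mean for r in residuals]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("residuals have zero variance")
    numerator = sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    return numerator / denominator


# Example: a steadily rising series is positively autocorrelated.
rho_rising = lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0])
# Example: a strictly alternating series is negatively autocorrelated.
rho_alternating = lag1_autocorrelation([1.0, -1.0, 1.0, -1.0])
```

A clearly positive value, as is typical for densely sampled F0 contours, indicates that neighboring measurements carry overlapping information and that an AR(1) term is warranted.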
In the final model, bam() was used instead of gamm() for computational efficiency with large datasets (Chuang et al., 2021). The final model, summarized in Table A1 in Appendix 7.2, included F0 (in SD) as the dependent variable, Stresstype (an interaction between Stress and Learner Type) as a fixed, parametric effect, a smooth term for Stresstype across Normtime (normalized time) to model the interaction between stress and type over time, factor-smooth random effects for Participant and Item (random intercepts), as well as random smooths for Normtime by Stress for each Participant (random slope). The final model accounted for 58% of the deviance. All smooth terms for Normtime by Stresstype were significant (
To analyze differences in F0 across groups, we plotted group-wise difference curves. Figure 6 presents the predicted pairwise F0 differences in German, comparing the two L2 learner types with each other and with L1 German. Type 1 learners differ significantly from German native speakers for both metrical patterns already early on (higher F0 values for Type 1 learners) and toward the end of the stressed syllable and the post-tonic syllable (lower F0 values for Type 1 learners). In contrast, only a small difference is observed between Type 2 learners and German native speakers in SWW words. These patterns naturally lead to significant differences between the two types of learners: in both conditions, Type 2 learners start with lower F0 and the stressed syllable has higher pitch than in Type 1 learners.

Predicted difference curves (pairwise comparisons between contours) for the GAMM analysis of F0 contours: Top panels show differences between Type 1 learners and L1 German productions (a–b), middle panels between Type 2 learners and L1 German productions (c–d), bottom panels between Type 2 and Type 1 (e–f). Gray shading represents the 95% confidence interval (CI) of the predicted mean difference. Differences are significant when zero is not included in the 95% CI, indicated by red marking.
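The significance criterion used for the difference curves (zero lies outside the 95% CI) can be sketched in a few lines of Python. This is a hypothetical helper written for illustration; in the actual analysis, the difference curves and their intervals were produced with itsadug in R, and the input values below are invented.

```python
from typing import List, Sequence, Tuple


def significant_windows(
    times: Sequence[float],
    differences: Sequence[float],
    ci_half_widths: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (start, end) time windows in which the CI excludes zero."""
    windows: List[Tuple[float, float]] = []
    start = None
    previous_time = None
    for t, diff, half in zip(times, differences, ci_half_widths):
        # The interval [diff - half, diff + half] excludes zero if it lies
        # entirely above or entirely below zero.
        excludes_zero = (diff - half > 0) or (diff + half < 0)
        if excludes_zero and start is None:
            start = t  # a significant stretch begins
        elif not excludes_zero and start is not None:
            windows.append((start, previous_time))  # the stretch has ended
            start = None
        previous_time = t
    if start is not None:
        windows.append((start, previous_time))  # stretch runs to the end
    return windows


# Invented example: five time points with a constant CI half-width of 0.3.
example = significant_windows(
    times=[0, 10, 20, 30, 40],
    differences=[0.1, 0.5, 0.6, 0.1, -0.7],
    ci_half_widths=[0.3, 0.3, 0.3, 0.3, 0.3],
)
```

Each returned window corresponds to one red-marked stretch in a difference plot.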
3.5 Interim Discussion
Experiment 1 investigated how Ukrainian learners of German produce lexical stress and focus in both their L1 (Ukrainian) and in L2 (German), as compared to L1 German. We orthogonally manipulated metrical pattern (SWW vs. WSW) and focus condition (focused vs. unfocused) in read speech, and measured vowel duration, intensity, and F0 as potential acoustic correlates of stress and narrow focus.
Results showed that vowels had longer duration and higher intensity in stressed syllables, even more so under narrow focus, irrespective of language (Ukrainian, L2 German). Duration was clearly a stronger cue to stress than intensity, evidenced by larger differences and a clearer separation of duration values across stress conditions compared to intensity values. Interestingly, Ukrainian learners performed similarly to German native speakers regarding these non-tonal aspects.
This robust role of duration to signal stress supports earlier claims on L1 Ukrainian (Łukaszewicz & Mołczanow, 2018a, 2018b, 2018c, 2024; Brovčenko, 1969; Toc’ka, 1969, cf. also van Heuven, 2019, for cross-linguistic evidence). The differences in intensity were statistically significant but small, with a lot of overlap across stress conditions. This mirrors Łukaszewicz and Mołczanow (2018a), who also only found intensity differences in a subset of their data. Since metrical stress is relative to metrical prominence in neighboring syllables, it may be interesting to investigate the duration and intensity contrast in relation to unstressed syllables.
Regarding F0, Ukrainian speakers were sensitive to the focus structure: In focused contexts, they produced the first syllable of the target word with high pitch, irrespective of whether the first or second syllable was stressed. This led to accented syllables with an (early) F0 fall in their L1 (cf. also Féry et al., 2007). Ukrainian hence appears to be a head-edge prominence language, that is, a language that marks both the edges of prosodic phrases (here the word) and the head of the word (the stressed syllable); cf. Jun (2006). In unfocused (postfocus) position, F0 was compressed and there were no F0 movements. The phonetic analyses will have to be complemented by a phonological analysis to learn more about the phonological system of Ukrainian. For instance, one open question is whether the accent associated with the stressed syllable is more like an H*+L or H+L*, which can be tested by manipulating the number of syllables and distance to prosodic edges (cf. Arvaniti et al., 1998; Mücke et al., 2009; Peters, 2008). Regarding the Ukrainian part of the task, our results are by and large consistent with Łukaszewicz and Mołczanow (2018c).
These intonational realizations differ from German native speakers, who produced a rising L+H* accent in narrow focus conditions. While Ukrainian speakers generally produced falling contours on stressed syllables in focused contexts in their L1, they differed in their realization of German intonation contours. Some learners (Type 1) transferred their L1 head-edge prominence marking to the L2 and produced the initial syllable with high pitch and the stressed syllables with a fall, whereas the majority (Type 2) produced the stressed syllable with a rising contour and were hence more similar to the German control group (there was no indication of high-pitch edge-marking). This categorization of learner types is supported by phonetic analysis of F0 contours over the utterance. Statistically, Type 2 differed only slightly from the L1 German group for SWW targets (small difference in a short time interval) and did not differ at all for WSW targets, but Type 2 clearly differed from Type 1 in the area of the stressed syllable, cf. Figure 6. Ukrainian L1 and Type 1 learners are also characterized by a high contour prior to the target word in focused contexts, which is absent for Type 2 learners and L1 German participants (Normtime 0–50 in Figure 4). Note that a direct phonetic comparison between L1 Ukrainian and Type 1 learners is not possible due to differences in word order across languages.
The difference across learner types in the accent realization of focused constituents is also visible in the unfocused condition, in which the accent is on the non-finite verb at the end of the sentence: Type 1 learners showed an early fall with a large pitch excursion on the verb (similar to L1 Ukrainian), while Type 2 learners had a fall with a smaller F0 range that occurred later in the verb than in Type 1 learners (see Figure 4). Yet Type 2 learners differed from German natives in marking focus on the verb.
At this point, we cannot determine any factors that predict which type a participant belongs to (since neither age, gender, proficiency, nor region was associated with learner type). Potential other factors are musicality (Delogu & Zheng, 2020; Jekiel & Malarski, 2021) or language aptitude (Kormos, 2013; Miyake & Friedman, 2013; Sparks et al., 2011).
In any case, the presence of two types of learners regarding focus realizations raises the question of whether the learners would also differ in their use of F0 in online reference resolution. To address this question, Experiment 2 employs a visual-world eye-tracking paradigm to examine whether an unstressed syllable realized with an F0 peak is temporarily interpreted as stressed by Ukrainian listeners (similar to L1 German speakers) or whether this competitor activation is only observed in the more target-like Type 2 group (if at all).
4 Experiment 2: Online Comprehension
Experiment 2 uses a visual-world eye-tracking paradigm with four printed words displayed on the screen (cf. McQueen & Viebahn, 2007; Tanenhaus et al., 1995), of which one is auditorily presented. In critical trials, the display contained two trisyllabic cohort competitors that differed in stress placement: an SWW competitor (e.g., Libero [ˈli.bə.ʁo] ‘sweeper’) and a WSW target (e.g., Libelle [li.ˈbɛ.lə] ‘dragonfly’), alongside two unrelated distractor words. We manipulated the pitch accent on the target word (either carrying an early-peak accent, H+L*, or a medial-peak accent, L+H*). In earlier work with this paradigm, German native speakers temporarily fixated the SWW competitor when the WSW target was produced with an early-peak accent (Zahner et al., 2019).
Competitor activation in German could be removed by exposure to low-pitched stressed syllables, suggesting that exposure to high-pitched accents affects the use of high F0 as cue to stress (Zahner et al., 2019). Since narrow focus in Ukrainian is typically marked by falling/low accents (see Experiment 1), high-pitched stressed syllables are less frequent in Ukrainian, naturally mimicking the low-pitched training phase in German. If Ukrainian learners of German transfer the L1 pattern to L2 German, we do not expect to see any competitor activation. Conversely, since perceptual abilities often precede production abilities (cf. Casillas, 2016; Nagle, 2018, for recent evidence and further references), it is also possible that the whole learner group shows the competitor activation (in particular in an immersive context, Kang et al., 2012; Wang et al., 2023; but evidence is mixed, cf. Casillas, 2020). A more nuanced prediction arises from previous findings that a native-like phonetic realization of an L2 contrast is correlated with a more native-like interpretation of acoustic cues (Borden et al., 1983; Bradlow et al., 1999; Flege et al., 1997, 1999; Jia et al., 2006). From this perspective, we expect Type 2 learners, who produce narrow focus similar to L1 German speakers (rising accents resulting in a high-pitched stressed syllable), to exhibit competitor activation in words with early-peak accents as well.
4.1 Methods
4.1.1 Participants
Sixteen native speakers of Ukrainian (12 female, 4 male, mean age = 25.2 years, SD = 5.3 years) took part in the experiment. All participants spoke L2 German at least at the CEFR B2 level of proficiency. All had normal or corrected-to-normal vision and hearing and no language or learning disabilities. All participants were born in Ukraine and originated from 10 regions of the country (see Appendix 7.3). All participants had been living in Germany for at least 2 years and resided in Konstanz (Germany) at the moment of testing. They took part voluntarily and were reimbursed. Two further participants were tested, but their data were not analyzed because of hearing impairment (1) and a technical error (1).
Eleven of the participants took part in both experiments. Seven of them were tested in the same session, with Experiment 1 always preceding Experiment 2 5 ; four were tested in a separate session. Eight of these 11 participants were assigned to Type 2 and three participants to Type 1 in Experiment 1; see Table A2 in Appendix 7.3 for a by-type breakdown.
4.1.2 Materials
We used the same materials and procedure as Zahner et al. (2019, Experiment 2, p. 82): Sixty-four trisyllabic cohort pairs, consisting of one member with an SWW metrical pattern and one member with a WSW metrical pattern, were used; they were segmentally identical until at least the first consonant of the second syllable. Of the 64 cohort pairs, 32 were used for ‘cohort trials’ (in which one of the cohort members was named): 16 of these were experimental trials (the WSW word was named), 16 distractor trials (the SWW word was named). A full list of cohort pairs in cohort trials is presented in Appendix 7.4. The other 32 cohort pairs were included as filler trials; they were shown on the screen but not mentioned.
All cohort members were matched for lexical frequency and number of characters. Distractor items were semantically and phonologically unrelated to the cohort members but had comparable lexical frequency and length. The distractors had stress on the first, second, or last syllable.
The target words in cohort trials were recorded with early-peak and medial-peak accents; the distractors were realized half with early-peak and half with medial-peak accents. The words were embedded in the carrier sentence Bitte klicke … an (‘Please click on …’). The other words in the instructions were unaccented, and the utterances ended in low edge tones. The carrier sentence was spliced onto the target word to remove residual prosodic effects (Roessig, 2024).
After splicing, the medial-peak contour (L+H*) was superimposed on a recording originally made with an early-peak contour (H+L*) and vice versa (cf. Zahner et al., 2019, p. 83) to remove residual differences in duration, intensity, or vowel quality (caused by differences in pitch accent type) and hence to isolate the role of F0 as best as possible. An example pair with WSW stress in the two intonation conditions is shown in Figure 7. As can be seen, the early-peak contour has high pitch on the first (pretonic) syllable and a low-toned stressed syllable (a bit more similar to the L1 Ukrainian focus realizations in Experiment 1), the medial-peak contour has low pitch on the first syllable and a rising pitch on the stressed syllable (more similar to the L1 German focus realizations).

Example production of a word with WSW stress in the two intonation conditions: early-peak contour, H+L* (a); medial-peak contour, L+H* (b). The starred tone is aligned with the stressed syllable of the target word in both cases.
4.1.3 Procedure
The same eight experimental lists as in Zahner et al. (2019) were used. They contained the 32 cohort trials (16 experimental, 16 distractor trials) and 32 filler trials in different, pseudo-randomized orders. The written words were positioned in the outer third of the four quadrants of the screen to prevent peripheral viewing. The words were enclosed within a rectangular box (6.5 cm × 4 cm). The four kinds of words appeared in different locations on the screen but equally often in each of the four locations on the screen for each intonation and stress condition.
The participants were tested individually at the PhonLab at the University of Konstanz. First, all participants read and signed the Data Protection Regulation Form and a consent form. Then, they received written instructions about the procedure of the experiment. The participants’ dominant eye was determined and calibrated using the SR Eyelink 1000 Plus in a desktop mount system at a sampling rate of 500 Hz. The participants were positioned approximately 70 cm away from an LCD screen (37.5 × 30 cm). The calibration followed an automatic procedure (pupil and corneal reflection, Eyelink default settings).
After successful calibration, the experiment started upon a mouse click. Every trial was initiated by a black dot in the center of the screen on a white background. After participants clicked on the screen, four words (typeface: Times New Roman, font style: Regular, font size: 20 pt, hex color: #000000) appeared.
Each trial started with a centered fixation cross; 500 ms later, the four words appeared on the screen for 2,000 ms. Then, the auditory instruction started. The target words started on average 575 ms after the onset of the instruction, giving participants a preview time of 2,575 ms. This preview time is appropriate to access phonetic information (Huettig & McQueen, 2007).
Participants were explicitly instructed to click on the word from the auditory instruction as quickly as possible. The auditory stimuli were presented via headphones (Beyerdynamic DT-990 Pro, 250 Ω) at a fixed, comfortable loudness. A drift correction followed every fifth trial. An optional break was inserted after half of the trials.
After responding to all trials, the participants were given the list of all members of cohort pairs used in the cohort trials. The items were presented in a random order (e.g., Furie, Libelle, Radieschen . . . ‘fury, dragonfly, radish . . .’) without the pre-context. The participants were asked to read these words aloud, and the reading was recorded. This was done to exclude the trials where participants stressed the wrong syllable (cf. Tremblay, 2008, who showed that lacking knowledge of stress position impairs stress processing). Finally, the participants filled out a questionnaire asking for their age, region of origin, language and academic background, etc. The entire procedure was completed in a single session with an approximate duration of 30 min.
4.1.4 Analysis and Results
We first analyzed the behavioral data (accuracy and latency of clicks) and the accuracy of stress production in the recordings of the items to determine whether the stress patterns of the experimental items were known. The behavioral data showed a high accuracy for clicks (98.9% for the early-peak condition, 98.2% for the medial-peak condition). The latency was similar across intonation conditions: 2,094 ms after target onset for the early-peak condition (SD = 632 ms) and 2,091 ms for the medial-peak condition (SD = 556 ms).
The analysis of post hoc recordings revealed wrong stress placement in 25.2% of the words. Yet, for the analysis of fixation data, productions were only marked as correct if stress was adequately allocated in both members of a cohort pair (the WSW target and the SWW competitor) because the validity of this experiment relies on maintaining the relationship between the target and competitor, which would be compromised if participants were unfamiliar with the metrical pattern of SWW distractors. Due to such conservative exclusion criteria, only 56.9% of the data were included in the analysis; see by-participant error rates in Appendix 7.3. To test the hypothesis of increased competitor fixation with early-peak accents, only experimental targets (WSW words) were analyzed, cf. Zahner et al. (2019).
Fixations were extracted in 20 ms-bins and automatically categorized as being directed to the target (WSW, Libelle), the cohort competitor (SWW, Libero), or to unrelated distractors, provided they occurred within a 200 × 200-pixel square surrounding the respective word. The VWPre package was used in R (Porretta et al., 2020) to prepare the data for plotting and statistical analysis with GAMMs.
The fixations to the four words on the screen, averaged across all speakers and correctly stressed items, are presented in Figure 8, split by intonation condition (early-peak accents in top panel, medial-peak accents in bottom panel). As expected, fixations to the distractors decreased as soon as disambiguating segmental information of the auditory target started becoming available. From 800 ms onwards, the time when segmental information on the target is available, there were initially more fixations to the SWW competitor, but equally so in both conditions, see left panel of Figure 9 for a direct comparison of competitor fixations across intonation conditions. These data show that the Ukrainian learners as a group did not activate the stress competitors more in the early-peak than the medial-peak condition, thus differing from German natives.

Evolution of fixations to target (WSW, e.g., Libelle ‘dragonfly’, dark blue line), competitor (SWW, e.g., Libero ‘sweeper’, red line), and the two distractors (SWW Thymian ‘thyme’ [top left] and WSW Safari ‘safari’ [bottom right], light blue lines) in experimental trials in two intonation conditions (early-peak condition [upper panel], medial-peak condition [lower panel]) across all participants. Acoustical landmarks (gray shaded vertical lines) are shifted by 200 ms to account for saccade planning time.

Competitor fixations across intonation conditions (early-peak, red vs. medial-peak condition, orange). Acoustical landmarks (gray dashed vertical lines) are shifted by 200 ms.
We next turn to the subset of Type 2 participants, who are similar to German native speakers in the production of F0 in stressed syllables. The fixations to the four words on screen, split by intonation condition, are shown in Figure 10. Figure 10 illustrates that in the early-peak condition, fixations to the competitor strongly increase during the segmentally ambiguous phase (prior to the segmental U.P. [the uniqueness point in the signal, at which the competitor and the target no longer overlap segmentally, not considering suprasegmental information]). By contrast, fixations to the competitor did not differ from fixations to the target in the medial-peak condition (similar to the pattern reported for German); see left panel of Figure 11 for a direct comparison of competitor fixations. These data show that successful production of stress and focus in L2 leads to similar processing of F0 as a stress cue in online word recognition. Interestingly, fixations to the target word also rise more quickly in the early-peak compared to the medial-peak condition, a pattern that was not predicted (see left panel of Figure 12 for a direct comparison). We return to this seemingly paradoxical pattern of more competitor and more target fixations in the early-peak condition after the statistical analysis of competitor fixations.

Evolution of fixations to target (WSW, e.g., Libelle ‘dragonfly’, dark blue line), competitor (SWW, e.g., Libero ‘sweeper’, red line) and the two distractors (SWW Thymian ‘thyme’ [top left] and WSW Safari ‘safari’ [bottom right], light blue lines) in experimental trials in two intonation conditions (early-peak condition [upper panel], medial-peak condition [lower panel]) across Type 2 participants. Acoustical landmarks (gray shaded vertical lines) are shifted by 200 ms.

Left panel: Competitor fixations across intonation conditions (early-peak, red vs. medial-peak condition, orange) for Type 2 group. Acoustical landmarks (gray dashed vertical lines) are shifted by 200 ms. Right panel: Difference curve in competitor fixations in early-peak condition versus medial-peak condition across participants of Type 2. Differences are significant when zero is not included in the 95% CI, indicated by red marking.

Left panel: Target fixations across intonation (early-peak, red vs. medial-peak condition, orange) for Type 2 group. Acoustical landmarks (gray dashed vertical lines) are shifted by 200 ms. Right panel: Difference curve in target fixations in early-peak condition versus medial-peak condition for Type 2 group. Differences are significant when zero is not included in the 95% CI, indicated by red marking.
The statistical analysis closely followed Zahner et al. (2019, pp. 85–86). Competitor fixations were converted to empirical logits (elogs), that is, fixations to the competitor versus fixations to the three other words (the target and the two distractors), and used as the outcome variable in the GAMMs. As predictors, the model included a parametric coefficient for Intonation condition, and a smooth term for Intonation condition over time.
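For readers unfamiliar with empirical logits, the transformation can be sketched in Python as below. The sketch assumes the commonly used definition with a constant of 0.5 added to both counts; the text does not state the exact constant, so this value and the function name are assumptions rather than a description of the authors' code.

```python
import math


def empirical_logit(competitor_count: int, other_count: int) -> float:
    """Return the empirical logit of competitor versus other fixations.

    Assumed definition: log((competitor + 0.5) / (other + 0.5)).
    The added 0.5 keeps the value finite when one of the counts is zero.
    """
    if competitor_count < 0 or other_count < 0:
        raise ValueError("counts must be non-negative")
    return math.log((competitor_count + 0.5) / (other_count + 0.5))


# Equal counts give an empirical logit of zero.
balanced = empirical_logit(5, 5)
# A bin in which every sample is on the competitor still yields a finite value.
all_on_competitor = empirical_logit(10, 0)
```

Positive values indicate more fixations to the competitor than to the other three words combined, and negative values the reverse.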
The time window spanned the acoustic onset of the target word (shifted by 200 ms to account for planning time of saccades) until the segmental U.P. (shifted by 200 ms, i.e., from 800 ms until 1,100 ms). In addition, a random intercept was included for Event (combining Subject and Item as unique identifiers). To account for autocorrelation, an AR-1 correlation parameter was estimated (value: 0.69) with the acf_resid() function from the itsadug package. To choose the best model fit, a backward stepwise elimination procedure was applied, such that only smooth terms that reached significance were included. The function compareML() was used to determine whether the inclusion of a smooth term improved ML scores, and terms were retained if they led to a better fit.
The final model included Intonation condition as a parametric coefficient and as a smooth term, and random intercepts for the event variable. The model explained 63.3% of the deviance. There were more fixations to the competitor in the early-peak condition compared to the medial-peak condition, with an estimated slope of 0.45 logits, and this difference was time-independent. Furthermore, the smooth term of Intonation condition over Normtime was also significant. The right panel of Figure 11 shows the predicted differences of competitor fixations over time; the differences between the two intonation conditions reached significance between 852 and 1,106 ms (39% of the analysis window). This interval of significant differences is slightly longer than that of the German participants in Zahner et al. (2019), but otherwise remarkably similar (for German participants, it was between 868 ms and 1,001 ms).
We now turn to the unexpected fact that the early-peak condition not only increases competitor fixations (see Figures 10 and 11) but also target fixations (see Figure 12). The results of the GAMM analysis (modeled in the same way as for competitors, but with a larger analysis window, from the onset of the target word, 800 ms, to its offset, 1,350 ms, cf. Table 4) showed significantly more target fixations in the early-peak contour too (right panel of Figure 12). The time course of significant difference was later than for the competitor activation (1,200–1,400 ms); see right panel of Figure 12 for difference curve of the GAMM analysis.
Final Generalized Additive Mixed Model (GAMM) Summary for
Note. The model includes a Gaussian family with an identity link function. Part 1: Estimate, Standard Error, t- and p-values for the parametric coefficients. Part 2: Estimated degrees of freedom (EDF), reference degrees of freedom (Ref.df), F- and p-values for the smooth terms. Part 3: Model specification (original R formula).
It is puzzling that target words in the medial-peak contour were processed so inefficiently by the Type 2 group, in particular since speakers of this group predominantly produced medial-peak accents (L+H*) in Experiment 1. Why would they have difficulties in lexical activation when words start with a low syllable if they produce such words themselves? To understand this better, we took a second look at the entire learner group and the effect of intonation contour on target fixations (Figure 8). Statistical analysis revealed that the entire learner group showed more target fixations in the early-peak condition compared to the medial-peak condition (see Figure A1 in Appendix 7.5 for details).
4.2 Interim Discussion
Experiment 2 investigated whether Ukrainian learners of German interpret high-pitched syllables as stressed in an online reference resolution task, similar to German natives. The answer is clearly no: The Ukrainian learners as a whole did not fixate the SWW competitor words more when the first syllable was high-pitched. This suggests that a high F0 on an unstressed syllable does not trigger the perception of stress for Ukrainian learners of German. These data are in line with the prediction that the frequent occurrence of falling/low pitch accents in Ukrainian L1 narrow focus productions transfers to cue use in perception. Unexpectedly, the word with word-initial stress (SWW word) received more fixations than the WSW word in both intonation conditions (see red lines in Figure 8). It seems plausible that the cohort word starting with the more prominent, stressed syllable (longer duration, higher intensity) attracted more attention.
The larger group of learners classified as Type 2 in Experiment 1, whose productions closely resembled those of German controls, showed increased fixations to SWW competitors during the segmentally ambiguous phase when the target word carried an early-peak accent compared to a medial-peak accent. The time course and the magnitude of this competitor activation were comparable to those of the German speakers previously tested (Zahner et al., 2019). These differences in the use of F0 for lexical activation depending on learner type underscore individual differences in phonetic realization and processing and, in light of our study, suggest a strong link between L2 production and L2 perception of stress. Given that Type 2 learners produced the target words in a more native-like way, one may consider them to be more proficient in the L2 (although this is not captured by the proficiency measures collected, self-reported CEFR level or length of residence). Interestingly, the bias toward the word starting with a stressed syllable, present in the entire learner group, is no longer visible in Type 2 learners (see Figure 10).
Note that listeners heard spliced and resynthesized materials. The splicing ruled out any influence of prosodic differences prior to the target word, and the resynthesis removed potential residual effects of high pitch on duration and intensity (i.e., isolating the effect of the F0 cue). The stimuli sounded very natural and the manipulations were not noticeable. In any case, the manipulation would affect both intonation contours equally, so even if there were residual artifacts, they would not interfere with the experimental manipulation. Given the importance of the duration cue, future studies may pit duration and F0 cues against each other to investigate whether the findings replicate. Furthermore, it may be worthwhile to test cue weighting in isolated words (Tremblay et al., 2021) to remove potential difficulties with segmentation.
Exploratory analyses further showed that Ukrainian learners of German quickly recovered from the increased competitor activation and swiftly shifted their attention to the actual (WSW) target in the early-peak condition. In the medial-peak condition, however, fixations to the four words rose very slowly. One tentative hypothesis is that the L1 Ukrainian focus marking (with high pitch on the word-initial syllable) interfered with processing in the medial-peak condition, in which the target word started with low pitch (L+H*). This was not predicted and is hence a surprising side effect. We see two possible explanations for this intonation effect on word recognition.
First, it is possible that Type 2 listeners (and in fact the entire learner group) needed a high-pitched word onset (high pitch as a necessary cue to word onsets). These perceptual data may indicate that Ukrainian is a head-edge prominence system (marking both the heads [stressed syllable] and the edges [word onset], cf. Jun [2006]). The experiments were not set up to test the typological status of Ukrainian, so we have to leave the evaluation of this proposal to future research. Second, high or rising pitch on the initial syllable of the target word may have attracted more attention (cf. Lialiou et al., 2024, for recent evidence), thereby boosting lexical activation of words starting with the respective sequence of segments. In that sense, it may facilitate segmentation (note, however, that the effect of high pitch on the first syllable on target fixations surfaced only late in the analysis window, which speaks against an immediate effect). If this attention explanation were correct, we would expect a similar pattern for the entire group as well, which is the case. The first explanation rests on prosodic transfer from the L1; the second explanation should hold true for other L2 learners of German as well.
It is unfortunate that only a subset of the items could be analyzed because knowledge of the correct stress patterns for both members of a target word pair was low (on average, 58%). The low knowledge is not surprising, given the presence of a large number of words of non-Germanic origin (so that there was a full vowel quality in all syllables, including unstressed syllables). Our exclusion criterion was very strict, as we excluded not only the misstressed word but also its cohort word. In our view, this is the only viable way to study stress processing in the L2, and this criterion has also been applied in other L2 studies (Connell et al., 2018). The exclusion criterion could only be applied post hoc, after data collection. Correct stress placement in German is expected to be higher for words of Germanic origin, which often have the central vowel schwa in unstressed syllables.
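The pairwise exclusion criterion described above can be sketched as a simple filter: a cohort pair is kept for a participant only if that participant stressed both members correctly. The Python sketch below is a minimal illustration under assumed data structures; the function name, the item labels, and the dictionary of stress knowledge are hypothetical and not taken from the study.

```python
def usable_pairs(cohort_pairs, knows_stress):
    """Keep a cohort pair only if the stress of BOTH members is known.

    cohort_pairs: list of (wsw_word, sww_word) tuples (hypothetical labels).
    knows_stress: dict mapping a word to True if the participant stressed
                  it correctly; missing words count as unknown.
    """
    return [
        (wsw, sww)
        for wsw, sww in cohort_pairs
        if knows_stress.get(wsw, False) and knows_stress.get(sww, False)
    ]


# Hypothetical items for one participant.
pairs = [("WSW_1", "SWW_1"), ("WSW_2", "SWW_2"), ("WSW_3", "SWW_3")]
knowledge = {
    "WSW_1": True, "SWW_1": True,   # both known: pair is kept
    "WSW_2": True, "SWW_2": False,  # cohort word misstressed: pair is dropped
    "WSW_3": False,                 # target misstressed, cohort unknown: dropped
}
kept = usable_pairs(pairs, knowledge)
```

Because one misstressed word removes the whole pair, the proportion of usable pairs can be clearly lower than the proportion of correctly stressed individual words.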
5 General Discussion
The present study investigated how Ukrainian learners of German produced lexical stress in focused and unfocused words (in terms of duration, intensity and F0, RQ1), to what extent they differed from L1 German speakers (RQ2) and whether peak-stress alignment in L1 (and L2) productions influenced sensitivity to high pitch as a stress cue during online word recognition (RQ3).
We first discuss RQ1 and RQ2 in tandem, but separately for the non-tonal features duration and intensity, and for F0, and then turn to RQ3.
5.1 Prosodic Marking of Stress in Production
Regarding the use of duration and intensity to signal stress, there are no striking differences between Ukrainian L1 speakers, Ukrainian learners of German, and German controls. Stressed syllables are longer than unstressed syllables, slightly more so in narrow focus (factor 1.9 vs. factor 1.8). Also, stressed syllables are produced with higher intensity than unstressed syllables, a difference that was again slightly larger in narrow focus (factor 1.1 vs. factor 1.06). Duration and intensity were measured on the second vowel in trisyllabic words. Since vowel quality can affect these measures (open vowels are longer and have higher intensity), we sampled words so that vowel quality was matched across stress conditions and balanced across items, with four words for each vowel quality. The data were recorded using head-mounted microphones to keep the distance to the microphone constant and avoid confounds in intensity. The similarities across speaker groups and languages were predicted, given that these cues have been reported as stress cues in both languages.
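A "factor" of this kind can be read as a stressed-to-unstressed ratio. The following Python sketch shows one simple way to compute such a ratio from raw vowel durations, namely as a ratio of means. Whether the reported factors were derived this way or from model estimates is not stated here, so the function name, the method, and the example durations are all assumptions for illustration.

```python
def stress_ratio(stressed_values, unstressed_values):
    """Ratio of the mean stressed value to the mean unstressed value."""
    mean_stressed = sum(stressed_values) / len(stressed_values)
    mean_unstressed = sum(unstressed_values) / len(unstressed_values)
    return mean_stressed / mean_unstressed


# Hypothetical vowel durations in ms for the second vowel of trisyllabic words.
focused = stress_ratio([110, 118], [58, 62])     # 114 / 60 = 1.9
unfocused = stress_ratio([104, 112], [58, 62])   # 108 / 60 = 1.8
```

With these made-up values, the stressed vowel is 1.9 times as long as the unstressed vowel in narrow focus and 1.8 times as long when unfocused, mirroring the magnitude of the reported factors.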
Regarding the F0 contours, target words carried a pitch accent in narrow focus and were realized without an accent when they were unfocused. The realization of the pitch accents in narrow focus showed interesting differences across groups. We start with the two native speaker conditions. In Ukrainian L1, speakers produced high pitch on the word-initial syllable in both stress conditions, with a falling/low accent. This pattern is in line with Łukaszewicz and Mołczanow (2018c). The high pitch on the word-initial syllable may indicate that Ukrainian prosodically marks the word edge, in addition to the head (stressed syllable). In that regard, Ukrainian may be a head-edge prominence language (Jun, 2006), unlike German, which is classified as a head-prominence language. Further data with more varied sentence materials are needed to determine the typological status of Ukrainian.
The German L1 speakers produced the stressed syllable with a high-rising pitch accent (L+H*, medial-peak accent). This is an accent typically reported for narrow focus realizations (Roessig, 2024). The interesting question is whether the Ukrainian learners of German would be more influenced by their L1 prosodic system or whether they would have acquired the L2 German marking of narrow focus. During the labeling process, two categorically different subgroups of learners emerged. One group was similar to L1 Ukrainian (Type 1) and produced a high-pitched first syllable and a falling/low accent aligned with the stressed syllable; another group was similar to L1 German speakers (Type 2) and produced a rising pitch accent on the stressed syllable in the majority of productions. This division was initially based on auditory classification, but was supported by a phonetic analysis of the F0 contours: There were hardly any phonetic differences between the F0 contours of Type 2 learners and German natives, but clear phonetic differences between Type 1 and Type 2 learners in the stressed (and pretonic and post-tonic) syllables. The phonetic analysis thus lends quantitative support to the perceptual classification. The presence of subgroups suggests that L2 learners differ in the extent to which they restructure prosodic representations when acquiring a new language and also in the extent to which their L1 influences the prosodic realizations in a foreign language. It should be noted that all Ukrainian participants had resided in Germany for at least 2 years and are therefore expected to have received a lot of input in the L2. As Mennen (2015) has argued, the frequency of prosodic events is a predictor for the acquisition of intonation. Unfortunately, with the metadata collected, we are not able to reliably predict group membership, partly because of the small sample size.
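The idea that Type 2 learners produced a rising accent "in the majority of productions" can be made concrete with a small sketch. The Python code below computes, per speaker, the proportion of productions labeled L+H* and assigns a learner type using a 0.5 cut-off. It only illustrates the majority criterion: the actual division was based on auditory classification, and the function names, the cut-off, and the example labels are assumptions.

```python
def rising_proportion(accent_labels):
    """Proportion of a speaker's productions labeled as rising (L+H*)."""
    rising = sum(1 for label in accent_labels if label == "L+H*")
    return rising / len(accent_labels)


def learner_type(accent_labels, threshold=0.5):
    """Assign 'Type 2' if rising accents form the majority, else 'Type 1'.

    The 0.5 threshold is an assumed operationalization of 'majority'.
    """
    if rising_proportion(accent_labels) > threshold:
        return "Type 2"
    return "Type 1"


# Hypothetical accent labels for two speakers.
speaker_a = ["L+H*", "L+H*", "H+L*", "L+H*"]
speaker_b = ["H+L*", "H+L*", "L+H*", "H+L*"]
types = {"A": learner_type(speaker_a), "B": learner_type(speaker_b)}
```

The per-speaker proportion could also be kept as a continuous score instead of being reduced to two categories.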
In the unfocused condition, there is deaccentuation of the target word in all three conditions (L1 Ukrainian, L2 German, L1 German), and the accent is realized on the verb. The verb was realized with an early fall in L1 Ukrainian and by Type 1 learners (strengthening the interpretation of L1 transfer). Unlike in narrow focus on the target word, the Type 2 learners also produced a falling accent here (which started later than for Type 1 learners) rather than a rising accent (as the German natives did). It appears that the sentence-final position is special, in that it seems to demand an early-falling accent (cf. also F0 contours in Féry et al., 2007).
The production data stem from a reading task, and it is possible that this artificial production task prompted a particular intonation contour. Obviously, it is important to include more natural data (Wagner et al., 2015) to test whether the presence of two types of prosodic marking in the L2 generalizes across speech styles. On the other hand, the findings are largely in line with earlier findings on Ukrainian (Łukaszewicz & Mołczanow, 2018a, 2018b, 2018c, 2024; Brovčenko, 1969; Toc’ka, 1969) and German (Mücke & Grice, 2014), lending validity to the analyses.
5.2 Processing of Acoustic Cues to Stress
The processing of acoustic cues to stress was tested using a visual-world eye-tracking paradigm that manipulated pitch accent realization and traced fixations to written words on screen. The experiment had been set up for German to test the relative role of F0 as a stress cue (the alignment of the pitch peak can differ as a factor of pitch accent type). The German participants were shown to temporarily activate (i.e., fixate) a word with initial stress (SWW competitor) when the target word with penultimate stress (WSW word) was produced with an early-peak accent (H+L*) compared to a medial-peak accent (Zahner et al., 2019). This competitor activation was reduced when listeners were familiarized with utterances that only contained early-peak and late-peak accents (in which the pitch peak was not aligned with the stressed syllable). The early-peak accent is exactly the contour that the Ukrainian speakers produced in narrow focus contexts in their L1, leading to the prediction that Ukrainian learners of German would not show increased competitor activation in the early-peak condition. This is exactly what we found. The Ukrainian learners of German were as a group ‘immune’ to the intonationally induced competitor activation. However, there is clear evidence that the German-like use of prosodic stress cues in production influences cue use in L2 perception just like for German natives. More specifically, those Ukrainian learners of German who produced focus with a high/rising pitch accent on the stressed syllable (Type 2) interpreted high pitch as a cue to stress in online reference resolution, fixating the stress competitor more in early-peak accents. They immediately interpreted high pitch as a cue to stress, which suggests similar processes as in German listeners.
These results point to a tight coupling between production and perception processes in L2 prosody: Learners who produce target-like intonation contours (here L+H* in focused words, with the pitch peak aligned with the stressed syllable) are also those who interpret high pitch as a cue to stress. This native-like processing leads to a temporary garden path in this perception experiment, though. Note that the perceptual data indirectly validate the conclusions drawn from the reading task: if the production data were flawed by the reading setting, then the division into the two learner types would be an artifact, too, and we would not see differences in online processing. Since this is not the case, it is reasonable to assume that the early-peak contours of Type 1 and the medial-peak contours of Type 2 learners are representative of everyday L2 language use in these speakers (cf. also Xu, 2010, for the argument that lab speech relies on similar processes as more spontaneous speech).
Surprisingly, the eye-tracking data unearthed yet another effect of intonation contour on processing, which was not observed in German natives: Words that do not start with a high pitch (WSW words with L+H* medial-peak accents) are more difficult for Ukrainian learners to process, including the learner group that produced these contours (Type 2). We see two possible explanations for this pattern. First, the data are compatible with a classification of Ukrainian as a head-edge prominence language, rendering high F0 a necessary cue for word onsets. Second, the high pitch on the first syllable attracts listeners’ attention (Lialiou et al., 2024), leading to a more efficient processing of the respective segments. Future research with learners from different L1s is necessary to decide between these hypotheses.
The identification of learner subgroups underscores the importance of considering individual variability in prosodic acquisition research, as averaging across all learners can obscure meaningful differences in processing strategies. It may also be fruitful not to form subgroups at all, but to include some continuous predictor. It is likely that the split into two learner groups rests on phonetic-phonological proficiency (as the Type 2 learners produced F0 more similar to the German natives than the Type 1 learners). An influence of proficiency has been shown in other studies on word recognition (Tremblay, 2008; Tremblay et al., 2021). Note, however, that the self-rated CEFR only provided a poor estimate of learners’ abilities. Very likely, the value of CEFR certificates diminishes over time and in an immersion setting. Ideally, a phonetic-phonological proficiency task would be more explanatory (cf. Braun & Tagliapietra, 2011, for a task-based measure of proficiency). Such a phonetic-phonological proficiency task may also capture language aptitude.
To sum up, this research has shown that Ukrainian and German differ in the type of pitch accents used to mark narrow focus (falling/low in Ukrainian, rising/high in German) and that most of the Ukrainian learners of German residing in Germany acquired the German focus marking. In perception, these participants temporarily interpreted high-pitched syllables as stressed, similar to native speakers. This similarity in online processing speaks to the high flexibility of L2 spoken word recognition (Broersma, 2025). In a smaller group of Ukrainian learners of German, the L1 focus marking negatively transferred to L2 processing of F0. We also find a general advantage in the processing of target words when the word started with high pitch (as in L1 Ukrainian). We tentatively conclude that this suggests that Ukrainian is a head-edge prominence language.
There are a number of directions for future research: First, as indicated before, more spontaneous speech tasks with more varied materials would help to determine whether the preliminary classification of Ukrainian as a head-edge prominence language is warranted. Furthermore, this will allow us to test whether the observed interaction between stress and focus in L1 Ukrainian and in L2 German generalizes beyond the controlled reading setting. We think it does (given the clear link between perception and production of stress cues), but this awaits empirical justification. One may also want to include other prosodic measures, such as voice-quality spectral slope to attain a more complete picture (Roessig et al., 2022). Second, increasing the sample size would enable us to recruit a larger cohort of Type 1 and Type 2 participants, so that we can better determine which factors predict group membership. Apart from the factors we included, other aspects may be musical training (Delogu & Zheng, 2020; Jekiel & Malarski, 2021), linguistic aptitude (Kormos, 2013; Miyake & Friedman, 2013; Sparks et al., 2011), attitudes toward the L1 and L2 (Rindal, 2010; Schmidt, 2020), or order of acquisition and use of other languages or dialects (Cenoz et al., 2001; Wrembel, 2010). The L2 status is particularly important for the current study since Ukrainian participants are almost never monolingual, and there are dialectal differences. We tried to sample equally from different regions and from Kyiv, where the standard variety prevails. A larger sample would also allow us to include continuous predictors. Third, longitudinal designs could trace whether (and under which circumstances) Type 1 learners gradually develop sensitivity to F0 alignment as their exposure to German increases and whether Type 2 learners learn to process target words that start with low-pitched syllables (i.e., medial-peak accents) as efficiently as early-peak accents in the L2. 
A further open question is whether medial-peak accents, which have been reported in Ukrainian (Féry et al., 2007), are also processed less efficiently in the native language.
6 Conclusion
This study showed that stress in Ukrainian is marked primarily by duration and less by intensity. Narrow focus slightly increased the differences between stressed and unstressed syllables compared to unfocused contexts. In the focused trisyllabic words we used, F0 was high on the first syllable, with an early fall on the stressed syllable. In Ukrainian learners of German, duration and intensity were produced similarly to German native speakers, but for F0, there were two types of learners. A small group of the Ukrainian learners of German was influenced more by the L1 system (a high pitch on the first syllable and a fall on the stressed syllable), the larger group by the L2 German system (a rising pitch accent on the stressed syllable).
In online word recognition, the Ukrainian learners as a group did not use high pitch as a cue to stress. The test we used was a lexical garden path that led to the temporary activation of a stress competitor. Learners who approximated German-like intonation patterns in production also demonstrated German-like processing of high-pitched syllables as stressed (and also experienced the temporary garden path). Unlike for German participants, however, recovery from the garden path was much quicker, probably owing to the frequent occurrence of high-pitched word-initial syllables in L1 Ukrainian.
The data show individual differences for stress and focus marking in L2 speech, but also that some learners can produce stress and focus in a similar way to native German speakers. More native-like production of stress and focus in the L2 is mirrored by more native-like processing of stress, supporting the production-perception link in L2. Our findings corroborate that cross-linguistic influence is sensitive to the frequency of intonation patterns in the two languages (cf. the suggestion in Mennen, 2015) but that differences in prosodic systems across languages substantially influence word recognition in L2.
7 Appendix
Acknowledgements
The data were collected at the core facility LingLab at the University of Konstanz. We thank Pascal Tschabrun and Linda Peiler for help with testing, Michaela Svatošová for discussion on ratios, and Sarah Warchhold for discussion regarding the time course of the eye-tracking data. We further thank the audience of the Workshop on Slavic Prosody (Saarbrücken 2024), the DGfS workshop on Multifaceted and multifactorial approaches to developing phonological systems (Mainz 2025), and the workshop on Prosodic Structures and Functions in Slavic Languages: Cross-Linguistic Perspectives and Methodological Advances (SLS 20, Verona 2025) for discussion.
Author Contributions
KM: Design of the production study, selection of materials, recording and labeling of data, statistical analysis, and writing of paper.
BB: Design of the eye-tracking study, statistical analysis, and writing of paper.
Ethical Considerations
The study was approved by the Institutional Review Board at the University of Konstanz (IRB 05/2021). Participants gave written consent.
Consent to Participate
Participants gave written consent and participated voluntarily.
Consent for Publication
Participants gave informed consent to publish the results in an aggregated anonymous form. No identifying information is published.
