A Reanalysis of the Voicing Effect in English: With Implications for Featural Specification

Abstract

The voicing effect is among the most studied and robust of phonetic phenomena. Yet there remains a lack of consensus on why vowels preceding voiced obstruents should be longer than vowels preceding voiceless obstruents. In this paper we analyze the voicing effect in a corpus of natural speech and in production data from a metronome-timed word repetition study. From this evidence, as well as the existing literature, we conclude that vowel duration differences follow from consonant duration differences. The characteristic voicing effect in English is largely limited to words of especially long duration, and preceding vowel duration does not reliably cue obstruent voicing under the following circumstances: when obstruent voicing or duration cues conflict; for lax or unstressed vowels; and for most conversational speech. We show that this behavior can be modeled using a competing-constraints framework, where all segments resist expanding or compressing past a preferred duration. Inherent segment elasticity determines the degree of resistance, but segment duration is ultimately determined by the interaction of these segmental constraints with constraints on the distribution of the lengthening force within the syllable, and on how closely target durations are matched. This account of the voicing effect has a number of implications for phonological theory, especially the central role that the concept of prominence plays in the analysis of underlying features.

Keywords

Voicing effect, vowel lengthening, final lengthening, temporal compensation, enhancement, Articulatory Phonology

1 Introduction

The terms “vowel lengthening” and “voicing effect” are used to refer to the empirical finding that vowels preceding voiced obstruents are longer than those preceding voiceless obstruents (e.g., Beguš, 2017; Braunschweiler, 1997; Chen, 1970; Coretta, 2019; Crowther & Mann, 1992; Crystal & House, 1982; de Jong, 1991, 2004; Denes, 1955; Derr & Massaro, 1980; Fischer & Ohde, 1990; Fitch, 1981; Fox & Terbeek, 1977; Hillenbrand et al., 1984; House, 1961; House & Fairbanks, 1953; Javkin, 1977; Klatt, 1973, 1976; Kluender et al., 1988; Ko, 2018; Krause, 1982; Kulikov, 2012; Laeufer, 1992; Lisker, 1974, 1978, 1986; Luce & Charles-Luce, 1985; Ohala, 1983; Peterson & Lehiste, 1960; Port, 1976; Port & Dalby, 1982; Raphael, 1972, 1975; Raphael et al., 1975; Sanker, 2019; Sharf, 1962; Smith, 2002; Sweet, 1880; Tanner et al., 2019; Umeda, 1975; Van Summers, 1987; Walsh & Parker, 1981). Voicing effects have been documented in a number of different languages. However, it is generally agreed that English (in many of its varieties) exhibits one of the strongest such effects, with vowels preceding voiced obstruents up to twice as long as their counterparts preceding voiceless obstruents (e.g., Chen, 1970; Harris & Umeda, 1974; Mack, 1982). English-speaking listeners also exhibit a robust categorical perception effect for final voicing based on preceding vowel duration alone (e.g., Crowther & Mann, 1992; Denes, 1955; Hillenbrand et al., 1984; Klatt, 1976; Raphael, 1972). Preceding vowel duration has, in fact, been described as the most reliable cue to voicing on final obstruents (Luce & Charles-Luce, 1985; Raphael, 1972; Raphael et al., 1975). Because stops are often unreleased in final position, it has also been suggested that a sound change has occurred (or is underway) in which the contrastive relationship between words like “bad” (bӕd) and “bat” (bӕt) has shifted away from the final obstruent itself, to be expressed in the duration of the preceding vowel (bӕˑt˺ vs. bӕt˺), at least in phrase-final position (e.g., Klatt, 1976).

In this paper, however, we will argue that the primary cue to obstruent voicing in coda position is the duration of the obstruent itself. Short obstruents are perceived as members of the phonologically voiced category, and long obstruents, as members of the voiceless. Because the long/short distinction is relative, preceding vowel duration, as an indicator of speaking rate, affects the perception of voicing. We find that the duration of the preceding vowel is better predicted by the following obstruent duration than by its voicing, and that “voicing effects” are the result of the negative correlation of voicing with segment length. We argue that the two phonological categories of stop are distinguished, not by their absolute durations, but by their inherent elasticity: the less elastic a segment, the more it resists both lengthening and shortening.

The paper is organized as follows. In the following section we provide background on the voicing effect and related phenomena. Section 3 contains a corpus study on American English. In Section 4, predictions of our hypothesis are tested using a variable-rate production task. In Section 5, we model the results using continuously violable constraints on duration at the segment and syllable levels. Our hypothesis is elaborated in Section 6, where we account for the perceptual side of the voicing effect. We summarize and conclude in Section 7, where we discuss the implications of the present work for theories of phonological contrast.

2 Background

Despite the large amount of research on the phenomenon, the underlying source of the voicing effect is unknown, with little consensus on what acoustic or articulatory properties give rise to the observed duration differences. Belasco (1958) and Delattre (1962) both argue that there is a tradeoff of effort between the vowel and consonant: strong (voiceless) consonants are accompanied by weak (shorter) vowels. However, Moreton (2004) and Schwartz (2010) argue essentially the opposite: that it is the spread of “fortisness” that shortens the preceding vowel. It has also been claimed that careful, and therefore slower (or simply earlier, see Klatt, 1976), movements of the vocal cords are required to avoid spontaneous voicing under reduced pressure (Halle & Stevens, 1967). However, clear evidence of differences in energy, effort, or precision between voiced and voiceless obstruents has not been forthcoming.

On the auditory side, Kluender et al. (1988) and also Jessen (2001) suggest that vowel lengthening is an enhancement effect, reinforcing the length differences of the obstruents, and thus the voicing contrast. Javkin (1977) posits that vowels are consistently perceived as longer before voiced obstruents because listeners mis-attribute glottal pulsing to the end of the vowel. However, there seems to be little evidence to support the latter hypothesis, and enhancement explanations are unable to account for why it is preceding vowel length, and not obstruent length, degree of voicing, presence of audible release, or aspiration, that is used to enhance the contrastiveness of the obstruents themselves. More recently, Sanker (2020) has proposed an explanation based on the interaction between acoustics and articulation: a subset of the features in the vowel that are affected by the voicing of the following obstruent (spectral tilt, and intensity contour) also affect perception of vowel duration, presumably for unrelated articulatory reasons. Thus, in the presence of those cues, independent of the presence of a following obstruent, listeners perceive the vowel as being longer/shorter than some baseline duration. This proposal hinges on the source of the duration percept being independent of the voicing, which has not been established, since no articulatory explanation for why spectral tilt and intensity contour affect perceived duration has been proposed.

2.1 Production

The above hypotheses encounter more problems when the voicing effect is examined more closely. Voicing effects occur even when voiced obstruents are phonetically devoiced (Chen, 1970; Fox & Terbeek, 1977; Walsh & Parker, 1981), ruling out an active articulation-based source that relies on actual vocal fold vibration (although such a source may have been active at some previous time, and the pattern subsequently phonologized). A universal basis for the effect is called into question by the apparent absence of a lengthening effect in certain languages (Flege, 1979; Hillenbrand et al., 1984; Keating, 1979, 1985).¹ Even in English, with one of the most robust voicing effects measured, durational differences are not found in all contexts. Production studies typically consist of either word lists or brief sentences read by participants in a laboratory setting. In sentence contexts, the target words are often in utterance final position. Such words are also typically monosyllabic, which entails that the target vowel receives primary stress. When some or all of these factors are varied, the voicing effect can be significantly reduced, or disappear altogether: in phrase-medial position (vs. phrase-final) (Crystal & House, 1988; Luce & Charles-Luce, 1985; Smith, 2002; Umeda, 1975),² polysyllabic words (vs. monosyllabic) (Klatt, 1973; Port, 1981; Umeda, 1975),³ lax vowels (vs. tense) (Crystal & House, 1988; Luce & Charles-Luce, 1985; Peterson & Lehiste, 1960),⁴ unstressed vowels (vs. stressed vowels) (de Jong, 2004; Van Summers, 1987),⁵ and fast speaking rates (vs. slow speaking rates) (Ko, 2018; Port, 1976; Smith, 2002).⁶

2.2 Compensation

Obstruent duration differences mirror vowel duration differences: they are small or non-existent in the contexts in which vowel duration differences are small or non-existent (Crystal & House, 1982; Luce & Charles-Luce, 1985; Miller et al., 1986),⁷ and they are large where large vowel duration differences are found, and in the opposite direction (e.g., Chen, 1970; Klatt, 1976; Luce & Charles-Luce, 1985; Miller & Volaitis, 1989; Umeda, 1975).⁸ The inverse correlation between differences in obstruent duration and differences in preceding vowel duration was noted early on (Catford, 1977; Kozhevnikov & Chistovich, 1965). However, temporal compensation as an explanation for the voicing effect has been explicitly considered and rejected on a number of separate occasions (Braunschweiler, 1997; Chen, 1970; Keating, 1985). These rejections are largely based on the assumption that temporal compensation should be total, or near total, with the consequence that all syllables would have the same length.⁹

Although syllable-level isochrony was originally hypothesized to apply in so-called “syllable-timed” languages (e.g., Pike, 1945), and to be the source of a number of apparently compensatory effects, it has become clear that uniform timing for syllables is not consistently enforced in English¹⁰ or in any other language that has been investigated (see Krivokapić, 2020 for a review).

2.3 Competition

More recent work on temporal compensation is situated within Articulatory Phonology (AP), which does not assume isochrony at any level. This is appealing for “compensatory” phenomena that range widely in their degree (e.g., Elert, 1965; Kavitskaya, 2002; H. Kim & Cole, 2005; Kristoffersen, 2000; Munhall et al., 1992). In AP, articulatory units of various sizes are modeled as harmonic oscillators with different characteristic frequencies (e.g., Browman & Goldstein, 1988, 1990; Nam & Saltzman, 2003; O’Dell & Nieminen, 1999; Saltzman et al., 2008). Phasing relationships between such articulatory units are modeled as oscillator coupling, in which the system settles at a characteristic frequency intermediate between the frequencies of the individual units. The same result can be derived from a competing constraints model in which none of the individual constraints on preferred frequencies can be perfectly satisfied, and a “compromise” frequency is adopted. Our model of the voicing effect is based on this approach.¹¹

2.4 Intrinsic duration

In our proposed model, it is preferred durations at the segment level that drive the voicing effect. We posit that something similar is at work in so-called prominence-based compensation, which occurs between two syllables of inherently different durations within the same word. Final lengthening associated with phrasal boundaries is typically strongest for the segment closest to the boundary, and extends only as far as the onset of the final syllable in most cases (Berkovits, 1993; Cambier-Langeveld, 1997; Campbell, 1992; Hofhuis et al., 1995; Port & Cummins, 1992; Turk & Shattuck-Hufnagel, 2007). However, Cambier-Langeveld (1997, 2000) shows that, in Dutch, the penultimate syllable of the final word also sometimes experiences significant lengthening. This happens only when the final syllable is unstressed, or contains a schwa vowel (see also Katsika, 2016; Turk & Shattuck-Hufnagel, 2007 for similar results).

In these studies, the characteristically shorter duration of unstressed vowels seems to prevent them from lengthening to a degree sufficient to satisfy the requirements of phrase-final lengthening. The voicing effect can be described in similar terms: lengthening (also often due to a phrase-final boundary) “shifts” to earlier segments (the vowel) when the final segment (the voiced obstruent) cannot be lengthened sufficiently.¹²

2.5 Elasticity

Our model assigns a characteristic elasticity to each segment, which mediates the degree to which the segment resists pressures to lengthen or shorten from its preferred duration (see also Cambier-Langeveld, 2000; Miller, 1981). The concept of elasticity is related to the concept of spring stiffness in AP (e.g., Browman & Goldstein, 1986): a force of the same size will cause a greater perturbation to a spring with a smaller stiffness parameter, resulting in a longer duration. However, elasticity differs from spring stiffness in important ways: it applies to phonemes, which are not part of the inventory of timing units within AP, and not just to phonemes as a class, but to individual phonemes.

What is crucial in our model is that elasticity and duration determine the proportion of the syllable that each segment comprises (see Campbell, 1992). Our Expandability Hypothesis is defined in (1).

(1) The Expandability Hypothesis

All segments have a characteristic elasticity that determines their resistance to lengthening

Resistance to lengthening increases with increasing duration for all segments

Lower elasticity equates with a more rapid increase in resistance

Relative resistance determines the distribution of duration across the syllable

Modeling voiceless obstruents as high elasticity, and voiced obstruents as low elasticity, we will show that the Expandability Hypothesis parsimoniously accounts for the production data on the voicing effect, and is also, crucially, consistent with the existing perception data.
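The logic of the Expandability Hypothesis can be illustrated with a toy calculation. The sketch below is not the model developed in this paper; it assumes, purely for illustration, a quadratic cost for deviating from a preferred duration, scaled by the inverse of elasticity, and a fixed total duration for the vowel plus coda. The function name, the preferred durations, and the elasticity values are all hypothetical.

```python
def distribute_duration(preferred, elasticity, target_total):
    """Split target_total (ms) across segments.

    Assumed cost: sum((d_i - p_i) ** 2 / e_i), minimized subject to
    sum(d_i) == target_total. The closed-form solution hands each segment
    a share of the surplus (or deficit) proportional to its elasticity.
    The sketch does not guard against negative durations for large deficits.
    """
    surplus = target_total - sum(preferred)
    total_elasticity = sum(elasticity)
    return [p + surplus * e / total_elasticity
            for p, e in zip(preferred, elasticity)]


# Hypothetical values, chosen only to make the arithmetic easy to follow.
PREFERRED = [80.0, 60.0]   # preferred durations in ms: vowel, coda obstruent
VOICELESS = [1.0, 1.0]     # elasticities: vowel, high-elasticity obstruent
VOICED = [1.0, 0.25]       # elasticities: vowel, low-elasticity obstruent

for total in (140.0, 240.0):   # unlengthened vs. strongly lengthened rime
    v_vl, c_vl = distribute_duration(PREFERRED, VOICELESS, total)
    v_vd, c_vd = distribute_duration(PREFERRED, VOICED, total)
    print(f"rime {total:.0f} ms: vowel {v_vl:.0f} vs {v_vd:.0f}, "
          f"obstruent {c_vl:.0f} vs {c_vd:.0f}")
```

Under these assumed values, the two contexts are identical when the rime is at its preferred total of 140 ms. When the rime is stretched to 240 ms, the vowel receives 130 ms before the high-elasticity obstruent but 160 ms before the low-elasticity one, while the obstruents receive 110 ms and 80 ms. A vowel duration difference thus emerges only under lengthening, and it mirrors an obstruent duration difference in the opposite direction.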

2.6 Perception

It turns out that vowel duration is not the only cue, and may not even be the main cue, to the voicing distinction in final obstruents. Wardrip-Fruin (1982) demonstrates that when preceding vowel duration conflicts with either formant transition cues, or actual vocal fold vibration, the latter dominates. Hogan and Rozsypal (1980) also report that, for certain voiceless-final words, lengthening the vowel does not change the percept to voiced, but produces no effect, or results in stimuli that sound unnatural. Revoile et al. (1982), using naturally produced stimuli, find that the identification of voiced stops is most strongly disrupted by removing vowel offset cues (see also Nittrouer, 2004; O’Kane, 1978), while the identification of voiceless stops is most strongly disrupted by removing the release burst. Similarly, Repp and Williams (1985) find that the addition of a release burst to otherwise ambiguous stimuli reduces voiced responses. Changes to vowel duration, on the contrary, have little effect on voicing perception in their study.

It is our hypothesis that the perception results are based primarily on obstruent duration. It was established quite early on that the perceptual boundary between the fricatives /s/ and /z/ in final position is dependent on both consonant and vowel duration (Denes, 1955); see also Raphael (1981) and Repp and Williams (1985). In fact, the literature on voicing in word-medial position standardly describes the perceptual boundary in terms of the ratio of closure duration to preceding vowel duration (e.g., Lisker, 1957; Port, 1979, 1981; Port & Dalby, 1982). The C/V ratio effectively normalizes stop duration relative to estimated speaking rate. This is exactly what we believe occurs in final position, with final lengthening accounting for the magnitude of the effect.
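As a rough illustration of how a C/V ratio normalizes obstruent duration for rate, the following sketch labels a coda obstruent from the ratio alone. The function names and the boundary value of 0.5 are hypothetical and are not taken from the perception literature cited above.

```python
def cv_ratio(closure_ms, vowel_ms):
    """Ratio of consonant (closure) duration to preceding vowel duration."""
    if vowel_ms <= 0:
        raise ValueError("vowel duration must be positive")
    return closure_ms / vowel_ms


def classify_voicing(closure_ms, vowel_ms, boundary=0.5):
    """Label a coda obstruent from its C/V ratio.

    `boundary` is an assumed category boundary used only for illustration.
    """
    return "voiceless" if cv_ratio(closure_ms, vowel_ms) > boundary else "voiced"


# The same 60 ms closure is relatively long after a short vowel
# and relatively short after a long vowel.
print(classify_voicing(60.0, 90.0))    # ratio is about 0.67
print(classify_voicing(60.0, 200.0))   # ratio is 0.30
```

With these assumed numbers, an identical closure is labeled voiceless after the 90 ms vowel and voiced after the 200 ms vowel, which is the sense in which the vowel acts as an estimate of speaking rate rather than as an independent cue.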

3 A corpus study

In this section, we provide an in-depth analysis of the voicing effect in conversational speech, using data from the Buckeye Corpus (Pitt et al., 2007). Although the corpus is not balanced, it provides much more data, and a larger range of speaking rates and contexts, than any single laboratory experiment. A corpus study allows us, first, to quantify the voicing effect in actual usage. Second, it allows us to probe more deeply into the factors that affect the realization of the effect. Conversational styles of speech are expected to exhibit considerable reduction in the realization of individual words, some component sounds of which may be entirely missing (e.g., Harris & Umeda, 1974; Johnson, 2004; Jurafsky et al., 1998). This reduction could neutralize small differences in duration that result from an underlying voicing effect. Moreover, previous studies with read scripts have shown a reduced voicing effect in comparison to single sentence or word list productions (Crystal & House, 1982, 1988). What we find is that the effect of voicing is inconsistent and depends on the structure of our statistical model. For a simple model with no interactions, voicing is significant. However, when interactions are added to this model, the voicing effect fails to reach significance. Nevertheless, the effect is found to participate in predicted interactions with speaking rate, phrase position, and frequency, exhibiting a dependence on absolute duration.

3.1 The data

The Buckeye Corpus consists of segmented and transcribed sound files. These are taken from interviews, each lasting about an hour, with 40 different speakers, all middle-class and Caucasian, who are also natives of central Ohio. Intertranscriber reliability of the phonetic symbols for stops and fricatives was reported for a sample of the Buckeye Corpus at 91.2% and 92.9%, respectively. For the unanimously transcribed subset of this sample, segmentation boundaries differed by an average of 16 ms (Pitt et al., 2005). However, Raymond et al. (2002) report a difference in segmentation agreement for shorter versus longer phones: 73% of phones that were longer than average agreed within 20% of the average length of the two phones on either side of the segment boundary, whereas only 50% of phones that were shorter than average agreed within 20%. Shorter phones were thus proportionally less consistently transcribed than longer phones. In the absence of a consistent bias in the placement of the boundary, such errors could wash out a small voicing effect. Given that the voicing effect is expected to be larger for longer durations, however, this is unlikely to affect the outcome significantly. The measured durations of the vowel and final consonant are inherently negatively correlated: a boundary error that lengthens the vowel also shortens the final obstruent. However, such ambiguity is more likely to arise with voiced than with voiceless stops. Thus, we would expect such errors to inflate any voicing effect.

From the Buckeye Corpus we extracted all monosyllabic words of the form (C)onsonant-(V)owel-(C)onsonant ending with one of the following obstruents: voiced (d,b,g,z,ʒ,v) or voiceless (t,p,k,s,ʃ,f). Consonant–vowel–consonant (CVC) words were selected because they were expected to show the largest voicing effect. Complex onsets were excluded to eliminate potential variability. No nasalized or rhotacized vowels were included, to be sure that each word had exactly three underlying segments. Only tokens that were both phonemically and phonetically CVCs were included. For example, tokens of “past” realized as [pæs], and tokens of “allowed” realized as [laʊd] were both excluded. Because the transcription of the corpus is quasi-phonetic, we constructed a dictionary of citation forms to ensure that the phonological voicing category was correctly assigned to each word. Because there were no words ending in voiced dental fricatives, those ending in voiceless dental fricatives were also removed. The vowel /ɔɪ/ was also excluded for reasons of data sparsity. In all, 20.3% of the stops in the remaining data were transcribed as glottalized (tq), which could represent a glottal stop or unreleased stop with glottalization on the vowel, but fewer than 1% of those were underlyingly voiced, so all such tokens were removed from analysis. Affricates were excluded due to the possibility that they might straddle a word boundary.

In Figure 1, raw vowel durations for the set of CVC word tokens used in the following analyses are plotted as a function of the voicing feature of the final obstruent. If a voicing effect does exist in these data, it is masked by factors that affect vowel duration more strongly. The density plot on the right suggests that there is a very small effect of voicing at the longest durations. However, the actual counts given in the left panel show that there are never more voiced than voiceless tokens at any duration. This is due to the fact that there are considerably more word tokens with (phonemically) voiceless coda obstruents (over twice as many as voiced tokens, although there are more voiced than voiceless fricative tokens; see Appendix A). Vowels preceding voiceless obstruents have a slightly longer mode than those preceding voiced obstruents, and at the longer durations (above 175 ms), the relative proportion of the voiced-preceding distribution is larger than that of the voiceless-preceding distribution. For the most part, however, the two distributions overlap completely, showing no transparent voicing effect.

Figure 1.

All CVC vowel durations: a) histogram b) density plot c) log duration density plot.

3.2 Model factors

The following factors, each of which is known to affect segment duration, are included in the statistical model of vowel duration. Because the analysis was limited to CVC words, stress and word length are not included:

INHERENT VOWEL CLASS: Tense and lax vowels in English are differentiated in part by duration. /ɪ, ɛ, ʊ, ʌ/, all lax vowels, are reliably shorter than their tense counterparts (e.g., Klatt, 1976; Peterson & Lehiste, 1960; Stevens & House, 1963). Reduced or absent voicing effects have been reported for both unstressed and lax vowels, which we interpret as a dependence of the voicing effect on absolute duration (Crystal & House, 1982; de Jong, 2004; Umeda, 1975). To capture this phonetic length difference, Vowel Class is modeled as a factor with 2 levels: lax (ɪ, ɛ, ʊ, ʌ, æ), and tense (all other vowels, namely, i,e,u,a,o,ɔ,aɪ,aʊ), coded as −1, and 1, respectively. Although vowel duration based on vowel quality is not actually binary, data sparsity for certain low-frequency vowels makes using vowel quality itself problematic as a finer-grained determiner of inherent duration.

VOWEL HEIGHT: Because high vowels tend to be shorter than low vowels, this can affect the realization of the voicing effect. Although there are potentially three possible height values, for ease of interpretability in the sum-coded model we use binary values for all discrete factors. Therefore, Vowel Height is a factor with the levels: high (i,u,ɪ,ʊ), and non-high (all other vowels), coded as −1, and 1, respectively.

SPEAKING RATE: An estimate of speaking rate was calculated by counting the number of phones within the preceding 1 s of speech that includes the target word. Only the previous context was used since phrase-final tokens are included in the model. Speaking rate is modeled as a continuous variable.

WORD FREQUENCY: More frequently used words generally have shorter durations than less frequently used words, and both vowels and consonants within those words are affected (e.g., Fidelholtz, 1975; Fosler-Lussier & Morgan, 1999; Hooper, 1976; Jurafsky et al., 2001; Pluymaekers et al., 2005). Function words, generally the most frequent and the most contextually predictable words, are consistently shorter than content words (Bell et al., 2009; Umeda, 1975). Because the difference in frequency between content and function words is several orders of magnitude, Zipf scores, $\log_{10}(\mathrm{Frequency})$, were used. Word frequencies were supplied as counts per million from the SUBTLEX corpus (Van Heuven et al., 2014). Log-frequency is modeled as a continuous variable.

PHRASAL POSITION: Prosodic boundaries have the effect of lengthening adjacent segments. The greater the number of nested phrases marked by the boundary, the greater the degree of lengthening, and the further its spread (Byrd & Saltzman, 2003; Fougeron & Keating, 1997; Oller, 1973; Wightman et al., 1992). Because the Buckeye Corpus is not annotated for syntactic boundaries, tokens were classified only as pre-pausal or non-pre-pausal, based on the end of a transcribed utterance. Pre-pausal position is expected to show the largest lengthening effects (see, e.g., Crystal & House, 1988; Klatt, 1975). The following tags in the Buckeye Corpus were used to identify a boundary: SIL (silence), E_TRANS (end of phonetic transcription), IVER (interviewer speaking), VOCNOISE (non-speech sound such as a cough, or laugh). Position is modeled as a factor with two levels: pre-pausal and non-pre-pausal, coded as 1 and −1, respectively.

PHONETIC VOICING: Phonetically voiced segments exhibit acoustic evidence of voicing, as transcribed by corpus annotators. Phonetic voicing is modeled as a 2-level factor: voiced, and voiceless, coded as 1 and −1, respectively. Phonetic voicing was included in the model because it was not known if there might be an effect of actual voicing above and beyond the effect of phonological voicing. We soon found that phonetic voicing did not differ appreciably from phonemic voicing, and it was dropped from the analyses. For the remainder of the paper, “voicing” will refer to phonological voicing.

PHONEMIC VOICING: Phonemic voicing refers to the category of the phoneme in the citation form of the word. Phonemic voicing is modeled as a 2-level factor: voiced, and voiceless, coded as 1 and −1, respectively.

OBSTRUENT TYPE: A 2-level factor: stop, or fricative, coded as 1 and −1, respectively.
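The three predictors that must be derived from the corpus annotations (speaking rate, log-frequency, and phrasal position) can be sketched as follows. This is a minimal illustration under stated assumptions, not the extraction code used for this study: the function names and the input format (a list of phone onset times in seconds) are hypothetical, and the 1 s window is assumed to end at the offset of the target word.

```python
import math

# Boundary tags named in the factor description above.
BOUNDARY_TAGS = {"SIL", "E_TRANS", "IVER", "VOCNOISE"}


def speaking_rate(phone_onsets, word_end, window=1.0):
    """Count phones whose onset falls in the `window` seconds that end at
    the target word's offset (an assumed reading of the rate measure)."""
    start = word_end - window
    return sum(1 for t in phone_onsets if start <= t < word_end)


def log_frequency(count_per_million):
    """Base-10 log of a per-million frequency count."""
    return math.log10(count_per_million)


def phrase_position(next_label):
    """Sum-coded position: 1 if the next label is a boundary tag, else -1."""
    return 1 if next_label in BOUNDARY_TAGS else -1


# Hypothetical phone onsets (s) for a stretch of speech ending at 1.30 s.
onsets = [0.05, 0.20, 0.31, 0.48, 0.62, 0.80, 0.95, 1.10, 1.22]
print(speaking_rate(onsets, word_end=1.30))   # phones in the final 1 s
print(log_frequency(1000.0))                  # a word at 1000 per million
print(phrase_position("SIL"))                 # pre-pausal token
```

Counting only preceding material keeps the rate estimate defined for pre-pausal tokens, where no following speech is available.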

3.3 Method

All statistics were performed using the lme4 package in R. Linear mixed effects models were run using the function lmer, fit by REML. The lmerTest package was used to obtain estimated p-values. All continuous numerical variables were log-transformed and mean-centered to approximate a normal distribution with a mean of zero. Following Tanner et al. (2019), we normalize by dividing by two standard deviations. Random intercepts for word and speaker were included in all models. Place of articulation of the final obstruent, although known to affect consonant duration, had too small an effect to significantly improve model fit, and was therefore left out of the final model. Due to the asymmetric distribution of the data, it was not possible to use paired data in analyzing the voicing effect. All factors were sum-coded so that each individual factor was assessed at the mean value of all other factors. Three-way interactions were avoided for reasons of interpretability as well as model convergence.
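The transformation applied to continuous variables (log-transform, mean-center, divide by two standard deviations) can be sketched in a few lines. The analyses themselves were run in R; the Python below is only an illustration, the function name is hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def standardize_2sd(values):
    """Log-transform, mean-center, and divide by two standard deviations.

    Uses the sample standard deviation, which is an assumption here.
    """
    logs = [math.log(v) for v in values]
    center = mean(logs)
    scale = 2 * stdev(logs)
    return [(x - center) / scale for x in logs]


# Hypothetical vowel durations in ms.
durations = [50.0, 80.0, 120.0, 200.0]
scaled = standardize_2sd(durations)
print([round(z, 3) for z in scaled])
```

After this transformation a continuous predictor has a mean of zero and a standard deviation of 0.5, which puts its coefficient on a scale comparable to that of a two-level factor coded as 1 and −1 in a roughly balanced sample.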

3.4 Results

For each variable, the average value of its levels (if a factor), or of its range of values (if a continuous numerical variable) was the baseline for analysis. This allows us to conceptualize the results in a way that is similar to analysis of variance (ANOVA), where each effect is an adjustment to the average value for the model. For example, the effect of Vowel Class is determined by whether the average duration of the class of tense vowels is significantly different from the global vowel duration average, calculated over both tense and lax vowels.
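The interpretation of a sum-coded effect can be made concrete with the simplest possible case: one two-level factor and no other predictors. The sketch below is illustrative only; the function name and the log-duration values are hypothetical, and the mixed models reported here contain additional predictors and random effects, so their estimates do not reduce to these formulas.

```python
from statistics import mean


def sum_coded_fit(group_pos, group_neg):
    """Least-squares fit of y = b0 + b1 * x, with x = +1 or -1 by group.

    With a single two-level predictor the fit reproduces both group means,
    so b0 is the mean of the two group means and b1 is half their difference.
    """
    m_pos, m_neg = mean(group_pos), mean(group_neg)
    b0 = (m_pos + m_neg) / 2
    b1 = (m_pos - m_neg) / 2
    return b0, b1


# Hypothetical log vowel durations for the two levels of a factor.
tense = [4.6, 4.7, 4.5]   # coded +1
lax = [4.3, 4.4]          # coded -1
b0, b1 = sum_coded_fit(tense, lax)
print(round(b0, 3), round(b1, 3))
```

Here the intercept is the midpoint of the two level means, and the coefficient is the distance of the level coded 1 from that midpoint. The full difference between the two levels is therefore twice the reported estimate.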

We begin the analysis with a simple model of vowel duration, containing no interactions, but using random intercepts for word, speaker, and vowel quality (models with random slopes did not converge); see Table 1.

Table 1.

	Estimate	Std. error	df	t value	Pr(>|t|)
(intercept)	4.48	0.05	18.2	83.4	<2e-16
Vowel type	0.08	0.04	10.7	1.99	.07
Vowel height	0.16	0.05	10.3	3.36	.007
Speaking rate	−0.17	6.0e-3	1.9e4	−27.6	<2e-16
Voicing	0.07	0.01	341	4.85	1.85e-6
Obstruent type	−0.04	0.01	1.2e3	−3.29	.001
Phrase position	0.41	7.9e-3	1.9e4	52.2	<2e-16
Frequency	−0.20	0.02	347	−9.43	<2e-16

As expected, there was a significant main effect of speaking rate. Longer vowel durations were found at slower than average speaking rates. Word frequency also had the expected negative effect on vowel duration, such that words with higher than average frequency had shorter vowel durations. As predicted, high vowels were shorter than the average of high and low vowels. However, tense vowels were not significantly longer than the average of tense and lax vowels. Pre-pausal tokens were longer than phrase-medial. Voicing was also significant: vowels preceding voiced obstruents were longer than vowels preceding voiceless obstruents.

Our second model included interactions between voicing and speaking rate, voicing and frequency, voicing and vowel height, voicing and vowel length, and voicing and phrase-position. Frequency, speaking rate, and phrase-position remained significant, but voicing did not. See Table 2. The result was the same for a model including only pre-pausal tokens.

Table 2.

Results of the Following Linear Regression Model: Vowel Duration as a Function of Vowel Height (Non-High = 1), Vowel Length (Tense = 1), Speaking Rate, Voicing (Voiced = 1), Obstruent (Stop = 1), Phrase Position (Final = 1), and Frequency, and the Interaction Between Voicing and Speaking Rate, Voicing and Frequency, Voicing and Vowel Height, Voicing and Vowel Class, and Voicing and Phrase-Position.

	Estimate	Std. error	df	t value	Pr(>|t|)
(Intercept)	4.48	0.05	18.6	83.1	<2e-16
Vowel height	0.15	0.05	10.4	3.30	0.008
Vowel length	0.09	0.04	10.9	2.09	0.06
Voicing	0.02	0.02	220	0.99	0.32
Speaking rate	−0.2	6.1e-3	1.9e4	−27.5	<2e-16
Obstruent type	−0.04	0.01	1.2e3	−3.23	0.001
Frequency	−0.98	0.02	3449	−9.24	<2e-16
Phrase-position	0.42	8.1e-3	1.9e4	51.5	<2e-16
Voicing: speaking rate	−2.4e-3	5.9e-3	1.9e4	−0.40	0.69
Voicing: frequency	−0.04	0.02	351	−1.86	0.06
Voicing: phrase-position	0.03	8.1e-3	1.9e4	3.27	0.001
Voicing: vowel height	0.01	0.02	307	0.75	0.45
Voicing: vowel length	0.02	0.01	325	1.31	0.19

This is somewhat surprising, given that Tanner et al. (2019) report a voicing effect for phrase-final tokens in the Buckeye Corpus. However, they use a model that includes interactions between voicing and frequency, voicing and vowel type, voicing and obstruent type, and voicing and word class. They include random intercepts for speaker, word, and vowel quality, as well as random slopes for speaker by frequency, vowel type, obstruent type, word class, and by the interaction of voicing and obstruent type. Random slopes are also included for word by both of the speaking rate measures that they use. This model overfits our data and does not converge. We did find a significant interaction between voicing and phrase position, and a marginal interaction between voicing and frequency. Pre-pausal position and lower frequency both increase the voicing effect, and both also increase the duration of the word. This suggests that the significant voicing effect in the simple model was driven by longer words. We posit that this effect, in turn, is driven by the difference in obstruent duration, which increases with increasing word duration.

A regression model with obstruent duration as the dependent variable shows that voicing is significantly negatively correlated with obstruent duration (see Table 3). The interactions of voicing with rate, frequency, and phrase position are all significant, indicating that the (negative) effect of voicing is reduced for fast rates, high-frequency words, and non-pre-pausal position, as predicted. Although these results cannot, in and of themselves, prove the hypothesis that vowel duration differences arise from consonant duration differences, they are consistent with that hypothesis.

Table 3.

Linear Regression Model of: Consonant Duration as a Function of Voicing, Rate, Stop Type, Frequency, Phrase Position, and Interactions Between Voicing and Rate, Frequency and Phrase-Position.

	Estimate	Std. error	df	t value	Pr(>\|t\|)
(intercept)	−0.07	0.02	170	−3.25	0.001
Voicing	−0.06	0.02	172	−3.30	0.001
Speaking rate	−0.17	6.1e-3	1.9e4	−30.0	<2e-16
Obstruent type	−0.14	9.8e-3	69.8	−14.7	<2e-16
Frequency	−0.13	0.02	313	−7.32	2.0e-12
Phrase position	−0.47	8.2e-3	1.9e4	57.3	<2e-16
Voicing: Rate	0.03	5.9e-3	1.9e4	4.30	1.7e-5
Voicing: Frequency	0.05	0.02	313	2.81	0.005
Voicing: Phrase	0.10	8.1e-3	1.9e4	11.8	<2e-16

3.5 Summary and discussion of corpus results

The corpus results show that voicing is a significant predictor in the simple model, but not in the model containing interactions, failing to replicate the finding of Tanner et al. (2019). We also see a dependence of the voicing factor on absolute durations in its interactions with phrase position and frequency. Consonant duration is also strongly (negatively) correlated with voicing and shows the same interactions with phrase position and frequency. This is expected if the voicing effect is actually an effect of consonant duration, where the difference in duration between voiced and voiceless obstruents increases with increasing duration.

By contrast, laboratory production studies find a large and consistent voicing effect; vowels preceding voiced obstruents can be up to twice as long as vowels preceding voiceless obstruents (Chen, 1970; House, 1961; Luce & Charles-Luce, 1985; Mack, 1982; Peterson & Lehiste, 1960; Umeda, 1975). Similarly, voiceless stop closure durations can be from 25% to 50% longer than voiced stop closures (Chen, 1970; Luce, 1986). These discrepancies can be explained by the large difference in absolute durations between the corpus and the laboratory. Vowel durations in these studies are reported in the range of 175–300 ms (House, 1961; Luce & Charles-Luce, 1985; Mack, 1982; Peterson & Lehiste, 1960; Umeda, 1975), with voiceless stop closures ranging from 95 to 140 ms (Chen, 1970; Luce & Charles-Luce, 1985). For the vowel tokens in the Buckeye Corpus, by contrast, durations this long are rare. Among the set of CVC words ending in voiced obstruents, fewer than 7% reach durations of 200 ms or above. Even restricting the sample to just characteristically longer vowels, only 13% of such tokens fall in this range. Median vowel duration over the complete set of CVC words used in this study is only 83 ms. Median vowel duration for just the voiced tokens is actually lower than that, at 75 ms. Similarly, only 9% of CVC-final voiceless stops reach durations of 100 ms or above in the Buckeye Corpus, while the median closure duration is 46 ms.

4 A production study

We take the corpus results, in conjunction with the production literature as a whole, to provide preliminary support for the Expandability Hypothesis. However, because paired data are not available in the corpus, our predictions must be confirmed in a setting where sources of variation can be controlled for. In this section, we report the results of a production experiment in which we asked native English speakers to repeat a series of CVC minimal pairs at varying rates. This allows us to directly compare the lengthening behavior of final voiced obstruents with that of final voiceless obstruents. The difference in obstruent duration can also be compared with the difference in vowel duration as a function of speaking rate. And we can test the hypothesis that consonant duration is not only negatively correlated with vowel duration, but a better predictor than voicing as well.

4.1 Methodology

4.1.1 Participants

All participants were undergraduate students at The Ohio State University who were given course credit for completing the experiment. A total of 45 participants were run; of this group, 11 were excluded from analysis for the following reasons: they reported hearing issues (3); they reported learning a language other than English before the age of 7 (7); or they did not learn English until after the age of 7 (1). In many cases, participants produced “dose” and “doze” tokens that were difficult to disambiguate. Two such participants were removed due to their productions of final s and z being practically identical. Five participants were removed for either failing to vary their speaking rate significantly across trials, or varying only inter-word pause duration rather than word duration. An additional participant was removed due to adopting a sing-song (high-low) prosody to the word repetition. This left data from 26 participants (a total of 5,049 tokens): 17 female, and 9 male, with an average age of 20.

4.1.2 Procedure

Participants were seated in front of a computer monitor inside a sound-attenuated booth. Continuous audio was recorded from a desktop microphone using the sound editing software Audacity.¹³ Participants were instructed that they would be asked to speak into the microphone in response to prompts on the computer screen. The entire experiment took less than an hour to complete.

The experiment began with a practice block to acclimate participants to the experimental task, and the different repetition rates involved. Prior to the start of the practice block, participants were given the following instructions:

A + sign will appear on the screen. It will be black to begin with, then will change to red, and keep alternating. Your job is to repeat the word on the screen every time + changes color. Try to use the entire time that the + does NOT change color to say the word. Keep going until the flashing stops. Press any key when you are ready to practice with the word “lab.”

For the first trial, participants saw the following text: “Here’s the fastest speed.” The word “lab” appeared 1.5 s later. The word stayed on the screen as the “+” immediately appeared and began to change color. Color changes occurred 8 times. At the end of the 8 cycles, a new trial began. For each new trial, participants were alerted to the change with the following text: “A little slower.” The same word then appeared 2 s later. There were five different rates, corresponding to the time it took for the plus sign to change from black to red: 350, 550, 750, 950, and 1,150 ms. The slowest and fastest rates were chosen to be as extreme as possible while still being within the ability of participants to match.¹⁴

At the end of the practice session participants were told that they could begin the experiment whenever they were ready. The experimental trials were identical to the practice except that the rates went in order from slowest to fastest. Participants were presented with the following text: “You will begin with the SLOWEST speed, and the flashing will become faster.” Subsequently, each rate change was signaled with: “The speaking rate will now speed up a bit.” Trials were blocked by word, such that participants experienced all rates before beginning with a new word. At the end of a given block, participants were alerted that “The next item will now appear on the screen,” with a pause of 2 s before the word appeared. Word order was randomized across participants, but the order of rate presentation was fixed. Each word/rate pair was presented once.

The minimal pairs reported in this paper (feet/feed, thief/thieve, lobe/lope, and doze/dose) were chosen to vary across vowel quality (o or i), consonant manner (stop or fricative), and final consonant place (coronal or labial). Differences in part of speech and morphological complexity were largely unavoidable in constructing CVC minimal pairs, but those factors are not expected to show interactions with speaking rate. No effort was made to balance word frequency, beyond the avoidance of archaic forms, for the same reason. While higher frequency words would be expected to be somewhat shorter across the board, there was no reason to believe that speaking rate would affect the individual segments differently. Nevertheless, it is possible that such differences across a given minimal pair might obscure vowel length differences due to coda voicing.

4.2 Data selection and annotation

Each participant produced approximately 8 tokens of each word at each rate. To avoid edge effects, and fluctuations in rate, a single representative token from the center of the group was selected and measured. Because each token was surrounded by other tokens at the same repetition rate, it was possible to segment both the closure and the release interval for each stop. Occasionally, at the fastest rates, final stops did not have a clear release. In those cases, the end of the stop was set to the end of the voicing bar (for voiced stops), or the point at which the amplitude returned to background levels (for the voiceless stop). Background level was estimated by the amount of noise visible during the gaps between successive words.

The most ambiguous cases involved the segmentation of the sonorant /l/ from the following /o/ vowel, given a large degree of coarticulation. At faster rates, the point at which the release of the final /d/ became the initial fricative of the following token of “feed” could also be hard to determine. This was also true of the final /v/ and the initial /θ/ in “thieve” sequences. Measurement variability is likely to be highest in those contexts. Note, however, that any measurement errors for these kinds of tokens will only introduce error for one of the measured variables. For “lo,” the vowel duration will be affected by where the segment boundary is placed, but not the final p/b. For “feedfeed” and “thievethieve” the coda duration will be affected by where the segment boundary is placed, but not the vowel duration. Furthermore, any possible annotator bias in segmenting the sequence “lo,” for example, would have a minimal impact on the results, first because each participant was assigned to a single annotator, meaning that any effect could be absorbed into a random effect by speaker, and second, because both the voiced and voiceless minimal pairs would be segmented in the same way, such that the voicing effect (the difference in vowel durations) would not be affected by any bias. There is a possibility of resyllabification for the fastest word repetition rates, but this is only likely for p#l and b#l sequences in the lope/lobe pair. If such resyllabification occurred, we might expect the stop to be shorter, with reduced aspiration. In fact, this might explain the abrupt drop in VOT between rates 4 and 5 for this word pair (see Figure 2).

Figure 2.

Closure duration, VOT, and Total duration (TDur) for final stops as a function of repetition rate (decreasing from left to right). Means and standard error bars.

The data for the first two word pairs (feet/feed, thief/thieve) were randomly assigned to three undergraduate research assistants for annotation. One of the authors and two of the RAs then re-measured a subset of the data produced by the other two annotators. Discrepancies between any two raters were discussed as a group to establish shared criteria for ambiguous tokens. The two RAs then individually reviewed their previous measurements and made adjustments where their original segmentation did not meet the discussed criteria. The same two RAs each also re-measured half the data of the third RA, who had left the lab at that point. The second set of words (lobe/lope, doze/dose) was measured later, by an additional two RAs. Measurement verification was conducted in the same way. It was stressed that the most important criterion was consistency. As a final check, 2% of all tokens from each annotator were re-measured by the first author, selected in pairs in order to assess the discrepancy in the measured voicing effect. In terms of absolute durations, vowel measurements differed by an average of 12 ms, and total consonant durations differed by an average of 21 ms. The difference in vowel duration between voiced and voiceless minimal pairs differed by 13 ms, and the difference in total consonant duration by 30 ms. However, because the annotators’ durations were sometimes longer than the first author’s measurements and sometimes shorter, the net effect of the discrepancies in this sample of data was much smaller: vowel durations were 6 ms shorter and total consonant durations 12 ms shorter, yielding a vowel duration difference that was 3.6 ms smaller and a consonant duration difference that was 15 ms larger.

Occasionally the voiced stops and fricatives at the slower repetition rates were produced with a final epenthetic schwa. There were 29 such tokens. Any words with final schwa were removed from the analysis. Praat (Boersma & Weenink, 2009) was used for segmentation and annotation.

4.3 Results

In Figure 2, final consonant durations for the stop-final words are plotted as a function of repetition rate (shown as a number between 1 and 5, where 5 is the fastest rate, and 1 the slowest). Voiced and voiceless tokens are plotted separately, and three different duration measures are given: closure (black), VOT (light gray), and the sum of the two (TDur: dark gray). Closure duration for final voiced stops varied relatively little across repetition rates. However, most stops were also produced with a period of aspiration (VOT). Voiced stops show a clear increase in total duration as rate decreases, but one that appears to plateau at the slowest rates. For voiceless stops, closure duration increases steadily, patterning very closely with VOT. Because both duration measures show a dependence on rate, total duration was used as the dependent variable for testing the Expandability Hypothesis.

Figure 3 provides duration data for the full set of words, both vowel duration (triangles), and total obstruent duration (filled circles). Visual inspection shows that larger vowel durations were reached by the voiced member of each minimal pair, while larger obstruent durations were reached by the voiceless member. There is also a larger difference between consonant and vowel durations for voiced-final tokens across all repetition rates, and that difference increases with decreasing repetition rate.

Figure 3.

Total consonant duration and preceding vowel duration, as a function of repetition rate. Means and standard error bars.

A linear mixed-effects model was fit to the vowel duration data as a function of repetition rate and consonant duration. Consonant duration was treated as a continuous variable, and repetition rate as an ordinal variable. Random intercepts for participant and word were included. Random slopes were not used as they caused the model to fail to converge, or led to singularity. As expected, a significant (linear and quadratic) effect of speaking rate was found (vowels were longer at slower speaking rates). There was also a main effect of consonant duration; vowels were longer when the coda consonant was shorter. The interaction between rate and consonant duration also reached significance; the negative effect of consonant duration was strongest at the slowest rates (see Table 4).
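The coefficient labels in Table 4 (rate.L, rate.Q, rate.C, and the quartic term) match the naming used for orthogonal polynomial contrasts on an ordered factor, as in R's `contr.poly`. The text does not state which software or coding scheme was used, so the following is only a sketch, assuming that coding and five equally spaced rate levels. The function name `poly_contrasts` is hypothetical.

```python
import math

def poly_contrasts(n_levels):
    """Orthonormal polynomial contrasts for equally spaced ordinal levels.

    Returns n_levels - 1 columns (linear, quadratic, cubic, ...), each a list
    of length n_levels, built by Gram-Schmidt on powers of centered scores.
    """
    xs = [i - (n_levels - 1) / 2 for i in range(n_levels)]  # centered scores
    basis = [[1.0] * n_levels]  # constant column, dropped before returning
    for degree in range(1, n_levels):
        col = [x ** degree for x in xs]
        # Remove the projection onto every lower-order column.
        for prev in basis:
            dot = sum(c * p for c, p in zip(col, prev))
            norm2 = sum(p * p for p in prev)
            col = [c - dot / norm2 * p for c, p in zip(col, prev)]
        length = math.sqrt(sum(c * c for c in col))
        basis.append([c / length for c in col])
    return basis[1:]

contrasts = poly_contrasts(5)
linear, quadratic = contrasts[0], contrasts[1]
print([round(v, 3) for v in linear])
print([round(v, 3) for v in quadratic])
```

Under this coding the linear column rises evenly across the five rates (approximately −0.632, −0.316, 0, 0.316, 0.632), and the quadratic column contrasts the middle rate with the two extremes. A significant rate.L term therefore reflects a steady trend across rates, and rate.Q reflects curvature in that trend.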

Table 4.

Mixed-Effects Linear Regression Model of Vowel Duration as a Function of Speaking Rate and Consonant Duration and Their Interaction.

	Estimate	Std. error	Estimated df	t value	p value
(Intercept)	355	25.1	27.3	14.2	4.1e-14
rate.L	324	15.0	970	21.6	<2e-16
rate.Q	42.5	15.2	969	2.8	.005
rate.C	4.17	15.6	968	0.27	.79
rate^4	−12.6	15.7	967	−0.80	.42
Consonant Duration	−0.17	0.06	978	−2.88	.004
rate.L: C Duration	−0.64	0.11	969	−5.66	2.0e-8
rate.Q: C Duration	−0.08	0.11	969	−0.77	.44
rate.C: C Duration	−0.06	0.11	968	−0.59	.55
rate^4: C Duration	0.10	0.10	967	0.98	.33

A separate model of vowel duration as a function of rate and voicing behaves very similarly. As before, random intercepts were used for participant and word, but random slopes were not used as they caused the model to fail to converge, or led to singularity. Main effects of (linear) rate and voicing are found (reference level is Voiceless), as well as an interaction between voicing and rate such that the positive effect of voicing is strongest at slower rates.

The model of consonant duration as a function of voicing confirms the interpretation that the “voicing” effect is driven by consonant duration (see Table 5). Random intercepts for participant and word were included. Random slopes were not used as they caused the model to fail to converge, or led to singularity. A fully crossed rate, voicing, and manner model produced significant main effects of speaking rate (linear), voicing, and manner. Fricatives were significantly longer than stops (reference level is Stops). A significant interaction between rate (linear) and voicing was also found, indicating, as expected, that differences in duration between voiced and voiceless consonants increased with decreasing repetition rate. An interaction between manner and rate (linear) also reached significance: the difference in duration between fricatives and stops was even larger at slower rates.

Table 5.

Mixed-Effects Linear Regression Model of Consonant Duration as a Function of Speaking Rate, Voicing, and Manner, With Full Interactions (Three-Way Interactions Are Not Included).

	Estimate	Std. error	Estimated df	t value	p value
(Intercept)	160.1	7.253	7.175	22.07	7.37e-8
rate.L	87.84	6.835	960.2	12.85	<2e-16
rate.Q	−3.07	6.85	960	−0.45	.65
rate.C	−2.38	6.86	960	−0.35	.73
rate^4	−1.26	6.88	960	−0.18	.86
voicing	−54.96	8.804	3.987	−6.243	.003
manner	28.83	8.812	4.001	3.272	.031
rate.L:voicing	−46.76	9.702	960.2	−4.819	1.67e-6
rate.Q:voicing	−9.77	9.70	960	−1.01	.31
rate.C:voicing	4.61	9.69	960	0.48	.63
rate^4:voicing	−1.13	9.69	960	−0.12	.91
rate.L:manner	19.59	9.715	960.2	2.016	.044
rate.Q:manner	−18.7	9.71	960	−1.93	.05
rate.C:manner	4.60	9.75	960	0.47	.64
rate^4:manner	3.88	9.73	960	0.40	.69
voicing:manner	−14.2	12.5	4.00	−1.14	.32

A final analysis of the paired duration differences confirms the negative correlation between the difference in duration of voiceless and voiced consonants, and the difference in duration of their preceding vowels. Random intercepts for participant and word were included. Random slopes were excluded as they caused the model to fail to converge, or led to singularity. Adding manner to the model also resulted in singularity. The final model of $\Delta V = (V_{VL} - V_{VD})$ included rate and consonant duration difference, $\Delta C = (C_{VL} - C_{VD})$, and their interactions. Significant main effects of rate (quadratic) and $\Delta C$ were found, as were significant interactions between $\Delta C$ and rate, for both the linear and the quadratic terms. Thus the voicing effect ($\Delta V$) is shown to be larger for larger negative values of $\Delta C$, which are enhanced at the slowest speeds (see Table 6). Only significant factors are shown.
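The paired quantities can be computed for each participant, minimal pair, and rate by subtracting the voiced member's durations from the voiceless member's. The sketch below assumes a hypothetical record layout; the field order, the labels "VL" and "VD", and all example values are invented for illustration and are not taken from the study's data.

```python
from collections import defaultdict

# Hypothetical measurements in ms:
# (participant, pair, rate, voicing, vowel_duration, consonant_duration).
# Values are invented for illustration only.
tokens = [
    ("p01", "feet/feed", 1, "VL", 310.0, 240.0),
    ("p01", "feet/feed", 1, "VD", 380.0, 170.0),
    ("p01", "feet/feed", 5, "VL", 120.0, 95.0),
    ("p01", "feet/feed", 5, "VD", 128.0, 88.0),
]

def paired_differences(rows):
    """Return {(participant, pair, rate): (delta_v, delta_c)}.

    delta_v = V_VL - V_VD and delta_c = C_VL - C_VD, following the
    definitions in the text. Cells missing either member are skipped.
    """
    cells = defaultdict(dict)
    for participant, pair, rate, voicing, vowel, consonant in rows:
        cells[(participant, pair, rate)][voicing] = (vowel, consonant)
    out = {}
    for key, members in cells.items():
        if "VL" in members and "VD" in members:
            v_vl, c_vl = members["VL"]
            v_vd, c_vd = members["VD"]
            out[key] = (v_vl - v_vd, c_vl - c_vd)
    return out

diffs = paired_differences(tokens)
for key in sorted(diffs):
    print(key, diffs[key])
```

Skipping incomplete cells keeps the analysis strictly paired, which matters here because tokens with final schwa were removed and some pairs may therefore lack one member.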

Table 6.

Mixed-Effects Linear Regression Model of Vowel Duration Difference as a Function of Consonant Duration Difference, Manner and Rate, With Full Interactions.

	Estimate	Std. error	Estimated df	t value	p value
(Intercept)	−57.37	12.34	6.891	−4.649	.002
$\Delta C$	−0.217	0.064	475.8	−3.390	.001
rate.L	−14.7	11.1	455	−1.33	.19
rate.Q	30.11	10.94	451.8	2.751	.006
rate.C	8.26	11.1	448	0.75	.46
rate^4	−4.53	10.8	448	−0.42	.68
rate.L: $\Delta C$	−0.376	0.139	460.4	−2.701	.007
rate.Q: $\Delta C$	−0.277	0.133	455.8	−2.081	.038
rate.C: $\Delta C$	−0.06	0.13	451	−0.48	.63
rate^4: $\Delta C$	−0.006	0.13	451	−0.05	.96

4.4 Discussion

These results strongly support the Expandability Hypothesis. First, we confirm the predicted difference in lengthening between voiced and voiceless consonants in coda position, paralleling what has been repeatedly found in initial and medial position (Miller & Baer, 1983; Miller & Volaitis, 1989; Port, 1976, 1981; Volaitis & Miller, 1992). There is a difference in consonant durations at all rates,¹⁵ and there is also a large difference in the slopes of the duration curves. The difference in consonant duration increases with decreasing rate, as does the vowel duration difference. Pairing consonant duration differences with vowel duration differences at each speaking rate shows that the strength of the voicing effect is significantly correlated with the size of the consonant duration difference, something that cannot be captured with a binary voicing feature. The significant interactions between rate and consonant duration (for vowels), and between rate and voicing (for consonants), are precisely what is predicted if vowel duration differences derive from consonant duration differences. In fact, absolute vowel duration differences and consonant duration differences are very close. Rhyme (VC) durations differed significantly between voiceless and voiced tokens, but the difference was not large, at 26 ms. For stops, the rhyme duration difference was only 2.8 ms. These results probably overestimate the degree to which vowel and consonant duration are traded off, given that the experimental task is highly unnatural, and likely to bias more toward uniform syllable duration than natural speech contexts.

5 The expandability hypothesis: modeling the corpus data

The corpus and production study results, combined with the previous research summarized in Section, strongly suggest that final obstruent duration trades off against preceding vowel duration, and that the size of the resulting voicing effect depends on absolute duration. To account for both of these properties, in addition to the fact that apparent compensation is not “perfect” (cf., Chen, 1970; Keating, 1985; Port & Dalby, 1982), we propose a competition-based model where tradeoffs in duration arise, not from isochrony, but from pressures to meet certain duration targets, none of which can be fully satisfied.

5.1 A competing constraints model of the voicing effect

In this section we model the voicing effect as the outcome of a competition between duration targets at the segment level which conflict with final-targeted lengthening. Each segment possesses an inherent elasticity which is implemented as the weighting factor on a constraint that acts to keep the segment at its preferred duration. Constraints are implemented as Normal probability distributions. Each distribution assigns the highest probability to its preferred duration (the mean of the distribution), and smoothly decreasing probabilities for durations both longer and shorter than that mean. The variance of the distribution controls how quickly the probability decreases.¹⁶ The smaller the variance, the more rapid the decrease, and the greater the resistance to deviations from the mean. Its variance thus acts effectively as a weighting factor for each constraint. This means that, all else being equal, a segment with a broader probability distribution will be lengthened or shortened more than a segment with a narrower probability distribution. Variance thus also maps to segment elasticity. Constraint “competition” in this model is realized through maximization of the joint probability function over all constraints. This function exhibits the desired behavior: one constraint may be “violated” to a greater degree (a decrease in probability) if this allows another, more highly weighted constraint to be less “violated” (a greater increase in probability).
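The role of variance as a weighting factor can be illustrated with a short numerical sketch. It compares how much log-probability two Normal constraints with the same mean lose for the same deviation from the preferred duration. The means, standard deviations, and function names below are illustrative assumptions and are not the parameters used in the model.

```python
import math

def normal_log_density(x, mean, sd):
    """Log of the Normal probability density at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def log_penalty(x, mean, sd):
    """Drop in log-density relative to the preferred duration (the mean)."""
    return normal_log_density(mean, mean, sd) - normal_log_density(x, mean, sd)

# Illustrative values only (ms); not the parameters used in the paper.
PREFERRED = 50.0
STIFF_SD = 10.0    # narrow distribution: low elasticity
ELASTIC_SD = 30.0  # broad distribution: high elasticity

stretch = 80.0  # a candidate duration 30 ms above the preferred value
stiff_cost = log_penalty(stretch, PREFERRED, STIFF_SD)
elastic_cost = log_penalty(stretch, PREFERRED, ELASTIC_SD)
print(stiff_cost, elastic_cost)
```

The same 30 ms stretch costs the narrow constraint 4.5 log units and the broad constraint 0.5, because the penalty scales with the squared deviation divided by the variance. When a joint probability is maximized, the optimizer therefore prefers to move the more elastic segment.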

The results reported here are for VC syllables. The three segment-level constraints for the voicing effect model are shown graphically in Figure 4. Voiced and voiceless obstruent constraints are given the same mean value in these simulations, differing only in their variance. In the absence of a lengthening force it is assumed that segments default to their preferred durations.

Figure 4.

Probability densities for: voiced obstruent (solid); voiceless obstruent (thick solid); and vowel (dashed).

A nonzero lengthening force generates target durations for consonant and vowel. To model final lengthening, this force is not distributed equally, but is biased toward the segment closest to the prosodic boundary (the consonant in this case). Furthermore, lengthening under applied force is calculated proportionally. In other words, the same force applied to consonant and vowel will lengthen each by the same relative amount, but not by the same amount in absolute terms. The variable $\alpha$ controls the distribution of the lengthening force. Constraints associated with target durations impose a penalty for deviating from that target, expressed as a function of the normalized difference between target ($S_{t}$) and actual ($S_{a}$) segment duration: $\frac{S_{t} - S_{a}}{S_{t}}$.

For a given force value, the model conducts a brute force search for the durations of the coda consonant (D or T), vowel (V), and alpha value ($\alpha$) that result in the highest joint probability over the entire set of constraints.¹⁷ Although the model simply tries all possible combinations of values, the search space is restricted, with each variable constrained to fall within a fixed maximum and minimum. A fixed step size of 10 ms for both consonant and vowel is used to search this space. To simplify, each variable is assumed to be independent; the joint probability is therefore given as the product of the individual probabilities (see Appendix B for further details of the model).
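The search procedure can be sketched as an exhaustive grid search that maximizes the sum of log-probabilities, which is equivalent to maximizing the product under the independence assumption. Every functional form and parameter value below is an assumption made for illustration: the mapping from force to target duration, the Normal constraint on $\alpha$, the standard deviations, and the search bounds are not taken from Appendix B, and the output is not expected to reproduce the values in Figure 5.

```python
import math

# All parameter values below are illustrative assumptions (durations in ms).
MU_V, SD_V = 150.0, 60.0          # vowel: preferred duration and elasticity
MU_C = 50.0                       # obstruent preferred duration (same for both)
SD_C = {"voiced": 10.0, "voiceless": 40.0}  # smaller sd = stiffer segment
SD_TARGET = 0.05                  # sd of the normalized target mismatch
MU_ALPHA, SD_ALPHA = 0.6, 0.5     # force share, biased toward the coda

def log_normal(x, mean, sd):
    """Log-density of Normal(mean, sd) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def joint_log_prob(v, c, alpha, force, sd_c):
    """Sum of log-probabilities over all constraints (independence assumed)."""
    # Assumed proportional lengthening: targets scale the preferred durations.
    v_target = MU_V * (1.0 + (1.0 - alpha) * force)
    c_target = MU_C * (1.0 + alpha * force)
    return (
        log_normal(v, MU_V, SD_V)                                  # vowel elasticity
        + log_normal(c, MU_C, sd_c)                                # consonant elasticity
        + log_normal((v_target - v) / v_target, 0.0, SD_TARGET)    # vowel target
        + log_normal((c_target - c) / c_target, 0.0, SD_TARGET)    # consonant target
        + log_normal(alpha, MU_ALPHA, SD_ALPHA)                    # force distribution
    )

def best_durations(force, voicing):
    """Exhaustive grid search; returns (vowel_ms, consonant_ms, alpha)."""
    best, best_lp = None, -math.inf
    for v in range(50, 701, 10):            # vowel candidates, 10 ms steps
        for c in range(20, 301, 10):        # consonant candidates, 10 ms steps
            for i in range(21):             # alpha in {0.00, 0.05, ..., 1.00}
                alpha = i / 20
                lp = joint_log_prob(v, c, alpha, force, SD_C[voicing])
                if lp > best_lp:
                    best, best_lp = (v, c, alpha), lp
    return best

for force in (0, 3):
    for voicing in ("voiced", "voiceless"):
        print(force, voicing, best_durations(force, voicing))
```

With these illustrative settings the search returns the preferred durations when the force is zero. At a force of 3, the stiffer (voiced) consonant lengthens less and its vowel lengthens more than in the voiceless case. Working in log space avoids numerical underflow when several small densities are multiplied.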

Figure 5 shows the result of running the model for a set of force values ranging between 0 and 10 (x-axis). On the y-axis, vowel duration, consonant duration, and syllable duration (V + C) are plotted for both voiced (black) and voiceless (gray) syllables. Each point on the graph corresponds to a VC word (V₀ = µ_V; C₀ = µ_C), after the given lengthening has been applied to the segments. For example, for a lengthening force of 3, and a voiced-final syllable, the optimal vowel duration is 420 ms, and the optimal voiced obstruent duration is 90 ms, for a total syllable duration of 510 ms. For a voiceless-final syllable, by contrast, the optimal vowel duration is 350 ms, and the optimal voiceless obstruent duration is 125 ms (for a syllable duration of 475 ms). Both sets of points are shown as filled circles in Figure 4. The vowel, like each obstruent type, prefers the mean duration of its probability distribution (150 ms in this case). It is forced to lengthen due to the pressures of the other constraints. Because the voiced obstruent constraint is more highly weighted than the voiceless obstruent constraint (smaller variance), it does not shift as far from its preferred duration (at 50 ms). Therefore, the vowel is forced to lengthen more when it co-occurs with a voiced obstruent than with a voiceless obstruent.

Figure 5.

Behavior of the competing constraints model of segment duration as a function of lengthening force.

At shorter target syllable durations, voiced and voiceless obstruents (black and gray solid lines, respectively) are more or less identical in duration; preceding vowel durations (black and gray dashed lines) are also identical within the same range. As target syllable duration continues to increase, the consonant durations start to diverge. Because of its much smaller variance, the voiced obstruent not only resists lengthening more strongly than the voiceless, but that resistance also grows faster, leading to smaller and smaller increases in duration. As a result, either the vowel must lengthen more, or the divergence from the target vowel duration must increase, or both. Here the vowel duration difference continues to increase with increasing duration, meaning that the magnitude of the voicing effect increases as well. This is true up to the point at which the duration of the vowel preceding the voiced stop hits an effective maximum value.

Syllables closed by voiced obstruents also appear to be longer than those closed by voiceless obstruents (cf. Luce & Charles-Luce, 1985). There are two model features that produce this result. First, more of the force, on average, is distributed to the coda than the nucleus. Second, the translation from force to length is given as a proportion of preferred segment duration. In other words, instead of distributing syllable length over the vowel and consonant, the same amount of force is assumed to affect lengthening in the two segments relative to their default durations (see Campbell, 1992 for a similar approach). Greater lengthening applied to the coda consonant will be diverted in larger amounts to the vowel in the voiced case, due to the high weight of the voiced consonant constraint, whereas more of that lengthening will stay with the voiceless coda, due to the lower weight of its associated constraint. This interaction is what allows the vowel preceding the voiced coda to lengthen additionally relative to the vowel preceding the voiceless coda.

For comparison we created a baseline model that determines the length of the vowel preceding the voiced obstruent as a fixed percentage (30%) of the length of the vowel preceding the voiceless obstruent. Both segments also lengthen as the lengthening force increases, by a fixed percentage of their current durations at each step (10%). The resulting vowel duration functions look somewhat similar to the vowel duration functions in the competing constraints model, at least at the low end (see Figure 6). This simple model, however, does not allow interactions between vowel duration and consonant duration. Therefore, an apparent connection between the two would have to be due to chance.
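The baseline can be sketched in a few lines. The sketch assumes one reading of the description above: the vowel before a voiced obstruent is 30% longer than the vowel before a voiceless obstruent, and each force step lengthens the vowel by 10% of its current duration. The starting duration, the function name, and the restriction to vowels are illustrative choices.

```python
def baseline_model(v_voiceless_start, steps, voicing_gain=0.30, growth=0.10):
    """Fixed-percentage baseline for vowel duration.

    Assumes (one reading of the text) that the vowel before a voiced
    obstruent is `voicing_gain` longer than the vowel before a voiceless
    one, and that each force step lengthens the vowel by `growth` of its
    current duration. Returns a list of (step, v_voiceless, v_voiced).
    """
    rows = []
    v_voiceless = v_voiceless_start
    for step in range(steps + 1):
        v_voiced = v_voiceless * (1.0 + voicing_gain)
        rows.append((step, v_voiceless, v_voiced))
        v_voiceless *= 1.0 + growth  # compound lengthening per force step
    return rows

rows = baseline_model(150.0, 5)  # 150 ms start is an illustrative choice
for step, vl, vd in rows:
    print(step, round(vl, 1), round(vd, 1), round(vd - vl, 1))
```

In this sketch the absolute vowel difference grows with each step while the ratio between the two vowels stays fixed. No consonant term enters the calculation, so the baseline cannot link the size of the voicing effect to consonant duration.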

Figure 6.

Baseline percentage increase model.

As a proof of concept, the competition model does quite well at capturing the critical behaviors that motivated our re-analysis of the voicing effect in English, and without a directly compensatory mechanism. The model can also capture the interaction between the voicing effect and vowel length, using a lower elasticity parameter for inherently shorter vowels.¹⁸ Reducing the variance of the vowel probability distribution, but keeping all other parameters the same, results in a smaller duration difference between the paired preceding-voiced/preceding-voiceless vowels, and much less of a difference in syllable duration between the two. Both obstruents are also longer in the short vowel model. Nevertheless, the difference in duration between the obstruents is comparable for both types of syllable (see Appendix B). Qualitatively, this behavior is consistent with the finding that the voicing effect is significantly reduced in preceding vowels that are inherently short (Crystal & House, 1982; de Jong, 2004; Umeda, 1975). Because very few studies on the voicing effect report final obstruent durations, it remains to be seen whether this prediction is borne out. Note that the baseline model cannot capture the difference between long and short vowels. Starting the model with a lower vowel duration leads to less lengthening before voiced stops, but also to less lengthening, as a function of force, before voiceless stops. Therefore, the size of the voicing effect is almost identical for the two types of syllable.

The lengthening force in our model is treated as an independent variable derived from various sources, such as speaking rate and final lengthening. For the largest durations/strongest forces, a robust voicing effect is found both in laboratory speech (e.g., House, 1961; Luce & Charles-Luce, 1985; Mack, 1982; Peterson & Lehiste, 1960; Umeda, 1975) and in phrase-final position. A particularly large final lengthening effect in English (e.g., Delattre, 1966), we conjecture, may be largely responsible for the particularly large voicing effect in this language.

6 Further tests of the expandability hypothesis

In the previous sections we have shown that vowel duration is better predicted by coda duration than by coda voicing. The implication is that the correlation between obstruent duration and voicing is the source of the apparent voicing effect. It has also been demonstrated that a model of competing durational constraints can qualitatively capture the duration trade-offs between consonant and vowel duration. However, the Expandability Hypothesis, in and of itself, does not explain the ability of listeners to reliably use vowel duration to predict post-vocalic obstruent voicing. In this section we will show that not only is the Expandability Hypothesis consistent with the perception literature, it is confirmed by certain results. For the remainder of the paper we will focus on word-final stops because there are often very limited cues to stops in final position, and it is primarily for stops that preceding vowel duration has been characterized as a contrastive cue.

6.1 Perception of voicing in final position

A review of the perception literature in Section 2.6 has shown that other cues to the voicing contrast are likely to be stronger than preceding vowel duration, and categorical perception results may only be possible with highly impoverished stimuli. Meanwhile, categorical perception results have been obtained by varying obstruent duration alone (e.g., Denes, 1955). Based on these results, we hypothesize that listeners are using stop duration itself as the cue to voicing when final stops are both voiceless and unaspirated. Vowel duration factors into the classification decision insofar as it provides information about stop duration indirectly, as a measure of speaking rate.¹⁹ In essence, the listener’s task is to decide whether what they are hearing is a voiced stop spoken slowly or a voiceless stop spoken quickly. Shorter vowel durations, which comprise the majority of the corpus data, correspond to speaking rates at which voiced and voiceless stop durations are not significantly different from one another. In this range, vowel duration is ineffective as a cue to voicing. Only as speaking rate slows to the point where the voiced and voiceless expansion trajectories begin to diverge, does vowel duration become predictive.

The competition model of Section 4 is used to illustrate this hypothesis (see Figure 7). The duration of the voiceless stop (gray solid line) gradually diverges from the duration of its voiced counterpart (black solid line), as the lengthening force increases. This divergence is mirrored in the preceding vowel duration (gray dashed line—preceding voiceless stop; black dashed line—preceding voiced stop). If the listener is exposed to a relatively short vowel (Figure 7(a): horizontal gray line), their expectation for the duration of the upcoming stop will be roughly the same whether it is voiced or voiceless (vertical difference between the lower open circles). An observed stop duration (lower dotted line) that falls close enough to both expected values is assumed to be acceptable for either member of the pair, and will not be sufficient to distinguish between the two in the absence of other cues.

Figure 7.

Competition simulation: Observed vowel duration is marked by the horizontal gray line in both figures. Vertical solid lines intersect expected lengthening force, and expected stop duration. Left line: voiced stop coda; Right line: voiceless stop coda. The lower dotted line indicates the actual duration of the following stop. (a) Short vowel token. (b) Long vowel token.

For a longer vowel, by contrast, there is a larger difference in the expected durations of the voiced and voiceless stops (see Figure 7(b)). The same observed stop duration (lower dotted line) now falls significantly below both expected values. In a two-alternative forced choice task we predict that this stimulus should sound more like a voiced than a voiceless stop. In general, an ambiguous final stop of fixed duration should sound more and more like a voiced stop as vowel duration increases. We assume that the category cross-over point from voiceless to voiced falls where the stimulus is significantly shorter than expected for a voiceless stop at that rate. After that, the likelihood of a voiced stop continues to increase (cf. Massaro & Cohen, 1983).
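The listener-side reasoning can be sketched in code under strong simplifying assumptions. Here each segment's duration is taken to be its preferred duration plus its elasticity times a shared lengthening pressure, the two stop classes differ only in elasticity, and the listener compares the observed stop duration with each expected value using equal-variance Gaussian likelihoods and equal priors. None of these choices, nor any of the constants or function names, comes from the model of Section 4; they are hypothetical.

```python
import math

# Hypothetical parameters (ms for preferred durations, arbitrary units for elasticity).
MU_VOWEL, VAR_VOWEL = 120.0, 400.0
MU_STOP = 60.0
VAR_STOP = {"voiced": 50.0, "voiceless": 200.0}


def expected_stop(vowel_ms, voicing):
    """Stop duration the listener expects after hearing a vowel of vowel_ms."""
    # Pressure implied by the vowel; clamped at zero because the sketch
    # only covers lengthening, not compression.
    pressure = max(0.0, (vowel_ms - MU_VOWEL) / VAR_VOWEL)
    return MU_STOP + VAR_STOP[voicing] * pressure


def p_voiced(vowel_ms, stop_ms, sd=15.0):
    """Probability of a 'voiced' response for an observed stop duration."""
    def likelihood(voicing):
        z = (stop_ms - expected_stop(vowel_ms, voicing)) / sd
        return math.exp(-0.5 * z * z)

    voiced, voiceless = likelihood("voiced"), likelihood("voiceless")
    return voiced / (voiced + voiceless)


# A fixed, ambiguous 60 ms stop heard after progressively longer vowels.
for vowel_ms in (120.0, 130.0, 200.0, 300.0):
    print(vowel_ms, round(p_voiced(vowel_ms, 60.0), 3))
```

With these assumed values the response is at chance for the shortest vowel and moves toward "voiced" as the vowel lengthens, because the fixed stop falls increasingly far below the expected voiceless duration.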

The foregoing can thus explain the increase in voiced responses with increasing vowel duration. However, given that we hypothesize that shorter vowels should not provide any cues to the voicing contrast, we would expect, all else being equal, that listeners would be at chance in identifying tokens in the short half of the continuum. Here it is the nature of the actual experimental stimuli that may bias perception strongly toward the voiceless stop. In the first place, ambiguous tokens are, by definition, phonetically voiceless. Depending on how exactly such stimuli were created, they may retain other cues to the original speech token from which they were generated, such as an F1 offset that is more consistent with a voiceless than with a voiced stop. The synthetic stimuli used in Denes (1955), for example, were based on originally voiceless tokens. By contrast, Repp and Williams (1985), using naturally produced stimuli, found a large perceptual difference between continua generated from an originally voiced stop (lab) and those generated from an originally voiceless stop (lap). Voiced responses were about 40% higher for the former across all but the two longest vowel durations.

We therefore posit that the categorical perception results are due, first, to a default voiceless percept, based on residual cues that are more consistent with the voiceless member of the contrast, and second, to unusually long vowel durations. At the longest vowel durations (vanishingly rare in the speech corpus), we posit that the expected duration of a voiceless stop is so long that its likelihood approaches zero. For such extreme tokens, selection/perception of the voiced alternative may occur prior to actually hearing the final segment. However, it appears that the addition of a period of strong aspiration at the end of the stop is sufficient to switch the percept to voiceless.²⁰ Listeners may also be able to reliably select the voiced member of a minimal pair when final stops are entirely removed. We suspect that this is only possible in an explicit comparison task where listeners must label one token as voiced, and one as voiceless. In such a task it is likely that listeners assume a uniform speech rate, leading them to attribute a somewhat longer vowel duration to the effect of a following voiced stop.

Additional support for this account of voicing perception comes from studies of the voicing contrast in initial position. It has been consistently found that the perceptual VOT boundary is longer than the boundary estimated from production data (e.g., Miller et al., 1986; Miller & Volaitis, 1989; Volaitis & Miller, 1992). However, the two boundaries coincide when naturally produced, unedited stimuli are used in the perception task. Nagao and de Jong (2007) suggest that the mismatch may arise from the fact that the stimuli typically used in perception experiments are artificially impoverished. In other words, the edited tokens are so ambiguous that they can only be confidently classified at very long VOT, or very slow speaking rates. The consistency in the reported perceptual cross-over point across experiments on word-final stops may be explained by the same artificiality. For voiceless closures with no audible release, the duration of the coda stop is indeterminate. Listeners may therefore assume a duration that is plausible given their language experience and consistent with experimental variables such as the inter-stimulus interval. It is therefore likely to be relatively stable across experiments involving native speakers of English.

6.2 Predictions

Our explanation of the perception results generates a number of testable hypotheses. For one, we predict that a change in the perception of voicing should lead to a change in the perception of speaking rate (greater force = slower speaking rate). During the course of vowel production, it is assumed that a hypothesis about both speaking rate and following segment duration is generated by the listener. In the absence of any information about the duration of the following stop (silent and unreleased), we posit that listeners will infer a duration that is consistent with those hypotheses. For a particularly long vowel, an expectation for a following phonologically voiced stop should lead listeners to infer the expected duration for a voiced obstruent, and the lengthening force associated with that duration (as depicted in Figure 7(a) and 7(b): the intercepts of the leftmost vertical line with the voiced obstruent duration curve and the x-axis, respectively). In the case where that lengthening force comes from differences in speaking rate, they are predicting the associated rate for each token. However, if listeners subsequently experience unambiguous release or aspiration cues, then we hypothesize that there should be a noticeable correction to both the perceived stop class and the perceived speaking rate. The voiceless stop should indicate that the speaking rate is actually slower than previously supposed (represented by the x-intercept of the rightmost vertical line in Figure 7(b)).²¹ Sanker (2019) has shown that the judgment of whether a vowel is “long” or “short” depends not only on the duration of the vowel, but on whether it is followed by a voiced or a voiceless obstruent. For vowels preceding voiced obstruents, longer durations are required to elicit a “long” response. Although she did not report obstruent duration, we interpret her results as deriving from the expectation for a specific vowel duration given the unambiguous obstruent duration and its voicing. 
Vowels shorter than this expected value would be perceived as “short,” and vowels longer than this value would be perceived as “long.”
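The predicted correction to perceived speaking rate can be made concrete with a small sketch. It assumes a toy formulation that is not taken from Section 4: each segment stretches by its elasticity times a shared pressure, and that pressure is proportional (with weight `W`) to the gap between a target syllable duration and the realized one. The target serves as a stand-in for the lengthening force, so a larger inferred target corresponds to a slower inferred rate. All names and values are hypothetical.

```python
# Hypothetical parameters for the toy formulation.
MU = {"vowel": 120.0, "stop": 60.0}                        # preferred durations (ms)
VAR = {"vowel": 400.0, "voiced": 50.0, "voiceless": 200.0}  # elasticities
W = 0.002                                                   # weight on matching the target


def inferred_target(vowel_ms, voicing):
    """Target syllable duration implied by a vowel, given a hypothesis about the coda."""
    # Pressure needed to stretch the vowel to its observed duration.
    pressure = (vowel_ms - MU["vowel"]) / VAR["vowel"]
    # Invert pressure = W * (target - sum(mu)) / (1 + W * sum(var)) for the target.
    total_var = VAR["vowel"] + VAR[voicing]
    return MU["vowel"] + MU["stop"] + pressure * (1.0 + W * total_var) / W


before = inferred_target(300.0, "voiced")     # listener initially expects a voiced stop
after = inferred_target(300.0, "voiceless")   # release cues reveal a voiceless stop
print("inferred target before and after correction (ms):", before, after)
```

In this sketch the same long vowel implies a larger target, and hence a slower rate, once the coda is reclassified as voiceless, which is the direction of correction hypothesized above.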

The Expandability Hypothesis also predicts that it should be possible to find apparent compensation with segments other than immediately preceding or following vowels, as long as they are more expandable than voiced obstruents. This is corroborated to a certain extent. A difference in nasal duration preceding voiced versus voiceless stops has been found both for monosyllabic words of the form “dens/dense” (Beddor, 2009; Port & Cummins, 1992; Raphael et al., 1975), and polysyllabic words of the form “cantor/candor” (Vatikiotis-Bateson, 1984). Furthermore, Raphael et al. (1975) find that both vowel and nasal duration affect perception of voicing on final stops. In an eye-tracking study by Beddor et al. (2013), participants heard CVND words (such as “bend”), CVNT words (such as “bent”), and $C \tilde{V} C$ words ([bɛ̃d] vs. [bɛ̃t]), in which the nasal was missing but the vowel was nasalized. They found that, for $C \tilde{V} C$ tokens, participants were overall more likely to fixate on the image corresponding to the CVNT word than the CVND word. They interpret this result as deriving from listener expectation that the nasal gesture will be coordinated differently in the two contexts: initiating earlier before a voiceless stop, and later before a voiced stop. However, no explanation is offered as to why the phasing relationship should be different in the two contexts. This difference can be accounted for under the Expandability Hypothesis if the competition at the word (or syllable) level affects both the duration of individual gestures and their phasing, as occurs under changes in speaking rate (e.g., Hardcastle, 1985; Stetson, 1928), and other types of prosodic lengthening (e.g., Byrd et al., 2000; Byrd & Saltzman, 1998). A shorter voiced stop would thus correlate both with longer tautosyllabic segments and with a preceding VN sequence that is less coarticulated. Less coarticulation, in turn, would result in less vowel nasalization. Thus, a highly nasalized vowel is more likely to occur preceding a [t] than a [d].

An additional corollary of our account of the voicing effect is that actual voicing, or any feature other than length, is not required for a “voicing” effect to arise. In fact, active phonetic voicing cannot be a requirement when the strongest effect is seen in English pre-pausally, where final voiced obstruents are likely to undergo devoicing. Given our hypothesis, however, it should be possible to find a “voicing” effect involving segments that have low elasticity for a reason not related to historic voicing. Some evidence for this comes from Beguš (2017), who finds that stop duration correlates negatively with preceding vowel duration not just for voiced and voiceless stops in Georgian, but for ejectives as well, with ejectives intermediate between voiced and voiceless stops in terms of both consonant duration and preceding vowel duration.

In principle, any apparent temporal compensation phenomenon could be modeled using the competing constraints framework (see Section 4.1). All else being equal, we might also predict that an appreciable difference in consonant duration should lead to a complementary difference in preceding vowel duration in monosyllabic words. However, it may prove difficult to isolate elasticity-based effects from other factors that affect syllable duration. For example, vowels in monosyllables closed by nasals have been found to be as long, or longer, than vowels in monosyllables closed by voiced obstruents in English (e.g., Crystal & House, 1988; House & Fairbanks, 1953; Peterson & Lehiste, 1960; Umeda, 1975), which is the opposite of what one would expect for a sonorous segment like a nasal. However, both Crystal and House (1988) and Klatt (1975) report nasal durations that are comparable to those for voiced stops. Thus, it may be the case that nasals (and possibly other sonorants) are not as elastic as might have been expected. Another possibility is that the phasing relationship between vowel and coda may be different in the case where the two gestures can overlap significantly without masking. Thus vowels may be measured as longer, and nasals, as shorter, if there is significantly more coarticulation than occurs with other consonants. If this is correct, then the vowel should be acoustically highly nasalized when the nasal is short, reflecting the true length of the nasal. Note that this would be consistent with the results for $C \tilde{V} C$ words in Beddor et al. (2013).

There is also evidence that may argue against the Expandability Hypothesis. It has been found that vowels preceding voiced fricatives are longer than vowels preceding voiced stops, while vowels preceding voiceless fricatives are somewhat longer than those preceding voiceless stops (Peterson & Lehiste, 1960; Umeda, 1975).²² Furthermore, the voicing effect has been reported to be larger for fricatives than for stops (e.g., House, 1961; House & Fairbanks, 1953). Although our production experiment was not designed to explicitly test fricatives against stops, our results are in line with these findings. In our data, vowel durations were longest before voiced fricatives, and a larger voicing effect was found for fricatives than for stops ($\Delta V$ of 91 ms vs. 56 ms); see Figure 8. However, the Expandability Hypothesis predicts that preceding vowel durations should be similar for voiced stops and fricatives, given that voiced fricatives were only 15 ms longer than voiced stops on average. It also predicts that vowels should be shorter before voiceless fricatives than voiceless stops, given that voiceless fricatives were about 29 ms longer than voiceless stops. A possible explanation could be that different consonants have different biasing functions (the model variable $\alpha$), thus exhibiting different distributions of lengthening. This could also interact with a higher elasticity among fricatives. Allowing $\alpha$ to vary by phoneme is undesirable from the perspective of theoretical economy. However, this hypothesis is testable, as are other aspects of the Expandability Hypothesis.

Figure 8.

Production differences by voicing for each minimal pair.

The Expandability Hypothesis as developed here was designed for consistency with an already very large experimental literature; thus, many of its predictions are actually postdictions. Nevertheless, we have offered a number of speculations that can, in principle, be tested. Among these are the hypotheses that longer vowels in CVN words are highly nasalized, and that less nasalization in VNC sequences is correlated with longer VN durations. The competing constraints model also offers the hypothesis that significant differences in obstruent duration can occur without apparent compensation on vowels that are inherently short (see Appendix B). Additional predictions about differences in effect size across final, medial, and initial position cannot be entirely determined by comparing across heterogeneous studies, but require carefully controlled experimentation to assess. More detailed information about gestural coordination between vowels and specific following consonants is also needed to fine-tune model predictions.

7 Summary and conclusion

In much modern work, the voicing effect tends to be described in simplified terms, as a regular, quasi-universal, phonetically driven phenomenon. In English, preceding vowel duration is often said to play a contrastive role for word-final stops (e.g., Klatt, 1976). Yet vowel duration differences can be quite small in continuous speech, in polysyllabic words, across a syllable boundary, and phrase-medially (e.g., Umeda, 1975). In addition, lax, unstressed, or otherwise inherently short vowels show little to no voicing effect even in laboratory speech (e.g., Peterson & Lehiste, 1960).

In production studies that manipulate speaking rate it has been shown that voiceless obstruents, in both word-initial pre-stressed (VOT, e.g., Miller & Volaitis, 1989), and word-medial post-stress (closure duration, e.g., Port, 1976) position, are longer than voiced, with that difference increasing as speaking rate decreases. We extended that finding to coda position, demonstrating that the difference in vowel duration increased in step with the inverse duration difference for obstruents.²³ Using paired data, we were able to show that the magnitude of the “voicing” effect depended on obstruent duration across the board, while voicing was only significant at the slower rates (i.e., when it was significantly correlated with duration). And obstruent duration itself has been shown to affect voicing perception in final position (Denes, 1955; Raphael, 1981; Repp & Williams, 1985), just as it does in word-medial position (Port & Dalby, 1982).

This body of results argues against preceding vowel duration as a primary cue to the voiced/voiceless contrast in English. Indeed, it strongly suggests that vowel duration affects the perception of obstruent duration, not voicing itself. We have offered a proposal that fits a large range of experimental findings: namely, that the voicing effect in English is the result of the inherently low elasticity of voiced obstruents, and that segment durations, in general, are determined by the components of the Expandability Hypothesis, reproduced below.

(2) The Expandability Hypothesis

a. All segments have a characteristic elasticity that determines their resistance to lengthening.

b. Resistance to lengthening increases with increasing duration for all segments.

c. Lower elasticity equates with a more rapid increase in resistance.

d. Relative resistance determines the distribution of duration across the syllable.

The inverse correlation between obstruent duration and vowel duration, and its dependence on speaking rate, are attributed to a type of compensatory effect (see also Campbell, 1992; Massaro & Cohen, 1983), but not one based on syllable isochrony. Our competing constraints model of segment timing allows for “imperfect compensation,” which appears to be the rule in language generally, rather than the exception (e.g., Browman & Goldstein, 1988; Krivokapić, 2020).
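One way to see how imperfect compensation can arise is the following worked sketch. It is an illustrative assumption rather than the model of Section 4: the symbols $\mu_i$ (preferred duration), $\sigma_i^2$ (elasticity), $T$ (target syllable duration), and $w$ (weight on matching the target) are introduced here only for the example. Suppose the vowel and coda durations $d_V$ and $d_C$ minimize

$$
J(d_V, d_C) = \frac{(d_V - \mu_V)^2}{2\sigma_V^2} + \frac{(d_C - \mu_C)^2}{2\sigma_C^2} + \frac{w}{2}\,(T - d_V - d_C)^2 .
$$

Setting $\partial J / \partial d_V = \partial J / \partial d_C = 0$ gives

$$
d_i = \mu_i + \sigma_i^2\,\lambda, \qquad \lambda = w\,(T - d_V - d_C) = \frac{w\,(T - \mu_V - \mu_C)}{1 + w\,(\sigma_V^2 + \sigma_C^2)} .
$$

Whenever $T$ exceeds $\mu_V + \mu_C$, $\lambda$ is positive and the syllable falls short of its target by $\lambda / w$, so the trade-off between segments is never complete. A coda with a smaller $\sigma_C^2$ raises $\lambda$, which lengthens the vowel even though none of the vowel's own parameters has changed, and the size of that difference grows with $T$.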

This model provides a proof of concept for deriving the voicing effect from a set of general-purpose timing constraints. The fact that the voicing effect is larger in fricatives than in stops cannot be explained under our account without allowing for differences in one or more parameter values. However, we still cover much more empirical ground than explanations of the voicing effect that are based on actual vocal fold vibration, or articulatory effort. It is also worth noting that competing explanations (described in Section 2) have not attempted to explain this difference between stops and fricatives, and most do not even mention it. We, by contrast, are able to unify the treatment of the contrast across word and syllable position, and to draw connections between effects based on differences of consonant elasticity and those based on differences of vowel elasticity. Our explanation for the voicing effect also has ramifications for theories of contrastive features.

7.1 Contrast and allophony

Throughout this paper the relevant obstruent contrast in American English has been referred to as one of voicing. This is in spite of the fact that it is precisely because phonetic voicing is often absent from “voiced” stops that preceding vowel duration can be discussed as a possible cue to contrast. Clearly, the presence or absence of vocal fold vibration is not always necessary, or even sufficient, for phoneme identification. In order for the contrast to be described as one of voicing, it is necessary to treat the phonological voicing feature as distinct from the phonetic feature of the same name. The first is transformed to the second via a series of allophonic rules. For example, in absolute initial position the /-voice/ stop becomes [+spread glottis], while the /+voice/ stop may become [-voice]. In final position, a /-long/ vowel preceding a /+voice/ stop becomes [+long].

However, we have seen that apparent vowel lengthening varies considerably as a function of speaking rate, sentence and word position, stress, and other factors (e.g., Crystal & House, 1988; Umeda, 1975). Most importantly, longer vowels correlate with shorter consonants, and voiced obstruents tend, cross-linguistically, to be shorter than their voiceless counterparts. The apparent physiological difficulty of maintaining the necessary conditions for voicing over extended closure periods has been proposed as an explanation for this tendency (e.g., Ohala, 1983, 2011). Nevertheless, it is possible, by virtue of greater articulatory effort, to maintain voicing if desirable, at least up to a point. Partial, or total, devoicing is also a possible outcome. Therefore, we expect the duration differences between voiced and voiceless obstruents to be language specific. The fact that “voiced” stops in English are now frequently devoiced means that the observed duration differences are no longer the direct result of physiological constraints, but of what has become an underlyingly specified property of the segment. The fact that the difference in behavior between voiced and voiceless obstruents is only observable at long durations means that the specification is not for absolute duration, but for something that quantifies resistance to lengthening. The large voicing effect in English, we argue, is due to the voiced segment being pushed well beyond its preferred duration. Our claim is that vowel duration differences emerge directly from these elasticity differences. Therefore, we also conclude that vowel duration is not a feature that is specified in English, either at the phonological or phonetic level.

Although categorical perception effects have been demonstrated for the vowel duration cue, this is not particularly noteworthy, given that the number of acoustic cues to the contrast that listeners are able to exploit has been shown to be quite large. Duration and intensity of voicing, aspiration, and F0 contour, length of vowel formant transitions with respect to steady state duration (Fitch, 1981), F1 offset frequency (Crowther & Mann, 1992), speed of jaw lowering, and jaw offset position (Van Summers, 1987) all differ consistently between the two stop types in final position. In medial post-stress position, consistent differences have also been found in the timing of vocalic voice offset, and the signal decay time (Lisker, 1986), which should apply to final position as well. Furthermore, it is well known that cues can be “traded off” with one another. That is, while a long enough closure duration can cue a “voiceless” stop on its own, a shorter closure in tandem with a shortened vowel can also do so (e.g., Bailey & Summerfield, 1980; Fitch, 1981; Klatt, 1976; Kohler, 1979, 1984; Lisker, 1986; Malécot, 1968; Van Summers, 1987). Yet absolute vowel duration, and not closure duration or formant transition information, is frequently characterized as a phonological “voicing” feature, even though the latter two cues have been shown to influence perception to the same, or an even greater, degree. This may be due, in large part, to the privileging of “prominent” contexts in phonological theory.

7.2 Prominence

While the phonetic realization of underlyingly contrastive features is assumed to vary by context, the most prominent environment, usually initial pre-stress position, is assumed to most faithfully reflect those features. Not only that, but features are said to be enhanced, or more strongly signaled, in such contexts (e.g., Kingston & Diehl, 1994). Conversely, observed enhancement is taken to indicate features that are “controlled,” or underlyingly specified, as opposed to being supplied by context-sensitive rules (e.g., Ohala, 1981). Enhancement can be realized as an increase in acoustic amplitude, an increase in size of articulatory gestures, and/or an increase in gestural, and thus, segmental, duration (e.g., Beckman et al., 2013). In addition to making individual features more salient, enhancement is also assumed to be a mechanism for increasing discriminability between the members of a phonemic contrast (e.g., Cho, 2016; Cho & Jun, 2000; de Jong, 1995). For the above reasons, slower than normal speaking rate is considered to be an enhancement mechanism that should lead to lengthening, but only of contrastively specified features (e.g., Solé, 2007).

Underspecification theory applied to laryngeal contrasts typically makes use of the following privative features: [spread glottis], [voice], and [constricted glottis] (e.g., Iverson & Salmons, 1995; C.-W. Kim, 1970). This system yields three possible two-way contrast systems, one for each of the features, with the second member always unspecified. The phonetically voiceless stops in French and Thai fail to lengthen significantly with decreased speaking rate, and are therefore taken to be unspecified for laryngeal features, while the phonetically short lag/voiced stops in English are the unspecified member of the contrast²⁴ (Beckman et al., 2013; Kessinger & Blumstein, 1997).

In the same vein, an observed interaction between a given phonetic cue, and any variable that affects duration, is taken to indicate that the cue is an inherent part of the contrast. It has been argued that vowel duration is purposefully manipulated by speakers to enhance the laryngeal contrast of the following obstruent, based on the following set of results: that the effect of stress is smaller for voiceless-preceding vowels than for voiced-preceding vowels in English (de Jong, 2004); that /-long/ voiced-preceding vowels lengthen less than they would otherwise, in order to avoid overlapping with /+long/ voiceless-preceding vowels, and preserve an existing long versus short vowel distinction in German (Braunschweiler, 1997); that vowel duration differences preceding voiced versus voiceless segments are greater for long vowels than for short vowels in English (Peterson & Lehiste, 1960); that the difference in duration between the stressed vowel in a monosyllabic word and the same vowel in a bisyllabic word is larger (by percentage) for syllables closed by voiced stops than those closed by voiceless stops (Crowther & Mann, 1992; de Jong, 1991; Klatt, 1973; Raphael, 1975; Smith, 2002; Van Summers, 1987); that the vowel shortening effect of affixation is greater (both absolutely, and proportionally) for a voiced-final stem than for a voiceless final (Lehiste, 1972).

In this paper, however, we have conceptualized stress, prosodic boundary marking, and speaking rate simply as external forces which, among others, can act to lengthen segments. Under our account, all segments are subject to such lengthening and shortening pressures. How much lengthening or shortening actually occurs, however, is governed by the interactions of all such constraints, some of which are more highly weighted than others. The apparently asymmetric effects on voiced versus voiceless syllables do not need to be explained as the result of speaker effort to avoid phonetic ambiguity, or to maintain a specific range of phonetic values. They follow directly from these two premises: that the voicing effect derives from differences in segment elasticity; and that the resulting differences in duration increase with increasing duration. Characterizing the voicing effect as a consequence of on-line timing adjustments (to which multiple factors can contribute) is therefore more parsimonious, and more explanatorily adequate, than the hypothesis that there is both a grammatical rule of vowel lengthening, and a set of deliberate adjustments made to preserve the output of that rule. Note that this analysis requires elasticity to be underlyingly specified. This is not the same, however, as an underlying specification for an abstract voice feature. In the first place, all segments are assumed to have their own characteristic elasticity. Furthermore, a specification of this kind is necessary for independent reasons: to account for the differing degrees to which segments respond to changes in speaking rate. Finally, relative duration values within a word cannot be derived from a /voice/ feature, or even a long/short duration feature, as they depend on potentially complex interactions between all the segments within a word.

If “prominent” contexts (such as slow speaking rate) do not actually enhance contrastive features, then the realization of the features in such contexts should not necessarily be taken as underlying. Doing so, in fact, requires potentially extensive transformations to arrive at the more frequent, non-prominent contexts of normal speech. If we reverse this relation, then very slow hyper-articulated speech is the exception, rather than the rule, and intense aspiration and especially long durations are derived from features that are more typical of the contrast in general. Large differences in preceding vowel duration are, almost exclusively, the product of atypical speech and therefore, in our view, should be considered the least central to the “voicing” contrast, not the most. This flipped view of contrast offers an intriguing avenue for future research.

Footnotes

Appendix A

Appendix B

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by an OSU Targeted Ideas in Excellence grant. Thank you to Jessica Jelinger, Hannah Young, Nohyong Kim, Evan Nelson, Joseph Conley, Micaella Bruton, Rachel Meyer, Andrew Duffy, Maddie Bloomquist, and Margot Hare for their assistance in data collection and preparation.

ORCID iD

Rebecca L. Morley

References

Bailey, P. J., & Summerfield (1980). Information in speech: Observations on the perception of [s]-stop clusters. Journal of Experimental Psychology: Human Perception and Performance, 6, 536–563.

Beckman, Jessen, & Ringen (2013). Empirical evidence for laryngeal features: Aspirating vs. true voice languages. Journal of Linguistics, 49, 259–284.

Beddor, P. S. (2009). A coarticulatory path to sound change. Language, 85, 785–821.

Beddor, P. S., McGowan, K. B., Boland, J. E., Coetzee, A. W., & Brasher (2013). The time course of perception of coarticulation. The Journal of the Acoustical Society of America, 133, 2350–2366.

Beguš (2017). Effects of ejective stops on preceding vowel duration. The Journal of the Acoustical Society of America, 142, 2168–2184.

Belasco (1958). Variations in vowel duration: Phonemically or phonetically conditioned? The Journal of the Acoustical Society of America, 30, 1049–1050.

Bell, Brenier, J. M., Gregory, Girand, & Jurafsky (2009). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60, 92–111.

Berkovits (1993). Utterance-final lengthening and the duration of final-stop closures. Journal of Phonetics, 21(4), 479–489.

Boersma, & Weenink (2009). Praat: Doing phonetics by computer (Version 6.0.36) [Computer program]. http://www.fon.hum.uva.nl/praat/

Braunschweiler (1997). Integrated cues of voicing and vowel length in German: A production study. Language and Speech, 40, 353–376.

Browman, C. P., & Goldstein, L. M. (1986). Towards an articulatory phonology. Phonology, 3, 219–252.

Browman, C. P., & Goldstein, L. M. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140–155.

Browman, C. P., & Goldstein, L. M. (1990). Tiers in articulatory phonology, with some implications for casual speech. In Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech (pp. 341–376). Cambridge University Press.

Byrd, Kaun, A. R., Narayanan, & Saltzman (2000). Phrasal signatures in articulation. In M. B. Broe & J. B. Pierrehumbert (Eds.), Papers in laboratory phonology V (pp. 70–87). Cambridge University Press.

15.

Byrd

Saltzman

(1998). Intragestural dynamics of multiple prosodic boundaries. Journal of Phonetics, 26, 173–199.

16.

Byrd

Saltzman

(2003). The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31, 149–180.

17.

Cambier-Langeveld

G. M.

(1997). The domain of final lengthening in the production of Dutch. Linguistics in the Netherlands, 14, 13–24.

18.

Cambier-Langeveld

G. M.

(2000). Temporal marking of accents and boundaries. Holland Academic Graphics.

19.

Campbell

W. N.

(1992). Syllable-based segmental duration. In Bailly

Benoit

(Eds.), Talking machines: Theories, models, and designs (pp. 211–224). Elsevier.

20.

Catford

J. C.

(1977). Fundamental problems in phonetics. Indiana University Press.

21.

Chen

(1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica, 22, 129–159.

22.

Cho

(2016). Prosodic boundary strengthening in the phonetics–prosody interface. Language and Linguistics Compass, 10, 120–141.

23.

Cho

Jun

S.-A.

(2000). Domain-initial strengthening as enhancement of laryngeal features: Aerodynamic evidence from Korean. In UCLA working papers in phonetics (pp. 57–70). https://linguistics.ucla.edu/people/jun/cls_36.pdf

24.

Coretta

(2019). An exploratory study of voicing-related differences in vowel duration as compensatory temporal adjustment in Italian and Polish. Glossa, 4, 125.

25.

Crowther

C. S.

Mann

(1992). Native language factors affecting use of vocalic cues to final consonant voicing in English. The Journal of the Acoustical Society of America, 92, 711–722.

26.

Crystal

T. H.

House

A. S.

(1982). Segmental durations in connected speech signals: Preliminary results. The Journal of the Acoustical Society of America, 72, 705–716.

27.

Crystal

T. H.

House

A. S.

(1988). Segmental durations in connected-speech signals: Current results. The Journal of the Acoustical Society of America, 83, 1553–1573.

28.

Davis

Van Summers

W. V.

(1989). Vowel length and closure duration in word-medial VC sequences. Journal of Phonetics, 17, 339–353.

29.

de Jong

K. J

. (1991). An articulatory study of consonant-induced vowel duration changes in English. Phonetica, 48, 1–17.

30.

de Jong

K. J

. (1995). The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. The Journal of the Acoustical Society of America, 97, 491–504.

31.

de Jong

K. J

. (2004). Stress, lexical focus, and segmental focus in English: Patterns of variation in vowel duration. Journal of Phonetics, 32, 493–516.

32.

Delattre

(1962). Some factors of vowel duration and their cross-linguistic validity. The Journal of the Acoustical Society of America, 34, 1141–1143.

33.

Delattre

(1966). A comparison of syllable length conditioning among languages. IRAL—International Review of Applied Linguistics in Language Teaching, 4, 183–198.

34.

Denes

(1955). Effect of duration on the perception of voicing. The Journal of the Acoustical Society of America, 27, 761–764.

35.

Derr

M. A.

Massaro

D. W.

(1980). The contribution of vowel duration, F0 contour, and frication duration as cues to the /juz/-/jus/ distinction. Perception & Psychophysics, 27, 51–59.

36.

Durvasula

Luo

(2012). Voicing, aspiration, and vowel duration in Hindi. The Journal of the Acoustical Society of America, 18, 060009.

37.

Elert

C. C.

(1965). Phonologic studies of quantity in Swedish: Based on material from Stockholm speakers. Almqvist & Wiksell.

38.

Fidelholtz

J. L.

(1975). Word frequency and vowel reduction in English. Chicago Linguistic Society, 11, 200–213.

39.

Fischer

R. M.

Ohde

R. N.

(1990). Spectral and duration properties of front vowels as cues to final stop-consonant voicing. The Journal of the Acoustical Society of America, 88, 1250–1259.

40.

Fitch

H. L.

(1981). Distinguishing temporal information for speaking rate from temporal information for intervocalic stop consonant voicing [Technical report]. Haskins Laboratories.

41.

Flege

(1979). Phonetic interference in second language acquisition [Doctoral dissertation]. Indiana University.

42.

Fosler-Lussier

Morgan

(1999). Effects of speaking rate and word frequency on pronunciations in conversational speech. Speech Communication, 29, 137–158.

43.

Fougeron

Keating

P. A.

(1997). Articulatory strengthening at edges of prosodic domains. The Journal of the Acoustical Society of America, 101, 3728–3740.

44.

Fox

R. A.

Terbeek

(1977). Dental flaps, vowel duration and rule ordering in American English. Journal of Phonetics, 5, 27–34.

45.

Halle

Stevens

(1967). Mechanism of glottal vibration for vowels and consonants. The Journal of the Acoustical Society of America, 41, 1613–1613.

46.

Hardcastle

W. J.

(1985). Some phonetic and syntactic constraints on lingual coarticulation in stop consonant sequences. Speech Communication, 4, 247–263.

47.

Harris

M. S.

Umeda

(1974). Effect of speaking mode on temporal factors in speech: Vowel duration. The Journal of the Acoustical Society of America, 56, 1016–1018.

48.

Hillenbrand

J. M.

Ingrisano

D. R.

Smith

B. L.

Flege

J. E.

(1984). Perception of the voiced–voiceless contrast in syllable-final stops. The Journal of the Acoustical Society of America, 76, 18–26.

49.

Hofhuis

Gussenhoven

Rietveld

(1995). Final lengthening at prosodic boundaries in Dutch. In Proceedings of 13th international congress of phonetic sciences (Vol. 1, pp. 154–157). Stockholm University. https://repository.ubn.ru.nl/bitstream/handle/2066/105391/112997.pdf

50.

Hogan

J. T.

Rozsypal

A. J.

(1980). Evaluation of vowel duration as a cue for the voicing distinction in the following word-final consonant. The Journal of the Acoustical Society of America, 67, 1764–1771.

51.

Hooper

J. B.

(1976). Word frequency in lexical diffusion and the source of morphophonological change. In Christie

(Ed.), Current progress in historical linguistics (pp. 96–105). North-Holland.

52.

House

A. S.

(1961). On vowel duration in English. The Journal of the Acoustical Society of America, 33, 1174–1178.

53.

House

A. S.

Fairbanks

(1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. The Journal of the Acoustical Society of America, 25, 105–113.

54.

Iverson

G. K.

Salmons

J. C.

(1995). Aspiration and laryngeal representation in Germanic. Phonology, 12, 369–396.

55.

Javkin

(1977). Phonetic universals and phonological change [Doctoral dissertation]. University of California, Berkeley.

56.

Jessen

(2001). Phonetic implementation of the distinctive auditory features [voice] and [tense] in stop consonants. In Distinctive feature theory (Vol. 2, pp. 237–294). Mouton de Gruyter. https://www.degruyter.com/document/doi/10.1515/9783110886672.237/pdf

57.

Johnson

(2004). Massive reduction in conversational American English. In Proceedings of the workshop on spontaneous speech: Data and analysis (pp. 29–54). https://buckeyecorpus.osu.edu/pubs/Massive.pdf

58.

Jurafsky

Bell

Fosler-Lussier

Girand

Raymond

(1998). Reduction of English function words in switchboard. In 5th international conference on spoken language processing. https://www1.icsi.berkeley.edu/pubs/speech/reductionofenglish98.pdf

59.

Jurafsky

Bell

Gregory

Raymond

W. D.

(2001). Probabilistic relations between words: Evidence from reduction in lexical production. In Bybee

J. L.

Hopper

P. J.

(Eds.), Typological studies in language: Frequency and the emergence of linguistic structure (pp. 229–254). John Benjamins.

60.

Katsika

(2016). The role of prominence in determining the scope of boundary-related lengthening in Greek. Journal of Phonetics, 55, 149–181.

61.

Kavitskaya

(2002). Compensatory lengthening: Phonetics, phonology, diachrony. Routledge.

62.

Keating

P. A.

(1979). A phonetic study of a voicing contrast in Polish [Doctoral dissertation]. Brown University.

63.

Keating

P. A.

(1985). Universal phonetics and the organization of grammars. In Fromkin

V. A.

(Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 115–132). Academic Press.

64.

Kessinger

R. H.

Blumstein

S. E.

(1997). Effects of speaking rate on voice-onset time in Thai, French, and English. Journal of Phonetics, 25, 143–168.

65.

Kim

C.-W.

(1970). A theory of aspiration. Phonetica, 21, 107–116.

66.

Kim

Cole

(2005). The stress foot as a unit of planned timing: Evidence from shortening in the prosodic phrase. In 9th European conference on speech communication and technology (pp. 2365–2368). https://www.isca-speech.org/archive/pdfs/interspeech_2005/kim05_interspeech.pdf

67.

Kingston

Diehl

R. L.

(1994). Phonetic knowledge. Language, 70, 419–454.

68.

Klatt

D. H.

(1973). Interaction between two factors that influence vowel duration. The Journal of the Acoustical Society of America, 54, 1102–1104.

69.

Klatt

D. H.

(1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics, 3, 129–140.

70.

Klatt

D. H.

(1976). Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. The Journal of the Acoustical Society of America, 59, 1208–1221.

71.

Kluender

K. R.

Diehl

R. L.

Wright

B. A.

(1988). Vowel-length differences before voiced and voiceless consonants: An auditory explanation. Journal of Phonetics, 16, 153–169.

72.

E.-S.

(2018). Asymmetric effects of speaking rate on the vowel/consonant ratio conditioned by coda voicing in English. Phonetics and Speech Sciences, 10, 45–50.

73.

Kohler

K. J.

(1979). Dimensions in the perception of fortis and lenis consonants. Phonetica, 36, 332–343.

74.

Kohler

K. J.

(1984). Phonetic explanation in phonology: The feature fortis/lenis. Phonetica, 41, 150–174.

75.

Kozhevnikov

V. A.

Chistovich

L. A.

(1965). Speech: Articulation and perception. Nauka.

76.

Krause

S. E.

(1982). Vowel duration as a perceptual cue to postvocalic consonant voicing in young children and adults. The Journal of the Acoustical Society of America, 71, 990–995.

77.

Kristoffersen

(2000). The phonology of Norwegian. Oxford University Press on Demand.

78.

Krivokapić

(2020). Prosody in articulatory phonology. In Barnes

J. A.

Shattuck-Hufnagel

(Eds.), Prosodic theory and practice (pp. 213–236). MIT Press.

79.

Kulikov

(2012). Voicing and voice assimilation in Russian stops [Doctoral dissertation]. The University of Iowa.

80.

Laeufer

(1992). Patterns of voicing-conditioned vowel duration in French and English. Journal of Phonetics, 20, 411–440.

81.

Lehiste

(1972). The timing of utterances and linguistic boundaries. The Journal of the Acoustical Society of America, 51, 2018–2024.

82.

Lisker

(1957). Closure duration and the intervocalic voiced-voiceless distinction in English. Language, 33, 42–49.

83.

Lisker

(1974). On “explaining” vowel duration variation. Glossa, 8, 233–246.

84.

Lisker

(1978). Rapid vs. rabid: A catalogue of acoustic features that may cue the distinction. Haskins Laboratories Status Report on Speech Research, 54, 127–132.

85.

Lisker

(1986). “Voicing” in English: A catalogue of acoustic features signaling /b/ versus /p/ in trochees. Language and Speech, 29, 3–11.

86.

Luce

P. A.

(1986). Neighborhoods of words in the mental lexicon. Research on speech perception (Technical Report No. 6). Indiana University.

87.

Luce

P. A.

Charles-Luce

(1985). Contextual effects on vowel duration, closure duration, and the consonant/vowel ratio in speech production. The Journal of the Acoustical Society of America, 78, 1949–1957.

88.

Mack

(1982). Voicing-dependent vowel duration in English and French: Monolingual and bilingual production. The Journal of the Acoustical Society of America, 71, 173–178.

89.

Malécot

(1968). The force of articulation of American stops and fricatives as a function of position. Phonetica, 18, 95–102.

90.

Massaro

D. W.

Cohen

M. M.

(1983). Consonant/vowel ratio: An improbable cue in speech. Perception & Psychophysics, 33, 501–505.

91.

Miller

J. L.

(1981). Effects of speaking rate on segmental distinctions. In Eimas

P. D.

Miller

J. L.

(Eds.), Perspectives on the study of speech (pp. 39–74). Routledge.

92.

Miller

J. L.

Baer

(1983). Some effects of speaking rate on the production of /b/ and /w/. The Journal of the Acoustical Society of America, 73, 1751–1755.

93.

Miller

J. L.

Green

K. P.

Reeves

(1986). Speaking rate and segments: A look at the relation between speech production and speech perception for the voicing contrast. Phonetica, 43, 106–115.

94.

Miller

J. L.

Volaitis

L. E.

(1989). Effect of speaking rate on the perceptual structure of a phonetic category. Perception & Psychophysics, 46, 505–512.

95.

Moreton

(2004). Realization of the English postvocalic [voice] contrast in F1 and F2. Journal of Phonetics, 32, 1–33.

96.

Munhall

Fowler

Hawkins

Saltzman

(1992). “Compensatory shortening” in monosyllables of spoken English. Journal of Phonetics, 20, 225–239.

97.

Nagao

de Jong

K. J.

(2007). Perceptual rate normalization in naturally produced rate-varied speech. The Journal of the Acoustical Society of America, 121, 2882–2898.

98.

Nam

Saltzman

(2003). A competitive, coupled oscillator model of syllable structure. In Proceedings of the 15th international congress of phonetic sciences (Vol. 1, pp. 2253–2256). https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/papers/p15_2253.pdf

99.

Nittrouer

(2004). The role of temporal and dynamic signal components in the perception of syllable-final stop voicing by children and adults. The Journal of the Acoustical Society of America, 115, 1777–1790.

100.

O’Dell

Nieminen

(1999). Coupled oscillator model of speech rhythm. In Proceedings of the XIVth international congress of phonetic sciences (Vol. 2, pp. 1075–1078). https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS1999/papers/p14_1075.pdf

101.

Ohala

J. J.

(1981). The Listener as a source of sound change. In Masek

C. S.

Hendrick

R. A.

Miller

M. F.

(Eds.), Papers from the parasession on language and behavior (pp. 178–203). Chicago Linguistic Society.

102.

Ohala

J. J.

(1983). The origin of sound patterns in vocal tract constraints. In MacNeilage

P. F.

(Ed.), The production of speech (pp. 189–216). Springer.

103.

Ohala

J. J.

(2011). Accommodation to the aerodynamic voicing constraint and its phonological relevance. In Proceedings of the 15th international conference of phonetic sciences (pp. 64–67). https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2011/OnlineProceedings/SpecialSession/Session1/Ohala/Ohala.pdf

104.

O’Kane

(1978). Manner of vowel termination as a perceptual cue to the voicing status of postvocalic stop consonants. Journal of Phonetics, 6, 311–318.

105.

Oller

D. K.

(1973). The effect of position in utterance on speech segment duration in English. The Journal of the Acoustical Society of America, 54, 1235–1247.

106.

Peterson

G. E.

Lehiste

(1960). Duration of syllable nuclei in English. The Journal of the Acoustical Society of America, 32, 693–703.

107.

Pike

K. L.

(1945). The intonation of American English. University of Michigan Press.

108.

Pitt

M. A.

Dilley

Johnson

Kiesling

Raymond

Hume

Fosler-Lussier

(1997). Buckeye Corpus of conversational speech (2nd release). Department of Psychology, Ohio State University (Distributor).

109.

Pitt

M. A.

Johnson

Hume

Kiesling

Raymond

(2005). The Buckeye Corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45, 89–95.

110.

Pluymaekers

Ernestus

Baayen

R. H.

(2005). Lexical frequency and acoustic reduction in spoken Dutch. The Journal of the Acoustical Society of America, 118, 2561–2569.

111.

Port

R. F.

(1976). The influence of speaking tempo on the duration of stressed vowel and medial stop in English trochee words [Unpublished doctoral dissertation]. University of Connecticut.

112.

Port

R. F.

(1979). The influence of tempo on stop closure duration as a cue for voicing and place. Journal of Phonetics, 7(1), 45–56.

113.

Port

R. F.

(1981). Linguistic timing factors in combination. The Journal of the Acoustical Society of America, 69, 262–274.

114.

Port

R. F.

Cummins

(1992). The English voicing contrast as velocity perturbation. In Proceedings of the second international conference on spoken language processing (pp. 1311–1314). https://cspeech.ucd.ie/Fred/docs/ICSLP93.pdf

115.

Port

R. F.

Dalby

(1982). Consonant/vowel ratio as a cue for voicing in English. Perception & Psychophysics, 32, 141–152.

116.

Raphael

L. J.

(1972). Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English. The Journal of the Acoustical Society of America, 51, 1296–1303.

117.

Raphael

L. J.

(1975). The physiological control of durational differences between vowels preceding voiced and voiceless consonants in English. Journal of Phonetics, 3, 25–33.

118.

Raphael

L. J.

(1981). Durations and contexts as cues to word-final cognate opposition in English. Phonetica, 38, 126–147.

119.

Raphael

L. J.

Dorman

M. F.

Freeman

Tobin

(1975). Vowel and nasal duration as cues to voicing in word-final stop consonants: Spectrographic and perceptual studies. Journal of Speech, Language, and Hearing Research, 18, 389–400.

120.

Raymond

W. D.

Pitt

Johnson

Hume

Makashay

Dautricourt

Hilts

. (2002). An analysis of transcription consistency in spontaneous speech from the Buckeye Corpus. In Seventh international conference on spoken language processing. https://buckeyecorpus.osu.edu/pubs/icslp02.pdf

121.

Repp

B. H.

Williams

D. R.

(1985). Influence of following context on perception of the voiced–voiceless distinction in syllable-final stop consonants. The Journal of the Acoustical Society of America, 78, 445–457.

122.

Revoile

Pickett

J. M.

Holden

L. D.

Talkin

(1982). Acoustic cues to final stop voicing for impaired-and normal-hearing listeners. The Journal of the Acoustical Society of America, 72, 1145–1154.

123.

Saltzman

Nam

Krivokapic

Goldstein

(2008). A task-dynamic toolkit for modeling the effects of prosodic structure on articulation. In Proceedings of the 4th international conference on speech prosody (speech prosody 2008) (pp. 175–184). https://sail.usc.edu/~lgoldste/me/documents/SaltzEtAl_Prosody08_SHORT.pdf

124.

Sanker

(2019). Influence of coda stop features on perceived vowel duration. Journal of Phonetics, 75, 43–56.

125.

Sanker

(2020). A perceptual pathway for voicing-conditioned vowel duration. Laboratory Phonology, 11, 18.

126.

Schwartz

(2010). Phonology in the speech signal-Unifying cue and prosodic licensing. Poznań Studies in Contemporary Linguistics, 46, 499–518.

127.

Sharf

D. J.

(1962). Duration of post-stress intervocalic stops and preceding vowels. Language and Speech, 5, 26–30.

128.

Smith

B. L.

(2002). Effects of speaking rate on temporal patterns of English. Phonetica, 59, 232–244.

129.

Solé

M.-J.

(2007). Controlled and mechanical properties in speech. In Solé

M. J.

Beddor

Ohala

(Eds.), Experimental approaches to phonology (pp. 302–321). Oxford University Press.

130.

Stetson

R. H.

(1928). Motor phonetics: A study of speech movements in action. Springer.

131.

Stevens

K. N.

House

A. S.

(1963). Perturbation of vowel articulations by consonantal context: An acoustical study. Journal of Speech, Language, and Hearing Research, 6, 111–128.

132.

Summerfield

(1981). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance, 7, 1074–1095.

133.

Sweet

(1880). Clarendon Press series: A handbook of phonetics. Macmillan Publishers.

134.

Tanner

Sonderegger

Stuart-Smith

, & SPADE Data Consortium. (2019). Vowel duration and the voicing effect across English dialects. Toronto Working Papers in Linguistics, 41, 1–13.

135.

Turk

A. E.

Shattuck-Hufnagel

(2007). Multiple targets of phrase-final lengthening in American English words. Journal of Phonetics, 35, 445–472.

136.

Umeda

(1975). Vowel duration in American English. The Journal of the Acoustical Society of America, 58, 434–445.

137.

Van Heuven

W. J. B.

Mandera

Keuleers

Brysbaert

. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, 67, 1176–1190.

138.

Van Summers

. (1987). Effects of stress and final-consonant voicing on vowel production: Articulatory and acoustic analyses. The Journal of the Acoustical Society of America, 82, 847–863.

139.

Vatikiotis-Bateson

(1984). The temporal effects of homorganic medial nasal clusters. Research in Phonetics, 4, 197–233.

140.

Volaitis

L. E.

Miller

J. L.

(1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of voicing categories. The Journal of the Acoustical Society of America, 92, 723–735.

141.

Walsh

Parker

(1981). Vowel length and voicing in a following consonant. Journal of Phonetics, 9, 305–308.

142.

Wardrip-Fruin

(1982). On the status of temporal cues to phonetic categories: Preceding vowel duration as a cue to voicing in final stop consonants. The Journal of the Acoustical Society of America, 71, 187–195.

143.

Wightman

C. W.

Shattuck-Hufnagel

Ostendorf

Price

P. J.

(1992). Segmental durations in the vicinity of prosodic phrase boundaries. The Journal of the Acoustical Society of America, 91, 1707–1717.