The influence of speech rate and accent on access and use of semantic information

Abstract

Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

Keywords

Accents; Speech processing; Semantic processing; Speech rate; Word recognition

Considering that about 20% of the US population speaks a language other than English at home (US Census Bureau, 2011), that there are more non-native than native English speakers worldwide (Graddol, 1997), and that there is an increasing trend in diversification and intermixing of cultures and nationalities across the world, it is very likely that a person will be exposed throughout their lifetime to a variety of both regional and foreign accents. Processing of accented speech is generally more difficult than processing of native speech. In this study, we examine whether difficulties in processing accented speech can be alleviated by semantic information and investigate the top-down effects of lexical semantics on processing sub-optimal speech. In addition to examining the effect of semantics on accented speech, we investigated the relationship between speech rate and access to semantic information. Since comprehension of faster speech is slower and more prone to errors than comprehension of slower speech, we hypothesize that at faster rates, participants will rely more on semantic information.

Semantic influences on word recognition

One can characterize the semantic structure of concepts by defining salient features of those concepts. For instance, when a listener is presented with the word cat, the meaning of the concept cat, in turn, can be characterized as the shared activation of features such as furry, animal, pet, and so on. This kind of representation of semantic structure has been shown to be very useful in describing the results from semantic priming and categorization tasks (McRae, de Sa, & Seidenberg, 1997; Mirman & Magnuson, 2009; Yee, Huffstetler, & Thompson-Schill, 2011), where words that are semantically similar are activated based on overlap in features. Mirman and Magnuson (2009) have shown that the processing of spoken words is modulated by the number of semantic neighbours and the degree of semantic feature overlap. Moreover, semantic feature representations have been very suitable from a modelling perspective, specifically in describing the structure of semantic memory and accounting for a wide range of empirical results (Cree, McRae, & McNorgan, 1999; McRae, Cree, Seidenberg, & McNorgan, 2005; McRae et al., 1997; Plaut & Shallice, 1993).

Defining the meaning of words through featural representations is not the only available option. Semantics is considered to be multi-dimensional and can incorporate such aspects as imageability (i.e., the degree to which the stimulus can be perceived through the senses), number of meanings, number of associates, and contextual dispersion (i.e., how often the same words appear across content areas). All these dimensions, including the feature dimension, have been shown to influence the speed of visual word recognition (see Yap, Pexman, Wellsby, Hargreaves, & Huff, 2012, who explored the influence of the above-mentioned dimensions in a variety of recognition tasks). General findings suggest that richer semantic representations facilitate processing in lexical decision tasks. Hence, in the visual domain, words with higher imageability ratings, more associates, more meanings, and more features show a benefit in processing. Figure 1 presents a meta-analytic summary of studies with factorial designs that examined the influence of feature semantics (i.e., the number of features, NOF, effect) on visual word recognition from lexical decision and semantic categorization tasks.

Figure 1.

Studies with a factorial design that examined the effects of feature semantics on visual word recognition. Note that this figure does not include regression studies, which are much more prevalent in the literature. The filled diamond represents the average effect size (standardized mean difference) and the 95% confidence interval (CI) from a random effects meta-analysis. All studies in the figure had high precision (i.e., provided all the information needed to compute the effect size), except for the study by Rabovsky, Sommer, and Abdel Rahman (2012), who report a non-significant effect, but do not provide the means and the variance for the effect. For Rabovsky et al. (2012), we assumed an effect size of 0 and computed the variance based on the formula provided in comprehensive meta-analysis (CMA; Borenstein, Hedges, Higgins, & Rothstein, 2011).

Previous research in visual (Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008; Pexman, Lupker, & Hino, 2002; Yap et al., 2012; Yap, Tan, Pexman, & Hargreaves, 2011) and spoken word recognition (Sajin & Connine, 2014; Tyler, Moss, Galpin, & Voice, 2002; Tyler, Voice, & Moss, 2000) has shown that listeners are able to use information about the semantic structure of words during the recognition process. Moreover, it was generally found that increased processing demands, created either by an adverse listening environment (e.g., the presence of babble noise in the background; Sajin & Connine, 2014) or by lexical characteristics of words (e.g., words differing in their neighbourhood size; Tyler et al., 2000), led to greater reliance on semantic information.

Taking into account the dissimilarity between processing spoken words versus written words, Sajin and Connine (2014) investigated whether words with richer semantic representations are recognized faster. Specifically, they compared recognition for words that had a high number of features (high NOF) with words with a low number of features (low NOF) using the lexical decision task and the visual world paradigm (VWP). Words with a higher number of features generally refer to concepts that have richer sensorimotor information as well as taxonomic information. For instance, the words bread and brick, despite being similar on a number of dimensions such as imageability, frequency, age of acquisition, and neighbourhood density, show large differences in the number of semantic features that participants provide in a feature listing questionnaire (see for details McRae et al., 2005). In this instance, the word bread has a larger number of features than the word brick. We focus on the feature dimension to index semantics because it showed robust effects in visual word recognition studies (see Figure 1). Sajin and Connine's (2014) findings suggest that words with a higher number of features show faster recognition. Interestingly, they found that the difference among words on the feature dimension became particularly important when the processing demands were increased. For instance, recognition of an auditory target was delayed, and a NOF effect was found in the visual world paradigm only when both a target and onset competitor appeared in the display (e.g., the subject heard the high NOF word bread while being presented in the display with the target BREAD, the onset low NOF competitor BRICK, and two distractors). In trials with no onset competitors in the display (i.e., with fewer processing demands on target selection), the NOF effect was absent. However, by embedding the token stimuli in babble speech, they found greater reliance on semantics in both competitor and no-competitor trials.
These results suggest that when listeners are presented with spoken input that is degraded due to noise in the environment (e.g., babble), listeners experience delays in processing that subsequently enable them to rely on semantic information to speed up recognition. Background babble is but one circumstance under which listeners find themselves with additional processing demands, and in the following experiments we investigate whether speech rate and accent impact how semantics manifests when recognizing spoken words.

Recognition of accented speech

Accented speech produced by a non-native speaker leads to processing costs that are known to impact accuracy (Bradlow & Bent, 2008; Imai, Flege, & Walley, 2003; Munro & Derwing, 1995) and speed of recognition (Clarke & Garrett, 2004). Imai et al. (2003), for instance, found that English listeners are more accurate at recognizing words spoken in their native language without an accent than at recognizing Spanish-accented words. Similarly, Munro and Derwing (1995) found that native English speakers were slower in transcribing sentences spoken in English by a Chinese speaker than sentences spoken in English by a native speaker. Despite the initial difficulty that listeners experience when processing accented speech, there is considerable evidence indicating that listeners learn to cope with the more difficult speech by adapting to systematic variation found in foreign accented speech (Baese-Berk, Bradlow, & Wright, 2013; Maye, Aslin, & Tanenhaus, 2008; Sidaras, Alexander, & Nygaard, 2009). For instance, Maye et al. (2008) created an artificial accent where front vowels were lowered in the vowel space (e.g., this led to a word like witch being pronounced as wetch). After exposure to a 20-min accented segment of the “Wizard of Oz”, listeners at test processed the newly learned productions as more word-like than did listeners who had not been exposed to the novel accent.

A native speaker does not necessarily have to go through an adaptation process in order to recognize accented speech more accurately. Bradlow and Pisoni (1999) found that native listeners had greater transcription accuracy when presented with lexically “easy” words (low neighbourhood density and high frequency) in non-native English than when they had to listen to “hard” words (high neighbourhood density and low frequency) in non-native English. Moreover, they found that lexical information carries greater weight in recognition when non-native listeners rather than native listeners transcribed the non-native English speech. This finding indicates that for listeners who perceive speech as more difficult (e.g., a native speaker of English listening to Chinese-accented English), lexical information will play a more important role in spoken word recognition. This study further investigates the involvement of semantic information in the recognition of accented speech. In particular, we address the following question: When processing accented speech, do listeners rely more on semantic information to help them recognize spoken words? To answer this question, this study compares the recognition of high and low NOF words for accented and native speech and tests whether the NOF effect is greater in the accented condition than in the native condition.

Speech rate and spoken word recognition

Faster speaking rates pose greater impediments to processing spoken words than slower speech rates (Adank & Janse, 2009; Dupoux & Green, 1997). Bradlow and Pisoni (1999) found that words spoken at a fast rate have lower accuracy rates in a transcription task than words presented at medium and slow rates (medium and slow rates showed similar accuracy rates, suggesting that the effects of speech rate are nonlinear). Difficulty in processing faster speech has been generally attributed to faster speech having a somewhat degraded signal due to phoneme deletions and substitutions (Fosler-Lussier & Morgan, 1999) and more coarticulation and assimilation (Adank & Janse, 2009).

There are at least two ways in which prior research has investigated the influence of speech rate on word recognition. One approach examines how speech rate affects articulatory variation during production, and how that variation impacts processing. The second is to examine speech rate as a distal or global contextual cue. In the former approach, Byrd and Tan (1996) have shown that there are two ways that speech rate influences articulatory variation. Their production work shows that an increase in speech rate leads to an overall shortening in duration for each articulatory segment and an overall increase in overlap between articulatory units (see also Fosler-Lussier & Morgan, 1999). In other words, in order to talk faster, one increases the amount of coarticulation, deletions, and assimilations, as well as shortens the duration of each speech segment. An overall shortening in segment duration could be used to explain how speech rate affects the boundary for voice-onset time (VOT) when distinguishing voiced stop consonants (/b/, /d/, /g/) from voiceless consonants (/p/, /t/, /k/). Miller and Volaitis (1989) found that changes in speech rate (i.e., changes in duration of the segments) shifted the voiced–voiceless category boundary as indicated by judgements that listeners made in a phoneme categorization task. Additionally, Shatzman and McQueen (2006) found that segment duration plays an important role in identifying word boundaries in connected speech. When Dutch listeners were presented with tokens such as “eens (s)peer,” “one (s)pear”, where the phoneme [s] could be shared between the two words, listeners were more likely to attribute the [s] to the second word if the duration of the [s] was longer. Shatzman and McQueen (2006) indicate that this perceptual effect is observed because segments with longer durations are more likely to be found as part of a word-initial position rather than a word-final position.

Speech rate has also been examined as a contextual cue as part of distal (in an utterance) or global context (in a conversation). Dilley and Pitt (2010) found that slowing a talker's speech around a target word led listeners to report the absence of the target word. Using targets that were embedded in either a fast or a slow rate context, they show that sentences with fast rate context led listeners to perceive function words that were not produced (e.g., perceiving leisure or time instead of leisure time), while sentences with a slow rate context led listeners to under-report the presence of function words that were produced (e.g., perceiving leisure time instead of leisure or time). According to Dilley and Pitt (2010), this is an indication that listeners use information about relative speech rates in order to identify and segment speech. Follow-up work shows that context speech rate effects can be found not only in distally occurring context (i.e., within an utterance), but also from the global speech rate context that listeners experience during a conversation (Baese-Berk et al., 2014).

In the following study, we examined properties of speech rate and semantics at the word level, and in keeping with this goal, the stimuli were recorded in isolation. Our manipulation thus cannot speak to the effects of contextual speech rate, but focuses on the effect of changes in segment duration and coarticulation on recognition of spoken words. Our prediction is that at faster rates of speech, words will be more difficult to process, subsequently leading to more top-down processing and a larger NOF effect. Under this prediction, listeners’ reliance on semantic knowledge will be impacted by the quality of the acoustic input under varying speech rates.

Summary

In the study described below, a set of lexical decision experiments was performed that examined the interaction between NOF and accentedness and the interaction between NOF and speech rate. Participants were presented with high and low NOF words either in an accented or in a native voice at one of five speech rates (from very fast to very slow). We expected to observe a larger NOF effect for the accented condition than for the native condition. Additionally, we expected to observe a larger NOF effect for speech rates that are more difficult to process (i.e., very fast speech).

Experimental study

Method

Materials

Twenty-nine low NOF words (mean NOF = 9.28) and 30 high NOF words (mean NOF = 19.2) were selected from the McRae et al. (2005) feature norms database (see Appendix A). All stimuli were concrete nouns and were matched on a number of lexical dimensions: number of phonemes, number of letters, number of syllables, spoken neighbourhood size (Washington University Speech and Hearing Neighborhood Database), Coltheart N, familiarity (McRae et al., 2005), Kučera and Francis (1967) frequency, and spoken and written frequency counts for English language in CELEX (Baayen, Pipenbrock, & van Rijn, 1993). Subsequent norming revealed that words in the low and high NOF groups differed on a new frequency measure based on American English subtitles (Brysbaert & New, 2009). High NOF words had on average a subtitle frequency of 12.28 (SD = 11.23) words per million, and low NOF words had an average frequency of 6.9 (SD = 8.3; group-wise comparison p = .04). Descriptive statistics for the two sets of words are provided in Table 1. Additionally, 73 nonword fillers (NW) were used (Appendix B) that had a similar proportion of one-syllable and two-syllable nonwords as the set of words (one-syllable: NW 50% vs. words 49%; two-syllable: NW 46% vs. words 49%; three or more syllable: NW 4% vs. words 2%).

Table 1.

Mean statistics for low number of features and high number of features stimuli used in Experiment 1.

Measure	Low NOF	High NOF
Number of stimuli	29	30
Number of features	9.28 (2.153)	19.2 (2.27)
Familiarity	5.06 (1.77)	5.65 (1.9)
Number of letters	5.41 (1.09)	5.1 (1.19)
Number of phonemes	4.34 (0.94)	4.07 (1.08)
Number of syllables	1.52 (0.688)	1.57 (0.5)
K & F frequency	11 (12.24)	12.3 (14.06)
CELEX spoken frequency	4.93 (8.6)	3.79 (6.72)
CELEX written frequency	128.6 (148.47)	173.13 (206.7)
Subtitle frequency	6.9 (8.3)	12.28 (11.23)
Coltheart N	2.1 (2.01)	3.9 (6.01)
Phonological neighbours	4.48 (7.15)	7.43 (9.54)

Note: Means; standard deviations in parentheses. Low NOF = low number of features. High NOF = high number of features. Number of features and familiarity were taken from the McRae et al. (2005) norms. Familiarity ratings go from 1 to 9, with 1 corresponding to not at all familiar and 9 to extremely familiar. Phonological neighbours were taken from the Speech and Hearing Lab Neighborhood Database. Frequency measures come from Kučera and Francis (1967; K & F) and CELEX (Baayen et al., 1993). Subtitle frequency norms were taken from Brysbaert and New (2009).

Stimuli were recorded in a very fast, fast, medium, slow, and very slow rate by a female native speaker of American English and by a male speaker whose native language was Romanian. The Romanian language has a smaller vowel inventory than American English (Labov, Ash, & Boberg, 2006; Mallinson, 1986; there are 7 vowels in Romanian compared to 13 in the North American English vowel system), and the missing vowel categories in Romanian are distributed across the vowel space. For native Romanian speakers of English this leads to speech where an English [æ] is produced as the Romanian [e] (Pittman, 2008). Romanian speakers also show a failure to make the tense/lax distinction between, for example, English [i] and [ɪ] (e.g., pronouncing Tim as team). Shared vowels also differ in their acoustic realization; for example, the Romanian [u] falls somewhere between the English [u:] and [o]. The consonant inventories of English and Romanian are similar but Romanian does not have dental fricatives or the nasal velar segments found in English. Similar to shared vowels, the phonetic realization of shared consonants differs somewhat. For example, unlike English, Romanian voiceless plosives are unaspirated. Salient characteristics of Romanian accented English include the trilled [r] and a tendency to substitute dental fricatives /θ/ and /ð/ with labio-dental fricatives /f/ and /v/ (see Pittman, 2008 for an extensive analysis of the way in which a Romanian accent is expressed in English productions).

Speech rates were matched across native and accented voice by sampling multiple recordings and selecting the ones that matched most closely in duration to each other. In this way each individual recording for a word in a particular rate in accented speech matched the duration of the recording for the same speech rate in native speech (all pairwise comparisons across speakers at each speech rate p > .1; see descriptive statistics in Table 2). The recordings in this experiment are not part of connected speech and ranged from 4.1 syl/sec for very fast, to 3.5 syl/sec for fast, 2.8 syl/sec for medium, 2.6 syl/sec for slow, and 2.2 syl/sec for very slow speech rate (these values were calculated by multiplying the average number of syllables for each NOF condition by 1000 ms/average duration of the stimuli). Speech rates in the middle ranges (fast to slow) are comparable to those in other studies that looked at processing of single recordings (Radeau, Morais, Mousty, & Bertelson, 2000). Durations recorded in very slow speech are not typically encountered, while durations for very fast rates approximate the durations more typically seen in conversational speech. For instance, Fosler-Lussier and Morgan (1999) found that for the Switchboard corpus of phone conversations, a speech rate of 4.2 syl/sec is considered in the range of medium to fast. All stimuli were recorded and digitized at 48 kHz (16-bit resolution) and stored on a computer.
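The rate calculation described above can be sketched in a few lines of Python. The syllable means are taken from Table 1 and the mean durations from Table 2; weighting the four voice-by-NOF cells equally is an assumption, because the exact averaging is not spelled out. Under that assumption the sketch lands within about 0.1 syl/sec of each reported value rather than reproducing it exactly.

```python
# Mean number of syllables per word in each NOF group (Table 1).
MEAN_SYLLABLES = {"high": 1.57, "low": 1.52}

# Mean stimulus durations in ms (Table 2), keyed by rate, then (voice, NOF group).
MEAN_DURATION_MS = {
    "very fast": {("accented", "high"): 362, ("accented", "low"): 391,
                  ("native", "high"): 359, ("native", "low"): 392},
    "fast": {("accented", "high"): 441, ("accented", "low"): 454,
             ("native", "high"): 439, ("native", "low"): 453},
    "medium": {("accented", "high"): 535, ("accented", "low"): 545,
               ("native", "high"): 537, ("native", "low"): 548},
    "slow": {("accented", "high"): 552, ("accented", "low"): 599,
             ("native", "high"): 551, ("native", "low"): 602},
    "very slow": {("accented", "high"): 678, ("accented", "low"): 690,
                  ("native", "high"): 675, ("native", "low"): 684},
}


def syllables_per_second(mean_syllables, mean_duration_ms):
    """Syllables multiplied by 1000 ms and divided by the mean duration in ms."""
    return mean_syllables * 1000.0 / mean_duration_ms


def rate_for(rate_label):
    """Average the per-cell rates for one speech-rate condition.

    Equal weighting of the four cells is an assumption of this sketch.
    """
    cells = MEAN_DURATION_MS[rate_label]
    rates = [syllables_per_second(MEAN_SYLLABLES[nof], ms)
             for (_voice, nof), ms in cells.items()]
    return sum(rates) / len(rates)


for label in MEAN_DURATION_MS:
    print(f"{label}: {rate_for(label):.2f} syl/sec")
```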

Table 2.

Mean duration (ms) for accent and native recordings in the very fast to very slow speech rates.

Rate	Accented, high NOF	Accented, low NOF	Native, high NOF	Native, low NOF
Very fast	362 (60)	391 (76)	359 (58)	392 (80)
Fast	441 (70)	454 (72)	439 (69)	453 (77)
Medium	535 (86)	545 (77)	537 (86)	548 (80)
Slow	552 (103)	599 (99)	551 (102)	602 (98)
Very slow	678 (111)	690 (109)	675 (109)	684 (118)

Note: Standard deviations in parentheses. NOF = number of features.

Participants

A group of 237 students at Binghamton University took part in the experiment in order to fulfil a psychology course requirement. All participants were native English speakers and reported normal hearing. Participants were assigned to one of 10 conditions: 2 (voice) × 5 (speech rates). Among participants in the native conditions, 24 heard recordings at a very fast rate, 23 heard recordings at a fast rate, 23 heard recordings at a medium rate, 25 subjects heard recordings at a slow rate, and 22 heard recordings at a very slow rate. In the accent conditions, 26 heard recordings at a very fast rate, 24 listened to recordings at a fast rate, 26 heard recordings at a medium rate, 18 participants heard recordings at a slow rate, and 26 heard recordings at a very slow speech rate. Subjects in the fast, medium, and slow rates were tested in a different semester from subjects in the very slow and very fast conditions.

Procedure

Participants were tested in groups of two or three in a sound-dampened room. Each subject was seated in front of a 15.7″ monitor with 1024 × 768-pixel resolution. During a trial a target word or a filler nonword was presented through closed-ear headphones, and participants had to make a lexical decision response by pressing the word or nonword key on the serial response box. Before the start of the experimental session, participants were provided with eight practice trials on which they had to achieve at least 80% accuracy; otherwise a second practice session was provided. Target and filler items were presented in a random order and were each heard once, for a total of 132 trials. The whole experimental session lasted no more than 10 minutes. Participants were instructed to be both fast and accurate in their responses. Reaction times were measured from the offset of the spoken word until a response was made. Data collection and stimulus presentation were carried out with E-Prime 2.0 software (Schneider, Eschman, & Zuccolotto, 2002).

Data analysis

Analysis of RT data was done using linear mixed effects regression (Baayen, Davidson, & Bates, 2008) through the lme4 package (Version 1.1–10; Bates, Maechler, Bolker, & Walker, 2015) in R (Version 3.2.2; R Core Team, 2015). When specifying the random effect structure, we tried to follow suggestions provided by Barr (2013; see also Barr, Levy, Scheepers, & Tily, 2013), who recommended a maximal random effect structure that includes by-subject and by-item slopes for any within-unit effects and interactions that are used in the fixed effects structure of the model. A maximal random effects structure led to failure to converge, suggesting that the maximal model was over-parameterized (see Bates, Kliegl, Vasishth, & Baayen, 2015). Consequently, we identified a random effect structure using the forward-fitting procedure from the LMERConvenienceFunctions package (Version 2.1) provided by Tremblay (2015). Forward-fitting is an iterative model fitting procedure that compares a model without a particular random effect with a model that includes the random effect, and, based on likelihood ratio testing, this procedure retains the random effects that improve the model's fit while also removing random effects that lead to failure to converge.

When analysing RT data, we also observed some recommendations suggested by Baayen and Milin (2010) that relate to outlier trimming and auto-correlational structure of the data. Baayen and Milin suggest that minimal a priori outlier trimming combined with model-based outlier removal outperforms traditional outlier screening procedures (e.g., removing subject-specific outliers 2.5 SDs from the mean). Following their suggestions, we implemented very minimal a priori outlier removal and instead screened for outliers based on how normally the residuals of the fitted model were distributed. Additionally, we improved model fit by including variables that account for auto-correlational structure of the data. In particular, each linear mixed model that was tested included the previous trial reaction time (PreviousTrialRT) and the previous trial word/nonword status (PreviousTrialWord) as variables. Since presentation order in the experiment is randomized, the inclusion of these variables made the models that we tested less noisy and more sensitive to fixed effects. The p values were calculated using the Satterthwaite approximation to degrees of freedom through the lmerTest package (Version 2.0–29; Kuznetsova, Brockhoff, & Christensen, 2015). For all analyses, unstandardized regression coefficients, standard errors (SEs), t values, and significance levels are reported.
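As an illustration of how the two auto-correlation covariates might be derived, the sketch below attaches the preceding trial's RT and lexical status to each trial of a single participant. The field names `rt` and `is_word` are hypothetical, and leaving the first trial's covariates missing is an assumption, since the handling of the first trial is not described.

```python
def add_previous_trial_covariates(trials):
    """Return copies of one participant's trials with lagged covariates added.

    `trials` must be in presentation order. Each trial is a dict with the
    hypothetical keys "rt" (reaction time in ms) and "is_word" (True for a
    word, False for a nonword). The first trial has no predecessor, so its
    covariates are set to None (an assumption of this sketch).
    """
    rows = []
    previous = None
    for trial in trials:
        row = dict(trial)  # copy so the input is not modified
        row["PreviousTrialRT"] = previous["rt"] if previous is not None else None
        row["PreviousTrialWord"] = previous["is_word"] if previous is not None else None
        rows.append(row)
        previous = trial
    return rows
```

The function would be applied separately to each participant's trial sequence so that a lag never crosses a participant boundary.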

Analysis of accuracy data was performed using generalized linear mixed effects regression (GLMER) using similar procedures to what was done for RT analyses. Unfortunately, because there is a very small number of incorrect responses across various conditions, complex random effects and fixed effects structure led to failure in model convergence. To resolve this issue, we relied on a fitting procedure from the LMERConvenienceFunctions package. Unlike in the analysis for RT, where the fitting algorithm was used only to create a more parsimonious random effect structure, for accuracy data the fitting algorithm was used for both fixed and random effect structure. In particular, the following steps were used: (a) the fixed effects were first back-fitted, (b) the random effects were then forward-fitted, and lastly (c) the fixed effects were re-back-fitted.

Results

One subject from the accent very fast condition, who performed at chance, was removed. The rest of the subjects had accuracy above 70% and were retained. We retained the data for any words with at least 70% accuracy.¹

The following words (% accuracy) were removed from each condition:

Accent very fast: boots (68%), toad (68%), fence (68%), skillet (68%), bark (60%), spade (60%), guppy (56%), nylons (56%), olive (44%), bucket (40%), bouquet (24%).

Accent fast: bison (67%), guppy (67%), prune (67%), boots (42%), nylons (38%).

Accent medium: bucket (69%), finch (69%), nylons (65%), apron (62%), bison (54%), baton (23%), guppy (20%).

Accent slow: finch (67%), nylons (67%), guppy (56%), prune (56%), boots (28%).

Accent very slow: guppy (58%), bouquet (24%).

Native fast: bison (65%).

Native medium: finch (69%), bison (57%).

Native very slow: buckle (40%).
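The item-level screening behind the list above can be sketched as follows: accuracy is computed per word within one condition, and words below the 70% threshold are set aside. The input layout (pairs of word and correctness) and the function name are hypothetical.

```python
def split_items_by_accuracy(responses, threshold=0.70):
    """Split words into kept and removed sets by their accuracy in one condition.

    `responses` is an iterable of (word, correct) pairs, where `correct` is a
    bool. Words with accuracy at or above `threshold` are kept.
    """
    totals = {}
    hits = {}
    for word, correct in responses:
        totals[word] = totals.get(word, 0) + 1
        hits[word] = hits.get(word, 0) + (1 if correct else 0)
    accuracy = {word: hits[word] / totals[word] for word in totals}
    kept = sorted(w for w, acc in accuracy.items() if acc >= threshold)
    removed = sorted(w for w, acc in accuracy.items() if acc < threshold)
    return kept, removed, accuracy
```

Running the same function with `threshold=0.50` would correspond to the more lenient criterion used in the supplementary analysis.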

After word removal, a small number of extreme outliers with responses before the offset of the word or 5 s after the offset of the word were removed, followed by the removal of inaccurate responses. Finally, after a model was fitted to the data, responses with residual values more than 2.5 standard deviations from zero were removed. Table 3 indicates the number of data points removed from each condition for each step during outlier removal process. Table 3 also indicates the RT for the data used in the refitted model.
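The model-based trimming step can be illustrated with a small helper that flags responses whose residuals lie more than 2.5 standard deviations from zero. This is only a sketch: the fitted values would come from the mixed model, which is not reproduced here, and computing the standard deviation over all residuals at once is an assumption.

```python
import statistics


def keep_mask_from_residuals(observed, fitted, criterion=2.5):
    """Return one bool per response: True if the response survives trimming.

    `observed` and `fitted` are equal-length sequences of (log) RTs and the
    corresponding model predictions. A response is dropped when its residual
    is more than `criterion` standard deviations away from zero.
    """
    residuals = [o - f for o, f in zip(observed, fitted)]
    cutoff = criterion * statistics.stdev(residuals)
    return [abs(r) <= cutoff for r in residuals]
```

After applying the mask, the model would be refitted to the retained responses, as described above.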

Table 3.

NOF effect across all groups.

Voice	Rate	Feature group	Data points start	Word removal (<70%)	Extreme outliers	Incorrect responses	Model trimming	Data points left	RT (ms)	NOF effect (low – high) (ms)
Accent	Very fast	high NOF	750	100	1	48	18	583	592	40
		low NOF	725	175	1	45	9	495	632
	Fast	high NOF	720	72	0	23	7	618	495	58
		low NOF	696	48	1	57	7	583	553
	Medium	high NOF	780	26	1	14	24	715	460	49
		low NOF	754	156	2	47	20	529	509
	Slow	high NOF	540	54	0	14	13	459	467	61
		low NOF	522	36	0	43	16	427	528
	Very slow	high NOF	780	0	2	50	29	699	471	27
		low NOF	754	52	12	64	39	587	498
Native	Very fast	high NOF	720	0	1	24	14	681	560	21
		low NOF	696	0	1	62	23	610	581
	Fast	high NOF	690	0	0	15	12	663	470	43
		low NOF	667	23	1	23	17	603	513
	Medium	high NOF	690	0	2	17	13	658	444	33
		low NOF	667	46	0	30	11	580	477
	Slow	high NOF	750	0	1	8	14	727	431	39
		low NOF	725	0	1	33	25	666	470
	Very slow	high NOF	660	0	6	18	32	604	398	30
		low NOF	638	22	5	54	35	522	428

Note: NOF = number of features; RT = reaction time. The column “Data points start” specifies the number of data points for each condition before any outlier removal was carried out. The column “Word removal” indicates the number of data points removed due to removal of words with threshold accuracy below 70%. The column “Extreme outliers” specifies the number of data points removed due to making responses before word offset or 5000 ms after offset. The column “Incorrect responses” indicates the number of incorrect responses removed from each condition. The column “Model trimming” indicates the number of responses removed through model criticism. The column “Data points left” indicates the number of responses left after all outlier removal was carried out. The column “RT” indicates the reaction time of responses after carrying out outlier removal from previous columns. The column “NOF effect” specifies the difference in RT between high and low NOF words within each condition.
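The bookkeeping in Table 3 can be checked with a short sketch: the data points left equal the starting count minus the four removal columns, and the NOF effect is the low NOF mean RT minus the high NOF mean RT. The class and function names are hypothetical; the numbers are the accent, very fast rows of the table.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One row of Table 3 (a voice x rate x feature-group cell)."""
    start: int
    word_removal: int
    extreme_outliers: int
    incorrect: int
    model_trimming: int
    mean_rt_ms: int

    def points_left(self):
        """Data points remaining after all four removal steps."""
        return (self.start - self.word_removal - self.extreme_outliers
                - self.incorrect - self.model_trimming)


def nof_effect(low, high):
    """NOF effect in ms, defined as low NOF RT minus high NOF RT."""
    return low.mean_rt_ms - high.mean_rt_ms


accent_very_fast_high = Cell(750, 100, 1, 48, 18, 592)
accent_very_fast_low = Cell(725, 175, 1, 45, 9, 632)

print(accent_very_fast_high.points_left())  # 583
print(accent_very_fast_low.points_left())   # 495
print(nof_effect(accent_very_fast_low, accent_very_fast_high))  # 40
```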

Selecting a threshold of 70% for word removal leads to biased data loss, because accented speech is more difficult to process and is thus likely to have a larger portion of responses lost due to words not meeting the required threshold. Moreover, as Table 3 indicates, words with low NOF, with poorer accuracy and higher RT, are more likely to be removed because of this threshold. This creates a downward bias for the NOF effect and makes the reported RT analyses conservative (with respect to both the magnitude of the effect size and statistical significance). We performed another analysis in which the word removal threshold was set at 50% (see Supplemental Material). After lowering the threshold to 50%, the overall effect of NOF increased for the accented speech and remained unchanged for the native speech. We also performed an analysis in which the three-way interactions between NOF, rate, and accent were examined (see also Supplemental Material). Despite increased model complexity, the performance of this model was similar to that of the model reported in Table 4, χ²(2) = 2.70, p = .258, so we focus on reporting the more parsimonious model, which tested for two-way interactions.
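For readers who want to see how a likelihood ratio comparison of nested models turns into a p value, the sketch below uses the closed form that holds for a chi-square distribution with exactly 2 degrees of freedom, p = exp(−χ²/2). The function names are hypothetical, and the log-likelihoods of the two models are not reported here, so only the final step is shown with the reported statistic.

```python
import math


def likelihood_ratio_statistic(loglik_simple, loglik_complex):
    """Chi-square statistic for two nested models: 2 * (logLik_complex - logLik_simple)."""
    return 2.0 * (loglik_complex - loglik_simple)


def chi2_upper_tail_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df.

    For df = 2 the survival function reduces to exp(-x / 2); this shortcut
    does not apply to other degrees of freedom.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2.0)


p_value = chi2_upper_tail_df2(2.70)
print(round(p_value, 3))
```

With the rounded statistic of 2.70 this gives a p value of about .259, close to the reported .258; the small gap may reflect rounding of the test statistic.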

Reaction time analysis

Reaction time data were log-transformed, since a visual inspection of the reaction time density plot and a one-sample Kolmogorov–Smirnov test for each condition showed that RT responses were not normally distributed (p < .0001 for all conditions) and had a positive skew.
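The rationale for the transformation can be illustrated with simulated data. The sketch below draws hypothetical RTs from a log-normal distribution (these are not the study's data, and the distribution parameters are arbitrary assumptions) and compares skewness before and after taking the natural log.

```python
import math
import random

def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated RTs in ms; the parameters are illustrative assumptions only.
rng = random.Random(1)
rts = [rng.lognormvariate(6.3, 0.4) for _ in range(5000)]
log_rts = [math.log(rt) for rt in rts]

print(skewness(rts), skewness(log_rts))
```

For data of this kind the raw values show a clear positive skew, whereas the log-transformed values are close to symmetric, which is the property the transformation is meant to deliver.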

A linear mixed model (LMM) with random intercepts for subjects and items, by-subject random slopes for NOF condition, and by-item random slopes for rate condition was used. The fixed effect structure included NOF (high vs. low), voice (accent vs. native), rate (very fast to very slow),² the interaction between NOF and voice, the interaction between NOF and rate, PreviousTrialRT, PreviousTrialWord, and the natural log of subtitle frequency counts. After fitting this model, residuals greater than 2.5 standard deviations from zero were removed (2.25% of the data; 166 observations), and the model was refitted. Table 4 presents the structure of the refitted model and model output (the diagnostic QQ-plot for the model can be found in Supplemental Material). Table 3 includes the summary statistics for the dataset used in the refitted model. Figure 2 includes the means and confidence intervals for each condition. Figure 3 collapses the responses across rate conditions, in order to more easily visualize the NOF and voice interaction.

² Rate was recoded as 1 to 5 (very fast = 1 to very slow = 5). We performed LMER analyses using rate either as a factor variable with multiple comparisons specified or as a numerical variable (ranging from 1 to 5). We did not find an interaction between NOF and rate in any of the multiple comparisons when rate was used as a factor, so we present the model with the simpler output, in which speech rate is treated as a continuous variable.
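The residual-based trimming step (remove observations whose residual exceeds 2.5 standard deviations, then refit) can be sketched as follows. The helper name and its arguments are hypothetical, and the refit itself, which requires a mixed-model library, is not shown.

```python
import statistics

def trim_residual_outliers(rows, residuals, criterion=2.5):
    """Return the rows whose residual lies within `criterion` SDs of zero.

    `rows` and `residuals` are parallel sequences: residuals[i] is the
    residual of the fitted model for rows[i].
    """
    cutoff = criterion * statistics.stdev(residuals)
    return [row for row, res in zip(rows, residuals) if abs(res) <= cutoff]
```

In use, the model would be fitted once, its residuals passed to this helper together with the data rows, and the same model refitted on the returned subset.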

Figure 2.

Number of features (NOF) effect (ms) for each native and accented voice across the five speech rates (very fast to very slow). RT = reaction time. The mean values for the NOF effect can be found in Table 3. Error bars represent confidence intervals (CIs). The graph depicts within-subjects CIs (95%) using the method reported in Morey (2008). This method provides a more accurate basis for making inferences between high and low NOF conditions, because the CIs reflect inferential tests based on a within-subjects analysis of variance (ANOVA) mean square error (MSE) rather than a regular between-subjects ANOVA. It is important to note that the CIs in this figure do not fully reflect the complexity of the experimental design (such as accounting for random variability due to items or inter-trial dependencies) as presented in the LMER models.
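The within-subjects interval referred to in this caption is commonly computed by normalizing each subject's scores (subtracting the subject mean and adding the grand mean) and then applying Morey's (2008) correction factor M/(M − 1), where M is the number of within-subject conditions. The sketch below is one way to compute the corrected standard errors; the data layout and function name are assumptions, and the critical t value needed to turn a standard error into a CI half-width is left to the caller.

```python
import math
import statistics

def within_subject_se(data):
    """Normalized standard errors with Morey's (2008) correction.

    `data` maps each subject to a list of condition means, with the
    conditions in the same order for every subject.
    Returns one standard error per condition.
    """
    subjects = list(data)
    n_cond = len(data[subjects[0]])
    grand_mean = statistics.mean(v for s in subjects for v in data[s])
    # Remove between-subject variability: subtract subject mean, add grand mean.
    normalized = {
        s: [v - statistics.mean(data[s]) + grand_mean for v in data[s]]
        for s in subjects
    }
    correction = n_cond / (n_cond - 1)  # Morey's factor M / (M - 1)
    ses = []
    for j in range(n_cond):
        column = [normalized[s][j] for s in subjects]
        variance = statistics.variance(column) * correction
        ses.append(math.sqrt(variance / len(subjects)))
    return ses
```

A 95% CI half-width for a condition is then the standard error multiplied by the critical t value for N − 1 degrees of freedom, where N is the number of subjects.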

Figure 3.

Number of features (NOF) effect (ms) for each native and accented voice after collapsing across the five speech rates. RT = reaction time. Confidence intervals (CIs; 95%) were plotted using the same procedure as that used in Figure 2.

Table 4.

Results of the model for RT analysis.

Predictor	β	SE	t	p
Intercept	6.347	0.0506	125.34	<.0001
NOF (high vs. low)	0.0626	0.0346	1.808	.075
Voice (accent vs. native)	−0.0628	0.0234	−2.813	.005
Rate	−0.0736	0.0105	−6.988	<.0001
NOF:Rate	0.0042	0.0011	0.388	.699
NOF:Voice	−0.0310	0.0167	−2.658	.0079
PreviousTrialRT	0.0014	0.0000	15.177	<.0001
PreviousTrialWord	−0.0497	0.0058	−8.502	<.0001
Subtitle frequency	−0.0016	0.0009	−1.740	.088

Note: Model structure: RTlog ∼ NOF + Voice + Rate + NOF:Voice + NOF:Rate + PreviousTrialRT + PreviousTrialWord + SubtitleFrequency + (1 + NOF|Subject_R) + (1 + Rate|Item_R). NOF = number of features; RT = reaction time; PreviousTrialRT = previous trial reaction time; PreviousTrialWord = previous trial word/nonword status. Subscript R next to a variable indicates a random effect (e.g., Subject_R), and the interaction between two variables is denoted using a colon (e.g., NOF:Voice). The output in this table is for the refitted model using the log-transformed RT responses. The rows in bold represent the effects of interest in the current study.

Exact p-values are reported, unless their value was <.0001.
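Because the model in Table 4 was fitted to log-transformed RTs, its coefficients are on the natural-log scale. One common way to read such a coefficient, offered here as an interpretive aid rather than as part of the original analysis, is as an approximate percent change in RT, exp(β) − 1. The sketch uses coefficients copied from Table 4; the function name is illustrative.

```python
import math

def percent_change(beta):
    """Convert a coefficient on the natural-log RT scale to a percent change in RT."""
    return (math.exp(beta) - 1.0) * 100.0

# Coefficients as reported in Table 4.
rate_beta = -0.0736
previous_trial_word_beta = -0.0497

print(round(percent_change(rate_beta), 1))
print(round(percent_change(previous_trial_word_beta), 1))
```

Under this reading, each one-step move toward slower speech on the 1 to 5 rate scale corresponds to roughly a 7% shorter RT, holding the other predictors constant. Because the model also contains a NOF:Rate term, this figure applies at the reference level of NOF.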

The model presented in Table 4 examined the main effects of NOF, speech rate, and accentedness, as well as the interaction between speech rate and NOF and the interaction between accentedness and NOF. Results show a marginal effect of NOF, with high NOF words being recognized faster than low NOF words; a main effect of speech rate, with words spoken at faster rates generally taking longer to be recognized than words spoken at slower rates; and an interaction between NOF and voice, which indicates that listeners show a greater NOF effect in the accent condition than in the native condition (see Figure 3 and RT in Table 3). No interaction between speech rate and NOF was observed, suggesting that listeners’ reliance on feature semantics does not depend on speech rate. Lastly, a marginal effect of frequency was observed.³

³ Although during norming items were controlled for CELEX and Kučera and Francis frequency, high and low NOF items differed on the newer subtitle frequency norms (Brysbaert & New, 2009). Adding subtitle frequency, either as a stand-alone variable (as presented in Table 4) or as part of an interaction with the manipulated variables, either did not improve the model or yielded a non-significant frequency effect, suggesting that the effects found in this experiment are unlikely to be due to inadequately controlled frequency.

Accuracy analysis

Accuracy (see Figures 4 and 5) was analysed using generalized linear mixed effects regression (GLMER). In order to minimize any bias occurring due to outlier removal, accuracy was analysed based on the original dataset, before any outlier removal was carried out. Table 5 indicates the number of data points for each condition as well as accuracy in each condition.

Figure 4.

Accuracy depicting the number of features (NOF) effect for each native and accented voice across the five speech rates (very fast to very slow). Confidence intervals (CIs) were calculated using the same method as that in Figure 2.

Figure 5.

Accuracy depicting the number of features (NOF) effect for each native and accented voice after collapsing across speech rates. Confidence intervals (CIs) were calculated using the same method as that in Figure 2.

Table 5.

Mean accuracy for each condition in the set of lexical decision experiments.

Voice	Rate	Feature group	Data points	Accuracy (%)	NOF effect (high – low) (%)
Accent	Very fast	high NOF	750	87	5
		low NOF	725	82
	Fast	high NOF	720	92	3
		low NOF	696	89
	Medium	high NOF	780	97	14
		low NOF	754	83
	Slow	high NOF	540	93	4
		low NOF	522	89
	Very slow	high NOF	780	93	6
		low NOF	754	87
Native	Very fast	high NOF	720	97	6
		low NOF	696	91
	Fast	high NOF	690	98	3
		low NOF	667	95
	Medium	high NOF	690	97	4
		low NOF	667	93
	Slow	high NOF	750	99	2
		low NOF	725	97
	Very slow	high NOF	660	97	8
		low NOF	638	89

Note: NOF = number of features. The column “NOF effect” specifies the difference in accuracy between high and low NOF words within each condition.

The initial GLMER model included speech rate, NOF, accent, subtitle frequency, and PreviousTrialWord as main effects and also examined the interaction between speech rate and NOF and the interaction between voice and NOF. The random effects structure included random intercepts for words and subjects as well as within-subject and within-item random slopes for NOF and rate (similarly to the structure in RT analyses). Using the fitting function from the LMERConvenienceFunctions package (Tremblay, 2015), the final model converged to a fixed effects structure having the main effects of NOF, voice, rate, PreviousTrialWord, and interaction between NOF and voice. The random effects structure for this model included the random intercepts for items and subjects (see output and model structure in Table 6).

Table 6.

Results of the GLMER model for the accuracy data.

Predictor	Odds ratio	95% CI	z	p
NOF (high vs. low)	1.97	[1.08, 3.58]	2.23	.0257
Voice (accent vs. native)	3.71	[2.78, 4.97]	8.85	<.0001
Rate	1.15	[1.04, 1.27]	2.89	.0038
NOF:Voice	1.57	[1.16, 2.13]	2.93	.0034
PreviousTrialWord	1.49	[1.30, 1.71]	5.77	<.0001

Note: Model structure: Accuracy (binary) ∼ NOF + Voice + Rate + PreviousTrialWord + NOF:Voice + (1 |Subject_R) + (1 + Rate|Item_R). Subscript R next to a variable indicates a random effect (e.g., Subject_R), and the interaction between two variables is denoted using a colon (e.g., NOF:Voice). GLMER = generalized linear mixed effects regression; NOF = number of features; CI = confidence interval; PreviousTrialWord = previous trial word/nonword status.

The results of the model indicate a main effect of NOF, with low NOF words 1.97 times more likely to have incorrect responses than high NOF words (z = 2.23, 95% confidence interval, CI [1.08, 3.58]); a main effect of voice, with accented conditions 3.71 times more likely to have an incorrect response (z = 8.85, 95% CI [2.78, 4.97]); a main effect of rate, with accuracy being lower at faster speech rates (odds = 1.15, z = 2.89, 95% CI [1.04, 1.27]); and a main effect of PreviousTrialWord (odds = 1.49, z = 5.77, 95% CI [1.30, 1.71]), indicating that subjects are more likely to respond correctly on the current trial if on the previous trial they were asked to respond to a word stimulus. Additionally, a significant interaction between NOF and voice was observed: The difference in accuracy between high and low NOF words was greater for accented conditions than for native conditions (odds = 1.57, z = 2.93, 95% CI [1.16, 2.13]).
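The odds ratios, confidence intervals, and z values reported for the accuracy model are tied together on the log-odds scale: the log of the odds ratio is the model coefficient, and the width of the 95% CI on that scale gives its standard error. The following sketch, which assumes Wald-type intervals (an assumption, since the text does not state how the CIs were obtained), recovers an approximate z from the reported odds ratio and CI.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal

def wald_z_from_ci(odds_ratio, ci_low, ci_high):
    """Approximate the Wald z statistic from an odds ratio and its 95% CI."""
    log_odds = math.log(odds_ratio)
    # On the log scale the CI spans 2 * Z_95 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return log_odds / se

# Values as reported in Table 6.
print(round(wald_z_from_ci(1.97, 1.08, 3.58), 2))  # NOF
print(round(wald_z_from_ci(3.71, 2.78, 4.97), 2))  # Voice
```

Small discrepancies from the reported z values are expected, because the odds ratios and interval bounds in the table are rounded to two decimals.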

General Discussion

In this experiment it was hypothesized that greater semantic effects would be observed for speech that is more difficult to process. This hypothesis was driven by previous findings indicating that listeners have early access to semantic knowledge and that semantic information becomes more important when listeners hear degraded speech (i.e., speech embedded in babble; Sajin & Connine, 2014). Hence, if accented speech is more difficult to process than native speech, one should see a greater NOF effect for the accented speech. Similarly, if various speech rates create processing difficulties for listeners, one should also observe an associated increase in the NOF effect.

Overall, the results reported in Figure 2 and Table 3 indicate that listeners use semantic information during speech processing. Moreover, the effect of semantic knowledge is larger in the accented speech conditions than in the native speech conditions, suggesting that when listeners hear accented speech, having access to semantic feature information aids in processing spoken words. Figure 2 also indicates that variability in speech rate does not generally influence access to semantics (i.e., the NOF effect remains fairly constant across speech rates).

The absence of an interaction between speech rate and NOF must be interpreted with a caveat. Although there is no overall interaction between rate and NOF, Figure 2 indicates that the effect of NOF changes across speech rates within the accent condition. For instance, in post hoc LMM analyses for the accent condition, the NOF effect in very slow speech (27 ms) is smaller than the NOF effect observed for the fast (58 ms; t = 1.99), medium (49 ms; t = 2.527), and slow rates (61 ms; t = 1.97), but comparable to that for the very fast rate (40 ms; t = 1.39). In other words, at abnormally slow speech rates, the NOF effects for the accent condition are smaller than those for most of the other speech rates. One explanation for why unusually slow speech impacts access to semantic information in the accent condition is that slow speech enhances the characteristics of the accented speech. For instance, Munro and Derwing (2001; see also Munro & Derwing, 1998) indicate that perceived accentedness and the comprehension of accented speech have a nonlinear relationship to speech rate, with very slow speech and very fast speech receiving higher accentedness scores and poorer comprehensibility. Accented speech at rates between very fast and very slow was judged as more optimal by native listeners. This implies that, in contrast to fast speech, in which the acoustic–phonetic information is more heavily overlapped and reduced via co-articulation and assimilation processes, slow speech draws out the acoustic information that makes accented speech different from native speech. For instance, Romanian speakers substitute the English dental fricative /θ/ with the labio-dental fricative /f/, pronouncing the expression “three fries” as “free fries”. Munro and Derwing (2001) indicate that when processing very slow accented speech, rather than focusing on processing the word, listeners fixate on phonological processing of mispronunciations (e.g., they will note that the speaker mispronounced three as free). Thus, the rich acoustic–phonetic properties in slow speech will paradoxically impair processing of an accent, since listeners are not engaging in lexical access but rather focus on phonological processing of mispronunciations.

This explanation suggests that accented speech in the very slow rate should be perceived as being more accented than the accented speech presented at faster speech rates. To support this view, we collected transcription and ratings data from an additional 50 students who did not participate in the lexical decision task and who had to transcribe and rate the degree of accentedness for the very slow and medium rate accented words (see Appendix C for details). Results of the transcription analyses indicate that high NOF words were transcribed more accurately than low NOF words across both speech rate conditions (a pattern consistent with accuracy in the lexical decision data in which high NOF words were recognized more accurately than low NOF words). The ratings data show that very slow speech is rated as more accented than medium speech. These results indicate that there is a limit to the usefulness of semantic information when processing non-native speech. Slight deviations from expectations in the acoustic input can be corrected through feedback coming from listeners’ semantic representations; nonetheless, if the deviation in the input is too extreme, semantic information stops helping with processing.

Effortful listening and NOF

Why is there a greater NOF effect in the processing of accented speech than in the processing of native speech? One possibility is that during the processing of difficult speech, there is increased use of executive resources that enable the listener to focus on processing the acoustic input in order to match it with an internal linguistic representation (Van Engen & Peelle, 2014). If the listener hears a clear, unambiguous acoustic input, the comprehension process proceeds without a need to rely on top-down feedback from semantic knowledge. Nonetheless, if the acoustic input is a poor match to the listeners’ perceptual expectations, such as when listening to accented speech, then compensatory executive resources are recruited in order to speed up recognition. In a meta-analysis of 10 neuroimaging studies, Adank (2012) found that during the comprehension of distorted speech (e.g., speech with background noise, unfamiliar accent), listeners show increased bilateral activation for areas associated with general executive processing, such as anterior superior temporal sulcus (STS) and posterior middle temporal gyrus (MTG), regions that are sensitive to the intelligibility of a stimulus and processing of word meaning (Acheson & Hagoort, 2013). Similarly, Wild et al. (2012) report that the process of comprehending degraded speech depends on directed attention. Wild et al. found that clear speech is encoded and processed successfully regardless of whether participants are told to attend to it, while degraded speech was processed successfully only when participants were directly attending to it. This could be an indication that increased NOF effects in the accented speech in our study are due to a process of effortful listening and top-down integration of semantic knowledge (see also, for a possible neural mechanism, Sohoglu, Peelle, Carlyon, & Davis, 2012).

Support for a cognitive control mechanism for speech recognition has also been provided by studies examining age-related differences. As an individual ages, speech recognition generally becomes more effortful and slow. This decline is mainly attributed to hearing loss but also to declining grey matter volume in the temporal lobe (Eckert et al., 2008). Eckert et al. (2008) found that in older listeners cognitive control systems are used to a greater degree than in younger listeners during word recognition. Reliance on cognitive control mechanisms in frontal brain regions is increased in order to compensate for decreased grey matter volume in the temporal regions. This suggests that repeating the set of experiments in this paper with an older population with no cognitive control impairments should lead to even greater NOF effects across all conditions.

Attention is likely to play a key role in the modulation of semantic effects observed during word recognition. Research on spoken word recognition has shown that listeners are both highly adaptable and highly flexible in using lexical knowledge. An early demonstration of flexibility is the work of Cutler, Mehler, Norris, and Segui (1987) who showed that lexical information is not automatically used by listeners—the advantage for processing words over nonwords varies with both task and stimulus characteristics (see also, Eimas, Marcovitz Hornstein, & Payton, 1990). Mirman, McClelland, Holt, and Magnuson (2008) found that the lexicality effect between word and non-word stimuli is bigger when the task involves a larger proportion of word items (e.g., 80% of items are words vs. 20% non-words). In other words, if the task parameter biases participants towards using lexical knowledge, a greater effect is observed. The finding that task demands can modulate the influence of lexical knowledge has prompted the development of interactive theories that incorporate a dynamic process for directing attention across levels of representation.

Of particular interest are extensions of the TRACE model specifically designed to accommodate flexible deployment of attention. Mirman et al. (2008) instantiate attention by modulating activation of lexical form representations within the interactive architecture of TRACE. A decrease in lexical feedback accompanies decreased overall lexical activation based on task demands; simulations of the model indicate that lexical effects decrease when lexical units are modified to be less responsive to excitatory input. In other words, if the listener expects to hear mainly non-words, then lexical units are less likely to be activated, and there will be diminished feedback. A process of effortful listening may provide insight into how attention is used to increase semantic/lexical feedback.
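The idea that reduced lexical responsiveness yields reduced feedback can be illustrated with a deliberately simplified two-unit network. This toy is not TRACE and not the Mirman et al. (2008) implementation; the update rule, the parameter values, and the name `lexical_gain` are all assumptions chosen only to show the qualitative pattern.

```python
def run_toy_network(lexical_gain, steps=20, bottom_up=0.3,
                    feedback_weight=0.5, decay=0.1):
    """Iterate a two-unit toy network and return (phoneme, lexical) activation.

    `lexical_gain` scales how strongly the lexical unit responds to
    excitatory input from the phoneme unit (a stand-in for attention).
    Both activations start at 0 and stay within [0, 1] for gains
    between 0 and 0.9 with the default parameters.
    """
    phoneme, lexical = 0.0, 0.0
    for _ in range(steps):
        # Lexical unit: driven by the phoneme unit, scaled by the gain.
        lexical_input = lexical_gain * phoneme
        # Phoneme unit: acoustic input plus top-down lexical feedback.
        phoneme_input = bottom_up + feedback_weight * lexical
        # Both units are updated simultaneously from the previous state.
        lexical, phoneme = (
            lexical + lexical_input * (1.0 - lexical) - decay * lexical,
            phoneme + phoneme_input * (1.0 - phoneme) - decay * phoneme,
        )
    return phoneme, lexical

high_attention = run_toy_network(lexical_gain=0.6)
low_attention = run_toy_network(lexical_gain=0.2)
print(high_attention, low_attention)
```

With the lower gain, the lexical unit settles at a lower activation and therefore sends less feedback to the phoneme unit, which in turn ends up less active. This is the qualitative relationship described in the paragraph, shown here without any claim about the quantitative behaviour of the full model.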

Implications, limitations, and future research

Although semantic effects have been investigated extensively in visual word recognition (Amsel & Cree, 2013; Pexman et al., 2008; Pexman, Holyk, & Monfils, 2003; Pexman et al., 2002; Yap et al., 2012; Yap et al., 2011), the literature documenting how semantic knowledge directly impacts processing of spoken words is fairly scarce. Except for a handful of experiments examining the impact of imageability (Tyler et al., 2000) and NOF (Sajin & Connine, 2014), the extent and strength of lexical semantic effects on spoken word processing remain to be examined in other tasks (e.g., sentence processing, semantic categorization) and across a larger range of semantic dimensions (e.g., sensory experience, concreteness). However, the findings provide a replication of previous experiments (Sajin & Connine, 2014) that point to an important influence of lexical semantics on spoken word recognition. This is a valuable insight, since the benefits conferred by semantics in recognizing orthographic word forms are not necessarily translatable to the spoken word domain. After all, the presentation of the input between the two domains differs. The speech signal is continuous, highly variable, and time dependent, while the input presented in orthographic form is static. The time-dependent nature of the auditory input has focused empirical and theoretical efforts on onset and rhyme competitor activation (Allopenna, Magnuson, & Tanenhaus, 1998; Connine, Blasko, & Titone, 1993; Dahan, Magnuson, Tanenhaus, & Hogan, 2001; Zwitserlood, 1989), where the degree of match between a spoken word and its representation influences activation. The experiments reported here extend prior research by considering the effect of semantics in conjunction with important speech properties known to influence spoken language processing (rate and accentedness).

Language processing typically occurs in a discourse situation, so it is important to further examine semantic effects in sentential context. Sentence and discourse context can contribute to activation of semantics in a way that mitigates the impact of less efficient lexical access (see review by Federmeier, 2007). Further, slow speech in a discourse context may facilitate computation of sentence-level representations that will not be revealed by recognition of individual words. For example, introduction of pauses in time-compressed speech provides listeners with the time necessary to catch up (Wingfield, Tun, Koh, & Rosen, 1999), and the combination of contextual constraint and slow speech may serve a similar function. Consistent with this suggestion, Love, Walenski, and Swinney (2009) showed that slowed speech interferes with sentence-level processes involved in syntactic representations (e.g., determining pronoun reference) but improves processes that take into account semantic and pragmatic constraints.

Finally, this study indicates that effects of semantic knowledge on spoken word processing can plateau and eventually diminish if speech differs significantly from what is typical (e.g., accented speech with a very slow rate), suggesting that if speech becomes harder to process, semantic effects increase, but only up to a point. Nonetheless, we have observed the plateau for our semantic effect only in one condition (the accented speech presented at a very slow rate), so it is important to further test whether the constraint on use of semantics generalizes to other types of atypical or difficult-to-process speech. For instance, one suggestion would be to compare speakers of the same accent but with different degrees of accentedness. In this scenario, semantic effects for the heavily accented speech might be smaller than those for the less accented speech.

Disclosure statement

Authors have no financial benefit from the direct application of the research presented in this paper.

Appendix A

Appendix B

Appendix C

References

Acheson

D. J.

, & Hagoort

(2013). Stimulating the Brain's language network: Syntactic ambiguity resolution after TMS to the inferior frontal gyrus and middle temporal gyrus. Journal of Cognitive Neuroscience, 25(10), 1664–1677. doi:10.1162/jocn

Adank

(2012). The neural bases of difficult speech comprehension and speech production: Two Activation Likelihood Estimation (ALE) meta-analyses. Brain and Language, 122(1), 42–54. doi:10.1016/j.bandl.2012.04.014

Adank

, & Janse

(2009). Perceptual learning of time-compressed and natural fast speech. The Journal of the Acoustical Society of America, 126(November), 2649–2659. doi:10.1121/1.3216914

Allopenna

P. D.

, Magnuson

J. S.

, & Tanenhaus

M. K.

(1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38(38), 419–439. doi:10.1006/jmla.1997.2558

Baayen

R. H.

, Davidson

D. J.

, & Bates

D. M.

(2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. doi:10.1016/j.jml.2007.12.005

Baayen

R. H.

, & Milin

(2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12–28.

Baayen

, Piepenbrock

, & van Rijn

(1993). The CELEX lexical database (CD-ROM). Philadelphia: Linguistic Data Consortium.

Baese-Berk

M. M.

, Bradlow

A. R.

, & Wright

B. a.

(2013). Accent-independent adaptation to foreign accented speech. The Journal of the Acoustical Society of America, 133(February), EL174–80. doi:10.1121/1.4789864

Baese-Berk

M. M.

, Heffner

C. C.

, Dilley

L. C.

, Pitt

M. a.

, Morrill

T. H.

, & McAuley

J. D.

(2014). Long-Term Temporal Tracking of Speech Rate Affects Spoken-Word Recognition. Psychological Science, 25, 1546–1553. doi:10.1177/0956797614533705

10.

Barr

D. J.

(2013). Random effects structure for testing interactions in linear mixed-effects models. Frontiers in Psychology, 4, 328, 1–2. doi:10.3389/fpsyg.2013.00328

11.

Barr

D. J.

, Levy

, Scheepers

, & Tily

H. J.

(2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. doi:10.1016/j.jml.2012.11.001

12.

Bates

, Kliegl

, Vasishth

, & Baayen

(2015). Parsimonious mixed models.

13.

Bates

, Maechler

, Bolker

& Walker

(2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. doi:10.18637/jss.v067.i01

14.

Borenstein

, Higging , & Rothstein (2011). Comprehensive meta-analysis version 2.2.064 [Software]. Retrieved from http://www.meta-analysis.com/index.php

15.

Bradlow

A. R.

, & Bent

(2008). Perceptual adaptation to non-native speech. Cognition, 106(2), 707–729. doi:10.1016/j.cognition.2007.04.005

16.

Bradlow

A. R.

, & Pisoni

D. B.

(1999). Recognition of spoken words by native and non-native listeners: Talker-, listener-, and item-related factors. Journal of Acoustical Society, 106(4), 2074–2085.

17.

Brysbaert

, & New

(2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. doi:10.3758/BRM.41.4.977

18.

Byrd

, & Tan

C. C.

(1996). Saying consonant clusters quickly. Journal of Phonetics, 24, 263–282. doi:10.1006/jpho.1996.0014

19.

Clarke

C. M.

, & Garrett

M. F.

(2004). Rapid adaptation to foreign-accented English. The Journal of the Acoustical Society of America, 116(January), 3647–3658. doi:10.1121/1.1815131

20.

Cree

, McRae

, & McNorgan

(1999). An attractor model of lexical conceptual processing: Simulating semantic priming. Cognitive Science, 23(3), 371–414. doi:10.1016/S0364-0213(99)00005-1

21.

Connine

C. M.

, Blasko

D. G.

, & Titone

(1993). Do the beginnings of spoken words have a special status in auditory word recognition? Journal of Memory and Language, 32, 193–210.

22.

Cutler

, Mehler

, Norris

, & Segui

(1987). Phoneme identification and the lexicon. Cognitive Psychology, 19(2), 141–177.

23.

Dahan

, Magnuson

J. S.

, Tanenhaus

M. K.

, & Hogan

E. M.

(2001). Subcategorical mismatches and the time course of lexical access: Evidence for lexical competition. Language and Cognitive Processes, 16, 507–534. doi:10.1080/01690960143000074

24.

Dilley

L. C.

, & Pitt

M. A.

(2010). Altering context speech rate can cause words to appear or disappear. Psychological Science, 21(11), 1664–1670. doi:10.1177/0956797610384743

25.

Dupoux

, & Green

(1997). Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. Journal of Experimental Psychology. Human Perception and Performance, 23(3), 914–927. doi:10.1037/0096-1523.23.3.914

26.

Eckert

M. a.

, Walczak

, Ahlstrom

, Denslow

, Horwitz

, & Dubno

J. R.

(2008). Age-related effects on word recognition: Reliance on cognitive control systems with structural declines in speech-responsive cortex. JARO - Journal of the Association for Research in Otolaryngology, 9, 252–259. doi:10.1007/s10162-008-0113-3

27.

Eimas

P.D.

, Marcovitz Hornstein

S.B.

, & Payton

(1990). Attention and the role of dual codes in phoneme monitoring. Journal of Memory and Language, 29, 160–180.

28.

Federmeier

K. D.

(2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44, 491–505. doi:10.1111/j.1469-8986.2007.00531.x

29.

Fosler-Lussier

, & Morgan

(1999). Effects of speaking rate and word frequency on pronunciation in conversational speech. Speech Communication, 29, 137–158.

30.

Graddol

(1997). The future of the english? The British Council, 14(1), 61. doi:10.1017/S0266078400000754

31.

Grondin

, Lupker

S. J.

, & McRae

(2009). Shared features dominate semantic richness effects for concrete concepts. Journal of Memory and Language, 60(1), 1–19. doi:10.1016/j.jml.2008.09.001

32.

Hargreaves

I. S.

, Pexman

P. M.

, Johnson

J. C.

, & Zdrazilova

(2012). Richer concepts are better remembered: Number of features effects in free recall. Frontiers in Human Neuroscience, 6(April), 73. doi:10.3389/fnhum.2012.00073

33.

Imai

I. S.

, Flege

J. E.

, & Walley

(2003). Spoken word recognition of accented and unaccented speech: Lexical factors affecting native and non-native listeners. 15^th ICPhS Barcelona, 845–848.

34.

Kučera

, & Francis

W. N.

(1967). A computational analysis of present day American English. Providence, RI: Brown University Press.

35.

Kuznetsova

, Brockhoff

, & Rune

H. B. C.

(2015). lmerTest: Tests in linear mixed effects models. Retrieved from https://CRAN.R-project.org/package=lmerTest

36.

Labov

, Ash

, & Boberg

(2006). The Atlas of North American English. Berlin: Mouton de Gruyter. doi:10.1515/9783110206838

37.

Love

, Walenski

, & Swinney

(2009). Slowed speech input has a differential impact on on-line and off-line processing in children's comprehension of pronouns. Journal of Psycholinguistic Research, 38, 285–304. doi:10.1007/s10936-009-9103-9

38.

Mallinson

(1986). Rumanian. London: Croom Helm Ltd.

39.

Maye

, Aslin

R. N.

, & Tanenhaus

M. K.

(2008). The weckud wetch of the wast: Lexical adaptation to a novel accent. Cognitive Science, 32, 543–562. doi:10.1080/03640210802035357

40.

McRae

, Cree

G. S.

, Seidenberg

M. S.

, & McNorgan

(2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4), 547–59. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/16629288

41.

McRae

, de Sa

V. R.

, & Seidenberg

M. S.

(1997). On the nature and scope of featural representations of word meaning. Journal of Experimental Psychology. General, 126(2), 99–130. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9163932

42.

Miller

J. L.

, & Volaitis

L. E.

(1989). Effect of speaking rate on the perceptual structure of a phonetic category. Perception & Psychophysics, 46(6), 505–512. doi:10.3758/BF03208147

43.

Mirman

, & Magnuson

J. S.

(2009). Dynamics of activation of semantically similar concepts during spoken word recognition. Memory & Cognition, 37(7), 1026–1039. doi:10.3758/MC.37.7.1026

44.

Mirman

, McClelland

J.L.

, Holt

L.L.

& Magnuson

J.S.

(2008). Effects of attention on the strength of lexical influences in speech perception: Behavioral experiments and computational mechanisms. Cognitive Science, 32, 398–417.

45.

Morey

R. D.

(2008). Confidence intervals from normalized data: A correction to Cousineau (2005). Tutorials in Quantitative Methods for Psychology, 4(2), 61–64. doi:10.3758/s13414-012-0291-2

46.

Munro

M. J.

, & Derwing

T. M.

(1995). Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech. Language and Speech, 38, 289–306. doi:10.1177/002383099503800305.

47.

Munro M. J., & Derwing T. M. (1998). The effects of speaking rate on listener evaluations of native and foreign-accented speech. Language Learning, 48(2), 159–182.

48.

Munro M. J., & Derwing T. M. (2001). Modeling perceptions of the accentedness and comprehensibility of L2 speech. Studies in Second Language Acquisition, 23(4), 451–468.

49.

Pexman P. M., Hargreaves I. S., Siakaluk P. D., Bodner G. E., & Pope (2008). There are many ways to be rich: Effects of three measures of semantic richness on visual word recognition. Psychonomic Bulletin & Review, 15(1), 161–167. doi:10.3758/PBR.15.1.161

50.

Pexman P. M., Holyk G. G., & Monfils M.-H. (2003). Number-of-features effects and semantic processing. Memory & Cognition, 31(6), 842–855.

51.

Pexman P. M., Lupker S. J., & Hino (2002). The impact of feedback semantics in visual word recognition: Number-of-features effects in lexical decision and naming tasks. Psychonomic Bulletin & Review, 9(3), 542–549. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12412895

52.

Pittman (2008). The Romanian Accent in English. University of Mississippi Press.

53.

Plaut, & Shallice (1993). Deep Dyslexia - a case-study of connectionist neuropsychology. Cognitive Neuropsychology, 10(5), 377–500. doi:10.1080/02643299308253469

54.

Rabovsky, Sommer, & Abdel Rahman (2012). The time course of semantic richness effects in visual word recognition. Frontiers in Human Neuroscience, 6(February), 11, 1–9. doi:10.3389/fnhum.2012.00011

55.

Radeau, Morais, Mousty, & Bertelson (2000). The effect of speaking rate on the role of the uniqueness point in spoken word recognition. Journal of Memory and Language, 42(3), 406–422. doi:10.1006/jmla.1999.2682

56.

R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/

57.

Sajin S. M., & Connine C. M. (2014). Semantic richness: The role of semantic features in processing spoken words. Journal of Memory and Language, 70, 13–35. doi:10.1016/j.jml.2013.09.006

58.

Schneider, Eschman, & Zuccolotto (2002). E-Prime reference guide. Pittsburgh: Psychology Software Tools.

59.

Shatzman K. B., & McQueen J. M. (2006). Segment duration as a cue to word boundaries in spoken-word recognition. Perception & Psychophysics, 68(1), 1–16. doi:10.3758/BF03193651

60.

Sidaras S. K., Alexander J. E. D., & Nygaard L. C. (2009). Perceptual learning of systematic variation in Spanish-accented speech. The Journal of the Acoustical Society of America, 125(5), 3306–3316. doi:10.1121/1.3101452

61.

Sohoglu, Peelle J. E., Carlyon R. P., & Davis M. H. (2012). Predictive top-down integration of prior knowledge during speech perception. Journal of Neuroscience, 32(25), 8443–8453. doi:10.1523/JNEUROSCI.5069-11.2012

62.

Tremblay (2015). LMERConvenienceFunctions: Model Selection and Post-hoc Analysis for (G)LMER Models (R package version 2.10). Retrieved from http://CRAN.R-project.org/package=LMERConvenienceFunctions

63.

Tyler L. K., Moss H. E., Galpin, & Voice J. K. (2002). Activating meaning in time: The role of imageability and form-class. Language and Cognitive Processes, 17(5), 471–502. doi:10.1080/01690960143000290

64.

Tyler L. K., Voice J. K., & Moss H. E. (2000). The interaction of meaning and sound in spoken word recognition. Psychonomic Bulletin & Review, 7(2), 320–326. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10909140

65.

United States Census Bureau. (2011). The American Community Survey. Available at: http://www.census.gov/acs/www/

66.

Van Engen K. J., & Peelle J. E. (2014). Listening effort and accented speech. Frontiers in Human Neuroscience, 8(August), 1–4. doi:10.3389/fnhum.2014.00577

67.

Wild C. J., Yusuf, Wilson D. E., Peelle J. E., Davis M. H., & Johnsrude I. S. (2012). Effortful listening: The processing of degraded speech depends critically on attention. Journal of Neuroscience, 32(40), 14010–14021. doi:10.1523/JNEUROSCI.1528-12.2012

68.

Washington University Speech and Hearing Lab Neighborhood Database (2012). Neighborhood search. http://www.psych.wustl.edu/sommers/

69.

Wingfield, Tun P. A., Koh C. K., & Rosen M. J. (1999). Regaining lost time: Adult aging and the effect of time restoration on recall of time-compressed speech. Psychology and Aging, 14(3), 380–389. doi:10.1037/0882-7974.14.3.380

70.

Yap M. J., Pexman P. M., Wellsby, Hargreaves I. S., & Huff M. J. (2012). An abundance of riches: Cross-task comparisons of semantic richness effects in visual word recognition. Frontiers in Human Neuroscience, 6(April), 72. doi:10.3389/fnhum.2012.00072

71.

Yap M. J., Tan S. E., Pexman P. M., & Hargreaves I. S. (2011). Is more always better? Effects of semantic richness on lexical decision, speeded pronunciation, and semantic classification. Psychonomic Bulletin & Review, 18(4), 742–750. doi:10.3758/s13423-011-0092-y

72.

Yee, Huffstetler, & Thompson-Schill S. L. (2011). Function follows form: Activation of shape and function features during object identification. Journal of Experimental Psychology: General, 140(3), 348–363. doi:10.1037/a0022840

73.

Zwitserlood (1989). The locus of the effects of sentential-semantic context in spoken-word processing. Cognition, 32(1), 25–64. doi:10.1016/0010-0277(89)90013-9