Processing wh-filler-gap dependencies

Abstract

We present experimental evidence showing that different wh-filler-gap dependencies are processed differently, depending on their syntactic licensors. Our studies compared the active storage profiles for why, how, and who (serving as subject or object of the verb). The results of offline and online experiments revealed that these wh-fillers are stored in memory for different durations, and predictably so based on the hypothesised structural distance between each wh-filler and the licensor which determines its grammatical and interpretive functions. Furthermore, the results showed that once the wh-filler is licenced, it is integrated into the current structure and no longer engenders additional memory costs. Based on these findings, we argue that the mechanism of online sentence processing may employ both storage and integration components in memory.

Keywords

Filler-gap dependencies; processing; wh-fillers; syntax; storage; active maintenance; sentence processing; memory

Introduction

A prominent feature of human language is its display of dependencies that span across unbounded distances (Chomsky, 1977). Sentence (1) is an example of such a dependency.

(1) What did Jason eat yesterday?

A Wh-Filler-Gap Dependency (WhFGD) involves two elements: a wh-filler, like what in (1), and its grammatical licensor. The grammatical properties of the wh-filler (e.g., whether it relates to subject or object position, whether it bears nominative or accusative case) and its interpretation (e.g., whether it questions the agent or patient roles of the event described by the clause) are controlled by its grammatical licensor (see Huddleston & Pullum, 2005, p. 1079, for the description of this type of dependency relation). In (1), the wh-filler is licenced by the verb eat: what functions grammatically as the direct object of that verb, and is interpreted as ranging over the patient of the event described by the verb (i.e., the thing eaten). Evidence of the close-knit relationship between what and the verb in (1) can be found by filling the surface object position with some other distinct expression, and observing that this uniformly leads to judgements of unacceptability (e.g., *What did the student in the playground eat sushi?; Chomsky, 1981). The wh-filler is thus occasionally referred to as the “dependent element” and its licensor as the “controller.”¹

The possible distance between the wh-filler and its licensor has no necessary limit, and so constructions like (1) are often described as instantiating “unbounded dependencies” (McElree et al., 2003; Ross, 1984). The existing literature provides a useful framework within which to talk about how WhFGDs are processed. Wagers and Phillips (2014) identified three processes involved. First, upon encountering a wh-filler, the comprehender’s syntactic parser recognises the presence of a WhFGD. Second, the wh-filler is stored in memory until its licensor is encountered. Third, the filler is integrated into the existing structural representation (Wagers & Phillips call this a “retrieval event”).² Gibson (1998) and Gibson and Warren (2004) furthermore identified two sources of processing complexity induced by WhFGDs: (a) Storage cost: holding an open dependency is difficult, and therefore processing of material within an open dependency is more costly than processing the same material when it is not within an open dependency; (b) Integration cost: the longer the filler is stored in memory, the harder it is to integrate into the structure.

The processing of WhFGDs involves these component processes due to their grammatical properties. A wh-filler cannot be interpreted unless it has a licensor; and, often, a wh-filler and its licensor are not adjacent to one another. Therefore, a wh-filler needs to be stored until the parser encounters the appropriate licensor. Given that the wh-filler does not dictate the structural location of its licensor, and given that the distance between the two is potentially unbounded, the filler must be stored over a potentially long period regardless of any concomitant cost to memory resources. The WhFGD will no longer impact memory resources once it is integrated into the current parse and interpreted (Gibson, 1998, 2000; Wagers & Phillips, 2014; Wanner & Maratsos, 1978). The intrinsic cost of storing a wh-filler, then, could serve as a strong motivation for a comprehender to complete the dependency as soon as possible, which has been observed in a number of studies related to active gap filling (Chacón et al., 2016; Crain & Fodor, 1985; Frazier, 1987; Gibson, 1998; Gibson & Warren, 2004; Stepanov & Stateva, 2015; Stowe, 1986; Wagers et al., 2015; Wagers & Phillips, 2014; Wanner & Maratsos, 1978).

These properties of WhFGD processing raise the interesting question of how different kinds of WhFGDs are processed (de Vincenzi, 1991; Frazier, 1987; Gibson, 1998, 2000; Stepanov & Stateva, 2015; Wagers & Phillips, 2014; Wanner & Maratsos, 1978). Grammatically, different WhFGD constructions have different dependency lengths, which is to say that they reliably resolve at different structural distances from their licensors. This study investigates how wh-fillers such as who (serving as subject or object of the verb), how, and why are processed online, paying special attention to the storage and integration components of the mechanism of online dependency formation. If both storage and integration costs are at play in the mechanism of sentence processing, we expect that the processing should incur costs between a filler and its licensor because the filler has to be maintained in memory (i.e., storage costs; Chen et al., 2005) and there is an associated difficulty with completing the dependency (i.e., integration costs; de Vincenzi, 1991; Grodner & Gibson, 2005; Warren & Gibson, 2002). Our results provide supporting evidence for storage and integration components in the mechanism of online sentence processing (Gibson, 1998; Gibson & Warren, 2004; Kim et al., 2020; Ness & Meltzer-Asscher, 2017, 2019; Wagers & Phillips, 2014; Wanner & Maratsos, 1978).

The processing of wh-filler-gap dependencies

Grammatical properties of wh-fillers

Different wh-fillers enter dependency relations with different licensors (Chapman & Kučerová, 2016; Chomsky, 1986; Huang, 1982; Lasnik & Saito, 1994; Stepanov & Tsai, 2008). What is constant across these wh-fillers, we observe, is that their surface sentence-initial position marks their presence at the syntactic level CP.³ The difference between them lies not in where they surface but in where they are licenced. Differences in dependency length for a WhFGD are thereby captured in the syntactic theory in terms of the distance between the CP and the position of the licensor. Thus, we begin with the structural positions of different wh-phrases, briefly reviewing the uncontroversial cases of who and how, and spending a little more time on the relatively more abstract case of why.

First, the wh-filler who can function as a subject or object, and it is interpreted roughly as ranging over the agent or patient roles, respectively, of the event described by the verb. These distributional and interpretive facts are captured as follows: when who functions as the subject (who_subj), a grammatical dependency is established between its initial position in the TP and its surface position in the CP, as shown in (2a) (in this article, we refer to the lower position of the dependency as “the gap”). When who functions as the object (who_obj), a dependency is formed between CP and its initial position as sister to V, as shown in (2b).

(2)

The wh-filler how, interpreted as ranging over manners in which the agent carries out the described event, functions as an adjunct of VP (Ernst, 2001, 2007; Jackendoff, 1972). Dependency formation with how, in this case, links VP with CP, as shown in (3). Independent support comes from the distribution of manner adverbs, which have been shown to appear at the VP edge (e.g., You have harshly/mildly criticised Mary; Ernst, 2001; Jackendoff, 1972).

(3)

The case of why is more subtle. Sentences involving why are known to give rise to distinct “reason” and “purpose” interpretations, typically captured in terms of structural ambiguity (Bale, 2007; Bromberger, 1992; Chapman & Kučerová, 2016; Kawamura, 2007; Ko, 2005; Stepanov & Tsai, 2008; Tsai, 2008; Yoshida et al., 2015). In general, the two different readings of why can be distinguished by the form of their answers (Chapman & Kučerová, 2016; Li & Kim, 2022), as shown in (4). A response such as that in (4a) indicates that the why-question is interpreted as a request to know the reason the agent performed the described action. By contrast, a response such as that in (4b) indicates that the why-question is interpreted as a request to know the purpose for which the agent performed the described action:

(4) Why did you criticise Mary?

a. Because she deserved it. (reason why)

b. In order to demonstrate my critical-thinking skills. (purpose why)

In structural terms, reason why (why_R) forms a dependency with TP, (5a), while purpose why (why_P) forms a dependency with VP, (5b) (Chapman & Kučerová, 2016; Ko, 2005; Stepanov & Tsai, 2008; Tsai, 2008; Yoshida et al., 2015). Semantically, this analysis dovetails nicely with the idea that purpose adverbs relate to events (association with VP; Kawamura, 2007), whereas reason adverbs relate to propositions (association with TP; see Bale, 2007; Bromberger, 1992; Chapman & Kučerová, 2016; Kawamura, 2007; Ko, 2005; Stepanov & Tsai, 2008; Tsai, 2008; Yoshida et al., 2015). The intended interpretation for why may not be evident in an out-of-the-blue context. However, fixing one of the two interpretations involves representing different dependency lengths, with one being shorter than the other:

(5)

Importantly for our purposes, this syntactic analysis, in combination with general expectations about how dependency lengths relate to online processing effects, makes clear predictions. Specifically, we expect to observe differential preferences between the two interpretations of why in out-of-the-blue contexts or in the context of questions such as (4) that leave both answering options available. In particular, given that the dependency length for why_R is shorter than that for why_P, readers may be more inclined to interpret why as why_R in such cases.

However, some syntactic contexts can differentiate between the two senses of why (in Section “Experiments 1b and 1c,” we report the results of a rating experiment supporting this claim). For example, why-questions have been suggested to interact differently with sentential negation, depending on how they are interpreted. In general, the acceptability of an adjunct WhFGD is degraded when it spans sentential negation (i.e., so-called “negative islands”; see also Abrusán, 2011; Cinque, 1990; Rizzi, 1990; Ross, 1984; Rullmann, 1995). As we have seen, why_P, comparable to how, involves dependency-formation between the VP and CP. Since VP is structurally lower than sentential negation, we should therefore expect degraded acceptability for both how and why_P with sentential negation, but we would not expect this for why_R, because it involves dependency-formation between phrases that are structurally higher than negation (CP and TP; Chapman & Kučerová, 2016; Yoshida et al., 2015). Now consider (6): it appears that sentential negation is unacceptable with embedded how, (6a); with embedded why, the only available interpretation concerns reasons, (6b) (Chapman & Kučerová, 2016; Stepanov & Tsai, 2008; Yoshida et al., 2015). The structures underlying (6a) and (6b) are shown in (6c) and (6d), respectively:

(6) (a) Q: I know how you did not eat pizza.

A: *Quickly.

(b) Q: I know why you did not eat pizza.

A: Because I was too full.

A’: *In order to not get ill.

Thus, the different dependency lengths of why_R and why_P appear well-supported by their difference in sensitivity to negative islands (Bale, 2007; Bromberger, 1992; Chapman & Kučerová, 2016; Kawamura, 2007; Ko, 2005; Stepanov & Tsai, 2008; Tsai, 2008; Yoshida et al., 2015): why_R dependencies relate TP to CP (see Miyagawa, 2004), and do not span across sentential negation; why_P dependencies, in contrast, relate VP to CP and span across sentential negation.
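The negative-island reasoning above can be restated as a small sketch. It assumes a simplified clausal spine in which sentential negation (labelled NegP here, a label that does not appear in the text) sits between TP and VP. The names SPINE, LICENSOR, and spans_negation are illustrative only and are not part of the cited analyses.

```python
# Simplified clausal spine, ordered from highest to lowest position.
# "NegP" is an assumed shorthand for sentential negation.
SPINE = ["CP", "TP", "NegP", "VP"]

# Licensing sites for the adjunct wh-fillers, as described in the text.
LICENSOR = {"why_R": "TP", "why_P": "VP", "how": "VP"}


def spans_negation(filler: str) -> bool:
    """Return True if the dependency from CP to the licensor crosses negation."""
    return SPINE.index(LICENSOR[filler]) > SPINE.index("NegP")


# Fillers predicted to show degraded acceptability under sentential negation.
predicted_degraded = sorted(f for f in LICENSOR if spans_negation(f))
print(predicted_degraded)
```

Under these assumptions, only the VP-licensed fillers (how and why_P) are flagged, which matches the pattern described for (6).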

Taken together, the data sketched in this section support a representational theory in which different wh-fillers are linked at different syntactic lengths depending on the location of their licensor. The question we are interested in is whether these differences are predictive with respect to the time-course and cost of processing WhFGDs.

Processing of wh-fillers

How are the wh-fillers with different licensors processed? Different WhFGDs have different dependency lengths. If wh-fillers need to be stored in memory until their licensors are recognised, different wh-fillers may incur differing processing costs as they are held in memory for different durations.

Adopting models of sentence processing in which storage and integration are involved in the processing of dependencies (Gibson, 1998; Gibson & Warren, 2004), we expect dependency length to predict both storage and integration costs. If the dependency is longer, we expect that the reader needs to hold the wh-phrase for a longer duration, and we expect concomitant integration costs; the reverse is true for a shorter dependency. This length effect could be due to the introduction of new discourse referents throughout the dependency (Gibson, 1998) or due to decay (Lewis & Vasishth, 2005). Furthermore, once the relevant dependency is constructed, we do not expect to see the storage or integration cost associated with that dependency beyond the point of its construction.

Models of sentence processing that incorporate storage and integration allow us to empirically investigate the processing cost that may arise between the wh-filler and the licensor, and the cost that arises at the point where the wh-filler is integrated into the existing structure (Chen et al., 2005; Gibson, 1998; Gibson & Warren, 2004; Grodner & Gibson, 2005; Stepanov & Stateva, 2015). Thus, such theories could potentially help us to investigate the time-course of filler-gap dependency formation in detail.

One way to calculate storage costs is to count the number of open dependencies (Gibson, 1991, 1998). By counting “open dependencies,” we mean counting the first member of any dependency that is held awaiting its licensor, an idea formulated most clearly in Gibson and Warren (2004, p. 63). Among the ways of calculating complexity effects (e.g., Lewis & Vasishth, 2005; see also Hale, 2016 for a recent survey), Gibson’s (1998, 2000) proposal is the most useful for our purposes.⁴ We provide arguments for this view below.

As Gibson and Warren (2004) suggested, if the time-course of online sentence processing is influenced by storage, we could expect reading time slowdowns associated with the storage cost in regions lying in the middle of a WhFGD. Comparing a sentence that involves a WhFGD with one that does not involve such a dependency, Gibson and Warren (2004) observed that regions lying between the wh-filler and the gap were read more slowly than the corresponding regions in sentences that did not involve a WhFGD.

In a previous study, Stepanov and Stateva (2015) adopted a similar approach and examined storage cost effects for VP-level wh-filler modifiers such as how quickly and why in English. Filler-gap dependencies with wh-adjuncts are processed similarly to those with wh-arguments, despite the lack of thematic or subcategorisation information associated with the verb. The results of their study revealed that not all adjuncts induce similar storage and integration costs because their dependencies are resolved at different positions. Specifically, how quickly needs to be stored in memory until the integration point located in the VP domain is encountered, whereas why, assuming it is why_R, does not need to be stored in memory because its integration site overlaps with the point at which why is encountered, in line with what we have explained above with regard to the different resolution sites. Based on these findings, they argued that the storage costs were linked to the number of incomplete syntactic heads (or incomplete phrase structure rules), and could be traced to the syntactic position of the licensors of different wh-fillers (Gibson, 1998, 2000). Our study further investigates whether different dependency lengths create different storage costs in the middle of the dependency, at the most deeply embedded noun.

Processing material that lies within an open dependency, between the filler and its licensor, is expected to be harder than processing the same material when it is not within an open dependency, but this difficulty should be reduced once the WhFGD is completed and integrated into the current structure (Gibson, 1998; Gibson & Warren, 2004). Following this logic, it is possible to observe differences in processing complexity with respect to the length of the dependency. To examine the effects of storage costs, we investigate a site between the filler and its licensor. In principle, any site between the filler and the licensor should show the storage costs. However, to increase the chances of finding any effects of storage, we use a complex structure that will put a strain on participants’ memory resources.

The key to testing such effects for our purposes is to include a complex domain such as a centre-embedded relative clause that increases the structural distance within WhFGDs. Example (7) is an object-gapped relative clause modifying a subject NP, a type of centre-embedded relative clause configuration known to independently induce a processing cost (Abney & Johnson, 1991; Chomsky, 1964; Chomsky & Miller, 1963; Gibson, 1991, 1998, 2000; Gibson & Thomas, 1999):

(7) [_TP [_NP The babysitter [_CP that the children loved]] handed a toy to the baby].

Considering (7), there are at least three open dependencies at the most deeply embedded noun children: the dependency between [_NP the babysitter] and the main verb, handed; that between [_NP the children] and the embedded verb, loved; and the prediction of a gap in the object position within the RC. In theories of processing complexity wherein open dependencies are associated with storage costs, and with the number of open dependencies as a predictor (Chen et al., 2005; Gibson, 1998, 2000; Grodner & Gibson, 2005; Kaan & Stowe, 2002; Kluender & Kutas, 1993; Stepanov & Stateva, 2015), the processing complexity for centre-embedded clauses is expected to be highest at the most deeply embedded position, at the word children in the present example.

Let us consider what is expected if a centre-embedded relative clause such as that in (7) is added in the context of different WhFGD constructions. When a wh-filler is encountered during online processing, it signals the presence of a dependency that remains open until it is linked to an appropriate licensor. Encountering a relative clause, as in (7), before the dependency is completed should incur an additional processing cost. Given that different wh-fillers have different licensors, with concomitant differences in dependency length, different WhFGDs should thereby incur different costs along with the centre-embedded relative clause. These costs should be detectable at the most deeply embedded noun position (children) in the relative clause, because the highest processing cost is predicted there. Storage costs are expected throughout the embedded subject (the babysitter that the children loved), and measuring them on the most embedded noun (children) increases the chances of observing measurable costs.

To state our predictions more exactly, let us walk through the hypothesised representation of each WhFGD construction, now including the complex-embedded NP. First, in (8), why_R forms a dependency with TP,⁵ and results in three open dependencies at the point of the most deeply embedded noun children. This is because the dependency between why_R and TP is completed even before the complex domain is encountered, leaving only the dependency between [_NP the babysitter] and the third verb, that between [_NP the children] and the embedded verb loved, and the prediction of a gap in the object position within the relative clause:

Second, why_P and how are both licensed by the VP, and there are four open dependencies at the point of children, as shown in (9):

In the construction involving who_obj, which is linked to the verb as in (10a), there are also four open dependencies at the point of children. By contrast, the construction involving who_subj involves three open dependencies at the point of children (Browning, 1987; Contreras, 1993; Stowell, 1985):

Thus, if the number of open dependencies predicts processing complexity, we expect that complexity associated with the processing at the point of children will be higher in why_P, how, and who_obj constructions relative to why_R and who_subj constructions.⁶ Given that the processing complexity is associated with a reading time slowdown in real-time sentence processing (Gibson, 1998; Gibson & Warren, 2004), we expect that the most deeply embedded noun children in the why_P, how, and who_obj constructions will be read significantly more slowly than in the why_R and who_subj constructions. Furthermore, we expect integration costs when attempting to complete the dependency (de Vincenzi, 1991; Grodner & Gibson, 2005; Warren & Gibson, 2002); integration costs will be higher for who_obj and why_P compared with how, why_R, and who_subj constructions at the third verb (handed) position.
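The tallies described above can be summarised in a short sketch. It treats a wh-dependency as already closed at the noun children whenever its licensing site is TP, and as still open otherwise; this binary simplification, together with the names BASELINE_OPEN, LICENSOR, and open_dependencies_at_children, is an assumption made for illustration rather than a model taken from the cited work.

```python
# Dependencies open at "children" regardless of the wh-filler (from the text).
BASELINE_OPEN = (
    "the babysitter -> handed",
    "the children -> loved",
    "predicted object gap in the RC",
)

# Licensing site of each wh-filler, as assumed in the text.
LICENSOR = {
    "why_R": "TP",
    "why_P": "VP",
    "how": "VP",
    "who_obj": "V",
    "who_subj": "TP",
}

# Assumed simplification: TP-licensed fillers are integrated before the
# complex embedded subject is read, so they add no open dependency.
RESOLVED_BEFORE_SUBJECT = {"TP"}


def open_dependencies_at_children(filler):
    """List the dependencies still open at the most deeply embedded noun."""
    deps = list(BASELINE_OPEN)
    if LICENSOR[filler] not in RESOLVED_BEFORE_SUBJECT:
        deps.append(f"{filler} -> {LICENSOR[filler]}")
    return deps


counts = {f: len(open_dependencies_at_children(f)) for f in LICENSOR}
for filler, n in counts.items():
    print(filler, n)
```

On this tally, why_P, how, and who_obj each carry four open dependencies at children, whereas why_R and who_subj carry three, which is the contrast the reading-time predictions rest on.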

Given that why_R and why_P constructions are string ambiguous in English (i.e., the parser cannot hear the difference between reason and purpose why in this language; thus, it is typically an inferential matter), we face challenges in teasing apart any processing differences between them. To recap the expectations, why_R is linked with TP whereas why_P is linked with VP; therefore, why_P should lead to higher cost at the most deeply embedded noun. One challenge is that people may have parsing preferences for resolving this string-ambiguous element; assuming that the parser generally prefers to minimise dependency lengths to conserve memory resources (Gibson, 1998, 2000; Kazanina et al., 2007; Phillips, 2006; Pickering & Barry, 1991; Stowe, 1986), why_R, with its shorter dependency length, might be preferred. We keep this in mind in our studies, as any test of the why_R versus why_P processing distinctions needs to consider whether constructions intended with a why_P analysis are parsed as involving why_R, obscuring any differences that we otherwise expect to see.

Offline Experiments: 1a, 1b, and 1c

The purpose of Experiment 1a was to show that why_R had a shorter dependency length than other wh-fillers (how and why_P) by examining their sensitivity to negative islands. If why_P moves from VP to CP, the why_P–VP dependency spans sentential negation, and we can expect a negative island effect. If why_R moves from TP to CP, then the why_R–TP dependency does not span sentential negation and no negative island effect is expected. In light of this, Experiment 1a probes the distinct structural positions of why_R and why_P, with different ratings reflecting the acceptability of distinct structures in a negative island context. The longer dependency formed by why_P spans across negation, while the shorter dependency formed by why_R does not. If there are two potential dependencies formed by why, and if only why_P is sensitive to negative islands, then we should observe lower ratings for why_P than for why_R.

Two further acceptability rating experiments, Experiments 1b and 1c, were then conducted (Experiments 2a and 2b are their online counterparts). Experiment 1b tested sentences with who_obj, how, and why, and Experiment 1c tested sentences with who_obj, who_subj, and why. Our goal in these offline experiments was to check whether processing complexity (storage or integration costs) is manifested in offline ratings, on the assumption that greater complexity corresponds to lower ratings. If so, acceptability ratings would constitute another tool to test for storage effects.

Given that different wh-fillers have different licensors depending on their grammatical functions, with concomitant differences in dependency lengths, we expected measures of processing cost to track those differences. As a control, we included sentences with that instead of a WhFGD, as these constructions do not involve movement (Ross, 1984). In Experiment 1b, at the most deeply embedded N of an intervening centre-embedded RC (at the word children in Table 3), the complexity costs were expected to be highest for who_obj and how compared with that (control) and why. These differences in costs could potentially be reflected in the offline acceptability ratings. Experiment 1c tested for storage effects with who_subj, who_obj, and why. At the most deeply embedded N of an RC (at the word children in Table 4), the complexity was expected to be highest for who_obj, with no differences between who_subj, why, and that.

Experiment 1a

Previous studies have shown that there are two different whys (Stepanov & Tsai, 2008), one licenced by the TP and the other by the VP. Experiment 1a investigated whether the distinction between reason why and purpose why in relation to their distinct licensors in TP and VP could be observed directly (Stepanov & Tsai, 2008).

We tested the acceptability ratings of Question–Answer (Q/A) pairs for different wh-fillers in a negative island context.⁷ The answers for why were divided between those with because-clauses (indicating why_R) and those with in order to-clauses (indicating why_P). Owing to their different licensors, we expected that why_P should be sensitive to negative islands and why_R should be insensitive to negative islands. This is because the former moves from a VP-adjoined position to the CP, spanning negation, whereas the latter moves from the TP, which does not involve spanning negation. Given the relationship between the types of answers and types of why dependency, we expected that Q/A pairs with because-clause answers would elicit higher acceptability ratings than those with in order to-clause answers in negative islands. Examples of such questions embedding negative islands are illustrated in (11a) for why_R and (11b) for why_P:

(11) (a) A: Why didn’t Mary criticise John?

B: Because his answer was correct.

(b) A: Why didn’t Mary criticise John?

B: In order to look nice.

The finding that acceptability differences are modulated by answer type⁸ would provide evidence for a representational theory in which why-questions in English can have distinct structural analyses. Meanwhile, in the absence of such a representational distinction, acceptability differences between examples such as (11a) and (11b) are not straightforwardly predicted, although there may be pragmatic sources contributing to a preference for why_R relative to why_P.

Participants, materials, and design

For this experiment, 43 native English speakers from Northwestern University participated and gave informed consent. They were granted one credit for introductory linguistics classes taught at Northwestern University.

The critical items consisted of 32 sentence sets in a 1 × 4 within-subjects design with four conditions: how, why_R, why_P, and that (control). In addition to the experimental items, we included 32 filler sentences that involved manipulations orthogonal to the current ones. A sample set of stimuli is shown in Table 1.

Table 1.

Sample stimuli for Experiment 1a.

Wh-filler	Example
how	A: How didn’t Mary criticise John? B: Harshly.
that	A: Didn’t Mary criticise John? B: No, she didn’t.
reason why (why_R)	A: Why didn’t Mary criticise John? B: Because his answer was correct.
purpose why (why_P)	A: Why didn’t Mary criticise John? B: In order to look nice.

Procedure

Stimuli were presented on a desktop PC using the Linger software (Rohde, 2003). For each stimulus, participants read the sentence on a desktop screen, and were directed to rate the sentences from 1 to 7 on the basis of their acceptability (1: totally unacceptable; 7: totally acceptable). To familiarise the participants with the rating process, five practice items were presented prior to presenting the actual experimental items.

Analysis

Data were analysed using cumulative logit models performed with the ordinal package (Christensen, 2019) in R version 3.2.3 (Baayen, 2008; Baayen et al., 2008; Bates et al., 2014).

Results and discussion

The mean acceptability scores are shown in Figure 1, and the ordinal mixed-effect model outputs (Estimate, SE, z-value, and p-value from the drop1 function) are listed in Table 2. Overall, the results revealed a significant difference between the wh-conditions (how, why_R, and why_P) and that. They also revealed a significant difference between how and the two why conditions (why_R and why_P). We also found a significant difference between why_R and why_P, which provides evidence that why_R does not involve a negation-spanning dependency (i.e., it shows no sensitivity to negative islands) whereas why_P does, supporting the representational theory (Li & Kim, 2022).

Figure 1.

Mean acceptability ratings from Experiment 1a.

Table 2.

Summary of ordinal mixed-effect model outputs in Experiment 1a.

Wh-fillers	Estimate	SE	z-value	p-value
how, why_R, and why_P vs. that	1.42	0.34	4.13***	p < .001***
how vs. why_R and why_P	2.75	0.37	7.45***	p < .001***
why_R vs. why_P	2.71	0.28	9.54***	p < .001***

*** indicates p < .001.

Experiments 1b and 1c

Experiments 1b and 1c examined the storage effects of different kinds of wh-fillers by placing centre-embedded RCs inside WhFGDs. As reviewed above, until a wh-filler is licenced, it participates in an open dependency. Interpolating a centre-embedded NP within the WhFGD introduces additional cost. Based on the representational differences of WhFGDs, we expect any measurable differences in cost to be predicted by the dependency lengths associated with each specific wh-filler and licensor.

Participants, materials, and design

For Experiments 1b and 1c, 45 native English speakers (Experiment 1b) and 24 speakers (Experiment 1c) from Northwestern University participated and provided their informed consent. They were granted one credit for introductory linguistics classes or were paid US$8/hr.

Critical items consisted of 24 sentence sets in a 1 × 4 design, in which four different conditions were generated: who_obj, how, that, and why (Experiment 1b) and who_obj, who_subj, that, and why (Experiment 1c). For Experiment 1c, we included who_subj in addition to who_obj, why, and that. The who_subj question sentences were about an altogether different verb (considered rather than loved in Table 4); that is, the dependency is resolved in a completely different clause. Sample sets of stimuli for Experiments 1b and 1c are presented in Tables 3 and 4, respectively.

Table 3.

Sample stimuli for Experiment 1b.

Wh-filler	Example
who_obj	The father considered who the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to.
how	The father considered how the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
that	The father considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
why	The father considered why the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.

Table 4.

Sample stimuli for Experiment 1c.

Wh-fillers	Examples
who_obj	The father considered who the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to.
who_subj	Who considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma?
that	The father considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
why	The father considered why the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.

To avoid participants encountering the same types of target items consecutively, we presented the items in a pseudo-randomised manner in a Latin-square design. In addition to the current experimental items, we included 32 filler sentences⁹ that involved orthogonal manipulations to the current ones.
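As an illustration only, the following Python sketch shows one way a Latin-square assignment and a pseudo-randomised presentation order of this kind could be generated. The function names, the random seed, and the specific ordering constraint (no two adjacent target items from the same condition) are assumptions made for the sketch; they do not describe the software actually used in the experiments.

```python
import random
from collections import Counter

# Condition labels follow Experiment 1b; everything else is hypothetical.
CONDITIONS = ["who_obj", "how", "that", "why"]


def latin_square_lists(n_items=24, conditions=CONDITIONS):
    """Build one presentation list per condition count (here 4 lists).

    Item i appears in list l under condition (i + l) mod 4, so each list
    contains every item once and each item rotates through all conditions
    across lists.
    """
    n = len(conditions)
    return [
        [(item, conditions[(item + lst) % n]) for item in range(n_items)]
        for lst in range(n)
    ]


def pseudo_randomise(targets, n_fillers=32, seed=0, max_tries=10_000):
    """Shuffle targets and fillers until no two adjacent trials are
    target items from the same condition (assumed constraint)."""
    rng = random.Random(seed)
    trials = list(targets) + [(f"filler{i}", None) for i in range(n_fillers)]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if all(a[1] is None or a[1] != b[1] for a, b in zip(trials, trials[1:])):
            return list(trials)
    raise RuntimeError("no valid order found within max_tries")


lists = latin_square_lists()
order = pseudo_randomise(lists[0])
```

Under this scheme each list contains six items per condition, and the 32 fillers give the shuffle enough room to separate same-condition targets.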

Procedure

The same procedure as in Experiment 1a was employed.

Analysis

Data were analysed using cumulative logit models fitted with the ordinal package (Christensen, 2019) in R version 3.2.3 (Baayen, 2008; Baayen et al., 2008; Bates et al., 2014). Each model used Helmert coding: for Experiment 1b, we compared (a) who_obj, how, and why with that as a baseline; (b) who_obj and how with why; and (c) who_obj with how. For Experiment 1c, we compared (a) who_obj, who_subj, and why with that as a baseline; (b) who_subj and why with who_obj; and (c) who_subj with why.¹⁰ All models contained the maximal random effects structure (Barr et al., 2013), which involved random intercepts for participants and items as well as random slopes for fixed effects, provided that the model converged. In cases where the model failed to converge, the random effects with the smallest variance were removed stepwise.
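To make the Helmert contrasts for Experiment 1b concrete, the sketch below builds one possible set of contrast weights in Python. The level ordering, the sign convention (group mean minus single level), and the scaling of the weights are assumptions chosen for illustration; the source does not report the exact numeric coding, and the function name is hypothetical.

```python
from fractions import Fraction


def helmert_contrasts(levels):
    """Return {level: [w_1, ..., w_(k-1)]} for k ordered levels.

    Contrast j compares the mean of all levels after position j
    (weight 1/(k-1-j) each) with the level at position j (weight -1).
    Levels before position j receive weight 0.
    """
    k = len(levels)
    table = {}
    for i, level in enumerate(levels):
        weights = []
        for j in range(k - 1):
            if i < j:
                weights.append(Fraction(0))
            elif i == j:
                weights.append(Fraction(-1))
            else:
                weights.append(Fraction(1, k - 1 - j))
        table[level] = weights
    return table


# Ordering chosen so that the three contrasts read:
# (a) who_obj, how, why vs. that; (b) who_obj, how vs. why; (c) who_obj vs. how
contrasts_1b = helmert_contrasts(["that", "why", "how", "who_obj"])
for level, weights in contrasts_1b.items():
    print(level, [str(w) for w in weights])
```

Each contrast column sums to zero and the columns are mutually orthogonal, which is the property that lets the three comparisons be estimated independently of one another in a balanced design.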

Results and discussion

The mean acceptability scores for Experiment 1b and 1c are shown in Figures 2 and 3, and the ordinal mixed-effect model outputs are shown in Tables 5 and 6. The results of Experiment 1b revealed no differences between who_obj, how, and why versus that. Furthermore, no differences between who_obj and how versus why were observed. But the results revealed a significant difference between who_obj and how. The relatively low acceptability of who_obj may reflect complexity effects: who_obj sentences reflect both the greatest number of open dependencies (four) and the longest duration to complete the last of those dependencies (at matrix V, later than matrix VP for how and why_P, or for why_R). Following an anonymous reviewer’s suggestions, we also directly compared that with the wh-phrases by employing t-tests. In Experiment 1b, we found significant differences between who_obj and that (β = −.44, SE = 0.11, t = −3.98) but no differences between why and that (β = .07, SE = 0.11, t = .62), which are consistent with our predictions. But we also found significant differences between how and that (β = .24, SE = 0.11, t = 2.15), where how was rated higher than that.¹¹

Figure 2.

Mean acceptability ratings from Experiment 1b.

Figure 3.

Mean acceptability ratings from Experiment 1c.

Table 5.

Summary of ordinal mixed-effect model outputs in Experiment 1b.

Summary of ordinal mixed-effect model
Wh-fillers	Estimate	SE	z-value	p-value
who_obj, how, and why vs. that	0.01	0.13	0.95	.34
who_obj and how vs. why	0.06	0.10	0.53	.59
who_obj vs. how	−0.52	0.13	−3.93 ***	8.57e-05***

*** indicates p < 0.001.

Table 6.

Summary of ordinal mixed-effect model outputs in Experiment 1c.

Summary of ordinal mixed-effect model
Wh-fillers	Estimate	SE	z-value	p-value
who_obj, who_subj, and why vs. that	0.39	0.16	2.48*	.01*
who_obj vs. who_subj and why	−0.49	0.16	−3.04**	.00**
who_subj vs. why	0.01	0.12	0.06	.95

* indicates p < 0.05.

** indicates p < 0.01.

In Experiment 1c, we found a significant difference between who_obj, who_subj, and why versus that. We also observed a significant difference between who_obj versus who_subj and why, but no difference between who_subj and why. Following an anonymous reviewer’s suggestion, we also directly compared that with the wh-phrases using t-tests in Experiment 1c. We found a marginally significant difference between who_obj and that (β = −.24, SE = 0.14, t = −1.70) but no differences between who_subj and that (β = −.20, SE = 0.11, t = −1.40) or between why and that (β = .21, SE = 0.14, t = 1.46), consistent with our predictions.

Overall, the results of Experiment 1c revealed some differences in processing complexity with respect to the different hypothesised dependency lengths; who_obj was rated significantly lower than other wh-phrases (who_subj and why), which could be predicted from the greatest number of open dependencies until the licensor was encountered.

Summary of Experiments 1a–1c

Experiment 1a, involving Q/A pairs, tested whether different wh-fillers were sensitive to negative islands, a different source of evidence for their specific structural complexity. The results revealed that why_R was insensitive to negative islands in contrast to how and why_P, as expected, and that why_R and why_P, as suggested by different answer types, were differentially sensitive to negative islands. A dependency formed by an adjunct wh-filler that spans sentential negation is judged worse than that formed by an argument wh-filler or a wh-adjunct filler that does not (Li & Kim, 2022; Ross, 1984). If why_P moves from VP to CP, then the why_P–VP dependency spans sentential negation, and a negative island effect may be expected. If why_R moves from TP to CP, then the why_R–TP dependency does not span sentential negation, and we expect no negative island effect (Li & Kim, 2022). Why_R was rated higher than why_P and how. This result supports the claim that the why_R dependency differs from the why_P and how dependencies.

The primary goal of Experiments 1b and 1c was to examine whether integration and storage costs arising from wh-phrases could be revealed in offline experiments. Experiment 1b revealed significant differences between who_obj and how such that who_obj was rated lower than how. Experiment 1c revealed a difference between who_obj versus who_subj and why where who_obj was rated lower than who_subj and why, but no differences between who_subj and why. These results align with our general expectation that who_subj and why (with a general preference for interpretation as high why_R) are grammatically similar.

Note that in Experiment 1b, who_obj was rated lower than how, which was not compatible with our claim that storage/integration costs are dependent on dependency length. Although the reasons for this are unclear, studies on RCs show that subject RCs are easier to process than object RCs because of the shorter linear/structural dependency length (e.g., Gibson, 1998; Hawkins, 1994; Keenan & Comrie, 1977; King & Just, 1991). In the case of the processing of who, who can be discharged in the subject or object position. Thus, readers can initially assume that who is the subject. However, when the parser encounters the subject NP, the parser needs to discard the who_subj analysis. Then the parser needs to stretch the wh-dependency towards the object position, which is a longer dependency. Meanwhile, how cannot serve as the subject; thus, there is no ambiguity and no need for a reanalysis. Thus, the processing of how is potentially easier, leading to higher acceptability ratings than who_obj. In Experiment 1c, who_obj is rated lower compared with other wh-phrases and this could also be understood in the context of a reanalysis process.

Online Experiments 2a–2c

In Experiments 2a and 2b, we examined the storage and integration costs for different wh-fillers and asked whether these differences were predicted by their hypothesised dependency lengths. Specifically, we asked whether the conditions under which the wh-fillers (who_subj, why_R) were interpreted before the centre-embedded RC was processed could be easier than the conditions where the wh-filler had to be interpreted at or inside the VP, which requires the wh-filler to be maintained throughout the subject and centre-embedded RC (who_obj, how, why_P).

Our assumptions are as follows: storing the wh-filler in memory gives rise to processing complexity in such a way that (a) the processing should be harder in cases where the intervening material is within the dependency (between the wh-filler and its respective licensor) (storage cost); and (b) the longer the wh-filler is stored in memory, the more difficult it is to integrate the filler into the structure (integration cost) (Chen et al., 2005; Gibson, 1998, 2000; Gibson & Warren, 2004; Grodner & Gibson, 2005). If storage and integration are crucial in the mechanism of real-time sentence processing, we predict that the distance between the wh-filler and licensor should correlate with processing complexity associated with the materials intervening between the wh-filler and licensor. By including a complex domain between different wh-fillers and their clause’s verb, we expected reading times to differ in accordance with the dependency lengths of different kinds of wh-fillers at the most deeply embedded N position of that complex domain. Furthermore, we predict processing complexity at the licensor position where the dependency is completed and the wh-filler is integrated into the structure.

Anticipating that the critical regions in our online studies would include the most embedded N (children) and the third verb (handed), we tested sentences featuring several adjectives before the most deeply embedded noun and adverbs occurring before the third verb (see Table 8). This was expected to ensure a sufficient distance between the point at which the dependency was initiated and the point at which the storage costs were measured. Specifically, we reasoned that introducing a wh-filler could introduce a new discourse referent (Gibson, 1998; Warren & Gibson, 2002), inducing an initial, independent, and non-structure-dependent processing cost. Any slowdown associated with this process at the point of the wh-filler could spill over to subsequent regions and be misclassified as a cost associated with open dependencies. The adjectives included prior to the most embedded N provide a region where such a slowdown can be accommodated.

Meanwhile, similar independent costs associated with integrating the wh-filler at the end of a dependency could contribute to processing complexity (i.e., integration costs; Gibson & Warren, 2004). By including adverbs prior to the third verb region (handed), we provide the best conditions for observing any clear differences in reading time for who_obj versus how and why at the third verb. If how associates with VP, while who_obj associates with V, then the integration costs associated with the former would arise earlier than those for the latter: for the former they would arise at the adverb, whereas only on the verb itself for the latter.

Experiment 2a

In Experiment 2a, we investigate the storage and integration costs for who_obj, how, why, and that. We predicted that reading times at the most deeply embedded N position would be longest for who_obj and how; why could be interpreted before the centre-embedded RC is processed, and hence should be easier to process relative to who_obj and how, which need to be interpreted at or inside the VP, requiring these wh-fillers to be maintained throughout the RC. At this point, increased processing costs for why should not be observed, given our expectation that the parser prefers a why_R parse. Specifically, if a wh-filler is integrated into the current structure only once its licencing element is encountered, we predict no differences in reading times for how and who_obj (licenced at/in VP) at the preverbal, most deeply embedded N position, but why_R (licenced at TP) should show a shorter reading time. If the parser chooses why_P, the expected pattern is similar to that for how and who_obj.

In terms of integration costs, we expect slower reading times for sentences with how (compared with why) at the adverb because the adverb is located at the edge of VP and can signal the presence of VP and induce integration costs before the third verb is introduced. Thus, we predict that at the first adverb and spill-over regions (right before the third verb), a reading time slowdown should be observed for how compared with why. We expected storage costs for who_obj at the adverb position. For the third verb, we expected the integration costs to be the highest for who_obj compared with why and how because the dependency terminates before the third verb is encountered for why and how. The predictions of Experiment 2a are presented in Table 7.

Table 7.

Predictions for Experiment 2a.

	The most deeply embedded noun	The second adverb	The third verb
Who_obj	Slower than why and that	Slower than why and that (storage costs)	Slower than why, that, and how (integration costs)
How	Slower than why and that	Similar to why and that and faster than who_obj	Faster than who_obj and similar to why and that
That	Similar to why	Similar to why	Similar to why
Why	Faster than who_obj & how	Faster than who_obj & how	Faster than who_obj and similar to how

Participants, materials and design

Sixty-four Northwestern University students who were native English speakers provided informed consent and participated in this experiment. They were granted one credit required for introductory linguistic classes at Northwestern University.

The critical items consisted of 24 sentence sets in the form of a 1 × 4 within-subject design, in which four different conditions were generated: who_obj, how, why, and that (control). A sample set of stimuli is illustrated in Table 8 (i.e., the same as in Table 3; Kim, 2019). The items were presented in a pseudo-randomised manner. In addition to the current experimental items, we included 74 filler sentences that involved orthogonal manipulations to the current ones.

Table 8.

Sample stimuli for Experiment 2a (same as 1b).

Wh-fillers	Examples
who_obj	The father considered who the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to.
How	The father considered how the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
That	The father considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
Why	The father considered why the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.

Procedure

Stimuli were presented on a desktop PC using Linger software (Rohde, 2003) employing a self-paced word-by-word moving window paradigm (Just et al., 1982). Each experimental trial was initiated with dashes that masked the words in the sentence. As participants pressed a button to move forward, each subsequent word appeared. Participants were instructed to read the sentences as they would normally read and to answer a comprehension question after reading each sentence. The yes/no comprehension question asked participants to press the F (yes) or J (no) key. An example of a comprehension question is: Was the word “toys” mentioned in the story? After receiving feedback on their responses, participants pressed the spacebar to proceed to the next experimental item. Six practice items were presented at the beginning of the experiment to allow participants to grasp the experimental format. The experiment took each participant approximately 30–45 min to complete.

Analysis

Data were analysed using linear mixed-effect regression models performed with the lme4 package in R version 3.2.3 (Baayen, 2008; Baayen et al., 2008; Bates et al., 2014). Each model included Helmert coding, where at the deeply embedded N, we compared (a) who_obj, how, why with that as a baseline; (b) who_obj and how with why; and (c) who_obj with how.¹² At the adverb before the third verb and the third verb region, we compared (a) who_obj, how, why with that as a baseline; (b) who_obj with how and why and (c) why with how. At this region, we compared who_obj with how and why because the adverb already signals the introduction to the VP at which how should be integrated. Thus, we expected how and why to be read similarly unlike who_obj which induces storage costs. All models contained the maximal random effects structure that fit the data (Barr et al., 2013), including random intercepts for participants and items, and random slopes for fixed effects if the model successfully converged. In cases where the model did not converge, random effects containing the smallest variance were removed in a step-wise manner. Participants with an accuracy lower than 68% were excluded (Kim et al., 2019, 2020).

The reading times were log-transformed (Box & Cox, 1964). Two regions of interest were identified in this study. The first critical region was the most deeply embedded N (embedded N critical region: children in Table 8) where the processing complexity due to storage was expected to be the highest. We were also interested in one word following the most deeply embedded N critical region (embedded N spill-over region), given that effects arising in the critical region often spill over to one region following, that is, the spill-over region (Mitchell, 1984, 2018; Vasishth, 2006). The second critical region was the adverb region before the third verb, the third verb (e.g., benevolently and handed in Table 8) and the word following it (third verb spill-over region).

As Gibson (1998) suggests, the effect of the integration of the wh-filler can be observed at the point where the dependency is completed. At these regions, who_obj should be read slower compared with how and why because the adverb already signals the introduction to the VP at which how should be integrated. We were also interested in the spill-over regions following the third verb (e.g., the).

Results and discussion

The mean accuracy for the critical trial comprehension questions was 77.0%. We speculate that this low level of accuracy reflects the level of complexity which was necessary given the hypothesised storage and integration costs. Figures 4 to 6 present the region-by-region reading times, graphs at the critical region (the most deeply embedded N), and graphs at the third verb spill-over region, respectively. The mixed-effects model outputs are listed in Table 9.

Figure 4.

Means of region-by-region reading times from Experiment 2a.

Figure 5.

Reading times at the most deeply embedded N region (children) in Experiment 2a.

Figure 6.

Reading times at the third verb spill-over region (the) in Experiment 2a.

Table 9.

Summary of mixed-effect model outputs for Experiment 2a.

Region	Estimate (SE)	p-value	t-value
The most deeply embedded N (children)
Intercept	5.82 (0.03)		188.65
who_obj, how, and why vs. that	−0.00 (0.01)	p = .73	−0.34
who_obj and how vs. why	−0.03 (0.01)	p = .03*	−2.18*
who_obj vs. how	0.01 (0.01)	p = .62	0.49
The most deeply embedded N spill-over region (loved)
Intercept	5.98 (0.04)		167.80
who_obj, how, and why vs. that	−0.03 (0.02)	p = .06	−1.91
who_obj and how vs. why	−0.00 (0.03)	p = .95	−0.07
who_obj vs. how	0.01 (0.02)	p = .38	0.88
Second adverb (benevolently)
Intercept	5.79 (0.03)		186.66
who_obj, how, and why vs. that	−0.00 (0.01)	p = .89	−0.14
who_obj vs. how and why	0.01 (0.01)	p = .59	0.54
how vs. why	0.01 (0.01)	p = .24	1.18
Third verb (handed)
Intercept	5.81 (0.03)		206.77
who_obj, how, and why vs. that	0.02 (0.01)	p = .23	1.21
who_obj vs. how and why	0.02 (0.01)	p = .16	1.42
how vs. why	0.00 (0.01)	p = .65	0.45
Third verb spill-over region (the)
Intercept	5.77 (0.03)		221.23
who_obj, how, and why vs. that	−0.01 (0.01)	p = .48	−0.71
who_obj vs. how and why	0.03 (0.01)	p = .01**	2.66**
how vs. why	0.01 (0.01)	p = .22	1.24

* indicates p < 0.05.

** indicates p < 0.01.

The results revealed that, at the most deeply embedded N (children), who_obj and how were read significantly slower than why, whereas who_obj and how showed no significant difference from one another. We found no significant differences among who_obj, how, and why versus that. The region after the embedded N revealed a marginal difference between who_obj, how, and why versus that.¹³ The difference between why versus who_obj and how shows that these wh-fillers are processed differently: who_obj and how pattern together, both being read more slowly than why at the most deeply embedded N. This is in the expected direction, given the grammatical differences between them. This finding adds to the evidence suggesting that a why_R parse is preferred to a why_P one.

At the second critical region, no significant effects were observed at the adverb and spill-over regions of the first adverb (benevolently). At the third verb (handed), there were no significant differences in reading times for different wh-phrases. However, at the third verb spill-over region (the), who_obj was read significantly slower than why and how, and no differences were observed between how and why. We return to this issue and interpret the results in the context of storage and integration costs in the "Discussion of Experiment 2a-2c" section.

Experiment 2b

In Experiment 2b, we investigate the storage and integration costs for who_obj, who_subj, why, and that. This study, in particular, examined constructions in which who is interpreted as a matrix subject (who_subj). We expected different complexity effects between who interpreted as the subject (who_subj) and as the object (who_obj), because they form dependencies with different licensors. Specifically, at the most deeply embedded N (at the word children in Table 11), in the middle of the RC, sentences involving who_obj are predicted to be read significantly slower than those involving who_subj and why. This is because the presence of the subject NP entails the presence of TP since the subject NP must be located in Spec_TP in English (Chomsky, 1981). Therefore, as for why, upon encountering the subject NP, the parser can recognise and project the TP, to which the parser can immediately link why. As for who_subj, upon encountering who_subj, the TP is projected and who_subj can be linked to Spec_TP. At the second critical region (the adverb and the third verb), we expected who_obj to be read slower than who_subj and why due to integration costs. The predictions for Experiment 2b are presented in Table 10.

Table 10.

Predictions for Experiment 2b.

	The most deeply embedded noun	The adverb	The third verb
Who_subj	Faster than who_obj and similar to why and that	Faster than who_obj and similar to why and that	Faster than who_obj and similar to why and that
Who_obj	Slower than why, that, and who_subj	Slower than why, that, and who_subj (storage costs)	Slower than why, that, and who_subj (integration costs)
That	Similar to why and who_subj	Similar to why and who_subj	Similar to why and who_subj
Why	Faster than who_obj and similar to who_subj and that	Faster than who_obj and similar to who_subj and that	Faster than who_obj and similar to who_subj and that

Participants, materials, and design

Seventy-eight Northwestern University students who were native English speakers provided informed consent and participated in this experiment. They were granted one credit necessary for introductory linguistic classes at Northwestern University or were paid US$8 for their participation.

The critical items consisted of 24 sentences in the form of a 1 × 4 within-subjects design, in which four different conditions were generated: who_obj, who_subj, why, and that (control). A sample set of stimuli is presented in Table 11 (i.e., the same as in Table 4; Kim, 2019). In addition to the current experimental items, we used 80 filler sentences that involved orthogonal manipulations to the current sentences.

Table 11.

Sample stimuli for Experiment 2b (same as 1c).

Wh-fillers	Examples
who_obj	The father considered who the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to.
who_subj	Who considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma?
that	The father considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
why	The father considered why the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.

Procedure

The same procedure was used as in Experiment 2a.

Analysis

A similar analysis was employed as in Experiment 2a.¹⁴ For each region, reading times longer than 2,000 ms were excluded from the analysis. Participants with an accuracy below 72% were excluded from the analysis.¹⁵ The critical regions include the embedded N critical region (e.g., children in Table 11), and one word following it (embedded N spill-over region). The second critical region was the adverb region before the third verb, the third verb (e.g., benevolently and handed in Table 11) and the word following it (third verb spill-over region). Each model included Helmert coding, where at the deeply embedded N as well as at the adverb before the third verb and the third verb, we compared (a) who_obj, who_subj, and why with that as a baseline; (b) who_obj with who_subj and why; and (c) who_subj with why.
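The exclusion and transformation steps can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the record layout, the function name, and the toy data are hypothetical, and the natural logarithm is an assumption (it is at least consistent with the reported intercepts, since exp(5.88) is about 358 ms).

```python
import math


def preprocess(trials, accuracy, rt_cutoff_ms=2000, min_accuracy=0.72):
    """Apply the two exclusion criteria, then log-transform reading times.

    trials   : list of dicts with hypothetical keys "participant" and "rt_ms"
    accuracy : dict mapping participant id -> proportion of correct answers
    """
    kept = []
    for trial in trials:
        # Drop every observation from participants below the accuracy threshold.
        if accuracy[trial["participant"]] < min_accuracy:
            continue
        # Drop individual reading times longer than the cutoff.
        if trial["rt_ms"] > rt_cutoff_ms:
            continue
        kept.append({**trial, "log_rt": math.log(trial["rt_ms"])})
    return kept


# Toy data for illustration only (not taken from the experiments).
trials = [
    {"participant": "p1", "rt_ms": 400},
    {"participant": "p1", "rt_ms": 2500},  # removed: longer than 2,000 ms
    {"participant": "p2", "rt_ms": 350},   # removed: participant below 72%
]
accuracy = {"p1": 0.80, "p2": 0.60}
kept = preprocess(trials, accuracy)
```

The thresholds are passed as parameters because they differ between experiments (68% in Experiment 2a, 72% in Experiment 2b).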

Table 12.

Summary of mixed-effect model outputs for Experiment 2b.

Region	Estimate (SE)	p-value	t-value
The most deeply embedded N (children)
Intercept	5.88 (0.03)		209.852
who_obj, who_subj, and why vs. that	−0.01 (0.01)	p = .24	−1.18
who_obj vs. who_subj and why	0.02 (0.01)	p = .03*	2.22*
who_subj vs. why	0.01 (0.01)	p = .38	0.88
The most deeply embedded N spill-over region (loved)
Intercept	6.00 (0.03)		221.18
who_obj, who_subj, and why vs. that	0.02 (0.01)	p = .07	1.81
who_obj vs. who_subj and why	0.01 (0.01)	p = .31	1.02
who_subj vs. why	0.01 (0.01)	p = .51	0.67
Second adverb (benevolently)
Intercept	5.86 (0.03)		222.63
who_obj, who_subj, and why vs. that	0.01 (0.01)	p = .22	1.22
who_obj vs. who_subj and why	0.01 (0.01)	p = .21	1.25
who_subj vs. why	−0.01 (0.01)	p = .76	−0.31
Third verb (handed)
Intercept	5.89 (0.03)		233.25
who_obj, who_subj, and why vs. that	0.01 (0.01)	p = .32	1.00
who_obj vs. who_subj and why	0.03 (0.01)	p = .004**	2.94**
who_subj vs. why	0.01 (0.01)	p = .31	1.02
Third verb spill-over region (the)
Intercept	5.86 (0.02)		273.41
who_obj, who_subj, and why vs. that	−0.01 (0.01)	p = .30	−1.03
who_obj vs. who_subj and why	0.04 (0.01)	p = 5.451e−06***	4.56***
who_subj vs. why	−0.00 (0.01)	p = .85	−0.19

* indicates p < 0.05.

** indicates p < 0.01.

*** indicates p < 0.001.

Results and discussion

The mean accuracy for the critical trial comprehension questions was 79.0%. Region-by-region reading times are shown in Figure 7. The results for the critical region (the most deeply embedded N), embedded N spill-over region, third verb, and third verb spill-over region are shown in Figures 8 to 11. The mixed-effects model outputs are listed in Table 12.

Figure 7.

Means of region-by-region reading times from Experiment 2b.

Figure 8.

Reading times at the most deeply embedded N region (children) in Experiment 2b.

Figure 9.

Reading times at the embedded N spill-over region (loved) in Experiment 2b.

Figure 10.

Reading times at the third verb (handed) in Experiment 2b.

Figure 11.

Reading times at the third verb spill-over region (the) in Experiment 2b.

The results revealed that at the most deeply embedded N (children), who_obj was read significantly slower than who_subj and why. In the most deeply embedded N spill-over region (loved), the difference between who_obj, who_subj, and why versus that approached significance.

We did not find any effect at the adverb regions including the spill-over regions of the first adverb (benevolently).¹⁶ However, as predicted, who_obj was read significantly slower than why and who_subj at the third verb critical region (handed) and at the third verb spill-over region (the). This suggests that the effect at the third verb and the spill-over regions is consistent with integration costs for who_obj at that point.

Experiment 2c

In Experiments 2a and 2b, we included the that condition as a baseline because that does not trigger a relevant movement dependency (or it has one less open dependency compared with other wh-dependencies). If so, and if processing complexity is measured over the number of open dependencies, we expect the that condition to show the smallest complexity effect at the most deeply embedded N compared with other movement conditions in which wh-dependencies span across TP and VP. Thus, regarding the that condition, we did not expect any complexity effect at the most deeply embedded N position. Furthermore, we did not expect any integration costs at the later point of the sentence. Along these lines, we expected that there should be a significant difference between who_obj, how, why versus that in Experiment 2a, and who_obj, who_subj, why versus that in Experiment 2b at the most deeply embedded N position. But we found a significant difference between who_obj, how, why versus that at the embedded N spill-over region only in Experiment 2a and not in 2b. This suggests the necessity of directly comparing different wh-phrases (who_obj, who_subj, how and why) in a 2 × 2 within-subjects factorial design without the inclusion of that. In this experiment, we manipulated the Length (filler licencing condition: high/TP vs. low/VP) and Filler type (NP/entity-denoting filler vs. AdvP/non-entity denoting filler), giving four different conditions: who_subj (high-NP), who_obj (low-NP), why (high-AdvP), and how (low-AdvP).

In Experiment 2c, we further investigate storage and integration costs in the processing of different kinds of wh-phrases in filler-gap dependencies in line with our assumptions regarding the structural position in which they are licenced. We conducted an experiment that allows direct comparison by manipulating the Length (filler-licencing position: high [TP] vs. low [VP]) and the Filler type (NP/entity denoting filler vs. AdvP/non-entity denoting filler) in a 2 × 2 factorial design.

At the most deeply embedded N (at the word children), in the middle of the RC, we expect a main effect of Length where the longer dependencies spanning across VP (e.g., who_obj and how) should induce additional processing costs if processing is more difficult when the intervening material is within the dependency (between the filler and the licensor, since the filler needs to be held in memory). Thus, we expect who_obj and how to be read slower than who_subj and why. But we expect no interaction of the Length and Filler type, as well as no main effect of Filler type, at the identical region. At the adverb position prior to the third verb, we expect to observe a main effect of Length where who_obj and how should be read slower than who_subj and why due to storage costs for who_obj and integration costs for how, predictions that can be generated based on the results from Experiments 2a and 2b. At the third verb, we expect an interaction between Length and Filler type where who_subj, how and why should be read faster than who_obj. The predictions for Experiment 2c are presented in Table 13.

Table 13.

Predictions for Experiment 2c.

	The most deeply embedded noun (children)	The second adverb (benevolently)	The third verb (handed)
Length	Who_obj and how significantly slower than who_subj and why	Who_obj and how significantly slower than who_subj and why	Who_obj and how significantly slower than who_subj and why
Filler type	No main effect predicted	No main effect predicted	No main effect predicted
Length × Filler type	No interaction predicted	No interaction predicted	Who_obj is significantly slower than who_subj, why and how

Participants, materials, and design

Eighty-seven native speakers of English from Prolific (https://www.prolific.co/) participated and provided their informed consent. They were paid US$5 for 20 mins of their time. We manipulated the Length (filler licencing condition: high/TP vs. low/VP) and Filler type (NP/entity-denoting filler vs. AdvP/non-entity denoting filler), giving four different conditions: who_subj (high-NP), who_obj (low-NP), why (high-AdvP), and how (low-AdvP).

The critical items consisted of 24 sentences in the form of a 2 × 2 within-subjects factorial design. A sample set of stimuli is presented in Table 14. In addition to the current experimental items, we used 37 filler sentences that contained orthogonal manipulations to the current sentences.

Table 14.

Sample stimuli for Experiment 2c.

Wh-filler	Example
who_obj	The father considered who the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to.
who_subj	Who considered that the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma?
how	The father considered how the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.
why	The father considered why the naive babysitter that the spunky children loved unexpectedly and benevolently handed the toys to the grandma.

Procedure

The same procedure was used as in Experiment 2a.

Analysis

The analysis was similar to that in Experiment 2a. Each model included simple difference sum-coded fixed effects of Length (low vs. high filler-licencing position; contrasts −0.5 and 0.5) and Filler type (NP/entity-denoting filler vs. AdvP/non-entity-denoting filler; contrasts −0.5 and 0.5) and their interaction. The first set of critical regions comprised the embedded N (e.g., children in Table 14) and the word following it (embedded N spill-over region). The second set comprised the adverb before the third verb (e.g., benevolently in Table 14), the third verb (e.g., handed in Table 14), and the word following it (third verb spill-over region).
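The ±0.5 coding of the 2 × 2 design can be illustrated with a minimal sketch. The assignment of +0.5 to the low licencing position and to AdvP fillers is an assumption made here for illustration only, and the names `LENGTH`, `FILLER`, `CONDITIONS`, and `design_row` are hypothetical rather than taken from the analysis scripts.

```python
# Sum coding (+/-0.5) for the 2 x 2 design: Length x Filler type.
# ASSUMPTION (illustrative sign convention): low = +0.5, high = -0.5;
# AdvP = +0.5, NP = -0.5.
LENGTH = {"high": -0.5, "low": 0.5}
FILLER = {"NP": -0.5, "AdvP": 0.5}

# The four conditions and their factor levels, as described in the text.
CONDITIONS = {
    "who_subj": ("high", "NP"),
    "who_obj": ("low", "NP"),
    "why": ("high", "AdvP"),
    "how": ("low", "AdvP"),
}


def design_row(condition):
    """Return (intercept, Length, Filler type, Length x Filler type)."""
    length_level, filler_level = CONDITIONS[condition]
    length_code = LENGTH[length_level]
    filler_code = FILLER[filler_level]
    # The interaction predictor is the product of the two coded factors.
    return (1.0, length_code, filler_code, length_code * filler_code)


for name in CONDITIONS:
    print(name, design_row(name))
```

Because each coded predictor sums to zero across the four cells, the intercept corresponds to the grand mean and each main-effect coefficient to the difference between the means of the two levels of that factor.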

Results and discussion

Region-by-region reading times are shown in Figure 12. The results for the critical region (the most deeply embedded N), embedded N spill-over region, second adverb, third verb, and third verb spill-over region are shown in Figures 13 to 16. The mixed-effects model outputs are listed in Table 15.

Figure 12.

Means of region-by-region reading times from Experiment 2c.

Figure 13.

Reading times at the most deeply embedded N region (children) in Experiment 2c.

Figure 14.

Reading times at the adverb (benevolently) in Experiment 2c.

Figure 15.

Reading times at the third verb (handed) in Experiment 2c.

Figure 16.

Reading times at the third verb spill-over region (the) in Experiment 2c.

Table 15.

Summary of mixed-effect model outputs for Experiment 2c.

Region	Effect	Estimate (SE)	p-value	t-value
The most deeply embedded N (children)	Intercept	5.86 (0.04)		148.30
	Length	0.03 (0.02)	p = .04*	2.03*
	Filler type	0.01 (0.02)	p = .55	0.60
	Length × Filler type	−0.00 (0.03)	p = .93	−0.09
The most deeply embedded N spill-over region (loved)	Intercept	5.88 (0.04)		143.61
	Length	0.02 (0.02)	p = .21	1.24
	Filler type	0.01 (0.02)	p = .66	0.44
	Length × Filler type	−0.02 (0.04)	p = .66	−0.44
Second adverb (benevolently)	Intercept	5.90 (0.04)		142.99
	Length	0.03 (0.02)	p = .07	1.84
	Filler type	−0.01 (0.02)	p = .76	−0.31
	Length × Filler type	−0.032 (0.03)	p = .35	−0.93
Third verb (handed)	Intercept	5.89 (0.04)		153.80
	Length	0.04 (0.02)	p = .01*	2.50*
	Filler type	−0.01 (0.02)	p = .76	−0.31
	Length × Filler type	−0.07 (0.03)	p = .05*	−2.01*
Third verb spill-over region (the)	Intercept	5.87 (0.03)		175.32
	Length	0.02 (0.01)	p = .11	1.60
	Filler type	−0.02 (0.01)	p = .18	−1.34
	Length × Filler type	−0.06 (0.03)	p = .04*	−2.09*

* indicates p < .05.

At the most deeply embedded N (children), we observed a main effect of Length (β = .03, SE = 0.02, t = 2.03), such that wh-phrases with the low filler-licencing position (who_obj and how) were read significantly slower than those with the higher filler-licencing position (who_subj and why). However, there was neither a main effect of Filler type (β = .01, SE = 0.02, t = 0.60) nor an interaction between Length and Filler type (β = −.00, SE = 0.03, t = −0.09).

At the second adverb position (benevolently) prior to the third verb, we observed a marginal main effect of Length (β = .03, SE = 0.02, t = 1.84, p = .07) such that who_obj and how were read numerically slower than why and who_subj. This potentially suggests that the integration costs for how and the storage costs for who_obj lead to slower reading times even before the third verb is encountered. At the third verb region, we found a main effect of Length (β = .04, SE = 0.02, t = 2.50) as well as an interaction of Filler type and Length (β = −.07, SE = 0.03, t = −2.01) where who_obj was read slower compared with the other wh-phrases (how, why, and who_subj). At the third verb spill-over region, we continued to find an interaction between Filler type and Length (β = −.06, SE = 0.03, t = −2.09) such that who_obj was read slower compared with the other wh-phrases.
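To see how a main effect of Length combined with a Length × Filler type interaction singles out who_obj, the third-verb estimates in Table 15 can be turned into model-implied cell means. The sketch below is illustrative only. It assumes that +0.5 codes the low licencing position and AdvP fillers, and that the models were fitted to log-transformed reading times (which the size of the intercepts suggests but which is an assumption here). The function name `predicted_cell_mean` is hypothetical.

```python
import math

# Fixed-effect estimates for the third verb (handed), copied from Table 15.
INTERCEPT = 5.89
B_LENGTH = 0.04
B_FILLER = -0.01
B_INTERACTION = -0.07

# ASSUMED coding (illustrative sign convention):
# Length: low = +0.5, high = -0.5; Filler type: AdvP = +0.5, NP = -0.5.
CELLS = {
    "who_subj": (-0.5, -0.5),
    "who_obj": (0.5, -0.5),
    "why": (-0.5, 0.5),
    "how": (0.5, 0.5),
}


def predicted_cell_mean(length_code, filler_code):
    """Model-implied mean for one cell of the 2 x 2 design."""
    return (INTERCEPT
            + B_LENGTH * length_code
            + B_FILLER * filler_code
            + B_INTERACTION * length_code * filler_code)


predictions = {name: predicted_cell_mean(*codes) for name, codes in CELLS.items()}
slowest = max(predictions, key=predictions.get)

for name, value in sorted(predictions.items(), key=lambda item: item[1]):
    # exp() gives milliseconds only if the response was log RT (assumption).
    print(f"{name:9s} {value:.4f}  exp = {math.exp(value):.0f}")
print("slowest condition:", slowest)
```

Under these assumptions, who_obj has the largest implied mean, and the three remaining conditions lie close together, which matches the pattern described for the third verb.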

Discussion of Experiment 2a–2c

We conducted three online experiments to probe whether different wh-fillers were stored in memory for different amounts of time, and whether they were integrated into the current structure once they were licenced by their licensors. Our first self-paced reading experiment revealed that who_obj and how were read significantly slower than why at the most deeply embedded N position. Thus, once why is licenced by TP (i.e., the preferred parse of reason why), it is integrated into the current structure, and no longer consumes memory. However, who_obj and how need to be kept in memory until the third verb is reached (e.g., handed), thereby leading to an additional open dependency waiting to be resolved and contributing to increased processing complexity. At the third verb spill-over region (the), we found a significant difference between who_obj versus how and why, suggesting increased integration costs for who_obj at the third verb.

Experiment 2b revealed that who_obj was processed differently from who_subj and why at the most deeply embedded N. Given that dependency length is a predictor of the storage cost effect, we showed that who_subj and why were integrated early, once the dependency was formed. As why and who_subj are integrated when TP is recognised, no corresponding increase in complexity is found in the middle of the subject. By contrast, who_obj must be held in memory until the third verb is reached. This results in increased complexity at the centre of the centre-embedded RC. At the third verb region, who_obj was read significantly slower than who_subj and why.

In Experiment 2c, we further observed, at the most deeply embedded N, a main effect of Length in which wh-phrases with the low filler-licencing position (who_obj and how) were read significantly slower than wh-phrases with the higher filler-licencing position (who_subj and why). At the adverb region prior to the third verb, there was also a marginal main effect of Length where who_obj and how were read slower compared with why and who_subj. At the third verb, we found a main effect of Length as well as an interaction between the Length and Filler type such that who_obj was read slower compared with the other wh-phrases (how, why, and who_subj). This further provides evidence that different WhFGD constructions have different structural distances from their licensors, and that the storage and integration costs of wh-phrases can be attributed to the dependency length between the wh-filler and its respective licensor.

The differences in reading times for who_obj and how versus who_subj and why can be accounted for in terms of the grammatical differences between these wh-fillers. Different wh-fillers have different licensors; who_obj is licenced by the verb, how by the VP, and why_R and who_subj by TP. Because different wh-fillers have different licensors with which they form dependencies, the reading times at the most deeply embedded N also differ accordingly. In our examples (see Table 14), at the point of the most deeply embedded N (children), each WhFGD construction had a different number of open dependencies. If the number of open dependencies corresponds to the processing complexity, the storage costs for who_obj and how are higher at the most deeply embedded N compared with who_subj and why.

For who_subj and why, only three open dependencies must be resolved at the most deeply embedded N in a centre-embedded RC. However, who_obj and how have four open dependencies. Thus, an extra open dependency for who_obj and how (who_obj with the verb and how with VP) leads to slower reading times at the most deeply embedded N. These results are consistent with the Syntactic Prediction Locality Theory (SPLT: Gibson, 1998; Gibson & Warren, 2004), where the WhFGD formation and the object-gapped centre-embedded RC interact in such a way that the storage costs are at their highest when the highest number of dependencies are awaiting resolution.
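The counting argument can be made concrete with a small sketch. It takes the licensor of each wh-filler from the discussion above and the baseline of three open dependencies at the most deeply embedded N as given; the three baseline dependencies are not itemised. The set `LICENSORS_SEEN_BY_EMBEDDED_N` encodes an illustrative assumption about which licensors have been recognised by that point, and all names are hypothetical.

```python
# Licensors as described in the text: who_obj by the verb, how by VP,
# why (reason reading, why_R) and who_subj by TP.
LICENSOR = {"who_obj": "V", "how": "VP", "why": "TP", "who_subj": "TP"}

# Dependencies open at the most deeply embedded N regardless of the wh-filler
# (three, per the text; not itemised here).
BASELINE_OPEN = 3

# ASSUMPTION for illustration: by the embedded N, TP has been recognised,
# whereas the VP and verb that license how and who_obj have not.
LICENSORS_SEEN_BY_EMBEDDED_N = {"TP"}


def open_dependencies(filler):
    """Number of open dependencies at the most deeply embedded N."""
    still_pending = LICENSOR[filler] not in LICENSORS_SEEN_BY_EMBEDDED_N
    return BASELINE_OPEN + int(still_pending)


for filler in LICENSOR:
    print(filler, open_dependencies(filler))
```

On this sketch, who_obj and how each carry one more pending dependency than who_subj and why, which is the difference the storage-cost account ties to slower reading at that position.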

In Experiment 2a, as discussed earlier, we expected who_obj to be read slower than how and why at the adverb position due to storage costs, whereas how and why should be read similarly. Regarding integration costs, given that who_obj is an argument of the verb handed and how is a modifier of the VP, we expected reading time differences at the adverb because the parser integrates how as soon as it encounters the adverb, which marks the beginning of the matrix VP. However, we found no reading time differences between who_obj versus how and why, nor between how and why, at the adverb position. Why did we observe no reading time differences? This may be because, at the point where an adverb marks the VP, how induces an integration cost that spills over to the first and second adverbs. Thus, reading times at the adverbs prior to the third verb were still slower because of integration costs for how, while who_obj was read similarly slowly because of storage costs. Consequently, we did not observe the expected difference between how and who_obj at the first or second adverb.

Also, we did not observe reading time differences between how and why. This is possibly because when the verb is encountered, the parser attempts to reanalyse why_R to why_P. From the retrieval point of view, the verb could deploy retrieval cues related to adverbs that reactivate why, as adverbs can form a dependency with the verb. Since why is an adverb which can be attached to VP (as in why_P), certain interference effects due to memory retrieval could potentially cause a reading time slowdown at the adverb position for why. But if we assume that retrieval of adverbs (including why and preverbal adverbs) is only triggered at the point of the verb, as adverbs and verbs form dependencies, then an adverb should not trigger retrieval of another adverb. In this sense, the observed effect is better explained by encoding interference, which arises when elements within the sentence have similar properties and are therefore encoded in memory with less distinctiveness. In our experiment, why is an adverb which can be attached to VP (as in why_P), and encountering another adverb prior to the third verb would lead to a degraded representation in memory (Gordon et al., 2001, 2004).

But in Experiment 2c, when we compared only wh-phrases with the exclusion of that, we did observe a marginal main effect of Length at the adverb position where reading times for who_obj and how were slower than why and who_subj.

In Experiment 2a, our results showed that who_obj was read significantly slower than why and how at the spill-over region after the third verb (handed), but no differences were observed between how and why. In Experiments 2b and 2c, at the third verb (handed) as well as at that verb’s spill-over region (the), who_obj was read significantly slower compared with why and who_subj. Given that who_obj is an argument of that verb, and given that how is a modifier of its VP, the reader integrates how upon encountering the adverb before the third verb which marks the beginning of the matrix VP; hence, how does not affect reading time at the third verb.

General discussion

We examined the storage cost for different kinds of wh-fillers in a complex domain, namely when paired with a centre-embedded RC configuration. We hypothesised that if distinct wh-fillers need to be stored until their licensors are encountered, and are subsequently integrated to the current structure, then different wh-fillers should incur different processing costs according to their dependency lengths at the most deeply embedded N on the subject. Through both acceptability and online reading experiments, we presented evidence that readers store wh-fillers in memory by showing that processing costs are higher for who_obj and how, and less so for who_subj and why at the most deeply embedded noun position within the subject.

Sentences with why are structurally ambiguous (why asking for the reason, or why_R, and why asking for purpose, or why_P). As Stepanov and Tsai (2008) note, why_P specifically forms a dependency with VP (Bale, 2007; Bromberger, 1992; Chapman & Kučerová, 2016; Kawamura, 2007; Stepanov & Tsai, 2008; Tsai, 2008; Yoshida et al., 2015). The longer dependency formed by why_P spans across negation, and the shorter dependency formed by why_R does not span across negation. Our Experiment 1a confirms this claim, as why_P was rated lower in negative island contexts. Thus, this experiment confirms that there are two different dependencies formed by why with only why_P sensitive to negative islands. If the parser prefers the shorter dependency, the parser prefers the why_R dependency over the why_P dependency.

We observed why behaving differently from other wh-fillers such as who_obj and how, but similarly to who_subj in our online experiments. This pattern may be due to the principle of parsing economy: the parser assumes the shortest dependency that is grammatically licenced (de Vincenzi, 1991; Gibson, 1998, 2000; Kazanina et al., 2007; Phillips, 2006; Pickering & Barry, 1991; Stowe, 1986). That is, why could be licenced either by TP or VP, but the wh-TP dependency (reason why) is shorter than the wh-VP dependency (purpose why). Thus, when encountering a why-sentence, the parser tries to find a way to licence why, and discharges the dependency as soon as it encounters the TP. If so, it naturally follows that sentences with why are read faster than those with other wh-fillers at the most deeply embedded N, because why is preferentially read as the shorter-dependency why_R. In our online experiments, the reading times for sentences with why were shorter than those for how and who_obj and were similar to those for who_subj at the most deeply embedded N. Given that why forms a dependency with TP, only three open dependencies need to be resolved at the most deeply embedded N.

Regarding the integration effects of how and who_obj in Experiment 2a, who_obj induces the storage cost effect in the regions before the third verb because the who_obj dependency is not yet terminated at the point of the adverb, and needs to be actively stored in memory. We further predicted that how should be read faster than who_obj and should be read similarly to why at the adverb position. But we found no reading time differences between who_obj versus how and why at the adverb position. Why did we not observe reading time differences at this region? If the adverb position before the third verb signals the upcoming VP, this could lead to integration costs for how before the third verb is encountered. Integrating newly encountered materials into the existing structure causes integration costs, which give rise to a reading time slowdown. If this is the case, the third verb region in the who_obj condition is read slower than in the other conditions, and the adverb region in the how condition is read slower than in the other conditions owing to differences in integration costs. Such costs of how at the adverb could be conflated with the active maintenance cost of who_obj at the same point. Consequently, the difference between how and who_obj at the first and second adverbs was not observed. The differences between who_obj versus how and why were observed only at the third verb spill-over region in Experiment 2a, which could be due to integration with the verb and spill-over to the subsequent region. The increased reading times around the third verb suggest that integrating who_obj with the third verb was costly at that point.

Regarding the general implications of our study, we argue that various components of memory (storage and integration) actively interact with online structure building. Specifically, both storage and integration elements play important roles over the course of online dependency formation. In our experiments, the distance between different wh-fillers (who_obj, who_subj, why, and how) and their licensors led to different storage cost effects at a most deeply embedded position, in our case the most deeply embedded N in a centre-embedded RC. The differences in the complexity effects for different wh-fillers follow naturally if we assume that different wh-fillers are stored in memory for different durations, and integrated at targeted points. According to processing complexity accounts wherein processing complexity rests on the number of open dependencies, the highest storage costs can be observed at the most deeply embedded N position.

Our account of the differences between how and who_obj at the third verb assumes that integration of who_obj occurs when the reader encounters the verb. A dependency can be established only when the licencing element (the word that deploys the retrieval cues and triggers the search, such as a verb) is introduced as lexical input, and when the reader encounters that element. It is also possible that the dependency relation may be terminated earlier in the middle of the dependency before the licencing element is encountered. Specifically, if readers can predict the relevant verb to come, they can trigger retrieval using this predictively inserted verb. One potential problem of the “predictive retrieval” frame is that it cannot explain the different complexity effects observed for different wh-fillers at the most deeply embedded N within the complex domain since this account does not assume that wh-phrases are actively maintained in memory until the dependency is resolved and makes no predictions pertaining to the storage cost effects. It remains unanswered whether the insertion of the head is predictive and stochastic, or deterministic and tightly linked to the licencing element.

If the dependency formation process involves only the integration process under the content-addressable memory store (Lewis & Vasishth, 2005) at the end of the dependency, then the storage cost effect in the middle of the sentence is not expected. In other words, the effect should be observed only at the end of the dependency or at the position where it is resolved, and not at a position irrelevant to the dependency. By contrast, if storage is at work, we can observe the effect in the middle of the sentence, that is, at a position that is not relevant to the dependency itself (i.e., not the beginning or end of the dependency).
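The contrast between the two accounts can be stated as a toy sketch of where each one predicts a filler-related cost. This is a schematic illustration, not an implementation of any published model: the function name, the account labels, and the word indices are all hypothetical.

```python
def predicted_cost_positions(filler_pos, licensor_pos, account):
    """Word positions at which a filler-related cost is predicted."""
    if licensor_pos <= filler_pos:
        raise ValueError("the licensor must follow the filler")
    if account == "integration_only":
        # Cost arises only where the dependency is resolved.
        return [licensor_pos]
    if account == "storage_and_integration":
        # The filler is held from the word after it up to its licensor,
        # so every intervening position can show a cost.
        return list(range(filler_pos + 1, licensor_pos + 1))
    raise ValueError(f"unknown account: {account}")


# Hypothetical word indices: filler, a mid-dependency noun, and the licensor.
FILLER_POS, EMBEDDED_N_POS, LICENSOR_POS = 0, 5, 10

integration_only = predicted_cost_positions(
    FILLER_POS, LICENSOR_POS, "integration_only")
with_storage = predicted_cost_positions(
    FILLER_POS, LICENSOR_POS, "storage_and_integration")

print(EMBEDDED_N_POS in integration_only)  # mid-dependency cost predicted?
print(EMBEDDED_N_POS in with_storage)
```

Only the account that includes storage predicts a cost at a mid-dependency position such as the embedded noun, which is why an effect at that position is informative.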

Finally, we can ask what kind of information is stored in the processing of different wh-fillers. Even if fillers such as why or who_subj form a dependency and are integrated earlier in the sentence, some information needs to be kept in memory at least for semantic purposes. Who_subj depends on VP for its thematic information, whereas why is interpreted relative to a complete proposition. Thus, while the fillers’ syntactic information is integrated at points before the verb, some aspects of their representation should still be kept in memory. The specific aspects are understood in terms of the features encoded by each wh-filler. Wh-fillers encode syntactic, semantic, and morphological features, and engage in dependency relations with licencing elements. The termination of the syntactic part of the dependency (e.g., checking formal/syntactic features) leads to the effects of processing complexity. For example, why has features like [wh], [Q(uestion)], and [T(ense)] and semantic features (Stepanov & Tsai, 2008; Tsai, 2008; Yoshida et al., 2015). When C is encountered, the [wh] and [Q] features can be eliminated; however, the semantic features are not checked or eliminated. Thus, the number of open dependencies is reduced at “integration,” but this need not entail that all relevant aspects of the why-dependency are established and terminated. Some semantic information (e.g., thematic roles) can be maintained in memory and used to achieve semantic interpretation later.

Thus, our study provides strong evidence that unintegrated wh-fillers are actively maintained in working memory, and that both active maintenance and integration components play critical roles in the mechanism of real-time sentence processing (Gibson, 1998; Gibson & Warren, 2004; Kim et al., 2020; Ness & Meltzer-Asscher, 2017, 2019; Wagers & Phillips, 2014; Wanner & Maratsos, 1978).

Conclusion

Our study showed that readers store different kinds of wh-fillers for different durations, depending on the licensors of these wh-fillers. We compared the storage profiles for the adjuncts why and how and the argument who (serving as subject or object of the verb). The results of our experiments revealed that different kinds of wh-fillers are stored in memory for different amounts of time, depending on the structural distance between each wh-filler and its respective licensor, by which the grammatical function and interpretation of the wh-filler are governed. Our results also demonstrated that once the wh-filler is licenced, it is integrated into the current structure, no longer engendering additional memory costs. Based on these findings, we argued that the mechanism of online sentence processing employs both storage and integration components in memory.

Supplemental Material

sj-pdf-1-qjp-10.1177_17470218241231872 – Supplemental material for Processing wh-filler-gap dependencies

Supplemental material, sj-pdf-1-qjp-10.1177_17470218241231872 for Processing wh-filler-gap dependencies by Nayoun Kim, Alexis Wellwood and Masaya Yoshida in Quarterly Journal of Experimental Psychology

Footnotes

Acknowledgements

The authors are extremely grateful to the anonymous reviewers for their invaluable suggestions and comments. They also thank the members of the Syntax, Semantics, and the Sentence Processing Lab at Northwestern University, audiences at Haskins Laboratories, and the audiences at the 30th and 32nd Annual CUNY Conference on Human Sentence Processing.

Authors' note

This paper is based on the second chapter of the first author's Ph.D. dissertation (Kim, 2019) and is an extension and modification of some portions of it.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2023S1A5A8079483) as well as the National Science Foundation DDRI Grant (BCS-1749580).

ORCID iD

Nayoun Kim

Supplementary Material

The Supplementary Material is available at .

Notes

References

Abney, S. P., & Johnson (1991). Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research, 20(3), 233–250. https://doi.org/10.1007/BF01067217

Abrusán (2011). Presuppositional and negative islands: A semantic account. Natural Language Semantics, 19(3), 257–321. https://doi.org/10.1007/s11050-010-9064-4

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press. https://doi.org/10.1017/cbo9780511801686.011

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005

Bale, A. C. (2007). Quantifiers and verb phrases: An exploration of propositional complexity. Natural Language & Linguistic Theory, 25(3), 447–483. https://doi.org/10.1007/s11049-007-9019-8

Barr, D. J., Levy, Scheepers, & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001

Bates, Maechler, Bolker, & Walker (2014). Lme4: Linear mixed-effects models using Eigen and S4 [R package version]. https://cran.r-project.org/web/packages/lme4/index.html

Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211–243. https://doi.org/10.1111/j.2517-6161.1964.tb00553.x

Bresnan, J. W. (1973). Syntax of the comparative clause construction in English. Linguistic Inquiry, 4(3), 275–343. https://www.jstor.org/stable/4177775

Bromberger (1992). On what we know we don’t know: Explanation, theory, linguistics, and how questions shape them. University of Chicago Press.

Browning (1987). Null operator constructions [Doctoral dissertation]. Massachusetts Institute of Technology.

Chacón, D. A., Imtiaz, Dasgupta, Murshed, S. M., Dan, & Phillips (2016). Locality and word order in active dependency formation in Bangla. Frontiers in Psychology, 7, Article 1235. https://doi.org/10.3389/fpsyg.2016.01235

Chapman & Kučerová (2016). Structural and semantic ambiguity of why-questions: An overlooked case of weak islands in English. Proceedings of the Linguistic Society of America, 1, 15. https://doi.org/10.3765/plsa.v1i0.3713

Chen, Gibson, & Wolf (2005). Online syntactic storage costs in sentence comprehension. Journal of Memory and Language, 52(1), 144–169. https://doi.org/10.1016/j.jml.2004.10.001

Chomsky (1981). Lectures on government and binding (Koster & Van Riemsdijk, Eds.). New York, NY: Foris.

Chomsky (1977). On wh-movement. In P. W. Culicover, Wasow, & Akmajian (Eds.), Formal syntax (pp. 71–132). Academic Press.

Chomsky (1986). Barriers (Vol. 13). MIT Press.

Chomsky, & Miller, G. A. (1963). Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush, & Galanter (Eds.), Handbook of mathematical psychology (pp. 269–321). John Wiley. https://doi.org/10.1515/9783111541587

Christensen, R. H. B. (2019). A tutorial on fitting cumulative link mixed models with clmm2 from the ordinal package (Tutorial for the R Package ordinal). https://cran.r-project.org/web/packages/ordinal

Cinque (1990). Types of Ā-dependencies. MIT Press.

Contreras (1993). On null operator structures. Natural Language & Linguistic Theory, 11(1), 1–30. https://doi.org/10.1007/bf00993019

Crain, & Fodor, J. D. (1985). Rules and constraints in sentence processing. North East Linguistics Society, 15(1), 8.

de Vincenzi (1991). Syntactic parsing strategies in Italian: The minimal chain principle (Vol. 12). Springer Science & Business Media.

Ernst (2001). The syntax of adjuncts (Vol. 96). Cambridge University Press. https://doi.org/10.1017/CBO9780511486258

Ernst (2007). On the role of semantics in a theory of adverb syntax. Lingua, 117(6), 1008–1033. https://doi.org/10.1016/j.lingua.2005.03.015

Fiebach, C. J., Schlesewsky, & Friederici, A. D. (2002). Separating syntactic memory costs and syntactic integration costs during parsing: The processing of German WH-questions. Journal of Memory and Language, 47(2), 250–272. https://doi.org/10.1016/S0749-596X(02)00004-9

Frazier (1987). Syntactic processing: Evidence from Dutch. Natural Language & Linguistic Theory, 5(4), 519–559. https://doi.org/10.1007/bf00138988

Gibson (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68(1), 1–76. https://doi.org/10.1016/s0010-0277(98)00034-1

Gibson (2000). The dependency locality theory: A distance-based theory of linguistic complexity. In Marantz, Miyashita, & O’Neil (Eds.), Image, language, brain: Papers from the first Mind Articulation Project symposium (pp. 95–126). The MIT Press. https://doi.org/10.7551/mitpress/3654.003.0008

Gibson & Thomas (1999). Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes, 14(3), 225–248. https://doi.org/10.1080/016909699386293

Gibson & Warren (2004). Reading-time evidence for intermediate linguistic structure in long-distance dependencies. Syntax, 7(1), 55–78. https://doi.org/10.1111/j.1368-0005.2004.00065.x

Gibson, E. A. F. (1991). A computational theory of human linguistic processing: Memory limitations and processing breakdown [Doctoral dissertation]. Carnegie Mellon University.

Gordon, P. C., Hendrick, & Johnson (2001). Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6), 1411. https://doi.org/10.1037/0278-7393.27.6.1411

Gordon, P. C., Hendrick, & Johnson (2004). Effects of noun phrase type on sentence complexity. Journal of Memory and Language, 51(1), 97–114. https://doi.org/10.1016/j.jml.2004.02.003

Grodner & Gibson (2005). Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29(2), 261–290. https://doi.org/10.1207/s15516709cog0000_7

Hale (2016). Information-theoretical complexity metrics. Language and Linguistics Compass, 10(9), 397–412. https://doi.org/10.1111/lnc3.12196

Hawkins, J. A. (1994). A performance theory of order and constituency (No. 73). Cambridge University Press. https://doi.org/10.1017/CBO9780511554285

Huang, C. T. J. (1982). Logical relations in Chinese and the theory of grammar [Doctoral dissertation]. Massachusetts Institute of Technology.

Huddleston & Pullum (2005). The Cambridge grammar of the English language. Zeitschrift für Anglistik und Amerikanistik, 53(2), 193–194.

Jackendoff, R. S. (1972). Semantic interpretation in generative grammar. MIT Press.

Just, M. A., Carpenter, P. A., & Woolley, J. D. (1982). Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General, 111(2), 228–238. https://doi.org/10.1037/0096-3445.111.2.228

Kaan & Stowe (2002). Storage and computation in sentence processing: A neuroimaging perspective. In Nooteboom, Weerman, & Wijnen (Eds.), Storage and computation in the language faculty. Studies in theoretical psycholinguistics (Vol. 30, pp. 257–295). Springer. https://doi.org/10.1007/978-94-010-0355-1_9

43.

Kawamura

(2007). Some interactions of focus and focus sensitive elements [Doctoral dissertation]. State University of New York at Stony Brook.

44.

Kazanina

Lau

E. F.

Lieberman

Yoshida

Phillips

(2007). The effect of syntactic constraints on the processing of backwards anaphora. Journal of Memory and Language, 56(3), 384–409. https://doi.org/10.1016/j.jml.2006.09.003

45.

Keenan

E. L.

Comrie

(1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8(1), 63–99. http://www.jstor.org/stable/4177973

46. Kim (2019). Hold, release, and retrieve: The study of wh-filler-gap dependencies and ellipsis [Doctoral dissertation]. Northwestern University.

47. Kim, Brehm, Sturt, & Yoshida (2019, January 3–6). Processing of different kind of fillers: Reactivated fillers vs. active fillers. The 93rd Annual Meeting of the Linguistic Society of America, New York, NY, United States.

48. Kim, Brehm, Sturt, & Yoshida (2020). How long can you hold the filler: Maintenance and retrieval. Language, Cognition and Neuroscience, 35(1), 17–42. https://doi.org/10.1080/23273798.2019.1626456

49. King & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30(5), 580–602. https://doi.org/10.1016/0749-596x(91)90027-h

50. Kluender & Kutas (1993). Bridging the gap: Evidence from ERPs on the processing of unbounded dependencies. Journal of Cognitive Neuroscience, 5(2), 196–214. https://doi.org/10.1162/jocn.1993.5.2.196

51. (2005). Syntax of Why-in-situ: Merge into [SPEC, CP] in the overt syntax. Natural Language & Linguistic Theory, 23(4), 867–916. https://doi.org/10.1007/s11049-004-5923-3

52. Lasnik & Saito (1994). Move alpha: Conditions on its application and output (No. 22). MIT Press.

53. Lee, M. W. (2004). Another look at the role of empty categories in sentence processing (and grammar). Journal of Psycholinguistic Research, 33(1), 51–73. https://doi.org/10.1023/b:jopr.0000010514.50468.30

54. Lewis, R. L., & Vasishth (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375–419. https://doi.org/10.1207/s15516709cog0000_25

55. Kim (2022). Asymmetries of English wh-adjunct/argument in negative island sensitivity: An experimental study. Discourse and Cognition, 29(4), 37–59. https://doi.org/10.15718/discog.2022.29.4.37

56. Lowder, M. W., & Gordon, P. C. (2012). The pistol that injured the cowboy: Difficulty with inanimate subject–verb integration is reduced by structural separation. Journal of Memory and Language, 66(4), 819–832. https://doi.org/10.1016/j.jml.2012.03.006

57. McElree, Foraker, & Dyer (2003). Memory structures that subserve sentence comprehension. Journal of Memory and Language, 48(1), 67–91. https://doi.org/10.1016/s0749-596x(02)00515-6

58. Mitchell, D. C. (2018). An evaluation of subject-paced reading tasks and other methods for investigating immediate processes in reading. In Kieras, D. E., & Just, M. A. (Eds.), New methods in reading comprehension research (pp. 69–89). Routledge. https://doi.org/10.4324/9780429505379-4 (Original work published 1984).

59. Miyagawa (2004). The nature of weak islands. Massachusetts Institute of Technology. https://static1.squarespace.com/static/5728dfa81bbee0da7943b811/t/5799a57a414fb51b6f3e169c/1469687164291/Weak+Islands+2.pdf

60. Ness & Meltzer-Asscher (2017). Working memory in the processing of long-distance dependencies: Interference and filler maintenance. Journal of Psycholinguistic Research, 46(6), 1353–1365. https://doi.org/10.1007/s10936-017-9499-6

61. Ness & Meltzer-Asscher (2019). When is the verb a potential gap site? The influence of filler maintenance on the active search for a gap. Language, Cognition and Neuroscience, 34(7), 936–948. https://doi.org/10.1080/23273798.2019.1591471

62. Nicenboim, Schad, D. J., & Vasishth (2021). Introduction to Bayesian data analysis for cognitive science (Under contract with Chapman and Hall/CRC Statistics in the Social and Behavioral Sciences Series). https://vasishth.github.io/bayescogsci/

63. Orth, Yoshida, & Sloggett (2021). Negative polarity item (NPI) illusion is a quantification phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(6), 906–947. https://doi.org/10.1037/xlm0000957

64. Phillips (2006). The real-time status of island phenomena. Language, 82(4), 795–823. https://doi.org/10.1353/lan.2006.0217

65. Pickering & Barry (1991). Sentence processing without empty categories. Language and Cognitive Processes, 6(3), 229–259. https://doi.org/10.1080/01690969108406944

66. Rizzi (1990). Relativized minimality. MIT Press.

67. Rohde, D. L. (2003). Linger: A flexible platform for language processing experiments (Version 2.94). https://web.archive.org/web/20191220181934/http://tedlab.mit.edu/~dr/Linger/

68. Ross (1984). Inner islands. In Brugman & Macaulay (Eds.), Annual Meeting of the Berkeley Linguistics Society (Vol. 10, pp. 258–265). University of California, Berkeley. https://doi.org/10.3765/bls.v10i0.1940

69. Rullmann (1995). Maximality in the semantics of wh-constructions [Doctoral dissertation]. University of Massachusetts Amherst.

70. Stepanov & Stateva (2015). Cross-linguistic evidence for memory storage costs in filler-gap dependencies with wh-adjuncts. Frontiers in Psychology, 6, Article 1301. https://doi.org/10.3389/fpsyg.2015.01301

71. Stepanov & Tsai, W. T. D. (2008). Cartography and licensing of wh-adjuncts: A cross-linguistic perspective. Natural Language & Linguistic Theory, 26(3), 589–638. https://doi.org/10.1007/s11049-008-9047-z

72. Stowe, L. A. (1986). Parsing WH-constructions: Evidence for on-line gap location. Language and Cognitive Processes, 1(3), 227–245. https://doi.org/10.1080/01690968608407062

73. Stowell (1985). Licensing conditions on null operators. Proceedings of the West Coast Conference on Formal Linguistics, 4, 314–326.

74. Tsai, W. T. D. (2008). Left periphery and how-why alternations. Journal of East Asian Linguistics, 17, 83–115. https://doi.org/10.1007/s10831-008-9021-0

75. Vasishth (2006). On the proper treatment of spillover in real-time reading studies: Consequences for psycholinguistic theories. In Proceedings of the International Conference on Linguistic Evidence (pp. 96–100). University of Tübingen.

76. Wagers, M. W., Borja, M. F., & Chung (2015). The real-time comprehension of WH-dependencies in a WH-agreement language. Language, 91(1), 109–144. https://doi.org/10.1353/lan.2015.0001

77. Wagers, M. W., & Phillips (2014). Going the distance: Memory and control processes in active dependency construction. Quarterly Journal of Experimental Psychology, 67(7), 1274–1304. https://doi.org/10.1080/17470218.2013.858363

78. Wanner & Maratsos (1978). An ATN approach to comprehension. In Halle, M. A., Bresnan, & Miller, G. A. (Eds.), Linguistic theory and psychological reality (pp. 119–159). MIT Press.

79. Warren & Gibson (2002). The influence of referential processing on sentence complexity. Cognition, 85(1), 79–112. https://doi.org/10.1016/s0010-0277(02)00087-2

80. Yoshida, Nakao, & Ortega-Santos (2015). The syntax of why-stripping. Natural Language & Linguistic Theory, 33(1), 323–370. https://doi.org/10.1007/s11049-014-9253-9

Supplementary Material

Please find the following supplemental material available below.

