Abstract
Aims and objectives:
This study analyzes the proclitic and enclitic positions of Spanish clitic se (e.g., ella se quería ir / ella quería irse ‘she wanted to go’) across two generations of Spanish speakers in New York City. In an effort to contribute to ongoing research aimed at better understanding Spanish in the US, the following questions are addressed. In syntactic environments that permit variation, does placement of Spanish se differ between the two generations? From the internal variables identified for this study (nonfinite verb type, finite verb, tense of finite verb, grammatical person, use of se, grammatical mood of finite verb, negation), which ones have a statistically significant effect on placement? From the external variables identified for this study (national origin, region, areal origins, sex, age, years in US, socioeconomic class, education, English skill, Spanish skill, general Spanish use), which ones have a statistically significant effect on placement?
Design and data:
This study is carried out within a variationist-sociolinguistic framework and the sample consists of 50 participants, 25 from the first generation (G1) and 25 from the second (G2).
Analysis:
Bivariate chi-square tests are performed in order to determine what internal and external variables constrain placement of the dependent variable (clitic se placement).
Findings:
Generation has a statistically significant effect on placement (p = .016), wherein proclisis is more frequent amongst the G2 participants. These results corroborate previous research showing an overall preference for proclisis in both monolingual and bilingual/heritage speakers. Further, chi-square tests pinpoint five conditioning effects for G1 (nonfinite verb type, use of se, finite verb, years in US, and English skill), but only two for G2 (use of se and English skill).
Originality and implications:
The present study is the first to discover strong correlations between the proclitic position and the numerous internal and external variables quantitatively assessed. Future research is thus warranted.
Keywords
Introduction
The US variety of Spanish has been the subject of a myriad of linguistic studies over the past several decades, ranging from lexical and phonological (e.g., Brice et al., 2009; Goldstein, 2005; Kohnert et al. 1999; Montrul, 2006; Rao, 2019; Zentella, 1990), to grammatical and sociolinguistic (e.g., Viner, 2016, 2017, 2018a, 2018b, 2020; Erker & Otheguy, 2016; Giancaspro, 2019a, 2019b; Otheguy & Zentella, 2012). Amongst the latter, we find a highly variable syntactic feature: Spanish clitics (SC). Scholars have analyzed the variable behavior of SC in monolingual (e.g., Davies, 1995; Gudmestad, 2006; Schwenter & Torres Cacoullos, 2014) and language contact settings (e.g., Gutiérrez, 2008; Limerick, 2018; Shin et al., 2017; Shin et al., 2019; Silva-Corvalán & Gutiérrez, 1995). Studies carried out in the US are of particular interest given the radical differences of this grammatical feature between Spanish, the subordinate language, and English, the predominant language. Although many of the specific findings from these investigations vary, the proclitic positon is by far dominant, for both monolinguals and bilingual/heritage Spanish speakers (e.g., Limerick, 2018; Shin et al., 2017). However, the majority of these studies center on accusative objects, with a heavy focus on lo/s/la/s ‘it-he/they/it-she/they’ (e.g., Darwich, 2007; Shin et al., 2017), paying little-to-no attention to other common SC, such as the variable position of the very frequent deictic clitic se (e.g., ella se quería ir / ella quería irse ‘she wanted to go’). As such, the present study considers placement of Spanish se, in all of its uses – that is, as a reflexive, reciprocal, or an impersonal – in the speech of 50 participants from two generations (balanced at 25 participants from each) in New York City (NYC). To that end, we address the following research questions:
In syntactic environments that permit variation, does placement of Spanish se differ between the two generations?
From the internal variables identified for this study (nonfinite verb type, finite verb, tense of finite verb, grammatical person, use of se, grammatical mood of finite verb, negation), which ones have a statistically significant effect on placement?
From the external variables identified for this study (national origin, region, areal origins, sex, age, years in US, socioeconomic class, education, English skill, Spanish skill, general Spanish use), which ones have a statistically significant effect on placement?
Tackling these questions by way of quantitative analyses will contribute to ongoing efforts to better understand Spanish in the US.
Participants and data
As briefly mentioned above, the present study analyzes data stemming from 50 Spanish speakers, 25 of which fit into our pre-determined criteria for the first generation (G1); the other 25, the second generation (G2). Following Viner, 2016, 2017, 2018a, 2018b, 2020 and Otheguy and Zentella (2012), G1 participants arrived in the US at age 16 or older and G2 participants were born in the US or brought there before age three. The rationale supporting these generational stratification criteria is that because G1 moved to the US so far along in their language development, their Spanish would represent the benchmark to which G2’s Spanish could be compared.
Regarding the source of data, we used the now well-known corpus developed by Otheguy and Zentella from which many linguistic projects and publications have stemmed (e.g., Viner, 2016, 2017, 2018a, 2018b, 2020; Erker & Otheguy, 2016; Otheguy & Zentella, 2012; Shin & Erker, 2015; Shin & Otheguy, 2013). The corpus consists of transcribed sociolinguistic interviews that were recorded in participants’ homes, lasting on average one hour. The aim was to engage in natural conversation, at least as natural as possible given the presence of a recording device and an interviewer. For the present study, balance for generation was achieved (G1 = 25; G2 = 25), as was national origin, grouped into 2 primary categories: Caribbean (14 participants per generation, 28 total) and Mainland (11 participants per generation, 22 total). The countries that comprised the Caribbean cohort included Puerto Rico, the Dominican Republic, and Cuba; Mainland included Mexico, Ecuador, and Colombia. These six countries represent the national origins of the largest Spanish-speaking groups in NYC. Moreover, because Spanish speakers with Caribbean origins are the majority in NYC, our sample consists of three more participants in each generation from this category in order to better represent this sociolinguistic reality. Finally, all efforts were made to balance for speaker sex, but this was not possible: G1 16 females, 9 males; G2 12 females, 13 males. Even though precise balance for speaker sex was not achieved, speaker sex has yet to show any significant effect in previous studies centered on the present topic (Limerick, 2018). Moreover, there is ample representation of each sex within the two generational cohorts and, as such, the subsequent findings are reliable.
Methodology
The dependent variable
The dependent variable identified for this morphosyntactic investigation is the syntactic position of Spanish se. In order to be included in the study, the clitic must appear in an environment that permits variation of its position; that is, it can manifest as either a proclitic or an enclitic. The following examples will illustrate the syntactic contexts that allow for inclusion and exclusion of the item.
1. Entonces ‘So that should be stopped’ Y eso ‘And that should be stopped’ 2. Uno ‘One has to marry the family as well’ Uno ‘One only has to marry the woman’
Examples 1 and 2 above show variation of the dependent variable produced by two participants, one from each of the two generational cohorts (G1 and G2). We chose transparent examples in order to clearly highlight variation of the form. That is, the syntactic and semantic factors are essentially identical in each set of examples, for the given generation. What is more, these examples also demonstrate individual variation, i.e., the same individual participant using the form in both of the possible positions. For G1 participant 263, we see the use of proclitic se debe acabar, followed by enclitic debe acabarse, both conveying the same meaning – ‘should be stopped’. Next, G2 331 follows the same pattern, only the participant uses se tiene que casar and tiene que casarse, both meaning ‘one has to marry’. As such, the basic syntactic formula for variation is finite verb + nonfinite verb, wherein se can appear either in the proclitic or enclitic position. With respect to the nonfinite verb, it can either be an infinitive (e.g., casarse ‘to marry’) or a gerund (e.g., casándose ‘marrying’).
Equally important to firmly establishing the parameters for inclusion in the envelope of variation are the criteria that exclude the variable under study. We consider these now, starting with the most obvious cases before moving on to the more complex ones.
3. a. Él se rió muchísimo - G1 002 ‘He laughed a lot’ b. Es porque en inglés se dice trust - G2 198 ‘It’s because in English you say trust’ c. Cuando le da el deseo a ella venirse para acá - G2 086 ‘When she feels like coming around here’ d. Todas las parejas dizque dándose besos y abrazos - G2 311 ‘All the couples appearantly kissing and hugging each other’ 4. a. Le gusta superarse - G1 354 ‘S/he likes to better her/him self’ b. Hay que recordarse mucho - G1 374 ‘One must remember a lot’ 5. a. Tengo que traérselo acá - G2 206 ‘I have to bring it to them here’ b.Yo se lo estoy diciendo - G2 329 ‘I’m saying it to them’
The reason for the exclusion of se in the utterances situated under example 3 is evident – they do not meet the criteria laid out in the basic syntactic formula presented above. That is, examples ‘a’ and ‘b’ both present with only one verb, se rió and se dice respectively, thus se can only appear in the preverbal position. The forms from ‘c’ and ‘d’ are both in enclitic position – that is, are attached to their host (venirse and dándose) – but the lack of a finite verb form renders these positions fixed, and they are therefore rejected.
The two sentences under example 4 appear to meet the criteria (finite and nonfinite verbs in both), yet these particular classes of finite verbs do not permit clitic climbing, meaning they must appear in the enclitic position. In other words, gustar ‘to like’ and other similar verbs – that is, verbs that appear with an indirect object pronoun, as well as the fixed expression hay que ‘one must’ – categorically produce an enclitic se. Thus, contexts such as those fall outside of the envelope of variation.
Lastly, we exclude the variable when it presents with any other clitic, be they proclitic or enclitic, as observed in example 5. The cause for this exclusion is that the proclitic position of two clitics was used solely one time by the entire G1 cohort. The G2 participants produced slightly more in this position, but enclitics were still markedly dominant. The number of tokens for this particular context was inadequate for sound quantitative analysis and was therefore excluded from the present study.
Collection process and identification of variables
The first step for collecting the item entailed meticulously perusing each of the 50 transcriptions, considering every instance of se and extracting all contexts that fell within the envelope of variation, described above. G1 produced a total of 248 tokens; G2 produced 172. We acknowledge the somewhat small number of tokens, but given the relatively infrequent occurrence of the constructions that permit variable placement of the form, a data set of 420 tokens is sufficient for our goal here. Next, to facilitate the tally and reference of the extracted items, each item was arranged and grouped together for each individual participant across the two generations by its proclitic or enclitic position; that is, on whether it was proclitic or enclitic. Finally, based on previous studies (e.g., Limerick, 2018; Shin et al., 2017) and our own observation of syntactic and grammatical patterns, each and every token was categorized and quantified according to the following internal variables:
nonfinite verb type (infinitive or gerund)
finite verb (deber, ir a, tener que, estar, poder)
tense of finite verb (present or past)
grammatical person (singular or plural)
use of se (reflexive or impersonal/passive)
grammatical mood of finite verb (indicative or subjunctive)
negation (negated or non-negated).
Concerning external variables, we designated the following for quantitative analyses in the present study:
national origin (countries listed above)
region (Caribbean and Mainland)
areal origins (low lands or interior/high lands)
sex (male or female)
age (13–39, 40–60+)
years in US (0–2 years, 3–15 years for G1; 16+ years, native for G2)
socioeconomic class (high, middle, working)
education (elementary, secondary, college, graduate)
English skill (poor, passable, good/excellent)
Spanish skill (good, excellent)
general Spanish use (none, low, middle, high).
We will flesh out the particulars of these external variables in more detail as we analyze the findings, in the following section.
Finally, the data underwent quantitative assessment by way of bivariate tests in order to determine what internal and external variables constrain placement of the dependent variable.
Findings
We begin our presentation of the findings by addressing the first research question: in syntactic environments that permit variation, does placement of Spanish se differ between the two generations?
Table 1 displays the numerical and percentile distribution of the dependent variable by each generation. The chi-square results, X² (1, n = 420) = 5.8, p = .016, indicate that generation has a statistically significant effect on position as per a p-value less than the standard p < .05. As such, we answer research question 1 in the affirmative: our quantitative analysis reveals that G2 used the proclitic position significantly more than did G1. We can now proceed with further testing of the individual generational cohorts because this finding suggests that two distinct generational groups exist in the sample based on their placement of the item.
Variable position of se by generation.
p < .05.
Before we continue, however, let us consider how these numbers compare to previous studies looking at clitic placement. Investigating this topic amongst US Spanish speakers, Limerick (2018) found very similar rates: US-born with 74% proclitic and the monolingual group with 69%. Compare this to our G2 78% and G1 68%, respectively – nearly identical. Further, Silva-Corvalán and Gutiérrez (1995), studying this syntactic phenomenon over two decades ago in Los Angeles, California, observed higher proclitic rates by their US-born participants than those who arrived after 11 years old. On the other hand, Gutiérrez (2008) found no significant effect on placement across three generations of Spanish speakers in Houston, Texas, with proclitics occurring at 70%, 72%, and 72%, respectively. Other hallmark studies centered on monolingual varieties of Spanish reflect similar patterns, with a heavy skewing toward the proclitic position: Gudmestad (2006) with 89%; Schwenter and Torres Cacoullos (2014) with 73%; and Davies (1995) with 66%. Indeed, our findings here seems to corroborate previous studies, at least generally speaking since the given sample, methodology, treatment, and interpretation of the data vary greatly from one study to the other.
Returning now to the present study’s findings, the following two tables begin to address the last two research questions, presenting the results from the remaining bivariate treatments of the internal and external variables for both generations. Table 2 below shows G1 chi-square test results for all internal and external variables. The first seven rows are internal variables and the remaining 11 are the externals. Variables that reached statistical significance present with an asterisk and will be briefly discussed.
G1 bivariate test results.
p < .05, **p < .01, ***p < .001.
Firstly, ‘Nonfinite verb type’ (X² (1, n = 248) = 3.9, p = .047) included the factors infinitive and gerund and the distribution was the following: infinitive with proclitic 65%, with enclitic 35%; gerund with proclitic 82%, with enclitic 18%. We can conclude, therefore, that the proclitic position of se is more likely to occur with a gerund than it is with an infinitive, which corroborates both Gutiérrez (2008) and Limerick (2018) whose studies also found a significant effect here and very similar outputs, namely 88% and 89% proclitic with gerund, respectively. Next, the allocation of position within the internal variable ‘Use of se’ (X² (1, n = 248) = 33.3, p = .000) was as follows: reflexive with proclitic 53%, with enclitic 47%; impersonal/passive with proclitic 88%, with enclitic 11%. As such, an impersonal/passive use has a significantly higher possibility of producing the item in the proclitic position than does reflexive use. The last internal variable to reach significance was ‘Finite verb’ (X² (5, n = 248) = 33.6, p = .000), which distributed thus: deber 54% proclitic, 46% enclitic; ir a 84% proclitic, 16% enclitic; tener que 30% proclitic, 70% enclitic; estar proclitic 78%, enclitic 22%; poder proclitic 83%, enclitic 17%. These percentages and the test results show that the proclitic position correlates with ir a, estar, and poder. On the other hand, tener que produces more instances of enclitics and deber uses both positions nearly equally.
Regarding G1 external variables, ‘Years in US’ (X² (1, n = 248) = 7.4, p = .007) and ‘English skill’ (X² (3, n = 248) = 10.7, p = .013) significantly influenced the position of se as follows: 0–2 years in the US 61% proclitic, 39% enclitic and 3–15 years in the US 77% proclitic, 23% enclitic; then for English skill, poor 68% proclitic, 32% enclitic; passable 57% proclitic, 43% enclitic; good/excellent 80% proclitic, 20% enclitic. According to these findings, it would appear that increased exposure to English results in an increase in use of the proclitic position. That is, participants residing longer in the US showed a significantly higher proclitic output than those living there under two years. Similarly, participants who self-assessed as having good and excellent English skills used proclitic se significantly more than did those with passable or poor English skills.
Moving now to G2, Table 3 below details the test results stemming from the same variables as those presented above in Table 2 with G1.
G2 bivariate test results.
p < .05, **p < .001.
A noteworthy generational comparison is that G2 se placement is conditioned by fewer variables in both the internal and external categories than that of G1. Further, the two that reached significance here were also found to be significant for G1. The internal variable ‘Use of se’ (X² (1, n = 248) = 40.5, p = .000) for G2 distributed as follows: reflexive with proclitic 58%, with enclitic 42%; impersonal/passive with proclitic 98%, with enclitic 2%. We note the near categorical use of proclitic se for impersonal/passive structures, an increase of 11 percentage points compared to G1 at 88%. With regard to proclisis as it pertains to reflexives, an interesting contrast emerges here when compared to previous studies. Specifically, both Davies (1995) and Gutiérrez (2008) encountered a higher rate of enclisis with reflexives, whereas we did not (G1 53% proclisis; G2 58%). This could be a result of the narrower scope of the present study, that is, considering se and not all other deictic clitics (me, te, nos). Or perhaps the fact that our sample includes Spanish speakers from six distinct national origins had an impact on the distribution of placement.
The one external that tested at significance for G2 was ‘English skill’ (X² (2, n = 248) = 6.2, p = .045), thusly: good 62% proclitic, 39% enclitic; excellent 80% proclitic, 20% enclitic. Firstly, because G2 participants grew up in the US, all self-assessed their English skills as either good or excellent – this is why poor and passable do not appear here like they did above for G1. Nonetheless, we observe a similar pattern here with G2 as we did with G1; that is, the stronger the English skill, the higher the proclitic position was used. Moreover, it is worth noting that the ‘excellent’ G2 subgroup produced the same rate of proclitics as did the ‘good/excellent’ G1 subgroup – to wit, 80%.
Conclusion
This investigation centered on proclitic and enclitic position of Spanish se across two generations of Spanish speakers in NYC. We presented three research questions that would guide the study, after which we detailed the corpus, the participants, the data, and the envelope of variation. Generation was the first variable tested in order to determine whether or not it significantly conditioned position, which it did, subsequently answering the first research question in the affirmative. A series of bivariate tests were then realized in order to identify further possible effects and address the remaining two research questions, which were as follows:
2. From the internal variables identified for this study, which ones have a statistically significant effect on placement?
3. From the external variables identified for this study, which ones have a statistically significant effect on placement?
For the G1 cohort, three internal (nonfinite verb type, use of se, finite verb) and two external (years in US, English skill) variables reached statistical significance. For G2, however, only two variables were found significant, one internal (use of se), the other external (English skill).
What does this all mean? Firstly, it is important to reiterate that proclitic position was at the center of the data interpretation because it was the position with the most tokens. That said, the primary takeaway from this study is that G2 uses proclitic se significantly more frequently than does G1. From there, the conditioning variables that emerged from the bivariate tests help formulate an idea of the factors involved in the heightened use of the proclitic position over the enclitic one in each respective generation’s grammar. We already detailed the significant variables’ distribution above, under ‘Findings’, and we will therefore refrain from unnecessary repetition here. We will, however, make mention again of the significant role that English seems to have on the proclitic position since ‘English skill’ reached statistical significance for both groups. Indeed, as we pointed to earlier, the better the participants conceived their level of English, irrespective of generation, the higher the frequency of the proclitic position was produced. This finding is particularly curious given the marked differences in clitic placement between Spanish and English. That is, because modern English places its clitics post verbally (e.g., They want to see
Regarding limitations of the present study, the testing models utilized and the subsequent analyses of the results were based on the framework in which the paper is couched; that is, a research note, not a full-length article. In other words, the findings, meant to begin to formulate an idea of what might be taking place in the grammar of Spanish speakers in NYC. More thorough and sophisticated analyses of the data, the type a bigger project could tackle, would almost certainly result in a deeper comprehension of the underlying factors involved in the clitic placement tendencies across these generations. To wit, a larger data set would permit a logistic regression in order to better understand the relationship between all variables en masse. Moreover, we acknowledge the limitation of focusing exclusively on se. A study in which all clitics were assessed very well could shed further light on this linguistic phenomenon. We leave these tasks for future research.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
