Abstract
The internet promised to counter cultural homogeneity by enabling niche content to find a larger audience, but attention online often resembles offline patterns in which generic cultural forms attract the bulk of attention while a long tail of niche forms remains under-resourced. I argue that understanding the internet’s potential to sustain cultural diversity requires moving beyond studying audience size to consider the mode and location of attention capture. Leveraging longitudinal data from Reddit, I find that communities that shift toward niche content find success not by attracting larger audiences, but instead by capturing deeper engagement in high-traffic hubs shared with the platform’s largest, most visible communities. This is in contrast to offline settings, where niche forms survive only in the periphery. These findings suggest that the internet offers a counterforce to cultural homogeneity in which niche content becomes more accessible and visible within the centers of digital attention.
Introduction
Research in the late 20th and early 21st centuries portrayed the internet as a great attention equalizer—a force that would reduce cultural homogeneity by enabling niche cultural forms to more readily capture attention (Anderson, 2007; DiMaggio et al., 2001; Neuman, 1991). By lowering search, storage, and distribution costs, digital markets were expected to support a “long tail” in which a wider range of specialized cultural forms could thrive (Brynjolfsson et al., 2011). Yet whether this promise has been realized remains an open question.
In pre-internet society, the audience for any particular cultural form—manifestations of culture such as products, genres, or communities (Griswold, 1986)—surfaced through local social networks and the structure of prevailing cultural markets (Fischer, 1975; Peterson and Berger, 1975). Within offline cultural markets, an imbalance emerged between generic and niche forms in the market for attention—evident in both the audience size of these respective forms and their locations in the overall ecology of attention. More specifically, generic forms—those appealing to widely held or mass-market tastes—attracted the bulk of attention and occupied central locations in the attention ecology (Anderson, 2007; Brynjolfsson et al., 2011), while niche forms—those appealing to more specialized tastes—attracted only small, and often separate, audiences in the peripheries of the ecology (Carroll, 1985).
Early studies of online markets appeared to challenge this pattern, finding evidence of increased demand for niche products (Anderson, 2007; Bhattacharjee et al., 2007; Brynjolfsson et al., 2011). Other research since the beginning of the 21st century has instead found evidence of the reproduction of “superstar” or “rich-get-richer” effects online (Elberse and Oberholzer-Gee, 2006; Fleder and Hosanagar, 2009; Tan et al., 2017). Explanations for the emergence of familiar patterns include the fact that attention is still a limited resource online (Goldhaber, 1997), and that individuals rely on readily available information about the choices of their peers when making their own decisions (Salganik et al., 2006).
I argue that existing work is limited because it defines success in attention capture solely in terms of audience size (e.g. Brynjolfsson et al., 2011). A more complete account requires recognizing that attention can be captured in multiple modes (Webster, 2014) and within particular locations in an ecology of cultural forms (e.g. Carroll, 1985). Online environments facilitate a shift from discrete to divisible attention, allowing users to distribute their attention across multiple cultural forms more easily (TeBlunthuis and Hill, 2022; Waller and Anderson, 2019). As a result, cultural forms can share user resources and potentially succeed through different patterns of attention capture. Because of this, and the fact that niche forms primarily appeal to more limited audiences, differences in the success of niche forms online are more likely to appear in how and where they capture attention rather than in the size of their audience.
In this article, I broaden the definition of success in attention capture to encompass multiple modes and locations of attention capture. I conceptualize attention capture along two modes—breadth, reflecting audience size, and depth, reflecting the intensity of engagement—and situate these modes within an ecology of interrelated forms and users (Hannan and Freeman, 1989; McPherson, 1983). I ask how—in what modes—and where—within which structural locations—cultural forms capture attention. These online cultural forms vary considerably in how much they resemble traditional communities versus more transient interactional vessels (TeBlunthuis and Hill, 2022), but regardless of where any particular space falls on this spectrum, people spend time there—and that accumulated time constitutes attention being captured.
In theorizing about attention capture, I treat generic and niche cultural forms as ideal types—opposite ends of a continuous spectrum. Empirically, however—particularly in online environments—cultural forms are dynamic, shifting over time in both content and membership. To capture this dynamism, I leverage fixed-effects models to investigate how within-community shifts in cultural content and structural location are associated with success in capturing a greater breadth or depth of attention.
Drawing on data from Reddit, a platform organized around thousands of user-created communities, I find that in the most central locations of the ecology, shifts toward generic content are associated with faster subsequent growth in audience breadth, while shifts toward niche content are associated with faster subsequent growth in depth of attention. These findings suggest that, in online environments, niche cultural forms can succeed not by attracting larger audiences, but by capturing attention more deeply in the most central, competitive locations of the attention ecology.
To illustrate the potential impact of this pattern, consider an example from Reddit. Large, general-interest communities such as r/movies serve as central hubs for mainstream cultural discussion. More specialized communities like r/TrueFilm, which focuses on in-depth discussion of film, often share substantial user overlap with these hubs. This structural proximity allows niche content to circulate within central regions of the platform, even without attracting large standalone audiences.
This path to success represents a new perspective on the long tail debate: the internet offers a counterforce to cultural homogeneity through which, rather than garnering larger audiences, niche forms are more readily accessible and visible, reflecting greater discoverability of diverse cultural content.
Theory
Cultural homogeneity in the pre-internet era
Up until the introduction of mass communication and mass-media technologies, culture emerged primarily through local social networks defined by physical co-location (Durkheim, 1984). Following Geertz (1973) and Sewell (1999), culture can be understood as “webs of significance” sustained by shared symbols and semantic practices—the distinctive vocabularies, references, and meanings through which particular groups make sense of the world. Cultural forms are tangible manifestations of such webs. In the pre-internet era, the emergence of particular cultural forms, and their ability to find an audience, depended on the available density and diversity of people in the local area (Fischer, 1975).
Societal and technological shifts throughout the 20th century in the United States contributed to decreasing diversity and density in local social networks, reducing the resource pool for cultural forms to draw upon, and thus fueling the rise of homogeneous culture. Specifically, post-World War II, the proliferation of long-term, low-interest housing loans through the Federal Housing Administration (FHA), combined with new techniques for mass-producing homes and construction guidelines that favored sparsely populated areas, encouraged a massive movement of individuals from densely populated urban areas to new homes built in the suburbs (Baldassare, 1992; Nicolaides and Wiese, 2017). The interstate highway system enabled suburban sprawl to extend even further away from urban cores (Hawley, 1971). Suburbs were not only less dense than their urban predecessors, but also less diverse due to FHA underwriting guidelines requiring racial segregation until the early 1950s (Nicolaides and Wiese, 2017).
The explosion of mass media in the 20th century further contributed to cultural homogenization at the national level (e.g. Macdonald, 1953; Wilensky, 1964). Increasingly concentrated economic markets, consisting of corporations making broad-based appeals to common-denominator tastes in an attempt to draw in newly available national audiences, led to a “massification” of tastes (Shils, 1962). Within many cultural industries, only generic, or mass, cultural products could survive broad marketing campaigns at the national level (Carroll, 1985; Peterson and Berger, 1975). And so, paradoxically, the more geographic mobility people gained through advancements in transportation, and the more that culture could travel across time and space through new media technologies, the more homogenization seemed to emerge.
In addition to the dominance of generic cultural forms in the capture of audience attention, cultural homogenization also represented a particular market dynamic. Ecological models characterize this dynamic by situating cultural forms in a resource space of individuals, with forms competing for the attention or consumer power of individuals as a scarce resource (e.g. Hannan and Freeman, 1989; McPherson, 1983). These models locate cultural forms in the center, representing the area of greatest resource abundance, or the periphery, representing areas of relative resource scarcity (Péli and Nooteboom, 1999).
Increasingly homogeneous local social networks and the spread of generic culture through mass media meant that generic culture, catering to common-denominator tastes, was heavily marketed and readily accessible. Finding niche cultural content or communities, on the other hand, necessitated extensive searching (McPherson, 2004). Under such conditions, individuals tend to restrict their activity to a cluster of related offerings, optimizing to find the communities or “products that best match their preferences” (Péli and Nooteboom, 1999: 1133). As a result, cultural forms compete for the attention of individuals as discrete resources pursuing limited tastes (Carroll, 1985; Wang et al., 2013), meaning in turn that generic and niche content rely on largely separate audiences, occupying different positions in the ecology.
More specifically, this meant a duality between particular cultural forms and particular locations in the ecology. Generic forms (e.g. top 40 music) came to occupy the most prominent positions of the ecology, the centers consisting of many people holding popular tastes, while niche forms (e.g. death metal music) came to occupy the periphery, consisting of a separate and smaller pool of people with more esoteric tastes (Carroll and Hannan, 1989; Carroll and Swaminathan, 2000). Stated differently, the carrying capacity to support cultural forms at any particular location in the resource space was constrained by the limited tastes of the resources in that location (Ruef, 2000).
The promise and the reality of the internet
A key promise of the internet was to provide a more level playing field for different kinds of content to emerge and find an audience (DiMaggio et al., 2001), offering an avenue for pushing back against cultural homogenization. Research in the early 21st century predicted that the emerging digital age would lead to a “long tail of the internet,” specifically meaning a shift away from uneven distributions of attention, in which a few generic offerings dominate, and toward a thicker, and thus more well-resourced, long tail of niche offerings (Anderson, 2007; Brynjolfsson et al., 2011). Much of the empirical work that initially examined this shift evaluated the context of online marketplaces, such as Amazon and eBay, with a number of studies finding support for the hypothesized “long tail” (Anderson, 2007; Brynjolfsson et al., 2011).
Several affordances of the internet fueled these predictions. On the supply side, the internet lowers costs of production, storage, and distribution—offering nearly limitless space for a variety of options to coexist. On the demand side, the internet eliminates geographic constraints, enabling individuals, including those historically disenfranchised, to find cultural forms matching their preferences (Brynjolfsson et al., 2011). Online, individuals can evaluate whether a particular community or piece of content matches their preferences with only a few clicks of a mouse and a few minutes of reading or watching. Similarly, online communities have fluid boundaries and low membership barriers, which enable individuals to easily become active community participants and quickly judge whether or not a particular community meets their needs (Faraj et al., 2011).
Despite these hopes and predictions, a large body of research paints a picture of attention capture on the internet resembling patterns in the offline world. Namely, there is evidence that attention is captured unevenly by cultural forms online, with mass-market content becoming broadly popular and attracting the bulk of attention (Elberse and Oberholzer-Gee, 2006; Newman, 2003; Taeuscher, 2019; Tan et al., 2017)—reminiscent of the dominance of generic culture in many offline settings (Carroll, 1985).
Explanations for these patterns revolve around the idea that attention remains a limited resource online, and that individuals must make choices in how to allocate their attention (Goldhaber, 1997). Competitive crowding and form differentiation reflect limits to attention online (Lin et al., 2017; Wang et al., 2013). Similarly, scale-free distributions of attention suggest that individuals reference the choices of others in deciding how to allocate attention. At the same time that an online context lowers search costs, it also enables more direct access to information on peer preferences, which can bolster social influence effects leading to herding behavior (Adamic and Huberman, 2000; Aral and Walker, 2014)—especially in settings where recommendation algorithms funnel users toward popular content (Fleder and Hosanagar, 2009).
A different path to success
A key limitation of existing research comparing attention capture between generic and niche cultural forms online is that success is typically defined in terms of audience size alone (e.g. Brynjolfsson et al., 2011; Taeuscher, 2019). Such an approach is limited because it overlooks other dimensions of attention capture which can reflect potential alternate paths to success.
The first dimension concerns the mode of attention capture. Cultural forms can attract attention not only through breadth—by reaching many users—but also through depth—by eliciting more intensive engagement (Webster, 2014). The second dimension concerns the location of attention capture. Drawing on organizational ecology (e.g. Hannan and Freeman, 1989; McPherson, 1983), cultural forms can be situated within a structural resource space that characterizes the local attention environment in which they operate.
I argue that success in attention capture for niche forms online is likely reflected in these alternate dimensions, rather than in the size of the audience. This is because, first, niche forms lack the broad appeal to gain large audiences, and second, attention is a more easily divisible resource online (TeBlunthuis and Hill, 2022; Waller and Anderson, 2019), opening up possibilities for cultural forms to attract attention in different ways and in different locations relative to offline contexts. Understanding where cultural forms land in this resource space requires an ecological perspective: a community’s structural location reflects the configuration of relationships between a community and all other communities in the ecology—a property that emerges from individual choices aggregated across many spaces but which cannot be derived from the preferences or behaviors of any individual user or community alone.
As discussed above, in offline settings, where attention is difficult to divide, cultural forms and structural locations tend to stabilize in patterned ways: generic forms are more likely to be found in central locations, capturing large audiences, while niche forms tend to occupy peripheral locations, serving smaller audiences with more specialized tastes (Hannan and Freeman, 1989). In this sense, structure and culture are mutually constitutive, forming what prior work describes as a duality between cultural form and structural location (Breiger, 1974; Mohr, 1998). This offline duality, characterized by a one-to-one pairing between structural locations and cultural forms, stabilizes through a zero-sum constraint grounded in the limited carrying capacity of attention, which is relatively discrete and non-divisible.
Online, however, because attention is more divisible, the carrying capacity increases. Under these conditions, a single location can potentially sustain multiple cultural forms. This does not imply a breakdown of the structure–culture duality; rather, it implies that the specific mappings between cultural forms and structural locations can depend on the carrying capacity of attention.
The difference in carrying capacity online opens up an alternate path to success for niche content: rather than capturing a larger audience, it is possible that niche forms can occupy the ecology’s most central, competitive hubs by capturing deep engagement from a resource pool shared with the platform’s largest, most visible communities.
To investigate this possibility, I ask how—in what modes—and where—within which structural locations—cultural forms (here subreddit communities) capture attention on Reddit.
Empirically, I proceed in two steps. First, I estimate fixed-effects models that examine how within-community shifts in cultural content and structural location are associated with changes in attention capture. Attention capture is operationalized along two modes: growth in the number of users (“size,” capturing breadth) and growth in the average number of comments per user (“engagement,” capturing depth). Cultural content is measured by the distinctiveness of a community’s language on a continuum from generic to niche, while structural location is measured by weighted user overlap with other communities, capturing the extent to which a community shares users with large, central hubs in the ecology.
Second, I synthesize these results into an ecological interpretation, mapping how the interaction between shifts in cultural profile and structural location relates to the modes through which communities capture attention. Together, these analyses illuminate how niche cultural forms can achieve success online—not by growing large audiences, but by capturing attention more deeply within the most central locations of the attention ecology.
Methods
Data
Data for this study come from Reddit.com. Reddit is an online forum composed of numerous topic-specific subforums, known as subreddits, which are denoted with an “r,” a forward slash, and then the subreddit title. Reddit is an ideal site for answering the proposed research question because it contains thousands of subreddits, ranging from those targeting more generic content (e.g. r/Sports and r/movies) to those targeting more niche content (e.g. r/Curling and r/Mumblecore).
I limited the sample to active subreddits averaging at least 100 comments per month, excluding single-month, non-English, and bot-dominated subreddits. At the comment and user levels, I removed deleted content and bot accounts, and limited users to those with at least 10 lifetime comments. Additional sampling details are in the S1 Appendix. The final sample includes more than 2.26 billion comments, 13,757 unique subreddits, and more than 7.3 million unique users across 437,837 subreddit-month observations.
To construct measures for the analyses, I aggregate from individual user behavior to create subreddit-level data. The main models are then run at the level of the subreddit. Finally, from these models, I draw conclusions about the overall ecology of communities on Reddit. This bottom-up analytical process avoids the ecological fallacy by aggregating directly from individual action to attention capture at the community level, and finally to a macro-level ecological view of how and where different kinds of communities capture attention. It is important to note that these aggregate properties may reflect compositional effects rather than uniform behavioral changes. For example, a niche shift may trigger the selective attrition of less engaged users, thereby concentrating the average engagement of the remaining members.
In line with this, while the data are constructed through the aggregation of individual user actions, the theoretical inferences operate at the community level. I avoid attributing community-level properties to specific kinds of individual behavior, instead treating measures as the aggregate properties of each community. As one example of this, I do not attribute a community’s location in a structurally crowded resource space to the diverse taste patterns of its users; rather, I treat this location as a community-level property reflecting the fact that the community shares many of its users with other communities. Similarly, I do not treat community-level outcomes as the result of intentional community strategy: shifts in cultural content and structural location are emergent properties of individual participation patterns, not evidence of coordinated action by the community as a corporate actor.
Measures
Dependent variables
The two dependent variables measure two different ways that subreddits can capture attention. Individual subreddits and the overall ecology are constantly shifting, and measuring rates of change captures these dynamics more effectively than static measures. The first measures growth in the number of users, a mode of capturing greater breadth of attention. Following organizational studies of growth (e.g. Geroski, 2005), I measure this as the logged ratio of size in month t to size in month t-1:
$$\text{Size growth rate}_{i,t} = \ln\left(\frac{\text{Size}_{i,t}}{\text{Size}_{i,t-1}}\right)$$
Where i denotes the subreddit of interest and t denotes the month. The second dependent variable measures the rate of change in the average number of comments per user in a subreddit. I call this measure “engagement.” This measure of engagement captures a specific mode of success in which a subreddit draws a greater depth of user attention. It is not intended to reflect every possible form of engagement, but instead to serve as a behavioral proxy for intensive user interaction within a community. This measure complements the size growth rate, which captures the breadth of attention. Together, these two measures allow for a comparison of different modes of attention capture. I measure the growth rate of engagement in a similar way:
$$\text{Engagement growth rate}_{i,t} = \ln\left(\frac{\text{Engagement}_{i,t}}{\text{Engagement}_{i,t-1}}\right)$$
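As a minimal sketch of the two growth-rate measures described above, the following Python computes the logged month-over-month ratio for user counts and for comments per user. The function names and the two-month toy counts are hypothetical and chosen only for illustration; they are not drawn from the Reddit data.

```python
import math


def log_growth(current, previous):
    """Logged ratio of a quantity in month t to its value in month t-1."""
    if current <= 0 or previous <= 0:
        raise ValueError("both values must be positive")
    return math.log(current / previous)


def engagement(n_comments, n_users):
    """Average number of comments per user in one subreddit-month."""
    return n_comments / n_users


# Hypothetical counts for one subreddit in two consecutive months.
previous_month = {"users": 200, "comments": 1000}
current_month = {"users": 220, "comments": 1320}

size_growth = log_growth(current_month["users"], previous_month["users"])
engagement_growth = log_growth(
    engagement(current_month["comments"], current_month["users"]),
    engagement(previous_month["comments"], previous_month["users"]),
)
```

In this toy case users grow by 10% and comments per user rise from 5 to 6, so both growth rates are positive; a subreddit could equally grow in one mode while shrinking in the other.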
Measuring structure
The first key independent variable is structure. This variable is central to this analysis because it specifies the location of a subreddit in the overall resource space of users. To measure the structural location of subreddits, I use a weighted crowding measure. A standard structural crowding variable aggregates the resource overlaps between an entity and other entities in the ecology (e.g. Podolny et al., 1996; Wang et al., 2013), here the user overlaps between a subreddit and other subreddits. In doing so, a crowding variable captures how connected a subreddit is with other subreddits on Reddit overall. Another way of stating this is that a structural crowding variable captures the amount of competition a subreddit faces for its users. A “crowded” location indicates high competition, whereas an uncrowded or “sparse” location indicates low competition. See the S5 Appendix for example subreddits in different locations.
I followed previous studies in organizational ecology (e.g. Podolny et al., 1996) in measuring crowding as the sum of niche overlaps. Niche overlap represents the proportion of shared users between two subreddits. For example, suppose there are two subreddits i and j. If subreddit i has 100 users, subreddit j has 50, and 25 users are members of both i and j, then i’s niche overlap value is 0.25, and j’s overlap value is 0.5. The structural crowding variable is constructed by adding up the overlaps between subreddit i and all other subreddits j in J that are not i.
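The worked example above can be reproduced with a short Python sketch. The helper names `niche_overlap` and `crowding` are hypothetical, and users are represented as integer IDs purely for illustration.

```python
def niche_overlap(focal_users, alter_users):
    """Proportion of the focal subreddit's users who also belong to the alter."""
    if not focal_users:
        return 0.0
    return len(focal_users & alter_users) / len(focal_users)


def crowding(focal, members):
    """Unweighted crowding: sum of the focal subreddit's overlaps with all others."""
    return sum(
        niche_overlap(members[focal], members[other])
        for other in members
        if other != focal
    )


# Subreddit i has 100 users, subreddit j has 50, and 25 users belong to both.
users_i = set(range(100))      # user IDs 0..99
users_j = set(range(75, 125))  # user IDs 75..124, so IDs 75..99 are shared

overlap_i = niche_overlap(users_i, users_j)
overlap_j = niche_overlap(users_j, users_i)
crowding_i = crowding("i", {"i": users_i, "j": users_j})
```

The measure is asymmetric: the same 25 shared users are a quarter of i's membership but half of j's.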
To turn this standard crowding variable into a measure of position in the user space of Reddit, I weight each user overlap by the size of the alter subreddit. This means the crowding variable captures not only how crowded a subreddit’s location is, but also how far it is from the largest subreddits on Reddit overall—subreddits with a high structure score are closely connected with the most prominent communities on the platform, while those with a low score are relatively isolated. Because this measure aggregates overlaps across the entire ecology, structural crowding captures a property of a community’s position relative to all other communities in the population—not a property of the community’s internal composition alone, but of how that composition relates to the broader ecology as a whole. A highly crowded state (2SD above the mean) represents a community deeply integrated with the platform’s largest hubs; an isolated state (2SD below the mean) represents minimal overlap with central communities. Overall, the structural crowding measure, which I refer to as structure for brevity, represents the sum of weighted niche overlaps for focal subreddit i with all other subreddits j at time t:
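A minimal sketch of a size-weighted crowding score follows. The text states only that each overlap is weighted by the size of the alter subreddit; this sketch assumes the weight is the alter's raw user count, which may differ from the weighting actually used (for example, a logged or normalized size). All names and the toy memberships are hypothetical.

```python
def niche_overlap(focal_users, alter_users):
    """Proportion of the focal subreddit's users who also belong to the alter."""
    if not focal_users:
        return 0.0
    return len(focal_users & alter_users) / len(focal_users)


def weighted_crowding(focal, members):
    """Sum of niche overlaps, each weighted by the alter's user count.

    ASSUMPTION: the weight is the raw number of users in the alter subreddit.
    """
    focal_users = members[focal]
    return sum(
        niche_overlap(focal_users, alter_users) * len(alter_users)
        for name, alter_users in members.items()
        if name != focal
    )


# Hypothetical memberships for one month.
members = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5, 6, 7, 8},
    "c": {4, 9},
}
structure_a = weighted_crowding("a", members)
```

Under this assumption, overlap with a large alter contributes more to the score than the same proportional overlap with a small one, which is what lets the measure indicate proximity to the platform's largest communities.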
Measuring culture in two ways
The second key independent variable is culture. These measures capture what kind of content a subreddit produces and thus what kind of taste it serves. Specifically, these measures capture the distinctiveness of content relative to the content of all other subreddits. This enables measurement of whether a subreddit serves a relatively generic or niche taste. A high culture score indicates a niche taste, whereas a low culture score indicates a generic taste. I refer to these measures as measures of culture for the sake of brevity. The two culture measures are not intended to capture complementary dimensions of culture, but instead serve as robustness checks. Both aim to measure the same generic–niche dimension, but do so through different computational approaches, allowing assessment of whether the results hold across alternative operationalizations of subreddit culture. See the S5 Appendix for relevant examples of generic and niche subreddits.
I measure culture through the lens of language used in subreddits. Before calculating the measures, I conducted standard text-preprocessing steps on comment text data, including correcting spelling errors, removing hyperlinks, and lowercasing all words. I also ran an algorithm that detects common bigrams. For both measures, I included only words that appear at least 5 times in the entire Reddit comment data for each month, to limit the inclusion of junk words.
The first measure of culture is relatively simple, yet powerful. I follow Zhang et al. (2017) in measuring a subreddit’s culture in terms of the distinctiveness of words appearing in that subreddit. Specifically, I measure the distinctiveness to subreddit i of each word w that appears in the subreddit, as the pointwise mutual information (PMI) between w and its subreddit context i relative to all subreddits I:
$$\mathrm{PMI}_i(w) = \log \frac{P_i(w)}{P_I(w)}$$
Where $P_i(w)$ is the probability of word w occurring in subreddit i and $P_I(w)$ is the probability of w occurring across all subreddits I.
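To make the word-level distinctiveness concrete, here is a minimal Python sketch of PMI between a word and a subreddit, computed as the log ratio of the word's relative frequency inside the subreddit to its relative frequency across all subreddits. The function name, the `min_count` parameter, and the toy token lists are hypothetical; how word-level scores are aggregated into a subreddit-level measure is not shown, since the text above does not specify it.

```python
import math
from collections import Counter


def word_pmi(focal, tokens_by_subreddit, min_count=1):
    """PMI of each word in the focal subreddit relative to all subreddits.

    Words with fewer than `min_count` occurrences overall are skipped
    (the article uses a threshold of 5 per month; the toy data use 1).
    """
    focal_counts = Counter(tokens_by_subreddit[focal])
    global_counts = Counter()
    for tokens in tokens_by_subreddit.values():
        global_counts.update(tokens)

    n_focal = sum(focal_counts.values())
    n_global = sum(global_counts.values())

    scores = {}
    for word, count in focal_counts.items():
        if global_counts[word] < min_count:
            continue
        p_focal = count / n_focal
        p_global = global_counts[word] / n_global
        scores[word] = math.log(p_focal / p_global)
    return scores


# Hypothetical tokenized comments for two subreddits in one month.
corpus = {
    "curling": ["sweep", "stone", "sweep", "game"],
    "sports": ["game", "team", "game", "score"],
}
pmi = word_pmi("curling", corpus)
```

A word used only in the focal subreddit ("sweep") receives a positive score, while a word that is more common elsewhere ("game") receives a negative one.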
I also measure culture using word embeddings, which capture semantic similarity based on contextual co-occurrence rather than raw word frequencies (Mikolov et al., 2013). Word embeddings can capture similarities not only of words that co-occur with one another, but also that co-occur with shared context words. I trained a word embedding model on each month of text data to produce 300-dimensional word vectors (results robust with 100-dimensional vectors; see S3 Appendix for details). I then created comment-level vectors by averaging the word vectors within each comment, and subreddit-level vectors by averaging the comment vectors within each subreddit in a given month.
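Using these subreddit embeddings, I then calculated similarity scores. Within text embedding spaces, cosine similarity is the preferred method of measuring similarity because it captures vector direction but not magnitude, and is thus independent of document length. To generate a culture measure for each subreddit, I measured the cosine similarity between that subreddit’s vector and every other subreddit’s vector in each month, and then calculated the average of these similarities (e.g. Burtch et al., 2022). This measure can be represented as follows:
$$\overline{\mathrm{sim}}_{i,t} = \frac{1}{|J_t| - 1} \sum_{j \in J_t,\, j \neq i} \cos\left(\mathbf{v}_{i,t}, \mathbf{v}_{j,t}\right)$$
Where $\mathbf{v}_{i,t}$ is the embedding vector of subreddit i in month t, $J_t$ is the set of subreddits observed in month t, and $\cos(\cdot,\cdot)$ denotes cosine similarity.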
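The averaging and similarity steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the original pipeline: the two-dimensional vectors stand in for the 300-dimensional embeddings, the function names are hypothetical, and any rescaling that turns average similarity into a niche-oriented score is not shown.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors (words -> comment, comments -> subreddit)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; depends on direction only, not on vector length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def average_similarity(focal, subreddit_vectors):
    """Mean cosine similarity between the focal subreddit and every other subreddit."""
    others = [vec for name, vec in subreddit_vectors.items() if name != focal]
    focal_vec = subreddit_vectors[focal]
    return sum(cosine(focal_vec, vec) for vec in others) / len(others)


# Hypothetical subreddit-level vectors for one month.
comment_vectors_a = [[1.0, 0.0], [3.0, 0.0]]
vectors = {
    "a": mean_vector(comment_vectors_a),  # [2.0, 0.0]
    "b": [1.0, 0.0],
    "c": [0.0, 1.0],
}
sim_a = average_similarity("a", vectors)
sim_c = average_similarity("c", vectors)
```

Here subreddit "c" points in a direction shared with no other subreddit, so its average similarity is lowest, which is the sense in which its language is the most distinctive of the three.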
Controls
I include controls for community age, default subreddit status (a designation under which new Reddit users were automatically subscribed to the community), moderator activity (whether any comments were removed by moderators), and average comment length, in line with prior ecological studies of online communities (Wang et al., 2013). Descriptive statistics for all variables are reported in Table 1.
Table 1. Descriptive statistics.
Analysis
For the main analyses, I ran panel models for the growth rates of size and engagement. I applied natural log transformations to all variables except age, moderator activity, and default status. Structure and culture variables were highly skewed, so I leveraged the lnskew0 command in Stata, which applies a log transformation on the variable plus or minus a constant chosen such that skewness is zero, and which allows log transformation of variables including zero values.
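For readers without Stata, the idea behind a zero-skewness log transformation can be sketched as a search for a constant k such that ln(x − k) has skewness zero. The Python below is a simplified illustration, not a reimplementation of lnskew0: it handles only right-skewed input, uses a plain bisection search, and its function names and toy data are hypothetical.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def zero_skew_log(values, tol=1e-10, max_iter=200):
    """Return (k, transformed), where transformed = ln(x - k) has skewness near zero.

    Raises ValueError when a sign change in skewness cannot be bracketed,
    e.g. for data that are not right-skewed.
    """
    low = min(values)
    spread = max(values) - low
    if spread <= 0:
        raise ValueError("values must not all be equal")

    def skew_at(k):
        return skewness([math.log(v - k) for v in values])

    near = low - 1e-9 * spread  # k just below the minimum: skewness expected < 0
    far = low - spread          # k well below the minimum: skewness expected > 0
    if skew_at(near) >= 0:
        raise ValueError("could not bracket a zero-skewness constant")
    for _ in range(60):
        if skew_at(far) > 0:
            break
        far = low - 2 * (low - far)  # move twice as far below the minimum
    else:
        raise ValueError("data do not appear to be right-skewed")

    k = far
    for _ in range(max_iter):
        k = (near + far) / 2
        s = skew_at(k)
        if abs(s) < tol:
            break
        if s > 0:
            far = k   # still right-skewed: move k toward the minimum
        else:
            near = k  # over-corrected: move k away from the minimum
    return k, [math.log(v - k) for v in values]


# Hypothetical right-skewed values; zeros are allowed because k lies below the minimum.
data = [0, 1, 2, 4, 7, 12, 39]
k, transformed = zero_skew_log(data)
```

Because k is chosen below the smallest observation, the argument of the logarithm stays positive even when the variable contains zeros, which is the property the text relies on.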
Some subreddits have gaps in activity, meaning the subreddit-months observed are a subset of all possible subreddit-months. This represents a potential selection problem—subreddits that become inactive may do so for reasons related to the outcomes of interest, which could bias parameter estimates. To address this, I ran a probit regression predicting the likelihood of a subreddit being active in a given month, transformed the predicted probability via inverse Mills ratio, and included it as a control in the main models (Heckman, 1979; Tortoriello et al., 2012). Including this control did not significantly change parameter estimates.
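As a sketch of the selection-correction term, the inverse Mills ratio for an observed (active) subreddit-month can be computed from the probit's predicted probability of being active. This assumes the standard form λ = φ(z) / Φ(z), with z the probit index recovered from the predicted probability; the function name is hypothetical and the probit estimation itself is not shown.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_from_probability(p_active):
    """Inverse Mills ratio phi(z) / Phi(z), where Phi(z) equals the predicted probability."""
    if not 0.0 < p_active < 1.0:
        raise ValueError("predicted probability must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(p_active)       # probit index implied by the probability
    return _STD_NORMAL.pdf(z) / p_active    # Phi(z) == p_active by construction


# Hypothetical predicted probabilities of being active in a given month.
imr_likely = inverse_mills_from_probability(0.9)
imr_even = inverse_mills_from_probability(0.5)
```

The ratio shrinks as the predicted probability of activity rises, so subreddit-months that were unlikely to be observed receive a larger correction term.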
I specify the following model for the growth rates of both user size and engagement. Following Geroski (2005), I estimate log-difference growth models. This is a dynamic specification where the current value
Where
Results
Distribution of attention on Reddit
Before turning to the model results, I illustrate the size distribution of subreddits in the sample: Figure 1 plots the log number of subreddits against the log number of comments. The linear trend in log–log space suggests a scale-free distribution, indicating that a small number of subreddits account for a disproportionately large volume of comments. The S5 Appendix gives more detail on the kinds of subreddits that attract varying amounts of attention.

Figure 1. Log–log plot showing the distribution of the number of subreddits by their comment counts. Each point represents how many subreddits have exactly x comments.
Main analysis step 1: Results from modeling two modes of attention capture
Results from size growth rate models
Table 2 reports fixed-effects models of subreddit user size growth. Because these models include subreddit fixed effects, coefficients capture how within-community shifts over time are associated with subsequent changes in growth, net of time-invariant community characteristics. Following the standard practice for log–log specifications, the reported coefficients are interpreted as elasticities—or more precisely, local elasticities given the lnskew0 transformations—where the coefficient represents the approximate percentage change in the subsequent rate associated with a 1% within-community shift in the predictor.
Models predicting growth of size.
Notes: Models were estimated using cluster-robust standard errors at the level of the subreddit.
*p < 0.05, **p < 0.01, ***p < 0.001.
Control variables perform as expected. Community age is positively associated with subsequent user growth, consistent with the liability of newness, while increases in subreddit size are associated with slower subsequent growth, suggesting diminishing returns in user acquisition. Becoming a default subreddit is associated with an approximate 21.6% increase in subsequent monthly growth (p < 0.001), reflecting the large boost in visibility that default status confers.
Models 2 and 3 incorporate structural and cultural shifts. A 10% within-community increase in structural crowding is associated with an approximate 1.2% decrease in subsequent user growth (b = −0.119, p < 0.001)—consistent with competitive crowding theories in organizational ecology (Carroll and Hannan, 1989). Across both cultural measures, within-community shifts toward more niche content are associated with slower subsequent user growth. For example, in Model 2, a 10% within-community increase in niche linguistic distinctiveness (CulturePMI) is associated with an approximate 0.4% decrease in the subsequent monthly size growth rate (b = −0.040, p < 0.001).
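The percentage figures above follow from a simple linear approximation: the coefficient multiplied by the percentage shift in the predictor. The sketch below reproduces that arithmetic and, for comparison, the exact change implied by a pure log–log relation. The pure log–log form is an assumption made only for illustration, since the models use lnskew0 transformations and growth-rate outcomes; the function names are hypothetical.

```python
def approx_pct_change(b: float, pct_shift: float) -> float:
    """Linear approximation used in the text: percent change ~ b * percent shift."""
    return b * pct_shift


def exact_pct_change(b: float, pct_shift: float) -> float:
    """Exact change implied by a pure log-log relation y = c * x**b (an assumption)."""
    return ((1.0 + pct_shift / 100.0) ** b - 1.0) * 100.0


crowding_approx = approx_pct_change(-0.119, 10.0)  # structural crowding, about -1.19
crowding_exact = exact_pct_change(-0.119, 10.0)    # slightly smaller in magnitude
culture_approx = approx_pct_change(-0.040, 10.0)   # CulturePMI in Model 2, about -0.4
```

For coefficients this small the two calculations differ only slightly, which is why the reported values are described as approximate.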
Models 4 and 5 present two-way interaction terms between membership overlap (structure) and linguistic distinctiveness (culture). Marginal effects from these interactions are shown in Table 3. For CulturePMI, within-community shifts toward more niche content are associated with slower subsequent user growth across the structural spectrum, from isolated environments (2SD below structure mean; b = −0.0498, p < 0.001) to highly crowded ones (2SD above structure mean; b = −0.0327, p < 0.01). Substantively, these coefficients indicate that a 10% within-community increase in niche linguistic distinctiveness is associated with an approximate 0.5% decrease in the subsequent user growth rate in isolated environments, and an approximate 0.3% decrease in highly crowded ones. CultureW2V shows a more context-dependent pattern, with negative associations in structurally crowded environments (+2SD; b = −0.0400, p < 0.001, representing an approximate 0.4% decrease in subsequent growth for a 10% niche shift) and non-significant associations in isolated environments. Overall, within-community shifts toward more generic language are associated with faster subsequent user growth, with the most consistent associations observed in crowded locations.
Average marginal effects of culture on growth of size.
Notes: Table shows average marginal effects of culture measures across levels of structure.
*p < 0.05, **p < 0.01, ***p < 0.001.
Results from engagement models
Table 4 reports results for models of subreddit engagement. Control variables largely mirror the size growth models, with community age positively associated with subsequent engagement growth. Two notable contrasts emerge. First, within-community increases in size are positively associated with subsequent engagement growth, the opposite of the size growth result, suggesting that larger communities can facilitate deeper participation. Second, transitioning to default status is associated with an approximate 9% decrease in subsequent engagement growth (p < 0.001), again the opposite of its effect on size growth.
Models predicting growth of engagement.
Notes: Models were estimated using cluster-robust standard errors at the level of the subreddit.
*p < 0.05, **p < 0.01, ***p < 0.001.
Models 2 and 3 show that within-community increases in structural crowding are associated with slower subsequent engagement growth: a 10% within-community increase in structural crowding is associated with an approximate 0.5% decrease in its subsequent engagement growth rate (b = −0.053, p < 0.001). In contrast to user size growth results, within-community shifts toward higher culture scores (i.e. more niche content) are associated with faster subsequent growth in engagement. For example, in Model 2, a 10% within-community increase in niche content (CulturePMI) is associated with an approximate 0.39% increase in subsequent engagement growth (b = 0.039, p < 0.001). Together, these results indicate a tradeoff: within communities, shifts toward more generic content are associated with faster subsequent user growth, while shifts toward more niche content are associated with faster subsequent growth in engagement.
Models 4 and 5 present interaction effects indicating how the association between niche shifts and subsequent engagement varies by structural state. In Model 4, the interaction coefficient between structural crowding and CulturePMI is positive and highly significant (b = 0.109, p < 0.001). This indicates that the association between niche linguistic shifts and engagement growth intensifies as a community becomes more structurally integrated into central hubs.
Table 5 translates this interaction into substantive marginal effects across levels of structural crowding. When a community is in an isolated state (2SD below structure mean), a 10% shift toward niche content is associated with an approximate 0.3% decrease in subsequent engagement growth (b = −0.0289, p < 0.01). However, when a community is in a highly crowded state (2SD above structure mean), the same 10% shift toward niche content is associated with an approximate 0.9% increase in subsequent engagement growth (b = 0.0936, p < 0.001). This indicates that the relationship between niche cultural shifts and participation depth is strongest specifically when a community shares a significant resource pool with the platform’s most visible communities.
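One way to read the two reported marginal effects together is to interpolate between them. The sketch below assumes the effect of CulturePMI is linear in structural crowding between the two reported points, as it would be with a single product interaction term; the interpolated values are illustrative only and are not reported in Table 5. The function names are hypothetical.

```python
def marginal_effect_at(s, low=(-2.0, -0.0289), high=(2.0, 0.0936)):
    """Linearly interpolate the culture effect at structure level s (in SD units)."""
    (s0, m0), (s1, m1) = low, high
    slope = (m1 - m0) / (s1 - s0)
    return m0 + slope * (s - s0)


def zero_crossing(low=(-2.0, -0.0289), high=(2.0, 0.0936)):
    """Structure level (in SD units) at which the interpolated effect changes sign."""
    (s0, m0), (s1, m1) = low, high
    slope = (m1 - m0) / (s1 - s0)
    return s0 - m0 / slope


effect_at_mean = marginal_effect_at(0.0)  # interpolated effect at the structure mean
sign_change = zero_crossing()             # roughly one SD below the structure mean
```

Under this linearity assumption, the association between niche shifts and engagement growth would turn positive at roughly one standard deviation below the structure mean, well before the highly crowded state.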
Average marginal effects of culture on growth of engagement.
Notes: Table shows average marginal effects of culture measures across levels of structure.
*p < 0.05, **p < 0.01, ***p < 0.001.
Main analysis step 2: Ecological interpretation of model results
Figure 2 summarizes these results by plotting marginal associations between within-community cultural shifts and subsequent changes in user growth and engagement across levels of structural crowding. The x-axis represents structural crowding, and the y-axis represents the predicted association between within-community cultural shifts and subsequent changes in growth of user size or engagement.

Average marginal effects of culture at different levels of structure. The x-axis identifies the structural state of the community, ranging from Isolated (−2SD) to Crowded (+2SD). Size and engagement trajectories are derived from separate models (Tables 1 and 3) and plotted together to illustrate the breadth–depth tradeoff. Units are approximate elasticities; shaded areas represent 95% confidence intervals.
Positive values indicate that within-community shifts toward more niche cultural profiles are associated with increases in the outcome relative to a community’s baseline. Negative values indicate that within-community shifts toward more generic profiles are associated with increases.
The results show that in highly crowded locations (the most central and competitive areas of the ecology), shifts toward both generic and niche content predict subsequent success in attention capture, but through different modes. In these locations, within-community shifts toward more generic content are associated with faster subsequent user growth, while within-community shifts toward more niche content are associated with faster subsequent engagement growth. These results indicate that communities shifting toward niche content are not confined to peripheral ecological positions, but can capture attention in central locations through deeper engagement. Robustness checks leveraging different structural and engagement measures, reported in Appendices S6 and S7, confirm these results in crowded locations of the ecology.
To illustrate these trajectories, consider two subreddits that frequently occupy central ecological positions. r/Minecraft, a large and broadly accessible community about the popular video game, tends to shift toward more generic content, thereby experiencing faster subsequent user growth. In contrast, r/truezelda, which focuses on specialized discussion such as lore and narrative analysis within the Zelda franchise, often shifts toward more niche content and experiences faster subsequent growth in engagement rather than user count. These examples illustrate how different cultural shifts are associated with distinct modes of attention capture even within similarly crowded structural locations.
At lower levels of structural crowding, these tradeoffs are substantially weaker. For CulturePMI, shifts toward more generic content are associated with modest increases in both growth in users and engagement, while for CultureW2V, cultural shifts show no significant association with attention dynamics. In these more isolated regions of the ecology, cultural profile appears less consequential for how attention is captured. Together, these patterns suggest that sharp breadth–depth tradeoffs are a feature of crowded central regions, while sparse regions allow greater independence between cultural profile and mode of attention capture.
Discussion
In this article, I find evidence that niche cultural forms (here, subreddits shifting toward niche content) can attract attention even in the most competitive and central locations of the ecology. They are not relegated to the peripheries and reliant on small pockets of separate audiences with esoteric tastes, as seen in many offline ecological studies (Carroll, 1985; McPherson, 1983, 2004). This represents a different path to success and a different perspective on the long tail debate: a counterbalance to cultural homogeneity through greater accessibility and visibility of niche cultural content in central hubs of attention.
This article also offers a contribution to the literature on structure and culture. First, I identify a key consistency in the relationship between structure and culture in the offline and online worlds. Just as in the offline world, cultural diversity online flourishes in locations that are densely interconnected (i.e. crowded), but not as much in sparse locations. Theory about the relationship between social crowding and cultural diversity goes back to Durkheim’s concept of “dynamic density” and extends to urban subcultures (Fischer, 1975) and market product diversity (Peterson and Berger, 1975). Social density, Durkheim argues, operates as a form of competition producing an adaptive response in the form of differentiation, in which different units (e.g. individuals, communities, or organizations and institutions) in the system focus on different activities (Bellah, 1959: 452). Social density also means greater resource availability to support this diversity.
At the same time, the ability of socially dense resource locations to support diversity relies on the carrying capacity of attention, meaning here the ability of attention to support cultural forms. In offline settings, the carrying capacity of even dense areas is often limited by the fact that attention is relatively discrete, because search and exploration are costly, such that these dense areas end up supporting only generic content (McPherson, 1983, 2004; Carroll, 1985). This competitive equilibrium shifts online, where cultural forms compete for attention that is more easily divisible. This divisibility bolsters the carrying capacity, which makes a particular difference in highly resourced locations. In line with this, I find that these competitive locations on Reddit can support subreddits shifting toward both generic and niche content. There remains here a duality between structure and culture, but one that reflects a different carrying capacity online. Specifically, the greater carrying capacity of central locations in the ecology manifests in user overlaps between different cultural forms: linkages, bridges, interstitial tissue connecting different communities. It is important to qualify, however, that while these structural overlaps suggest the presence of ecological linkages, these findings establish the structural conditions under which cross-community exposure is likely to occur rather than directly demonstrating the behavioral mechanisms of diffusion itself. Nevertheless, such overlaps reflect greater visibility and accessibility of niche forms and can serve as open conduits facilitating further cultural exploration and discovery.
These findings also speak directly to the relationship between individual-level and ecological accounts of cultural diversity online. The argument that the internet affords individual-level specialization (that people can sort into spaces matching their preferences) is well established and foundational here rather than a competing hypothesis. Individual choices, aggregated, produce community-level properties such as the size, engagement, and linguistic distinctiveness of particular spaces. But the relationships between communities that emerge from this aggregation reflect a structural configuration that no individual-level account is designed to observe or explain. Structural crowding captures this configuration: how the pattern of individual choices across thousands of communities aggregates into a structural web in which some spaces are deeply integrated with the platform’s largest hubs while others are isolated. While communities thus do reflect the outcomes of people’s choices, the web of cross-cutting memberships that emerges in turn lays the groundwork for shaping the kinds of cultural content that others are exposed to. Individual choice and ecological structure are successive levels of this social process.
These findings connect with work on audience fragmentation and polarization online (e.g. Dellaposta et al., 2015; Greve et al., 2022). Even if individuals seek out communities matching their own tastes, the structural overlaps documented here mean they will likely share spaces with members of other, potentially different groups—creating opportunities for exposure to diverse content in unexpected ways.
In addition, these findings have implications for platform design and governance. Specifically, many online communities employ algorithmic recommendations for content discovery (Fleder and Hosanagar, 2009)—the findings here suggest that encouraging individuals to branch out by recommending different types of content may help to further foster a more interlinked community, with follow-on effects of even more cultural exploration among members. Similarly, community moderation that seeks to maintain strict rules and topical boundaries within communities may limit the potential for cross-cutting discussion that introduces users to other related, yet distinct, communities. A key challenge is finding a balance between achieving a topical focus and enabling cross-cutting discussion that facilitates cultural discovery.
The contributions of this study are limited in several ways. First, its measurement of activity in online communities is limited. Due to data availability, I was not able to capture more passive forms of engagement, such as views, clicks, and liking posts. Focusing on comments is powerful in that it centers the analysis on a relatively committed form of engagement, and thus shows where and how users allocate significant, rather than fleeting, attention. Still, there may be different dynamics at play when looking at less active forms of content engagement. In addition, this article did not examine diffusion or influence processes; future work could examine how these factors relate to cultural exploration.
Second, this study was conducted solely on Reddit, and so some findings may not generalize to other online contexts. It may be that attention dynamics operate differently in other online settings where ranking algorithms play a more prominent role. In addition, Reddit’s topic-based community structure and pseudonymous user profiles differ from platforms organized around user pages or real identities—features that may relate to how cultural forms capture attention. Thus, while this article offers an initial step toward understanding attention capture in online communities, more research is needed to study other platforms.
Supplemental Material
Supplemental material, sj-pdf-1-nms-10.1177_14614448261454335 for The long tale of the long tail: The internet as counterforce to cultural homogeneity by James C Mellody in New Media & Society
Footnotes
Acknowledgements
This project would not have been possible without the continuous guidance, inspiration, and support of Ray Reagans. I am also deeply grateful to Ezra Zuckerman Sivan and Susan Silbey, whose insight and encouragement helped shape this work. Thank you also to Glenn Carroll, Charles Fine, Chris Rider, Toby Stuart, Brad Turner, Victoria Zhang, Eppa Rixey, Arrow Minster, Michael Mellody, and Roni Luo for their extremely helpful comments on earlier versions of this manuscript. This article also greatly benefited from feedback during presentations at the MIT Sloan Economic Sociology Working Group, as well as at the Annual Meeting of the Academy of Management. All mistakes are my own.
Ethical considerations
This research uses publicly available, aggregated, and anonymized data from Reddit. No personally identifiable information (PII) was collected or analyzed. According to the guidelines of the Committee on the Use of Humans as Experimental Subjects at the Massachusetts Institute of Technology, this study does not constitute human subjects research.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data were collected through the Pushshift project (Baumgartner et al., 2020), in accordance with Reddit’s Terms and Conditions. The raw data analyzed in this study are not publicly available because the massive quantity of the data (~300 GB) makes it unsuitable for hosting on open data sites. However, the raw data may be available on reasonable request, in accordance with Reddit’s Terms and Conditions. A prepared and anonymized dataset will be made available at the Harvard Dataverse site.
Supplemental material
Supplemental material for this article is available online.